Advanced Bovine Depilation


I hope you packed some snacks. This is going to be a bit of a long trip…

The Programmer’s ABCs

Over the past year, I’ve been fortunate to have the opportunity to give a handful of talks at various conferences. While each talk has been different, they have all followed a general theme of diving deep, really deep, to look under the covers at how the software we write actually works. If we happen to fix some problems along the way, all the better!

Most recently, I just returned from Lyon where I gave one of these talks at the wonderful RuLu Conference. Afterward, talking to some of the conference participants, there was a familiar refrain to the conversation. Many of the questions I was asked could best be summarized as: “How can I learn to dive deep into code like that?”

As an answer, I had a slide that listed the programmer’s ABCs: Always Be Curious. When others consider a ticket finished because a test passes or a requirement is met, that’s when you should start asking, “…but why does this work that way?” When everyone else is packing up their bags and heading home for the weekend, that’s your cue that it’s time for some exploration.

Well, that’s a nice platitude and all, but what does this look like in practice? How does one turn curiosity into learning? A few weekends ago, I set off on yet another dive deep into Ruby, far deeper than was necessary, far deeper than was probably sane. I came up having learned something useful and new, though there was no way I could have guessed that I would learn what I did.

So, if you will allow, I’d like to present the story of that weekend adventure as an extended answer to the question “How do I get good at understanding low level topics in Ruby (and programming in general)?”

Down the Rabbit Hole

This story begins, as so many do, with a relatively simple bug. In a recent project we built using Padrino and Sequel, it was discovered that a failure to send a welcome email was preventing new user accounts from being saved. I was quickly able to trace the issue to a mailer being delivered synchronously within a model’s after_save callback. In the event that delivery failed, this setup was causing the entire save operation to fail.

Ideally, email delivery would be handled asynchronously and failures would result in a retry after some delay. At the stage this project was at, we were not ready to bring in an asynchronous worker solution, and we didn’t mind if some emails were never delivered. What we couldn’t tolerate was the failure to save a newly created account. The simple solution, then, was to rescue any delivery failures, log them, and ensure that the after_save callback returned true regardless of delivery success. Oh, we also added a ticket to our backlog reminding us to review mail delivery and asynchronous workers at a later date. Problem solved!

Wait…two paragraphs in and the bug is solved? Yup! That’s usually how it begins. As it happens, this was the last bug on my plate on a Friday, and since the “simple solution” was quickly put in place and tested, I had a little extra time on my hands. I decided to take this time to dive a bit deeper into mail delivery. “Perhaps,” I pondered, “the mailer libraries in Padrino have an alternative, more elegant solution for this problem?”

Using bundle open, via the excellent vim-bundler plugin for Vim, I started by inspecting the source for the padrino-mailer gem. I guess this would be a good time to mention that before you even worry about being curious, you should get into the habit of reading code. At the very least, you should be reading more code than you write on a daily basis. Getting good at the practice of reading code will make the sort of deep dive on which we’re about to embark that much faster and easier.

Starting with the lib/padrino-mailer/helpers.rb file, I found the deliver method that extracts a message from a collection of registered mailers. Eventually, the message is called and the deliver method is invoked on the object returned. What are the contents of registered_mailers? Before leaving the helpers file, I took one last look at the mailer method that is used to create mailers for Padrino, and I noticed that this method populates registered_mailers by instantiating Padrino::Mailer::Base objects.

The definition of Padrino::Mailer::Base is in lib/padrino-mailer/base.rb, where I found something that looks like a DSL. The initializer ends with an instance_eval, and indeed the contents of our Padrino mailer contain calls to an email method. This email method creates a Ruby Proc that instantiates a Mail::Message object and returns it. So, looking back two paragraphs, we see that Padrino’s deliver method ultimately turns into a call to Mail::Message#deliver.

Moving along just a tad quicker now, I again “bundle open”-ed the mail gem and found the lib/mail/message.rb file where the deliver method is defined. There’s some error checking, some triggering of callbacks, but it all wraps up with a call to do_delivery, where I discovered that a delivery_method is doing the actual delivering. The delivery_method method is a clever combined getter/setter, but the punchline is actually in the lib/mail/configuration.rb file, where the Mail::Configuration#lookup_delivery_method method chooses what concrete implementation is actually going to be delivering our message.

Here’s the complete contents of that method:

def lookup_delivery_method(method)
  case method
  when nil
    Mail::SMTP
  when :smtp
    Mail::SMTP
  when :sendmail
    Mail::Sendmail
  when :exim
    Mail::Exim
  when :file
    Mail::FileDelivery
  when :smtp_connection
    Mail::SMTPConnection
  when :test
    Mail::TestMailer
  else
    method
  end
end
At this point, two things are apparent. First, none of these options look like they will provide an out-of-the-box “async send” mechanism that might allow me to forgo having to implement a background worker. Second, that’s a long case statement!

A Challenge Appears!

I don’t know about you, but as soon as I saw that case statement my controversy alarm went off. The task of turning one value into another based on a series of rules can be handled a few different ways. Using case, as here, is the traditional procedural solution. A more object oriented solution would involve creating a class that encapsulates the lookup logic. Finally, the functional solution would involve looking up values in a map using fetch like so:

def lookup_delivery_method(method)
  { nil => Mail::SMTP,
    :smtp => Mail::SMTP,
    :sendmail => Mail::Sendmail,
    :exim => Mail::Exim,
    :file => Mail::FileDelivery,
    :smtp_connection => Mail::SMTPConnection,
    :test => Mail::TestMailer
  }.fetch(method, method)
end

If you’re interested in more of the discussion around this topic, I’d recommend Jeff Kreeftmeijer’s recent review on the death of “ifs” (because, really, case is just a convenience around if).

In this case, though, I was less interested in which solution was more elegant, more object oriented, or more understandable. Rather, I was curious which one would be faster. Aside from being more than a little obsessed with performance, I was curious because, while style is subjective, benchmark results are hard to argue with (and somewhat less prone to bike-shedding). So, I constructed a simple benchmark script:

require 'benchmark'

class TestBench
  VALUES = { first: "A",
             second: "B",
             third: "C",
             fourth: "D" }

  def check_with_map(val)
    VALUES.fetch(val, "not found")
  end

  def check_with_case(val)
    case val
    when :first
      "A"
    when :second
      "B"
    when :third
      "C"
    when :fourth
      "D"
    else
      "not found"
    end
  end
end

Benchmark.bmbm do |x|
  t =
  vals = %w|first second third fourth fifth|.map(&:to_sym)"With Map") do
    1_000_000.times { t.check_with_map(vals.sample) }
  end"With Case") do
    1_000_000.times { t.check_with_case(vals.sample) }
  end
end
Before even running the benchmark, it is possible to make an educated guess as to which mechanism will be quicker. Looking up a value in a hash is a constant-time operation. That is, given a key, to look up a corresponding value in a hash table the key’s hash value is calculated, and then the table is checked for any values at the corresponding position in the table. Typically, there will be a final equality check between the key given and the key in the table to protect against the possibility of hash collisions.

On the other hand, the case statement in Ruby utilizes the === operator of each item given in the corresponding when clauses, in turn, checking to see if the value passed to the case matches. Obviously, this seems a lot slower than looking up a key in a hash. In particular, in the situation where the item being checked does not match one of the keys in the hash or one of the when clauses in a case, falling through to the default should be much quicker with a hash (which simply has to check that the item’s hash value is not present in the table) than with a case (which still must check against every when clause first).
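A quick illustration of what === brings to case: each class defines the operator differently (Class#=== is an is_a? check, Regexp#=== a match, Range#=== an inclusion test), and case tries each when clause in turn.

```ruby
# Each when clause invokes a different implementation of ===.
def classify(x)
  case x
  when Integer  then "Integer === #{x.inspect}"
  when /ruby/   then "/ruby/ === #{x.inspect}"
  when 1.0..9.9 then "(1.0..9.9) === #{x.inspect}"
  else               "no when clause matched"
  end
end

classify(42)        # matches the Integer clause
classify("rubyist") # matches the Regexp clause
classify(3.14)      # matches the Range clause
```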

So, having considered the benchmark, the only thing left to do is run it…

> ruby case_bench.rb 
Rehearsal ---------------------------------------------
With Map    0.420000   0.000000   0.420000 (  0.423642)
With Case   0.340000   0.000000   0.340000 (  0.340239)
------------------------------------ total: 0.760000sec

                user     system      total        real
With Map    0.420000   0.000000   0.420000 (  0.423254)
With Case   0.350000   0.000000   0.350000 (  0.341927)

Well…that was unexpected! Despite what a logical analysis of the code suggested, Ruby’s case beats out a hash lookup. We could do more with benchmarking to explore what’s happening here, and indeed I would suggest that you do (for example, how does a chain of ifs using === compare to case?). In the interest of time, I’m going to gloss over that for now.
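For the curious, that suggested follow-up is a one-method change: the same lookup written as an explicit if/elsif chain over ===, ready to drop into the earlier Benchmark.bmbm script (a sketch, assuming the same values and fall-through behavior as before):

```ruby
# The same four-way lookup, but dispatched through explicit calls to
# Symbol#=== instead of a case statement or a hash fetch.
def check_with_ifs(val)
  if    :first  === val then "A"
  elsif :second === val then "B"
  elsif :third  === val then "C"
  elsif :fourth === val then "D"
  else "not found"
  end
end
```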

Eventually, if we really want to understand why case in Ruby is faster than we might expect, we will have to look at how it is implemented. With Ruby, understanding methods on objects is usually a straightforward affair, relatively speaking, as there is almost always a 1-to-1 correspondence between a method on a Ruby object and the function in the MRI C source that implements it (that is, when the method is not written in Ruby itself). Unfortunately, case is a keyword in Ruby, so to understand how it works we’ll have to go to…THE PARSER

“Why,” you might wonder, “did he capitalize ‘THE PARSER’?” I assure you, dear reader, I have done so to emphasize that, of all the files in MRI’s source, THE PARSER is perhaps the most dreaded, the most ominous. It is the one that strikes fear into the hearts of Ruby implementers everywhere.

It’s also not all that hard to grasp once you look at it the right way…


Now, if you’re browsing through the Ruby source, you might notice the conspicuous absence of any files named “THE PARSER” or even “the_parser.c”. You will find parse.y, but what the heck is a “.y” file? Ruby uses a tool known as YACC to compile this parse.y file into a corresponding C file that can then be used to parse Ruby source. The reason for this indirection is that it allows parse.y to be written in such a way that makes the grammar of Ruby easier to understand. In short, this machine translation solution is what one did before one had DSLs to solve all their problems.

Cracking open parse.y, the first thing you might notice is that it is over 11,000 lines long. You might also notice that it doesn’t look anything like any programming language you might be familiar with. Don’t panic! We’ll start out small, looking for familiar signposts, until we get our footing. The first signpost we want to look for this time is case. Searching through the source, we find the first occurrence of “case” on line 642 in the form of keyword_case. (Note: I’m using the version of parse.y from Ruby v1.9.3-p429, since the line numbers will change slightly from version to version.)

At this point, it’s worth noting that Ruby’s parser works in tandem with a lexer, and it is this lexer that turns the string “case” in a Ruby source file into keyword_case. Even if you didn’t know that, though, you might notice that the neighborhood of line 642 is littered with other familiar-looking terms such as keyword_if and keyword_else. Really, it’s not surprising that parse.y would use something like keyword_case to refer to Ruby’s “case”, since C also has “case” as a keyword. If we didn’t distinguish between these two somehow, things could get (even more) confusing.

Searching parse.y now for “keyword_case”, we see that there are only 4 occurrences, far fewer than the 482 occurrences of “case”. Let’s consider each in turn. The first occurrence is the one we’ve seen just above, where keyword_case is listed along with all the other Ruby keywords in parse.y’s list of tokens. The next place it occurs, on line 1890, is in the definition for the rule “reswords”.

A full explanation of how grammars and parser generators work is outside the scope of this post (yes, there are some things I won’t cover here), but suffice to say that rules are constructed of a label, followed by a colon, followed by alternatives separated by pipes. The alternatives themselves can be tokens or other rules. Additionally, some rules are defined just for later use, while some include instructions on functions to call when the parser finds an occurrence of the rule. The “reswords” rule that we just saw is an example of the former.

Looking at the next occurrence of keyword_case we find an example of a rule that does something as well. The full definition is:

k_case : keyword_case
           {
               token_info_push("case");
           }
       ;

This does two things. First, it defines a rule called “k_case” which is just equivalent to “keyword_case”. Second, whenever keyword_case is encountered and this rule is evaluated, we will call a C function named “token_info_push”, passing it the C string “case”. Let’s keep this in mind as we peek at the last occurrence of keyword_case.

That final occurrence comes on line 10,489 where we find the definition of a struct called keyword_to_name, which appears to be where parse.y is informing the lexer that the string “case” in Ruby source is, indeed, equivalent to keyword_case, just as we had surmised.

On Nodes Not Named JS

From searching for “case” and then “keyword_case”, we found two new rules defined in terms of keyword_case that could lead us to understanding how Ruby’s case statements got so fast. The first, reswords, doesn’t look all that interesting (if you don’t believe me, please do search the parse.y source for more instances of “reswords”). The second, k_case, occurs two more times in the source for parse.y, on line 2842 and line 2853. We don’t have to pick between these two for our next hop deeper, though, because they both contain calls to the same thing: NEW_CASE.

Ah! Now we’re getting somewhere. But where is this NEW_CASE defined? Having an editor or IDE with the ability to search across files in a project (or just using grep to the same effect) is very handy on these sorts of adventures. Using such a tool, we find that NEW_CASE is a preprocessor macro defined in node.h and that it, in turn, expands into a call to NEW_NODE, a preprocessor macro defined just nine lines above.

This final macro is going to call a C function (or so it seems…more on this in a moment) named rb_node_newnode. So where is rb_node_newnode defined? Well, with a little bit of hunting we find one place it’s defined as a C function in gc.c on line 1227, but then all the way back in parse.y we find where it is also defined as a preprocessor macro on line 328. The preprocessor macro in parse.y refers to a function named node_newnode, which just so happens to also be defined in parse.y. Here it is in its entirety:

static NODE*
node_newnode(struct parser_params *parser, enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = (rb_node_newnode)(type, a0, a1, a2);
    nd_set_line(n, ruby_sourceline);
    return n;
}
Wha? How? Uh…Is This What Learning Feels Like?

In short: yes.

Did you see that? Right there in the definition of node_newnode is a reference to rb_node_newnode, the very thing we were looking for a definition of, except now it’s surrounded with parentheses. What’s up with that?

What indeed? That’s the very question I asked myself when I got to this line. It helps a bit if you know about function pointers in C. Really, all you need to know about them is that they let functions be passed around as values, just like functions in JavaScript or Python can be (but not quite the way that detached methods work in Ruby). So, for example, if you had a function find_my_func() that returned a function pointer, and the function it pointed to took two arguments, you could write (in C): (find_my_func())(first, second);. This would get the function pointer, then call the function it points to.

That’s not quite what we have here, though. Instead of calling something that gives us a function pointer, we’re wrapping the name of the function itself in parentheses. Why not, then, just call it directly like so: rb_node_newnode(type, a0, a1, a2);? The problem with this is that we have a preprocessor macro named “rb_node_newnode”. In C, the preprocessor runs before any code compilation happens, and it works much the same as straight string substitution, so calling the function directly would turn into node_newnode(type, a0, a1, a2); giving us a recursive call into the same function and, eventually, a stack overflow.

After staring at this bit of code for an uncomfortably long while, I thought I had figured it out, but just to prove it to myself, I wrote a little C program to test my assumptions:

#include <stdio.h>

void greet(char* name)
{
    printf("Hello, %s!", name);
}

#define greet(name) wrapper(name)

void wrapper(char* name)
{
    printf("Wrapping a method...\n");
    (greet)(name);   /* parentheses suppress macro expansion */
    printf(" so much fun!\n");
}

int main(int argc, char* argv[])
{
    greet("Josh");   /* the preprocessor turns this into wrapper("Josh") */
    return 0;
}
Compiling and running this was all I needed to confirm my hunch…

> clang hunch.c -o hunch
> ./hunch
Wrapping a method...
Hello, Josh! so much fun!

What’s happening is that, by surrounding the name of the function to call in parentheses, we’re preventing the preprocessor from doing its normal substitution. This allows us to create a preprocessor macro with the same name as another function, redirect it to a second function, and then, within that second function, call the original function. In essence: a function decorator in C!

The one caveat to this mechanism is that the preprocessor macro must be defined after the function we’re wrapping, or in another source file altogether. If not, the preprocessor will do its substitution in the function definition itself, yielding confusing, cryptic compilation errors. That said, this is a neat little trick in C, and not anything that I was ever taught or would even know to look up in any reference manual anywhere.

*bzzzt* Time’s Up!

…and, that’s it! As I said, this was a weekend diversion for me, and this is as far as I got by bedtime on Sunday. Before we wrap up, though, let’s take a moment to reflect on the path we just traveled.

We started with a rather innocuous bug, simply solved. From there, we got curious about alternative solutions to the same bug. It took a bit of code diving to determine that there wasn’t, in fact, an easier solution, but along the way we managed to learn something about how mail delivery works in Padrino (information that will likely come in useful later). Not stopping there, we got curious about an interesting use of case in Ruby, and attempted to learn something new about case. While we still can’t say why case is so fast, knowing that it is may prove useful down the line when choosing between a case statement or some other alternative. Finally, in our effort to learn something more about case, we instead learned a neat trick in C.

One thing you should take away from this little adventure is that directed learning can only get you so far. At some point, you have to put down the text books, forget the course curriculum, and go at it on your own. Sometimes, undirected exploration can be far more fruitful than any lesson plan. That’s not to say that you have to figure everything out on your own. IRC, StackOverflow, other forums, and even various books can be a great help along the way. In this particular case, if you’re interested in the sort of underpinnings of Ruby that we explored here, I would recommend Pat Shaughnessy’s Ruby Under a Microscope.

Another thing to take from all of this is that what started as a bug, then a question, led to more questions and ultimately a number of those questions remain unresolved. Rather than view this as a failure, these unresolved questions should prove to be fertile fodder for future learning. All that you need to do is decide where you want to focus. Maybe you will be motivated to create an extension to Padrino’s mailers that provides asynchronous sending out of the box. Maybe you’ll look at why case is so fast, or why hash lookup in Ruby is so slow, and figure out how to give Ruby even more speed boosts. Maybe you’ll stare even deeper into the dark heart of THE PARSER and be motivated to write a Ruby implementation, or whole new programming language, of your own.

The sky’s the limit…

2 Responses to “Advanced Bovine Depilation”

  1. Warbo

    Just a couple of comments; computational complexity (ie. maps being constant and case being linear) only tells us about the scaling behaviour, not about the absolute performance. A useful test would be to compare the implementations on a bigger and bigger set of cases.

    Also, I’m not a Ruby programmer but the use of “case” you criticised for being procedural looks just as functional as your map-lookup. Functional and procedural are not mutually-exclusive (most code is probably both), but functional and *imperative* usually are.

    Functional code is characterised by its absence of statements; it’s been referred to as “value-oriented”, which explains its fondness for anonymous functions (functions as values).

    Imperative code is characterised by its absence of expressions. Completely imperative code (where everything is done via commands) is only really found in assembly these days, and even that will contain some expressions (eg. constant values). It could be argued that FORTH is completely imperative (but compare it to Joy, which is purely functional!).

    The way “case” is being used above, it is choosing between expressions. It looks somewhat like the following pseudocode, which is functional (chooses between expressions for bar):

    bar = switch (foo) {
    case a: x;
    case b: y;
    case c: z;
    }
    It does not look like the following pseudocode, which is imperative (chooses between statements which will assign bar):

    switch (foo) {
    case a: bar = x;
    case b: bar = y;
    case c: bar = z;
    }
    Note that assignment itself doesn’t make something imperative. Single-assignment code is not imperative, it’s just less heavily nested than otherwise. I’d guess that most programmers follow a loose form of single-assignment, except when they can’t (eg. loop variables).

    1. Joshua Ballanco

      Regarding the complexity of hash lookup, you’re absolutely right. In fact, in my own benchmarking I tried for a number of different values of N, but in all cases (up to 50 hash members/when clauses), “case” still came out ahead. I even artificially handicapped the benchmark so that all invocations would fall through to the default case, and still “case” came out ahead.

      If you do benchmark “case” versus chained “if/elsif/else” statements, you’ll find that the latter ends up closer to the runtime of the hash solution. So I suspect that the culprit is actually Ruby’s dog-slow method dispatch, and that the compilation of “case” in Ruby short-circuits this.

      As for functional vs. procedural vs. imperative, you’re correct that “imperative” was probably the better term for what I was describing. As for which style is better…well, I intentionally didn’t make any value judgments on style in the post, and I’m not about to start now.