How to prevent database contention in continuous integration

We’ve used a few different continuous integration stacks for Rails over the last year at work—first CruiseControl.rb, which we found a little too complex to administer, then a custom bash script (which worked well, but took a lot of tweaking to get just right). When we eventually switched to git last year, we took the opportunity to try Integrity, a cute lil’ Sinatra app.

Integrity mostly “just works,” and it’s been a happy switch. One thing we lost in the move, though, was code that protected against resource contention when two builds are running at once. This is definitely a problem with Rails, since a typical database.yml tells Active Record to use the same database for all test runs. So you’ve got multiple builds hitting the database at once, dropping tables, creating records, and so on. Yikes.

Our old bash script used a filesystem lock and a queue to run only one build at a time, in order. In theory, this is the soundest approach, but hey, our build server has 8 cores and 16 GB of RAM: plenty of room for parallelism. During a pair-programming session this week with Jared Grippe, we decided that the best approach is to solve the contention issues and allow multiple simultaneous builds. We figured that’d keep the rapid feedback up to speed when the commits are flying in. Here’s the recipe:

  1. Stop putting code in that little “build script” box in Integrity’s configuration page. Instead, drop it in a rake task, so it’s versioned and kept safe.
  2. In your build script, set a unique database name for the current build, and use ERB in the server’s database.yml to interpolate it in.
  3. Have the build script run rake db:create and rake db:drop, so that your databases are created and cleaned up automatically.

Here’s an example script:

desc "Run continuous integration suite"
task :build do
  ENV["RAILS_ENV"] = RAILS_ENV = "test"

  # use ERB in config/database.yml to make this the database name:
  # database: <%= ENV["DB_NAME"] %>
  now = Time.now.utc
  identifier = "#{Process.pid}#{now.to_i}#{now.usec}"
  ENV["DB_NAME"] = "myapp_test_#{identifier}"

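  begin
    # create the per-build database, load the schema, then run the suite
    Rake::Task["db:create"].invoke
    Rake::Task["db:test:load"].invoke
    Rake::Task["default"].invoke
  ensure
    # always drop the per-build database, even when the build fails
    Rake::Task["db:drop"].invoke
  end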
end

Appending the process ID and time in microseconds to the database name is about as unique as you can get, without generating a UUID or something. Note how we wrap the build in a block, and perform db:drop in the ensure section: that way, the database is removed even if the build fails (which would normally abort your rake task).
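
To see what the database.yml side of this looks like, here is a minimal sketch that renders an ERB template and reads back the test database name. Rails runs database.yml through ERB before parsing the YAML, and the sketch does the same by hand. The template contents (the mysql adapter and the "myapp_test" fallback) and the test_database_name helper are assumptions for illustration, not our actual config.

```ruby
require "erb"
require "yaml"

# Hypothetical contents of config/database.yml on the build server.
# The adapter and the "myapp_test" fallback are assumptions for illustration.
DATABASE_YML = <<~'YAML'
  test:
    adapter: mysql
    database: <%= ENV.fetch("DB_NAME", "myapp_test") %>
YAML

# Render the ERB first, then parse the result as YAML.
def test_database_name(template = DATABASE_YML)
  rendered = ERB.new(template).result
  YAML.load(rendered).fetch("test").fetch("database")
end

ENV["DB_NAME"] = "myapp_test_12345"
puts test_database_name
```

The ENV.fetch fallback means the same file still works when nobody has set DB_NAME, for example on a developer’s machine.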

Keep in mind that the database might not be the only shared resource used by your build—watch out for filesystem use, in particular. You can probably use a similar strategy to solve that problem.
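
For the filesystem, one possible version of that strategy is to give each build its own scratch directory. This is only a sketch: the with_build_dir helper, the "myapp_build_" prefix, and the BUILD_TMP_DIR variable are made-up names, and your app would need to read that variable wherever it writes files.

```ruby
require "tmpdir"

# Hypothetical helper: give each build its own scratch directory and
# remove it afterwards, even if the block raises.
def with_build_dir(prefix = "myapp_build_")
  Dir.mktmpdir(prefix) do |dir|
    ENV["BUILD_TMP_DIR"] = dir # assumed variable name, read by the app under test
    yield dir
  end
ensure
  ENV.delete("BUILD_TMP_DIR")
end

# Example use: anything written under dir is gone once the block returns.
with_build_dir do |dir|
  File.write(File.join(dir, "upload.txt"), "scratch data")
end
```

Dir.mktmpdir picks a unique directory name and, when given a block, deletes the directory when the block exits, which mirrors the db:create and db:drop pairing above.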

The steamroller

One thing I do here and there at Rupture is interview Rubyists who have applied to work with us. The Mythical Man and his Month notwithstanding, we could use a few more pairs of able hands. You know, so I can work less and blog more.

Our interview process has two parts: pair programming, and then some mostly-unstructured talk about the non-code aspects of software projects. Today I had the pleasure of talking with Josh Ferguson, and we ended up in a conversation about Rails’ new direction. Josh was concerned that as Rails matures, it becomes less opinionated, and therefore less attractive to the people who identified with the movement in the first place.

I’m still thinking about that, but the whole thing reminded me of a note I submitted to Obie Fernandez for inclusion in The Rails Way. It’s been more than a year since his book was published, and I never got around to cross-posting here… Obie had asked for contributors to answer the question: “what does The Rails Way mean to you?”

There’s a line at the beginning of the SICP lectures—a famous course on functional programming—where Harold Abelson is defining “computer science” for his class:

“Computer science” is a terrible name for this business. First of all, it’s not a science; it might be engineering, or it might be art… It’s also not very much about computers, in the same sense that physics is not really about particle accelerators, and biology is not really about microscopes and Petri dishes, and geometry is not really about using surveying instruments.

The reason that we think that computer science is about computers is pretty much the same reason that the Egyptians thought that measuring their plots after the flooding of the Nile was about surveying instruments, and not geometry: when some field is just getting started, and you don’t really understand it very well, it’s very easy to confuse the essence of what you’re doing with the tools that you use.

In the same way, my sense of “the Rails way” is not really about Rails, Ruby, or any of these new tools we’re using. Its “essence” is an uncompromising drive to optimize for productivity and happiness. It’s a hard-learned pragmatism: processes for people who refuse to solve the same problem twice, who are annoyed enough by the speed bumps their tools sometimes introduce that they happily gas up the steamroller.

A necessary corollary to the idea that Rails’ secret sauce is distinct from the code frozen to our vendor directories is that one day, a better instantiation of these practices will come along. I love Ruby, Rails, and their communities, but I know that we’ll all move on at some point. When that day comes, and some new 10-minute screencast makes us squeal like kids, we should have the sense to jump into it head-first, with the same abandon with which we dropped all that stuff we used to do for a living.

Anyway, if you’re interested in the SICP lectures, they’re pretty much canonical. (Nathan Sobo introduced me to them, and made me think about functional programming in general, when we were living in Venice. I’m a lucky guy.) A good way to get started is the set of lecture videos recorded at HP in 1986 by Abelson and Sussman, who started the course. Their SICP book is available online for free, too.

irb tip: get last returned value

Just use underscore.

>> "Chunky bacon"
=> "Chunky bacon"
>> _
=> "Chunky bacon"

The fruits of an investigation I started when Jon Baudanza at Rupture offered a brownie bite to the first person to figure it out. I’ll do pretty much anything for a brownie bite…

Update: you can also use conf.last_value, as well as its aliases context.last_value and irb_context.last_value.