Monday, August 6, 2012

Sidekiq, Ruby's non-atomic require, and multithreading

I just moved the workers for my newest project from Resque to Sidekiq, and it's working beautifully. I'm saving memory and enjoying a performance boost. This is part of my overall goal to use multithreading in Ruby as much as possible (I'm also moving the web component of this project from Unicorn to Puma). I've never liked EventMachine, and I've been greatly influenced to favor threads over processes and over EM by David Bryant Copeland, who gave this excellent talk about multithreaded Ruby at RubyNation 2012.

But today I stumbled onto an interesting, tricky bug, which exemplifies one of the downsides of multithreaded programming.


I had created a new worker that used the mechanize gem for web scraping. The worker was complicated and used several different classes to get the work done. I had to require "mechanize" in a few different files, mainly so I could reference Mechanize::Error in a couple of exception handlers. This was super well-tested code that worked great on my dev machine, but things went to hell in production.

The Bug

This worker would get stuck with zero information in the log files: the whole thread would simply deadlock. Sidekiq has a TTIN signal handler that helps you figure out where your code is stuck, but unfortunately the workers run on Heroku, and Heroku does not let you send arbitrary signals to your processes, so I couldn't use it. Instead I had to insert a bunch of logging probes in my code to see exactly which line was causing things to freeze.
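The kind of logging probe I mean can be sketched roughly like this. It is only an illustration, not my actual worker code: the probe helper, the perform method, and the StringIO log target are hypothetical, and require "json" stands in for the require that turned out to be the problem. In a real worker you would pass $stdout (which Heroku collects) instead of the StringIO.

```ruby
require "stringio"

# Hypothetical helper: write a checkpoint line tagged with the current thread,
# then flush so the line is not lost in a buffer if the thread hangs.
def probe(io, label)
  io.puts("#{Time.now.utc} thread=#{Thread.current.object_id} reached: #{label}")
  io.flush
end

# Stand-in log target for this sketch; use $stdout in a real worker.
LOG_IO = StringIO.new

# Hypothetical worker body with a probe on each side of the suspect line.
def perform
  probe(LOG_IO, "before require")
  require "json" # stand-in for the require that was freezing
  probe(LOG_IO, "after require")
  :done
end

# Run the work on a separate thread, as a threaded worker would.
RESULT = Thread.new { perform }.value
```

If the log ends with "before require" and never shows "after require", the line between the two probes is where the thread is stuck.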

It turns out my code was freezing on a require statement, where I required the first class which required the mechanize gem. I remembered that in Ruby require is not atomic, so I was able to zero in on the problem.

The Solution

Once I moved the require "mechanize" statement into an initialization step, before my workers were loaded, everything performed beautifully.
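As a rough sketch of the idea (not my real code), the pattern is to do every require while only the main thread exists, and to start threads afterwards. Here require "json" is a stand-in for require "mechanize" so the example runs without extra gems, and the scrape method is hypothetical.

```ruby
# Initialization step: load every library the workers need up front,
# while the main thread is the only thread running.
require "json" # stand-in for: require "mechanize"

# Hypothetical worker body. It uses the library but never calls require itself,
# so no two threads can end up waiting on the same require.
def scrape(page_number)
  JSON.generate({ "page" => page_number })
end

# Only now start the threads.
threads = 4.times.map { |n| Thread.new { scrape(n) } }
RESULTS = threads.map(&:value)
```

With Sidekiq, the same initialization step can live in a file that you pass on the command line with the -r option, so it is loaded before the worker threads start.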

Lesson Learned

Because require can cause deadlocks like this, I'll quote this Stack Overflow answer:
"require everything you need before starting a thread if there's any potential for deadlock in your app."


Luis Urraca said...

What do you mean by 'moved the require "mechanize" statement into an initialization step, before my workers were loaded'? I'm actually building a web crawler too and I'm interested in this matter.

Mike Subelsky said...

Hi Luis, what I mean is when I run sidekiq, I make it require an init file which includes "require 'mechanize'". This forces Sidekiq to include the library before threading.

Jack R-G said...

I read about the "thread_safe!" configuration option and started using that because I had problems with AR classes being halfway-loaded when another thread called into them. Do you prefer adding 'require' commands to your initializers to eager-load classes over using 'thread_safe!' and, if so, why?