But today I stumbled onto an interesting, tricky bug, which exemplifies one of the downsides of multithreaded programming.
Background
I had created a new worker that used the mechanize gem for webscraping. The worker was complicated and used several different classes to get the work done. I had to
require "mechanize"
in a few different files, mainly so I could reference Mechanize::Error in a couple of exception handlers. This was super well-tested code that worked great on my dev machine, but things went to hell in production.
The Bug
This worker would just get stuck with zero information in the log files - the whole thread would just deadlock. Sidekiq has a TTIN signal handler that helps you figure out where your code is stuck, but unfortunately the workers run on Heroku, and Heroku does not let you send arbitrary signals to your processes, so I couldn't use it. Instead I had to insert a bunch of logging probes in my code to see exactly what line of code was causing things to freeze.
It turns out my code was freezing on a
require
statement, where I required the first class which required the mechanize gem. I remembered that in Ruby require is not atomic, so I was able to zero in on the problem.
The Solution
Once I moved the
require "mechanize"
statement into an initialization step, before my workers were loaded, everything performed beautifully.
Lesson Learned
Quoting this Stack Overflow answer, because of the potential for
require
to cause deadlocks like this: "require everything you need before starting a thread if there's any potential for deadlock in your app."
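To make that advice concrete, here is a minimal sketch of the "require first, thread later" pattern. It is not the actual worker code: the EAGER_LIBS list is a hypothetical name, and standard-library files stand in for mechanize so the example runs without extra gems. In a real app the list would hold whatever the workers need.

```ruby
# Hypothetical list of libraries the workers depend on. Stdlib names are
# used as stand-ins here; in the post's case this would include "mechanize".
EAGER_LIBS = %w[json set uri].freeze

# Load everything up front on the main thread, before any worker thread exists.
EAGER_LIBS.each { |lib| require lib }

# Worker threads now only touch constants that are already loaded.
# A repeated require of a loaded file returns false without loading anything.
results = Array.new(4) do |i|
  Thread.new do
    already_loaded = !require("json")
    [already_loaded, JSON.generate("worker" => i)]
  end
end.map(&:value)

results.each { |loaded, payload| puts "already loaded: #{loaded}, payload: #{payload}" }
```

The point of the design is that no thread ever triggers a first-time load, so there is nothing for two threads to contend over while a file is being required.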
4 comments:
What do you mean by 'moved the require "mechanize" statement into an initialization step, before my workers were loaded'? I'm actually building a web crawler too and I'm interested in this matter.
Hi Luis, what I mean is when I run sidekiq, I make it require an init file which includes "require 'mechanize'". This forces Sidekiq to include the library before threading.
I read about the "thread_safe!" configuration option and started using that because I had problems with AR classes being halfway-loaded when another thread called into them. Do you prefer adding 'require' commands to your initializers to eager-load classes over using 'thread_safe!' and, if so, why?