Memory leak in Ruby 1.8.6 String class

So I ran into this memory leak about a year ago but had forgotten about it so I’m going to document it in the hopes I check my own blog next time I find it. I came across the leak again when looking into why one of my apps was getting so bloated after running for a few days. The leak has been reported in numerous places but doesn’t appear to have been patched in 1.8.6.

This should demonstrate it:

# Just a helper method to show the memory usage output
# @NOTE: Won't work on Windows
def log
  leak='fix'
  ps = %x(ps u -p #{Process.pid}).strip.split(/\n/).last.split(/\s+/)
  puts "#{ps[4]}     #{ps[5]}"
end
 
# This leaks memory
def bad
  "ruby+memory+leak".split('+')
end
 
# Defining a variable before the String#split
# fixes the leak
def good
  rm = '+'
  "ruby+memory+leak".split(rm)
end
 
 
puts "VSZ       RSS"
500_000.times do |i|
  good
  log if i%100000 == 0 
end
 
puts "\nWatch me leak!"
500_000.times do |i|
  bad
  log if i%100000 == 0 
end

So the moral of the story is, make sure to define a variable in methods that use String#split, String#gsub and the like. This doesn’t leak in ruby 1.8.4. I haven’t checked 1.9, but I’ve heard it’s fixed there too.
Here is a more complete script demonstrating the issue with both class and instance methods (since some reporters have mentioned it being strictly a class method problem).

ThinkingSphinx Rails plugin fork

I’ve been working on a fork of the ThinkingSphinx Rails plugin that has now been in production for over a week without issue.  The Freelancing Gods have done a great job writing easy-to-read code that’s fairly easy to extend.  Most areas of the code are solid, but we needed to squeeze more out of the plugin to be able to use it on our environment.  In Validclick, we deal with a massive database of keywords that need indexing very frequently.  The table in question is nearly 50mil rows and replicated out to dozens of MySQL servers, therefore schema changes are best avoided.

The original delta indexing method used in the TS plugin requires you add a tinyint column on all tables requiring delta indexing.  The code also automatically fires the reindexing process on each model update.  To address these 2 issues, I forked the original plugin and came up with some solutions.  So far they’ve worked great in production, your mileage may vary.

Complex Delta’ing

For the column issue, I stole some ideas from Evan Weaver’s UltraSphinx plugin.  While Evan has definitely contributed some amazing code (much of which I use daily), trying to extend his Sphinx plugin was a nightmare.  Maybe he’s just too smart for me, but that code gives me a headache when I read it!  The one great thing about his plugin is that you can specify a delta on any column, instead of just a boolean.  I stole that idea and put it in TS.  So now, instead of adding a tinyint/boolean column to your table and adding this in your model:

class User < ActiveRecord::Base
  define_index do
    indexes :name
    set_property :delta
  end
end

You could do this, which will use the table’s existing updated_on field to create the delta index of anything that has been changed in the past day (requiring no altering of your table):

class User < ActiveRecord::Base
  define_index do
    indexes :name
    set_property :delta => {:field => :updated_on, :threshold => 1.day}
  end
end

The code only supports datetime fields now, but that could likely be extended. The original boolean-based deltas are still supported.

Offline Indexing

Having the Sphinx indexing process fire off on each record update just doesn’t scale.  If you have an API or a multi-record edit interface, you could ostensibly incur thousands of (nearly) simultaneous updates which would obviously cause issue on your Sphinx server.
Even just simply having multiple mongrel process and (even worse) multiple servers running could cause indexing collisions or high server load.  So I added configuration setting that allows you to override the default delta indexing functionality.  Just slap this in your environment.rb and TS won’t call the indexer every time a model is changed:

ThinkingSphinx.offline_indexing = true

Note that this requires you to run the meta indexing on your own.  We run it through cron every 20 minutes.

Extra rake tasks

Another snag from the UltraSphinx codebase was some rake tasks.  They’re a bit different than in US, but the same general idea:

rake thinking_sphinx:index         # Index data for all or 1 Sphinx indexes.
rake thinking_sphinx:index:all     # Index data for all Sphinx indexes.
rake thinking_sphinx:index:delta   # Index data for all or 1 Sphinx deltas.
rake thinking_sphinx:index:merge   # Merges the core and delta indexes for all or 1 Sphinx deltas.

All of the tasks except for index:all allow you to set a MODEL environment variable to denote which model you want to operate on.  The default is all pertinent indexes.  To reindex just the Account data, you could issue the following:

MODEL=account rake thinking_sphinx:index

I haven’t found any bugs yet, but there is surely room for improvement in the code.  I haven’t pulled in all of Freelancing Gods’ latest changes, so that’s definitely on the list.  If you feel like following or contributing, my fork is on GitHub:

git clone git://github.com/bassnode/thinking-sphinx.git

Mongrel memory usage

We run around 100 mongrels between all our servers. 90% of them serve hundreds of XML requests a second, around the clock. We’ve been following the ’scale-out’ methodology, but it can only go so far before you have to look at squeezing more performance out of your hardware and software before buying that next $3,000 server.

The problem we were having was, without a restart, 5 mongrels will eat all the memory (2GB) on any of our quad-core, 64bit servers in about 30 hours. Note that this is not necessarily a generalization about mongrel but in my current application of it. The mongrels in question are running rails code that creates around 1GB of text in the production.log per hour. The bigger the production.log gets, the more memory mongrel/rails eats. Clear the log, mongrel gets back its memory - no restart needed. This doesn’t happen on our front-end GUI mongrels, just the ones creating all the logs.
Graph of available memory over the course of 1 day (Ruby 1.8.4 / Mongrel 1.1.1)

Rails production.log grows to around 1GB in size. production.log only truncated at 1AM:

free_mem_ruby184_mongrel111-no_rolloff.png

Rails production.log truncated every hour:

free_mem_ruby184_mongrel111-with_rolloff.png

My first thought was that it was something in my code that was obviously leaking memory. That was until I realized that just truncating the production.log freed the memory. That leads me to believe the issue is either in rails and/or the Logger class.

UPDATE (2008/03/20): Tom Werner discovered leaks in Logger

I’ve tried different versions of things: the situation looks only slightly better on ruby 1.8.6 than 1.8.4. I’ve also tried older versions of mongrel without any luck. I recently added the rails patches by the guys at Pluron which helped out quite a bit with CPU usage (25% drop!) but not much with memory.

I’m mostly posting this for historical purposes. I hope to have some time soon to get down and dirty with Valgrind or something similar to see where the memory is really going. Hopefully I’ll have an update soon about how I solved the problem! Any insight is welcome.

I am a workaholic

The passionate worker doesn’t show up because she’s afraid of getting in trouble, she shows up because it’s a hobby that pays.
I agree completely.  I try to explain this to my wife and she’s hard-pressed to understand.  But it’s true - I love what I do.

An intro of sorts…

This blog has more or less languished over the past couple years.  I never really got into the daily blogging rhythm, so there’s not much here to speak of.  That said, I’ve obtained a lot of experience and learned a lot over the past 2 years at my current job.  I figure I might as well try to blog about it since it could prove useful to someone else.  And, if not, it will at least be a way for me to supplement my horrible memory.

So, a bit of history.  Over the past (almost) 2 years, I’ve worked for Litmus Media as a ruby developer.  We are a web company specializing in click fraud protection.  We were acquired by Think Partnership about the time I came onboard and not much has changed.  I was brought on to write an web-based advertising platform that would allow advertisers to create marketing campaigns to be distributed through our click-protected network.  The platform is called DirectAds and includes a web UI, XML service (for serving the ads) and click processor. 

I choose the Ruby on Rails framework for various reasons.  The biggest reason was that I was tired of my current tools (perl, PHP, and tons of scattered libraries) and Rails looked like it was sizing up to be a great framework.  Plus, ruby just looked so sexy - how could I deny it?  So, 2 years later I can’t complain at all.  Although, like I mentioned, there were certainly hurdles to overcome when using this new, fancy-pants technology.  I plan to expand on my experiences in future posts.