Mongrel memory usage

We run around 100 mongrels across all our servers. 90% of them serve hundreds of XML requests a second, around the clock. We’ve been following the ‘scale-out’ methodology, but it only goes so far: at some point you have to squeeze more performance out of the hardware and software you already have before buying that next $3,000 server.

The problem we were having was that, without a restart, 5 mongrels would eat all the memory (2GB) on any of our quad-core, 64-bit servers in about 30 hours. Note that this is not a generalization about mongrel, just an observation about how I’m currently using it. The mongrels in question run rails code that writes around 1GB of text to production.log per hour. The bigger production.log gets, the more memory mongrel/rails eats. Clear the log and mongrel gets its memory back, no restart needed. This doesn’t happen on our front-end GUI mongrels, just the ones creating all the logs.
Graph of available memory over the course of 1 day (Ruby 1.8.4 / Mongrel 1.1.1)

Rails production.log grows to around 1GB in size; it is truncated only once a day, at 1AM:

free_mem_ruby184_mongrel111-no_rolloff.png

Rails production.log truncated every hour:

free_mem_ruby184_mongrel111-with_rolloff.png
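
For reference, the hourly truncation can be done from Ruby itself. What follows is only a minimal sketch, not the exact job we run: truncate_log is a hypothetical helper name, and the path passed to it is an assumption. It empties the file in place with File.truncate rather than deleting it, so a process that already has the log open keeps its file handle.

```ruby
# Hypothetical helper (not part of rails or mongrel): empty a log file in
# place. The file itself is kept, so a process that already has it open
# keeps its handle. Returns the number of bytes that were discarded.
def truncate_log(path)
  return 0 unless File.exist?(path)

  size_before = File.size(path)
  File.truncate(path, 0) # set the file length to zero bytes
  size_before
end
```

Something like truncate_log("log/production.log"), run from an hourly cron job, would give the second graph’s schedule. The path here is an assumption; adjust it to wherever your app actually logs.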

My first thought was that something in my code was obviously leaking memory. That was until I realized that just truncating production.log freed the memory, which leads me to believe the issue is in rails, the Logger class, or both.
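
One way to narrow this down would be to watch the resident memory of an individual mongrel process before and after a truncation, rather than only the server’s free memory. This is a rough sketch, assuming a Linux server with /proc mounted; rss_kb is a hypothetical helper name, not something from rails or mongrel.

```ruby
# Hypothetical helper: resident set size of a process in kB, read from the
# VmRSS line of /proc/<pid>/status. Assumes Linux; returns nil if the file
# cannot be read (no /proc, no such pid) or has no VmRSS line.
def rss_kb(pid = Process.pid)
  status = File.read("/proc/#{pid}/status")
  match = status.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i
rescue SystemCallError
  nil
end
```

Sampling rss_kb for a mongrel’s pid once a minute and comparing the numbers around a log truncation would show whether the memory is really being held by the process itself.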

UPDATE (2008/03/20): Tom Werner discovered leaks in Logger

I’ve tried different versions of things: the situation looks only slightly better on ruby 1.8.6 than on 1.8.4. I’ve also tried older versions of mongrel without any luck. I recently added the rails patches by the guys at Pluron, which helped quite a bit with CPU usage (25% drop!) but not much with memory.

I’m mostly posting this for historical purposes. I hope to have some time soon to get down and dirty with Valgrind or something similar to see where the memory is really going. Hopefully I’ll have an update soon about how I solved the problem! Any insight is welcome.