MySQL, OS X Leopard and Ruby

My MacBook had a blowout the other day. Sent it to Apple and they had to replace the hard drive. That said, I’ve spent most of the day reinstalling everything. I tried to use the ruby that comes with OSX, but was having too many issues so I went with a basic install from source.

The problem I had was when I tried to get MySQL working. I installed MySQL via the 64-bit .pkg they have on their site. That took fine. The issue was installing the mysql ruby gem. I kept getting errors from extconf.rb about missing headers, etc. After an hour of googling and trying everything from a MacPorts install to compiling against the 32-bit MySQL files, I finally got it.

The golden combo in the end was to install MySQL from source (mysql-5.1.33-osx10.5-x86_64.tar) and then skip the gem command entirely when installing the mysql gem. Instead, I grabbed the latest source version of the gem here and simply followed the install instructions:

% ruby ./setup.rb
% ruby ./test.rb localhost root
# ruby ./install.rb

And it works!
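
A quick way to confirm the binding really loads from Ruby is to try requiring it. This is only a minimal sketch: it assumes the gem installs a feature named 'mysql', and the helper name library_loadable? is made up for illustration.

```ruby
# Returns true if the named library can be required, false otherwise.
# ASSUMPTION: 'mysql' is the feature name the mysql gem installs.
def library_loadable?(name = 'mysql')
  require name
  true
rescue LoadError
  false
end

puts "mysql binding loadable: #{library_loadable?}"
```

If this prints false, the extension never made it onto Ruby's load path, which usually points back at the compile or install step.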
Hopefully this helps someone else. I know I’ll probably forget and need this in a year or less…

E.V.A.C. - Dj Mix @ The Church, Denver 2002

I haven’t posted anything music-related in a while so I figured I’d get back into it with a free download.
Of the 4 people who read this blog, probably none of them know that I was in an electronic music project named E.V.A.C. The “group” consisted of me and Jeremy Goldstein. We released a few records (vinyl!) and played quite a few shows locally and nationally. Towards the end of my stint, we gladly sold our soul to some ad agencies. Hey, gotta pay the bills.

The mixset I’m posting today is from a show we played with Tipper and Salim Rafiq (DJ Wreck) when they were on their Fuel Records tour in 2002. Tipper is one of the producers we admired since Day 1, so playing alongside him was pretty outstanding. We decided that since we were playing with some fairly uncompromising musicians, we’d bring The Thunder. Generally when we DJ’d a club, we would hold back somewhat when choosing which tracks to play. We definitely favored the dark, difficult and warped…and that’s not always what people want to hear after 3 Appletinis. This time though, we basically went all out.

Disclaimer: This isn’t the best DJ work we’ve done. To shorten a potentially long story, we were scolded by the promoter about 10 minutes into our set with him yelling “You guys better stop playing all that dark shit or you’ll never play here again!”. So instead of bowing to his asinine request, we decided to milk the open tab we had at the bar (that’s all we got “paid” that night) and continue to play what we planned on playing. That said, we got pretty tipsy during the set. As fun as it was, some of the mixes definitely show signs of intoxication.

At any rate, hopefully you can appreciate the music even if the technical mastery wasn’t all there.
Download the mix


Tracklisting

  1. Andrea Parker - The Swamp
  2. Carl Finlow - Hardwired
  3. Electronic Corporation - Elektronimechanik
  4. Scuba Z - Hip Bounce (BLIM rmx)
  5. Cause for Concern - Shiver (@33 RPM)
  6. Tipper - SuperSport (Barge Charge’s Super Rollers rmx)
  7. Subphonic - Vega Beach Party (Tipper rmx)
  8. Wayward Soul - Electric Man
  9. B.L.I.M. - Earth Man
  10. A.S.A.P - Heavy Water
  11. Adam Beyer - Remainings III (DK remix)
  12. Dobrag - The Counterattack Part 2 (Jaws of Mylor) Groundloop remix

Memory leak in Ruby 1.8.6 String class

So I ran into this memory leak about a year ago but had forgotten about it so I’m going to document it in the hopes I check my own blog next time I find it. I came across the leak again when looking into why one of my apps was getting so bloated after running for a few days. The leak has been reported in numerous places but doesn’t appear to have been patched in 1.8.6.

This should demonstrate it:

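# Just a helper method to show the memory usage output
# @NOTE: Won't work on Windows
def log
  # Local variable, apparently the same workaround used in `good` below,
  # so that this helper's own split calls don't leak
  leak='fix'
  # Fields 5 and 6 of `ps u` output are VSZ and RSS
  ps = %x(ps u -p #{Process.pid}).strip.split(/\n/).last.split(/\s+/)
  puts "#{ps[4]}     #{ps[5]}"
end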
 
# This leaks memory
def bad
  "ruby+memory+leak".split('+')
end
 
# Defining a variable before the String#split
# fixes the leak
def good
  rm = '+'
  "ruby+memory+leak".split(rm)
end
 
 
puts "VSZ       RSS"
500_000.times do |i|
  good
  log if i%100000 == 0 
end
 
puts "\nWatch me leak!"
500_000.times do |i|
  bad
  log if i%100000 == 0 
end

So the moral of the story is, make sure to define a variable in methods that use String#split, String#gsub and the like. This doesn’t leak in ruby 1.8.4. I haven’t checked 1.9, but I’ve heard it’s fixed there too.
Here is a more complete script demonstrating the issue with both class and instance methods (since some reporters have mentioned it being strictly a class method problem).
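
For comparing the two variants by number rather than by eyeballing ps output, something like the following could work. It is a rough sketch, not the script linked above: rss_kb, rss_growth_kb and the two split_* methods are names made up here, and it assumes a Unix-like system (it reads /proc on Linux and falls back to ps elsewhere).

```ruby
# Resident set size of a process in kilobytes (Unix-like systems only).
def rss_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    # Linux: the kernel reports a line like "VmRSS:   12345 kB"
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    # macOS/BSD fallback: ask ps for the rss column only, without a header
    `ps -o rss= -p #{pid}`.to_i
  end
end

# Runs the block `iterations` times and returns the RSS growth in kB.
def rss_growth_kb(iterations)
  GC.start
  before = rss_kb
  iterations.times { yield }
  GC.start
  rss_kb - before
end

# Same shape as `bad` above: literal separator, no local variable.
def split_with_literal
  "ruby+memory+leak".split('+')
end

# Same shape as `good` above: separator held in a local variable.
def split_with_variable
  sep = '+'
  "ruby+memory+leak".split(sep)
end

puts "literal separator:  #{rss_growth_kb(500_000) { split_with_literal }} kB"
puts "variable separator: #{rss_growth_kb(500_000) { split_with_variable }} kB"
```

On an interpreter without the leak, both numbers should stay small and roughly equal; a large gap between them is the symptom described above.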

Painful install of do_mysql ruby gem

I spent too long today trying to get merb running on a CentOS box because I had to jump through a lot of hoops to get the do_mysql-0.9.6 gem installed. I’m not really sure how to even sum up all the dead ends and changes I had to make, but I’ll try in the hopes of helping someone else (maybe myself in a month when I forget all this).

I was trying for a long time to get the gem installed with a gem install, but ended up having to pull down the source and manually compile and install. Since we have our mysql libs in a non-standard dir on this box, I tried passing them as args to the initial install:

gem install do_mysql -- --with-mysql-include=/usr/local/src/mysql-5.0.62/include/ --with-mysql-lib=/usr/local/src/mysql-5.0.62/lib/

but got something like this:

gcc -I. -I. -I/usr/lib/ruby/1.8/x86_64-linux -I. -DHAVE_MYSQL_H -DHAVE_MYSQL_QUERY -DHAVE_MYSQL_SSL_SET  -I/usr/include/mysql -g -pipe -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fPIC -g -O2 -Wall   -c do_mysql_ext.c
do_mysql_ext.c: In function `cConnection_initialize':
do_mysql_ext.c:464: error: `MYSQL_OPT_RECONNECT' undeclared (first use in this function)
do_mysql_ext.c:464: error: (Each undeclared identifier is reported only once
do_mysql_ext.c:464: error: for each function it appears in.)
make: *** [do_mysql_ext.o] Error 1

I found a few forum posts online and decided to hard-code the MYSQL_OPT_RECONNECT variable on line 464 of ext/do_mysql_ext.c to TRUE, which got me past that error, but no glory quite yet. I then had to edit line 46 of the generated Makefile to look in my mysql include dir, since that didn’t seem to carry over from my initial gem call:

CPPFLAGS = -DHAVE_MYSQL_H -DHAVE_MYSQL_QUERY -DHAVE_MYSQL_SSL_SET  -I/usr/local/src/mysql-5.0.62/include -g -pipe -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing

Things seemed to be ok, but another install attempt failed because of some missing hoe tasks referenced by the Rakefile. Apparently they are only in the full data_objects code (tasks/hoe.rb)…so the next step was to pull down the complete data_objects source. A straight rake install didn’t work even after the Makefile and do_mysql_ext.c edits. I was getting this error:

undefined method `add_development_dependency' for #<Gem::Specification ...>

Thanks to this page, I discovered it was related to my “old” version of RubyGems (1.1.1). So I upgraded to RubyGems 1.3 and re-ran the install - still no go! I decided to just run each data_objects pkg one at a time. I first did this:

cd data_objects && rake install

…which worked! Then:

cd ../do_mysql && rake install

Which also worked!  So, a recap:

  1. Download full data_objects source
  2. Upgrade RubyGems to at least 1.2
  3. Install just data_objects gem
  4. Try to install do_mysql; if that fails, edit the Makefile and do_mysql_ext.c

I know that’s pretty convoluted, so leave a comment if you’re having the same issue and want more details.
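
Before going through all of that, it may be worth checking whether the installed tooling knows about add_development_dependency at all. A small sketch, assuming the method lives on Gem::Specification and (as far as I know) first shipped with RubyGems 1.2.0; the helper names are made up for illustration.

```ruby
require 'rubygems'

# ASSUMPTION: add_development_dependency first appeared in RubyGems 1.2.0.
MINIMUM_RUBYGEMS = Gem::Version.new('1.2.0')

# True if the given RubyGems version string is at least MINIMUM_RUBYGEMS.
def rubygems_new_enough?(current = Gem::VERSION)
  Gem::Version.new(current) >= MINIMUM_RUBYGEMS
end

# True if this installation's Gem::Specification defines the method
# named in the error message.
def development_dependencies_supported?
  Gem::Specification.method_defined?(:add_development_dependency)
end

puts "RubyGems #{Gem::VERSION}: new enough? #{rubygems_new_enough?}"
puts "add_development_dependency defined? #{development_dependencies_supported?}"
```

If the second line prints false, the undefined method error above is expected and upgrading is the first thing to try.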

ThinkingSphinx Rails plugin fork

I’ve been working on a fork of the ThinkingSphinx Rails plugin that has now been in production for over a week without issue.  The Freelancing Gods have done a great job writing easy-to-read code that’s fairly easy to extend.  Most areas of the code are solid, but we needed to squeeze more out of the plugin to be able to use it in our environment.  At Validclick, we deal with a massive database of keywords that need indexing very frequently.  The table in question is nearly 50 million rows and replicated out to dozens of MySQL servers, so schema changes are best avoided.

The original delta indexing method used in the TS plugin requires that you add a tinyint column to every table that needs delta indexing.  The code also automatically fires the reindexing process on each model update.  To address these 2 issues, I forked the original plugin and came up with some solutions.  So far they’ve worked great in production, but your mileage may vary.

Complex Delta’ing

For the column issue, I stole some ideas from Evan Weaver’s UltraSphinx plugin.  While Evan has definitely contributed some amazing code (much of which I use daily), trying to extend his Sphinx plugin was a nightmare.  Maybe he’s just too smart for me, but that code gives me a headache when I read it!  The one great thing about his plugin is that you can specify a delta on any column, instead of just a boolean.  I stole that idea and put it in TS.  So now, instead of adding a tinyint/boolean column to your table and adding this in your model:

class User < ActiveRecord::Base
  define_index do
    indexes :name
    set_property :delta => true
  end
end

You could do this, which will use the table’s existing updated_on field to create the delta index of anything that has been changed in the past day (requiring no altering of your table):

class User < ActiveRecord::Base
  define_index do
    indexes :name
    set_property :delta => {:field => :updated_on, :threshold => 1.day}
  end
end

The code only supports datetime fields now, but that could likely be extended. The original boolean-based deltas are still supported.
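
To make the threshold idea concrete, here is a plain-Ruby sketch of the selection rule: a row belongs in the delta if its timestamp falls inside the threshold window. It is not the fork's actual implementation (which would do this in SQL); Row and delta_rows are hypothetical names, and the threshold is given in seconds because 1.day needs Rails.

```ruby
# Stand-in for an ActiveRecord row; only the fields the sketch needs.
Row = Struct.new(:id, :updated_on)

# Returns the rows that belong in the delta index: those whose timestamp
# falls within `threshold` seconds before `now`.
def delta_rows(rows, threshold, now = Time.now)
  cutoff = now - threshold
  rows.select { |row| row.updated_on >= cutoff }
end

now  = Time.utc(2009, 1, 10, 12, 0, 0)
rows = [
  Row.new(1, now - 3 * 60 * 60),       # changed 3 hours ago
  Row.new(2, now - 3 * 24 * 60 * 60)   # changed 3 days ago
]
one_day = 24 * 60 * 60                 # the number of seconds in 1.day

p delta_rows(rows, one_day, now).map { |row| row.id }   # prints [1]
```

Everything older than the cutoff is left to the main index, which is why the main index still has to be rebuilt at least once per threshold period.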

Offline Indexing

Having the Sphinx indexing process fire off on each record update just doesn’t scale.  If you have an API or a multi-record edit interface, you could conceivably incur thousands of (nearly) simultaneous updates, which would obviously cause issues on your Sphinx server.
Even just having multiple mongrel processes and (even worse) multiple servers running could cause indexing collisions or high server load.  So I added a configuration setting that allows you to override the default delta indexing functionality.  Just slap this in your environment.rb and TS won’t call the indexer every time a model is changed:

ThinkingSphinx.offline_indexing = true

Note that this requires you to run the meta indexing on your own.  We run it through cron every 20 minutes.
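
The gist of the setting can be sketched as a flag that gates the after-save hook. This is a hypothetical stand-in, not ThinkingSphinx's real internals: DeltaSketch and its after_save method exist only for this example.

```ruby
# HYPOTHETICAL stand-in for the plugin; not ThinkingSphinx's real internals.
module DeltaSketch
  class << self
    attr_accessor :offline_indexing
  end
  self.offline_indexing = false

  # Called after a model is saved. Yields to the indexer unless offline
  # indexing is enabled; returns whether the indexer ran.
  def self.after_save
    return false if offline_indexing
    yield
    true
  end
end

puts DeltaSketch.after_save { :run_indexer }   # default: indexer fires on save
DeltaSketch.offline_indexing = true
puts DeltaSketch.after_save { :run_indexer }   # skipped; cron does the indexing
```

The save path stays cheap either way; the only thing that changes is who is responsible for kicking off the indexer.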

Extra rake tasks

Something else I snagged from the UltraSphinx codebase was a set of rake tasks.  They’re a bit different than in US, but the same general idea:

rake thinking_sphinx:index         # Index data for all or 1 Sphinx indexes.
rake thinking_sphinx:index:all     # Index data for all Sphinx indexes.
rake thinking_sphinx:index:delta   # Index data for all or 1 Sphinx deltas.
rake thinking_sphinx:index:merge   # Merges the core and delta indexes for all or 1 Sphinx deltas.

All of the tasks except for index:all allow you to set a MODEL environment variable to denote which model you want to operate on.  The default is all pertinent indexes.  To reindex just the Account data, you could issue the following:

MODEL=account rake thinking_sphinx:index
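
Roughly, each task only needs to look at MODEL and narrow down the list of indexes. The sketch below shows that lookup in plain Ruby; target_indexes is a made-up helper, and the "<model>_core" / "<model>_delta" index naming is an assumption for illustration rather than something taken from the fork's source.

```ruby
# Returns the Sphinx index names a task should operate on.
# `env` is anything hash-like (ENV inside a real rake task).
# ASSUMPTION: indexes are named "<model>_<suffix>", e.g. "account_delta".
def target_indexes(env, models, suffix)
  wanted = env['MODEL']
  chosen = wanted ? models.select { |m| m.downcase == wanted.downcase } : models
  chosen.map { |m| "#{m.downcase}_#{suffix}" }
end

models = %w[Account User]
p target_indexes({ 'MODEL' => 'account' }, models, 'delta')   # prints ["account_delta"]
p target_indexes({}, models, 'core')                          # prints ["account_core", "user_core"]
```

With no MODEL set, the helper falls through to every model, which matches the "default is all pertinent indexes" behavior described above.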

I haven’t found any bugs yet, but there is surely room for improvement in the code.  I haven’t pulled in all of Freelancing Gods’ latest changes, so that’s definitely on the list.  If you feel like following or contributing, my fork is on GitHub:

git clone git://github.com/bassnode/thinking-sphinx.git

Mongrel memory usage

We run around 100 mongrels between all our servers. 90% of them serve hundreds of XML requests a second, around the clock. We’ve been following the ’scale-out’ methodology, but it can only go so far before you have to look at squeezing more performance out of your hardware and software before buying that next $3,000 server.

The problem we were having was that, without a restart, 5 mongrels would eat all the memory (2GB) on any of our quad-core, 64-bit servers in about 30 hours. Note that this is not a generalization about mongrel, just an observation about my current application of it. The mongrels in question are running rails code that creates around 1GB of text in the production.log per hour. The bigger the production.log gets, the more memory mongrel/rails eats. Clear the log, mongrel gets back its memory - no restart needed. This doesn’t happen on our front-end GUI mongrels, just the ones creating all the logs.
Graph of available memory over the course of 1 day (Ruby 1.8.4 / Mongrel 1.1.1)

Rails production.log grows to around 1GB in size. production.log only truncated at 1AM:

free_mem_ruby184_mongrel111-no_rolloff.png

Rails production.log truncated every hour:

free_mem_ruby184_mongrel111-with_rolloff.png

My first thought was that it was something in my code that was obviously leaking memory. That was until I realized that just truncating the production.log freed the memory. That leads me to believe the issue is either in rails and/or the Logger class.
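
For reference, truncating a log in place (rather than deleting and recreating it) can be done from Ruby with File.truncate. A minimal sketch, run against a throwaway temp file instead of a real production.log; truncate_log is a hypothetical helper and the log line written here is invented.

```ruby
require 'tempfile'

# Empties a log file in place and returns how many bytes were dropped.
# The file stays put, so a process that has it open can keep writing to it.
def truncate_log(path)
  bytes = File.size(path)
  File.truncate(path, 0)
  bytes
end

# Demo on a throwaway file; the log content is made up for the example.
log = Tempfile.new('production.log')
log.write("Processing XmlController#show\n" * 1_000)
log.flush
puts "dropped #{truncate_log(log.path)} bytes, now #{File.size(log.path)} bytes"
log.close!
```

Something like this, fired hourly from cron, would be one way to get the "truncated every hour" behavior shown in the second graph.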

UPDATE (2008/03/20): Tom Werner discovered leaks in Logger

I’ve tried different versions of things: the situation looks only slightly better on ruby 1.8.6 than 1.8.4. I’ve also tried older versions of mongrel without any luck. I recently added the rails patches by the guys at Pluron which helped out quite a bit with CPU usage (25% drop!) but not much with memory.

I’m mostly posting this for historical purposes. I hope to have some time soon to get down and dirty with Valgrind or something similar to see where the memory is really going. Hopefully I’ll have an update soon about how I solved the problem! Any insight is welcome.

I am a workaholic

The passionate worker doesn’t show up because she’s afraid of getting in trouble, she shows up because it’s a hobby that pays.
I agree completely.  I try to explain this to my wife and she’s hard-pressed to understand.  But it’s true - I love what I do.

An intro of sorts…

This blog has more or less languished over the past couple years.  I never really got into the daily blogging rhythm, so there’s not much here to speak of.  That said, I’ve obtained a lot of experience and learned a lot over the past 2 years at my current job.  I figure I might as well try to blog about it since it could prove useful to someone else.  And, if not, it will at least be a way for me to supplement my horrible memory.

So, a bit of history.  Over the past (almost) 2 years, I’ve worked for Litmus Media as a ruby developer.  We are a web company specializing in click fraud protection.  We were acquired by Think Partnership about the time I came onboard and not much has changed.  I was brought on to write a web-based advertising platform that would allow advertisers to create marketing campaigns to be distributed through our click-protected network.  The platform is called DirectAds and includes a web UI, XML service (for serving the ads) and click processor.

I chose the Ruby on Rails framework for various reasons.  The biggest reason was that I was tired of my current tools (perl, PHP, and tons of scattered libraries) and Rails looked like it was shaping up to be a great framework.  Plus, ruby just looked so sexy - how could I deny it?  So, 2 years later I can’t complain at all.  Although, like I mentioned, there were certainly hurdles to overcome when using this new, fancy-pants technology.  I plan to expand on my experiences in future posts.

Sebastian Tellier vs. Radiohead

A friend sent me a Sebastian Tellier song the other day that I quite liked. Though, immediately upon hearing it I realized it sounded a lot like something off the new Radiohead album. You be the judge:

Sebastian Tellier - La Ritournelle (Sessions, 2005)
YouTube
flash version w/ drums

Radiohead - Reckoner (In Rainbows, 2007)
YouTube

Not a direct rip either way, but very similar. Also, I’m not sure who wrote theirs first. I’ve read reports online saying Radiohead played a version of Reckoner as far back as 2001. At any rate, it’s a good riff!

Carlton breaks it down

Oh my goodness
A Carlton breakdance instructional? I need it!

Yes, like the Carlton from The Fresh Prince of Bel-Air.