this is totally gonna work… » Ruby

You Put Merb In My Jetty!

February 11th, 2009

In the latest update of The Chronicles of Stuff Alex Figures out at Work, our intrepid hero figures out how to run Merb inside an embedded Jetty instance!

Now you may ask yourself, “for the love of God, why would you want to do something like this?” Well, at work we do a lot of internal web services. For my particular team, we’ve found a real sweet-spot by using an embedded Jetty server sitting right next to a BDB instance. There are no extra processes or packages to manage (e.g. Apache or an RDBMS). However, we were becoming dissatisfied with our current web layer, which is a homegrown REST framework that sits on top of the Servlet API. So in a fit of rage, I decided to see if I could stuff Merb in the middle of this mess.

You may also be asking yourself, “why not use the jruby-rack gem directly?” The answer is that the jruby-rack gem makes a lot of assumptions about how you want to run your application. First it assumes that you’re cool with packaging things up as a WAR (which I’m not) and, secondly, that your application is primarily a Rails/Merb application. In my case, for better or worse, our app is really a BDB application with a Merb app glommed onto the side for web visibility.

The Solution

I can’t take complete credit for this solution. If I hadn’t found Jan Berkel’s post on putting Rails in Jetty I would have never figured out how to stuff Merb in there. To give yourself some context, take a look at that post first. Then take a look at the “Merb-ified” version of the same recipe below. Both solutions assume that you’re configuring Jetty within JRuby.

server = org.mortbay.jetty.Server.new
thread_pool = org.mortbay.thread.QueuedThreadPool.new
thread_pool.min_threads  = 5  # adjust as needed
thread_pool.max_threads  = 50
server.set_thread_pool(thread_pool)
context = Context.new(nil, "/", Context::NO_SESSIONS)
context.add_filter("org.jruby.rack.RackFilter", "/*", Handler::DEFAULT)
context.set_resource_base(Environment.resolve)
context.add_event_listener(MerbServletContextListener.new)
context.set_init_params(java.util.HashMap.new('merb.root' => Environment.resolve,
    'merb.environment' => 'production',
    'public.root' => Environment.resolve('public'),
    'gem.path' => Environment.resolve('gems'),
    'org.mortbay.jetty.servlet.Default.relativeResourceBase' => '/public',
    'jruby.max.runtimes' => '1'))
context.add_servlet(ServletHolder.new(DefaultServlet.new), "/")
server.set_handler(context)
server.start

Tweaking

At first blush our performance seemed to be pretty lacking. This required two tweaks: putting Merb in “production” mode and dealing with poor I/O due to logging. In the previous snippet you will notice that we set the merb.environment to production. Yes, we lose the quick dev turnaround, but since there is a lot of Java in this project we usually have to recompile anyway, which requires a restart (phooey).

As for the I/O issue, a little digging revealed that shutting up Merb as much as possible would help reduce the amount of JRuby-level IO. In our config/init.rb we configure logging like so:

Merb::Config.use { |c|
  # One assignment per line; trailing commas would chain these into nested arrays.
  c[:environment]         = 'production'
  c[:framework]           = {}
  c[:log_level]           = :warn
  c[:log_file]            = Merb.root / "logs" / "merb.log"
  c[:use_mutex]           = false
  c[:session_store]       = 'cookie'
  c[:session_id_key]      = '_facet-store_session_id'
  c[:session_secret_key]  = '49411912879b879e13f89a9280c0f6aaa2e3ab58'
  c[:exception_details]   = true
  c[:reload_classes]      = false
  c[:reload_templates]    = false
}

Here we set the environment to “production” again (yes, you need to do both). We also upped the log level to “warn”, which significantly reduced the amount of logging Merb does. With these tweaks in place we found that the Merb port of our service was operating within about 80% of the level of performance we were getting from our pure-Java solution.

Benchmarking was done by running httperf tests against the resources we expose and comparing both the number of requests per second and the average response time. Given that the options for generating XML, HTML and JSON were all so much easier than what we were doing in the servlet version, we were willing to live with the performance hit.

Opening The Gates…of Hell!!

January 26th, 2009

…umm, no, actually not so much.

Instead, this is just a humble little notice about a humble little gem I put together today. It’s called daemon-spawn and despite its simply terrifying name, it’s really here to help all mankind. You see, I’ve been working like mad to stuff Merb smack-dab in the middle of an embedded Jetty project I’ve been working on. One of the last things I needed was a decent daemon-launcher/management gem-thingie to make it happen.

I cast about for an existing solution and found each a little lacking. The daemons gem had the executable name hard-wired to the output log name and didn’t give me a clean way to specify additional arguments to JRuby (unless I wrote another wrapper script, to which I say “boo, hiss”). Then I looked at simple-daemon which seemed really promising. It was really really close to what I wanted but didn’t extend very well as it required more and more class-methods. Yuck. I looked at daemon_generator, but it was very Rails-y and wanted to generate a bunch of code for me, which I didn’t need. So I did what any honest, hard-working Ruby-dork does, and made my own!

It’s simple—dead simple. Wanna see how simple? Here’s a real-live echo server with daemon support:

#!/usr/bin/env ruby

require 'daemon-spawn'
require 'socket'

class EchoServer < DaemonSpawn::Base

  attr_accessor :server_socket

  def start(args)
    # Port 0 asks the OS for any free port
    port = args.empty? ? 0 : args.first.to_i
    self.server_socket = TCPServer.new('127.0.0.1', port)
    port = self.server_socket.addr[1]
    puts "EchoServer started on port #{port}"
    loop do
      begin
        client = self.server_socket.accept
        while str = client.gets
          client.write(str)
        end
      rescue Errno::ECONNRESET => e
        STDERR.puts "Client reset connection"
      end
    end
  end

  def stop
    puts "Stopping EchoServer..."
    self.server_socket.close if self.server_socket
  end
end

EchoServer.spawn!(:working_dir => File.join(File.dirname(__FILE__), '..'),
                  :log_file => '/tmp/echo_server.log',
                  :pid_file => '/tmp/echo_server.pid',
                  :sync_log => true,
                  :singleton => true)

But what if you have non-Ruby code you want to daemonize? Well my friends, that’s what Kernel#exec is for and it works like a champ. See the README for the full details. And of course to view the README, you have to install the gem which means you have daemon-spawn in the bowels of your machine! Mwaaa haa haa haa! Oops…I’ve said too much…
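To make the Kernel#exec idea a little more concrete, here is a minimal sketch that uses plain fork and exec rather than daemon-spawn’s own API (the README remains the place to look for that). The `sleep 30` command is only a stand-in for whatever non-Ruby program you want to run, and the variable names are made up for illustration.

```ruby
# Plain-Ruby sketch, NOT daemon-spawn's API: launch a non-Ruby program
# from a forked child. `sleep 30` stands in for the real program.
child_pid = fork do
  # exec replaces the forked Ruby process with the target program, so the
  # program runs under the same PID that the parent recorded.
  exec('sleep', '30')
end

# Signal 0 delivers nothing; it only checks that the process exists.
alive = begin
  Process.kill(0, child_pid)
  true
rescue Errno::ESRCH
  false
end
puts "child #{child_pid} alive? #{alive}"

# Stop the program the way a daemon's stop hook might, then reap it.
Process.kill('TERM', child_pid)
Process.wait(child_pid)
puts "child #{child_pid} stopped"
```

Because exec never returns, anything the Ruby side needs to do (logging, writing a pid file) has to happen before that call.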

In all seriousness though, I would like to thank the powers-that-be at work who were very gracious to let me open-source this. You should start seeing more of this kind of stuff from Evri soon. As always, your feedback, comments, critiques and patches are welcome.


Ruby Threads Suck…Just Not The Way You Think They Do

January 25th, 2009

At work, we do a lot of scheduled tasks in which we process a “chunk” of data within a particular time-period. For example, we may tail log files, parse the lines and publish summary statistics “up-stream” on a fixed schedule of, say, ten minutes. Similarly, last week we were working on a Ruby wrapper script that launches memcached and maintains a registration lease within a home-grown registration service we run. The script needs to launch memcached, then periodically check it and renew its registration lease.

We have a RubyGem written to handle registration and renewal that hides the HTTP and XML message bodies away from the user. You simply create a client, set up your initial registration and tell it to keep you registered.

require "rubygems"
require "radar_love"
client = Radar::Client.new("http://radar-dev")
service = client.create('foobar', 'http://foobar:1234')
service.keep_registered # fires up background thread

That last line is implemented with a Ruby thread that loops indefinitely, sleeping and then renewing the registration lease. But a funny thing happened while implementing this. When we just fired up irb and tried to run this part (without doing any other work), the re-registration thread never executed. Man, I had heard that MRI threads were “broken”, but this is completely non-functioning!

Then I remembered a very handy page I ran across once about MRI Threading. This page is worth spending a little time with, but essentially, because MRI threads are so-called “green threads”, they aren’t really giving you true concurrent processing. Instead they are merely a context-switching mechanism, and the circumstances under which those switches can happen are described in that spec page.

In the case of our little irb session, the re-registration thread only began executing when we did something in the main thread. We weren’t executing anything in the main thread that triggered one of these context-switches (remember, we’re merely sitting at an irb prompt waiting for the next line). So getting the runtime to execute a context-switch merely required us to do something in the main thread:

loop do
  puts "Howdy!"  
  sleep 5
end

You may shake your head and mutter something derisive about this “hack”. However, in reality, requiring your main thread to do something isn’t terribly burdensome. If you didn’t have any work to do in your main thread, you’d have to ask yourself why you created a separate thread in the first place!
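For the curious, a sleep-and-renew loop like the one behind keep_registered can be sketched in a few lines. This is not the radar_love code: LeaseKeeper, its interval and its renew method are all invented for illustration, and renew just bumps a counter where a real client would make its HTTP call.

```ruby
# Hypothetical stand-in for the renewal loop; the real gem is not shown here.
class LeaseKeeper
  attr_reader :renewals

  def initialize(interval)
    @interval = interval  # seconds between renewals
    @renewals = 0
  end

  # Start a background thread that sleeps, renews, and repeats until stopped.
  def keep_registered
    @thread = Thread.new do
      loop do
        sleep @interval
        renew
      end
    end
  end

  def stop
    return unless @thread
    @thread.kill
    @thread.join
  end

  private

  # A real client would send the renewal request here.
  def renew
    @renewals += 1
  end
end

keeper = LeaseKeeper.new(0.05)
keeper.keep_registered
sleep 0.3  # the main thread does some "work" while the renewal thread runs
keeper.stop
puts "renewed #{keeper.renewals} times"
```

The `sleep 0.3` in the main thread plays the same role as the “Howdy!” loop above: it keeps the main thread busy doing something while the background thread gets its turns.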

You may also think that since MRI threads don’t provide true concurrency, they’re worthless. One major limitation of green threads in MRI is that no matter how many you start, they will only execute on one processor. If you have a large multi-core machine, MRI threads will not be able to take advantage of the extra cores.

However, that doesn’t mean threads don’t have their place in MRI environments. In the first example I mentioned (tailing logs and publishing summaries) we use a separate thread for the publishing activity. We could have done this entire action in a single loop, but the major downside is that we would essentially be relying on new lines in our log to appear to “crank” the mechanism forward. If we go a long time before we see another log line, our summary task will fail to execute.

IO.popen("tail -F /var/log/some.log").each do |line|
  update_statistics(line)
  if Time.now >= next_report_time
    # we might not get here for a while unless the
    # log lines keep coming
    report_statistics
  end
end

It would certainly be possible to read from the file with a timeout that is based on how much time is left before the next reporting period. However at that point the code starts to get a little cluttered, so we go with the threaded approach only to take advantage of its context-switching properties. In our case, this is a perfect solution for what we’re trying to accomplish.
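Here is a rough sketch of what the threaded variant can look like. It is not our production code: the names are illustrative, an array stands in for the `tail -F` pipe, a 0.1 second sleep stands in for the ten-minute reporting period, and appending to an array stands in for publishing up-stream.

```ruby
# Sketch of the threaded approach; every name here is illustrative.
stats   = Hash.new(0)
lock    = Mutex.new  # guards stats, which both threads touch
reports = []

# Reporter thread: wakes on a fixed schedule whether or not new lines arrive.
reporter = Thread.new do
  loop do
    sleep 0.1                                 # stand-in for the reporting period
    snapshot = lock.synchronize { stats.dup }
    reports << snapshot                       # stand-in for publishing up-stream
  end
end

# Main thread: consume lines as they show up. The array stands in for
# IO.popen("tail -F /var/log/some.log").
["GET /a", "GET /b", "POST /a"].each do |line|
  verb = line.split.first
  lock.synchronize { stats[verb] += 1 }
end

sleep 0.35  # a quiet log: no new lines, but reports still go out
reporter.kill
reporter.join
puts "published #{reports.size} reports; last: #{reports.last.inspect}"
```

The point of the split is that the reporting schedule no longer depends on log traffic. The mutex is there because both threads read and write the same hash.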

If you come from a Java or .NET background, you may find that MRI threads fail to measure up to threading in those environments. It’s absolutely true that MRI does not provide the same robust threading mechanisms that those languages do (JRuby, and perhaps IronRuby, being special cases). That doesn’t mean that threads in MRI are worthless; you just need to understand them properly to know when to use them.


clip version 1.0.1 has been released!

January 6th, 2009

You like command-line parsing, but you hate all of the bloat. Why
should you have to create a Hash, then create a parser, fill the Hash
out then throw the parser away (unless you want to print out a usage
message) and deal with a Hash? Why, for Pete’s sake, should the parser
and the parsed values be handled by two different objects?

Changes:

### 1.0.1 / 2009-01-06

* Fixed a bug where generating help resulted in an infinite-loop


ActiveRecord, Associations and Counters

January 4th, 2009

Maybe this is old hat to all you grizzled vets out there, but today I thought I’d post about my experience with ActiveRecord’s counter caches and the tricks I had to pull to get them working. Let me first set the stage with what I was trying to accomplish.

In moochbot, your main transaction screen has a standard tabbed-interface. Each tab is a different view of all your transactions. In a tabbed display you can only show one view at a time so it helps the user when you can provide some hints in the non-selected tabs. Anything that helps them figure out whether or not they want to click on something without actually having to click on it is, in my opinion, a great help.

Moochbot Tabs

So I wanted to add a number in the tab to indicate how many items were there. The most naïve way to implement this would be to issue three different SQL statements for the counts. However somewhere, in the back of my mind, I remembered that ActiveRecord has a feature known as the counter cache. The basic idea is to hook some additional code into the lifecycle of ActiveRecord’s associations to update a single column in the parent record as you add and remove child records.

Like an iceberg, the bulk of this feature lay deep below the surface. The actual view-layer changes were minimal, but I had to jump through some hoops to get the counter-cache working correctly.

In moochbot, a User model object has multiple Transaction records. Each Transaction points at two separate User records: one for the lender and one for the borrower. All of a user’s transactions are stored in the TRANSACTIONS table, each record differentiated by the STATE column.

In ActiveRecord-land we express these relationships in the model with three has_many relations for the User: one for items they are lending, one for items they are borrowing and one for all closed transactions. The first two are fairly straight-ahead:

class User < ActiveRecord::Base

  has_many(:lent_items,
           :class_name => "Transaction",
           :foreign_key => "lender_id",
           :conditions => ["state IN (?)",
                           %w(started lent returned overdue disputed)])

  has_many(:borrowed_items,
           :class_name => "Transaction",
           :foreign_key => "borrower_id",
           :conditions => ["state IN (?)",
                           %w(started lent returned overdue disputed)])

  has_many(:completed_items,
           :class_name => "Transaction",
           :finder_sql => 'SELECT * ' +
           'FROM transactions ' +
           'WHERE state IN (\'aborted\', \'finished\') AND ' +
           '(borrower_id = #{id} OR lender_id = #{id})')
end

However the third relationship requires some custom SQL because we want all records that are in either the “finished” or “aborted” state and where the user is either the lender or the borrower. I looked into doing this with a simple :conditions option on the has_many relationship, but couldn’t figure out how to specify the ID of the user.

One really important thing to recognize here is that the SQL is quoted with single-quotes. If the SQL is specified in double-quotes, the interpolation is evaluated too early and the id value is not the user record. Putting it in single-quotes defers evaluation until the proper time. I wish this were documented a little better because I was completely stuck until I stumbled across this post on the RailsBlaster blog. I had to enable debug-level logging for ActiveRecord to see that I was getting skooky IDs in my final SQL string.
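The quoting difference is easy to see in plain Ruby. The sketch below only imitates the idea of deferring interpolation until an instance is available; it is not ActiveRecord’s actual implementation, and FakeUser, expand and the local `id` variable are all invented for illustration.

```ruby
# Sketch of why the quoting matters; this imitates the idea only.
class FakeUser
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Evaluate the stored template as a double-quoted string in this
  # instance's context, so #{id} resolves to this record's id.
  def expand(template)
    instance_eval('"' + template + '"')
  end
end

id = 999  # stands in for whatever `id` happens to mean where the SQL is written

early = "borrower_id = #{id}"  # double quotes: interpolated immediately
late  = 'borrower_id = #{id}'  # single quotes: kept as literal text for later

user = FakeUser.new(42)
puts early
puts user.expand(late)
```

The double-quoted string bakes in the wrong value on the spot, while the single-quoted one survives as a template until there is a record to evaluate it against.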

To add a counter cache to the User record, you declare a :counter_cache option on the reciprocal belongs_to relationship. This seemed counter-intuitive to me since if I didn’t already have one in place, I’d have to add one. It seemed more obvious to me to put it in the has_many relationship but that ain’t the way ActiveRecord rolls. So the next step was to update the belongs_to mappings in the Transaction class:

class Transaction < ActiveRecord::Base

  belongs_to(:lender,
             :class_name => "User",
             :foreign_key => "lender_id",
             :counter_cache => :lent_items_count)

  belongs_to(:borrower,
             :class_name => "User",
             :foreign_key => "borrower_id",
             :counter_cache => :borrowed_items_count)
end

The final step was to create a migration that would add the counter cache columns to the USERS table. Note that not only do we add the columns, but we also update everyone’s counters.

class AddCounterCacheToUsers < ActiveRecord::Migration
  COLUMNS = [:lent_items_count,
             :borrowed_items_count,
             :completed_items_count]

  def self.up
    COLUMNS.each do |c|
      add_column :users, c, :integer, :default => 0
    end

    User.reset_column_information

    User.find(:all).each do |user|
      User.update_counters(user.id,
                           :lent_items_count => user.lent_items.length,
                           :borrowed_items_count => user.borrowed_items.length,
                           :completed_items_count => user.completed_items.length)
    end
  end

  def self.down
    COLUMNS.each do |c|
      remove_column :users, c
    end
  end
end

So there it is. Hopefully that helps the next poor sod that runs into that same problem.

Asynchronous Mail with DelayedJob, God & Daemons

November 5th, 2008

Slowly but surely I’ve been pecking away at a little Rails-based side-project for the last four or five months. I’m this close to flipping the on switch—but in the meantime I’ve still got some “i”s to dot and “t”s to cross. One of those was switching from in-request mail delivery to asynchronous mail delivery. The app I’ve been working on involves two parties marching a particular transaction through a variety of state transitions, each of which usually sends an email to either or both parties.

Like a good boy I started out with the simplest thing that could work which was to simply call mailers in my model. However, I wanted to limit the number of activities performed during a request to keep the app feeling responsive. So I decided that asynchronous mail delivery was a “pre-launch” feature that I had to have.

I looked at a variety of background processing tools, including Bj, Starling/Workling, Spawn and AP4R. Each had its strengths and weaknesses but none of them felt like the right fit. My research criteria included:

  • Job persistence via the database
  • Something that could get a Rails environment cheaply
  • Runs outside of the Rails processes
  • Minimum fuss to get it running

In the end the one that hit the sweet-spot best was delayed_job. It had the DB persistence I was looking for, but didn’t source the Rails environment for each worker and it was extremely simple to plumb it into my app.

Refactoring

The first step was creating the DelayedJob worker classes; one for each mail action. At first this turned into a big pile of five-line classes, so to keep things organized I put these all in app/models/jobs and put each class in the Jobs module namespace. This was better, but not good enough so the final step was putting all of the worker classes in a single file, app/models/jobs.rb.

The second step was to find every place in the model where I called my mailer classes directly and replace them with calls to enqueue the appropriate worker job.

Here is what things looked like at first:

class UserObserver < ActiveRecord::Observer
  def after_create(user)
    unless user.current_state == :latent or user.is_a?(Admin)
      UserNotifier.deliver_signup_notification(user)
    end
  end

  def after_save(user)
    if user.current_state == :promoted
      UserNotifier.deliver_signup(user)
    else
      UserNotifier.deliver_activation(user) if user.recently_activated?
    end
  end
end

Then the UserObserver was refactored like this:

class UserObserver < ActiveRecord::Observer
  def after_create(user)
    unless user.current_state == :latent or user.is_a?(Admin)
      Delayed::Job.enqueue(Jobs::UserNotifierSignupNotificationJob.new(user.id))
    end
  end

  def after_save(user)
    if user.current_state == :promoted
      Delayed::Job.enqueue(Jobs::UserNotifierSignupNotificationJob.new(user.id))
    else
      Delayed::Job.enqueue(Jobs::UserNotifierActivationJob.new(user.id)) if user.recently_activated?
    end
  end
end

With the following workers (abridged):

module Jobs
  class UserNotifierDisconnectJob < Struct.new(:user_id)
    def perform
      UserNotifier.deliver_disconnect(user_id)
    end
  end

  class UserNotifierResetPasswordJob < Struct.new(:user)
    def perform
      UserNotifier.deliver_reset_password(user)
    end
  end

  class UserNotifierSignupNotificationJob < Struct.new(:user)
    def perform
      UserNotifier.deliver_signup_notification(user)
    end
  end

  class UserNotifierStartDisconnectJob < Struct.new(:user_id)
    def perform
      UserNotifier.deliver_start_disconnect(user_id)
    end
  end
end

Testing

Prior to switching to asynchronous processing, mail delivery was triggered within my models, either via Observers or as Procs attached to state transitions (I’m using the acts_as_state_machine plugin). Therefore my tests had loads of assertions that various state changes in the model resulted in direct email delivery. In the asynchronous model of course, that changes slightly. While the state change ultimately ends in mail delivery, it only happens indirectly.

So here I had a big pile of tests that asserted that poking the model in certain ways resulted in a mail delivery. In my unit-tests I really just wanted to test the interaction between the models and DelayedJob. After all, if something went hay-wire during mail delivery the culprit would likely be my new worker classes, not my model.

However, for my integration tests I still wanted to keep the assertions about actual mail delivery since that was an important part of the stories. I could easily do this by monkey-patching the Delayed::Job.enqueue method to call the worker’s perform method directly. In my unit-tests I monkey-patched the Delayed::Job.enqueue method to work more like a mock object which added some inquiry methods to check that it had been invoked correctly.

In isolation this worked great, but running all the tests together resulted in a number of random failures. I’ve run into this enough times to recognize that some tests were somehow poisoning the run-time environment for the others. It turns out that my two approaches were incompatible with each other unless I was very diligent about cleaning everything up properly. I will admit with red-faced shame that I punted. I did the lamest thing one could possibly do and redefined Delayed::Job.enqueue for all of my tests and kept all of my original assertions. I’m not proud of it, but it does work.
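The “run it right now” override is small enough to sketch. To keep the example self-contained, the first Delayed::Job definition below is a stand-in for the real class from delayed_job, and GreetingJob and DELIVERIES are made-up placeholders for my worker classes and the mailer’s delivery list.

```ruby
# Stand-in so the sketch runs on its own. In the app, Delayed::Job comes
# from delayed_job and would persist the job to the database.
module Delayed
  class Job
    def self.enqueue(job)
      raise "would write the job to the database"
    end
  end
end

DELIVERIES = []  # placeholder for the mailer's list of sent mail

# Placeholder worker with the same shape as the Jobs classes: a Struct
# holding its arguments plus a perform method.
GreetingJob = Struct.new(:user_id) do
  def perform
    DELIVERIES << "mail for user #{user_id}"
  end
end

# Test-only override: run the job immediately instead of queueing it, so
# existing assertions about delivered mail keep working.
module Delayed
  class Job
    def self.enqueue(job)
      job.perform
    end
  end
end

Delayed::Job.enqueue(GreetingJob.new(7))
puts DELIVERIES.inspect
```

The trade-off is the one I confessed to above: every test now exercises the worker too, so a failure no longer tells you whether the model or the job is at fault.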

Running in Production

The next trick was getting this all running in a production environment and I needed to figure out how one or more workers would be started and kept running. While it’s great to have this decoupled from the Rails environment, it means that it’s a separate process that needs to be managed.

My solution was to use the daemons gem to create a couple of scripts. Then I used Tom Preston-Werner’s god to monitor my process. The scripts look like this:

#!/usr/bin/env ruby

unless ARGV.size == 1
  # $0 is the name of the running script
  $stderr.puts "USAGE: #{$0} [environment]"
  exit 1
end

RAILS_ENV = ARGV.first
require File.dirname(__FILE__) + '/../config/environment'

Delayed::Worker.new.start

And the “control” script looks like this:

#!/usr/bin/env ruby

require "rubygems"
require "daemons"

def running?(pid)
  # Check if process is in existence
  # The simplest way to do this is to send signal '0'
  # (which is a single system call) that doesn't actually
  # send a signal
  begin
    Process.kill(0, pid)
    return true
  rescue Errno::ESRCH
    return false
  rescue ::Exception   # for example on EPERM (process exists but does not belong to us)
    return true
  end
end

if ARGV.size == 1 and ARGV.first == "status"
  pidfile = "/var/run/delayed_job_worker.pid"
  if File.exist?(pidfile)
    pid = open(pidfile).readlines.first.strip.to_i
    if running?(pid)
      puts "delayed_job_worker is running (#{pid})"
    else
      puts "delayed_job_worker is NOT running (#{pid})"
    end
  else
    puts "delayed_job_worker is NOT running (none)"
  end
else
  Daemons.run(File.dirname(__FILE__) + '/delayed_job_worker',
              :backtrace => true,
              :log_output => true,
              :dir => File.dirname(__FILE__) + "/../tmp/pids",
              :dir_mode => :normal,
              :multiple => false)
end

The first script is merely the smallest amount of chrome required to start a worker. Note that I’m using John Barnette’s version of delayed_job which gives us that nice little worker riff. The second script is essentially the daemons wrapper around my worker. For a little extra goodness I added my own “status” command which can be handy for debugging.

Getting those to work properly took a little bit of testing by hand. Fortunately, the entire solution is really a series of layers applied on top of each other. Once you convince yourself that an inner-layer is working correctly you can move on to build the outer layers.

The next step was to create a proper god configuration. I have more than one thing to monitor on my setup so I have a master god configuration that includes sub-configurations. My “main” configuration is a simple one-liner:

God.load "/etc/god/*.god"

My application-specific configuration looks like this (with a few edits for public consumption):

RAILS_ROOT = "/var/www/moochbot/current"

God::Contacts::Email.message_settings = {
  # your config goes here
}

God::Contacts::Email.server_settings = {
  # your config goes here
}

God.contact(:email) do |c|
  # your config goes here
end

After setting up some default notification details, we get into the meat of defining our “watch”:

God.watch do |w|
  w.name = "delayed_job_worker"
  w.interval = 10.seconds
  w.start = "#{RAILS_ROOT}/script/delayed_job_worker_control start -- production"
  w.stop = "#{RAILS_ROOT}/script/delayed_job_worker_control stop"
  w.restart = "#{RAILS_ROOT}/script/delayed_job_worker_control restart"
  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = "#{RAILS_ROOT}/tmp/pids/delayed_job_worker.pid"

  w.uid = "deploy"
  w.gid = "root"
  w.behavior(:clean_pid_file)

Next we need to define our transitions. My first attempt at this failed because I was missing these and my watched process was stuck in the “unmonitored” state. It’s worth spending some time reading the docs on the homepage since at first glance the configuration wasn’t obvious to me.

  # determine the state on startup
  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_exits)
  end

Finally I specify some resource-consumption boundaries to make sure that my little worker daemon doesn’t take over my box:

  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 100.megabytes
      c.times = [3, 5]
    end

    restart.condition(:cpu_usage) do |c|
      c.above = 50.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end

Proportional Code

October 26th, 2008

Few things are less sexy than command-line parsing. It is one of the most mundane tasks a programmer has to execute in their career. But it surprises me just how much code is required to do basic command-line parsing in a lot of languages, including Ruby. So I got to thinking, why does this bug me so much? I think the answer is that requiring so much code for such a relatively trivial task violates my sense of proportionality in the code. I hate having to say so much more about this teeny little task than I do about the “theme” of my code. I think it distorts the narrative of the code.

Let’s say that the process of writing your program is like launching a spacecraft. Ideally you would like to get from launch to cruising around in space as quickly as possible. The Star Trek universe solves this quite elegantly with the transporter. We don’t put you in a box and launch it, we break your atoms apart and transmit them to another location! That’s pretty cool, but maybe we’re just not quite that cool yet. Another solution is one proffered by the Star Wars universe. A ship like the Millennium Falcon can leave just about any planetary atmosphere any time it damn well pleases without the use of special equipment. It just flies away. Not bad.

Atlantis (STS-125)

However, here on earth, our primitive spacecraft need a tremendous amount of disposable apparatus to reach escape-velocity. The proportion of useful vehicle (the shuttle) to the orbit-busting mechanism (the rocket boosters and fuel tanks) is a staggering 5.4:1 (based on liftoff weight).

Command-line parsing in code exhibits a similar disproportion. The interesting part of your app isn’t the command-line parsing. Why should it take up such a disproportionate amount of space in your code? Those boosters quickly become “space-junk”, once the launch vehicle has left Earth; expensive trash that is never to be used again.

Space-junk is dangerous for the next guy that wants to launch into space. It’s also dangerous for the folks on the ground as it may decide to come crashing back down to earth. And those boosters have also been responsible for one of the worst disasters in NASA’s history. If we could only get rid of those damn things the whole space program just might be a little better off. Unfortunately physics and our current space-flight capabilities require them.

But our code is a different story. We don’t have such physical barriers that handicap us. Any barriers we run into are usually of our own making. So why not try to reduce those as much as possible? Why say in more code, what you could say in less? Wouldn’t that cut down on bugs? Wouldn’t that ease the burden of maintenance? Wouldn’t that reduce the amount of information overhead you have to maintain each time you revisit that code?

Command-line parsing is a stupid, menial task that should require very little attention. By extension, it should only be given a stupid, menial amount of code to make it work. We have big ideas! They shouldn’t get bogged down by handling command-line options!

This is why I wrote Clip. Clip is an expression of the need to make the simple things simple, but no simpler. If you have modest command-line parsing needs, Clip rewards you with minimal investment. If you need something trickier, Clip allows you to say a little more to it and gain more benefits from it. You get to decide how much you want to engage—not the library.

This is one of the things I like about Ruby. The language is extremely flexible, which gives me a lot of ways to “pack” ideas into code. Having more than one way to do things isn’t all that useful by itself. But it’s essential when you want to write expressive code. Things like object literals, or one-line control-structure alternatives help me keep the lines of code proportional to the ideas they express.

This is also something I find challenging to do in Java. In languages like Java, even just creating a collection of objects requires quite a bit of code:

    import java.util.*;

    public class Designer {
      public void makeItWork(List<Trash> trash) {
        // today's challenge: convert trash to wearable garments
        List<Garment> garments = new ArrayList<Garment>(trash.size());
        for (Trash t : trash) {
          Garment g = new Garment(t.getName());
          garments.add(g);
        }

        submitToJudges(garments);
      }
    }

In Java we often solve this by pushing all of that code into a private method that is named something meaningful. This works pretty well, but does tend to result in an explosion of “helper methods”. Sometimes folks take the “cheap” route and simply prefix these riffs with explanatory comments like “convert each Trash into a Garment”. I’m not a real big fan of this. Generally I don’t care about the object conversion in the first place because the rest of the code is presumably doing something interesting with the Garments and I don’t give a damn about the Trash.

So let’s look at it in Ruby:

    class Designer
      def make_it_work(trash)
        submit_to_judges(trash.map { |t| Garment.new(t.name) })
      end
    end

By my count there are five lines in the Java example (including the comment) just to convert trash to garments. In contrast I boiled that down to one line in Ruby. OK, I could have done this in two lines if you think that’s too much of a long train. But I think there are a couple of important points here:

  1. The importance of the concept being expressed diminishes from left to right
  2. The attention-span of the reader diminishes from top to bottom

The Ruby example beats Java on both counts. I don’t waste a lot of the reader’s attention span up-front on book-keeping details (in the vertical space) and I state the important thing I’m trying to do (submit my Garments to the judges) quickly (on the left). The details of which Garments I’m dealing with are merely a qualification of what I’m trying to do.
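If you want to actually run the Ruby version, here's a minimal sketch. Trash and Garment are stand-in Structs and submit_to_judges is a made-up method that simply hands the garments back; I'm assuming all three purely for illustration. It also includes the two-line variant so you can compare how each one reads.

```ruby
# Stand-in classes, assumed purely for illustration.
Trash   = Struct.new(:name)
Garment = Struct.new(:name)

class Designer
  # One-line version: the important verb (submit) comes first.
  def make_it_work(trash)
    submit_to_judges(trash.map { |t| Garment.new(t.name) })
  end

  # Two-line version, for when the one-liner feels like too long a train.
  def make_it_work_in_two(trash)
    garments = trash.map { |t| Garment.new(t.name) }
    submit_to_judges(garments)
  end

  private

  # Hypothetical judging step: it just hands the garments back.
  def submit_to_judges(garments)
    garments
  end
end

designer = Designer.new
trash    = [Trash.new("newspaper"), Trash.new("bubble wrap")]
garments = designer.make_it_work(trash)
puts garments.map { |g| g.name }.join(", ")
```

Both versions do the same work. The only difference is whether the reader meets the conversion or the submission first.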

How you handle these two dimensions is greatly affected by both the language you use and the APIs you deal with. This is one of the reasons that I do not find the use of scripting languages for Java’s Swing API all that compelling. Scripting languages like JRuby or Jython help me with the horizontal space, but don’t do a damn thing for the vertical requirements. With an awful API like Swing I have to say a lot of words to make it go, regardless of the language I do it with.

Getting back to my dumb example, being on such a small scale this may not seem like a large impact. But multiplied several times over to match the size of most projects, this kind of savings can really pile up. The difference in the amount of code required by these two approaches is manifested in a savings in cognitive investment required to grok these projects. This is the very essence of maintainability and sustainability. Anytime you can do more with less, you come out ahead.

At great peril to my own geek cred, I will say that this is why I find The Lord of the Rings to be such an awful piece of writing. It is so full of peripheral and non-essential information that finding the real story or characters requires extraordinary patience and concentration on the part of the reader. If Tolkien had been more concerned with the story and less with “world-building” I’ll bet he could have gotten that story boiled down to a single book.

Now I realize that a lot of folks love the Tolkien books for the very reasons I criticize them. That’s fine, that is an argument about aesthetics, not facts. However I would strongly argue that “world-building” in your code is a bad idea. I think you’re much more likely to build a decent piece of software if you pack your ideas tightly like a William Gibson novel than as a sprawling “trilogy” of epic code. Go ahead, prove me wrong. I double dog-dare ya.

OK, so by this point any credibility I may have had is gone. Look at the size of a post about saying more with less. In the hope that you might be lazy and like to skip to the end:

Do as much as you can…with as little as possible.

Make View Helpers a Little Less “Helpful”

September 23rd, 2008

I stumbled across a little bit of hidden Rails fun last night when I was trying to get the form_for method to stop wrapping error fields with extra div tags. Did you know that? Maybe you never noticed, but when you use the field helper methods, like text_field, password_field, etc., Rails will wrap fields with errors in a <div> with the class ‘fieldWithErrors’.

This was causing all sorts of grief for me in some JavaScript I was trying to write. At first I tried to go with the flow and fix my JS, but it got really hacky really fast. So I went on a little source-code spelunking to figure out how to make the problem go away.

In actionpack-2.1.0 there is a Proc attached to the ActionView::Base.field_error_proc class attribute. It’s not documented in the RDoc, but this is also a writable attribute which means I can shut the damn thing up. Here’s what it looks like in the original file, GEM_HOME/actionpack-2.1.0/lib/action_view/helpers/active_record_helper.rb:

require 'cgi'
require 'action_view/helpers/form_helper'

module ActionView
  class Base
    @@field_error_proc = Proc.new{ |html_tag, instance| "<div class=\"fieldWithErrors\">#{html_tag}</div>" }
    cattr_accessor :field_error_proc
  end
  ...
end

My solution was to create a little file in config/initializers named field_errors.rb with this text:

ActionView::Base.field_error_proc = Proc.new do |html_tag, instance|
  html_tag
end

Et voila! Just a simple pass-through with no more fancy-pants markup. Putting stuff like this in separate files in the initializers directory keeps your config and environment files from getting out of hand.
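To see the difference without firing up Rails, here's a rough sketch that calls both Procs by hand. The tag string is something I made up, and passing nil for the instance argument is an assumption that only holds because neither Proc touches it.

```ruby
# The default Proc, copied from the actionpack-2.1.0 source quoted earlier.
default_proc = Proc.new { |html_tag, instance| "<div class=\"fieldWithErrors\">#{html_tag}</div>" }

# The replacement from config/initializers/field_errors.rb.
pass_through = Proc.new { |html_tag, instance| html_tag }

# A made-up tag; Rails would pass the real rendered field here.
tag = '<input id="user_email" name="user[email]" type="text" />'

wrapped   = default_proc.call(tag, nil)
unwrapped = pass_through.call(tag, nil)

puts wrapped    # the tag wrapped in the fieldWithErrors div
puts unwrapped  # the tag exactly as it went in
```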

Posted in Rails | No Comments »

Clip 1.0.0 Released!

September 19th, 2008

Command-line parsing made short and sweet.

1.0.0 / 2008-09-19

  • Added support for mapping dashes to underscores for flags
  • Define Clip.hash.remainder as a singleton method instead of reopening Hash
  • remainder works with Clip.hash now
  • Reimplemented Clip.hash to use a parser.

Check out the docs at http://clip.rubyforge.org.

Posted in Ruby | No Comments »

ActiveRecord Fun That May Stump Only Me

July 23rd, 2008

I’ve just spent the last two hours pulling my hair out trying to get Single-Table Inheritance (STI) working with associations in ActiveRecord. After essentially walking through all of the possible ActiveRecord options in this setup, I finally stumbled upon a configuration that seems to work. So this post is an attempt to help the next poor bastard who is Googling in earnest for a solution to a similar problem.

So let’s start with the domain model. I’m too spent at this point in the evening to port this to one of the standard examples. Instead I’ll expose you to the domain of my particular problem. The app I’m working on is one that tracks (non-financial) lending transactions between two individuals. The parties involved, the item in question and when it’s due are all tracked in the Transaction model (and transactions table). A Transaction has a number of states it walks through, using the acts_as_state_machine plugin. These state transitions are triggered by opaque-looking URLs that are sent via email to either party. These are one-time use actions that once consumed are no longer available. When an Action instance is created it also has a before_save callback that generates a unique ID (used in the URL) using Digest::SHA1.

So my plan was to have my Transaction class write one or more Action records for each possible action based on my state transitions. Take a look at the state diagram below:

state-transitions.png

I want to encapsulate the actual work to be performed within the Action instance the user invokes by following the link in their email. So my plan is to use STI to have different sub-classes of Action that operate on a transaction and march it forward to its next state polymorphically.

Now STI may appear to be total overkill for this problem, but here are my reasons for going this route:

  • I want to have these opaque IDs written down somewhere to associate a specific action with a URL
  • When the action is complete, I want to remove the record so it can’t be performed again
  • The state for a given Transaction can have more than one possible action. I want a separate record for each action.

Whew. Okay, clear so far? So my initial code looked something like this:

require "digest/sha1"

class Action < ActiveRecord::Base

  belongs_to :transaction
  before_save :create_guid

  def create_guid
    sha1 = Digest::SHA1.new
    sha1.update transaction_id.to_s
    sha1.update type.downcase
    sha1.update DateTime.now.to_s  # current timestamp (DateTime.to_s alone is just the class name)
    self.guid = sha1.hexdigest
  end
end

class ReturnAction < Action
  def execute
    transaction.return!
  end
end

class AbortAction < Action
  def execute
    transaction.abort!
  end
end

class DisputeAction < Action
  def execute
    transaction.abort!
  end
end
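The GUID step is easy to poke at outside of Rails. Here's a small sketch of the same hashing recipe; make_guid and its arguments are hypothetical stand-ins for the record's transaction_id, its STI type, and a timestamp, which I'm assuming is what gets mixed in.

```ruby
require "digest/sha1"

# Hypothetical helper: the arguments stand in for the record's
# transaction_id, its STI type name and the current time.
def make_guid(transaction_id, type_name, now = Time.now)
  sha1 = Digest::SHA1.new
  sha1.update transaction_id.to_s
  sha1.update type_name.downcase
  sha1.update now.to_s
  sha1.hexdigest
end

guid = make_guid(1, "ReturnAction")
puts guid  # 40 hex characters
```

The same three inputs always hash to the same string, so the timestamp is what keeps two actions of the same type on the same transaction from sharing a URL.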

It seemed like a good idea at the time, but the strange thing was that no matter which incantation I tried, I simply couldn’t create a new Action instance and have it write a record to the database. This simply didn’t work:

ReturnAction.create! :transaction_id => 1

There were no errors on the returned object. No exceptions were thrown. No queries to the database and certainly no insert statements executed. Just complete and utter silence. Out of desperation, as much as anything else, I removed the belongs_to declaration from the Action class and instead declared a has_many on the Transaction class. Voila! It worked like a champ.

After a bit of thought, the has_many association makes complete sense to me in the case where we want to create new Action instances for a particular Transaction. However, if you look in the code above, the execute methods of each sub-class are referring to a transaction object/method—which I no longer have. However I don’t necessarily need the full-blown belongs_to association here. I can just fake the bits I want in the parent Action class like so:

class Action < ActiveRecord::Base
  def transaction
    @transaction ||= Transaction.find(self.transaction_id)
  end
end
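Here's a plain-Ruby sketch of how the pieces fit together, with no ActiveRecord involved. FakeTransaction, its state names, and the finds counter are all assumptions I made up so the example runs on its own; only the shape of Action and its sub-classes follows the code in this post.

```ruby
# Stand-in for the ActiveRecord model; find just counts its calls.
class FakeTransaction
  @finds = 0
  class << self
    attr_reader :finds

    def find(id)
      @finds += 1
      new(id)
    end
  end

  attr_reader :id, :state

  def initialize(id)
    @id = id
    @state = :lent  # hypothetical starting state
  end

  def return!
    @state = :returned
  end

  def abort!
    @state = :aborted
  end
end

class Action
  attr_reader :transaction_id

  def initialize(transaction_id)
    @transaction_id = transaction_id
  end

  # Same ||= trick as the hand-rolled finder: look it up once, then reuse it.
  def transaction
    @transaction ||= FakeTransaction.find(transaction_id)
  end
end

class ReturnAction < Action
  def execute
    transaction.return!
  end
end

class AbortAction < Action
  def execute
    transaction.abort!
  end
end

actions = [ReturnAction.new(1), AbortAction.new(2)]
actions.each { |a| a.execute }  # each sub-class picks its own transition
states = actions.map { |a| a.transaction.state }
p states
p FakeTransaction.finds
```

Calling transaction a second time on the same action doesn't trigger another find, which is all the memoization is buying me here.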

So none of this is particularly earth-shattering. Sorry folks, no great gems of philosophical wisdom today. Just one man’s small accomplishment blown completely out of proportion.

Posted in Rails, Ruby | 1 Comment »