The State of MongoDB and Ruby

Jan 20 • Posted 3 years ago

The state of Ruby and MongoDB is strong. In this post, I’d like to describe some of the recent developments in the Ruby driver and provide a few notes on Rails and the object mappers in particular.

The Ruby Driver

We just released v1.2 of the MongoDB Ruby driver. This release is stable and supports all the latest features of MongoDB. If you haven’t been paying attention to the driver’s development, the Cliff’s Notes are below. (Note that if you’re using an older version of the driver, you owe it to your app to upgrade.)

If you’re totally new to the driver, you may want to read Ethan Gunderson’s excellent post introducing it before continuing on.

Connections

There are now two connection classes: Connection and ReplSetConnection. The first simply creates a connection to a single node, primary or secondary. But you probably already knew that.

The ReplSetConnection class is brand new. It has a slightly different API and must be used when connecting to a replica set. To connect, initialize the ReplSetConnection with a set of seed nodes followed by any connection options.

You can pass the replica set’s name as a kind of sanity check, ensuring that each node connected to is part of the same replica set.
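Here is a rough sketch of what that can look like, assuming the v1.2 constructor takes [host, port] seed pairs followed by an options hash with :rs_name. The hostnames, the set name, and the connect_to_set helper are made up for illustration, and the connection class is passed in as an argument only so the snippet can be exercised without a running replica set.

```ruby
# Seed nodes as [host, port] pairs. These hosts are placeholders.
SEEDS = [['db1.example.com', 27017], ['db2.example.com', 27017]]

# Hypothetical helper: builds a replica set connection from seeds plus options.
# `klass` would normally be Mongo::ReplSetConnection (after `require 'mongo'`).
def connect_to_set(klass, seeds, options = {})
  # Seeds are passed as separate arguments, with the options hash last.
  klass.new(*seeds, options)
end

# Usage, assuming the driver is installed and the set is reachable:
#   require 'mongo'
#   con = connect_to_set(Mongo::ReplSetConnection, SEEDS, :rs_name => 'prod')
```

Passing :rs_name here is the sanity check described above: the driver can then refuse nodes that report a different set name.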

Replica sets

If you’re running replica sets (and why wouldn’t you be?), then you’ll first want to make sure you connect with the ReplSetConnection class. Why? Because this class facilitates discovery, automatic failover, and read distribution.

Discovery is the process of finding the nodes of a set and determining their roles. When you pass a set of seed nodes to the ReplSetConnection class, you may not know which is the primary node. The driver will find that node and ensure that all writes are sent to it. In addition, the driver will discover any other nodes not specified as seeds and then cache those for failover and, optionally, read distribution.

Failover works like this. Your application is humming along when, for whatever reason, the primary member of the replica set goes down. So subsequent operations will fail, and the driver will raise the Mongo::ConnectionFailure exception until the replica set has successfully elected a new primary.

We’ve decided that connection failures shouldn’t be handled automatically by the driver. However, it’s not hard to achieve the oft-sought seamless failover. You simply need to make sure that 1) all writes use safe mode and 2) that all operations are wrapped in a rescue block. Details on just how to do that can be found in the replica set docs.
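One way to wrap operations in a rescue block is a small retry helper. The sketch below is an assumption about how you might structure it, not code shipped with the driver: the helper name, the retry count, and the delay are all illustrative, and the stand-in exception class exists only so the snippet runs without the driver loaded.

```ruby
# Stand-in so the sketch runs on its own. With the real driver,
# `require 'mongo'` defines Mongo::ConnectionFailure and this is skipped.
unless defined?(Mongo::ConnectionFailure)
  module Mongo
    class ConnectionFailure < StandardError; end
  end
end

# Runs the block, retrying while the replica set elects a new primary.
# max_retries and delay are illustrative values, not driver defaults.
def rescue_connection_failure(max_retries = 60, delay = 0.5)
  retries = 0
  begin
    yield
  rescue Mongo::ConnectionFailure
    retries += 1
    raise if retries > max_retries  # give up and re-raise the failure
    sleep(delay)                    # wait a moment before trying again
    retry
  end
end

# Usage, assuming @collection is a Mongo::Collection:
#   rescue_connection_failure do
#     @collection.insert({:name => 'example'}, :safe => true)
#   end
```

Safe mode matters here because without it a write sent to a dead primary can fail silently, and the rescue block would never get the chance to retry.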

Finally, we should mention read distribution. For certain read-heavy applications, it’s useful to distribute the read load to a number of secondary nodes, and the driver now facilitates this.

With :read_secondary => true, the connection will send all reads to an arbitrary secondary node. When running Ruby in production, where you’ll have a whole bunch of Thins and Mongrels or forked workers (à la Unicorn and Phusion), you should get a good distribution of reads across secondaries. 

Write concern (i.e., safe mode plus)

Write concern is the term we use to describe safe mode and its options. For instance, you can use safe mode to ensure that a given write blocks until it’s been replicated to three nodes by specifying :safe => {:w => 3}.
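A minimal sketch of such a write follows. The insert_replicated helper and the sample document are hypothetical, and the collection argument is assumed to be a Mongo::Collection obtained from a replica set connection.

```ruby
# Hypothetical helper: inserts `doc` and blocks until the write has been
# replicated to `nodes` members of the set.
# `collection` is expected to be a Mongo::Collection.
def insert_replicated(collection, doc, nodes = 3)
  collection.insert(doc, :safe => {:w => nodes})
end

# Usage, assuming @collection is a Mongo::Collection:
#   insert_replicated(@collection, {:name => 'example'})
```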

That gets verbose after a while, which is why the Ruby driver supports setting a default safe mode on the Connection, DB, and Collection levels as well.
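The following is a sketch of setting that default once at the connection level, assuming Connection.new accepts a :safe option and that databases and collections are fetched with []. The host, port, database and collection names, and the open_with_default_safe helper are placeholders, and the connection class is passed in only so the snippet can be tried without a server.

```ruby
# Hypothetical helper: opens a connection whose default safe mode is
# {:w => 3}, then fetches a database and a collection from it.
# `connection_class` would normally be Mongo::Connection.
def open_with_default_safe(connection_class)
  @con        = connection_class.new('localhost', 27017, :safe => {:w => 3})
  @db         = @con['myapp']
  @collection = @db['users']
end

# Usage, assuming the driver is installed and a server is running:
#   require 'mongo'
#   open_with_default_safe(Mongo::Connection)
#   @collection.insert({:name => 'example'})  # no :safe option given here
```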

Now, the insert will still use safe mode with w equal to 3, but it inherits this setting through the @con, @db, and @collection objects. A few more details on this can be found in the write concern docs.

JRuby

One of the most exciting advances in the last few months is the driver’s special support for JRuby. Essentially, when you run the driver on JRuby, the BSON library uses a Java-based serializer, guaranteeing the best performance for the platform.

One of the big advantages to running on JRuby is its support for native threads. So if you’re building multi-threaded apps, you may want to take advantage of the driver’s built-in connection pooling. Whether you’re creating a standard connection or a replica set connection, simply pass in a size and timeout for the thread pool, and you’re good to go.
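To make that concrete, here is a sketch of several threads sharing one pooled connection. It assumes the pool options are named :pool_size and :timeout; the insert_concurrently helper and the values shown are illustrative, and the collection argument is assumed to be a Mongo::Collection from a pooled connection.

```ruby
# Hypothetical helper: inserts each document from its own thread.
# `collection` is expected to come from a pooled connection, for example:
#   con = Mongo::Connection.new('localhost', 27017,
#                               :pool_size => 5, :timeout => 5)
#   collection = con['myapp']['events']
def insert_concurrently(collection, docs)
  threads = docs.map do |doc|
    Thread.new { collection.insert(doc) }
  end
  # Thread#value joins the thread and returns the result of its block,
  # so results come back in the same order as `docs`.
  threads.map(&:value)
end
```

If more threads ask for a socket than the pool holds, the extra threads wait, and the timeout bounds how long they wait before the driver gives up.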

Another relevant feature that’s slated for the next month is an asynchronous facade for the driver that uses the reactor pattern. (This has been spearheaded, and is in fact used in production, by Chuck Remes. Thanks, Chuck!) You can track progress at the async branch.

Rails and the Object Mappers

Finally, a word about Rails and object mappers. If you’re a Rails user, then there’s a good chance that you don’t use the Ruby driver directly at all. Instead, you probably use one of the available object mappers.

The object mappers can be a great help, but do be careful. We’ve seen a number of users get burned because they don’t understand the data model being created. So the biggest piece of advice is to understand the underlying representation being built out by your object mapper. It’s all too easy to abuse the nice abstractions provided by the OMs to create unwieldy, inefficient mega-documents down below. Caveat programator.

That said, I get a lot of questions about which OM to use. Now, if you understand how the OM actually works, then it really shouldn’t matter which one you use. But not everyone has the time to dig into these code bases. So when I do recommend one, I recommend MongoMapper. This is, admittedly, a bit of an aesthetic judgment, but I like the API and have found the software to be simple and reliable. Long-awaited docs for the project are imminent, and we’ll tweet about them once they’re available.

What’s next

If you want to know more about the Ruby driver, tune in to next week’s Ruby driver webcast, where I’ll talk about everything in the post, plus some.

Finally, a big thanks to all those who have contributed to the driver, to the object mapper authors, and to all the users of MongoDB with Ruby.

- Kyle Banker