In June, the kernel team released MongoDB 2.1.2, which will evolve into the 2.2 release, as well as the 2.0.6 stable release.
At the same time the drivers team has been hard at work improving the drivers and adding support for new features in the upcoming 2.2 release. These releases are:
Additional improvements continue for MongoDB wrapper for Azure:
For up to date information on new MongoDB releases join the MongoDB Announcements mailing list.
Every month, we’ll be publishing the best community blog posts from the month. Here is the digest for June:
Rick Copeland wrote two posts on Python development with Ming, an ODM developed at SourceForge: Schema Maintenance with Ming and MongoDB and Declarative Schemas for MongoDB in Python using Ming.
After his talk on Geolocation, Maps and MongoDB at MongoDB UK, Derick Rethans wrote a post on Indexing Freeform-Tagged Data.
MongoDB developer Kristina Chodorow continued her series on replica sets internals with a post on initial sync.
With the Mongoose version 3 release imminent, Aaron Heckmann started a series of posts on the newest features for the node.js ODM. The first post covers versioning and the second covers FindAndModify.
Ryan McGeary wrote a post on schema design at BusyConf, winning 10gen’s recent blogging contest. To see the rest of the entries, check out blog.10gen.com.
10gen interviewed several MongoDB users on their blog, including Xperscore, Merchpin, iBE.net, and Equal Experts.
Dwight Merriman, one of the MongoDB committers, wrote a post on Why MongoDB is (somewhat) feature-heavy.
The team at Buffer, a social media management tool, blogged about using MongoDB for real-time analytics.
5in5NYC recently published their “10gen and friends” episode, featuring five New York City based startups working with MongoDB: 10gen, Art.sy, Bitfloor, Crowdtap, and NextBigSound.
This is a cross posting from the blog of MongoDB developer Kristina Chodorow. While TTL collections won’t be available until version 2.2, you can test them out yourself in the latest development (unstable) release of MongoDB, version 2.1.2.
In The Princess Bride, every night the Dread Pirate Roberts tells Westley: “Good night, Westley. Good work. Sleep well. I’ll most likely kill you in the morning.” Let’s say the Dread Pirate Roberts wants to optimize this process, so he stores prisoners in a database. When he captures Westley, he can put:
> db.prisoners.insert({
... name: "Westley",
... sentenceStart: new Date()
... })

However, now he has to run some sort of cron job that runs all the time in order to kill everyone who needs killing and keep his database up-to-date. Enter time-to-live (TTL) collections. TTL collections are going to be released in MongoDB 2.2 and they’re collections where documents expire in a more controlled way than with capped collections. What the Dread Pirate Roberts can do is:
> db.prisoners.ensureIndex({sentenceStart: 1}, {expireAfterSeconds: 24*60*60})
Now, MongoDB will regularly comb this index looking for docs to expire (so it’s actually more of a TTL index than a TTL collection). Let’s try it out ourselves. You’ll need to download the latest Development Release (Unstable) to use this feature. Start up the mongod and run the following in the Mongo shell:
> db.prisoners.ensureIndex({sentenceStart: 1}, {expireAfterSeconds: 30})
We’re on a schedule here, so our pirate ship is more brutal: death after 30 seconds. Let’s take aboard a prisoner and watch him die.
> var start = new Date()
> db.prisoners.insert({name: "Haggard Richard", sentenceStart: start})
> while (true) {
... var count = db.prisoners.count();
... print("# of prisoners: " + count + " (" + (new Date() - start) + "ms)");
... if (count == 0)
... break;
... sleep(4000); }
# of prisoners: 1 (2020ms)
# of prisoners: 1 (6021ms)
# of prisoners: 1 (10022ms)
# of prisoners: 1 (14022ms)
# of prisoners: 1 (18023ms)
# of prisoners: 1 (22024ms)
# of prisoners: 1 (26024ms)
# of prisoners: 0 (30025ms)
…and he’s gone. Conversely, let’s say we want to play the “maybe I’ll kill you tomorrow” game and keep bumping Westley’s expiration date. We can do that by updating the TTL-indexed field:
> db.prisoners.insert({name: "Westley", sentenceStart: new Date()})
> for (i=0; i < 10; i++) {
... print("updating Westley's sentence (" + i + ")");
... db.prisoners.update({name:"Westley"}, {$set:{sentenceStart:new Date}});
... sleep(4000);
... }
updating Westley's sentence (0)
updating Westley's sentence (1)
updating Westley's sentence (2)
updating Westley's sentence (3)
updating Westley's sentence (4)
updating Westley's sentence (5)
updating Westley's sentence (6)
updating Westley's sentence (7)
updating Westley's sentence (8)
updating Westley's sentence (9)
> db.prisoners.count()
1
…and Westley’s still there, even though it’s been more than 30 seconds. Once he gets promoted and becomes the Dread Pirate Roberts himself, he can remove himself from the execution rotation by changing his sentenceStart field to a non-date (or removing it altogether):
> db.prisoners.update({name: "Westley"}, {$unset : {sentenceStart: 1}});
When not on pirate ships, developers generally use TTL collections for sessions and other cache expiration problems. If ye be wanting a less grim introduction to MongoDB’s TTL collections, there are some docs on it in the manual.
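To make the expiry rule concrete, here is a minimal sketch in plain JavaScript that models it outside the database. It is not MongoDB's implementation: the sweep function, the fixed clock, and the extra sample prisoners are assumptions made for illustration. It only encodes what the examples above show, namely that a document goes once its date field is older than expireAfterSeconds, and that a document whose field is missing or not a date is left alone.

```javascript
// Plain-JavaScript model of the TTL rule; NOT MongoDB's actual implementation.
// sweep() is a hypothetical helper: it returns the documents that would survive
// one pass of the expiry check at time `now`.
function sweep(docs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter(function (doc) {
    const value = doc[field];
    // Documents without a Date in the indexed field are never expired.
    if (!(value instanceof Date)) return true;
    // Keep only documents whose date is newer than the cutoff.
    return value.getTime() > cutoff;
  });
}

// Fixed clock so the example is deterministic (assumed sample data).
const now = new Date("2012-06-01T00:01:00Z");
const prisoners = [
  { name: "Haggard Richard", sentenceStart: new Date("2012-06-01T00:00:00Z") }, // 60s old
  { name: "Westley", sentenceStart: new Date("2012-06-01T00:00:45Z") },         // 15s old
  { name: "Dread Pirate Roberts" }                                              // field removed
];

const survivors = sweep(prisoners, "sentenceStart", 30, now).map(function (d) {
  return d.name;
});
console.log(survivors);
```

With a 30 second limit, only the prisoner whose sentence started a full minute ago is dropped. Updating sentenceStart, as in the Westley loop, simply moves a document back inside the cutoff.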
This is part one of a three part blog series by Mitch Pirtle.
We have covered a lot on the blog about MongoDB features, as well as many ways to use MongoDB from different languages. This is the first in a series of posts from the perspective of a PHP developer, covering the gamut from getting started to advanced concepts.
I’m not going to waste the first blog post getting you up and running with MongoDB and the PHP extension, as that whole process is documented quite beautifully:
While we’re at it, you should also take advantage of the online documentation for MongoDB, as well as the reference for the MongoDB extension for PHP.
“So now what?” Oddly enough, I get this question quite frequently, as most folks familiar with relational databases are expecting to create a database, make sure the proper character set is being used, and also enforce login security for database access.
With MongoDB this is greatly simplified, as all MongoDB databases are UTF-8. This is important to note, as some databases arbitrarily pick character sets based on - leaving you with some annoying migration problems to deal with down the road.
We need to secure the database but that is relatively straightforward. At least for development, not using authentication (“trusted mode”) is fine as it should be running on your laptop, with your laptop firewall on. On a public server, either lock down the db with authentication or lock down the network (much like one would do with memcached).
You can dynamically create your database from PHP when you save your first document. This is actually the right place to start for your first experience with MongoDB.
Let’s get some quick terminology down so things make more sense. In MongoDB, a table is called a collection, and a row or recordset is called a document.
You are looking at MongoDB most likely wondering what a document database can do for you, and your applications. Unlike relational databases, MongoDB is schema-less, meaning you can store different types of documents in the same collection. This makes MongoDB fantastic for rapid prototyping, but also passes the responsibility to you, our intrepid developer, to make sure to lock down your schema when you finalize your data model.
Why don’t we start with people, since they are the most common users of web applications:
{
"first_name" : "MongoDB",
"last_name" : "Fan",
"tags" : ["developer","user"]
}
This is familiar to a lot of you that work with JavaScript, as it is a JSON document. MongoDB stores your documents as a serialized binary JSON called BSON, giving you the ability to reach inside your documents to find and manipulate them quickly and easily.
There are the ever-present first_name and last_name fields, as well as one called tags which adds a wrinkle to things. This is not stored as a string, but an array of strings. Here is one of MongoDB’s most profound differences with relational databases.
If you are storing arrays inside your documents, you are also able to search by them as well. With the above example, you could get a count of all users in your database who listed developer as one of their tags.
Most importantly, MongoDB can search the tags field whether or not it is present, and whether it contains a string, an array, or even an embedded document.
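As a rough illustration of that matching behaviour, here is a small sketch in plain JavaScript (the language of the mongo shell). It does not talk to MongoDB; matchesValue is a hypothetical helper and every sample user except the first is invented. It models a simple equality match on a field that may be an array, a plain string, or missing.

```javascript
// Hypothetical helper modelling an equality match on a field that may be
// an array, a scalar, or missing. This is a sketch, not driver or server code.
function matchesValue(doc, field, wanted) {
  if (!(field in doc)) return false;           // missing field: no match, no error
  const value = doc[field];
  return Array.isArray(value)
    ? value.includes(wanted)                   // array: match any element
    : value === wanted;                        // scalar: match the value itself
}

// Assumed sample data; only the first document comes from the post.
const users = [
  { first_name: "MongoDB", last_name: "Fan", tags: ["developer", "user"] },
  { first_name: "Plain", last_name: "String", tags: "developer" },
  { first_name: "No", last_name: "Tags" },
  { first_name: "Other", last_name: "Person", tags: ["user"] }
];

// Count users who list "developer" among their tags.
const developerCount = users.filter(function (u) {
  return matchesValue(u, "tags", "developer");
}).length;
console.log(developerCount);
```

The same question ("who has developer as a tag?") works across all four shapes of document without a schema change, which is the point being made above.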
Here is what this document looks like in PHP, I’m calling this first script test.php:
$user = array(
'first_name' => 'MongoDB',
'last_name' => 'Fan',
'tags' => array('developer','user')
);
Now let’s quit talking and start doing, as that’s the way I like to learn. I’m going to be verbose, so you understand all the things that are taking place. We can shorten this process later. Add the following to your existing test PHP script:
// Configuration
$dbhost = 'localhost';
$dbname = 'test';
// Connect to test database
$m = new Mongo("mongodb://$dbhost");
$db = $m->$dbname;
// Get the users collection
$c_users = $db->users;
// Insert this new document into the users collection
$c_users->save($user);
When you run this script (php -f test.php) you might not notice much. What happened in the database? Let’s find out, by firing up the MongoDB client app on the command line:
mpirtle$ mongo test
MongoDB shell version: 2.0.6
connecting to: test
> show collections;
system.indexes
users
> db.users.findOne()
{
"_id" : ObjectId("4fd371a4f479d1924f000000"),
"first_name" : "MongoDB",
"last_name" : "Fan",
"tags" : [
"developer",
"user"
]
}
Neither the test database nor the users collection existed until we saved that document. I appreciate using findOne() as it formats the JSON output nicely - calling find() will show all results, but no pretty indentation.
There’s a surprise waiting for you in your shiny new document; a field called _id. This is added by MongoDB to ensure a unique key, one that remains unique across a great many shards and clusters of servers.
If you compare the document that is returned by findOne() versus the original data, you will notice that an _id field has been added. The _id field is the unique identifier for a document in a collection. If you don’t provide an _id value, MongoDB will generate a unique ObjectID which is a 12-byte binary value designed to have a high probability of being unique when allocated. An _id value is typically an ObjectID, but you can also specify your own value if there is a more natural primary key. If you’re just starting with MongoDB it’s generally best to use the default ObjectID.
Now it is time to take a look at how to retrieve this data. I’m creating another PHP script called test2.php, and it looks strikingly similar to this:
// Configuration
$dbhost = 'localhost';
$dbname = 'test';
// Connect to test database
$m = new Mongo("mongodb://$dbhost");
$db = $m->$dbname;
// Get the users collection
$c_users = $db->users;
// Find the user with first_name 'MongoDB' and last_name 'Fan'
$user = array(
'first_name' => 'MongoDB',
'last_name' => 'Fan'
);
$user = $c_users->findOne($user);
var_dump($user);
Running this script (php -f test2.php) produces the following output:
mpirtle$ php -f test2.php
array(4) {
'_id' =>
class MongoId#6 (1) {
public $$id =>
string(24) "4fd37aa3f479d1c850000000"
}
'first_name' =>
string(7) "MongoDB"
'last_name' =>
string(3) "Fan"
'tags' =>
array(2) {
[0] =>
string(9) "developer"
[1] =>
string(4) "user"
}
}
The PHP var_dump() output is similar to the shell output for db.users.findOne() above. You will notice that the document created by test2.php has a different ObjectId from the first document that was created in test.php. ObjectIds are non-sequential and are generated from a combination of current timestamp, machine ID, process ID, and a counter field (read the ObjectId specification if you’re curious to know more).
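To see those components for yourself, here is a minimal sketch in plain JavaScript that splits the 24-character hex string of an ObjectId into its parts. It assumes the layout described above: a 4-byte timestamp, a 3-byte machine ID, a 2-byte process ID, and a 3-byte counter. The function name parseObjectId is made up for this example and is not part of any driver.

```javascript
// Hypothetical helper: split an ObjectId hex string into its components,
// assuming the 4 + 3 + 2 + 3 byte layout (timestamp, machine, pid, counter).
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  return {
    // First 4 bytes: seconds since the Unix epoch.
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // Next 3 bytes: machine identifier (kept as hex).
    machine: hex.slice(8, 14),
    // Next 2 bytes: process ID.
    pid: parseInt(hex.slice(14, 18), 16),
    // Last 3 bytes: counter.
    counter: parseInt(hex.slice(18, 24), 16)
  };
}

// The _id shown in the shell output earlier in this post.
const parts = parseObjectId("4fd371a4f479d1924f000000");
console.log(parts.timestamp.toISOString(), parts.machine, parts.pid, parts.counter);
```

Because the timestamp comes first, the creation time of a document can be read back from its _id without storing a separate field.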
This post hopefully gets your interest piqued with MongoDB and how you can work with it from within your favorite language, PHP. Don’t miss the opportunity to see the most excellent tutorial at php.net.
This article demonstrates a few simple concepts behind MongoDB and how they relate to PHP. In the next post in this series we will talk about more advanced query concepts, different datatypes supported by MongoDB’s BSON, and more.
MongoDB has some native data processing tools, such as the built-in JavaScript-oriented MapReduce framework, and a new Aggregation Framework in MongoDB v2.2. That said, there will always be a need to decouple persistence and computational layers when working with Big Data.
Enter MongoDB+Hadoop: an adapter that allows Apache’s Hadoop platform to integrate with MongoDB.

Using this adapter, it is possible to use MongoDB as a real-time datastore for your application while shifting large aggregation, batch processing, and ETL workloads to a platform better suited for the task.

Well, the engineers at 10gen have taken it one step further with the introduction of the streaming assembly for Mongo-Hadoop.
What does all that mean?
The streaming assembly lets you write MapReduce jobs in languages like Python, Ruby, and JavaScript instead of Java, making it easy for developers who are familiar with MongoDB and popular dynamic programming languages to leverage the power of Hadoop.

It works like this:
Once you have Java installed and Hadoop ready to rock, you download and build the adapter. With the adapter built, you compile the streaming assembly, load some data into Mongo, and get down to writing some MapReduce jobs.
The assembly streams data from MongoDB into Hadoop and back out again, running it through the mappers and reducers defined in a language you feel at home with. Cool, right?
Ruby support was recently added and is particularly easy to get started with. Let’s take a look at an example where we analyze Twitter data.
Import some data into MongoDB from twitter:
Next, write a Mapper and save it in a file called mapper.rb:
Now, write a Reducer and save it in a file called reducer.rb:
To run it all, create a shell script that executes hadoop with the streaming assembly jar and tells it how to find the mapper and reducer files as well as where to retrieve and store the data:
Make them all executable by running chmod +x on all the scripts, then run twit.sh to have Hadoop process the job.
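For readers who want a feel for the map, group, and reduce flow that such a job follows, here is a rough sketch in plain JavaScript. It runs locally and uses neither Hadoop nor the Mongo-Hadoop streaming API. The runJob helper, the time_zone field, and the sample documents are all assumptions made for illustration, not the mapper and reducer from this walkthrough.

```javascript
// Hypothetical mapper: emit one [key, value] pair per document.
function mapper(doc) {
  return [[doc.time_zone || "unknown", 1]];
}

// Hypothetical reducer: combine all values emitted for one key.
function reducer(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Local stand-in for the framework: map every doc, group by key, then reduce.
function runJob(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [key, value] of mapFn(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const output = {};
  for (const [key, values] of groups) {
    output[key] = reduceFn(key, values);
  }
  return output;
}

// Assumed sample documents standing in for imported tweets.
const tweets = [
  { text: "hello", time_zone: "Eastern Time (US & Canada)" },
  { text: "mongo", time_zone: "London" },
  { text: "hadoop", time_zone: "Eastern Time (US & Canada)" },
  { text: "no zone" }
];

const counts = runJob(tweets, mapper, reducer);
console.log(counts);
```

In a real streaming job the grouping step is done by Hadoop between the mapper and reducer processes; the sketch folds it into runJob so the whole flow fits in one file.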
We’ve had a big month with updates and improvements to our drivers. Here’s a summary:
Update: watch the video of Jeremy Zawodny and Chris Mooney’s talk on A Year of MongoDB at Craigslist at MongoSF ‘12
Last year, Craigslist moved their archive to MongoDB from MySQL. After the initial set up, we spoke with Jeremy Zawodny, software engineer at Craigslist and the author of High Performance MySQL (O’Reilly), and asked him some questions about their cluster. In advance of their talk at MongoSF tomorrow, we caught up with Jeremy to get the scoop on what’s happening at Craigslist one year later.
Last time we spoke you were building a MongoDB store for 5 Billion Documents. What do your numbers look like now?
We’re currently approaching the 3 billion mark. The 5 billion number was our target capacity when building the system. Back then we had about 2.5 billion documents that we migrated into MongoDB, and we’ve continued to add documents ever since then.
Update: Watch the video of Greg Brockman’s talk on MongoDB for High Availability at MongoSF ‘12
Stripe offers a simple platform for developers to accept online payments. They are a long-time user of MongoDB and have built a powerful and flexible system for enabling transactions on the web. In advance of their talk at MongoSF on MongoDB for high availability, Stripe engineer Greg Brockman spoke with us about what’s going on with MongoDB at Stripe.
We’re revamping MongoDB’s documentation. The new design in the MongoDB Manual has an improved reference section and an index for simplified search. It will also eventually support multiple MongoDB versions at the same time.
This project is a work in progress, and things are changing quickly. Our goal is to consolidate, sharpen, organize, and continue to improve the documentation in support of MongoDB. For now, the new docs will live alongside the original MongoDB Wiki. But over the next few months, we’ll be transitioning everything to the new manual.
In the spirit of open source, the docs are housed on GitHub. Feedback is welcome! Feel free to fork the repository and issue pull requests. You can also open tickets in JIRA, and we’ll promptly address any suggestions.
Variety is a lightweight tool which gives a feel for an application’s schema, as well as any schema outliers. It is particularly useful for
• quickly learning how data is structured, if inheriting a codebase with a production data dump
• finding all rare keys in a given collection
We’ll make a collection, within the MongoDB shell:
db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
db.users.insert({name: "Dick", bio: "I swordfight."});
db.users.insert({name: "Harry", pets: "egret"});
db.users.insert({name: "Geneviève", bio: "Ça va?"});
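To give a feel for the kind of summary a schema-analysis tool can produce from these four documents, here is a minimal sketch in plain JavaScript. It is not Variety's code or output format: keyReport and typeName are hypothetical helpers that simply count how often each top-level key occurs and which value types it holds.

```javascript
// Hypothetical helper: a readable type name for a value.
function typeName(value) {
  if (Array.isArray(value)) return "Array";
  if (value === null) return "null";
  const t = typeof value;
  return t.charAt(0).toUpperCase() + t.slice(1);
}

// Hypothetical helper: count occurrences and value types per top-level key.
function keyReport(docs) {
  const report = {};
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      if (!report[key]) report[key] = { count: 0, types: [] };
      report[key].count += 1;
      const t = typeName(doc[key]);
      if (!report[key].types.includes(t)) report[key].types.push(t);
    }
  }
  return report;
}

// The same four documents inserted above, as plain objects.
const sampleUsers = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight." },
  { name: "Harry", pets: "egret" },
  { name: "Geneviève", bio: "Ça va?" }
];

const report = keyReport(sampleUsers);
// Print the most common keys first; rare keys end up at the bottom.
Object.keys(report)
  .sort(function (a, b) { return report[b].count - report[a].count; })
  .forEach(function (key) {
    console.log(key, report[key].count, report[key].types.join(","));
  });
```

Even on this tiny collection the two things the tool is meant to surface stand out: pets holds both arrays and strings, and someWeirdLegacyKey appears in only one document.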
With their strong roots in JavaScript, Node.js and MongoDB have always been a natural fit, and the Node.js community has embraced MongoDB with a number of open source projects. To support the community’s efforts, 10gen is happy to announce that the MongoDB Node.js driver will join the existing set of 12 officially supported drivers for MongoDB.
The Node.js driver was born out of necessity. Christian Kvalheim started using Node.js in early 2010. He had heard good things about MongoDB but was disappointed to discover that no native driver had yet been developed. So, he got to work. Over the past two years, Christian has done amazing work in his driver, and it has matured through the contributions of a large community and the rigors of production. For some time now, the driver has been on par with 10gen’s officially supported MongoDB drivers. So we were naturally thrilled to welcome Christian full time at 10gen to continue his work on the Node.js driver.
Groovy and Grails’ speed and simplicity are a perfect match to the flexibility and power of MongoDB. Dozens of plugins and libraries connect these two together, making it a breeze to get Grooving with MongoDB.
For the purpose of this post, let’s pretend we’re writing a hospital application that uses the following domain class.
class Doctor {
String first
String last
String degree
String specialty
}
There are a few Grails plugins that help communicate with MongoDB, but one of the easiest to use is the one created by Graeme Rocher himself (Grails project lead). The MongoDB GORM plugin allows you to persist all your domain classes in MongoDB. To use it, first remove any unneeded persistence-related plugins after you’ve executed the ‘grails create-app’ command, and install the MongoDB GORM plugin.
Available in the 2.1 development release. Will be stable for production in the 2.2 release.
Built by Chris Westin (@cwestin63)
MongoDB has built-in MapReduce functionality that can be used for complex analytics tasks. However, we’ve found that most of the time, users need the kind of group-by functionality that SQL implementations have. This can be implemented using map/reduce, but doing so is more work than it was in SQL. In version 2.1, MongoDB is introducing a new aggregation framework that will make it much easier to obtain the kind of results SQL group-by is used for, without having to write custom JavaScript.
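To make the "group-by" idea concrete, here is a small sketch in plain JavaScript of the result such a query computes: one total per distinct key. It runs locally and does not use the aggregation framework or any MongoDB API. The groupSum helper and the sample articles are assumptions made for illustration.

```javascript
// Hypothetical helper: sum valueField per distinct keyField, the way a
// SQL "SELECT key, SUM(value) ... GROUP BY key" would.
function groupSum(rows, keyField, valueField) {
  const totals = {};
  for (const row of rows) {
    const key = row[keyField];
    totals[key] = (totals[key] || 0) + row[valueField];
  }
  return totals;
}

// Assumed sample data.
const articles = [
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 7 },
  { author: "bob", pageViews: 10 }
];

const totals = groupSum(articles, "author", "pageViews");
console.log(totals);
```

Writing this as map/reduce means supplying both a map and a reduce function for what is conceptually a single grouping step, which is the extra work the new framework is meant to remove.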