Update: You can view a video of Jeremy Zawodny’s talk at MongoSF on 10gen.com.
MongoDB is now live at Craigslist, where it is being used to archive billions of records.
Craigslist has kept every post anyone has ever made in a large MySQL cluster. A few months ago, they began looking for alternatives: schema changes were taking forever (Craigslist’s schema has changed a couple of times since 1995) and the data isn’t really relational. They wanted to be able to add new machines without downtime (which sharding provides) and to route around dead machines without clients failing (which replica sets provide), so MongoDB was a strong candidate. After looking into a few of the most popular non-relational databases, they decided to go with MongoDB.
Jeremy Zawodny is a software engineer at Craigslist and an author of High Performance MySQL (O’Reilly). He kindly agreed to answer some questions about their MongoDB cluster (editor’s comments in italics).
Any numbers you can give us?
We’re sizing the install for around 5 billion documents. That’s from the initial 2 billion document import we need to do plus room to grow for a few years to come. Average document size is right around 2KB. (Five billion 2KB documents is 10TB of data.) We’re getting our feet wet with MongoDB so this particular task isn’t high throughput or growing in unpredictable ways.
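The editor’s sizing note can be checked with a quick back-of-the-envelope calculation. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10¹² bytes) and ignoring index and replication overhead, which would add to the raw figure:

```python
# Back-of-the-envelope sizing for the cluster described above.
docs = 5_000_000_000       # target capacity: 5 billion documents
avg_doc_bytes = 2_000      # average document size, ~2 KB (decimal)

total_bytes = docs * avg_doc_bytes
total_tb = total_bytes / 1e12

print(f"{total_tb:.1f} TB of raw document data")  # -> 10.0 TB
```

Replica sets would multiply the on-disk footprint by the number of copies kept, so the provisioned storage is a multiple of this raw number.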
We can put data into MongoDB faster than we can get it out of MySQL during the migration.