March 8, 2010

State of MongoDB March, 2010

Every once in a while, I think it’s important for us (the core MongoDB team) to give a broad picture of where we think MongoDB is and where we’re hoping to take it. This is useful as a gut check for us, as a way to give the community some insight into what we’re thinking, and as a way to make sure we’re all on the same page.

MongoDB has made great strides in the last year. The first public release was just over a year ago (2/11/2009), and since then we’ve seen tremendous support and interest from the developer community. We’ve made a lot of progress on the core database, drivers, and tools. The community has contributed a large and growing number of great drivers and tools, as well as invaluable testing and feedback. We’ve also seen a large number of production MongoDB installations come online in the last year.

MongoDB has had two stable releases (1.0 and 1.2), and a third (1.4) is coming with many things we’re very proud of: better concurrency, geospatial indexing, “usability” enhancements, and speed enhancements, to name a few. We’re planning on a stable release every 3 months as a way to balance speed, carefulness, and practicality.

So, where are we in our grand view of the world? About half done. MongoDB is in a great place today, but we have a long way to go. MongoDB was never designed or intended to be a niche database for a small subset of problems, but rather a new type of database that solves lots of real-world problems for a large subset of the developer community. We’re getting there, and we’re suitable for a lot of problems, but there are lots of things we still need to do. So if you’re looking at MongoDB and saying “it’s not good enough for this” or “it would be great if only it had X,” it’s very likely we agree with you.

Some major things we’re thinking about for the next 6-12 months:

  • better replication: real time, replica sets, more options for data durability
  • production ready sharding
  • more features for working with embedded documents
  • fleshing out more atomic update operators
  • single server durability
  • full text search

Taking embedded objects as one example, we added support for them very early on because we think they are often a better way to model your data in the database. Being able to store addresses for a user, or tags for a blog post, inside the main document is great for many reasons, particularly speed and manageability. This is a very different paradigm from the relational one, and we’ve had to add a lot of features to make it work nicely: indexed embedded fields, in-place incremental updates, etc. There are still many features people have asked for that we want to add to make programming with embedded documents even nicer.
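To make the embedded-field idea concrete, here is a small sketch in plain JavaScript (Node.js, no server involved) of resolving a dotted path such as `addresses.city` inside a document and building a lookup table on it. The sample data and the `valuesAtPath` and `buildIndex` helpers are hypothetical; this illustrates the idea of indexing an embedded field, not MongoDB’s internal index code.

```javascript
// Hypothetical user document with embedded addresses and tags.
const user = {
  _id: 1,
  name: "alice",
  addresses: [
    { city: "New York", zip: "10011" },
    { city: "Boston", zip: "02110" },
  ],
  tags: ["admin", "beta"],
};

// Resolve a dotted path against a value. When a step lands on an array,
// the rest of the path is applied to every element, so one document can
// yield several values for the same path.
function valuesAtPath(value, path) {
  if (path.length === 0) {
    return Array.isArray(value) ? value : [value];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, path));
  }
  if (value === null || typeof value !== "object" || !(path[0] in value)) {
    return []; // the path does not exist in this document
  }
  return valuesAtPath(value[path[0]], path.slice(1));
}

// Map each value found at `dottedPath` to the _ids of documents holding it.
function buildIndex(docs, dottedPath) {
  const index = new Map();
  const path = dottedPath.split(".");
  for (const doc of docs) {
    for (const key of valuesAtPath(doc, path)) {
      if (!index.has(key)) index.set(key, []);
      const ids = index.get(key);
      if (!ids.includes(doc._id)) ids.push(doc._id);
    }
  }
  return index;
}

const byCity = buildIndex([user], "addresses.city");
console.log(byCity.get("Boston")); // [ 1 ]
```

Because the addresses live inside the user document, a single read returns the user together with all of its addresses, which is where the speed and manageability benefits come from.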

I’ve recently changed the way I describe MongoDB when I first talk to people, and I think the new description sheds some light on how we’re thinking. MongoDB wasn’t designed in a lab. We built MongoDB from our own experiences building large-scale, highly available, robust systems. We didn’t start from scratch; we really tried to figure out what was broken and tackle that. So the way I think about MongoDB is that if you take MySQL and change the data model from relational to document-based, you get a lot of great features: embedded docs for speed and manageability, agile development with schema-less databases, and easier horizontal scalability because joins aren’t as important. There are lots of things that work great in relational databases, such as indexes, dynamic queries, and updates, and we haven’t changed much there. For example, you should design your indexes in MongoDB exactly the way you would in MySQL or Oracle; you just have the additional option of indexing an embedded field.

So, in conclusion: I hope you find MongoDB useful and productive now, we hope to make great strides in the next year, and we are grateful for the community’s support, advice, debugging, and interest.

-Eliot and the core MongoDB Team

March 5, 2010

You need to learn MongoDB

You need to learn MongoDB. We’re offering an informative, hands-on training session to help you do just that. From document-based data modeling to high-performance optimizations, we’ll answer your questions and prepare you for the move to Mongo.
Among the topics we’ll cover:

  • How to use the language drivers, and how they work
  • How to make the most of atomic updates
  • Data-modeling with documents
  • Administration with the JavaScript shell
  • Scaling out using master/slave configurations and auto-sharding
  • Backups and recovery

The first sessions will be held in San Francisco and New York City. Discounts are available for startups. If you need to learn MongoDB, a session like this will take you a long way. Sign up now! If you have any questions, send us an email at info@10gen.com.

March 3, 2010

2d geospatial indexing

We have now added geospatial indexing to the product. Our approach has been to make something simple but fast: 2d only, and effective for common real-world use cases such as lat/long location searches.
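One common way to make a 2d index fast is to map each point to a one-dimensional key by interleaving the bits of its x and y grid coordinates (a geohash or Z-order curve), so that nearby points tend to share key prefixes and can be found with ordinary range scans on a sorted index. The sketch below illustrates that general technique in plain JavaScript; it is not taken from MongoDB’s source, and the helper names and the 16-bit-per-axis precision are assumptions.

```javascript
// Map a value in [min, max] onto an integer grid cell using `bits` bits.
function toCell(value, min, max, bits) {
  const cells = 2 ** bits;
  const cell = Math.floor(((value - min) / (max - min)) * cells);
  return Math.min(Math.max(cell, 0), cells - 1); // clamp the upper edge
}

// Interleave the bits of x and y (x first in each pair), most significant
// bit first, producing a geohash-style key as a binary string.
function interleave(x, y, bits) {
  let key = "";
  for (let i = bits - 1; i >= 0; i--) {
    key += (x >> i) & 1;
    key += (y >> i) & 1;
  }
  return key;
}

// Key for a (longitude, latitude) pair; 16 bits per axis is an assumption.
function pointKey(lng, lat, bits = 16) {
  return interleave(toCell(lng, -180, 180, bits), toCell(lat, -90, 90, bits), bits);
}

function commonPrefixLength(a, b) {
  let n = 0;
  while (n < a.length && a[n] === b[n]) n++;
  return n;
}

const timesSquare = pointKey(-73.9855, 40.758);
const empireState = pointKey(-73.9857, 40.7484);
const sanFrancisco = pointKey(-122.4194, 37.7749);
console.log(commonPrefixLength(timesSquare, empireState)); // 29
console.log(commonPrefixLength(timesSquare, sanFrancisco)); // 2
```

The shared-prefix property is only a tendency: two points very close together on opposite sides of a cell boundary can share a short prefix, which is why geohash-based searches typically also examine neighboring cells.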

We’d love to get some feedback on features people would like to see, how it’s working, etc.

March 1, 2010

MongoDB March Events and NYC Office Hours

Upcoming MongoDB Events

MongoDB will be featured at several events, conferences, and meetups in March, including a webinar on MongoDB internals, Mountain West Ruby Conference in Salt Lake City, NoSQL Live Boston, QCon London, and Cloud Connect in Santa Clara. There’s a MongoDB training session in San Francisco and there will even be a MongoDB Day in Austin! Check the Events page for a complete listing.

In addition, 10gen will be hosting MongoDB Office Hours on Wednesdays in New York City, starting on March 17. Stop by to meet the MongoDB team, ask questions, or have a beer.

MongoDB Office Hours

Wednesdays (Starting March 17)
4pm - 6pm
17 West 18th Street - 8th Floor
Between 5th & 6th Avenues
New York, NY

February Highlights

The best of the MongoDB presentations in February:

February 23, 2010

Announcing Speakers for NoSQL Live

It’s not too late to register for NoSQL Live in Boston on March 11th. We have an exciting lineup of speakers and panelists who will discuss real use cases for NoSQL in production systems.

Session topics at NoSQL Live will include scaling with NoSQL, NoSQL in the cloud, schema design with document-oriented databases, the evolution of graph data structures from research to production, the enterprise adoption of NoSQL, and toward web standards for NoSQL. In addition to speakers and panels, the conference will also include lightning talks and a NoSQL Lab for practical exploration of working with specific NoSQL products.

Here’s the confirmed list of speakers, panelists, and moderators:

— Dwight Merriman, CEO, 10gen

— Eliot Horowitz, CTO, 10gen

— Adam Kocoloski, CTO, Cloudant

— Alan Hoffman, CEO, Cloudant

— Durran Jordan, Senior Developer, Hashrocket

— Les Hill, Software Adventurer, Hashrocket

— Marko Rodriguez, Graph Systems Architect, AT&T Interactive

— Ryan King, Technical Lead, Storage Team, Twitter

— Alex Feinberg, Senior Software Engineer, LinkedIn

— Jonathan Ellis, Systems Architect, The Rackspace Cloud

— Sandro Hawke, Software Developer and Systems Architect, W3C

— Benjamin Day, Microsoft MVP

— Ryan Rawson, Systems Architect, StumbleUpon

— Bryan Fink, Senior Software Developer, Basho Technologies

— Rusty Klophaus, Senior Software Engineer, Basho Technologies

— Adam Wiggins, Co-founder, Heroku

— Mark Atwood, Director of Community Development, Gear6

— Sourav Mazumder, Principal Technology Architect, Infosys Technologies Limited

— Tim Anglade, CTO, GemKitty

— Bradford Stephens, Founder, Drawn to Scale

— Doug Judd, CEO, Hypertable

— Daniel Rinehart, Chief Software Architect, Allurent

— Emil Eifrém, CEO, Neo Technology

— Paul Davis, Research Assistant, New England Biolabs

— Borislav Iordanov, Software Architecture Consultant, Kobrix

— Jim Wilson, Lead Software Engineer, Vistaprint

— Ryan Angilly, Senior Developer, Punchbowl Software

Sign-ups for the $40 early-bird registration end today. If you’re interested in presenting or sponsoring, contact Meghan Gill, event coordinator, at meghan@10gen.com. Check out http://nosqlboston.eventbrite.com for more information and to register.

February 22, 2010

MongoDB: How it Works Webinar

In October, 10gen hosted a webinar where we heard from 10gen CEO Dwight Merriman and The Business Insider Lead Developer Ian White about the basics of developing applications with MongoDB and about how MongoDB is used in production at TBI.

We’d like to follow up with a webinar focused on how MongoDB works “under the hood.” Please join us on March 8 at 12:30 PM Eastern Time. 10gen software engineer Mike Dirolf will lead the session. Registration is free but limited to 125 attendees.

Register now.

February 18, 2010

MongoDB Survey Results

A couple of weeks ago we asked people on Twitter, IRC, and the mailing list to fill out a survey on how they were using MongoDB. About 120 people responded (thanks, guys!).

Here is what we gleaned:

Everyone’s a noob

[Chart: How long people have been using Mongo]

Most people haven’t been using Mongo for very long.  Exactly 0% said they’d been using Mongo for a year or more (which makes sense, given our first official release was ~12 months ago).

Interesting things being stored in Mongo

Lots of people are storing log data, analytics, user info… the usual.  Some less usual stuff:

  • Game title development info
  • Patents
  • Crime reports and warrants

And quite a few people said: “Everything.”

So, how big is it?

One person said they were testing up to 40 billion documents, but it wasn’t clear whether they had actually stored 40 billion or were planning to. So we’ll ignore that outlier, but we can pretty safely say people are storing ~70 million documents.

On a scale of 1-10, would you recommend Mongo to a friend?

Happily, the average was 9.64!  If you are happy with MongoDB, please consider tweeting, writing a blog post, or giving a talk at a conference or meetup… the biggest obstacle we’re facing right now is letting people know we exist!

If you were below average (haha), I’d encourage you to hit the list or IRC.  We’d love to help out (or at least find out why you’re unhappy).

And, finally, most importantly, religious wars:

Kyle has the most users

Ruby wins handily with over 40% of users.

“Other” contains mainly C#, Perl, and Groovy users.

OS X: the universal dev environment

[Chart: OS people are using for development]

[Chart: OS people are using for production]

Go Linux go!

If you feel left out, feel free to fill out the survey now. Thanks to everyone for your input!

February 10, 2010

What about Durability?

We get lots of questions about why MongoDB doesn’t have full single-server durability, and many people think this is a major problem. We wanted to shed some light on why we haven’t implemented single-server durability, what our suggestions are, and what our future plans are.

To start, there are some very practical reasons why we think single-server durability is overvalued. First, there are many scenarios in which a server loses all its data no matter what. If there is water damage, fire, certain hardware problems, etc., data can be lost no matter how durable the software is. Yes, there are ways to mitigate some of these, but they add another layer of complexity that has to be tested and proven, and they introduce more variables that can fail.

In the real world, traditional durability often isn’t even done correctly. If you are using a DBMS that uses a transaction log for durability, you either have to turn off hardware buffering or have a battery-backed RAID controller. Without hardware buffering, transaction logs are very slow. Battery-backed RAID controllers work well, but you actually have to have one. With the move toward the cloud and outsourced hosting, custom hardware is not always an option.

Requirements for web applications are also changing. 99.99% uptime is no longer the goal; people want as close to 100% uptime as possible. If you have durability through a transaction log, then you have to replay it to come back up. If you have a master and slave in the same data center and you lose power, both will have to recover, which could take 5-30 minutes.[1]
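To make the transaction-log trade-off concrete, here is a minimal Node.js sketch of a generic write-ahead log (not MongoDB code): every commit is appended and `fsync`ed before it is acknowledged, and recovery replays the whole log. The JSON-lines log format, the temp-file location, and the `commit`/`replay` helpers are assumptions for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Append one operation and force it to stable storage before the commit
// is acknowledged. The fsync on every commit is what makes the log
// durable, and also what makes it slow without a safe write cache.
function commit(fd, op) {
  fs.writeSync(fd, JSON.stringify(op) + "\n");
  fs.fsyncSync(fd);
}

// Recovery: rebuild state by replaying every logged operation, so restart
// time grows with the amount of log that has to be replayed.
function replay(logPath) {
  const state = {};
  const text = fs.readFileSync(logPath, "utf8");
  for (const line of text.split("\n")) {
    if (line === "") continue;
    let op;
    try {
      op = JSON.parse(line);
    } catch {
      break; // torn final record from a crash mid-write: stop replaying
    }
    state[op.key] = (state[op.key] || 0) + op.inc;
  }
  return state;
}

// Hypothetical log file in the OS temp directory.
const logPath = path.join(os.tmpdir(), `txlog-${process.pid}.log`);
const fd = fs.openSync(logPath, "w");
for (let i = 0; i < 5; i++) commit(fd, { key: "x", inc: 1 });
fs.closeSync(fd);
console.log(replay(logPath)); // { x: 5 }
```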

Another feature of new non-relational databases is horizontal scalability. While MongoDB’s auto-sharding is still in alpha, we feel this is a core component. With horizontal scalability come many servers. If you have a 100-node cluster, having to worry about every machine is a liability. If a machine goes down in the middle of the night, you want the system to recover as fast as possible, without human intervention. Given that, and given that a high percentage of failures are hardware failures, the best thing to do is to mark that server as inactive and ignore it until someone can look at it easily (which could be hours or days later).
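The “mark it inactive and move on” policy can be sketched as a heartbeat timeout. This is a hypothetical illustration in plain JavaScript, not how MongoDB detects failures; the `Cluster` class, the 5-second timeout, and the explicit clock values passed in are all assumptions.

```javascript
// Hypothetical cluster bookkeeping: each node reports heartbeats, and any
// node silent for longer than `timeoutMs` is marked inactive automatically.
class Cluster {
  constructor(nodeNames, timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(nodeNames.map((n) => [n, 0]));
    this.inactive = new Set();
  }

  heartbeat(node, nowMs) {
    this.lastSeen.set(node, nowMs);
  }

  // Run periodically; returns the nodes newly marked inactive.
  sweep(nowMs) {
    const newlyDown = [];
    for (const [node, seen] of this.lastSeen) {
      if (!this.inactive.has(node) && nowMs - seen > this.timeoutMs) {
        this.inactive.add(node);
        newlyDown.push(node);
      }
    }
    return newlyDown;
  }

  // An inactive node stays out until an operator has looked at it.
  reinstate(node, nowMs) {
    this.inactive.delete(node);
    this.lastSeen.set(node, nowMs);
  }

  activeNodes() {
    return [...this.lastSeen.keys()].filter((n) => !this.inactive.has(n));
  }
}

const cluster = new Cluster(["a", "b", "c"], 5000);
cluster.heartbeat("a", 10000);
cluster.heartbeat("b", 10000);
// "c" has not reported since time 0, so it is taken out of rotation.
console.log(cluster.sweep(12000)); // [ 'c' ]
console.log(cluster.activeNodes()); // [ 'a', 'b' ]
```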

Given all this, we’re not saying durability isn’t important; we just think that single-server durability isn’t the best way to get true durability. We think the right path to durability is replication (local and remote) and snapshotting. That’s why we’ve spent so much time making replication in MongoDB fast, easy, and able to work over wide area networks.

We are currently planning many more enhancements to replication:

  • pseudo real-time replication, with optional blocking of writes until they are on multiple servers
  • replica sets instead of replica pairs
  • easier to create new slaves with large data sets
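The “optional blocking for writes until on multiple servers” item can be sketched as a write that resolves only after `w` servers have acknowledged it. This is a hypothetical, in-memory JavaScript illustration: the `Primary` and `Secondary` classes, the `w` parameter name, and the simulated delays are assumptions, not MongoDB’s replication protocol or API.

```javascript
// Hypothetical secondary that applies a write after a simulated delay.
class Secondary {
  constructor(delayMs) {
    this.delayMs = delayMs;
    this.data = [];
  }

  apply(op) {
    return new Promise((resolve) => {
      setTimeout(() => {
        this.data.push(op);
        resolve();
      }, this.delayMs);
    });
  }
}

// Hypothetical primary: applies a write locally, ships it to every
// secondary, and optionally waits until `w` servers, counting itself,
// have the write before reporting success.
class Primary {
  constructor(secondaries) {
    this.data = [];
    this.secondaries = secondaries;
  }

  async write(op, w = 1) {
    if (w > 1 + this.secondaries.length) {
      throw new Error(`cannot wait for ${w} servers`);
    }
    this.data.push(op);
    const acks = this.secondaries.map((s) => s.apply(op));
    if (w <= 1) {
      acks.forEach((ack) => ack.catch(() => {})); // replicate in background
      return 1;
    }
    const needed = w - 1; // acknowledgements required from secondaries
    return new Promise((resolve, reject) => {
      let acked = 0;
      let failed = 0;
      for (const ack of acks) {
        ack.then(
          () => {
            acked++;
            if (acked === needed) resolve(acked + 1);
          },
          () => {
            failed++;
            if (failed > acks.length - needed) {
              reject(new Error("not enough servers acknowledged"));
            }
          }
        );
      }
    });
  }
}

async function main() {
  const fast = new Secondary(10);
  const slow = new Secondary(200);
  const primary = new Primary([fast, slow]);
  // Block until the write is on 2 servers (the primary plus one secondary).
  await primary.write({ x: 1 }, 2);
  console.log(fast.data.length, slow.data.length); // 1 0
}

main();
```

The caller chooses the trade-off per write: `w = 1` returns immediately, while a larger `w` trades latency for the assurance that the data exists on more than one machine.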

Now, there are definitely some cases where single-server durability is the best option. It is on our road map; it’s just not on the short list right now. We know what we want to do and how we want to do it; it’s just a matter of code :)

[1] Some databases, such as CouchDB, use an append-only model that allows for instantaneous restarts. However, this type of design usually requires compaction routines to be run periodically, so it can be costly in high-update scenarios.
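The footnote’s trade-off can be illustrated with a toy append-only store: writes never modify existing records, so there is nothing to repair on restart, but superseded records pile up until a compaction pass rewrites the data. This is a hypothetical in-memory JavaScript sketch, not CouchDB’s actual storage format.

```javascript
// Hypothetical append-only key/value store: every put appends a record,
// the newest record for a key wins, and nothing is overwritten in place.
class AppendOnlyStore {
  constructor() {
    this.log = [];
  }

  put(key, value) {
    this.log.push({ key, value });
  }

  // Linear scan from the end; a real system would keep an index of offsets.
  get(key) {
    for (let i = this.log.length - 1; i >= 0; i--) {
      if (this.log[i].key === key) return this.log[i].value;
    }
    return undefined;
  }

  // Compaction rewrites the log keeping only the newest record per key.
  // Its cost grows with the size of the log, so update-heavy workloads,
  // which leave many superseded records behind, make it expensive.
  compact() {
    const latest = new Map();
    for (const rec of this.log) latest.set(rec.key, rec.value);
    this.log = [...latest].map(([key, value]) => ({ key, value }));
  }
}

const store = new AppendOnlyStore();
for (let i = 0; i < 1000; i++) store.put("counter", i);
console.log(store.log.length); // 1000
store.compact();
console.log(store.log.length, store.get("counter")); // 1 999
```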

February 8, 2010

Practical MongoDB Training with Kyle Banker

10gen is offering day-long MongoDB training sessions in San Francisco and New York City! Kyle Banker, a software engineer at 10gen, will be leading both sessions. Kyle has presented MongoDB in numerous forums, most recently at Chicago Ruby, and is excited to share his expertise. Kyle is preparing several interesting and challenging projects so that attendees can really get their hands dirty. Whether you are brand new to MongoDB or you’ve played with it already, you will leave this course with a comprehensive understanding of how to build applications with MongoDB.

More details available on the 10gen website.

February 4, 2010

Hosting Center Update

Update on supported hosting options:

  • DreamHost is now offering instant configuration and deployment of MongoDB to DreamHost PS customers
  • Webfaction and Linode have recently published instructions for installing MongoDB on their respective systems

Check out the Hosting Center for more details. If you’re interested in support from other hosting providers, please let us know which ones you’d like to see in the comments.

January 26, 2010

Announcing NoSQL Live from Boston: March 11, 2010

Clear your calendars for NoSQL Live, hosted by 10gen in Boston on March 11th. It’s not your ordinary NoSQL meetup. Rather than introducing attendees to the basic functions of the tools out there, NoSQL Live will bring together people using MongoDB and a number of other non-relational databases to discuss real use cases in production systems. The full-day conference will feature panel discussions, lightning talks, networking sessions, and a NoSQL Lab where attendees can get a practical view of programming with NoSQL databases. Cloudant, which provides a hosted database and data analytics platform based on Apache CouchDB, has been confirmed as a co-sponsor.

More information on speakers, panels, and schedules to come. For those interested in presenting or sponsoring, contact Meghan Gill, event coordinator, at meghan@10gen.com. Visit http://nosqlboston.eventbrite.com/ to register.

December 30, 2009

“Partial Object Updates” will be an Important NoSQL Feature

It’s nice that in SQL we can do things like

UPDATE PERSONS SET X = X + 1

We term this a “partial object update”: we updated the value of X without sending a full row update to the server.

This seems like a very simple thing to be discussing, yet some NoSQL solutions do not support it (others do).

In these new datastores, the average stored object size (whether it be a document, a key/value blob, or a row) tends to be larger than the traditional database row.  The data is not fully normalized, so we are packing more data into a single storage object than before.

This means the cost of full updates is higher.  If we have a 100KB document and want to set a single value within it, passing the full 100KB in both directions over the network for the operation is expensive.
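A rough back-of-the-envelope comparison, using JSON string length as a stand-in for wire size (MongoDB actually uses BSON, so real numbers differ). The document and the helper functions are hypothetical.

```javascript
// Hypothetical document carrying ~100KB of data plus a small counter.
const doc = { _id: 42, views: 0, body: "x".repeat(100 * 1024) };

// Full update: fetch the whole document, change one field, send it back.
function fullUpdateBytes(document) {
  const fetched = JSON.stringify(document).length; // server -> client
  const updated = { ...document, views: document.views + 1 };
  const sent = JSON.stringify(updated).length; // client -> server
  return fetched + sent;
}

// Partial update: send only a description of the change.
function partialUpdateBytes(id) {
  return JSON.stringify({ _id: id, $inc: { views: 1 } }).length;
}

console.log(fullUpdateBytes(doc), partialUpdateBytes(doc._id)); // 204860 29
```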

MongoDB supports partial updates in its update operation via a set of special $ operators: $inc, $set, $push, etc.  More of these operators will be added in the future.
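To illustrate what these operators mean, here is a simplified JavaScript model of `$inc`, `$set`, and `$push` applied to a plain object. It is a hypothetical sketch: it handles only top-level fields, omits most of the server’s validation, and `applyUpdate` is not a real driver function.

```javascript
// Apply a MongoDB-style update document to a plain object in place.
// Simplified model: top-level fields only, minimal error handling.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$inc":
          doc[field] = (doc[field] ?? 0) + arg; // missing fields start at 0
          break;
        case "$set":
          doc[field] = arg;
          break;
        case "$push":
          if (doc[field] === undefined) doc[field] = [];
          if (!Array.isArray(doc[field])) {
            throw new Error(`${field} is not an array`);
          }
          doc[field].push(arg);
          break;
        default:
          throw new Error(`unsupported operator ${op}`);
      }
    }
  }
  return doc;
}

const post = { _id: 1, title: "Hello", views: 10, tags: ["mongodb"] };
applyUpdate(post, {
  $inc: { views: 1 },
  $set: { title: "Hello, world" },
  $push: { tags: "nosql" },
});
console.log(post.views, post.title, post.tags); // 11 Hello, world [ 'mongodb', 'nosql' ]
```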

There are further benefits to the technique too. First, we get easy (single-document) atomicity for these operations (consider $inc). Second, replication is cheaper: when a partial update occurs, MongoDB replicates the partial update rather than the full changed object, which makes replication much less expensive and less network intensive.
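The replication benefit can be modeled as an operation log: the primary records only what changed, and a secondary replays the entries it has not yet seen. In this hypothetical JavaScript sketch (the `Replica` class, `primaryInc`, and the log format are all assumptions, not MongoDB’s oplog format), an increment is logged as the new value of the changed field rather than as the increment itself, so that re-applying an entry is harmless.

```javascript
// Hypothetical replica holding documents keyed by _id.
class Replica {
  constructor() {
    this.docs = new Map();
    this.applied = 0; // how many log entries this replica has applied
  }

  applySet(id, fields) {
    if (!this.docs.has(id)) this.docs.set(id, { _id: id });
    Object.assign(this.docs.get(id), fields);
  }
}

const oplog = [];
const primary = new Replica();
const secondary = new Replica();

// Apply an increment on the primary, then log only the changed field.
function primaryInc(id, field, amount) {
  const doc = primary.docs.get(id) ?? {};
  const changed = { [field]: (doc[field] ?? 0) + amount };
  primary.applySet(id, changed);
  oplog.push({ id, set: changed });
}

// Bring the secondary up to date by replaying unseen log entries.
function syncSecondary() {
  while (secondary.applied < oplog.length) {
    const entry = oplog[secondary.applied];
    secondary.applySet(entry.id, entry.set);
    secondary.applied++;
  }
}

primaryInc("page1", "views", 1);
primaryInc("page1", "views", 1);
syncSecondary();
console.log(secondary.docs.get("page1").views); // 2
console.log(JSON.stringify(oplog[1])); // {"id":"page1","set":{"views":2}}
```

However large the rest of the document is, each log entry carries only the changed field.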

December 10, 2009

NoSQL and the future of cloud databases

From news.cnet.com: One of the cloud-related trends that developers have been paying attention to is “NoSQL,” a set of operational-data technologies based on nonrelational technology. According to Dwight Merriman, CEO of 10gen (the commercial team behind the open-source MongoDB project), we’ll see NoSQL complement existing applications for the foreseeable future.

November 18, 2009

Fast Updates with MongoDB (update-in-place)

One nice feature with MongoDB is that updates can happen “in place” — the database does not have to allocate and write a full new copy of the object.

This can be highly performant for frequent update use cases.  For example, incrementing a counter is a highly efficient operation.  We need not fetch the document from the server, we can simply send an increment operation over:

db.my_collection.update( { _id : ... }, { $inc : { y : 2 } } ); // increment y by 2
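Not having to fetch the document also sidesteps a classic race. The simulation below is hypothetical JavaScript with no real server or driver (`race`, `readModifyWrite`, and `atomicInc` are illustrative names): two clients that fetch the counter, modify it locally, and write it back can lose an update, whereas two increments applied by the server to its current value cannot.

```javascript
function makeServer() {
  return { doc: { _id: 1, y: 0 } };
}

// Simulated network round trip that lets other clients interleave.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Client-side read-modify-write: fetch, change locally, write it all back.
async function readModifyWrite(server, amount) {
  const copy = { ...server.doc }; // "fetch" the document
  await tick();
  copy.y += amount;
  server.doc = copy; // overwrite with the client's version
}

// Server-side increment: applied to whatever the current value is.
async function atomicInc(server, amount) {
  await tick();
  server.doc.y += amount;
}

// Two clients each add 2 to the same counter concurrently.
async function race(update) {
  const server = makeServer();
  await Promise.all([update(server, 2), update(server, 2)]);
  return server.doc.y;
}

race(readModifyWrite).then((y) => console.log("read-modify-write:", y)); // 2
race(atomicInc).then((y) => console.log("server-side inc:", y)); // 4
```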

MongoDB disk writes are lazy.  If we receive 1,000 increments in one second for the object, it will only be written once.  Physical writes occur a couple of seconds after the operation.
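The coalescing effect of lazy writes can be sketched with a write-behind cache: updates change the in-memory copy and mark it dirty, and a periodic flush writes each dirty document once. This is a hypothetical JavaScript model of write coalescing in general, not MongoDB’s actual storage mechanism; `LazyStore` and the counter used as a stand-in for disk writes are assumptions.

```javascript
// Hypothetical write-behind store that counts physical writes.
class LazyStore {
  constructor() {
    this.memory = new Map();
    this.dirty = new Set();
    this.diskWrites = 0;
  }

  inc(id, field, amount) {
    const doc = this.memory.get(id) ?? { _id: id };
    doc[field] = (doc[field] ?? 0) + amount;
    this.memory.set(id, doc);
    this.dirty.add(id); // remember to write it later, once
  }

  // In a real system this would run on a timer, a few seconds apart.
  flush() {
    for (const id of this.dirty) {
      this.diskWrites++; // stand-in for writing this.memory.get(id) to disk
    }
    this.dirty.clear();
  }
}

const store = new LazyStore();
for (let i = 0; i < 1000; i++) store.inc("counter", "hits", 1);
store.flush();
console.log(store.memory.get("counter").hits, store.diskWrites); // 1000 1
```

The flip side, consistent with the durability discussion, is that updates made since the last flush live only in memory until the next one.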

One question is what happens when an object grows. If the object fits in its previous allocation space, it will update in place. If it does not, it will be moved to a new location in the datafile, and its index keys must be updated, which is slower. Because of this, Mongo uses an adaptive algorithm to try to minimize moves on an update. The database computes a padding factor for each collection based on how often items grow and move. The more often objects grow, the larger the padding factor will be; the less often they grow, the smaller it will be.
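The adaptive padding idea can be sketched as a feedback loop on a per-collection factor: each new allocation is the record size times the factor, a move raises the factor, and an in-place fit lowers it slightly. The step sizes (+0.1 on a move, -0.01 on an in-place fit) and the 1.0 to 2.0 bounds below are hypothetical; the source does not say what values MongoDB uses.

```javascript
// Hypothetical collection with an adaptive padding factor.
class Collection {
  constructor() {
    this.paddingFactor = 1.0;
    this.moves = 0;
  }

  allocate(size) {
    return Math.ceil(size * this.paddingFactor);
  }

  // Returns the allocation after a record changes to `newSize` bytes.
  update(record, newSize) {
    if (newSize <= record.allocated) {
      // Fits: update in place and shrink the padding a little.
      this.paddingFactor = Math.max(1.0, this.paddingFactor - 0.01);
      return record.allocated;
    }
    // Too big: move the record and pad future allocations more.
    this.moves++;
    this.paddingFactor = Math.min(2.0, this.paddingFactor + 0.1);
    return this.allocate(newSize);
  }
}

// A record that grows by 10 bytes on each of 20 updates.
const coll = new Collection();
const record = { size: 100, allocated: coll.allocate(100) };
for (let i = 0; i < 20; i++) {
  record.size += 10;
  record.allocated = coll.update(record, record.size);
}
console.log(coll.moves, coll.paddingFactor.toFixed(2));
```

Without padding, every one of those 20 growths would force a move; with the factor adapting, most updates fit in the slack left by the previous move.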

See also:

http://www.mongodb.org/display/DOCS/Updating

http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics

November 12, 2009

Webinar recording posted

The recording of the webinar on MongoDB by Dwight Merriman (10gen) & Ian White (Business Insider) is available here: http://vivu.tv/portal/archive.jsp?flow=527-472-7945&id=1256920226675 
