Archiving - a good MongoDB use case?

Someone recently pointed out to me, rather insightfully, that MongoDB is a good fit for archival of relational data.  

I had not really considered this before, but it is a good point: flexible schemas are very helpful for archival.  How do we keep an archive of, say, 10 or more years of data history when the schema will undergo significant changes over that period?  It is not so easy.

One approach would be to apply any schema changes from the online / operational database to the archival database as well.  However, there are some issues.  First, the archival database may be huge, making schema migrations impractical.  More importantly, these changes may not be what we want in an archive.  Imagine we decide to drop a column in the online db.  It may now be deprecated and unneeded.  However, a true and complete archive would still have that data.  Dropping the column in the archive is not what we want.

Document-oriented databases, with their flexible schemas, provide a nice solution.  We can have older documents which vary a bit from the newer ones in the archive.  The lack of homogeneity over time may mean that querying the archive is a little harder.  However, keeping the data is potentially much easier.
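To make the trade-off concrete, here is a minimal Python sketch, assuming a hypothetical archive of user documents from two schema generations (plain dicts stand in for MongoDB documents; no driver is used). Keeping both shapes costs nothing at write time; the extra work shows up at read time, where the reader has to handle each shape.

```python
from typing import Any

# Hypothetical archived "users" documents from two schema generations.
# Older documents have a single "name" field and a "phone" field that was
# later dropped from the online database; newer ones split the name in two.
archive: list[dict[str, Any]] = [
    {"_id": 1, "name": "Joe Smith", "phone": "555-0100"},
    {"_id": 2, "first": "Ann", "last": "Lee"},
]


def full_name(doc: dict[str, Any]) -> str:
    """Read a display name from either schema generation."""
    if "name" in doc:
        return doc["name"]
    return f"{doc['first']} {doc['last']}"


names = [full_name(doc) for doc in archive]

# The dropped column is still present on the records that originally had it.
phones = [doc["phone"] for doc in archive if "phone" in doc]
print(names, phones)
```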

—dm

MongoDB 1.6 Released

MongoDB 1.6.0 is the fourth stable major release (even-numbered releases are “stable”: 1.0, 1.2, 1.4, …) and is the culmination of the 1.5 development series.

Scale-out

The focus of the 1.6 release is scale-out.  Sharding is now production-ready.  The combination of sharding and replica sets allows one to build out horizontally scalable data storage clusters with no single points of failure.

A single instance of mongod can be upgraded to a distributed cluster with zero downtime when the need arises.

A big thanks to all the 1.5.x beta testers of sharding (including foursquare and bit.ly who have been using sharding in production for a while now).


Node.js and MongoDB

Visit the more recent post, Getting Started with VMware Cloud Foundry, MongoDB, and Node.js. Listen to the recorded Node.js Panel Discussion webinar.

Node.js is turning out to be a framework of choice for building real-time applications of all kinds, from analytics systems to chat servers to location-based tracking services. If you’re still new to Node, check out Simon Willison’s excellent introductory post. If you’re already using Node, you probably need a database, and you just might have considered using MongoDB.

The rationale is certainly there. Working with Node’s JavaScript means that MongoDB documents get their most natural representation — as JSON — right in the application layer. There’s also significant continuity between your application and the MongoDB shell, since the shell is essentially a JavaScript interpreter, so you don’t have to change languages when moving from application to database.


Blog Contest Winners!

We’re pleased to announce the winners of the MongoDB blogging contest!

Grand Prize

Runners Up

The winners should contact meghan@10gen.com to claim their prizes.

You can check out all the awesome entries at mongodb.slinkset.com

Thanks to everyone who submitted!

Highlights from MongoNYC

On May 21, 10gen organized the second conference dedicated to MongoDB. Like MongoSF, MongoNYC included a great line-up of speakers. One of the more popular talks was Kyle Banker’s Schema Design session, which was so crowded that many attendees sat on the floor! Both the video and slides from the talk are now available. 


Holy Large Hadron Collider, Batman!

Valentin Kuznetsov just presented a paper at the International Conference on Computational Science on CERN’s use of MongoDB for Large Hadron Collider data. The paper, The CMS Data Aggregation System, is available as a PDF at ScienceDirect.

A summary

“CMS” stands for Compact Muon Solenoid, a general-purpose particle physics detector built on the Large Hadron Collider. The CMS project posted a few comics which provide a nice, simple (if somewhat cheesy) explanation of what the CMS/LHC does.

The LHC generates massive amounts of data of all different varieties, which is distributed across a worldwide grid. It sends status messages to some of the computers, job monitoring info to other computers, bookkeeping info still elsewhere, and so on.

This means that each location has specialized queries it can do on the data it has, but up until now it’s been very difficult to query across the whole grid. Enter the Data Aggregation System, designed to allow anything to be queried across all of the machines.

How it works


Write a blog post on MongoDB for a chance to win a ticket to OSCON 2010!

10gen has a ticket to OSCON that we’d like to give to a MongoDB user.

How to Enter

  • Write a blog post.  It has to be about MongoDB, but within that it can be anything: a how-to, an experience you had, a review, a rant, a rave, a technical piece, a humorous piece… whatever you want.
  • Post your blog at mongodb.slinkset.com and include your contact information in the description.  You must submit by June 1st.  Your post must be publicly accessible (not behind a pay wall or members-only site).
  • We’ll announce the winners June 7th.

Prizes

Grand prize

  • 1 ticket to OSCON
  • MongoDB swag package: shirt, coffee mug, and stickers
  • Option to do a guest post on blog.mongodb.org


MongoSF Slides & Video; Discounts on upcoming MongoDB conferences

MongoSF, the first full-day conference dedicated to MongoDB, featured 35+ sessions and even produced a few surprises along the way. Over 200 people attended the April 30 conference. Slides and video from many sessions are now available on the 10gen website.

The Sharding presentation was one of the major highlights of the event. Eliot Horowitz, the CTO of 10gen, demoed a 25-node cluster on EC2. Check out the video of the session.

MongoHQ made a major announcement during their session, launching their add-on to all Heroku users as a public beta. For more details, check out the Heroku blog.

The conference was held at Bently Reserve, where we hope many future tech conferences will take place. Not only is the venue beautiful and historic, but it is fully equipped for conferences. The wireless actually worked!


The main banking hall at Bently Reserve at 8:30 on the morning of MongoSF - it filled up shortly thereafter!

More MongoDB conferences coming soon! Use the discount code “blog” when registering.

MongoNYC - Friday, May 21
MongoUK - Friday, June 18 
MongoFR - Monday, June 21 

MongoDB Conferences in London and Paris in June

MongoDB conferences are coming to Europe! MongoUK is on Friday, June 18 at Skills Matter and MongoFR is on Monday, June 21 at La Cantine. Each conference will feature sessions from the 10gen team on schema design, replication, sharding, indexing, and map/reduce. In addition, attendees will learn about MongoDB in production through presentations by companies like Boxed Ice, Silentale, OCW Search, and Novelys.

10gen is organizing MongoFR in conjunction with the NoSQL Paris User Group, with the help of NoSQL enthusiast Tim Anglade. (For French speakers: please note that some MongoFR sessions will be in English and others in French. The day will also be followed by a special user group meetup, with free admission from 7 p.m. at La Cantine. For more information in French, contact timanglade@gmail.com by email or @timanglade on Twitter.)

We look forward to seeing you there!

MongoDB Q1 Download Numbers

The MongoDB team is very excited about how the developer community is building around MongoDB, and we wanted to share some numbers.

These are download numbers for the core server for January through March.  They are exactly the number of downloads of the core database from downloads.mongodb.org, minus all bots (all known bots plus anything with “bot” in the user-agent) and all other crawlers we can identify.  We use these numbers internally, so we do try to keep them accurate.

Month       Downloads
January        15,647
February       23,226
March          37,144
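A quick worked check of the growth these figures imply (simple arithmetic on the monthly counts, nothing more):

```python
# Monthly core-server download counts for Q1.
downloads = {"January": 15647, "February": 23226, "March": 37144}

total = sum(downloads.values())
months = list(downloads)

# Month-over-month growth, as a fraction of the previous month's count.
growth = {
    months[i]: downloads[months[i]] / downloads[months[i - 1]] - 1
    for i in range(1, len(months))
}

print(f"Q1 total: {total}")
for month, g in growth.items():
    print(f"{month}: {g:+.1%}")
```

This prints a Q1 total of 76017 and month-over-month growth of +48.4% for February and +59.9% for March; March downloads were roughly 2.4 times January's.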

We are very excited about these numbers — please spread the word and help us continue growth of MongoDB!

-Eliot

On Distributed Consistency - Part 6 - Consistency Chart


The following diagram shows the various consistency models that have been discussed in this blog post series.  Stronger consistency modes generally meet the requirements of weaker modes, and are thus shown as subsets in this Venn-like diagram. 

Keep in mind that for many products, consistency is tunable: a product doesn’t necessarily belong to a particular rectangle, but a given operation certainly does.

  • Eventual Consistency - eventual consistency as defined by Amazon in the Dynamo paper.
  • Monotonic read consistency - a stricter form of eventual consistency.
  • Read-your-own-writes consistency - a stricter form of eventual consistency.
  • MRC + RYOW - a system with both monotonic read plus read-your-own-writes properties.  A master-master replication system, where a given client always interacts with a single master, would have these properties.
  • Immediate Consistency - a system which is immediately consistent but which does not support atomic operations.  Strict quorum systems, where R+W>N, meet this criterion (and theoretically could do more, depending on the design).
  • Strong Consistency - a system which supports read/write atomic operations on single data entities.  This is the default mode for MongoDB.
  • Full Transactions - Oracle!
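One way to read the chart is as a set of nested guarantees. The Python sketch below encodes that nesting, under the assumption stated above that a stronger model meets the requirements of every weaker one; the property names are invented labels, not terms from any product. It also includes the strict-quorum test R+W>N from the Immediate Consistency entry.

```python
# Invented labels for the guarantees each model in the chart provides.
# Assumption: the models nest as the chart shows, so each stronger model
# keeps every guarantee of the weaker models inside it.
EC = frozenset({"eventual"})
MRC = EC | {"monotonic_read"}
RYOW = EC | {"read_your_own_writes"}
MRC_RYOW = MRC | RYOW
IMMEDIATE = MRC_RYOW | {"immediate"}
STRONG = IMMEDIATE | {"atomic_read_write"}
FULL_TRANSACTIONS = STRONG | {"transactions"}

MODELS = {
    "eventual": EC,
    "monotonic_read": MRC,
    "read_your_own_writes": RYOW,
    "mrc_plus_ryow": MRC_RYOW,
    "immediate": IMMEDIATE,
    "strong": STRONG,
    "full_transactions": FULL_TRANSACTIONS,
}


def satisfies(provided: str, required: str) -> bool:
    """True if an operation run under `provided` meets the needs of `required`."""
    return MODELS[provided] >= MODELS[required]


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """R + W > N means every read quorum overlaps every write quorum."""
    return r + w > n


print(satisfies("strong", "monotonic_read"))                # True
print(satisfies("monotonic_read", "read_your_own_writes"))  # False
print(is_strict_quorum(n=3, r=2, w=2))                      # True
```

Monotonic reads and read-your-own-writes are incomparable here: neither implies the other, as the separate MRC + RYOW entry suggests.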

On Distributed Consistency - Part 5 - Many Writer Eventual Consistency


In part 2 we primarily discussed “single writer” eventual consistency.  Here we will discuss many-writer, and define that term more precisely.

By many-writer, we mean a system where different data servers can receive writes concurrently (and asynchronously).  Examples of many-writer eventually consistent systems include:

  • Amazon Dynamo
  • CouchDB master-master replication

With multi-writer eventual consistency, we need to address the phenomenon of conflicting writes. Writes to two servers at the same time may be updates for the same object.  We must resolve the conflict in a way that is acceptable for the use case in question.  Some ways to do this are:

  • last write wins
  • programmatic merge
  • commutative operations

Last Write Wins

Last write wins is a popular default in many systems. If we receive an operation that is older, we simply ignore it.  In a distributed system the definition of “last” is hard as clocks can’t be perfectly synchronized.  Thus many systems use vector clocks.

Inserts

Surprisingly, a traditional insert operation is tricky with many writers. Consider these operations performed at about the same time at different servers:

op1: insert( { _id : 'joe', age : 30 } )
op2: insert( { _id : 'joe', age : 33 } )

If we naively apply these two operations in any order, we get an inconsistent result.  insert typically means:

if( !already_exists(x._id) ) then set( x );

However, with eventual consistency we do not have real-time global state.  Checking already_exists() is thus hard.

The best solution is to not support insert, but rather set() - i.e. “set a new value”.  Sometimes this is called an upsert.  Then, if we have last-write-wins semantics, everything is fine.
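A minimal Python sketch of set() with last-write-wins, assuming integer timestamps with ties broken by server name (a real system might use vector clocks, as noted above). Both replicas converge on the same value no matter which order the two writes arrive in, which is exactly what insert could not promise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetOp:
    """A full-object set ("upsert") tagged with a write timestamp."""
    ts: int     # assumption: plain integer timestamps, not vector clocks
    node: str   # server that accepted the write; breaks timestamp ties
    doc: dict


def apply_set(store: dict, op: SetOp) -> None:
    """Last-write-wins: keep the incoming value only if it is newer."""
    key = op.doc["_id"]
    current = store.get(key)
    if current is None or (op.ts, op.node) > (current.ts, current.node):
        store[key] = op


op1 = SetOp(ts=1, node="a", doc={"_id": "joe", "age": 30})
op2 = SetOp(ts=2, node="b", doc={"_id": "joe", "age": 33})

replica_a: dict = {}
replica_b: dict = {}
for op in (op1, op2):
    apply_set(replica_a, op)
for op in (op2, op1):  # the same writes, arriving in the opposite order
    apply_set(replica_b, op)

print(replica_a["joe"].doc, replica_b["joe"].doc)
```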

Deletes

Deletes require special handling in cases of object rebirth.  Consider this sequence:

op1: set( { _id : 'joe', age : 40 } )
op2: delete( { _id : 'joe' } )
op3: set( { _id : 'joe', age : 33 } )

If op2 and op3 are reversed in execution order, we would have a problem.  Thus we need to remember the delete for a while, and apply last-operation-wins semantics.  Some products call the remembrance of the delete a tombstone.
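The tombstone idea can be sketched in a few lines of Python, assuming an integer timestamp on each operation: a delete is stored as a timestamped None instead of removing the key, so an older set that arrives late cannot bring the object back. Garbage-collecting old tombstones (the “for a while” part) is left out.

```python
from typing import Optional

# Each key maps to (timestamp, value); a value of None is a tombstone.
Store = dict[str, tuple[int, Optional[dict]]]


def apply(store: Store, key: str, ts: int, value: Optional[dict]) -> None:
    """Apply a set (value is a dict) or a delete (value is None) with
    last-operation-wins semantics. A delete leaves a tombstone behind."""
    current = store.get(key)
    if current is None or ts > current[0]:
        store[key] = (ts, value)


def read(store: Store, key: str) -> Optional[dict]:
    entry = store.get(key)
    return None if entry is None else entry[1]


ops = [
    (1, {"_id": "joe", "age": 40}),  # op1: set
    (2, None),                       # op2: delete
    (3, {"_id": "joe", "age": 33}),  # op3: set (rebirth)
]

in_order: Store = {}
reordered: Store = {}
for ts, value in ops:
    apply(in_order, "joe", ts, value)
for ts, value in (ops[0], ops[2], ops[1]):  # op3 arrives before op2
    apply(reordered, "joe", ts, value)

# A delete followed by an older, late-arriving set: the tombstone wins.
late: Store = {}
apply(late, "joe", 2, None)
apply(late, "joe", 1, {"_id": "joe", "age": 40})

print(read(in_order, "joe"), read(reordered, "joe"), read(late, "joe"))
```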

Updates

Updates have the same problem as inserts, so for updates we use the set() operation described above instead.

Note that partial object updates can be tricky to replicate efficiently.  Consider a set() operation where we wish to update a single field:

update users set age=40 where _id='joe'

This is no problem with eventual consistency if we replicate a full copy of the object.  However, what if the user object was 1MB in size?  It would be really nice to just send the new age field and the _id, rather than the whole object.  However, this is difficult.  Consider:

op1: update users set age=40 where _id='joe'
op2: update users set state='ca' where _id='joe'

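We can’t simply replicate the partial update and use last-write-wins: a replica that receives op2 before op1 would discard op1 as the older write and lose the age change.  The database will need more sophistication to handle this efficiently.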
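One way to add that sophistication, shown here only as a sketch and not as what any particular product does, is to keep a timestamp per field rather than per object, so a partial update can ship just the changed field and still merge correctly. The starting field values below are hypothetical.

```python
# Per-field last-write-wins: each field carries its own timestamp.
Doc = dict[str, tuple[int, object]]  # field -> (timestamp, value)


def apply_partial(doc: Doc, ts: int, fields: dict) -> None:
    """Apply a partial update, field by field, keeping the newer value."""
    for name, value in fields.items():
        current = doc.get(name)
        if current is None or ts > current[0]:
            doc[name] = (ts, value)


def materialize(doc: Doc) -> dict:
    """Strip the timestamps to get the plain document."""
    return {name: value for name, (_, value) in doc.items()}


# Hypothetical starting state, replicated everywhere at timestamp 0.
base: Doc = {"_id": (0, "joe"), "age": (0, 39), "state": (0, "ny")}
op1 = (1, {"age": 40})      # update users set age=40 where _id='joe'
op2 = (2, {"state": "ca"})  # update users set state='ca' where _id='joe'

r1, r2 = dict(base), dict(base)
for ts, fields in (op1, op2):
    apply_partial(r1, ts, fields)
for ts, fields in (op2, op1):  # with one object-level timestamp, op1 would be dropped here
    apply_partial(r2, ts, fields)

print(materialize(r1), materialize(r2))
```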

Programmatic Merge

Last-write-wins is great, but is not always sufficient.  Having the client application resolve the conflict via a merge is a fine alternative.  Let’s consider an example mentioned in the Amazon Dynamo paper: manipulations of shopping carts.  With eventual consistency it would not be safe to do something like:

update cart set this[our_sku].qty=1 where _id='joe'

If there are multiple manipulations of the cart, some may get lost using last-write-wins.  Instead, the Dynamo paper talks of storing the operations in the cart object, rather than the actual data state.  We could store something like:

update cart append { time : now(), op : 'addToCart', sku : our_sku, qty : 1 }
  where _id='joe'

When a conflict occurs, cart objects can be merged.  We do not lose any operations.  When it is time to check out, we replay all the operations, which might include quantity adjustments and removes from cart.  After replay we have the final cart state.

Note the above example uses a timestamp field — in a real system a vector clock might be used to order the operations in the cart.

It’s interesting to note that not only have we avoided conflicts, we can also perform operations that would otherwise require atomicity.
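A minimal Python sketch of the operation-log cart, assuming integer times with a replica name as tie-breaker (standing in for the vector clock mentioned above); removeFromCart is a hypothetical operation name alongside the addToCart from the example. Merging is a set union of the two logs, so neither replica's operations are lost; replay then computes the cart.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CartOp:
    time: int      # stand-in for a vector clock
    replica: str   # tie-breaker so merged logs sort deterministically
    op: str        # 'addToCart' or 'removeFromCart' (hypothetical)
    sku: str
    qty: int = 0


def merge(*logs: list) -> list:
    """Union of operation logs; an op seen by both replicas appears once."""
    return sorted(set().union(*logs))


def replay(log: list) -> dict:
    """Replay the merged log in order to get the final cart state."""
    cart: dict[str, int] = {}
    for entry in log:
        if entry.op == "addToCart":
            cart[entry.sku] = cart.get(entry.sku, 0) + entry.qty
        elif entry.op == "removeFromCart":
            cart.pop(entry.sku, None)
    return cart


shared = [CartOp(1, "a", "addToCart", "book", 1)]
a = shared + [CartOp(2, "a", "addToCart", "pen", 2)]   # only replica a saw this
b = shared + [CartOp(3, "b", "addToCart", "book", 1)]  # only replica b saw this

print(replay(merge(a, b)))
```

Under last-write-wins on the cart state itself, one of the two divergent carts would simply have replaced the other.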

Commutative Operations

If all operations are commutative (more precisely, foldable), we will never have any conflicts.  Operations can simply be applied in any order, and the result is the same.  For example:

// x starts as { }
x.increment('a', 1);
x.increment('a', 3);
x.addToSet('b', 'foo');
x.addToSet('b', 'bar');
result: { a : 4, b : {bar,foo} }

// x starts as { }
x.addToSet('b', 'bar');
x.increment('a', 3);
x.increment('a', 1);
x.addToSet('b', 'foo');
result: { a : 4, b : {bar,foo} }

Note however that composition of addToSet and increment would not be foldable; thus, we have to use only one or the other for a particular field of the object.
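The claim that order does not matter is easy to check exhaustively. This Python sketch uses hypothetical increment and add_to_set helpers modeled on the pseudocode above and applies the four operations in all 24 possible orders; every ordering produces the same document.

```python
import itertools


def increment(doc: dict, field: str, by: int) -> None:
    """Add `by` to a numeric field, treating a missing field as 0."""
    doc[field] = doc.get(field, 0) + by


def add_to_set(doc: dict, field: str, value: str) -> None:
    """Add `value` to a set-valued field, creating the set if needed."""
    doc.setdefault(field, set()).add(value)


ops = [
    (increment, "a", 1),
    (increment, "a", 3),
    (add_to_set, "b", "foo"),
    (add_to_set, "b", "bar"),
]

results = []
for order in itertools.permutations(ops):
    x: dict = {}  # x starts as { }
    for fn, field, arg in order:
        fn(x, field, arg)
    results.append(x)

print(len(results), results[0])
```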

On Distributed Consistency - Part 4 - Multi Data Center


Eventual consistency makes multi-data center data storage easier.  There are reasons eventual consistency is helpful for multi-data center that are unrelated to availability and CAP.  And as mentioned in Part 3, some common types of network partitions, such as loss of an entire data center, are actually trivial network partitions and may not even affect availability anyway.

Here are a few architectures for multi-data center data storage:

  • DR
  • Single Region
  • Local reads, remote writes
  • Intelligent Homing
  • Eventual consistency

DR

By DR we mean a traditional disaster recovery / business continuity architecture.  It’s pretty simple: we serve everything from one data center, with replication to a secondary facility that is offline.  In a failure we cut over.

Availability can be quite high in this model: on any issue with the first data center, including an internal network partition, we cut over, and once the whole first data center is disabled, the partition is trivial.

This model works fine with strong consistency.

Multi Data Center, Single Region

In this option we use multiple data centers, physically separated but all within one region (such as the Northwest).  Amazon and DoubleClick have used this scheme in the past.  The latency between data centers is then reasonable: if we stay within a 150-mile radius, we can have round trip times of around 5ms.  We might have a fiber ring among, say, 3 or 4 data centers.  As the latency is reasonable, a WAN operation here is fine for many problems.  With a ring topology, a non-trivial network partition is unlikely.
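As a rough sanity check on the 5ms figure, here is the propagation-only arithmetic in Python, assuming light in fiber travels at about 200,000 km/s (roughly two thirds of its speed in a vacuum) along a straight path:

```python
MILES_TO_KM = 1.609344
# Assumption: approximate speed of light in optical fiber.
FIBER_KM_PER_S = 200_000


def fiber_rtt_ms(distance_miles: float) -> float:
    """Propagation-only round trip time over a straight fiber path."""
    one_way_km = distance_miles * MILES_TO_KM
    return 2 * one_way_km / FIBER_KM_PER_S * 1000


print(f"{fiber_rtt_ms(150):.2f} ms")
```

This prints 2.41 ms. Real fiber routes are not straight and every switch and router adds delay, so an observed round trip of around 5ms within that radius is plausible.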

Single region is useful for both strongly consistent and eventually consistent architectures.  With a Dynamo-style product, when N=W or N=R, this is a good option, since otherwise, with data centers far apart, confirming remote writes would mean a long wait.

Local Reads, Remote Writes

For read-heavy use cases, this is a good option.  Here we read eventually consistent data (easy with most database products including RDBMS systems) but do all writes back to the master facility over the WAN.  A dynamo style system in multiple data centers with a very high W value and low R value can be thought of this way also.

This pattern would work great for traditional content management: publishing is infrequent and reading is very frequent.

Using a Content Delivery Network (CDN), with a centralized origin web site serving dynamic content, is another example.

Intelligent Homing

We discussed “Intelligent Homing” a bit in Part 3.  The idea is to store the master copy of a given data entity near its user.

This model works quite well if data correlates with the user, such as the user’s profile, inbox, etc.

We have fast locally confirmed writes.  If a data center goes completely down, we could still fail over master status to somewhere else which has a replica.
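A minimal Python sketch of intelligent homing with failover, assuming a hypothetical static map from each user to a home data center plus replica locations. A real system would also need the data centers to agree on the new master, which is not shown.

```python
# Hypothetical placement: where each user's master copy lives, and which
# other data centers hold replicas of it.
HOME = {"alice": "us-east", "bob": "eu-west"}
REPLICAS = {"alice": ["us-west"], "bob": ["us-east"]}


def write_target(user: str, up: set[str]) -> str:
    """Pick the data center that should accept this user's writes."""
    if HOME[user] in up:
        return HOME[user]  # normal case: the master copy near the user
    for dc in REPLICAS[user]:
        if dc in up:
            return dc      # home is down: fail over master status here
    raise RuntimeError(f"no live copy of {user}'s data")


all_up = {"us-east", "us-west", "eu-west"}
print(write_target("alice", all_up))                # us-east
print(write_target("alice", all_up - {"us-east"}))  # us-west
```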

Eventual consistency

Many-writer eventual consistency gives us two benefits with multiple data centers:

  • higher availability in the face of network outages;
  • fast locally confirmed writes

In the diagram below, a client of a dynamo-style system writes the data to four servers (N=4).  However, it only awaits confirmation of the writes from two servers in its local data center, to keep write confirmation latency low.

Note however that if R+W > N, we can’t have both fast local reads and writes at the same time if all the data centers are equal peers.
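The last point follows from simple counting. Assuming the layout described for the diagram (N=4, two replicas in each of two equal data centers), a read or write is fast and local only if R or W is at most the two local replicas, and then R+W is at most 4. The Python sketch below enumerates every (R, W) pair to confirm that none is both a strict quorum and fully local:

```python
def fast_local_strict_pairs(n: int, local: int) -> list[tuple[int, int]]:
    """(R, W) pairs with R + W > N that need only the local replicas."""
    return [
        (r, w)
        for r in range(1, n + 1)
        for w in range(1, n + 1)
        if r + w > n and r <= local and w <= local
    ]


# Two equal data centers holding N = 4 replicas between them.
print(fast_local_strict_pairs(n=4, local=2))  # []
```

With k ≥ 2 equal peers each holding N/k replicas, local is at most N/2, so R+W ≤ N whenever both stay local. Giving one data center three of the four replicas (local=3) would make pairs such as (2, 3) possible, but then the data centers are no longer equal peers.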

Combinations

Combinations often make sense.  For example, it’s common to mix DR with Local Reads, Remote Writes.

On Distributed Consistency - Part 3 - Network Partitions


It’s fascinating that the formal theorem statement for CAP, in the first proof (that I know of), doesn’t use the word partition!

Theorem 1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
• Availability
• Atomic consistency in all fair executions (including those in which messages are lost).

That said, let’s talk about partitions, as “messages lost…in the asynchronous network model” is directly analogous.

Let’s look at an example:

In our diagram above, the network is partitioned.  The left and right halves (perhaps corresponding to two continents) cannot communicate at all.  Four clients and four data server nodes are shown in the diagram.  So what are our options?

  1. Deny all writes.  If we deny all writes when the network is partitioned, we can still read fully consistent data on both sides.  So this is one option.  We give up write availability, and keep consistency.
  2. Allow writes on one side.  Via some sort of consensus mechanism, we could let one side of the partition “win” and have a master (as shown by the “M” in the diagram).  In this case, reads and writes could occur on that side.  On the other non-master partitions, we could either (a) be strict and allow no operations, or (b) allow eventually consistent reads, but no writes.  So in this situation we have full consistency in one partition, and partial operation in all others.
  3. Allow reads and writes in all partitions.  Here, we keep availability, but we must sacrifice strong consistency.  One partition will not see the operations and state from the other until the network is restored.  Once restored, we will need a method to merge operations that occurred while disconnected.
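Option 2 needs a rule that lets at most one side win. A common choice, assumed here rather than taken from the post, is a strict majority of data servers; this Python sketch returns the side allowed to elect a master, or None. Sides may also list clients; only servers are counted.

```python
from typing import Optional


def master_side(sides: list[set[str]], servers: set[str]) -> Optional[set[str]]:
    """Return the partition side that may accept writes, or None.

    A side may elect a master only if it holds a strict majority of all
    data servers, so two sides can never both win.
    """
    for side in sides:
        if 2 * len(side & servers) > len(servers):
            return side
    return None


servers = {"s1", "s2", "s3", "s4", "s5"}
print(sorted(master_side([{"s1", "s2", "s3"}, {"s4", "s5"}], servers)))
```

With an even split, such as four servers divided two and two, neither side has a majority and both sides deny writes, as in option 1; an odd number of voters avoids that tie.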

A mitigation technique also comes to mind.  Suppose a particular client C has a much higher probability of needing an entity X than other clients.  If we store the master copy of X on a server close to C, we increase the probability that C can read and write X in option (2) above.  Let’s call this “intelligent homing”.  A real world example of this would be to “store master copies of data for east coast users on servers on the east coast”.  Intelligent homing doesn’t solve our problems, but would likely significantly decrease their frequency — that’s good, we just want more nines anyway.

Hopefully the above is a good informal “proof” of CAP.  It really is pretty simple.

Trivial Network Partitions

Many common network partitions are what we might term trivial.  Let’s consider this from the perspective of option (2) above.  We define a trivial network partition as one in which every non-master partition has either

  • no live clients at all, or
  • no servers at all

For example, if we have many data centers and our clients are Internet web browsers, and one of our data centers goes completely dark (and we have more left), that is a trivial network partition (we assume here that we can fail over master status in such a situation).  Likewise, losing a single rack in its entirety is often a trivial network partition.

In these situations, we can still be consistent and available.  (Well, for the partitioned client, we are unavailable, but that is of course a certainty if it cannot reach any servers anywhere.)
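The definition translates directly into a check. In this Python sketch each side of a partition is described by hypothetical counts of live servers and clients, and the side holding the master is marked:

```python
from dataclasses import dataclass


@dataclass
class Partition:
    servers: int            # live data servers on this side
    clients: int            # live clients on this side
    has_master: bool = False


def is_trivial(partitions: list[Partition]) -> bool:
    """Every non-master side has no live clients or no servers."""
    return all(
        p.clients == 0 or p.servers == 0
        for p in partitions
        if not p.has_master
    )


# A data center goes dark: its servers are cut off, but browsers reach the rest.
print(is_trivial([Partition(6, 1000, has_master=True), Partition(3, 0)]))   # True
# Servers and clients stranded together on the non-master side.
print(is_trivial([Partition(6, 1000, has_master=True), Partition(2, 50)]))  # False
```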

On Distributed Consistency - Part 2 - Some Eventual Consistency Forms


In Part 1 we discussed C-class and A-class behaviors.  For A-class, we need to weaken consistency constraints.  This does not mean the system need be completely inconsistent, but it does mean we will need to relax the consistency model to some extent.

Amazon popularized the concept of “Eventual Consistency”.  Their definition is: 

the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.

This is not new, but it is great to have the concept formalized/popularized.  A few examples of eventually consistent systems:

  1. DNS (mentioned in the above paper)
  2. Asynchronous master/slave replication on an RDBMS (also on MongoDB)
  3. memcached in front of MySQL, caching reads

Many (not all) traditional examples that come to mind have eventually consistent reads, but a single writer (by “single writer”, we mean a data server, not the clients).  Things get more interesting, and more complex, when there are many writers.  Amazon Dynamo is an example of a “many writer eventually consistent” system.  All of the above are perhaps “single writer eventually consistent”.

Message queues are another traditional technology worth noting; they have properties reminiscent of eventual consistency.

Forms of Consistency

Let’s look at a particular example.  Consider a system using MongoDB in the following configuration:

“master”, “slave”, and “slave” could be mongod instances for example — or other databases with asynchronous replication.  Clients randomly read from any slave for a given query, and always write to the master.  Two slaves and two clients are shown, but let’s assume each of those scale out.

This sort of system we term “single writer eventual consistency”.  So what are its properties?  (1) A client could read stale data. (2) The client could see out-of-order write operations.

Let’s suppose we are storing some entity x in the datastore.  Let’s assume entities have an initial value of zero.  There are a series of writes to x by clients:

  W(x=3), W(x=7), W(x=5)

Because the system is eventually consistent, if writes to x stop at some point, we know we will eventually read 5, that is, R(x==5).  However, in the short term a client might, for example, see:

  R(x==7), R(x==0), R(x==5), R(x==3)

(Note more nodes than 2 slaves are needed for this example behavior.)

So this is our weakest form of consistency - eventually consistent with out of order reads in the short term. 
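The out-of-order sequence above can be reproduced deterministically. In this Python sketch each read hits a slave that has applied some prefix of the master's writes; the schedule of prefixes is hypothetical, chosen to produce the reads listed above:

```python
# Writes to x at the master, in order; x starts at 0.
WRITES = [3, 7, 5]


def value_after(applied: int) -> int:
    """Value of x on a slave that has applied the first `applied` writes."""
    return 0 if applied == 0 else WRITES[applied - 1]


def is_monotonic(applied_counts: list[int]) -> bool:
    """True if successive reads never go back to an older point in the log."""
    return all(a <= b for a, b in zip(applied_counts, applied_counts[1:]))


# Hypothetical schedule: how far the slave serving each read had replicated.
applied_at_read = [2, 0, 3, 1]
reads = [value_after(n) for n in applied_at_read]

print(reads, is_monotonic(applied_at_read))
```

Once every slave has applied all three writes, every read returns value_after(3), which is 5: the eventual part of the guarantee.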

We can make this stronger.  Consider the SourceForge mongodb configuration.  This configuration is eventually consistent, but we will not see the result of writes out of order.  It provides monotonic read consistency.

One possible eventual consistency property is read-your-own-writes consistency, meaning a process is guaranteed to see the writes it has made when it does reads.  This is a very useful property that makes programming easier. Note that neither of the above examples provide read-your-own-writes consistency.  Also worth considering with this model is the definition of “your”.  On a web application, that might be the user.  If the system’s load balancer sends requests to different app servers, having read-your-own-write consistency for a single app server might not solve the real world consistency need.
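Neither configuration is modeled here, but one common way to get both monotonic reads and read-your-own-writes on top of asynchronous replication is to have the client remember a minimum log position and skip replicas that are behind it. The Python sketch below assumes each replica can report how many of the master's writes it has applied; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)  # writes applied, in master order

    @property
    def position(self) -> int:
        return len(self.log)


@dataclass
class Session:
    """Tracks the newest log position this client has written or read."""
    floor: int = 0

    def wrote(self, master_position: int) -> None:
        # Read-your-own-writes: never read from before our own write.
        self.floor = max(self.floor, master_position)

    def pick(self, replicas: list[Replica]) -> Replica:
        for replica in replicas:
            if replica.position >= self.floor:
                # Monotonic reads: never go back before what we have seen.
                self.floor = replica.position
                return replica
        raise LookupError("no replica is caught up; wait or read the master")


master = Replica("master", [3, 7, 5])
lagging = Replica("slave1", [3])
current = Replica("slave2", [3, 7, 5])

writer = Session()
writer.wrote(master.position)  # our write of x=5 is the third in the log
print(writer.pick([lagging, current]).name)  # slave2

reader = Session()
first = reader.pick([current])             # sees all three writes
second = reader.pick([lagging, current])   # must not go back to slave1
```

Per the point about the meaning of “your”, such a floor only helps if it follows the user rather than living in a single app server's memory.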

EC Use Case Checklist

Thus when using eventual consistency, it is good for the architect to ask:

  • can my use case tolerate stale reads?
  • can it tolerate reading values out of order?  if not, is my configuration monotonic read consistent?
  • can it tolerate not reading my own writes?  if not, is my configuration read-your-own-write consistent?