Posts tagged:

MongoDB

How Buffer uses MongoDB to power its Growth Platform

Jun 9 • Posted 1 month ago

By Sunil Sadasivin, CTO at Buffer

Buffer, powered by experiments and metrics

At Buffer, every product decision we make is driven by quantitative metrics. We have always sought to be lean in our decision making, and one of the core tenants of being lean is launching experimental features early and measuring their impact.

Buffer is a social media tool to help you schedule and space out your posts on social media networks like Twitter, Facebook, Google+ and Linkedin. We started in late 2010 and thanks to a keen focus on analytical data, we have now grown to over 1.5 million users and 155k unique active users per month. We’re now responsible for sharing 3 million social media posts a week.

When I started at Buffer in September 2012 we were using a mixture of Google Analytics, Kissmetrics and an internal tool to track our app usage and analytics. We struggled to move fast and effectively measure product and feature usage with these disconnected tools. We didn’t have an easy way to generate powerful reports like cohort analysis charts or measure things like activation segmented by signup sources over time. Third party tracking services were great for us early on, but as we started to dig deeper into our app insights, we realized there was no way around it—we needed to build our own custom metrics and event tracking.

We took the plunge in April 2013 to build our own metrics framework using MongoDB. While we’ve had some bumps and growing pains setting this up, it’s been one of the best decisions we’ve made. We are now in control of all metrics and event tracking and are able to understand what’s going on with our app at a deeper level. Here’s how we use MongoDB to power our metrics framework.

Why we chose MongoDB

At the time we were evaluating datastores, we had no idea what our data would look like. When I started designing our schema, I quickly found that we needed something that would let us change the metrics we track over time and on the fly. Today, I’ll want to measure our signup funnel based on referrals, tomorrow I might want to measure some custom event and associated data that is specific to some future experiment. I needed to plan for the future, and give our developers the power to track any arbitrary data. MongoDB and its dynamic schema made the most sense for us. MongoDB’s super powerful aggregation framework also seemed perfect for creating the right views with our tracking data.

Our Metrics Framework Architecture

In our app, we’ve set up an AWS SQS queue and any data we want to track from the app goes immediately to this queue. We use SQS heavily in our app and have found it to be a great tool to manage messaging at high throughput levels. We have a simple python worker which picks off messages from this queue and writes them to our metrics database. The reason why we’ve done this instead of connecting and writing directly to our metrics MongoDB database is because we wanted our metrics set up to have absolutely zero impact on application performance. Much like Google Analytics offers no overhead to an application, our event tracking had to do the same. The MongoDB database that would store our events would be extremely write heavy since we would be tracking anything we could think of, including every API request, page visited, Buffer user/profile/post/email created etc. If, for whatever reason our metrics db goes down, or starts having write locking issues, our users shouldn’t be impacted. Using SQS as a middleman would allow tracking data to queue up if any of these issues occur. SQS gives us enough time to figure out what the issue is, fix, it and then process that backlog. We’ve had quite a few times in the past year where using Amazon’s robust SQS service has saved us from losing any data during maintenance or downtime that would occur when creating a robust high throughput metrics framework from scratch. We use MongoHQ to host our data. They’ve been super helpful with any challenges in scaling a db like ours. Since our setup is write heavy, we’ve initially set up a 400GB SSD replica set. As of today (May 16) we have 90 collections and are storing over 500 million documents.

We wrote simple client libraries for tracking data for every language that we use (PHP, Python, Java, NodeJS, Javascript, Objective-C). In addition to bufferapp.com, our API, mobile apps and internal tools all plug into this framework.

Tracking events

Our event tracking is super simple. When a developer creates a new event message, our python worker creates a generic event collection (if it doesn’t exist) and stores event data that’s defined by the developer. It will store the user or visitor id, and the date that the event occurred. It’ll also store the user_joined_at date which is useful for cohort analysis.

Here are some examples of event tracking our metrics platform lets us do.

Visitor page views in the app.

Like many other apps, we want to track every visitor that hits our page. There is a bunch of data that we want to store to understand the context around the event. We’d like to know the IP address, the URI they viewed, the user agent they’re using among other data.

Here’s what the tracking would look like in our app written in PHP:

Here’s the corresponding result in our MongoDB metrics db:

Logging User API calls

We track every API call our clients make to the Buffer API. Essentially what we’ve done here is create query-able logging for API requests. This has been way more effective than standard web server logs and has allowed us to dig deeper into API bugs, security issues and understanding the load on our API.

Experiment data

With this type of event tracking, our developers are able to track anything by writing a single line of code. This has been especially useful for measuring events specific to a feature experiment. This frictionless process helps keep us lean: we can measure feature usage as soon as a feature is launched. For example, we recently launched a group sharing feature for business customers so that they can group their Buffer social media accounts together. Our hypothesis was that people with several social media accounts prefer to share specific content to subsets of accounts. We wanted to quantifiably validate whether this is something many would use, or whether it’s a niche or power user feature. After a week of testing this out, we had our answer.

This example shows our tracking of our ‘group sharing’ experiment. We wanted to track each group that was created with this new feature. With this, we’re able to track the user, the groups created, the name of the group, and the date it was created.

Making sense of the data

We store a lot of tracking data. While it’s great that we’re tracking all this data, there would be no point if we weren’t able to make sense of it. Our goal for tracking this data was to create our own growth dashboard so we can keep track of key metrics, and understand results of experiments. Making sense of the data was one of the most challenging parts of setting up our growth platform.

MongoDB Aggregation

We rely heavily on MongoDB’s aggregation framework. It has been super handy for things like gauging API client requests by hour, response times separated by API endpoint, number of visitors based on referrers, cohort analysis and so much more.

Here’s a simple example of how we use MongoDB aggregation to obtain our average API response times between April 8th and April 9th: Result:

With the aggregation framework, we have powerful insight into how clients are using our platform, which users are power users and a lot more. We previously created long running scripts to generate our cohort analysis reports. Now we can use MongoDB aggregation for much of this.

Running ETL jobs

We have several ETL jobs that run periodically to power our growth dashboard. This is the way we make sense of our data core. Some of the more complex reports need this level of reporting. For example, the way we measure product activation is whether someone has posted an update within a week of joining. With the way we’ve structured our data, this requires a join query in two different collections. All of this processing is done in our ETL jobs. We’ll upload the results to a different database which is used to power the views in our growth dashboard for faster loading.

Here are some reports on our growth dashboard that are powered by ETL jobs

Scaling Challenges and Pitfalls

We’ve faced a few different challenges and we’ve iterated to get to a point where we can make solid use out of our growth platform. Here are a few pitfalls and examples of challenges that we’ve faced in setting this up and scaling our platform.

Plan for high disk I/O and write throughput.

The DB server size and type has a key role in how quickly we could process and store events. In planning for the future we knew that we’d be tracking quite a lot of data and a fast pace, so a db with high disk write throughput was key for us. We ended up going for a large SSD replica set. This of course really depends on your application and use case. If you use an intermediate datastore like SQS, you can always start small, and upgrade db servers when you need it without any data loss.

We keep an eye on mongostat and SQS queue size almost daily to see how our writes are doing.

One of the good things about an SSD backed DB is that disk reads are much quicker compared to hard disk. This means it’s much more feasible to run ad hoc queries on un-indexed fields. We do this all the time whenever we have a hunch of something to dig into further.

Be mindful of the MongoDB document limit and how data is structured

Our first iteration of schema design was not scalable. True, MongoDB does not perform schema validation but that doesn’t mean it’s not important to think about how data is structured. Originally, we tracked all events in a single user_metrics and visitor_metrics collection. An event was stored as an embedded object in an array in a user document. Our hope was that we wouldn’t need to do any joins and we could effectively segment out tracking data super easily by user.

We had fields as arrays that were unbounded and could grow infinitely causing the document size to grow. For some highly active users (and bots), after a few months of tracking data in this way some documents in this collection would hit the 16MB document limit and fail to write any more. This created various performance issues in processing updates, and in our growth worker and ETL jobs because there were these huge documents transferred over the wire. When this happened we had to move quickly to restructure our data.

Moving to a single collection per event type has been the most scalable solution and a more flexible solution.

Reading from secondaries

Some of our ETL jobs read and process a lot of data. If you end up querying documents that haven’t been read or written to recently, it is very possible this may be stored out of memory and on disk. Querying this data means MongoDB will page out some documents that have been touched recently and bring query results into memory. This will then make writing to that paged out document slower. It’s for this reason that we have set up our ETL and aggregation queries to read only from our secondaries in our replica-set, even though they may not be consistent with the primary.

Our secondaries have a high number of faults because of paging due to reading ‘stale’ data

Visualizing results

As I mentioned before, one of the more challenging parts about maintaining our own growth platform is extracting and visualizing the data in a way that makes a lot of sense. I can’t say that we’ve come to a great solution yet. We’ve put a lot of effort into building out and maintaining our growth dashboard and creating visualizations is the bottleneck for us today. There is really a lot of room to reduce the turnaround time. We have started to experiment a bit with using Stripe’s MoSQL to map results from MongoDB to PostgresSQL and connect with something like Chart.io to make this a bit more seamless. If you’ve come across some solid solutions for visualizing event tracking with MongoDB, I’d love to hear about it!

Event tracking for everyone!

We would love to open source our growth platform. It’s something we’re hoping to do later this year. We’ve learned a lot by setting up our own tracking platform. If you have any questions about any of this or would like to have more control of your own event tracking with MongoDB, just hit me up @sunils34

Want to help build out our growth platform? Buffer is looking to grow its growth team and reliability team!

Like what you see? Sign up for the MongoDB Newsletter and get MongoDB updates straight to your inbox

Efficient Indexing in MongoDB 2.6

Jun 4 • Posted 1 month ago

By Osmar Olivo, Product Manager at MongoDB

One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.

MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.

Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.

In the following sections we will take a deep dive into index intersection and how it can be applied to applications.

An Example - The Phone Book

Let’s take the example of a phone book with the following schema.

{
    FirstName
    LastName
    Phone_Number
    Address
}

If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?

db.phonebook.find({ FirstName : “John”, LastName : “Smith” })

I could use an individual index on FirstName and search for all of the “Johns”.

This would look something like ensureIndex( { FirstName : 1 } )

We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.

> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 1000000,
    "nscanned" : 1000000,
    "nscannedObjectsAllPlans" : 1000101,
    "nscannedAllPlans" : 1000101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 2043,
    "indexBounds" : {
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

How about creating an individual index on LastName?

This would look something like ensureIndex( { LastName : 1 } )

Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?

db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 400000,
    "nscanned" : 400000,
    "nscannedObjectsAllPlans" : 400101,
    "nscannedAllPlans" : 400101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 852,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?

In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.

ensureIndex( {  LastName : true, FirstName : 1  } ) 

db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1_FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 200000,
    "nscanned" : 200000,
    "nscannedObjectsAllPlans" : 200000,
    "nscannedAllPlans" : 200000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 370,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ],
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.

Beyond Compound Indexes

The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.

Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.

{
      Fname
      LName
      SSN
      Age
      Blood_Type
      Conditions : [] 
      Medications : [ ]
      ...
      ...
}

Some sample searches that a doctor/nurse may run on this system would look something like the below.

Find me a Patient with Blood_Type = O under the age of 50

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )

Find me all patients over the age of 60 on Medication X

db.patients.find( { Medications : “X” , Age : { $gt : 60} })

Find me all Diabetic patients on medication Y

db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )

With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.

To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.

For example, looking at the earlier example:

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )

It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age < 50 or vice versa.

Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age < 50 using the index on age, and then only pulls the intersection of these 2 result sets into memory. The query planner only needs to fit the subsets of the indexes in memory, instead of pulling in all of the documents. This in turn causes less paging, and less thrashing of the contents of memory, which will yield overall better performance.

Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.

As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:

Find me all patients with Diabetes & High Blood Pressure

db.patients.find( {  Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ]  }    }  )

In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.

Do We Still Need Compound Indexes?

To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

Appboy’s co-founder and CIO Jon Hyman discusses how the leading platform for app marketing automation leverages MongoDB and ObjectRocket for real-time data aggregation and scale and gives a preview of his talk with Kenny Gorman of ObjectRocket at MongoDB World.

Want to see more? MongoDB World will feature over 80 MongoDB experts from around the world. Early bird ticket prices for the event end May 23. Register now to grab your seat

MongoDB’s New Bulk API

May 6 • Posted 2 months ago

By Christian Kvalheim, Driver Lead and Node.js Driver Maintainer at MongoDB

The New Bulk API

One of the core new features in MongoDB 2.6 is the new bulk write operations. All the drivers include a new bulk api that allows applications to leverage these new operations using a fluid style API. Let’s explore the API and how it’s implemented in the Node.js driver.

The API

The API has two core concepts. The ordered and the unordered bulk operation. The main difference is in the way the operations are executed in bulk. In the case of an ordered bulk operation, every operation will be executed in the order they are added to the bulk operation. In the case of an unordered bulk operation however there is no guarantee what order the operations are executed. Later we will look at how each is implemented.

Operations

You can initialize an ordered or unordered bulk operation in the following way.

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    var unordered = db.collection('documents').initializeUnorderedBulkOp();

Both the ordered and unordered instances are bulk operation objects that we can add insert, update and remove operations to. The following operations are valid.

updateOne (update first matching document)

```ordered.find({ a : 1 }).updateOne({$inc : {x : 1}});```

update (update all matching documents)

```ordered.find({ a : 1 }).update({$inc : {x : 2}});```

replaceOne (replace entire document)

```ordered.find({ a : 1 }).replaceOne({ x : 2});```

updateOne or upsert (update first existing document or upsert)

```ordered.find({ a : 2 }).upsert().updateOne({ $inc : { x : 1}});```

update or upsert (update all or upsert)

```ordered.find({ a : 2 }).upsert().update({ $inc : { x : 2}});```

replace or upsert (replace first document or upsert)

```ordered.find({ a : 2 }).upsert().replaceOne({ x : 3 });```

removeOne (remove the first document matching)

```ordered.find({ a : 2 }).removeOne();```

remove (remove all documents matching)

```ordered.find({ a : 1 }).remove();```

insert

```ordered.insert({ a : 5});```

What happens under the covers when you start adding operations to a bulk operation? Let’s take a look at the new write operations to see how it works.

The New Write Operations

MongoDB 2.6 introduces a completely new set of write operations. Before 2.6 all write operations where done using wire protocol messages at the socket level. From 2.6 this changes to using commands.

### Insert Write Command

The insert write commands allow an application to insert batches of documents. Here’s an example:

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

A couple of things to note. The documents field contains an array of all the documents that are to be inserted. The writeConcern field specifies what would have previously been a getLastError command that would follow the pre 2.6 write operations. In other words there is always a response from a write operation in 2.6. This means that w:0 has different semantics than what one is used to in pre 2.6. In the context w:0 basically means only return an ack without any information about the success or failure of insert operations.

Let’s take a look at the update and remove write commands before seeing the results that are returned when executing these operations in 2.6.

Update Write Command

There are some slight differences in the update write command in comparison to the insert write command. Here’s an example:

    {
        update: 'collection name'
      , updates: [{ 
            q: { a : 1 }
          , u: { $inc : { x : 1}}
          , multi: true/false
          , upsert: true/false
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

The main difference here is that the updates array is an array of update operations where each entry in the array contains the q field that specifies the selector for the update. The u contains the update operation. multi specifies if we will updateOne or updateAll documents that matches the selection. Finally upsert tells the server if it will perform an upsert if the document is not found.

Finally let’s look at the remove write command.

Remove Write Command

The remove write command is very similar to the update write command. Here’s an example:

    {
        delete: 'collection name'
      , deletes: [{ 
            q: { a : 1 }
          , limit: 0/1
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

Similar to the update example, we can see that the entries in the deletes array contain documents with specific fields. The q field is the selector that will match which documents will be removed. The limit field sets the number of elements to be removed. Currently limit only supports two values, 0 and 1. The value 0 for limit removes all documents that match the selector. A value of 1 for limit removes the first matching document only.

Now let’s take a look at how results are returned for these new write commands.

Write Command Results

One of the best new aspects of the new write commands is that they can return information about each individual operation error in the batch. Results are efficient - only information about errors are returned as well as the aggregated counts of successful operations. Here’s an example of a comprehensive* result:

    {
      "ok" : 1,
      "n" : 0,
      "nModified": 1, (Applies only to update)
      "nRemoved": 1, (Applies only to removes)
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1  dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }      
    }

The two most interesting fields here are writeErrors and writeConcernError. If we take a look at writeErrors we can see how it’s an array of objects that include an index field as well as a code and errmsg. The field references the position of the failing document in the original documents, updates or deletes array allowing the application to identify the original batch document that failed.

The Effect of Ordered (true/false)

If ordered is set to true the write operation will fail on the first write error (meaning the first error that fails to apply the operation to memory). If one sets ordered to false the operation will continue until all operations have been executed (potentially in parallel), then return all the results. writeConcernError on the other hand does not stop the processing of a bulk operation if a document fails to be written to MongoDB.

It helps to think of writeErrors as hard errors and writeConcernError as a soft error.

The Special Case of w:0 As I mentioned previously, the semantics for w:0 changed for the write commands. The old style of write operations before 2.6 are a combination of a write wire message and a getLastError command. In the old style w:0 meant that the driver would not send a getLastError command after the write operation.

In 2.6 the new insert/update/delete commands will always respond. While w:0 would not return a result in versions of MongoDB before 2.6, in 2.6 and above it will. However it will truncate all the results and only return if the command ran successfully or failed.

As a result, if you execute.

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 0
      }
      , ordered: true/false
    }

All you receive from the server is the result

{ok : 1}

The Implication For The Bulk API

There are some implications to the fact that write commands are not mixed operations but either insert/update or removes. The Bulk API lets you mix operations and then merges the results back into a single result that simulates a mixed operations command in MongoDB. What does that mean in practice. Well let’s look at how node.js implements ordered and unordered bulk operations. Let’s use examples to show what happens.

Ordered Operations

Let’s take the following set of operations:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

When running in ordered mode the bulk API guarantees the ordering of the operations and thus will execute this as 5 operations one after the other:

    insert bulk operation
    update bulk operation
    insert bulk operation
    remove bulk operation
    insert bulk operation

We have now reduced the bulk API to performing single operations and your throughput suffers accordingly.

If we re-order our bulk operations in the following way:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.insert({ a: 2 });
    ordered.insert({ a: 3 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.find({ a : 2 }).remove();

The execution is reduced to the following operations one after the other:

    insert bulk operation
    update bulk operation
    remove bulk operation

Thus for ordered bulk operations the order of operations will impact the number of write commands that need to be executed and thus the throughput possible.

Unordered Operations

Unordered operations do not guarantee the execution order of operations. Let’s take the example from above:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

The Node.js driver will collect the operations into separate type-specific operations. So we get.

    insert bulk operation
    update bulk operation
    remove bulk operation

In difference to the ordered operation these bulks all get executed in parallel in Node.js and the results then merged when they have all finished.

Takeaway

MongoDB as of 2.6 only allows batches of inserts, updates or removes and not a mixed batch containing all three of the operation types. When performing ordered bulk operation we need to keep this in mind to avoid the scenario above. However for an unordered bulk operation the missing mixed batch type in 2.6 does not impact performance.

Note: Although the Bulk API actually supports downconversion to 2.4 the performance impact is considerable as all operations are reduced to single write operations with a getLastError. It’s recommended to leverage this API primarily with 2.6 or higher.

Like what you see? Sign up to the MongoDB Newsletter and get monthly updates straight to your inbox

The MongoDB Open Source Hack Contest

Apr 28 • Posted 2 months ago

Some of the best MongoDB tools come from the Open Source community. Projects like the Node.js Driver, Mongoose and Meteor have become the backbone of many MongoDB apps and have helped support the developer community all over the world. We want to see more of what the community has built.

For the month of May, we’ll be hosting a worldwide hack contest for Open Source tools built on or connected to MongoDB. The winner of the contest will receive a ticket to OSCON, furnished by O’Reilly.

Guidelines:

  • All projects must be built with the MongoDB source code or on top of a MongoDB API, community or MongoDB supported driver
  • Any new drivers created should abide by the driver requirements listed in the MongoDB Manual
  • All hacks will be judged by MongoDB engineers

All entries can be submitted to community@MongoDB.com with the following information before May 31

  • Github or Bitbucket URL
  • Description of the project
  • How do users benefit from this application?
  • Why did you choose to contribute to MongoDB?

We’re looking forward to seeing your hacks come in!

Want to keep up-to-date on MongoDB news and events? Sign up for the MongoDB Monthly Newsletter”

maxTimeMS() and Query Optimizer Introspection in MongoDB 2.6

Apr 23 • Posted 3 months ago

By Alex Komyagin, Technical Service Engineer in MongoDB’s New York Offices

In this series of posts we cover some of the enhancements in MongoDB 2.6 and how they will be valuable for users. You can find a more comprehensive list of changes in our 2.6 release notes. These are changes I believe you will find helpful, especially for our advanced users.

This post introduces two new features: maxTimeMS and query optimizer introspection. We will also look at a specific support case where these features would have been helpful.

maxTimeMS: SERVER-2212

The socket timeout option (e.g. MongoOptions.socketTimeout in the Java driver) specifies how long to wait for responses from the server. This timeout governs all types of requests (queries, writes, commands, authentication, etc.). The default value is “no timeout” but users tend to set the socket timeout to a relatively small value with the intention of limiting how long the database will service a single request. It is thus surprising to many users that while a socket timeout will close the connection, it does not affect the underlying database operation. The server will continue to process the request and consume resources even after the connection has closed.

Many applications are configured to retry queries when the driver reports a failed connection. If a query whose connection is killed by a socket timeout continues to consume resources on the server, a retry can give rise to a cascading effect. As queries run in MongoDB, the application may retry the same resource intensive query multiple times, increasing the load on the system leading to severe performance degradation. Before 2.6 the only way to cancel these queries was to issue db.killOp, which requires manual intervention.

MongoDB 2.6 introduces a new queryparameter, maxTimeMS, which limits how long an operation can run in the database. When the time limit is reached, MongoDB automatically cancels the query. maxTimeMS can be used to efficiently prevent unexpectedly slow queries from overloading MongoDB. Following the same logic, maxTimeMS can also be used to identify slow or unoptimized queries.

There are a few important notes about using maxTimeMS:

  • maxTimeMS can be set on a per-operation basis — long running aggregation jobs can have a different setting than simple CRUD operations.
  • requests can block behind locking operations on the server, and that blocking time is not counted, i.e. the timer starts ticking only after the actual start of the query when it initially acquires the appropriate lock;
  • operations are interrupted only at interrupt points where an operation can be safely aborted — the total execution time may exceed the specified maxTimeMS value;
  • maxTimeMS can be applied to both CRUD operations and commands, but not all commands are interruptible;
  • if a query opens a cursor, subsequent getmore requests will be included in the total time (but network roundtrips will not be counted);
  • maxTimeMS does not override the inactive cursor timeout for idle cursors (default is 10 min).

All MongoDB drivers support maxTimeMS in their releases for MongoDB 2.6. For example, the Java driver will start supporting it in 2.12.0 and 3.0.0, and Python - in version 2.7.

To illustrate how maxTimeMS can come in handy, let’s talk about a recent case that the MongoDB support team was working on. The case involved performance degradation of a MongoDB cluster that is powering a popular site with millions of users and heavy traffic. The degradation was caused by a huge amount of read queries performing full scans of the collection. The collection in question stores user comment data, so it has billions of records. Some of these collection scans were running for 15 minutes or so, and they were slowing down the whole system.

Normally, these queries would use an index, but in this case the query optimizer was choosing an unindexed plan. While the solution to this kind of problem is to hint the appropriate index, maxTimeMS can be used to prevent unintentional runaway queries by controlling the maximum execution time. Queries that exceed the maxTimeMS threshold will error out on the application side (in the Java driver it’s the MongoExecutionTimeoutException exception). maxTimeMS will help users to prevent unexpected performance degradation, and to gain better visibility into the operation of their systems.

In the next section we’ll take a look at another feature which would help in diagnosing the case we just discussed.

Query Optimizer Introspection: SEVER-666

To troubleshoot the support case described above and to understand why the query optimizer was not picking up the correct index for queries, it would have been very helpful to know a few things, such as whether the query plan had changed and to see the query plan cache.

MongoDB determines the best execution plan for a query and stores this plan in a cache for reuse. The cached plan is refreshed periodically and under certain operational conditions (which are discussed in detail below). Prior to 2.6 this caching behavior was opaque to the client. Version 2.6 provides a new query subsystem and query plan cache. Now users have visibility and control over the plan cache using a set of new methods for query plan introspection.

Each collection contains at most one plan cache. The cache is cleared every time a change is made to the indexes for the collection. To determine the best plan, multiple plans are executed in parallel and the winner is selected based on the amount of retrieved results within a fixed amount of steps, where each step is basically one “advance” of the query cursor. When a query has successfully passed the planning process, it is added to the cache along with the related index information.

A very interesting feature of the new query execution framework is the plan cache feedback mechanism. MongoDB stores runtime statistics for each cached plan in order to remove plans that are determined to be the cause of performance degradation. In practice, we don’t see these degradations often, but if they happen it is usually a consequence of a change in the composition of the data. For example, with new records being inserted, an indexed field may become less selective, leading to slower index performance. These degradations are extremely hard to manually diagnose, and we expect the feedback mechanism to automatically handle this change if a better alternative index is present.

The following events will result in a cached plan removal:

  • Performance degradation
  • Index add/drop
  • No more space in cache (the total number of plans stored in the collection cache is currently limited to 200; this is a subject to change in future releases)
  • Explicit commands to mutate cache
  • After the number of write operations on a collection exceeds the built-in limit, the whole collection plan cache is dropped (data distribution has changed)

MongoDB 2.6 supports a set of commands to view and manipulate the cache. List all known query shapes (planCacheListQueryShapes), display cached plans for a query shape (planCacheListPlans), manual removal of a query shape from the cache as well as emptying the whole cache (planCacheClear).

Here is an example invocation of the planCacheListQueryShapes command, that lists the shape of the query

db.test.find({first_name:"john",last_name:"galt"},

{_id:0,first_name:1,last_name:1}).sort({age:1}):

> db.runCommand({planCacheListQueryShapes: "test"})
{
    "shapes" : [
        {
            "query" : {
                "first_name" : "alex",
                "last_name" : "komyagin"
            },
            "sort" : {
                "age" : 1
            },
            "projection" : {
                "_id" : 0,
                "first_name" : 1,
                "last_name" : 1
            }
        }
    ],
    "ok" : 1
}

The exact values in the query predicate are insignificant in determining the query shape. For example, a query predicate

{first_name:"john", last_name:"galt"} is equivalent to the query predicate {first_name:"alex", last_name:"komyagin"}.

Additionally, with log level set to 1 or greater MongoDB will log plan cache changes caused by the events listed above. You can set the log level using the following command:

use admin
db.runCommand( { setParameter: 1, logLevel: 1 } )

Please don’t forget to change it back to 0 afterwards, since log level 1 logs all query operations and it can negatively affect the performance of your system.

Together, the new plan cache and the new query optimizer should make related operations more transparent and help users to have the visibility and control necessary for maintaining predictable response times for their queries.

You can see how these new features work yourself, by trying our latest release MongoDB 2.6 available for download here. I hope you will find these features helpful. We look forward to hearing your feedback, please post your questions in the mongodb-user google group.

Like what you see? Get MongoDB updates straight to your inbox

MongoDB 2.6: Our Biggest Release Ever

Apr 8 • Posted 3 months ago

By Eliot Horowitz, CTO and Co-founder, MongoDB

Discuss on Hacker News

In the five years since the initial release of MongoDB, and after hundreds of thousands of deployments, we have learned a lot. The time has come to take everything we have learned and create a basis for continued innovation over the next ten years.

Today I’m pleased to announce that, with the release of MongoDB 2.6, we have achieved that goal. With comprehensive core server enhancements, a groundbreaking new automation tool, and critical enterprise features, MongoDB 2.6 is by far our biggest release ever.

You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.

From the very beginning, MongoDB has offered developers a simple and elegant way to manage their data. Now we’re bringing that same simplicity and elegance to managing MongoDB. MongoDB Management Service (MMS), which already provides 35,000 MongoDB customers with monitoring and alerting, now provides backup and point-in-time restore functionality, in the cloud and on-premises.

We are also announcing a game-changing feature coming later this year: automation, also with hosted and on-premises options. Automation will allow you to provision and manage MongoDB replica sets and sharded clusters via a simple yet sophisticated interface.

MongoDB 2.6 brings security, integration and analytics enhancements to ease deployment in enterprise environments. LDAP, x.509 and Kerberos authentication are critical enhancements for organizations that require a single authentication mechanism across their entire infrastructure. To enhance security, MongoDB 2.6 implements TLS encryption, user-defined roles, auditing and field-level redaction, a critical building block for trusted systems. IBM Guardium also now offers integration with MongoDB, providing more extensive auditing abilities.

These are only a few of the key improvements; read the full official release notes for more details.

MongoDB 2.6 was a major endeavor and bringing it to fruition required hard work and coordination across a rapidly growing team. Over the past few years we have built and invested in that team, and I can proudly say we have the experience, drive and determination to deliver on this and future releases. There is much still to be done, and with MongoDB 2.6, we have a foundation for the next decade of database innovation.

Like what you see? Get MongoDB updates straight to your inbox

MongoDB Innovation Awards: Call for Nominations

Apr 2 • Posted 3 months ago

Fortune 500 enterprises, startups, hospitals, governments and organizations of all kinds use MongoDB because it is the best database for modern applications. The MongoDB Innovation Awards recognize organizations and individuals who are changing the world with MongoDB.

Nominations are open to any individual representing an organization that runs MongoDB, including partner solutions built on or for MongoDB.

The deadline to submit is midnight eastern time on May 30. A panel of judges will review nominations. Winners will be announced at MongoDB World on June 23-25.

Each winner will receive:

  • MongoDB Innovation Award
  • Pass to MongoDB World
  • Invitation to award winners reception at MongoDB World
  • Inclusion in Innovation Awards press release, blog post and email newsletter
  • $2,500 Amazon Gift Card

Submit a nomination by completing this nomination form — you’ll receive a discount code for MongoDB World.

Running MongoDB Queries Concurrently With Go

Mar 24 • Posted 4 months ago

This is a guest post by William Kennedy, managing partner at Ardan Studios in Miami, FL, a mobile and web app development company. Bill is also the author of the blog GoingGo.Net and the organizer for the Go-Miami and Miami MongoDB meetups in Miami. Bill looked for a new language in 2013 that would allow him to develop back end systems in Linux and found Go. He has never looked back.

If you are attending GopherCon 2014 or plan to watch the videos once they are released, this article will prepare you for the talk by Gustavo Niemeyer and Steve Francia. It provides a beginners view for using the Go mgo driver against a MongoDB database.

Introduction

MongoDB supports many different programming languages thanks to a great set of drivers. One such driver is the MongoDB Go driver which is called mgo. This driver was developed by Gustavo Niemeyer from Canonical with some assistance from MongoDB Inc. Both Gustavo and Steve Francia, the head of the drivers team, will be talking at GopherCon 2014 in April about “Painless Data Storage With MongoDB and Go”. The talk describes the mgo driver and how MongoDB and Go work well together for building highly scalable and concurrent software.

MongoDB and Go let you build scalable software on many different operating systems and architectures, without the need to install frameworks or runtime environments. Go programs are native binaries and the Go tooling is constantly improving to create binaries that run as fast as equivalent C programs. That wouldn’t mean anything if writing code in Go was complicated and as tedious as writing programs in C. This is where Go really shines because once you get up to speed, writing programs in Go is fast and fun.

In this post I am going to show you how to write a Go program using the mgo driver to connect and run queries concurrently against a MongoDB database. I will break down the sample code and explain a few things that can be a bit confusing to those new to using MongoDB and Go together.

Sample Program

The sample program connects to a public MongoDB database I have hosted with MongoLab. If you have Go and Bazaar installed on your machine, you can run the program against my database. The program is very simple - it launches ten goroutines that individually query all the records from the buoy_stations collection inside the goinggo database. The records are unmarshaled into native Go types and each goroutine logs the number of documents returned:

Now that you have seen the entire program, we can break it down. Let’s start with the type structures that are defined in the beginning:

The structures represent the data that we are going to retrieve and unmarshal from our query. BuoyStation represents the main document and BuoyCondition and BuoyLocation are embedded documents. The mgo driver makes it easy to use native types that represent the documents stored in our collections by using tags. With the tags, we can control how the mgo driver unmarshals the returned documents into our native Go structures.

Now let’s look at how we connect to a MongoDB database using mgo:

We start with creating a mgo.DialInfo object. Connecting to a replica set can be accomplished by providing multiple addresses in the Addrs field or with a single address. If we are using a single host address to connect to a replica set, the mgo driver will learn about any remaining hosts from the replica set member we connect to. In our case we are connecting to a single host.

After providing the host, we specify the database, username and password we need for authentication. One thing to note is that the database we authenticate against may not necessarily be the database our application needs to access. Some applications authenticate against the admin database and then use other databases depending on their configuration. The mgo driver supports these types of configurations very well.

Next we use the mgo.DialWithInfo method to create a mgo.Session object. Each session specifies a Strong or Monotonic mode, and other settings such as write concern and read preference. The mgo.Session object maintains a pool of connections to MongoDB. We can create multiple sessions with different modes and settings to support different aspects of our applications.

The next line of code sets the mode for the session. There are three modes that can be set, Strong, Monotonic and Eventual. Each mode sets a specific consistency for how reads and writes are performed. For more information on the differences between each mode, check out the documentation for the mgo driver.

We are using Monotonic mode which provides reads that may not entirely be up to date, but the reads will always see the history of changes moving forward. In this mode reads occur against secondary members of our replica sets until a write happens. Once a write happens, the primary member is used. The benefit is some distribution of the reading load can take place against the secondaries when possible.

With the session set and ready to go, next we execute multiple queries concurrently:

This code is classic Go concurrency in action. First we create a sync.WaitGroup object so we can keep track of all the goroutines we are going to launch as they complete their work. Then we immediately set the count of the sync.WaitGroup object to ten and use a for loop to launch ten goroutines using the RunQuery function. The keyword go is used to launch a function or method to run concurrently. The final line of code calls the Wait method on the sync.WaitGroup object which locks the main goroutine until everything is done processing.

To learn more about Go concurrency and better understand how this particular piece of code works, check out these posts on concurrency and channels.

Now let’s look at the RunQuery function and see how to properly use the mgo.Session object to acquire a connection and execute a query:

The very first thing we do inside of the RunQuery function is to defer the execution of the Done method on the sync.WaitGroup object. The defer keyword will postpone the execution of the Done method, to take place once the RunQuery function returns. This will guarantee that the sync.WaitGroup objects count will decrement even if an unhandled exception occurs.

Next we make a copy of the session we created in the main goroutine. Each goroutine needs to create a copy of the session so they each obtain their own socket without serializing their calls with the other goroutines. Again, we use the defer keyword to postpone and guarantee the execution of the Close method on the session once the RunQuery function returns. Closing the session returns the socket back to the main pool, so this is very important.

To execute a query we need a mgo.Collection object. We can get a mgo.Collection object through the mgo.Session object by specifying the name of the database and then the collection. Using the mgo.Collection object, we can perform a Find and retrieve all the documents from the collection. The All function will unmarshal the response into our slice of BuoyStation objects. A slice is a dynamic array in Go. Be aware that the All method will load all the data in memory at once. For large collections it is better to use the Iter method instead. Finally, we just log the number of BuoyStation objects that are returned.

Conclusion

The example shows how to use Go concurrency to launch multiple goroutines that can execute queries against a MongoDB database independently. Once a session is established, the mgo driver exposes all of the MongoDB functionality and handles the unmarshaling of BSON documents into Go native types.

MongoDB can handle a large number of concurrent requests when you architect your MongoDB databases and collections with concurrency in mind. Go and the mgo driver are perfectly aligned to push MongoDB to its limits and build software that can take advantage of all the computing power that is available.

The mgo driver provides a safe way to leverage Go’s concurrency support and you have the flexibility to execute queries concurrently and in parallel. It is best to take the time to learn a bit about MongoDB replica sets and load balancer configuration. Then make sure the load balancer is behaving as expected under the different types of load your application can produce.

Now is a great time to see what MongoDB and Go can do for your software applications, web services and service platforms. Both technologies are being battle tested everyday by all types of companies, solving all types of business and computing problems.

Processing 2 Billion Documents A Day And 30TB A Month With MongoDB

Mar 14 • Posted 4 months ago

This is a guest post by David Mytton. He has been programming Python for over 10 years and founded his website and and monitoring company, Server Density, back in 2009.

Server Density processes over 30TB/month of incoming data points from the servers and web checks we monitor for our customers, ranging from simple Linux system load average to website response times from 18 different countries. All of this data goes into MongoDB in real time and is pulled out when customers need to view graphs, update dashboards and generate reports.

We’ve been using MongoDB in production since mid-2009 and have learned a lot over the years about scaling the database. We run multiple MongoDB clusters but the one storing the historical data does the most throughput and is the one I shall focus on in this article, going through some of the things we’ve done to scale it.

1. Use dedicated hardware, and SSDs

All our MongoDB instances run on dedicated servers across two data centers at Softlayer. We’ve had bad experiences with virtualisation because you have no control over the host, and databases need guaranteed performance from disk i/o. When running on shared storage (e.g., a SAN) this is difficult to achieve unless you can get guaranteed throughput from things like AWS’s Provisioned IOPS on EBS (which are backed by SSDs).

MongoDB doesn’t really have many bottlenecks when it comes to CPU because CPU bound operations are rare (usually things like building indexes), but what really causes problem is CPU steal - when other guests on the host are competing for the CPU resources.

The way we have combated these problems is to eliminate the possibility of CPU steal and noisy neighbours by moving onto dedicated hardware. And we avoid problems with shared storage by deploying the dbpath onto locally mounted SSDs.

I’ll be speaking in-depth about managing MongoDB deployments in virtualized or dedicated hardware at MongoDB World this June.

2. Use multiple databases to benefit from improved concurrency

Running the dbpath on an SSD is a good first step but you can get better performance by splitting your data across multiple databases, and putting each database on a separate SSD with the journal on another.

Locking in MongoDB is managed at the database level so moving collections into their own databases helps spread things out - mostly important for scaling writes when you are also trying to read data. If you keep databases on the same disk you’ll start hitting the throughput limitations of the disk itself. This is improved by putting each database on its own SSD by using the directoryperdb option. SSDs help by significantly alleviating i/o latency, which is related to the number of IOPS and the latency for each operation, particularly when doing random reads/writes. This is even more visible for Windows environments where the memory mapped data files are flushed serially and synchronously. Again, SSDs help with this.

The journal is always within a directory so you can mount this onto its own SSD as a first step. All writes go via the journal and are later flushed to disk so if your write concern is configured to return when the write is successfully written to the journal, making those writes faster by using an SSD will improve query times. Even so, enabling the directoryperdb option gives you the flexibility to optimise for different goals (e.g., put some databases on SSDs and some on other types of disk, or EBS PIOPS volumes, if you want to save cost).

It’s worth noting that filesystem based snapshots where MongoDB is still running are no longer possible if you move the journal to a different disk (and so different filesystem). You would instead need to shut down MongoDB (to prevent further writes) then take the snapshot from all volumes.

3. Use hash-based sharding for uniform distribution

Every item we monitor (e.g., a server) has a unique MongoID and we use this as the shard key for storing the metrics data.

The query index is on the item ID (e.g. the server ID), the metric type (e.g. load average) and the time range; but because every query always has the item ID, it makes it a good shard key. That said, it is important to ensure that there aren’t large numbers of documents under a single item ID because this can lead to jumbo chunks which cannot be migrated. Jumbo chunks arise from failed splits where they’re already over the chunk size but cannot be split any further.

To ensure that the shard chunks are always evenly distributed, we’re using the hashed shard key functionality in MongoDB 2.4. Hashed shard keys are often a good choice for ensuring uniform distribution, but if you end up not using the hashed field in your queries, you could actually hurt performance because then a non-targeted scatter/gather query has to be used.

4. Let MongoDB delete data with TTL indexes

The majority of our users are only interested in the highest resolution data for a short period and more general trends over longer periods, so over time we average the time series data we collect then delete the original values. We actually insert the data twice - once as the actual value and once as part of a sum/count to allow us to calculate the average when we pull the data out later. Depending on the query time range we either read the average or the true values - if the query range is too long then we risk returning too many data points to be plotted. This method also avoids any batch processing so we can provide all the data in real time rather than waiting for a calculation to catch up at some point in the future.

Removal of the data after a period of time is done by using a TTL index. This is set based on surveying our customers to understand how long they want the high resolution data for. Using the TTL index to delete the data is much more efficient than doing our own batch removes and means we can rely on MongoDB to purge the data at the right time.

Inserting and deleting a lot of data can have implications for data fragmentation, but using a TTL index helps because it automatically activates PowerOf2Sizes for the collection, making disk usage more efficient. Although as of MongoDB 2.6, this storage option will become the default.

5. Take care over query and schema design

The biggest hit on performance I have seen is when documents grow, particularly when you are doing huge numbers of updates. If the document size increases after it has been written then the entire document has to be read and rewritten to another part of the data file with the indexes updated to point to the new location, which takes significantly more time than simply updating the existing document.

As such, it’s important to design your schema and queries to avoid this, and to use the right modifiers to minimise what has to be transmitted over the network and then applied as an update to the document. A good example of what you shouldn’t do when updating documents is to read the document into your application, update the document, then write it back to the database. Instead, use the appropriate commands - such as set, remove, and increment - to modify documents directly.

This also means paying attention to the BSON data types and pre-allocating documents, things I wrote about in MongoDB schema design pitfalls.

6. Consider network throughput & number of packets

Assuming 100Mbps networking is sufficient is likely to cause you problems, perhaps not during normal operations, but probably when you have some unusual event like needing to resync a secondary replica set member.

When cloning the database, MongoDB is going to use as much network capacity as it can to transfer the data over as quickly as possible before the oplog rolls over. If you’re doing 50-60Mbps of normal network traffic, there isn’t much spare capacity on a 100Mbps connection so that resync is going to be held up by hitting the throughput limits.

Also keep an eye on the number of packets being transmitted over the network - it’s not just the raw throughput that is important. A huge number of packets can overwhelm low quality network equipment - a problem we saw several years ago at our previous hosting provider. This will show up as packet loss and be very difficult to diagnose.

Conclusions

Scaling is an incremental process - there’s rarely one thing that will give you a big win. All of these tweaks and optimisations together help us to perform thousands of write operations per second and get response times within 10ms whilst using a write concern of 1.

Ultimately, all this ensures that our customers can load the graphs they want incredibly quickly. Behind the scenes we know that data is being written quickly, safely and that we can scale it as we continue to grow.

Bug Hunt Winners for MongoDB 2.6-rc0

Mar 13 • Posted 4 months ago

We would like to thank everyone who participated in MongoDB’s inaugural Bug Hunt. Your efforts have helped to improve the 2.6 release.

The MongoDB Bug Hunt for 2.6-rc0 uncovered a number of bugs. From this pool, we’ve selected 5 winning Bug Reporters based on three criteria: user impact, severity and prevalence.

The Winners

First Prize: Roman Kuzmin, SERVER-12846
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $1000 Amazon Gift Card
  • MongoDB Contributor T-shirt
Second Prize: Mark Callaghan, Server 13060 & 13041
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $500 Donation to Khan Academy (at Mark’s request)
  • MongoDB Contributor T-shirt
Honorable Mentions: Tim Callaghan (Server 12878), David Glasser (Server 12981) and Jeff Lee (Server 12925)
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $250 Amazon Gift Card
  • MongoDB Contributor T-shirt

Congratulations to the five winners of the first MongoDB Bug Hunt and thanks to everyone who downloaded, tested and gave feedback on the release candidates.

-Eliot, Dan, Alvin and the MongoDB team

Announcing the MongoDB Bug Hunt 2.6.0-rc0 

Feb 21 • Posted 5 months ago

The MongoDB team released MongoDB 2.6.0-rc0 today and is proud to announce the MongoDB Bug Hunt. The MongoDB Bug Hunt is a new initiative to reward our community members who contribute to improving this MongoDB release. We’ve put the release through rigorous correctness, performance and usability testing. Now it’s your turn. Over the next 10 days, we challenge you to test and uncover any lingering issues in MongoDB 2.6.0-rc0.

How it works

You can download this release at MongoDB.org/downloads. If you find a bug, submit the issue to Jira (Core Server project) by March 4 at 12:00AM GMT. Bug reports will be judged on three criteria: user impact, severity and prevalence.

We will review all bugs submitted against 2.6.0-rc0. Winners will be announced on the MongoDB blog and user forum by March 8. There will be one first place winner, one second place winner and at least two honorable mentions.

The Rewards
First Prize:
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $1000 Amazon Gift Card
  • MongoDB Contributor T-shirt
Second Prize:
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $500 Amazon Gift Card
  • MongoDB Contributor T-shirt
Honorable Mentions:
  • 1 ticket to MongoDB World — with a reserved front-row seat for keynote sessions
  • $250 Amazon Gift Card
  • MongoDB Contributor T-shirt

How to get started:

  • Deploy in your test environment: Software is best tested in a realistic environment. Help us see how 2.6 fares with your code and data so that others can build and run applications on MongoDB 2.6 successfully.
  • Test new features and improvements: Dozens of new features were added in 2.6. See the 2.6 Release Notes for a full list.
  • Log a ticket: If you find an issue, create a report in Jira. See the documentation for a guide to submitting well written bug reports.

If you are interested in doing this work full time, consider applying to join our engineering teams in New York City, Palo Alto and Austin, Texas.

Happy hunting!

Eliot, Dan and the MongoDB Team

Crittercism: Scaling To Billions Of Requests Per Day On MongoDB

Feb 20 • Posted 5 months ago

This is a guest post by Mike Chesnut, Director of Operations Engineering at Crittercism. This June, Mike will present at MongoDB World on how Crittercism scaled to 30,000 requests/second (and beyond) on MongoDB.

MongoDB is capable of scaling to meet your business needs — that is why its name is based on the word humongous. This doesn’t mean there aren’t some growing pains you’ll encounter along the way, of course. At Crittercism, we’ve had a huge amount of growth over the past 2 years and have hit some bumps in the road along the way, but we’ve also learned some important lessons that can hopefully be of use to others.

Background

Crittercism provides the world’s first and leading Mobile Application Performance Management (mAPM) solution. Our SDK is embedded in tens of thousands of applications, and used by nearly a billion users worldwide. We collect performance data such as error reporting, crash diagnostics details, network breadcrumbs, device/carrier/OS statistics, and user behavior. This data is largely unstructured and varies greatly among different applications, versions, devices, and usage patterns.

Storing all of this data in MongoDB allows us to collect raw information that may be of use in a variety of ways to our customers, while also providing the analytics necessary to summarize the data down to digestible, actionable metrics.

As our request volume has grown, so too has our data footprint; over the course of 18 months our daily request volume increased over 40x:

Our primary MongoDB cluster now houses over 20TB of data, and getting to this point has helped us learn a few things along the way.

Routing

The MongoDB documentation suggests that the most common topology is to include a router — a mongos process — on each client system. We started doing this and it worked well for quite a while.

As the number of front-end application servers in production grew from the order of 10s to several 100s, we found that we were creating heavy load via hundreds (or sometimes thousands) of connections between our mongos routers and our mongod shard servers. This meant that whenever chunk balancing occurred — something that is an integral part of maintaining a well-balanced, sharded MongoDB cluster — the chunk location information that is stored in the config database took a long time to propagate. This is because every mongos router needs to have a full picture of where in the cluster all of the chunks reside.

So what did we learn? We found that we could alleviate this issue by consolidating mongos routers onto a few hosts. Our production infrastructure is in AWS, so we deployed 2 mongos servers per availability zone. This gave us redundancy per AZ, as well as offered the shortest network path possible from the clients to the mongos routers. We were concerned about adding an additional hop to the request path, but using Chef to configure all of our clients to only talk to the mongos routers in their own AZ helped minimize this issue.

Making this topology change greatly reduced the number of open connections to our mongod shards, which we were able to measure using MMS, without a noticeable reduction in application performance. At the same time, there were several improvements to MongoDB that made both the mongos updates and the internal consistency checks more efficient in general. Combined with the new infrastructure this meant that we could now balance chunks throughout our cluster without causing performance problems while doing so.

Shard Replacement

Another scenario we’ve encountered is the need to dynamically replace mongod servers in order to migrate to larger shards. Again following the recommended best deployment practice, we deploy MongoDB onto server instances utilizing large RAID10 arrays running xfs. We use m2.4xlarge instances in AWS with 16 disks. We’ve used basic Linux mdadm for performance, but at the expense of flexibility in disk configuration. As a result when we are ready to allocate more capacity to our shards, we need to perform a migration procedure that can sometimes take several days. This not only means that we need to plan ahead appropriately, but also that we need to be aware of the full process in order to monitor it and react when things go wrong.

We start with a replica set where all replicas are at approximately the same disk utilization. We first create a new server instance with more disk allocated to it, and add it to this replica set with rs.add().

The new replica will enter the STARTUP2 state and remain there for a long time (in our case, usually 2-3 days) while it first clones data, then catches up via oplog replication and builds indexes. During this time, index builds will often stop the replication process (note that this behavior is set to change in MongoDB 2.6), and so the replication lag time does not strictly decrease the whole time — it will steadily decrease for a while, then an index build will cause it to pause and start to fall further behind. Once the index build completes the replication will resume. It’s worth noting that while index builds occur, mongostat and anything else that requires a read lock will be blocked as well.

Eventually the replica will enter the SECONDARY state and will be fully functional. At this point we can rs.stepDown() one of the old replicas, shut down the mongod process running on it, and then remove it from the replica set via rs.remove(), making the server ready for retirement.

We then repeat the process for each member of the replica set, until all have been replaced with the new instances with larger disks.

While this process can be time-consuming and somewhat tedious, it allows for a graceful way to grow our database footprint without any customer-facing impact.

Conclusion

Operating MongoDB at scale — as with any other technology — requires some knowledge that you can gain from documentation, and some that you have to learn from experience. By being willing to experiment with different strategies such as those shown above, you can discover flexibility that may not have been previously obvious. Consolidating the mongos router tier was a big win for Crittercism’s Ops team in terms of both performance and manageability, and developing the above described migration procedure has enabled us to continue to grow to meet our needs without affecting our service or our customers.

See how Crittercism, Stripe, Carfax, Intuit, Parse and Sailthru are building the next generation of applications at MongoDB World. Register now and join the MongoDB Community in New York City this June.

Background Indexing on Secondaries and Orphaned Document Cleanup in MongoDB 2.6

Jan 27 • Posted 5 months ago

By Alex Komyagin, Technical Services Engineer in MongoDB’s New York Office

The MongoDB Support Team has broad visibility into the community’s use of MongoDB, issues they encounter, feature requests, bug fixes and the work of the engineering team. This is the first of a series of posts to help explain, from our perspective, what is changing in 2.6 and why.

Many of these changes are available today for testing in the 2.5.4 Development Release, which is available as of November 18, 2013 (2.5.5 release, coming soon, will be feature complete). Development Releases have odd-numbered minor versions (e.g., 2.1, 2.3, 2.5), and Production Releases have even-numbered minor versions (e.g., 2.2, 2.4, 2.6). MongoDB 2.6 will become available a little later this year.

Community testing helps MongoDB improve. You can test the development of MongoDB 2.5.4 today. Downloads are available here, and you can log Server issues in Jira.

Background indexes on secondaries (SERVER-2771)

Suppose you have a production replica set with two secondary servers, and you have a large, 1TB collection. At some point, you may decide that you need a new index to reflect a recent change in your application, so you build one in the background:

db.col.ensureIndex({..},{background : true})

Let’s also suppose that your application uses secondary reads (users should take special care with this feature, especially in sharded systems; for an example of why, see the next section in this post). After some time you observe that some of your application reads have started to fail, and replication lag on both secondaries has started to grow. While you are searching Google Groups for answers, everything magically goes back to normal by itself. Secondaries have caught up, and application reads on your secondaries are working fine. What happened?

One would expect that building indexes in the background would allow the replica set to continue serving regular operations during the index build. However, in all MongoDB releases before 2.6, background index builds on primaries become foreground index builds on secondaries, as noted in the documentation. Foreground index building is resource intensive and it can also affect replication and read/write operations on the database (see the FAQ on the MongoDB Docs). The good news is that impact can be minimized if your indexed collections are small enough for index builds to be relatively fast (on the order of minutes to complete).

The only way to make sure that indexing operations are not affecting the replica set in earlier versions of MongoDB was to build indexes in a rolling fashion. This works perfectly for most users, but not for everyone. For example, it wouldn’t work well for those who use a write concern “w:all”.

Starting with MongoDB 2.6, a background index build on the primary becomes a background index build on the secondaries. This behavior is much more intuitive and will improve the replica set robustness. We feel this will be a welcome enhancement for many users.

Please note that background index building normally takes longer than foreground building, because it allows other operations on the database to run. Keep in mind that, like most database systems, indexing in MongoDB is resource intensive and will increase the load on your system, whether it is a foreground or background process. Accordingly, it is best to perform these operations during a maintenance window or during off-peak hours.

The actual time needed to build a background index varies with the active load on your system, number of documents, database size and your hardware specs. Therefore, for production systems with large collections users can still take advantage of building indexes in a rolling fashion, or building them in foreground during maintenance windows if they believe a background index build will require more time than is acceptable.

Orphaned documents cleanup command (SERVER-8598)

MongoDB provides horizontal scaling through a feature called sharding. If you’re unfamiliar with sharding and how it works, I encourage you to read the nice new introduction to this feature the documentation team added a few months ago. Let me try and summarize some of the key concepts:

  • MongoDB partitions documents across shards.
  • Periodically the system runs a balancing process to ensure documents are uniformly distributed across the shards.
  • Groups of documents, called chunks, are the unit of a balancing job.
  • In certain failure scenarios stale copies of documents may remain on shards, which we call “orphaned documents.”

Under normal circumstances, there will be no orphaned documents in your system. However, in some production systems, “normal circumstances” are a very rare event, and migrations can fail (e.g., due to network connectivity issues), thus leaving orphaned documents behind.

The presence of orphaned documents can produce incorrect results for some queries. While orphaned documents are safe to delete, in versions prior to 2.6 there was no simple way to do so. In MongoDB 2.6 we implemented a new administrative command for sharded clusters: cleanupOrphaned(). This command removes orphaned documents from the shard in a single range of data.

The scenario where users typically encounter issues related to orphaned documents is when issuing secondary reads. In a sharded cluster, primary replicas for each shard are aware of the chunk placements, while secondaries are not. If you query the primary (which is the default read preference), you will not see any issues as the primary will not return orphaned documents even if it has them. But if you are using secondary reads, the presence of orphaned documents can produce unexpected results, because secondaries are not aware of the chunk ownerships and they can’t filter out orphaned documents. This scenario does not affect targeted queries (those having the shard key included), as mongos automatically routes them to correct shards.

To illustrate this discussion with an example, one of our clients told us that after a series of failed migrations he noticed that his queries were returning duplicate documents. He was using scatter-gather queries, meaning that they did not contain the shard key and were broadcast by mongos to all shards, as well as secondary reads. Shards return all the documents matching the query (including orphaned documents), which in this situation lead to duplicate entries in the final result set.

A short term solution was to remove orphaned documents (we used to have a special script for this). But a long term workaround for this particular client was to make their queries targeted, by including the shard key in each query. This way, mongos could efficiently route each query to the correct shard, not hitting the orphaned data. Routed queries are a best practice in any system as they also scale much better than scatter-gather queries.

Unfortunately, there are a few cases where there is no good way to make queries targeted, and you would need to either switch to primary reads or implement a regular process for removing orphaned documents.

The cleanupOrphaned() command is the first step on the path to automated cleanup of orphaned documents. This command should be run on the primary server and will clean up one unowned range on this shard. The idea is to run the command repeatedly, with a delay between calls to tune the cleanup rate.

In some configurations secondary servers might not be able to keep up with the rate of delete operations, resulting in replication lag. In order to control the lag, cleanupOrphaned() waits for the majority of the replica set members after the range removal is complete. Additionally, you can use the secondaryThrottle option, and each individual delete operation will be made with write concern w:2 (waits for one secondary). This may be useful for reducing the impact of removing orphaned documents on your regular operations.

You can find command usage examples and more information about the command in the 2.6 documentation.

I hope you will find these features helpful. We look forward to hearing your feedback on these features. If you would like to test them out, download MongoDB 2.5.4, the most recent Development Release of MongoDB.

Schema Design for Social Inboxes in MongoDB

Oct 31 • Posted 8 months ago

Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application when determining the best design. In this post, we’ll explore three approaches for using MongoDB when creating social inboxes or message timelines.

If you’re building a social network, like Twitter for example, you need to design a schema that is efficient for users viewing their inbox, as well as users sending messages to all their followers. The whole point of social media, after all, is that you can connect in real time.

There are several design considerations for this kind of application:

  • The application needs to support a potentially large volume of reads and writes.
  • Reads and writes are not uniformly distributed across users. Some users post much more frequently than others, and some users have many, many more followers than others.
  • The application must provide a user experience that is instantaneous.
  • Edit 11/6: The application will have little to no user deletions of data (a follow up blog post will include information about user deletions and historical data)

Because we are designing an application that needs to support a large volume of reads and writes we will be using a sharded collection for the messages. All three designs include the concept of “fan out,” which refers to distributing the work across the shards in parallel:

  1. Fan out on Read
  2. Fan out on Write
  3. Fan out on Write with Buckets

Each approach presents trade-offs, and you should use the design that is best for your application’s requirements.

The first design you might consider is called Fan Out on Read. When a user sends a message, it is simply saved to the inbox collection. When any user views their own inbox, the application queries for all messages that include the user as a recipient. The messages are returned in descending date order so that users can see the most recent messages.

To implement this design, create a sharded collection called inbox, specifying the from field as the shard key, which represents the address sending the message. You can then add a compound index on the to field and the sent field. Once the document is saved into the inbox, the message is effectively sent to all the recipients. With this approach sending messages is very efficient.

Viewing an inbox, on the other hand, is less efficient. When a user views their inbox the application issues a find command based on the to field, sorted by sent. Because the inbox collection uses from as its shard key, messages are grouped by sender across the shards. In MongoDB queries that are not based on the shard key will be routed to all shards. Therefore, each inbox view will be routed to all shards in the system. As the system scales and many users go to view their inbox, all queries will be routed to all shards. This design does not scale as well as each query being routed to a single shard.

With the “Fan Out on Read” method, sending a message is very efficient, but viewing the inbox is less efficient.

Fan out on Read is very efficient for sending messages, but less efficient for reading messages. If the majority of your application consists of users sending messages, but very few go to read what anyone sends them — let’s call it an anti-social app — then this design might work well. However, for most social apps there are more requests by users to view their inbox than there are to send messages.

The Fan out on Write takes a different approach that is more optimized for viewing inboxes. This time, instead of sharding our inbox collection on the sender, we shard on the message recipient. In this way, when we go to view an inbox the queries can be routed to a single shard, which will scale very well. Our message document is the same, but now save a copy of the message for every recipient.

With the “Fan Out on Write” method, viewing the inbox is efficient, but sending messages consumes more resources.

In practice we might implement the saving of messages asynchronously. Imagine two celebrities quickly exchange messages at a high-profile event - the system could quickly be saturated with millions of writes. By saving a first copy of their message, then using a pool of background workers to write copies to all followers, we can ensure the two celebrities can exchange messages quickly, and that followers will soon have their own copies. Furthermore, we could maintain a last-viewed date on the user document to ensure they have accessed the system recently - zombie accounts probably shouldn’t get a copy of the message, and for users that haven’t accessed their account recently we could always resort to our first design - Fan out on Read - to repopulate their inbox. Subsequent requests would then be fast again.

At this point we have improved the design for viewing inboxes by routing each inbox view to a single shard. However, each message in the user’s inbox will produce a random read operation. If each inbox view produces 50 random reads, then it only takes a relatively modest number of concurrent users to potentially saturate the disks. Fortunately we can take advantage of the document data model to further optimize this design to be even more efficient.

Fan out on Write with Buckets refines the Fan Out on Write design by “bucketing” messages together into documents of 50 messages ordered by time. When a user views their inbox the request can be fulfilled by reading just a few documents of 50 messages each instead of performing many random reads. Because read time is dominated by seek time, reducing the number of seeks can provide a major performance improvement to the application. Another advantage to this approach is that there are fewer index entries.

To implement this design we create two collections, an inbox collection and a user collection. The inbox collection uses two fields for the shard key, owner and sequence, which holds the owner’s user id and sequence number (i.e. the id of 50-message “bucket” documents in their inbox). The user collection contains simple user documents for tracking the total number of messages in their inbox. Since we will probably need to show the total number of messages for a user in a variety of places in our application, this is a nice place to maintain the count instead of calculating for each request. Our message document is the same as in the prior examples.

To send a message we iterate through the list of recipients as we did in the Fan out on Write example, but we also take another step to increment the count of total messages in the inbox of the recipient, which is maintained on the user document. Once we know the count of messages, we know the “bucket” in which to add the latest message. As these messages reach the 50 item threshold, the sequence number increments and we begin to add messages to the next “bucket” document. The most recent messages will always be in the “bucket” document with the highest sequence number. Viewing the most recent 50 messages for a user’s inbox is at most two reads; viewing the most recent 100 messages is at most three reads.

Normally a user’s entire inbox will exist on a single shard. However, it is possible that a few user inboxes could be spread across two shards. Because our application will probably page through a user’s inbox, it is still likely that every query for these few users will be routed to a single shard.

Fan out on Write with Buckets is generally the most scalable approach of the these three designs for social inbox applications. Every design presents different trade-offs. In this case viewing a user’s inbox is very efficient, but writes are somewhat more complex, and more disk space is consumed. For many applications these are the right trade-offs to make.

Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

Fan out on Read
Fan out on Write
Fan out on Write with Buckets
Send Message Performance
Best
Single write
Good
Shard per recipient
Multiple writes
Worst
Shard per recipient
Appends (grows)
Read Inbox Performance
Worst
Broadcast all shards
Random reads
Good
Single shard
Random reads
Best
Single shard
Single read
Data Size
Best
Message stored once
Worst
Copy per recipient
Worst
Copy per recipient


Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

blog comments powered by Disqus