MongoDB Security Part II: 10 mistakes that can compromise your database

Jun 3 • Posted 4 months ago

This is the second in our 3-part series on MongoDB Security by Andreas Nilsson, Lead Security Engineer at MongoDB

This post outlines 10 things to avoid when configuring security for MongoDB. These types of mistakes can lead to the loss of sensitive data, disrupted operations and have the potential to put entire companies out of business. These recommendations are based on my experience working with MongoDB users, and building security systems for databases and financial services organizations. Items are ordered by a combination of severity and frequency.

Mistake #1: Directly exposing a MongoDB server to the Internet

It is surprisingly common to deploy MongoDB database servers directly online or in a DMZ. The MongoDB server network interface is designed to be stable under rogue conditions but exposing the database to the Internet is an unnecessary risk. This holds true for any backend system and is not specific for MongoDB.

If public network exposure is combined with lack of access control, the entire content of the database is up for grabs for anyone who cares to look. In addition, an attacker could intentionally or accidentally change the database configuration, modify the application behavior or perform a Denial of Service (DoS) attack.

Recommendation

Design web applications with a multi-tier architecture in mind, use firewalls to segment the network layers appropriately, and place your database server at the inner data storage layer.

Mistake #2: No access control

Access control is not enabled by default when installing MongoDB. If access control is not enabled, anyone with network access to the server can perform any operation. This includes extracting all data and configuration, running arbitrary Javascript using the eval command, modifying any data in the database, creating and removing shards, etc.

Recommendation

Always enable access control, unless it is guaranteed that no untrusted entities will gain access to the server, see the section on Role-Based Access Control in the MongoDB security manual.

Mistake #3 - Not enabling SSL

It is fairly straightforward to protect the network communication using SSL, and enabling SSL in MongoDB does not impact the database performance. SSL also protects against man-in-the-middle attacks, where an attacker would intercept and modify communication between two parties.

Recommendation

Enable SSL to protect network communication against eavesdropping between the clients and the servers and within MongoDB clusters and replica sets.

Mistake #4 - Unnecessary exposure of interfaces

MongoDB ships with an HTTP server and REST interface. By default this interface is turned off in MongoDB 2.6. Do not enable the HTTP server interface unless it is used for backwards compatibility. Instead use the wire API for communication with the server.

We also recommend only binding to necessary network interfaces and turn off server side Javacript execution if not needed.

Recommendation

Run MongoDB with secure configuration options as described in the documentation.

Mistake #5 - Poor user account configuration

There are a few ways to configure user accounts incorrectly, for MongoDB as well as for other systems. These include but are not limited to:

  • Use of a single high-privilege admin account for all purposes.
  • Granting high privileges and roles to users who do not need them.
  • Use of weak passwords or the same password for multiple accounts.
  • Orphaned user accounts belonging to decommissioned users or applications.

Recommendation

Use the principle of least privilege when configuring user accounts and utilize the flexibility available in the MongoDB access control system. Use unique, complex passwords and be mindful to decommission deprecated user accounts.

Mistake #6 - Insecure OS privileges

Running the mongod or mongos processes using a non-dedicated, high-privilege account like root puts your Operating System at unnecessary risk. Instead use a dedicated, special purpose account.

Avoid lax, insecure OS file permissions on * Data files * Keyfile * SSL private key files * Log files

Recommendation

Database data files, the keyfile and SSL private key files should only be readable by the mongod/mongos user. Log files should only be writable by the mongod/mongos user and readable only by root.

Mistake #7 - Insecure replica set keyfile configuration

The content of the keyfile used for authentication in sharded clusters and replicasets is in essence a password and should as such be long and of high entropy. Avoid:

  • Short or low-entropy password in the keyfile
  • Inadequate protection of the keyfile

Recommendation

Use a long, complex password if using a keyfile and protect it using adequate file permissions.

Mistake #8 - Poor SSL configuration

SSL is a complex protocol that needs to be configured properly to avoid leaving unexpected security holes.

Recommendation

Always provide MongoDB servers or the mongo shell with one or several CA certificates to establish a basis of trust.

Do not use self-signed certificates unless you are only looking for the encryption parts of SSL. Using a self-signed certificate invalidates substantial parts of SSL. Use certificates issued by a commercial or internal Certificate Authority.

Avoid using the same certificate across servers or clients. This exposes the private key in multiple places and unless a wildcard (*) certificate is used no hostname validation can be performed.

Mistake #9 - Unprotected backups

Care should be taken to adequately protect backup files generated by copying the data files or using the mongodump tool. If the content of the database is sensitive, the content of the backup is equally sensitive and should be treated as such.

Recommendation

Treat database backup files with the same level of care as the original database storage files.

Mistake #10 - Conscious or unconscious ignorance

A guaranteed way to create an insecure system is to ignore the topic altogether, or hope someone else thinks about it.

Recommendation

Before deploying a MongoDB instance with sensitive data, please consult the MongoDB Security Manual and the MongoDB Security Architecture Whitepaper and stay conscious about potential threats to your application.

MongoDB subscriptions provide access to additional enterprise grade capabilities. The subscription includes all the ease-of-use, broad driver support and scalability features of MongoDB, while addressing the more demanding security and certification requirements of corporate and government information security environments. To see more, download the development version of the MongoDB Enterprise edition here

6 Rules of Thumb for MongoDB Schema Design: Part 1

May 29 • Posted 4 months ago

By William Zola, Lead Technical Support Engineer at MongoDB

“I have lots of experience with SQL, but I’m just a beginner with MongoDB. How do I model a one-to-N relationship?” This is one of the more common questions I get from users attending MongoDB office hours.

I don’t have a short answer to this question, because there isn’t just one way, there’s a whole rainbow’s worth of ways. MongoDB has a rich and nuanced vocabulary for expressing what, in SQL, gets flattened into the term “One-to-N”. Let me take you on a tour of your choices in modeling One-to-N relationships.

There’s so much to talk about here, I’m breaking this up into three parts. In this first part, I’ll talk about the three basic ways to model One-to-N relationships. In the second part I’ll cover more sophisticated schema designs, including denormalization and two-way referencing. And in the final part, I’ll review the entire rainbow of choices, and give you some suggestions for choosing among the thousands (really — thousands) of choices that you may consider when modeling a single One-to-N relationship.

Many beginners think that the only way to model “One-to-N” in MongoDB is to embed an array of sub-documents into the parent document, but that’s just not true. Just because you can embed a document, doesn’t mean you should embed a document.

When designing a MongoDB schema, you need to start with a question that you’d never consider when using SQL: what is the cardinality of the relationship? Put less formally: you need to characterize your “One-to-N” relationship with a bit more nuance: is it “one-to-few”, “one-to-many”, or “one-to-squillions”? Depending on which one it is, you’d use a different format to model the relationship.

Basics: Modeling One-to-Few

An example of “one-to-few” might be the addresses for a person. This is a good use case for embedding — you’d put the addresses in an array inside of your Person object:

This design has all of the advantages and disadvantages of embedding. The main advantage is that you don’t have to perform a separate query to get the embedded details; the main disadvantage is that you have no way of accessing the embedded details as stand-alone entities.

For example, if you were modeling a task-tracking system, each Person would have a number of Tasks assigned to them. Embedding Tasks inside the Person document would make queries like “Show me all Tasks due tomorrow” much more difficult than they need to be. I will cover a more appropriate design for this use case in the next post.

Basics: One-to-Many

An example of “one-to-many” might be parts for a product in a replacement parts ordering system. Each product may have up to several hundred replacement parts, but never more than a couple thousand or so. (All of those different-sized bolts, washers, and gaskets add up.) This is a good use case for referencing — you’d put the ObjectIDs of the parts in an array in product document. (For these examples I’m using 2-byte ObjectIDs because they’re easier to read: real-world code would use 12-byte ObjectIDs.)

Each Part would have its own document:

Each Product would have its own document, which would contain an array of ObjectID references to the Parts that make up that Product:

You would then use an application-level join to retrieve the parts for a particular product:

For efficient operation, you’d need to have an index on ‘products.catalog_number’. Note that there will always be an index on ‘parts._id’, so that query will always be efficient.

This style of referencing has a complementary set of advantages and disadvantages to embedding. Each Part is a stand-alone document, so it’s easy to search them and update them independently. One trade off for using this schema is having to perform a second query to get details about the Parts for a Product. (But hold that thought until we get to denormalizing in part 2.)

As an added bonus, this schema lets you have individual Parts used by multiple Products, so your One-to-N schema just became an N-to-N schema without any need for a join table!

Basics: One-to-Squillions

An example of “one-to-squillions” might be an event logging system that collects log messages for different machines. Any given host could generate enough messages to overflow the 16 MB document size, even if all you stored in the array was the ObjectID. This is the classic use case for “parent-referencing” — you’d have a document for the host, and then store the ObjectID of the host in the documents for the log messages.

You’d use a (slightly different) application-level join to find the most recent 5,000 messages for a host:

Recap

So, even at this basic level, there is more to think about when designing a MongoDB schema than when designing a comparable relational schema. You need to consider two factors:

  • Will the entities on the “N” side of the One-to-N ever need to stand alone?
  • What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions?

Based on these factors, you can pick one of the three basic One-to-N schema designs:

  • Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
  • Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
  • Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions

Next time we’ll see how to use two-way relationship and denormalizing to enhance the performance of these basic schemas.

Appboy’s co-founder and CIO Jon Hyman discusses how the leading platform for app marketing automation leverages MongoDB and ObjectRocket for real-time data aggregation and scale and gives a preview of his talk with Kenny Gorman of ObjectRocket at MongoDB World.

Want to see more? MongoDB World will feature over 80 MongoDB experts from around the world. Early bird ticket prices for the event end May 23. Register now to grab your seat

MongoDB Security Part 1 - Design and Configuration

May 21 • Posted 5 months ago

By Andreas Nilsson, Lead Security Engineer at MongoDB

With increased regulatory compliance, heightened concerns around privacy and growing risk from hackers and organized crime, the need to secure access to data has never been more urgent.

MongoDB 2.6 provides a number of features to facilitate building secure applications, such as auditing and authentication with Kerberos and LDAP. MongoDB now features a more competent and complete role-based access control system, x.509 authentication, an improved SSL stack and revamped security documentation.

In a short series of blog posts I will attempt to explain the philosophy and design of the security model of MongoDB. The first post covers the basics of securing a MongoDB server and application and gives an overview of the options available. The second post lists the most common security mistakes when configuring MongoDB. The final post is a deep dive into the authentication and authorization subsystems, specifically covering sharded systems with multiple databases and how to use the new Role-Based Access Control system.

For details of the MongoDB security architecture and a complete list of features, please refer to the MongoDB Security Architecture Whitepaper.

Security Model

Before going into details let’s start with the basics. A database security model needs to offer a basic set of features:

  • control of read and write access to data
  • protection of the integrity and confidentiality of the data stored.
  • control of modifications to the database system configuration.
  • privilege levels for different user types, administrators, applications etc.
  • auditing of sensitive operations.
  • stable and secure operation in a potentially hostile environment.

These security requirements can be achieved in different ways. A disconnected database server in a locked room would constitute a secure deployment regardless of how the database was being protected. However, it would not be very useful. A database is often placed unprotected on a “secured”, internal network. This is an idealized scenario since no network is entirely secure, architecture changes over time, and a considerable number of successful breaches are from internal sources. A defense-in-depth approach is therefore recommended when implementing your application’s infrastructure. While MongoDB’s newest security features help to improve your overall security posture, security is only as strong as the weakest link in the chain.

A conceptual view of the MongoDB security architecture is represented in the image below. The security model is divided into the four pillars of authentication, authorization, auditing and encryption.

Secure Your Deployment

In the discussion below we will assume a simple web application which reads from and writes to a replicated database. Each of the steps below is described in detail in the documentation. The steps are described in greater detail in the MongoDB Security Checklist. The focus is primarily on enabling access control and transport encryption.

Infrastructure Prerequisites

Design the application to work in a multilayer fashion, and place the database server(s) on a dedicated network segment, isolated from the DMZ where the web application resides. Configure firewall rules to limit network access to the database server. Lock down MongoDB user and file permissions. The database files should be protected from unauthorized access and the mongod/mongos daemons should be running with minimum privileges, specifically not as root.

Enable Access Control

By default there are no users configured in MongoDB. In order to enable access control users needs to be created and assigned appropriate privileges. Access control is enabled using the command line —auth or —keyfile flags. When access control is enabled, clients and drivers are required to authenticate to the server, and servers are required to authenticate to each other.

The following is the recommended series of steps to enable access control.

Design Determine which type of authentication methods to use for client authentication. The options are challenge-response based username/password (MONGODB-CR), x.509, LDAP and Kerberos. Determine which type of authentication method to use for server-server cluster authentication. The options are username/password and x.509. Please note that x.509 requires the use of SSL. List the different user types that will exist in the system; administrators, support staff, different type of application users etc. For each of the user types, determine which built-in roles are required. Optionally create customized roles tailored for your deployment.

Deployment Configure a keyfile or x.509 certificates for the cluster nodes. Start the mongod servers with the —auth flag set and appropriate cluster authentication options as determined in 3. Start mongos servers with appropriate cluster authentication options. Create the desired users with correct permissions. Please note that after creating the first user access control is enabled. Therefore, at a minimum the first user should have the userAdminAnyDatabase or root roles in order to be able to create other users.

Enable Transport Encryption

In order to protect the network traffic, SSL should be enabled between clients and the server and in between servers. Enabling SSL is well described in the security documentation http://docs.mongodb.org/master//tutorial/configure-ssl/

MongoDB supports the use of any server SSL certificate as long as the corresponding root CA certificate is provided with the configuration parameter —sslCAFile. If no CA certificate is specified, no certificate validation is performed and the certificate is only used for encryption purposes. Although supported, use of self-signed certificates is not recommended, since there is no basis for trust, and hence no certificate validation can be performed.

There are several different ways to configure SSL with MongoDB. Mutual Validation In MongoDB 2.4 the recommended configuration was to issue SSL server certificates to the server, and to the clients connecting to enable mutual validation. The server certificate is validated against the CA certificate file provided on the client side, and the client certificates are validated against the CA certificate provided on the server side. However no authentication is performed.

In order to allow clients to connect without a certificate, the server can be started with the command line flag —sslWeakCertificateValidation.

Mutual validation is still supported but in MongoDB 2.6 several new SSL options are included. X.509 Authentication If SSL is enabled clients can use the new authentication mechanism MONGODB-X509 to authenticate using x.509 certificates.

It is also possible to use x.509 authentication between the servers in a cluster. From a security perspective this is a great improvement to the default keyfile solution.

Mixed mode SSL MongoDB 2.6 introduces the option of mixing encrypted and non-encrypted connections. That is, the server will listen for and detect both SSL and non-SSL inbound connections.

This feature enables cluster members running SSL to talk to non-SSL nodes and vice versa. It also enables a rolling configuration “upgrade” from a non-SSL to an SSL cluster without downtime.

The mixed mode behavior is controlled by the —sslMode parameter. From a security hardening perspective SSL mixed mode should be turned off, unless explicitly needed for one of the two scenarios discussed above.

Disable Unused Exposed Interfaces

Disable sensitive interfaces and functionality that is not needed.

MongoDB supports the execution of JavaScript code for certain server-side operations: mapReduce, group, eval, and $where. If you do not use these operations, disable server-side scripting by setting —noscripting to true.

Use only the MongoDB wire protocol on production deployments. The following interfaces are disabled by default in MongoDB 2.6: —httpinterface, —jsonp, and —rest. Leave these disabled, unless required for backwards compatibility. If using MongoDB 2.4 disable the HTTP interface using —nohttpinterface.

The bind_ip setting for mongod and mongos instances limits the network interfaces on which MongoDB programs will listen for incoming connections. Configure the server only to bind on desired interfaces.

Auditing

MongoDB Enterprise logs all administrative actions made against the database. Schema operations (such as creating or dropping databases, collections and indexes), replica set reconfiguration events along with authentication and authorization activities are all captured, along with the administrator’s identity and timestamp of the operation, enabling compliance and security analysis.

By default, MongoDB auditing logs all administrative actions, but can also be configured with filters to capture only specific events. The audit log can be written to multiple destinations in a variety of formats including to the console and syslog (in JSON format), and to a file (JSON or BSON), which can then be loaded to MongoDB and analyzed to find relevant events.

MongoDB Maintains an Audit Trail of Administrative Actions Against the Database

Each MongoDB server logs events to its local destination. The DBA can then merge these into a single log, enabling a cluster-wide view of operations that affected multiple nodes.

The MongoDB auditing documentation includes information on how to configure auditing and all of the operations that can be captured.

Summary

Securing the database layer of an application is a necessary step to protect the data from unauthorized access. MongoDB offers a flexible and competent security model but as with all security solutions, care should be take to enable and configure the system correctly.

In part 2, we will closely examine some common configuration mistakes and security pitfalls based on a number of existing MongoDB deployments and users.

MongoDB subscriptions provide access to additional enterprise grade capabilities. The subscription includes all the ease-of-use, broad driver support and scalability features of MongoDB, while addressing the more demanding security and certification requirements of corporate and government information security environments. To see more, download the development version of the MongoDB Enterprise edition here

Powering Social Insights with MongoDB at uberVu

May 19 • Posted 5 months ago

This is a guest post from the uberVU team.

Today, more than ever, marketers are being asked to show how their financial investments are driving tangible business results. We help them accomplish that goal. uberVU is a real-time social media marketing platform that allows organizations to leverage their social media data to better understand, connect with, and grow their online communities. We have an extensive client list including enterprise customers such as Heinz, NBC, World Bank, and Fujitsu.

We were recently acquired by HootSuite, and together our two products offer a complete and integrated feature set that addresses the entire social media lifecycle:

  • monitoring
  • metrics
  • reporting
  • engagement
  • collaboration

We have a five-year history in the social media monitoring market, and our evolving data storage architecture has played a key role in elevating our application’s value and user experience. For our data handling needs, we started with Tokyo Cabinet, SimpleDB, and MySQL and now use MongoDB, DynamoDB, S3, Glacier, and ElasticSearch.

Originally our team intended to use MongoDB as a secondary data store, but after a short implementation and adoption period of 3 months in which it quickly gained traction internally, MongoDB was promoted to our primary data store.

Challenges

We collect and store social media content such as tweets, Facebook posts, blog posts, blog comments, etc. Each item is stored in the database as a separate document.

A stored tweet might look something like this:

{
    generator: ‘twitter’,
    content: ‘This is a tweet example for ubervu’,
    author: ‘Vladimir Oane’,
    gender: ‘male’,
language: ‘english’,
    sentiment: ‘positive’,
    search: ‘ubervu’,
    published: 1391767879,
    ...
}

For our clients, relevant social media content must contain or match a predefined expression of interest in the designated ‘search’ field. In the example above, the tweet is collected and stored because it contains the string ‘ubervu’ in the content body.

Unique Index Structure

Our most common use case with MongoDB is performing a range query over a time frame for a fixed expression. For example, we might want to retrieve social media content that contains or matches the expression ‘ubervu’ between October 1st and November 23rd.

We constructed the unique index in MongoDB, _id, to perform this query automatically. For space considerations, we opted for a 64 bit integer and divided it into three parts:

  1. A hash of the search expression
  2. The entire published timestamp, in seconds
  3. An item id, which together with the timestamp should uniquely identify a document

To conduct a search for all “ubervu” mentions between timestamp1 and timestamp2, we simply run a range query on “_id” between:

and:

Note above how the lowest and highest bounds have the item id portion filled entirely with 0’s and 1’s, respectively. This allows us to cover edge cases of items that fall between timestamp1 and timestamp2.

Efficient Filtering

Another very common use case is retrieving all the data that matches a criteria set. Within our application, the fields we can filter on are predefined (generator, language, sentiment, language, gender).

Efficient filtering is a challenge because the most obvious approach - creating indexes for every combination of filters - is not scalable as every added index costs storage space and has the potential to adversely affect write performance.

To improve query efficiency, we added an ‘attributes’ field into each document that consists of an encoded array of all the field values that can be used in a query. It looks like this:

attributes: [2041, 15, 178, 23 …]

Each numeric code in the array represents a property, such as “sentiment: positive” or “language: English”. We added an index over the ‘attributes’ field to speed up queries.

To retrieve all items matching a criteria set using the ‘attributes’ field, queries are run using the $all operator:

collection.find({attributes:{$all:[...]}}

A shortcoming of using the $all operator is that prior to MongoDB 2.6 the index is only used to match the first code in the ‘attributes’ array; all other codes must be retrieved from disk and matched with the rest of query criteria.

In an effort to reduce the number of documents that need to be checked from disk for each query, we developed a system that first sorts all numeric codes by the frequency that they appear in the data store and then orders the elements in the ‘attributes’ fields according to their ranking.

For example, the property “generator: Twitter” (representing all tweets) is more prevalent in the data store than the property “language: Romanian” (representing all content in Romanian). If we wanted to obtain all tweets written in Romanian from the database, it would be more efficient to place the numeric code representing “language: Romanian” first in the ‘attributes’ array as it is faster to retrieve all Romanian content from disk and check if they are tweets than to retrieve all tweets and check if they are written in Romanian.

This solution described above dramatically improved our query response time. MongoDB’s dynamic schema and rich query model made this possible.

Saving Storage Space

After realizing the fields in our documents would be relatively small in number and mostly consistent across the database, our team decided to impose a two character limit on all field names (“generator” became, “g”, “sentiment” became “s”, etc).

This small change saved us 16% of our storage space, without any loss in information.

Our Infrastructure Setup

We have taken full advantage of the cloud computing resources available to efficiently deploy and scale our offerings. Our current infrastructure resides entirely on the AWS stack.

We currently have 30+ instances deployed and over 30 terabytes of storage in permanent use. All EBS-backed production data stores currently reside on xfs RAID arrays.

This storage architecture provides us with not only volume redundancy, but also performance boosts, which were much needed in the beginning when provisioned IOPS was not yet available to ensure EBS performance.

Our MongoDB setup consists of six production clusters, each with its own unique scope and usage pattern.

Our MongoDB clusters all have the same topology:

  • Each shard (shardA, shardB … shardZ) consists of three member replica sets
  • Three ‘config server’ processes are deployed on separate instances
  • ‘mongos’ routing processes are spread throughout the whole system (webnode, API nodes, worker nodes, etc …)

Optimizing Performance and Redundancy with MongoDB

For us, relying on EBS-backed MongoDB clusters meant we had to familiarize ourselves with the concept of the working set, a number which represents the amount of data that is regularly accessed during day-to-day operations. In situations when the working set is larger than RAM, our application would be forced to read from disk, resulting in an immediate loss in performance due to EBS I/O latency. Now working sets can be estimated using the working set estimator, which was first introduced in MongoDB v2.4.

To prevent the working set from exceeding RAM, we first viewed our data usage patterns in Graphite:

The graphic above represents the ‘write’ working set for one of the clusters.

The graphic above represents the ‘read’ working set generated by our API.

Using our data usage information, we defined the following access patterns:

RecentDB HistoricalDB
Read Working Set 0-20 days 20-90 days
Write Working Set 0-20 days 0-90 days ( > 20 day data can be written directly here)

The architecture represented by the table above has been in place for more than three years and has proven itself on multiple occasions from both a performance and redundancy standpoint.

We currently use MongoDB Management Service (MMS) in addition to tools such as Graphite and Collectd for both monitoring and backup. These applications have been critical to managing our cloud-based cluster backups.

As our MongoDB-powered data store grew in size, the decision was made to implement an ‘inverted pyramid’ mechanism in an effort to provide the best possible response time while remaining cost efficient.

This mechanism relies on two main data stores, RecentDB and HistoricalDB, with the use of an in-house oplog replay tool that keeps the two clusters in sync.

The oplog - short for operations log - is a special capped collection that keeps a rolling record of all operations that modify the data stored in a MongoDB database, and is the basic mechanism that enables replication in MongoDB. Secondary nodes tail the oplog for new operations and replay them locally.

To implement the ‘inverted pyramid’ mechanism, we developed a process that connects source cluster shards (RecentDB) to a destination cluster mongos router instance, verifies the last written timestamp, tails the source oplog to that timestamp, and finally, replays all executed operations.

After taking into consideration our current settings and data volumes, we determined that an oplog replay timeframe of 72 - 96 hrs worked best for our clusters as it ensured there was enough time to counter any major failures at the cluster level (e.g. full replica sets downtime, storage replacements, etc).

In the current implementation, all 5 oplog processes (one per source shard) run on an administrative instance that is continually monitored for delays.

A key design step required in making our inverted pyramid possible was splitting our data store into five ‘segment’ databases, which are provisioned and depleted by two external jobs. This made it possible to drop data (at the db level) from the first two layers, RecentDB and HistoricalDB, in an orderly fashion without impacting any part of the application or compromising performance.

The last step of our data migration consists of offloading all data that passes the 90-day mark from the segment databases to S3. To accomplish this, each HistoricalDB secondary node is provisioned with two Python modules that parse through, collect, and export (in CSV format) all data older than 90 days. The legacy data is then uploaded into an S3 bucket and made available to other parts of our system.

An added benefit of our data architecture is the ability to use HistoricalDB on the off chance that a major issue impacts RecentDB. Although there is a storage space trade-off that comes with storing the data in the 0-20 day intervals on both clusters, having HistoricalDB on hand has proven useful for us in the past, with the AWS EAST-1 crash in the summer of 2012 being the most recent incident.

Conclusion

With MongoDB, we were able to quickly develop the query processes we needed to efficiently serve our customers, all on a flexible database architecture that stresses high performance and redundancy. MongoDB has been a partner that continues to deliver as we grow and tackle new challenges.

To learn more about how MongoDB can have a significant impact on your business, download our whitepaper How a Database Can Make Your Organization Faster, Better, Leaner.

Tiered Storage Models in MongoDB: Optimizing Latency and Cost

May 14 • Posted 5 months ago

By Rohit Nijhawan, Senior Consulting Engineer at MongoDB with André Spiegel and Chad Tindel

For a user-facing application, speed and uptime are critical to success. There are a number of ways you can tune your application and hardware setup to provide the best experience for your customers — the trick is doing so at optimal cost. Here we provide an example for improving performance and lowering costs with MongoDB using Tiered Storage, a method of prioritizing data storage based on latency requirements.

In this example, we will be segmenting data by date: recent data is more frequently accessed and should exhibit lower latency than less recent data. However, the idea applies to other ways of segmenting data, such as location, user, source, size, or other criteria. This approach takes advantage of a powerful feature in MongoDB called tag-aware sharding that has been available since MongoDB 2.2.

Example Application: Insurance Claims

In many applications, low-latency access to data becomes less important as data ages. For example, an insurance company might prioritize access to claims from the last 12 months. Users should be able to view recent claims quickly, but once claims are more than a year old they tend to be accessed much less frequently, and the latency requirements tend to become less demanding.

By creating tiers of storage with different performance and cost profiles, the insurance company can provide a better experience for users while optimizing their costs. Older claims can be stored in a storage tier with more cost-effective hardware such as commodity hard drives. More recent data can be stored in a high-performance storage tier that provides lower latency such as SSD. Because the majority of the claims are more than a year old, storing older data in the lower-cost tier can provide significant cost advantages. The insurance company can optimize their hardware spread across the two tiers, providing a great user experience at an optimized cost point.

The requirements for this application can be summarized as:

The trailing 12 months of claims should reside on faster storage tier Claims over a year old should move to slower storage tier Over time new claims arrive, and older claims need to move from the faster tier to the slower tier

For simplicity, throughout this overview, we’ll distinguish the claims data by “current” and “tier-2” data.

Building Your Own Process: An Operational Headache

One approach to these requirements is use periodic batch jobs: selecting data, loading it into the archive, and erasing it from the faster storage. However, this is inherently complex:

  • The move process must be carefully coded to fail gracefully. In the event that a load fails, you don’t want to delete the original data!
  • If the data to be moved is large, you may wish to throttle the operations.
  • If moves succeed partially, you have to retry the unfinished data.
  • Unless you plan on halting your application during the move (generally unacceptable), your application needs custom code to find the data before, during, and after the move.
  • Your application needs to understand the physical location of the data, which unnecessarily complicates your code to the partitioning logic.

Furthermore, introducing another custom component to your operations requires additional maintenance and monitoring.

It’s an operational headache that many teams are forced to endure, but there is a simpler way: have MongoDB handle the load of migrating documents from the recent storage machines to the tier 2 storage machines, transparently. As it turns out, you can easily implement this approach with a feature called Tag-Aware Sharding.

The MongoDB Way: Tag-aware Sharding

MongoDB provides a feature called sharding to scale systems horizontally across multiple machines. Sharding is transparent to your application - whether you have 1 or 100 shards, your application code is the same. For a comprehensive description of sharding please see the Sharding Guide.

A key component of sharding is a process called the balancer. As collections grow, the balancer operates in the background to carefully move documents between shards. Normally the balancer works to achieve a uniform distribution of documents across shards. However, with tag-aware sharding we can create policies that affect where documents are stored. This feature can be applied in many use cases. One example is to keep user data in data centers that are near the user. In our application, we can use this feature to keep current data on our fast servers, and tier 2 data on cheaper, slower servers.

Here’s how it works:

  • Shards are assigned tags. A tag is an alphanumeric alias like “London-DC”.
  • Unique shard key ranges are ‘pinned’ to tags.
  • During normal balancing operations, chunks migrate only to shards whose tag is associated with a key range which contains the chunk’s key range*.
  • There are a few subtleties regarding what happens when a chunk’s key range overlaps more than one tag range. Please read the documentation carefully regarding this particular case

This means that we can assign the “tier-2” tag to shards running on slow servers and “current” tags to shards running on fast servers, and the balancer will handle migrating the data between tiers automatically. What’s great is that we can keep all the data in one database, so our application code doesn’t need to change as data moves between storage tiers.

Determining the shard key

When you query a sharded collection, the query router will do its best to only inspect the shards holding your data, but it can only do this if you provide the shard key as part of your query. (See Sharded Cluster Query Routing for more information.)

So we need to make sure that the we look up documents by the shard key. We also know that time is the basis for determining the location of documents in our two storage tiers. Accordingly, the shard key must contain an explicit timestamp. In our example, we’ll be using Enron’s email dataset, and we’ll set the top-level “date” as the shard key. Here’s a sample document:

Because the time is stored in the most significant digits of the date, messages from any given day will numerically precede messages from subsequent days.

Implementation

Here are the the steps to set up this system:

Set up an empty, sharded MongoDB cluster Create a target database to host the sharded collection Assign tags to different shards corresponding to the storage tiers Assign tag ranges to the shards Load data into the MongoDB Cluster

Set up the cluster The first thing you will want to do is set up your sharded cluster. You can see more information on how to set this up here.

In this case we will have a database called “enron” and a collection called “messages” which holds part of the Enron email corpus. In this example, we’ve set up a cluster with three shards. The first, shard0000, is optimized for low-latency access to data. The other two, shard0001 and shard0002, use more cost effective hardware for data that is older than the identified cutoff date.

Here’s our sharded cluster. These are empty machines with no data:

Adding the tags We can “tag” each of these shards to associate them with documents that should belong to our “current” tier or those that should belong to “tier-2.” In the absence of tags and range based tags, balancing will try to ensure that the number of chunks on each shard are equal without regard to any other data in the fields. Before we add the data to our collection, let’s tag shard0000 as “current” and the other two as “tier-2”:

Now we can verify our tags by calling sh.status():

Next, we need to set up a database and collection for the Enron emails. We’ll set up a new database ‘enron’ with a collection called ‘messages’ and enable sharding on that collection:

Since we’re going to shard the collection, we’ll need to set up a shard key. We will use the ‘date’ field as our shard key since this is the field that will define how the documents are distributed across shards:

Defining the cutoff date between tiers The cutoff point between “current” data and “tier-2” data is a point in time that we will update periodically to keep the most recent documents in our “current” shard. We will start with a cutoff of July 1, 2001, saved as an ISO Date ISODate(“2001-07-01”). Once we add the data to our collection, we will set this as the tag range. Going forward, when we add documents to the “messages” collection, any documents newer than July 1, 2001 will end up on the “current” shard, and documents older than that will end up on the “tier-2” shard.

It’s important that the two ranges overlap at exactly the same point in time. The lower bound of a tag range is inclusive, and the upper bound is exclusive. This means a document that has an date of exactly ‘ISODate(“2001-07-01”)’ will go on the “current” shard, not the “tier-2” shard.

Below you will see each of the shard’s new tag ranges:

As a final check, look in the config database for the tag range definitions.

Now, that all the shards and ranges are defined, we are ready to load the message data into the server. The collection will follow the instructions given by the tag ranges and land on the correct machines.

Now, let’s check the sharding status to see where the documents reside

That’s it! The mongos process automatically moves documents to comply with the tag ranges. In this example, it took all documents still on the “current” shard with an ISODate older than ISODate(“2001-07-01T00:00:00Z”) and move them to the “tier-2” shard.

The tag ranges must be updated on a regular basis to keep the cutoff point at the correct interval of time in the past (1 year, in our case). In order to do this, both ranges need to be updated. To perform this change the balancer should temporarily be disabled, so there is no point where the ranges overlap. Stopping the balancer temporarily is a safe operation - it will not affect the application or the experience of users.

If you wanted to move the cutoff back another month, to August 1, 2001, you just need to follow these three steps:

Stop the balancer sh.setBalancerState(false) Create a chunk split at August 1 sh.splitAt('enron.messages', {"date" : ISODate("2001-08-01")}) Move the cutoff date to ISODate(“2001-08-01T00:00:00Z”) var configdb=db.getSiblingDB("config"); configdb.tags.update({tag:"tier-2"},{$set:{'max.date':ISODate("2001-08-01")}}) configdb.tags.update({tag:"current"},{$set:{'min.date':ISODate("2001-08-01")}}) Re-start the balancer sh.setBalancerState(true) Verify the sharding status

By updating the chunk split to August 1, we have migrated all the documents with a date after July 1 but before August 1 from the “current” shard to the “tier-2” shards. The good news is that we were able to perform this operation without changing our application code and with no database downtime. We can also see that it would be simple to schedule this process to run automatically through an external process.

From Operational Headache to Simplicity

The end result is one collection spread across three shards and two different storage systems. This solution allows you to lower your storage costs without adding complexity to the architecture of your system. Instead of a complex setup with different databases on different machines we have one database to query, and instead of a data migration we update some simple rules to control the location of data in the system.

Like what you see? Sign up for the MongoDB Newsletter

Introducing mtools

May 8 • Posted 5 months ago

By Thomas Rueckstiess, Kernel Program Manager at MongoDB

mtools is a collection of helper scripts, implemented in Python, to parse and filter MongoDB log files (both for mongod and mongos), to visualize information from log files and to quickly set up complex MongoDB test environments on a local machine.

I started working on mtools a year ago, when I realized I would automate and script most of my daily tasks as an Engineer at MongoDB. Since then, mtools has grown to a suite of flexible, useful command line tools that are being used by many of our Engineers internally, as well as MongoDB customers and users, to diagnose the root cause of system issues.

If you find yourself looking at MongoDB log files to identify system and performance issues, then I encourage you to try mtools as well.

What’s in the box?

mtools in its current version 1.1.4 consists of 5 individual scripts: mloginfo, mlogfilter, mplotqueries, mlogvis and mlaunch.

  • mloginfo should be your first stop on the log file analysis. This script will parse the file quickly and output general information about its contents, including start and end date and time, line numbers, version and whether the file came from a mongos or mongod (if available in the file). In addition, you can request certain “sections” of additional information; currently those are “queries”, “connections”, “restarts” and “distinct”.

  • mlogfilter helps to narrow down the search in log files. The script lets you filter on attributes of log messages, like their namespace (database and collection names), their type of operation (queries, inserts, updates, commands, etc.) or by individual connection. You can also search for slow operations by setting a threshold, identify collection scans (those are the queries not using an index) and other properties. Additional features include slicing the log files by time (with flexible date/time parsing), merging files, shifting them to different time zones or converting timestamp formats, and exporting them to JSON. The key property of mlogfilter is that the output format always remains the same (log lines), so you can pipe the output to another instance of mlogfilter, to the grep command or to other scripts like mplotqueries.

  • mplotqueries takes a log file (mlogfiltered or not) and presents the information visually in various ways. There are a number of options for graph types, such as scatter plots (showing all operations over time vs. their duration), histograms, event and range plots, and other more specialized graphs like connection churn or replica set changes. Independent of graph type, you can assign a specific color to different class categories.

  • mlogvis is mplotqueries’ little brother, it is very similar in its functionality, but provides a web-based alternative using the d3.js javascript visualization library. This is particularly useful if the dependencies required by mplotqueries are not installed/available, or if you want to create a self-contained interactive graph that can be shared with others, such as customers or colleagues. mlogvis will create a single .html file that can be shared, since it loads the d3.js library dynamically.

  • mlaunch is a little different from the other scripts, and actually has nothing to do with log file parsing. mlaunch spins up any number of mongodb nodes on your local machine, either as a stand-alone, as replica sets or sharded clusters. This is useful if you want to do testing or reproduce issues locally. Rather than setting this up manually, mlaunch will start the processes and connect the replica sets or shards together. Within a few seconds, you can have a complex environment running, like a 5 shard cluster, each shard consisting of a replica set, authentication enabled, and any kinds of individual flags you want to pass onto the processes. mlaunch also has options to start and stop individual instances or groups, and to view which ones are running in the current environment and which ones are down.

How does it work?

Rather than going through all the features of each of the scripts, I’d just like to demonstrate two basic use cases. For a full list of features you can visit the mtools wiki, which contains the manual and many usage examples.

Use Case 1: Profiling your Queries with mloginfo

You have a number of slow queries running against MongoDB that are affecting the performance of the database. To get an idea of where MongoDB is slowing down, as a first step take a look at the “queries” section of mloginfo. Here is an example output, created with the following command:

Each line (from left to right) shows the namespace, the query pattern, and various statistics of this particular namespace/pattern combination. The rows are sorted by the “sum” column, descending. Sorting by sum is a good way to see where the database spent most of its time overall. In this example, we see that around half the total time is spent on $ne-type queries in the serverside.scrum_master collection. $ne queries are known to be inefficient since these queries cannot use an index, resulting in a high number of documents scanned. In fact, all of the queries took at least 15 seconds (“min” column). The “count” column also shows that only 20 of the queries were issued, yet these queries contributed to a large amount of the total time spent, more than double the time of the 804 email queries on serverside.user.

When optimizing queries and indexes, starting from the top of this list is a good idea as these optimizations will result in the highest gains in terms of performance.

Use Case 2: Visualizing Log Files with mplotqueries

Another way of looking at query performance and other operations is to visualize them graphically. mplotqueries’ scatter plot (the default) shows the duration of any operation (y-axis) over time (x-axis) and makes it easy to spot long-running operations. The following plot is generated with

mplotqueries mongod.log

and then press L for “logarithmic” y-axis view:

While most of the operations are sub-second (below the 10^3 ms mark), the blue dots immediately stand out, reaching up to the hundreds and thousands of seconds. Clicking on one of the blue dots prints out the relevant log line to stdout:

The getlasterror command is used for write concern. In this case, it blocked until the write was replicated to a majority of nodes in the replica set, which took 16 minutes. That is of course an issue, and because this is a command and not a query (or the query part of an update), it didn’t show up in the previous use case with mloginfo --queries.

To investigate this further, we can overlay the current plot with an “rsstate” plot, that shows replica set status changes over time. The following two commands create an overlay of the two plots:

This shows that for each of the blocking “majority” getlasterrors, replica set members are unavailable. The red vertical lines represent a node being DOWN, preceding the yellow lines for a node being in SECONDARY state again, at which point the getlasterror commands finally succeed.

From here, the next step would be to look at all the log files of the replica set at one of the incidents and investigate why the secondaries became unavailable:

This last command merges the log files of the three replica set members by time, each line prefixed with the filename, slices out a 5-minute window at the first instance of the issue and prints the lines back to stdout.

What’s Next?

This should give you a sense of how to use mtools for diagnosing and debugging issues affecting your MongoDB system. You can organize and visualize data in a number of ways, form a hypothesis, filter out noise and dig deeper into issues affecting your deployment, all from MongoDB log files.

mtools contains many more useful features that our Support team uses daily in working through customer cases. The best way to learn how you can leverage these scripts is to download and install mtools and follow some of the examples on the mtools wiki page. mtools is open source and available for download on github. It is also in the PyPI package index and can be installed via pip. If you have any questions, bug reports or feature requests, simply go to the mtools github issues page and open an issue.

My colleague Asya Kamsky (from askasya.com) will show some more examples on how mtools can be useful for diagnosing and troubleshooting in her talk Diagnostics and Debugging at MongoDB World. I’ll be in the “Ask the Experts” sessions, so if you have any questions you can come ask in the Ask the Experts room. You can use my discount code “25ThomasRueckstiess” for 25% off tickets.

MongoDB’s New Bulk API

May 6 • Posted 5 months ago

By Christian Kvalheim, Driver Lead and Node.js Driver Maintainer at MongoDB

The New Bulk API

One of the core new features in MongoDB 2.6 is the new bulk write operations. All the drivers include a new bulk api that allows applications to leverage these new operations using a fluid style API. Let’s explore the API and how it’s implemented in the Node.js driver.

The API

The API has two core concepts. The ordered and the unordered bulk operation. The main difference is in the way the operations are executed in bulk. In the case of an ordered bulk operation, every operation will be executed in the order they are added to the bulk operation. In the case of an unordered bulk operation however there is no guarantee what order the operations are executed. Later we will look at how each is implemented.

Operations

You can initialize an ordered or unordered bulk operation in the following way.

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    var unordered = db.collection('documents').initializeUnorderedBulkOp();

Both the ordered and unordered instances are bulk operation objects that we can add insert, update and remove operations to. The following operations are valid.

updateOne (update first matching document)

```ordered.find({ a : 1 }).updateOne({$inc : {x : 1}});```

update (update all matching documents)

```ordered.find({ a : 1 }).update({$inc : {x : 2}});```

replaceOne (replace entire document)

```ordered.find({ a : 1 }).replaceOne({ x : 2});```

updateOne or upsert (update first existing document or upsert)

```ordered.find({ a : 2 }).upsert().updateOne({ $inc : { x : 1}});```

update or upsert (update all or upsert)

```ordered.find({ a : 2 }).upsert().update({ $inc : { x : 2}});```

replace or upsert (replace first document or upsert)

```ordered.find({ a : 2 }).upsert().replaceOne({ x : 3 });```

removeOne (remove the first document matching)

```ordered.find({ a : 2 }).removeOne();```

remove (remove all documents matching)

```ordered.find({ a : 1 }).remove();```

insert

```ordered.insert({ a : 5});```

What happens under the covers when you start adding operations to a bulk operation? Let’s take a look at the new write operations to see how it works.

The New Write Operations

MongoDB 2.6 introduces a completely new set of write operations. Before 2.6 all write operations where done using wire protocol messages at the socket level. From 2.6 this changes to using commands.

### Insert Write Command

The insert write commands allow an application to insert batches of documents. Here’s an example:

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

A couple of things to note. The documents field contains an array of all the documents that are to be inserted. The writeConcern field specifies what would have previously been a getLastError command that would follow the pre 2.6 write operations. In other words there is always a response from a write operation in 2.6. This means that w:0 has different semantics than what one is used to in pre 2.6. In the context w:0 basically means only return an ack without any information about the success or failure of insert operations.

Let’s take a look at the update and remove write commands before seeing the results that are returned when executing these operations in 2.6.

Update Write Command

There are some slight differences in the update write command in comparison to the insert write command. Here’s an example:

    {
        update: 'collection name'
      , updates: [{ 
            q: { a : 1 }
          , u: { $inc : { x : 1}}
          , multi: true/false
          , upsert: true/false
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

The main difference here is that the updates array is an array of update operations where each entry in the array contains the q field that specifies the selector for the update. The u contains the update operation. multi specifies if we will updateOne or updateAll documents that matches the selection. Finally upsert tells the server if it will perform an upsert if the document is not found.

Finally let’s look at the remove write command.

Remove Write Command

The remove write command is very similar to the update write command. Here’s an example:

    {
        delete: 'collection name'
      , deletes: [{ 
            q: { a : 1 }
          , limit: 0/1
        }, ...]
      , writeConcern: {
        w: 1, j: true, wtimeout: 1000
      }
      , ordered: true/false
    }

Similar to the update example, we can see that the entries in the deletes array contain documents with specific fields. The q field is the selector that will match which documents will be removed. The limit field sets the number of elements to be removed. Currently limit only supports two values, 0 and 1. The value 0 for limit removes all documents that match the selector. A value of 1 for limit removes the first matching document only.

Now let’s take a look at how results are returned for these new write commands.

Write Command Results

One of the best new aspects of the new write commands is that they can return information about each individual operation error in the batch. Results are efficient - only information about errors are returned as well as the aggregated counts of successful operations. Here’s an example of a comprehensive* result:

    {
      "ok" : 1,
      "n" : 0,
      "nModified": 1, (Applies only to update)
      "nRemoved": 1, (Applies only to removes)
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1  dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }      
    }

The two most interesting fields here are writeErrors and writeConcernError. If we take a look at writeErrors we can see how it’s an array of objects that include an index field as well as a code and errmsg. The field references the position of the failing document in the original documents, updates or deletes array allowing the application to identify the original batch document that failed.

The Effect of Ordered (true/false)

If ordered is set to true the write operation will fail on the first write error (meaning the first error that fails to apply the operation to memory). If one sets ordered to false the operation will continue until all operations have been executed (potentially in parallel), then return all the results. writeConcernError on the other hand does not stop the processing of a bulk operation if a document fails to be written to MongoDB.

It helps to think of writeErrors as hard errors and writeConcernError as a soft error.

The Special Case of w:0 As I mentioned previously, the semantics for w:0 changed for the write commands. The old style of write operations before 2.6 are a combination of a write wire message and a getLastError command. In the old style w:0 meant that the driver would not send a getLastError command after the write operation.

In 2.6 the new insert/update/delete commands will always respond. While w:0 would not return a result in versions of MongoDB before 2.6, in 2.6 and above it will. However it will truncate all the results and only return if the command ran successfully or failed.

As a result, if you execute.

    {
        insert: 'collection name'
      , documents: [{ a : 1}, ...]
      , writeConcern: {
        w: 0
      }
      , ordered: true/false
    }

All you receive from the server is the result

{ok : 1}

The Implication For The Bulk API

There are some implications to the fact that write commands are not mixed operations but either insert/update or removes. The Bulk API lets you mix operations and then merges the results back into a single result that simulates a mixed operations command in MongoDB. What does that mean in practice. Well let’s look at how node.js implements ordered and unordered bulk operations. Let’s use examples to show what happens.

Ordered Operations

Let’s take the following set of operations:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

When running in ordered mode the bulk API guarantees the ordering of the operations and thus will execute this as 5 operations one after the other:

    insert bulk operation
    update bulk operation
    insert bulk operation
    remove bulk operation
    insert bulk operation

We have now reduced the bulk API to performing single operations and your throughput suffers accordingly.

If we re-order our bulk operations in the following way:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.insert({ a: 2 });
    ordered.insert({ a: 3 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.find({ a : 2 }).remove();

The execution is reduced to the following operations one after the other:

    insert bulk operation
    update bulk operation
    remove bulk operation

Thus for ordered bulk operations the order of operations will impact the number of write commands that need to be executed and thus the throughput possible.

Unordered Operations

Unordered operations do not guarantee the execution order of operations. Let’s take the example from above:

    var ordered = db.collection('documents').initializeOrderedBulkOp();
    ordered.insert({ a : 1 });
    ordered.find({ a : 1 }).update({ $inc: { x : 1 }});
    ordered.insert({ a: 2 });
    ordered.find({ a : 2 }).remove();
    ordered.insert({ a: 3 });

The Node.js driver will collect the operations into separate type-specific operations. So we get.

    insert bulk operation
    update bulk operation
    remove bulk operation

In difference to the ordered operation these bulks all get executed in parallel in Node.js and the results then merged when they have all finished.

Takeaway

MongoDB as of 2.6 only allows batches of inserts, updates or removes and not a mixed batch containing all three of the operation types. When performing ordered bulk operation we need to keep this in mind to avoid the scenario above. However for an unordered bulk operation the missing mixed batch type in 2.6 does not impact performance.

Note: Although the Bulk API actually supports downconversion to 2.4 the performance impact is considerable as all operations are reduced to single write operations with a getLastError. It’s recommended to leverage this API primarily with 2.6 or higher.

Like what you see? Sign up to the MongoDB Newsletter and get monthly updates straight to your inbox

Betting the Farm on MongoDB

May 1 • Posted 5 months ago

This is a guest post by Jon Dokulil, VP of Engineering at Hudl. Hudl’s CTO, Brian Kaiser, will be speaking at MongoDB World about migrating from SQL Server to MongoDB

Hudl helps coaches win. We give sports teams from peewee to the pros online tools to make working with and analyzing video easy. Today we store well over 600 million video clips in MongoDB spread across seven shards. Our clips dataset has grown to over 350GB of data with over 70GB of indexes. From our first year of a dozen beta high schools we’ve grown to service the video needs of over 50,000 sports teams worldwide.

Why MongoDB

When we began hacking away on Hudl we chose SQL Server as our database. Our backend is written primarily in C#, so it was a natural choice. After a few years and solid company growth we realized SQL Server was quickly becoming a bottleneck. Because we run in EC2, vertically scaling our DB was not a great option. That’s when we began to look at NoSQL seriously and specifically MongoDB. We wanted something that was fast, flexible and developer-friendly.

After comparing a few alternative NoSQL databases and running our own benchmarks, we settled on MongoDB. Then came the task of moving our existing data from SQL Server to MongoDB. Video clips were not only our biggest dataset, it was also our most frequently-accessed data. During our busy season we average 75 clip views per second but peak at over 800 per second. We wanted to migrate the dataset with zero downtime and zero data loss. We also wanted to have fail-safes ready during each step of the process so we could recover immediately from any unanticipated problems during the migration.

In this post we’ll take a look at our schema design choices, our migration plan and the performance we’ve seen with MongoDB.

Schema Design

In SQL Server we normalized our data model. Pulling together data from multiple tables is SQL’s bread-and-butter. In the NoSQL world joins are not an option and we knew that simply moving the SQL tables directly over to MongoDB and doing joins in code was a bad idea. So, we looked at how our application interacted with SQL and created an optimized schema in MongoDB.

Before I get into the schema we chose, I’ll try to provide context to Hudl’s product. Below is a screenshot of our ‘Library’ page. This is where coaches spend much of their time reviewing and analyzing video.

You see above a video playing and a kind of spreadsheet underneath. The video represents one angle of one clip (many of our teams film two or three angles each game). The spreadsheet contains rows of clips and columns of breakdown data. The breakdown data gives context to what happened in the clip. For example, the second clip was a defensive play from the 30 yard line. It was first and ten and was a run play to the left. This breakdown data is incredibly important for coaches to spot patterns and trends in their opponents play (as well as make sure they don’t have an obvious patterns that could be used against them).

When we translated this schema to MongoDB we wanted to optimize for the most-common operations. Watching video clips and editing clip metadata are our two highest frequency operations. To maximize performance we made a few important decisions.

  1. We chose to encapsulate an entire clip per document. Watching a clip would involve a single document lookup. Because MongoDB stores each document contiguously on disk, it would minimize the number of disk seeks when fetching a clip not in memory, which means faster clip loads.
  2. We denormalized our column names to speed up both writes and reads. Writes are faster because we no longer have to lookup or track Column IDs. A write operation is as simple as:
    db.clips.update({teamId:205, _id:123}, 
    {$set: {'data.PLAY TYPE':'Pass'}}) 
    Reads are also faster because we no longer have to join on the ClipDataColumn table to get the column names. This comes at a cost of greater storage and memory requirements as we store the same column names in multiple documents. Despite that, we felt the performance benefits were worth the cost.

One of the most important considerations when designing a schema in MongoDB is choosing a shard key. Have a good shard key is critical for effective horizontal scaling. Data is stored in shards (each shard is a replica set) and we can add new shards easily as our dataset grows. Replica sets don’t need to know about each other, they are only concerned with their own data. The MongoDB Router (mongos) is the piece that sees the whole picture. It knows which shard houses each document.

When you perform a query against a sharded collection, the shard key is not required. However, there is a cost penalty for not providing the shard key. The key is used to know which shard contains the answer to your query. Without it, the query has to be sent to all shards in your cluster. To illustrate this, I’ve got a four shard cluster. The shard key is TeamId (the property is named ‘t’), and you can see that clips belonging to teams 1-100 live on Shard 1, 101-200 live on Shard 2, etc. Given the query to find clip ‘123’, only Shard 3 will respond with results, but Shards 1, 2 and 4 must also process and execute the query. This is known as a scatter/gather query. In low volume this is ok, but you won’t see the benefits of horizontal scalability if every query has to be sent to all shards. Only when the shard key is provided can the query be sent directly to Shard 3. This is known as a targeted query.

For our Clips collection, we chose TeamId as our shard key. We looked at a few different possible shard keys:

  1. We considered sharding by clipId (_id) but decided against it because we let coaches organize clips into playlists (similar to a song playlist in iTunes or Spotify). While queries to all clips in a playlist are less common than grabbing an individual clip, they are common enough that we wanted it to use a targeted query.
  2. We also considered sharding by the playlist Id, but we wanted the ability for clips to be a part of multiple playlists. The shard key, once set, is immutable. Clips can be added or removed from playlists at any time.
  3. We finally settled on TeamId. TeamId is easily available to us when making the vast majority of our queries to the Clips collection. Only for a few infrequent operations would we need to use scatter/gather queries.

The Transition

As I mentioned, we needed to transition from SQL Server to MongoDB with zero downtime. In case anything went wrong, we needed fallbacks and fail-safes along the way. Our approach was two-fold. In the background we ran a process that ‘fork-lifted’ data from SQL Server to MongoDB. While that ran in the background, we created a multiplexed DAO (data access object, our db abstraction layer) that would only read from SQL but would write to both SQL and MongoDB. That allowed us to batch-move all clips without having to worry about stale data. Once the two databases were completely synced up, we switched over to perform all reads from MongoDB. We continued to dual-write so we could easily switch back to SQL Server if problems arose. After we felt confident in our MongoDB solution, we pulled the plug on SQL Server.

In step one we took a look at how we read and wrote clip data. That let us design an optimal MongoDB schema. We then refactored our existing database abstraction layer to use data-structures that matched the MongoDB schema. This gave us a chance to prove out the schema ahead of time.

Next we began sending write operations to both SQL and MongoDB. This was an important step because it allowed our data fork-lifting process work through all clips one after another while protecting us from data corruption.

The data fork-lifting process took about a week to complete. The time was due to both the large size of the dataset and our own throttling logic. We throttled the rate of data migration to minimize the impact on normal operations. We didn’t want coaches to feel any pain during this migration.

After the data fork-lift was complete we began the process of reading from MongoDB. We built in the ability to progressively send more and more read traffic to MongoDB. That allowed us to gain confidence in our code and the MongoDB cluster without having to switch all-at-once. After a while with dual writes but all MongoDB reads, we turned off dual writes and dropped the tables in SQL Server. It was both a scary moment (sure, we had backups… but still!) and very satisfying. Our SQL database size was reduced by over 80GB. Of that total amount, 20GB was index data, which means our memory footprint was also greatly reduced.

Performance

We have been thrilled with the performance of MongoDB. MongoDB exceeded our average performance goal of 100ms and, just as important, is consistently performant. While it’s good to keep an eye on average times, it’s more important to watch the 90th and 99th percentile performance metrics. With MongoDB our average clip load time is around 18ms and our 99th percentile times are typically at or under 100ms.

Clip load times during the same time period during season

Conclusion

Our transition from SQL Server to MongoDB started with our largest and most critical dataset. After having gone through it, we are very happy with the performance and scalability of MongoDB and appreciate how developer-friendly it is to work with. Moving from a relational to a NoSQL database naturally has a learning curve. Now that we are over it we feel very good about our ability to scale well into the future. Perhaps most telling of all, most new feature development at Hudl is done in MongoDB. We feel MongoDB lets us focus more on writing features to help coaches win and less time crafting database scripts.

Sign up for the MongoDB Newsletter to get MongoDB updates right to your inbox

The MongoDB Open Source Hack Contest

Apr 28 • Posted 5 months ago

Some of the best MongoDB tools come from the Open Source community. Projects like the Node.js Driver, Mongoose and Meteor have become the backbone of many MongoDB apps and have helped support the developer community all over the world. We want to see more of what the community has built.

For the month of May, we’ll be hosting a worldwide hack contest for Open Source tools built on or connected to MongoDB. The winner of the contest will receive a ticket to OSCON, furnished by O’Reilly.

Guidelines:

  • All projects must be built with the MongoDB source code or on top of a MongoDB API, community or MongoDB supported driver
  • Any new drivers created should abide by the driver requirements listed in the MongoDB Manual
  • All hacks will be judged by MongoDB engineers

All entries can be submitted to community@MongoDB.com with the following information before May 31

  • Github or Bitbucket URL
  • Description of the project
  • How do users benefit from this application?
  • Why did you choose to contribute to MongoDB?

We’re looking forward to seeing your hacks come in!

Want to keep up-to-date on MongoDB news and events? Sign up for the MongoDB Monthly Newsletter”

maxTimeMS() and Query Optimizer Introspection in MongoDB 2.6

Apr 23 • Posted 6 months ago

By Alex Komyagin, Technical Service Engineer in MongoDB’s New York Office

In this series of posts we cover some of the enhancements in MongoDB 2.6 and how they will be valuable for users. You can find a more comprehensive list of changes in our 2.6 release notes. These are changes I believe you will find helpful, especially for our advanced users.

This post introduces two new features: maxTimeMS and query optimizer introspection. We will also look at a specific support case where these features would have been helpful.

maxTimeMS: SERVER-2212

The socket timeout option (e.g. MongoOptions.socketTimeout in the Java driver) specifies how long to wait for responses from the server. This timeout governs all types of requests (queries, writes, commands, authentication, etc.). The default value is “no timeout” but users tend to set the socket timeout to a relatively small value with the intention of limiting how long the database will service a single request. It is thus surprising to many users that while a socket timeout will close the connection, it does not affect the underlying database operation. The server will continue to process the request and consume resources even after the connection has closed.

Many applications are configured to retry queries when the driver reports a failed connection. If a query whose connection is killed by a socket timeout continues to consume resources on the server, a retry can give rise to a cascading effect. As queries run in MongoDB, the application may retry the same resource intensive query multiple times, increasing the load on the system leading to severe performance degradation. Before 2.6 the only way to cancel these queries was to issue db.killOp, which requires manual intervention.

MongoDB 2.6 introduces a new queryparameter, maxTimeMS, which limits how long an operation can run in the database. When the time limit is reached, MongoDB automatically cancels the query. maxTimeMS can be used to efficiently prevent unexpectedly slow queries from overloading MongoDB. Following the same logic, maxTimeMS can also be used to identify slow or unoptimized queries.

There are a few important notes about using maxTimeMS:

  • maxTimeMS can be set on a per-operation basis — long running aggregation jobs can have a different setting than simple CRUD operations.
  • requests can block behind locking operations on the server, and that blocking time is not counted, i.e. the timer starts ticking only after the actual start of the query when it initially acquires the appropriate lock;
  • operations are interrupted only at interrupt points where an operation can be safely aborted — the total execution time may exceed the specified maxTimeMS value;
  • maxTimeMS can be applied to both CRUD operations and commands, but not all commands are interruptible;
  • if a query opens a cursor, subsequent getmore requests will be included in the total time (but network roundtrips will not be counted);
  • maxTimeMS does not override the inactive cursor timeout for idle cursors (default is 10 min).

All MongoDB drivers support maxTimeMS in their releases for MongoDB 2.6. For example, the Java driver will start supporting it in 2.12.0 and 3.0.0, and Python - in version 2.7.

To illustrate how maxTimeMS can come in handy, let’s talk about a recent case that the MongoDB support team was working on. The case involved performance degradation of a MongoDB cluster that is powering a popular site with millions of users and heavy traffic. The degradation was caused by a huge amount of read queries performing full scans of the collection. The collection in question stores user comment data, so it has billions of records. Some of these collection scans were running for 15 minutes or so, and they were slowing down the whole system.

Normally, these queries would use an index, but in this case the query optimizer was choosing an unindexed plan. While the solution to this kind of problem is to hint the appropriate index, maxTimeMS can be used to prevent unintentional runaway queries by controlling the maximum execution time. Queries that exceed the maxTimeMS threshold will error out on the application side (in the Java driver it’s the MongoExecutionTimeoutException exception). maxTimeMS will help users to prevent unexpected performance degradation, and to gain better visibility into the operation of their systems.

In the next section we’ll take a look at another feature which would help in diagnosing the case we just discussed.

Query Optimizer Introspection: SERVER-666

To troubleshoot the support case described above and to understand why the query optimizer was not picking up the correct index for queries, it would have been very helpful to know a few things, such as whether the query plan had changed and to see the query plan cache.

MongoDB determines the best execution plan for a query and stores this plan in a cache for reuse. The cached plan is refreshed periodically and under certain operational conditions (which are discussed in detail below). Prior to 2.6 this caching behavior was opaque to the client. Version 2.6 provides a new query subsystem and query plan cache. Now users have visibility and control over the plan cache using a set of new methods for query plan introspection.

Each collection contains at most one plan cache. The cache is cleared every time a change is made to the indexes for the collection. To determine the best plan, multiple plans are executed in parallel and the winner is selected based on the amount of retrieved results within a fixed amount of steps, where each step is basically one “advance” of the query cursor. When a query has successfully passed the planning process, it is added to the cache along with the related index information.

A very interesting feature of the new query execution framework is the plan cache feedback mechanism. MongoDB stores runtime statistics for each cached plan in order to remove plans that are determined to be the cause of performance degradation. In practice, we don’t see these degradations often, but if they happen it is usually a consequence of a change in the composition of the data. For example, with new records being inserted, an indexed field may become less selective, leading to slower index performance. These degradations are extremely hard to manually diagnose, and we expect the feedback mechanism to automatically handle this change if a better alternative index is present.

The following events will result in a cached plan removal:

  • Performance degradation
  • Index add/drop
  • No more space in cache (the total number of plans stored in the collection cache is currently limited to 200; this is a subject to change in future releases)
  • Explicit commands to mutate cache
  • After the number of write operations on a collection exceeds the built-in limit, the whole collection plan cache is dropped (data distribution has changed)

MongoDB 2.6 supports a set of commands to view and manipulate the cache. List all known query shapes (planCacheListQueryShapes), display cached plans for a query shape (planCacheListPlans), manual removal of a query shape from the cache as well as emptying the whole cache (planCacheClear).

Here is an example invocation of the planCacheListQueryShapes command, that lists the shape of the query

db.test.find({first_name:"john",last_name:"galt"},

{_id:0,first_name:1,last_name:1}).sort({age:1}):

> db.runCommand({planCacheListQueryShapes: "test"})
{
    "shapes" : [
        {
            "query" : {
                "first_name" : "alex",
                "last_name" : "komyagin"
            },
            "sort" : {
                "age" : 1
            },
            "projection" : {
                "_id" : 0,
                "first_name" : 1,
                "last_name" : 1
            }
        }
    ],
    "ok" : 1
}

The exact values in the query predicate are insignificant in determining the query shape. For example, a query predicate

{first_name:"john", last_name:"galt"} is equivalent to the query predicate {first_name:"alex", last_name:"komyagin"}.

Additionally, with log level set to 1 or greater MongoDB will log plan cache changes caused by the events listed above. You can set the log level using the following command:

use admin
db.runCommand( { setParameter: 1, logLevel: 1 } )

Please don’t forget to change it back to 0 afterwards, since log level 1 logs all query operations and it can negatively affect the performance of your system.

Together, the new plan cache and the new query optimizer should make related operations more transparent and help users to have the visibility and control necessary for maintaining predictable response times for their queries.

You can see how these new features work yourself, by trying our latest release MongoDB 2.6 available for download here. I hope you will find these features helpful. We look forward to hearing your feedback, please post your questions in the mongodb-user google group.

Like what you see? Get MongoDB updates straight to your inbox

MongoDB 2.6: Our Biggest Release Ever

Apr 8 • Posted 6 months ago

By Eliot Horowitz, CTO and Co-founder, MongoDB

Discuss on Hacker News

In the five years since the initial release of MongoDB, and after hundreds of thousands of deployments, we have learned a lot. The time has come to take everything we have learned and create a basis for continued innovation over the next ten years.

Today I’m pleased to announce that, with the release of MongoDB 2.6, we have achieved that goal. With comprehensive core server enhancements, a groundbreaking new automation tool, and critical enterprise features, MongoDB 2.6 is by far our biggest release ever.

You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.

From the very beginning, MongoDB has offered developers a simple and elegant way to manage their data. Now we’re bringing that same simplicity and elegance to managing MongoDB. MongoDB Management Service (MMS), which already provides 35,000 MongoDB customers with monitoring and alerting, now provides backup and point-in-time restore functionality, in the cloud and on-premises.

We are also announcing a game-changing feature coming later this year: automation, also with hosted and on-premises options. Automation will allow you to provision and manage MongoDB replica sets and sharded clusters via a simple yet sophisticated interface.

MongoDB 2.6 brings security, integration and analytics enhancements to ease deployment in enterprise environments. LDAP, x.509 and Kerberos authentication are critical enhancements for organizations that require a single authentication mechanism across their entire infrastructure. To enhance security, MongoDB 2.6 implements TLS encryption, user-defined roles, auditing and field-level redaction, a critical building block for trusted systems. IBM Guardium also now offers integration with MongoDB, providing more extensive auditing abilities.

These are only a few of the key improvements; read the full official release notes for more details.

MongoDB 2.6 was a major endeavor and bringing it to fruition required hard work and coordination across a rapidly growing team. Over the past few years we have built and invested in that team, and I can proudly say we have the experience, drive and determination to deliver on this and future releases. There is much still to be done, and with MongoDB 2.6, we have a foundation for the next decade of database innovation.

Like what you see? Get MongoDB updates straight to your inbox

MongoDB Innovation Awards: Call for Nominations

Apr 2 • Posted 6 months ago

Fortune 500 enterprises, startups, hospitals, governments and organizations of all kinds use MongoDB because it is the best database for modern applications. The MongoDB Innovation Awards recognize organizations and individuals who are changing the world with MongoDB.

Nominations are open to any individual representing an organization that runs MongoDB, including partner solutions built on or for MongoDB.

The deadline to submit is midnight eastern time on May 30. A panel of judges will review nominations. Winners will be announced at MongoDB World on June 23-25.

Each winner will receive:

  • MongoDB Innovation Award
  • Pass to MongoDB World
  • Invitation to award winners reception at MongoDB World
  • Inclusion in Innovation Awards press release, blog post and email newsletter
  • $2,500 Amazon Gift Card

Submit a nomination by completing this nomination form — you’ll receive a discount code for MongoDB World.

Running MongoDB Queries Concurrently With Go

Mar 24 • Posted 7 months ago

This is a guest post by William Kennedy, managing partner at Ardan Studios in Miami, FL, a mobile and web app development company. Bill is also the author of the blog GoingGo.Net and the organizer for the Go-Miami and Miami MongoDB meetups in Miami. Bill looked for a new language in 2013 that would allow him to develop back end systems in Linux and found Go. He has never looked back.

If you are attending GopherCon 2014 or plan to watch the videos once they are released, this article will prepare you for the talk by Gustavo Niemeyer and Steve Francia. It provides a beginners view for using the Go mgo driver against a MongoDB database.

Introduction

MongoDB supports many different programming languages thanks to a great set of drivers. One such driver is the MongoDB Go driver which is called mgo. This driver was developed by Gustavo Niemeyer from Canonical with some assistance from MongoDB Inc. Both Gustavo and Steve Francia, the head of the drivers team, will be talking at GopherCon 2014 in April about “Painless Data Storage With MongoDB and Go”. The talk describes the mgo driver and how MongoDB and Go work well together for building highly scalable and concurrent software.

MongoDB and Go let you build scalable software on many different operating systems and architectures, without the need to install frameworks or runtime environments. Go programs are native binaries and the Go tooling is constantly improving to create binaries that run as fast as equivalent C programs. That wouldn’t mean anything if writing code in Go was complicated and as tedious as writing programs in C. This is where Go really shines because once you get up to speed, writing programs in Go is fast and fun.

In this post I am going to show you how to write a Go program using the mgo driver to connect and run queries concurrently against a MongoDB database. I will break down the sample code and explain a few things that can be a bit confusing to those new to using MongoDB and Go together.

Sample Program

The sample program connects to a public MongoDB database I have hosted with MongoLab. If you have Go and Bazaar installed on your machine, you can run the program against my database. The program is very simple - it launches ten goroutines that individually query all the records from the buoy_stations collection inside the goinggo database. The records are unmarshaled into native Go types and each goroutine logs the number of documents returned:

Now that you have seen the entire program, we can break it down. Let’s start with the type structures that are defined in the beginning:

The structures represent the data that we are going to retrieve and unmarshal from our query. BuoyStation represents the main document and BuoyCondition and BuoyLocation are embedded documents. The mgo driver makes it easy to use native types that represent the documents stored in our collections by using tags. With the tags, we can control how the mgo driver unmarshals the returned documents into our native Go structures.

Now let’s look at how we connect to a MongoDB database using mgo:

We start with creating a mgo.DialInfo object. Connecting to a replica set can be accomplished by providing multiple addresses in the Addrs field or with a single address. If we are using a single host address to connect to a replica set, the mgo driver will learn about any remaining hosts from the replica set member we connect to. In our case we are connecting to a single host.

After providing the host, we specify the database, username and password we need for authentication. One thing to note is that the database we authenticate against may not necessarily be the database our application needs to access. Some applications authenticate against the admin database and then use other databases depending on their configuration. The mgo driver supports these types of configurations very well.

Next we use the mgo.DialWithInfo method to create a mgo.Session object. Each session specifies a Strong or Monotonic mode, and other settings such as write concern and read preference. The mgo.Session object maintains a pool of connections to MongoDB. We can create multiple sessions with different modes and settings to support different aspects of our applications.

The next line of code sets the mode for the session. There are three modes that can be set, Strong, Monotonic and Eventual. Each mode sets a specific consistency for how reads and writes are performed. For more information on the differences between each mode, check out the documentation for the mgo driver.

We are using Monotonic mode which provides reads that may not entirely be up to date, but the reads will always see the history of changes moving forward. In this mode reads occur against secondary members of our replica sets until a write happens. Once a write happens, the primary member is used. The benefit is some distribution of the reading load can take place against the secondaries when possible.

With the session set and ready to go, next we execute multiple queries concurrently:

This code is classic Go concurrency in action. First we create a sync.WaitGroup object so we can keep track of all the goroutines we are going to launch as they complete their work. Then we immediately set the count of the sync.WaitGroup object to ten and use a for loop to launch ten goroutines using the RunQuery function. The keyword go is used to launch a function or method to run concurrently. The final line of code calls the Wait method on the sync.WaitGroup object which locks the main goroutine until everything is done processing.

To learn more about Go concurrency and better understand how this particular piece of code works, check out these posts on concurrency and channels.

Now let’s look at the RunQuery function and see how to properly use the mgo.Session object to acquire a connection and execute a query:

The very first thing we do inside of the RunQuery function is to defer the execution of the Done method on the sync.WaitGroup object. The defer keyword will postpone the execution of the Done method, to take place once the RunQuery function returns. This will guarantee that the sync.WaitGroup objects count will decrement even if an unhandled exception occurs.

Next we make a copy of the session we created in the main goroutine. Each goroutine needs to create a copy of the session so they each obtain their own socket without serializing their calls with the other goroutines. Again, we use the defer keyword to postpone and guarantee the execution of the Close method on the session once the RunQuery function returns. Closing the session returns the socket back to the main pool, so this is very important.

To execute a query we need a mgo.Collection object. We can get a mgo.Collection object through the mgo.Session object by specifying the name of the database and then the collection. Using the mgo.Collection object, we can perform a Find and retrieve all the documents from the collection. The All function will unmarshal the response into our slice of BuoyStation objects. A slice is a dynamic array in Go. Be aware that the All method will load all the data in memory at once. For large collections it is better to use the Iter method instead. Finally, we just log the number of BuoyStation objects that are returned.

Conclusion

The example shows how to use Go concurrency to launch multiple goroutines that can execute queries against a MongoDB database independently. Once a session is established, the mgo driver exposes all of the MongoDB functionality and handles the unmarshaling of BSON documents into Go native types.

MongoDB can handle a large number of concurrent requests when you architect your MongoDB databases and collections with concurrency in mind. Go and the mgo driver are perfectly aligned to push MongoDB to its limits and build software that can take advantage of all the computing power that is available.

The mgo driver provides a safe way to leverage Go’s concurrency support and you have the flexibility to execute queries concurrently and in parallel. It is best to take the time to learn a bit about MongoDB replica sets and load balancer configuration. Then make sure the load balancer is behaving as expected under the different types of load your application can produce.

Now is a great time to see what MongoDB and Go can do for your software applications, web services and service platforms. Both technologies are being battle tested everyday by all types of companies, solving all types of business and computing problems.

Processing 2 Billion Documents A Day And 30TB A Month With MongoDB

Mar 14 • Posted 7 months ago

This is a guest post by David Mytton. He has been programming Python for over 10 years and founded his website and and monitoring company, Server Density, back in 2009.

Server Density processes over 30TB/month of incoming data points from the servers and web checks we monitor for our customers, ranging from simple Linux system load average to website response times from 18 different countries. All of this data goes into MongoDB in real time and is pulled out when customers need to view graphs, update dashboards and generate reports.

We’ve been using MongoDB in production since mid-2009 and have learned a lot over the years about scaling the database. We run multiple MongoDB clusters but the one storing the historical data does the most throughput and is the one I shall focus on in this article, going through some of the things we’ve done to scale it.

1. Use dedicated hardware, and SSDs

All our MongoDB instances run on dedicated servers across two data centers at Softlayer. We’ve had bad experiences with virtualisation because you have no control over the host, and databases need guaranteed performance from disk i/o. When running on shared storage (e.g., a SAN) this is difficult to achieve unless you can get guaranteed throughput from things like AWS’s Provisioned IOPS on EBS (which are backed by SSDs).

MongoDB doesn’t really have many bottlenecks when it comes to CPU because CPU bound operations are rare (usually things like building indexes), but what really causes problem is CPU steal - when other guests on the host are competing for the CPU resources.

The way we have combated these problems is to eliminate the possibility of CPU steal and noisy neighbours by moving onto dedicated hardware. And we avoid problems with shared storage by deploying the dbpath onto locally mounted SSDs.

I’ll be speaking in-depth about managing MongoDB deployments in virtualized or dedicated hardware at MongoDB World this June.

2. Use multiple databases to benefit from improved concurrency

Running the dbpath on an SSD is a good first step but you can get better performance by splitting your data across multiple databases, and putting each database on a separate SSD with the journal on another.

Locking in MongoDB is managed at the database level so moving collections into their own databases helps spread things out - mostly important for scaling writes when you are also trying to read data. If you keep databases on the same disk you’ll start hitting the throughput limitations of the disk itself. This is improved by putting each database on its own SSD by using the directoryperdb option. SSDs help by significantly alleviating i/o latency, which is related to the number of IOPS and the latency for each operation, particularly when doing random reads/writes. This is even more visible for Windows environments where the memory mapped data files are flushed serially and synchronously. Again, SSDs help with this.

The journal is always within a directory so you can mount this onto its own SSD as a first step. All writes go via the journal and are later flushed to disk so if your write concern is configured to return when the write is successfully written to the journal, making those writes faster by using an SSD will improve query times. Even so, enabling the directoryperdb option gives you the flexibility to optimise for different goals (e.g., put some databases on SSDs and some on other types of disk, or EBS PIOPS volumes, if you want to save cost).

It’s worth noting that filesystem based snapshots where MongoDB is still running are no longer possible if you move the journal to a different disk (and so different filesystem). You would instead need to shut down MongoDB (to prevent further writes) then take the snapshot from all volumes.

3. Use hash-based sharding for uniform distribution

Every item we monitor (e.g., a server) has a unique MongoID and we use this as the shard key for storing the metrics data.

The query index is on the item ID (e.g. the server ID), the metric type (e.g. load average) and the time range; but because every query always has the item ID, it makes it a good shard key. That said, it is important to ensure that there aren’t large numbers of documents under a single item ID because this can lead to jumbo chunks which cannot be migrated. Jumbo chunks arise from failed splits where they’re already over the chunk size but cannot be split any further.

To ensure that the shard chunks are always evenly distributed, we’re using the hashed shard key functionality in MongoDB 2.4. Hashed shard keys are often a good choice for ensuring uniform distribution, but if you end up not using the hashed field in your queries, you could actually hurt performance because then a non-targeted scatter/gather query has to be used.

4. Let MongoDB delete data with TTL indexes

The majority of our users are only interested in the highest resolution data for a short period and more general trends over longer periods, so over time we average the time series data we collect then delete the original values. We actually insert the data twice - once as the actual value and once as part of a sum/count to allow us to calculate the average when we pull the data out later. Depending on the query time range we either read the average or the true values - if the query range is too long then we risk returning too many data points to be plotted. This method also avoids any batch processing so we can provide all the data in real time rather than waiting for a calculation to catch up at some point in the future.

Removal of the data after a period of time is done by using a TTL index. This is set based on surveying our customers to understand how long they want the high resolution data for. Using the TTL index to delete the data is much more efficient than doing our own batch removes and means we can rely on MongoDB to purge the data at the right time.

Inserting and deleting a lot of data can have implications for data fragmentation, but using a TTL index helps because it automatically activates PowerOf2Sizes for the collection, making disk usage more efficient. Although as of MongoDB 2.6, this storage option will become the default.

5. Take care over query and schema design

The biggest hit on performance I have seen is when documents grow, particularly when you are doing huge numbers of updates. If the document size increases after it has been written then the entire document has to be read and rewritten to another part of the data file with the indexes updated to point to the new location, which takes significantly more time than simply updating the existing document.

As such, it’s important to design your schema and queries to avoid this, and to use the right modifiers to minimise what has to be transmitted over the network and then applied as an update to the document. A good example of what you shouldn’t do when updating documents is to read the document into your application, update the document, then write it back to the database. Instead, use the appropriate commands - such as set, remove, and increment - to modify documents directly.

This also means paying attention to the BSON data types and pre-allocating documents, things I wrote about in MongoDB schema design pitfalls.

6. Consider network throughput & number of packets

Assuming 100Mbps networking is sufficient is likely to cause you problems, perhaps not during normal operations, but probably when you have some unusual event like needing to resync a secondary replica set member.

When cloning the database, MongoDB is going to use as much network capacity as it can to transfer the data over as quickly as possible before the oplog rolls over. If you’re doing 50-60Mbps of normal network traffic, there isn’t much spare capacity on a 100Mbps connection so that resync is going to be held up by hitting the throughput limits.

Also keep an eye on the number of packets being transmitted over the network - it’s not just the raw throughput that is important. A huge number of packets can overwhelm low quality network equipment - a problem we saw several years ago at our previous hosting provider. This will show up as packet loss and be very difficult to diagnose.

Conclusions

Scaling is an incremental process - there’s rarely one thing that will give you a big win. All of these tweaks and optimisations together help us to perform thousands of write operations per second and get response times within 10ms whilst using a write concern of 1.

Ultimately, all this ensures that our customers can load the graphs they want incredibly quickly. Behind the scenes we know that data is being written quickly, safely and that we can scale it as we continue to grow.

blog comments powered by Disqus