Sharding Pitfalls: Part I

Oct 1 • Posted 1 day ago

By Adam Comerford, Senior Solutions Engineer

Sharding is a popular feature in MongoDB, primarily used for distributing data across clusters for horizontal scaling. The benefits of sharding for scalability are well known, and often one of the major factors in choosing MongoDB in the first place, but as you add complexity to a distributed system, you increase the chances of hitting a problem.

The good news is that many of the common issues people encounter when moving to a sharded environment are avoidable, and most of them can be mitigated if you have already hit them.

Forewarned is forearmed, and with that in mind we want you to be aware of best practices and situations to avoid when introducing sharding into your environment. In this three-part blog series we will discuss several pitfalls and gotchas that we have seen occur with some regularity among MongoDB users. We’ll give an overview of each problem, how it occurs, and how to avoid it, and then discuss some possible mitigation strategies to employ if you have already run into it.

It should be noted that some of these topics are worthy of full technical articles in their own right, which is beyond the scope of a relatively short blog post. Think of these posts as a good starting point and, if you have not yet hit any of these problems, an informative cautionary tale for anyone running a sharded MongoDB cluster. For additional details, please see the Sharding section in the MongoDB Manual.

Many of these topics are also covered as part of the M102 (MongoDB for DBAs) and M202 (Advanced Deployment and Operations) classes that are available for free on MongoDB University.

For our first set of cautionary tales we will focus on shard keys.

1. Using a monotonically increasing shard key (like ObjectID)

Although this is one of the most commonly covered topics on blogs, training material, MongoDB Days and more, the selection of a shard key remains a formidable exercise for the novice MongoDB DBA or developer.

The most common mistake we see is the selection of a monotonically increasing shard key when using range-based rather than hashed sharding. “Monotonically increasing” is a fancy way of saying that the shard key value for new documents only ever increases. Examples would be a timestamp (naturally) or anything that has time as its most significant part, like ObjectID (the first 4 bytes are a timestamp).

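To see why ObjectID falls into this category, you can inspect one in the mongo shell; this is purely an illustration, and the output will reflect whatever the current time is when you run it:

// an ObjectID embeds its creation time in its leading 4 bytes, so values
// generated later always sort after earlier ones under range-based sharding
ObjectId().getTimestamp()   // returns an ISODate equal to the moment of creation
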
Why is it a bad idea?

The short answer is insert scalability. If you select such a shard key, all inserts (new documents) will go to a single chunk - the highest range chunk, and that will never change. Hence, regardless of how many shards you add, your maximum write capacity will never increase - you will only ever write new documents to a single chunk and that chunk will only ever live on a single shard.

Occasionally, this type of shard key can still be the correct choice, but if you select it you must accept that you will not be able to scale write capacity by adding shards.

Possible Mitigation Strategies

  • Change the shard key - this is problematic with large collections, because the data essentially has to be dumped out and re-imported

  • More specifically, switch to a hash-based shard key, which allows you to keep the same field while providing good write scalability (see the sketch after this list).

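As a minimal sketch of the second option (the database and collection names here are hypothetical), hashing the _id field spreads inserts across chunks even though the underlying ObjectID values are monotonically increasing:

// range-based sharding on _id would send every insert to the highest chunk:
//   sh.shardCollection("mydb.events", { _id: 1 })
// hashed sharding on the same field distributes new documents across shards:
sh.enableSharding("mydb")
sh.shardCollection("mydb.events", { _id: "hashed" })
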
2. Trying to Change the Value of the Shard Key

Shard keys are immutable (cannot be changed) for an existing document. This issue usually only crops up when sharding a previously unsharded collection. Prior to sharding, certain updates will be possible that are no longer possible after the collection has been sharded.

Attempting to update the shard key for an existing document will fail with the following error:

cannot modify shard key's value fieldid for collection: foo.foo

Possible Mitigation Strategies

  • Delete and re-insert the document to alter the shard key rather than attempting to update it in place. It should be noted that this is not an atomic operation, so it must be done with caution (a minimal sketch follows this list).

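A minimal sketch of that approach in the mongo shell, assuming a hypothetical orders collection sharded on customerId; because the remove and insert are separate operations, the document is briefly absent and either step can fail independently:

// look the document up, remove it, then re-insert it with the new shard key value
var doc = db.orders.findOne({ orderNumber: 12345 });                       // hypothetical lookup
if (doc) {
    db.orders.remove({ customerId: doc.customerId, _id: doc._id }, true);  // justOne remove
    doc.customerId = "customer-42";                                        // the new shard key value
    db.orders.insert(doc);
}
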
Now you have a better understanding of how to choose and change your shard key if needed. In our next post, we will go through some potential obstacles you will face when scaling your sharded environment.

If you want more insight into scaling techniques for MongoDB, view the slides and video from our recent webinar, which reviews three different ways to achieve scale with MongoDB.

Tiered Storage Models in MongoDB: Optimizing Latency and Cost

May 14 • Posted 4 months ago

By Rohit Nijhawan, Senior Consulting Engineer at MongoDB with André Spiegel and Chad Tindel

For a user-facing application, speed and uptime are critical to success. There are a number of ways you can tune your application and hardware setup to provide the best experience for your customers — the trick is doing so at optimal cost. Here we provide an example for improving performance and lowering costs with MongoDB using Tiered Storage, a method of prioritizing data storage based on latency requirements.

In this example, we will be segmenting data by date: recent data is more frequently accessed and should exhibit lower latency than less recent data. However, the idea applies to other ways of segmenting data, such as location, user, source, size, or other criteria. This approach takes advantage of a powerful feature in MongoDB called tag-aware sharding that has been available since MongoDB 2.2.

Example Application: Insurance Claims

In many applications, low-latency access to data becomes less important as data ages. For example, an insurance company might prioritize access to claims from the last 12 months. Users should be able to view recent claims quickly, but once claims are more than a year old they tend to be accessed much less frequently, and the latency requirements tend to become less demanding.

By creating tiers of storage with different performance and cost profiles, the insurance company can provide a better experience for users while optimizing their costs. Older claims can be stored in a storage tier with more cost-effective hardware such as commodity hard drives. More recent data can be stored in a high-performance storage tier that provides lower latency such as SSD. Because the majority of the claims are more than a year old, storing older data in the lower-cost tier can provide significant cost advantages. The insurance company can optimize their hardware spread across the two tiers, providing a great user experience at an optimized cost point.

The requirements for this application can be summarized as:

  • The trailing 12 months of claims should reside on the faster storage tier
  • Claims over a year old should move to the slower storage tier
  • Over time, new claims arrive and older claims need to move from the faster tier to the slower tier

For simplicity, throughout this overview, we’ll distinguish the claims data by “current” and “tier-2” data.

Building Your Own Process: An Operational Headache

One approach to these requirements is to use periodic batch jobs: selecting data, loading it into the archive, and erasing it from the faster storage. However, this is inherently complex:

  • The move process must be carefully coded to fail gracefully. In the event that a load fails, you don’t want to delete the original data!
  • If the data to be moved is large, you may wish to throttle the operations.
  • If moves succeed partially, you have to retry the unfinished data.
  • Unless you plan on halting your application during the move (generally unacceptable), your application needs custom code to find the data before, during, and after the move.
  • Your application needs to understand the physical location of the data, which unnecessarily couples your code to the partitioning logic.

Furthermore, introducing another custom component to your operations requires additional maintenance and monitoring.

It’s an operational headache that many teams are forced to endure, but there is a simpler way: have MongoDB transparently handle the work of migrating documents from the current-tier machines to the tier-2 machines. As it turns out, you can easily implement this approach with a feature called tag-aware sharding.

The MongoDB Way: Tag-aware Sharding

MongoDB provides a feature called sharding to scale systems horizontally across multiple machines. Sharding is transparent to your application - whether you have 1 or 100 shards, your application code is the same. For a comprehensive description of sharding please see the Sharding Guide.

A key component of sharding is a process called the balancer. As collections grow, the balancer operates in the background to carefully move documents between shards. Normally the balancer works to achieve a uniform distribution of documents across shards. However, with tag-aware sharding we can create policies that affect where documents are stored. This feature can be applied in many use cases. One example is to keep user data in data centers that are near the user. In our application, we can use this feature to keep current data on our fast servers, and tier 2 data on cheaper, slower servers.

Here’s how it works:

  • Shards are assigned tags. A tag is an alphanumeric alias like “London-DC”.
  • Unique shard key ranges are ‘pinned’ to tags.
  • During normal balancing operations, chunks migrate only to shards whose tag is associated with a key range that contains the chunk’s key range.
  • There are a few subtleties regarding what happens when a chunk’s key range overlaps more than one tag range; please read the documentation carefully regarding this particular case.

This means that we can assign the “tier-2” tag to shards running on slow servers and “current” tags to shards running on fast servers, and the balancer will handle migrating the data between tiers automatically. What’s great is that we can keep all the data in one database, so our application code doesn’t need to change as data moves between storage tiers.

Determining the shard key

When you query a sharded collection, the query router will do its best to only inspect the shards holding your data, but it can only do this if you provide the shard key as part of your query. (See Sharded Cluster Query Routing for more information.)

So we need to make sure that we look up documents by the shard key. We also know that time is the basis for determining the location of documents in our two storage tiers. Accordingly, the shard key must contain an explicit timestamp. In our example, we’ll be using Enron’s email dataset, and we’ll set the top-level “date” field as the shard key. Here’s a sample document:

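A representative message from the corpus looks roughly like this (abbreviated; the field values shown are illustrative, but the top-level date field is the one we care about):

{
    "_id" : ObjectId("4f16fc97d1e2d32371003e27"),
    "date" : ISODate("2001-04-02T11:34:00Z"),
    "sender" : "employee.a@example.com",
    "recipients" : [ "employee.b@example.com" ],
    "subject" : "Weekly status report",
    "body" : "Attached is the status report for this week."
}
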
Because the time is stored in the most significant digits of the date, messages from any given day will numerically precede messages from subsequent days.

Implementation

Here are the steps to set up this system:

  • Set up an empty, sharded MongoDB cluster
  • Create a target database to host the sharded collection
  • Assign tags to different shards corresponding to the storage tiers
  • Assign tag ranges to the shards
  • Load data into the MongoDB cluster

Set up the cluster

The first thing you will want to do is set up your sharded cluster. You can see more information on how to set this up in the MongoDB documentation.

In this case we will have a database called “enron” and a collection called “messages” which holds part of the Enron email corpus. In this example, we’ve set up a cluster with three shards. The first, shard0000, is optimized for low-latency access to data. The other two, shard0001 and shard0002, use more cost effective hardware for data that is older than the identified cutoff date.

Here’s our sharded cluster. These are empty machines with no data:

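An abbreviated sh.status() from the mongos for a cluster like this (host names here are hypothetical) looks roughly like the following:

--- Sharding Status ---
  shards:
    {  "_id" : "shard0000",  "host" : "fast-ssd-1.example.net:27017" }
    {  "_id" : "shard0001",  "host" : "slow-hdd-1.example.net:27017" }
    {  "_id" : "shard0002",  "host" : "slow-hdd-2.example.net:27017" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
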
Adding the tags

We can “tag” each of these shards to associate them with documents that should belong to our “current” tier or those that should belong to “tier-2.” In the absence of tags and tag ranges, the balancer simply tries to ensure that the number of chunks on each shard is equal, without regard to any other data in the fields. Before we add the data to our collection, let’s tag shard0000 as “current” and the other two as “tier-2”:

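Assuming we are connected through a mongos, the tagging is a few calls to sh.addShardTag():

sh.addShardTag("shard0000", "current")   // fast, low-latency hardware
sh.addShardTag("shard0001", "tier-2")    // cost-effective hardware
sh.addShardTag("shard0002", "tier-2")
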
Now we can verify our tags by calling sh.status():

Next, we need to set up a database and collection for the Enron emails. We’ll create a new database ‘enron’ with a collection called ‘messages’ and enable sharding for that database:

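A minimal sketch of that step from the mongos shell:

sh.enableSharding("enron")   // allow collections in the enron database to be sharded
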
Since we’re going to shard the collection, we’ll need to set up a shard key. We will use the ‘date’ field as our shard key since this is the field that will define how the documents are distributed across shards:

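Again from the mongos shell:

sh.shardCollection("enron.messages", { "date" : 1 })   // range-based shard key on the date field
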
Defining the cutoff date between tiers

The cutoff point between “current” data and “tier-2” data is a point in time that we will update periodically to keep the most recent documents in our “current” shard. We will start with a cutoff of July 1, 2001, stored as an ISODate: ISODate(“2001-07-01”). Once we add the data to our collection, we will set this as the tag range. Going forward, when we add documents to the “messages” collection, any documents newer than July 1, 2001 will end up on the “current” shard, and documents older than that will end up on the “tier-2” shard.

It’s important that the two ranges meet at exactly the same point in time, with no gap or overlap between them. The lower bound of a tag range is inclusive, and the upper bound is exclusive. This means a document with a date of exactly ISODate(“2001-07-01”) will go on the “current” shard, not the “tier-2” shard.

Below you will see each of the shard’s new tag ranges:

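One way to define those ranges; this sketch assumes MongoDB 2.6 or later, where the sh.addTagRange() helper is available (on earlier releases you would insert the equivalent documents into config.tags directly):

// everything strictly before the cutoff belongs to the slower tier
sh.addTagRange("enron.messages",
               { "date" : MinKey },
               { "date" : ISODate("2001-07-01") },
               "tier-2")

// the cutoff itself and everything after it belongs to the faster tier
sh.addTagRange("enron.messages",
               { "date" : ISODate("2001-07-01") },
               { "date" : MaxKey },
               "current")
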
As a final check, look in the config database for the tag range definitions.

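For example, from the mongos shell:

var configdb = db.getSiblingDB("config");
configdb.tags.find().pretty()   // one document per tag range, with min, max and tag fields
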
Now that all the shards and ranges are defined, we are ready to load the message data into the cluster. The documents will follow the rules given by the tag ranges and land on the correct machines.

Now, let’s check the sharding status to see where the documents reside:

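An abbreviated, illustrative sh.status() excerpt for the collection might look something like this once the balancer has finished (chunk counts will vary with your data):

enron.messages
    shard key: { "date" : 1 }
    chunks:
        shard0000    5
        shard0001    9
        shard0002    8
     tag: tier-2   { "date" : { "$minKey" : 1 } } -->> { "date" : ISODate("2001-07-01T00:00:00Z") }
     tag: current  { "date" : ISODate("2001-07-01T00:00:00Z") } -->> { "date" : { "$maxKey" : 1 } }
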
That’s it! The mongos process automatically moves documents to comply with the tag ranges. In this example, it took all documents still on the “current” shard with a date older than ISODate(“2001-07-01T00:00:00Z”) and moved them to the “tier-2” shard.

The tag ranges must be updated on a regular basis to keep the cutoff point at the correct interval of time in the past (1 year, in our case). In order to do this, both ranges need to be updated. While this change is performed, the balancer should temporarily be disabled so that no chunks migrate while the two ranges are momentarily inconsistent. Stopping the balancer temporarily is a safe operation - it will not affect the application or the experience of users.

If you wanted to move the cutoff forward another month, to August 1, 2001, you just need to follow these steps:

  • Stop the balancer: sh.setBalancerState(false)
  • Create a chunk split at August 1: sh.splitAt('enron.messages', {"date" : ISODate("2001-08-01")})
  • Move the cutoff date to ISODate("2001-08-01T00:00:00Z"):
    var configdb=db.getSiblingDB("config");
    configdb.tags.update({tag:"tier-2"},{$set:{'max.date':ISODate("2001-08-01")}})
    configdb.tags.update({tag:"current"},{$set:{'min.date':ISODate("2001-08-01")}})
  • Re-start the balancer: sh.setBalancerState(true)
  • Verify the sharding status

By moving the cutoff to August 1, we have migrated all the documents with a date after July 1 but before August 1 from the “current” shard to the “tier-2” shards. The good news is that we were able to perform this operation without changing our application code and with no database downtime. We can also see that it would be simple to schedule this process to run automatically through an external process.

From Operational Headache to Simplicity

The end result is one collection spread across three shards and two different storage systems. This solution allows you to lower your storage costs without adding complexity to the architecture of your system. Instead of a complex setup with different databases on different machines we have one database to query, and instead of a data migration we update some simple rules to control the location of data in the system.

Background Indexing on Secondaries and Orphaned Document Cleanup in MongoDB 2.6

Jan 27 • Posted 8 months ago

By Alex Komyagin, Technical Services Engineer in MongoDB’s New York Office

The MongoDB Support Team has broad visibility into the community’s use of MongoDB, issues they encounter, feature requests, bug fixes and the work of the engineering team. This is the first of a series of posts to help explain, from our perspective, what is changing in 2.6 and why.

Many of these changes are available today for testing in the 2.5.4 Development Release, which is available as of November 18, 2013 (2.5.5 release, coming soon, will be feature complete). Development Releases have odd-numbered minor versions (e.g., 2.1, 2.3, 2.5), and Production Releases have even-numbered minor versions (e.g., 2.2, 2.4, 2.6). MongoDB 2.6 will become available a little later this year.

Community testing helps MongoDB improve. You can test MongoDB 2.5.4 today. Downloads are available here, and you can log Server issues in Jira.

Background indexes on secondaries (SERVER-2771)

Suppose you have a production replica set with two secondary servers, and you have a large, 1TB collection. At some point, you may decide that you need a new index to reflect a recent change in your application, so you build one in the background:

db.col.ensureIndex({..},{background : true})

Let’s also suppose that your application uses secondary reads (users should take special care with this feature, especially in sharded systems; for an example of why, see the next section in this post). After some time you observe that some of your application reads have started to fail, and replication lag on both secondaries has started to grow. While you are searching Google Groups for answers, everything magically goes back to normal by itself. Secondaries have caught up, and application reads on your secondaries are working fine. What happened?

One would expect that building indexes in the background would allow the replica set to continue serving regular operations during the index build. However, in all MongoDB releases before 2.6, background index builds on primaries become foreground index builds on secondaries, as noted in the documentation. Foreground index building is resource intensive and it can also affect replication and read/write operations on the database (see the FAQ on the MongoDB Docs). The good news is that impact can be minimized if your indexed collections are small enough for index builds to be relatively fast (on the order of minutes to complete).

The only way to ensure that indexing operations did not affect the replica set in earlier versions of MongoDB was to build indexes in a rolling fashion. This works perfectly for most users, but not for everyone. For example, it wouldn’t work well for those who use a write concern of “w:all”.

Starting with MongoDB 2.6, a background index build on the primary becomes a background index build on the secondaries. This behavior is much more intuitive and will improve the replica set robustness. We feel this will be a welcome enhancement for many users.

Please note that background index building normally takes longer than foreground building, because it allows other operations on the database to run. Keep in mind that, like most database systems, indexing in MongoDB is resource intensive and will increase the load on your system, whether it is a foreground or background process. Accordingly, it is best to perform these operations during a maintenance window or during off-peak hours.

The actual time needed to build a background index varies with the active load on your system, number of documents, database size and your hardware specs. Therefore, for production systems with large collections users can still take advantage of building indexes in a rolling fashion, or building them in foreground during maintenance windows if they believe a background index build will require more time than is acceptable.

Orphaned documents cleanup command (SERVER-8598)

MongoDB provides horizontal scaling through a feature called sharding. If you’re unfamiliar with sharding and how it works, I encourage you to read the nice new introduction to this feature the documentation team added a few months ago. Let me try and summarize some of the key concepts:

  • MongoDB partitions documents across shards.
  • Periodically the system runs a balancing process to ensure documents are uniformly distributed across the shards.
  • Groups of documents, called chunks, are the unit of a balancing job.
  • In certain failure scenarios stale copies of documents may remain on shards, which we call “orphaned documents.”

Under normal circumstances, there will be no orphaned documents in your system. However, in some production systems, “normal circumstances” are a very rare event, and migrations can fail (e.g., due to network connectivity issues), thus leaving orphaned documents behind.

The presence of orphaned documents can produce incorrect results for some queries. While orphaned documents are safe to delete, in versions prior to 2.6 there was no simple way to do so. In MongoDB 2.6 we implemented a new administrative command for sharded clusters: cleanupOrphaned(). This command removes orphaned documents from a single range of data on the shard.

The scenario where users typically encounter issues related to orphaned documents is when issuing secondary reads. In a sharded cluster, primary replicas for each shard are aware of the chunk placements, while secondaries are not. If you query the primary (which is the default read preference), you will not see any issues as the primary will not return orphaned documents even if it has them. But if you are using secondary reads, the presence of orphaned documents can produce unexpected results, because secondaries are not aware of the chunk ownerships and they can’t filter out orphaned documents. This scenario does not affect targeted queries (those having the shard key included), as mongos automatically routes them to correct shards.

To illustrate this discussion with an example, one of our clients told us that after a series of failed migrations he noticed that his queries were returning duplicate documents. He was using secondary reads and scatter-gather queries, meaning that the queries did not contain the shard key and were broadcast by mongos to all shards. The shards returned all documents matching the query (including orphaned documents), which in this situation led to duplicate entries in the final result set.

A short term solution was to remove orphaned documents (we used to have a special script for this). But a long term workaround for this particular client was to make their queries targeted, by including the shard key in each query. This way, mongos could efficiently route each query to the correct shard, not hitting the orphaned data. Routed queries are a best practice in any system as they also scale much better than scatter-gather queries.

Unfortunately, there are a few cases where there is no good way to make queries targeted, and you would need to either switch to primary reads or implement a regular process for removing orphaned documents.

The cleanupOrphaned() command is the first step on the path to automated cleanup of orphaned documents. This command should be run on the primary server and will clean up one unowned range on this shard. The idea is to run the command repeatedly, with a delay between calls to tune the cleanup rate.

In some configurations secondary servers might not be able to keep up with the rate of delete operations, resulting in replication lag. In order to control the lag, cleanupOrphaned() waits for the majority of the replica set members after the range removal is complete. Additionally, you can use the secondaryThrottle option, and each individual delete operation will be made with write concern w:2 (waits for one secondary). This may be useful for reducing the impact of removing orphaned documents on your regular operations.

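A minimal usage sketch, based on the documented pattern of calling the command repeatedly; the namespace test.users is hypothetical, and the command must be run while connected to the primary of the shard’s replica set (not to a mongos):

// clean up every unowned range for one namespace on this shard, one range per call
var nextKey = { };
var result;
while (nextKey != null) {
    result = db.adminCommand({
        cleanupOrphaned: "test.users",    // namespace to clean (hypothetical)
        startingFromKey: nextKey,         // resume from where the previous call stopped
        secondaryThrottle: true           // wait for a secondary after each delete
    });
    if (result.ok != 1) {
        print("cleanupOrphaned failed or timed out; stopping");
        break;
    }
    printjson(result);
    nextKey = result.stoppedAtKey;        // absent once the final range has been cleaned
    sleep(1000);                          // optional pause between ranges to limit impact
}
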
You can find command usage examples and more information about the command in the 2.6 documentation.

I hope you will find these features helpful. We look forward to hearing your feedback on these features. If you would like to test them out, download MongoDB 2.5.4, the most recent Development Release of MongoDB.

MongoDB Sharding Visualizer

Sep 14 • Posted 2 years ago

We’re happy to share with you the initial release of the MongoDB sharding visualizer. The visualizer is a Google Chrome app that provides an intuitive overview of a sharded cluster. This project provides an alternative to the printShardingStatus() utility function available in the MongoDB shell.

Features

The visualizer provides two different perspectives of the cluster’s state.

The collections view is a grid where each rectangle represents a collection. Each rectangle’s area is proportional to that collection’s size relative to the other collections in the cluster. Inside each rectangle a pie chart shows the distribution of that collection’s chunks over all the shards in the cluster.

The shards view is a bar graph where each bar represents a shard and each segment inside the bar represents a collection. The size of each segment is proportional to that collection’s size relative to the other collections on that shard.

Additionally, the slider underneath each view allows you to rewind the state of the cluster and view it as it was at a specific time.

Installation

To install the plugin, download and unzip the source code from 10gen labs. In Google Chrome, go to Preferences > Extensions, enable Developer Mode, and click “Load unpacked extension…”. When prompted, select the “plugin” directory. Then, open a new tab in Chrome and navigate to the Apps page and launch the visualizer.

Feedback

We very much look forward to hearing your feedback and encourage everyone to look at the source code, which is available at https://github.com/10gen-labs/shard-viz.
