Posts tagged: release

MongoDB 2.6: Our Biggest Release Ever

Apr 8 • Posted 3 months ago

By Eliot Horowitz, CTO and Co-founder, MongoDB

In the five years since the initial release of MongoDB, and after hundreds of thousands of deployments, we have learned a lot. The time has come to take everything we have learned and create a basis for continued innovation over the next ten years.

Today I’m pleased to announce that, with the release of MongoDB 2.6, we have achieved that goal. With comprehensive core server enhancements, a groundbreaking new automation tool, and critical enterprise features, MongoDB 2.6 is by far our biggest release ever.

You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.

From the very beginning, MongoDB has offered developers a simple and elegant way to manage their data. Now we’re bringing that same simplicity and elegance to managing MongoDB. MongoDB Management Service (MMS), which already provides 35,000 MongoDB customers with monitoring and alerting, now provides backup and point-in-time restore functionality, in the cloud and on-premises.

We are also announcing a game-changing feature coming later this year: automation, also with hosted and on-premises options. Automation will allow you to provision and manage MongoDB replica sets and sharded clusters via a simple yet sophisticated interface.

MongoDB 2.6 brings security, integration and analytics enhancements to ease deployment in enterprise environments. LDAP, x.509 and Kerberos authentication are critical enhancements for organizations that require a single authentication mechanism across their entire infrastructure. To enhance security, MongoDB 2.6 implements TLS encryption, user-defined roles, auditing and field-level redaction, a critical building block for trusted systems. IBM Guardium also now offers integration with MongoDB, providing more extensive auditing abilities.

These are only a few of the key improvements; read the full official release notes for more details.

MongoDB 2.6 was a major endeavor and bringing it to fruition required hard work and coordination across a rapidly growing team. Over the past few years we have built and invested in that team, and I can proudly say we have the experience, drive and determination to deliver on this and future releases. There is much still to be done, and with MongoDB 2.6, we have a foundation for the next decade of database innovation.

Background Indexing on Secondaries and Orphaned Document Cleanup in MongoDB 2.6

Jan 27 • Posted 5 months ago

By Alex Komyagin, Technical Services Engineer in MongoDB’s New York Office

The MongoDB Support Team has broad visibility into the community’s use of MongoDB, issues they encounter, feature requests, bug fixes and the work of the engineering team. This is the first of a series of posts to help explain, from our perspective, what is changing in 2.6 and why.

Many of these changes are available for testing today in the 2.5.4 Development Release, which has been available since November 18, 2013 (the upcoming 2.5.5 release will be feature complete). Development Releases have odd-numbered minor versions (e.g., 2.1, 2.3, 2.5), and Production Releases have even-numbered minor versions (e.g., 2.2, 2.4, 2.6). MongoDB 2.6 will become available a little later this year.
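As a small illustration of the numbering rule above, the following sketch classifies a version string by the parity of its minor version. The function name releaseKind is hypothetical, not part of MongoDB, and the rule is assumed to apply only to the 2.x versions discussed in this post.

```javascript
// Hypothetical helper: classify a version string by its minor version,
// following the odd = development, even = production rule described above.
function releaseKind(version) {
  var parts = String(version).split(".");
  var minor = parseInt(parts[1], 10);
  if (parts.length < 2 || isNaN(minor)) {
    throw new Error("expected a version like '2.5.4', got: " + version);
  }
  return minor % 2 === 1 ? "development" : "production";
}
```

For example, releaseKind("2.5.4") yields "development", while releaseKind("2.6") yields "production".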

Community testing helps MongoDB improve. You can test the MongoDB 2.5.4 Development Release today. Downloads are available here, and you can log Server issues in Jira.

Background indexes on secondaries (SERVER-2771)

Suppose you have a production replica set with two secondary servers, and you have a large, 1TB collection. At some point, you may decide that you need a new index to reflect a recent change in your application, so you build one in the background:

db.col.ensureIndex({..},{background : true})

Let’s also suppose that your application uses secondary reads (users should take special care with this feature, especially in sharded systems; for an example of why, see the next section in this post). After some time you observe that some of your application reads have started to fail, and replication lag on both secondaries has started to grow. While you are searching Google Groups for answers, everything magically goes back to normal by itself. Secondaries have caught up, and application reads on your secondaries are working fine. What happened?

One would expect that building indexes in the background would allow the replica set to continue serving regular operations during the index build. However, in all MongoDB releases before 2.6, background index builds on primaries become foreground index builds on secondaries, as noted in the documentation. Foreground index building is resource intensive and it can also affect replication and read/write operations on the database (see the FAQ on the MongoDB Docs). The good news is that impact can be minimized if your indexed collections are small enough for index builds to be relatively fast (on the order of minutes to complete).

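In earlier versions of MongoDB, the only way to make sure that indexing operations did not affect the replica set was to build indexes in a rolling fashion. This works perfectly for most users, but not for everyone. For example, it wouldn’t work well for those who use a write concern “w:all”, because each member is taken out of the replica set while its index is built, and writes that require acknowledgment from every member would block during that time.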

Starting with MongoDB 2.6, a background index build on the primary becomes a background index build on the secondaries. This behavior is much more intuitive and will improve the replica set robustness. We feel this will be a welcome enhancement for many users.

Please note that background index building normally takes longer than foreground building, because it allows other operations on the database to run. Keep in mind that, like most database systems, indexing in MongoDB is resource intensive and will increase the load on your system, whether it is a foreground or background process. Accordingly, it is best to perform these operations during a maintenance window or during off-peak hours.

The actual time needed to build a background index varies with the active load on your system, number of documents, database size and your hardware specs. Therefore, for production systems with large collections, users who believe a background index build will take longer than is acceptable can still build indexes in a rolling fashion, or build them in the foreground during maintenance windows.

Orphaned documents cleanup command (SERVER-8598)

MongoDB provides horizontal scaling through a feature called sharding. If you’re unfamiliar with sharding and how it works, I encourage you to read the nice new introduction to this feature the documentation team added a few months ago. Let me try and summarize some of the key concepts:

  • MongoDB partitions documents across shards.
  • Periodically the system runs a balancing process to ensure documents are uniformly distributed across the shards.
  • Groups of documents, called chunks, are the unit of a balancing job.
  • In certain failure scenarios stale copies of documents may remain on shards, which we call “orphaned documents.”

Under normal circumstances, there will be no orphaned documents in your system. However, in some production systems, “normal circumstances” are a very rare event, and migrations can fail (e.g., due to network connectivity issues), thus leaving orphaned documents behind.

The presence of orphaned documents can produce incorrect results for some queries. While orphaned documents are safe to delete, in versions prior to 2.6 there was no simple way to do so. In MongoDB 2.6 we implemented a new administrative command for sharded clusters: cleanupOrphaned(). This command removes orphaned documents from the shard in a single range of data.

The scenario where users typically encounter issues related to orphaned documents is when issuing secondary reads. In a sharded cluster, primary replicas for each shard are aware of the chunk placements, while secondaries are not. If you query the primary (which is the default read preference), you will not see any issues as the primary will not return orphaned documents even if it has them. But if you are using secondary reads, the presence of orphaned documents can produce unexpected results, because secondaries are not aware of the chunk ownerships and they can’t filter out orphaned documents. This scenario does not affect targeted queries (those having the shard key included), as mongos automatically routes them to correct shards.

To illustrate this discussion with an example, one of our clients told us that after a series of failed migrations he noticed that his queries were returning duplicate documents. He was using secondary reads together with scatter-gather queries, meaning queries that did not contain the shard key and were therefore broadcast by mongos to all shards. Each shard returned all the documents matching the query (including orphaned documents), which in this situation led to duplicate entries in the final result set.

A short term solution was to remove orphaned documents (we used to have a special script for this). But a long term workaround for this particular client was to make their queries targeted, by including the shard key in each query. This way, mongos could efficiently route each query to the correct shard, not hitting the orphaned data. Routed queries are a best practice in any system as they also scale much better than scatter-gather queries.

Unfortunately, there are a few cases where there is no good way to make queries targeted, and you would need to either switch to primary reads or implement a regular process for removing orphaned documents.
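The behavior described above can be modeled with a toy simulation. This is not MongoDB code: the shard names, documents and helper functions are all hypothetical, and the model only captures the single point that primaries filter by chunk ownership while secondaries do not.

```javascript
// Toy model: which shard owns each shard-key value (the "chunk map").
var chunkOwner = { alice: "shardA", bob: "shardB" };

var shards = {
  shardA: [{ userId: "alice", city: "NYC" }],
  // shardB still holds an orphaned copy of alice's document,
  // as if left behind by a failed migration.
  shardB: [{ userId: "bob", city: "NYC" }, { userId: "alice", city: "NYC" }]
};

// A secondary is unaware of chunk ownership, so it returns every match.
function secondaryRead(shardName, predicate) {
  return shards[shardName].filter(predicate);
}

// A primary also drops matching documents that its shard does not own.
function primaryRead(shardName, predicate) {
  return shards[shardName].filter(function (doc) {
    return predicate(doc) && chunkOwner[doc.userId] === shardName;
  });
}

// Scatter-gather: no shard key in the query, so ask every shard and merge.
function scatterGather(readFn, predicate) {
  return Object.keys(shards).reduce(function (acc, name) {
    return acc.concat(readFn(name, predicate));
  }, []);
}

// Targeted: the shard key is known, so only the owning shard is asked.
function targeted(readFn, userId) {
  return readFn(chunkOwner[userId], function (doc) {
    return doc.userId === userId;
  });
}

var inNYC = function (doc) { return doc.city === "NYC"; };
var viaSecondaries = scatterGather(secondaryRead, inNYC); // alice appears twice
var viaPrimaries = scatterGather(primaryRead, inNYC);     // orphan filtered out
var targetedAlice = targeted(secondaryRead, "alice");     // only shardA is asked
```

In this model the scatter-gather query over secondaries returns three documents, including the duplicate, while the same query over primaries returns two and the targeted query returns one.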

The cleanupOrphaned() command is the first step on the path to automated cleanup of orphaned documents. This command should be run on the primary server and will clean up one unowned range on this shard. The idea is to run the command repeatedly, with a delay between calls to tune the cleanup rate.

In some configurations secondary servers might not be able to keep up with the rate of delete operations, resulting in replication lag. To control the lag, once the range removal is complete, cleanupOrphaned() waits for a majority of the replica set members to replicate the deletes. Additionally, you can use the secondaryThrottle option, in which case each individual delete operation is made with write concern w:2 (it waits for one secondary). This may be useful for reducing the impact of removing orphaned documents on your regular operations.
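The repeated-call pattern described above might look like the following sketch. It assumes the command document fields startingFromKey and secondaryThrottle and the stoppedAtKey result field as described in the 2.6 documentation; please verify them against the documentation for your version. The function name cleanupAllOrphans and the injected runAdminCommand and pause arguments are hypothetical and exist only to keep the sketch self-contained.

```javascript
// Call cleanupOrphaned repeatedly until no unowned ranges remain.
//
// runAdminCommand: takes a command document and returns the result document.
//   In the mongo shell, connected to a shard primary, this could be:
//   function (cmd) { return db.adminCommand(cmd); }
// namespace: "<database>.<collection>" of the sharded collection.
// pause: optional function called between calls to tune the cleanup rate.
//   In the mongo shell this could be: function () { sleep(1000); }
function cleanupAllOrphans(runAdminCommand, namespace, pause) {
  var nextKey = {}; // an empty document starts from the lowest key
  var calls = 0;

  while (nextKey) {
    var result = runAdminCommand({
      cleanupOrphaned: namespace,
      startingFromKey: nextKey,
      secondaryThrottle: true // each delete waits for one secondary
    });

    if (!result.ok) {
      throw new Error("cleanupOrphaned failed: " + JSON.stringify(result));
    }

    calls += 1;
    // stoppedAtKey is absent once the last range has been processed.
    nextKey = result.stoppedAtKey;
    if (nextKey && pause) {
      pause();
    }
  }
  return calls;
}
```

In the mongo shell, a call could look like cleanupAllOrphans(function (cmd) { return db.adminCommand(cmd); }, "test.user", function () { sleep(1000); }), where "test.user" is a placeholder namespace. A longer pause lowers the delete rate and gives secondaries more time to catch up.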

You can find command usage examples and more information about the command in the 2.6 documentation.

I hope you will find these features helpful, and we look forward to hearing your feedback on them. If you would like to test them out, download MongoDB 2.5.4, the most recent Development Release of MongoDB.

MongoDB Text Search: Experimental Feature in MongoDB 2.4

Jan 14 • Posted 1 year ago

Text search (SERVER-380) is one of the most requested features for MongoDB. 10gen is working on an experimental text-search feature, to be released in v2.4, and we’re already seeing some talk in the community about the native implementation within the server. We view this as an important step towards fulfilling a community need.

MongoDB text search is still in its infancy and we encourage you to try it out on your datasets. Many applications use both MongoDB and Solr/Lucene, but realize that there is still a feature gap. For some applications, the basic text search that we are introducing may be sufficient. As you get to know text search, you can determine when MongoDB has crossed the threshold for what you need.

Setting up Text Search


You can configure text search in the mongo shell:

db.adminCommand( { setParameter : 1, textSearchEnabled : true } )


Or enable it on the command line when starting mongod:

mongod --setParameter textSearchEnabled=true

 

MongoDB 2.2 Released

Aug 29 • Posted 1 year ago

We are pleased to announce the release of MongoDB version 2.2.  This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details on the release:

New Features

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside of MongoDB, without needing to use Map Reduce, or separate application processes for data manipulation.

See the aggregation documentation for more information.

Additional “Data Center Awareness” Functionality

2.2 also brings a cluster of features that make it easier to use MongoDB in larger, more geographically distributed deployments. The first change is a standardization of read preferences across all drivers and sharded (i.e. mongos) interfaces. The second is the addition of “tag aware sharding,” which makes it possible to ensure that data in a geographically distributed sharded cluster is always closest to the application that will use that data the most.

Improvements to Concurrency

v2.2 eliminates the global lock in the mongod process. Locking is now per database. In addition, a new subsystem avoids locks under most page-fault events; thus concurrency improves even on systems with a single database. Parallelism in applying writes on secondaries is also enhanced. See this video for more details.

We’re looking forward to your feedback on 2.2. Keep the Jira Issues, blog posts, user group posts, and tweets coming.

- Eliot and the 10gen/MongoDB team

Revamp of MongoDB’s Documentation

May 1 • Posted 2 years ago

We’re revamping MongoDB’s documentation. The new design in the MongoDB Manual has an improved reference section and an index for simplified search. It will also eventually support multiple MongoDB versions at the same time.

This project is a work in progress, and things are changing quickly. Our goal is to consolidate, sharpen, organize, and continue to improve the documentation in support of MongoDB. For now, the new docs will live alongside the original MongoDB Wiki. But over the next few months, we’ll be transitioning everything to the new manual.

In the spirit of open source, the docs are housed on GitHub. Feedback is welcome! Feel free to fork the repository and issue pull requests. You can also open tickets in Jira, and we’ll promptly address any suggestions.
