by Emily Stolfo, Ruby Engineer and Evangelist at 10gen
MongoDB is a popular choice among developers in part because it permits a one-to-one mapping between object-oriented (OO) software objects and database entities. Ruby developers are at a great advantage in using MongoDB because they are already used to working with and designing software that is purely object-oriented.
Most of the discussions I’ve had about MongoDB and Ruby assume Ruby knowledge and explain why MongoDB is a good fit for the Rubyist. This post will do the opposite; I’m going to assume you know a few things about MongoDB but not much about Ruby. In showing the Rubyist’s OO advantage, I’ll share a bit about the Ruby programming language and its popularity, explain specifically how the majority of Ruby developers are using MongoDB, and then talk about the future of the 10gen Ruby driver in the context of the Rails community.
The Ruby programming language was created 20 years ago by Yukihiro Matsumoto, known as “Matz” to the Ruby community. The language did not become immensely popular until the introduction of the Rails web framework many years later, but it is known for its founding philosophy. Matz has said on numerous occasions that he strove to create a language that follows the principles of good user interface design. Namely, Ruby is intended to make the developer experience more pleasant and to boost programmer productivity. Matz has said that he wanted to combine the flexibility of Perl with the object orientation of Smalltalk. The result is an elegant, flexible, and practical language that is indeed a pleasure to use.
Ruby is a purely object-oriented language. Other languages have primitive types for programming “atoms” such as integers, booleans, and null. Ruby represents these values as objects too. Instances of classes are objects that have properties (instance variables) and performable actions (methods). Even classes themselves are instances of the class Class. Let’s look at an example in the Ruby interpreter:
> 2.object_id
=> 5
> false.object_id
=> 0
> true.object_id
=> 20
> nil.object_id
=> 8
As you can see, even integers, booleans, and Ruby’s null object, nil, all have object ids. This implies that they are more than just primitives: they are objects complete with a class, properties, and methods. Going further, we can see that nil has methods!
> nil.methods
=> [:to_i, :to_f, :to_s, :to_a, :to_h, :inspect, :&, :|, :^, :nil?, :to_r, :rationalize, :to_c, :===, :=~, :!~, :eql?, :hash, :<=>, :class, :singleton_class, :clone, :dup, :taint, :tainted?, :untaint, :untrust, :untrusted?, :trust, :freeze, :frozen?, :methods, :singleton_methods, :protected_methods, :private_methods, :public_methods, :instance_variables...etc]
Integers can have methods too:
> i = 0
> 4.times do
> puts "#{i%2 == 0 ? 'heeey' : 'hoooo'}"
> i += 1
> end
heeey
hoooo
heeey
hoooo
In addition to the expected “primitive” types, Ruby provides other base types such as arrays and hashes. The hash will become particularly relevant later on in this post, but for now, I’ll just share the syntax:
> document = {}
=> {}
> document["id"] = "emilys"
=> "emilys"
> document
=> {"id"=>"emilys"}
Ruby software engineers strive to embrace the OO nature of the language, sometimes to the extreme. The language is strongly and dynamically typed. This allows the Rubyist to design software that is highly modular and that relies on duck typing, i.e. taking advantage of object behavior rather than statically defined types. A good Rubyist aims to reduce dependencies and increase the flexibility of code. For example, the language doesn’t support multiple inheritance, but it does provide modules. Modules are essentially groupings of common methods that, unlike classes, cannot be instantiated. A module can be “mixed in” to a class to give it extra functionality. All of these characteristics together (dynamic, object-oriented, flexible, modular) make Ruby code a pleasure to write and maintain.
I mentioned above that Ruby’s popularity really blossomed with the creation of Ruby on Rails in 2003. The web framework was created by a web programmer, David Heinemeier Hansson (“DHH”). He noticed that the web stack was a given and that developers kept repeating themselves in building technology to wrap it. He decided to extract the common elements of web engineering into a modular, reusable framework. DHH unveiled his framework in a presentation that has become iconic in web programming. In it, he creates a web application with a single command and then starts up a server.
> rails new my_app
> cd my_app
> rails server
So why did Rails become so popular? Rails was built to make web programming faster, easier, and more manageable. By introducing a number of conventions and sticking to OO, web programmers could go from 0 to a full working app in a relatively short amount of time with little configuration. By the same token, they could take an existing app and quickly understand the codebase well enough to maintain and develop it. We’ve already discussed the approachability of the Ruby programming language, and Rails follows many of the same principles. Rails is a solution for making web programming simpler.
I teach a Ruby on Rails class at Columbia University, and I often tell my students that Ruby on Rails is the gateway drug to web programming. It makes web development more accessible to newcomers.
MongoDB is a document database that focuses on developer needs. (Notice a common theme yet?) There’s no need for an army of database administrators to maintain a MongoDB cluster. The database’s flexibility also allows application developers to define and manipulate a schema themselves instead of relying on a separate team of dedicated engineers. Assuming that the many advantages of using MongoDB are familiar to you, it might seem natural for all Rails and Ruby developers to choose MongoDB as their first choice of datastore. Unfortunately, MongoDB is far from the default.
The Active Record pattern describes the mapping of an object instance to a row in a relational database table, using accessor methods to retrieve columns/properties, and the ability to create, update, read, and delete entities from the database. It was first named by Martin Fowler in his book, Patterns of Enterprise Application Architecture.
The pattern runs into a number of limitations, collectively referred to as the object-relational impedance mismatch. Some of these technical difficulties are structural. In OO programming, objects may be composed of other objects. The Active Record pattern maps these sub-objects to separate tables, which introduces problems in representing relationships and encapsulated data. Rails’ biggest contribution to web programming was arguably not the framework itself, but rather its object-relational mapper, Active Record. Active Record uses macros (such as has_many and belongs_to) to create relationships between objects, and single table inheritance to represent inheritance.
The best solution to date for the object-relational impedance mismatch is Active Record, but this assumes your datastore is relational. It’s also, fundamentally, a hack. What if we were to use a more OO datastore?
As we see massive growth in data, an increased diversity of content, and a demand for shorter development cycles, the infrastructure developers rely upon must rise to meet new challenges that traditional technologies were not designed to address. MongoDB has gained immense popularity because it fills many of the strongest modern technical demands, while still being developer-friendly and low-cost.
Given all that has been discussed regarding Rails and Ruby, wouldn’t it make sense to use MongoDB with Rails? The answer is yes: it makes a lot of sense. Nevertheless, Rails wasn’t originally built to use a document database so you must use a separate gem in place of Active Record.
MongoMapper and Mongoid are the two leading gems that make it possible to use MongoDB as a datastore with Rails. MongoMapper, a project by John Nunemaker from GitHub, is a simple ODM for MongoDB. Mongoid, in particular, has become quite popular since its creation 4 years ago by Durran Jordan. Mongoid’s goal is to provide an API familiar to Active Record users while still leveraging MongoDB’s schema flexibility, document design, atomic modifiers, and rich query interface.
MongoDB owes much of its traction in the Rails and Ruby community to these two gems. In the past, Rails developers had to jump through a number of hoops to use one of these alternative ODMs with Rails. Since then, the framework has further modularized its database abstraction layer (the M component in MVC), so a Rails developer can now create an app without Active Record. Now all you have to do is:
> rails new my_app --skip-active-record
> cd my_app
> [edit Gemfile and add mongoid or mongo_mapper]
> bundle install
> rails server
Here is a brief example using Mongoid. Model classes no longer inherit from ActiveRecord::Base. Instead, they include the Mongoid::Document module and define the schema in the model file itself. Database migrations are not necessary!
class Course
  include Mongoid::Document
  field :name, type: String
  embeds_many :lectures
end
Additionally, you have a number of configuration options available. One example is allow_dynamic_fields, which allows you to set attributes on an object that aren’t defined in the model file’s schema. You can then add some logic to your model file if you need to behave differently depending on whether such a field is present.
I’m not going to go into too much detail on using Rails with MongoDB, because that’s a whole blog post in itself, and the docs for both MongoMapper and Mongoid are fantastic. Instead, I’ll devote a paragraph or two to the future of Ruby and MongoDB.
Rails is not required to use MongoDB with Ruby. You can use either of those two gems or the 10gen Ruby driver directly in another framework, such as Sinatra, or in the context of any other Ruby program. This is where the base Hash class in the Ruby language is relevant: one of the many roles of a MongoDB driver is to serialize/deserialize BSON documents into some native representation of a document in the given language. Luckily, Ruby’s Hash class is both idiomatically familiar to Rubyists and a very close representation of a document. See for yourself:
MongoDB document representing a tweet:
{
  "_id" : ObjectId("51073d4c4eeb4f4247b5c8f9"),
  "text" : "Just saw Star Trek. It was the best Star Trek movie yet!",
  "created_at" : "Wed May 15 19:06:41 +0000 2010",
  "entities" : {
    "user_mentions" : [
      {
        "indices" : [
          7,
          20
        ],
        "screen_name" : "davess",
        "id" : 17916546
      }
    ],
    "urls" : [ ],
    "hashtags" : [ ]
  },
  "retweeted" : false,
  "user" : {
    "location" : "United States",
    "created_at" : "Wed Apr 01 10:14:11 +0000 2009",
    "description" : "MongoDB Ruby driver engineer, Adjunct faculty at Columbia",
    "time_zone" : "New York",
    "screen_name" : "EmilyS",
    "lang" : "en",
    "followers_count" : 152
  },
  "favorited" : false
}
The corresponding Ruby hash:
{
  "_id" => BSON::ObjectId('51073d4c4eeb4f4247b5c8f9'),
  "text" => "Just saw Star Trek. It was the best Star Trek movie yet!",
  "created_at" => "Wed May 15 19:06:41 +0000 2010",
  "entities" => {
    "user_mentions" => [
      {
        "indices" => [
          7,
          20
        ],
        "screen_name" => "davess",
        "id" => 17916546
      }
    ],
    "urls" => [ ],
    "hashtags" => [ ]
  },
  "retweeted" => false,
  "user" => {
    "location" => "United States",
    "created_at" => "Wed Apr 01 10:14:11 +0000 2009",
    "description" => "MongoDB Ruby driver engineer, Adjunct faculty at Columbia",
    "time_zone" => "New York",
    "screen_name" => "EmilyS",
    "lang" => "en",
    "followers_count" => 152
  },
  "favorited" => false
}
It can’t get any closer than that. Whether they are simple hashes serialized using the driver directly or instantiated classes persisted through an ODM, Ruby objects map seamlessly to MongoDB documents.
Note: As of version 3.x, Mongoid has its own driver, called Moped. Therefore, if you’re on Mongoid 3.x, you’re not using 10gen’s Ruby driver.

MongoMapper, on the other hand, does use the 10gen driver in all versions.
The Ruby team at 10gen is passionate about the language being one of the best fits for MongoDB, and we want to strengthen our relationship with the Ruby community. We are always looking for opportunities to support even more Rubyists. To that end, we have been working with Durran Jordan to build new bson and mongo gems that Mongoid will use in the near future. We’ve also seen great adoption of MongoMapper and hope that more Rubyists, specifically Rails developers, will benefit from the continued improvement and collaboration on these open source projects.
This is a guest post from Valeri Karpov, a MongoDB Hacker and co-founder of the Ascot Project. For more MEAN Stack wisdom, check out his blog at TheCodeBarbarian.com. Valeri originally coined the term MEAN Stack while writing for the MongoDB blog, and you can find that post here.
If you’re familiar with Ruby on Rails and are using MongoDB to build a NodeJS app, you might miss some slick ActiveRecord features, such as declarative validation. Dive into most of the basic tutorials out there, and you’ll find that many basic web development tasks are more work than you’d like. For example, if we borrow the style of http://howtonode.org/express-mongodb, a route that pulls a document by its ID will look something like this:
app.get('/document/:id', function(req, res) {
  db.collection('documents', function(error, collection) {
    collection.findOne({ _id : collection.db.bson_serializer.ObjectID.createFromHexString(req.params.id) },
      function(error, document) {
        if (error || !document) {
          res.render('error', {});
        } else {
          res.render('document', { document : document });
        }
      });
  });
});
In my last guest post for MongoDB, I touched on MongooseJS, a schema and usability wrapper for MongoDB in NodeJS. MongooseJS was developed by LearnBoost, an education startup based in San Francisco, and is maintained by 10gen. MongooseJS lets us take advantage of MongoDB’s flexibility and performance benefits while using development paradigms similar to Ruby on Rails and ActiveRecord. In this post, I’ll go into more detail about how The Ascot Project uses Mongoose for our data. I’ll also cover some best practices we’ve learned and some pitfalls we’ve found that aren’t clearly documented.
Before we dive into the details of working with Mongoose, let’s take a second to define the primary objects that we will be using. Loosely speaking, Mongoose’s schema setup is defined by 3 types: Schema, Connection, and Model.
A Schema is an object that defines the structure of any documents that will be stored in your MongoDB collection; it enables you to define types and validators for all of your data items.
A Connection is a fairly standard wrapper around a database connection.
A Model is an object that gives you easy access to a named collection, allowing you to query the collection and use the Schema to validate any documents you save to that collection. It is created by combining a Schema, a Connection, and a collection name.
Finally, a Document is an instantiation of a Model that is tied to a specific document in your collection.
Okay, now we can jump into the dirty details of MongooseJS. Most MongooseJS apps will start something like this:
var Mongoose = require('mongoose');
var myConnection = Mongoose.createConnection('localhost', 'mydatabase');
// Note the capital S: the constructor is Mongoose.Schema
var MySchema = new Mongoose.Schema({
  name : {
    type : String,
    default : 'Val',
    enum : ['Val', 'Valeri', 'Valeri Karpov']
  },
  created : {
    type : Date,
    default : Date.now
  }
});
var MyModel = myConnection.model('mycollection', MySchema);
var myDocument = new MyModel({});
What makes this code so magical? There are 4 primary advantages that Mongoose has over the default MongoDB wrapper:
1. MongoDB uses named collections of arbitrary objects, and a MongooseJS Model abstracts away this layer. Because of this, we don’t have to deal with tasks such as asynchronously telling MongoDB to switch to that collection, or working with the annoying createFromHexString function. For example, with a Model named Document, the route from the beginning of this post would look more like:
app.get('/document/:id', function(req, res) {
  Document.findOne({ _id : req.params.id }, function(error, document) {
    if (error || !document) {
      res.render('error', {});
    } else {
      res.render('document', { document : document });
    }
  });
});
2. Mongoose Models handle the grunt work of setting default values and validating data. In the above example, myDocument.name defaults to ‘Val’. If we try to save with a name that’s not in the provided enum, Mongoose will give us back a nice error. If you want to learn a bit more about the cool things you can do with Mongoose validation, check out my blog post on how to integrate Mongoose validation with AngularJS (http://thecodebarbarian.wordpress.com/2013/05/12/how-to-easily-validate-any-form-ever-using-angularjs/).
3. Mongoose lets us attach functions to our models:
MySchema.methods.greet = function() { return 'Hello, ' + this.name; };
4. Mongoose handles limited sub-document population using manual references (i.e. no MongoDB DBRefs), which gives us the ability to mimic a familiar SQL join. For example:
var UserGroupSchema = new Mongoose.Schema({
  users : [{ type : Mongoose.Schema.ObjectId, ref : 'mycollection' }]
});
var UserGroup = myConnection.model('usergroups', UserGroupSchema);
// myDocument must be saved first, or there is nothing for populate() to find
myDocument.save(function() {
  var group = new UserGroup({ users : [myDocument._id] });
  group.save(function() {
    UserGroup.find().populate('users').exec(function(error, groups) {
      // groups contains every document in usergroups, with the users field populated
      console.log(groups[0].users[0].name); // Prints 'Val'
    });
  });
});
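Here is a toy, in-memory imitation of advantages 2 through 4 that makes them less magical. It applies defaults, rejects values outside an enum, exposes schema methods on instances, and resolves one level of id references. This is a sketch under stated assumptions, not Mongoose's implementation. makeModel, populateField, Person, and Group are hypothetical names, and nothing here talks to MongoDB.

```javascript
// A toy imitation of a few Mongoose Model behaviors. makeModel and
// populateField are hypothetical helpers, not Mongoose APIs.
function makeModel(schema, methods) {
  var collection = [];   // stands in for a MongoDB collection
  var nextId = 1;        // stands in for ObjectId generation

  function Model(fields) {
    var self = this;
    fields = fields || {};
    Object.keys(schema).forEach(function(key) {
      var def = schema[key].default;
      if (fields[key] !== undefined) {
        self[key] = fields[key];
      } else if (typeof def === 'function') {
        self[key] = def();               // computed default
      } else if (def !== undefined) {
        self[key] = def;                 // constant default, e.g. 'Val'
      }
    });
    this._id = nextId++;
  }

  // Schema methods go on the prototype, so every instance can call them.
  Object.keys(methods || {}).forEach(function(name) {
    Model.prototype[name] = methods[name];
  });

  // Check enum constraints before storing the instance.
  Model.prototype.save = function() {
    var self = this;
    Object.keys(schema).forEach(function(key) {
      var allowed = schema[key].enum;
      if (allowed && allowed.indexOf(self[key]) === -1) {
        throw new Error('Invalid value for ' + key + ': ' + self[key]);
      }
    });
    collection.push(this);
    return this;
  };

  Model.findById = function(id) {
    return collection.filter(function(doc) { return doc._id === id; })[0];
  };

  return Model;
}

// One level of "population": swap each stored id for the referenced instance.
function populateField(doc, field, RefModel) {
  var copy = Object.assign({}, doc);
  copy[field] = doc[field].map(function(id) { return RefModel.findById(id); });
  return copy;
}

var Person = makeModel(
  {
    name: { default: 'Val', enum: ['Val', 'Valeri', 'Valeri Karpov'] },
    created: { default: function() { return new Date(); } }
  },
  { greet: function() { return 'Hello, ' + this.name; } }
);
var Group = makeModel({ users: { default: function() { return []; } } });

var val = new Person({}).save();
var group = new Group({ users: [val._id] }).save();

console.log(val.greet());                                         // Hello, Val
console.log(populateField(group, 'users', Person).users[0].name); // Val
try {
  new Person({ name: 'Bob' }).save();
} catch (e) {
  console.log(e.message);  // Invalid value for name: Bob
}
```

Function-valued defaults are called once per instance. This is why the users default returns a fresh array instead of sharing one array across every Group.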
In the last few months, my team and I have learned a great deal about working with Mongoose and using it to open up the true power of MongoDB. Like most powerful tools, it can be used well and it can be used poorly, and unfortunately a lot of the examples you can find online fall into the latter. Through trial and error over the course of Ascot’s development, my team has settled on some key principles for using Mongoose the right way:
1 Schema = 1 file
A schema should never be declared in app.js, and you should never have multiple schemas in a single file (even if you intend to nest one schema in another). It is often expedient to inline everything into app.js, but not keeping schemas in separate files makes things more difficult in the long run. Separate files lower the barrier to entry for understanding your code base and make tracking changes much easier.
Mongoose can’t handle multi-level population yet, and populated fields are not Documents. Nesting schemas is helpful but it’s an incomplete solution. Design your schemas accordingly.
Let’s say we have a few interconnected Models:
var ImageSchema = new Mongoose.Schema({
  url : { type : String },
  created : { type : Date, default : Date.now }
});
var Image = db.model('images', ImageSchema);
var UserSchema = new Mongoose.Schema({
  username : { type : String },
  image : { type : Mongoose.Schema.ObjectId, ref : 'images' }
});
UserSchema.methods.greet = function() {
  return 'Hello, ' + this.username;
};
var User = db.model('users', UserSchema);
var GroupSchema = new Mongoose.Schema({
  users : [{ type : Mongoose.Schema.ObjectId, ref : 'users' }]
});
var Group = db.model('groups', GroupSchema);
Our Group Model contains a list of Users, which in turn each have a reference to an Image. Can MongooseJS resolve these references for us? The answer, it turns out, is yes and no.
Group.
  find({}).
  populate('users').
  populate('users.image').
  exec(function(error, groups) {
    groups[0].users[0].username; // OK
    groups[0].users[0].greet(); // ERROR – greet is undefined
    groups[0].users[0].image; // Is still an object id, doesn't get populated
    groups[0].users[0].image.created; // Undefined
  });
In other words, you can call ‘populate’ to easily resolve an ObjectID into the associated object, but you can’t call ‘populate’ to resolve an ObjectID that’s contained in that object. Furthermore, since the populated object is not technically a Document, you can’t call any functions you attached to the schema. Although this is definitely a severe limitation, it can often be avoided by the use of nested schemas. For example, we can define our UserSchema like this:
var UserSchema = new Mongoose.Schema({
  username : { type : String },
  image : [ImageSchema]
});
In this case, we don’t have to call ‘populate’ to resolve the image. Instead, we can do this:
Group.
  find({}).
  populate('users').
  exec(function(error, groups) {
    // image is an array of embedded ImageSchema subdocuments
    groups[0].users[0].image[0].created; // Date associated with image
  });
However, nested schemas don’t solve all of our problems, because we still don’t have a good way to handle many-to-many relationships. Nested schemas are an excellent solution when a nested document can only exist as part of exactly one parent document. In the above example, we implicitly assume that a single image belongs to exactly one user: no other user can reference the exact same image object.
For instance, we shouldn’t have UserSchema as a nested schema of Group’s schema, because a User can be a part of multiple Groups, and thus we’d have to store separate copies of a single User object in multiple Groups. Furthermore, a User ought to be able to exist in our database without being part of any groups.
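The duplication problem is visible even without a database. In this sketch, plain JavaScript objects stand in for MongoDB documents, and the group names and usernames are made up for illustration. Embedding copies the user into every group, while referencing stores only the user's id and resolves it when reading.

```javascript
// Plain objects standing in for MongoDB documents; no database involved.
var user = { _id: 1, username: 'val' };
var usersById = { 1: user };  // the canonical "users" collection

// Embedded: each group holds its own copy of the user.
var embeddedGroups = [
  { name: 'hackers', users: [Object.assign({}, user)] },
  { name: 'founders', users: [Object.assign({}, user)] }
];
// Referenced: each group holds only the user's id.
var referencedGroups = [
  { name: 'hackers', users: [user._id] },
  { name: 'founders', users: [user._id] }
];

// The user changes their username; only the canonical document is updated.
user.username = 'valeri';

function embeddedNames() {
  return embeddedGroups.map(function(g) { return g.users[0].username; });
}
function referencedNames() {
  return referencedGroups.map(function(g) { return usersById[g.users[0]].username; });
}

console.log(embeddedNames());   // [ 'val', 'val' ]  (stale copies)
console.log(referencedNames()); // [ 'valeri', 'valeri' ]
```

With references, the user also lives in usersById independently of any group. This meets the second requirement above: a User can exist without belonging to a Group.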
Declare your models exactly once and use dependency injection; never declare them in a routes file.
This is best expressed in an example:
// GOOD
exports.listUsers = function(User) {
  return function(req, res) {
    User.find({}, function(error, users) {
      res.render('list_users', { users : users });
    });
  };
};
// BAD
var Mongoose = require('mongoose');
var db = Mongoose.createConnection('localhost', 'database');
var Schema = require('../models/User.js').UserSchema;
var User = db.model('users', Schema);
exports.listUsers = function(req, res) {
  User.find({}, function(error, users) {
    res.render('list_users', { users : users });
  });
};
The biggest problem with the “bad” version of listUsers shown above is that if you declare your model at the top of this particular file, you have to define it in every file where you use the User model. This leads to a lot of error-prone find-and-replace work for you, the programmer, whenever you want to do something like rename the Schema or change the collection name that underlies the User model.
Early in Ascot’s development we made this mistake with a single file, and ended up with a particularly annoying bug when we changed our MongoDB password several months later. The proper way to do this is to declare your Models exactly once, include them in your app.js, and pass them to your routes as necessary.
In addition, note that the “bad” listUsers is impossible to unit test. The User model in the “bad” example is inaccessible through calls to require, so we can’t mock it out for testing. In the “good” example, we can write a test easily using Nodeunit:
var UserRoutes = require('./routes/user.js');
// A minimal hand-rolled mock of the User model: find() simply hands back
// whatever is stored in mockUser.collection.
var mockUser = {
  collection : [],
  find : function(query, callback) {
    callback(null, this.collection);
  }
};
exports.testListUsers = function(test) {
  mockUser.collection = [{ name : 'Val' }];
  var fnToTest = UserRoutes.listUsers(mockUser);
  fnToTest({}, {
    render : function(view, params) {
      test.equal(mockUser.collection, params.users);
      test.done();
    }
  });
};
And speaking of Nodeunit:
Unit tests catch mistakes, encourage you to write modular code, and allow you to easily make sure your logic works. They are your friend.
I’ll be the first to say that writing unit tests can be very annoying. Some tests can seem trivial, they don’t necessarily catch all bugs, and often you write way more test code than actual production code. However, a good suite of tests can save you a lot of worry; you can make changes and then quickly verify that you haven’t broken any of your modules. Ascot Project currently uses Nodeunit for our backend unit tests; Nodeunit is simple, flexible, and works well for us.
And there you have it! Mongoose is an excellent library, and if you’re using MongoDB and NodeJS, you should definitely consider using it. It will save you from writing a lot of extra code, it’ll handle some basic population, and it’ll handle all your validation and object creation grunt work. This adds up to more time spent building awesome stuff, and less time trying to figure out how to get your database interface to work.
Have any questions about the code featured in this post? Want to suggest a better approach? Feel like telling me why the MEAN Stack is the worst thing that ever happened in the history of the world and how horrible I am? Go ahead and leave a comment below, or shoot me an email at valkar207@gmail.com and I’ll do my best to answer any questions you might have. You can also find me on github at https://github.com/vkarpov15. My current venture is called The Ascot Project, and you can find that over at www.AscotProject.com.
By Mike O’Brien, 10gen Software engineer and maintainer of Mongo-Hadoop
With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add new features that take advantage of text search. Prior to 2.4, adding text search to a MongoDB app would have required writing code to interface with another system like Solr, Lucene, ElasticSearch, or something else. Now that it’s integrated with the database we are already using, we can accomplish the same result with reduced complexity, and fewer moving parts in the deployment.
Here we’ll go through a practical example of adding text search to Planet MongoDB, our blog aggregator site.
10gen introduced MongoDB Backup Service in early May. Creating a backup service for MongoDB was a new challenge, and we used the opportunity to explore new technologies for our stack. The final implementation of the MongoDB Backup Service agent is written in Go, an open-source, natively compiled language initiated and maintained by Google.
The Backup Service started as a Java project, but as the project matured, the team wanted to move to a language that compiled natively on the machine. After considering a few options, the team decided that Go was the best fit for its C-like syntax, strong standard library, the resolution of concurrency problems via goroutines, and painless multi-platform distribution.
MongoDB 2.5.0 (an unstable dev build) has a new implementation of the “Matcher”. The old Matcher is the bit of code in Mongo that takes a query and decides whether a document matches a query expression. It also has to understand indexes so that it can do things like create subsets of queries suitable for index covering. However, the structure of the Matcher code hadn’t changed significantly in more than four years, and until this release it could not easily be extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.
The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces. The first is an abstract syntax tree (hereafter ‘AST’) for match expressions. The second is a parser from BSON into said AST. The third is a Matcher API layer that simulates the old Matcher interface while using all-new internals. This new version is much easier to extend and easier to reason about. It will also allow us to use the same structure for matching as for query analysis and rewriting.
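Here is a toy JavaScript analogue of the parse-then-match split. A parse step turns a BSON-like query object into a tree of AND and comparison nodes, and a separate match step walks that tree against a document. The node shapes, the small operator set ($eq, $gt, $lt, $and), and the retweet_count field are assumptions for illustration. The real 2.5.0 Matcher is C++ and far more complete.

```javascript
// Toy match-expression AST: parse a query object into nodes, then evaluate the
// nodes against a document. Node shapes here are hypothetical, not MongoDB's.
function parse(query) {
  var children = Object.keys(query).map(function(key) {
    var value = query[key];
    if (key === '$and') {
      return { type: 'AND', children: value.map(parse) };
    }
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // { field: { $gt: 5, $lt: 10 } } becomes an AND of comparison nodes.
      return {
        type: 'AND',
        children: Object.keys(value).map(function(op) {
          return { type: 'CMP', op: op, path: key, value: value[op] };
        })
      };
    }
    // A bare value means equality.
    return { type: 'CMP', op: '$eq', path: key, value: value };
  });
  return { type: 'AND', children: children };
}

var ops = {
  $eq: function(a, b) { return a === b; },
  $gt: function(a, b) { return a > b; },
  $lt: function(a, b) { return a < b; }
};

function matches(node, doc) {
  if (node.type === 'AND') {
    return node.children.every(function(child) { return matches(child, doc); });
  }
  var fn = ops[node.op];
  if (!fn) throw new Error('Unsupported operator: ' + node.op);
  // Resolve dotted paths such as "user.lang"; missing fields never match here.
  var actual = node.path.split('.').reduce(function(obj, part) {
    return obj == null ? undefined : obj[part];
  }, doc);
  return actual !== undefined && fn(actual, node.value);
}

var tree = parse({ 'user.lang': 'en', retweet_count: { $gt: 5, $lt: 10 } });
console.log(matches(tree, { user: { lang: 'en' }, retweet_count: 7 }));  // true
console.log(matches(tree, { user: { lang: 'en' }, retweet_count: 12 })); // false
```

The parsed query is plain data, so other code, such as a query analyzer, could walk the same tree. That kind of reuse was not possible when parsing and matching were intertwined.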
Geometric processing as a field of study has many applications and has produced a great deal of research and many powerful tools. Many modern web applications have location-based components and require a data storage engine capable of managing geometric information. Typically this requires introducing an additional storage engine into your infrastructure, which can be a time-consuming and expensive operation.
MongoDB has a set of geometric storage and search features. The MongoDB 2.4 release brought several improvements to MongoDB’s existing geo capabilities and the introduction of the 2dsphere index.
The primary conceptual difference between the 2d and 2dsphere indexes is the type of coordinate system that they consider, though there are also many functional differences. Planar coordinate systems are useful for certain applications and can serve as a simplifying approximation of spherical coordinates. However, as you consider larger geometries, or geometries near the meridians and poles, using proper spherical coordinates becomes important.
In addition to this major conceptual difference, there are also significant functional differences, which are outlined in some depth in the Geospatial Indexes and Queries section of the MongoDB documentation. This post will discuss the new features that have been added in the 2.4 release.
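To see why, compare two pairs of points that are each 10 degrees of longitude apart, one pair on the equator and one at 80°N. The sketch below computes a naive planar distance in degrees and the haversine great-circle distance. It rests on simplifying assumptions: a spherical Earth with a mean radius of 6371 km, and points given as [longitude, latitude]. It only illustrates the geometry; it is not how MongoDB's 2d or 2dsphere indexes compute distances.

```javascript
// Naive planar distance (treating lon/lat degrees as x/y) versus great-circle
// (haversine) distance on a sphere. Illustrative only.
var EARTH_RADIUS_KM = 6371; // mean Earth radius, a common approximation

function toRadians(deg) { return deg * Math.PI / 180; }

function planarDegrees(a, b) {
  var dx = a[0] - b[0], dy = a[1] - b[1];
  return Math.sqrt(dx * dx + dy * dy);
}

function haversineKm(a, b) {
  var dLat = toRadians(b[1] - a[1]);
  var dLon = toRadians(b[0] - a[0]);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(toRadians(a[1])) * Math.cos(toRadians(b[1])) *
          Math.pow(Math.sin(dLon / 2), 2);
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Both pairs are exactly 10 degrees of longitude apart ([longitude, latitude]).
var equator = [[0, 0], [10, 0]];
var arctic = [[0, 80], [10, 80]];

console.log(planarDegrees(equator[0], equator[1]), planarDegrees(arctic[0], arctic[1])); // 10 10
console.log(Math.round(haversineKm(equator[0], equator[1]))); // 1112
console.log(Math.round(haversineKm(arctic[0], arctic[1])));   // 193
```

The planar metric reports the same distance for both pairs. On the sphere, the pair at 80°N is less than a fifth as far apart, and this kind of error grows with latitude and with the size of the geometry.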
This is a guest post by Sean Reilly. Release your applications with MongoDB more often and get closer to the ultimate goal of deploying applications anytime and why not at 11am on Wednesday mornings?
This article explores how to take advantage of MongoDB’s characteristics to avoid the downtime traditionally required by migration scripts in the SQL world. The goal is to get closer to deploying applications with no downtime at all.
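One common way to apply this idea is to migrate documents lazily. Each document records a schema version, and the application upgrades older shapes as it reads them, so old and new documents can coexist while a new release rolls out. The sketch below is a hypothetical illustration of that pattern, not necessarily the technique the article itself uses. The schemaVersion field, the name split, and the upgrades table are all assumptions.

```javascript
// Hypothetical "migrate on read" sketch: documents carry a schemaVersion and
// are upgraded lazily by the application instead of by a blocking script.
var CURRENT_VERSION = 2;

var upgrades = {
  // v1 stored a single "name"; v2 splits it into firstName/lastName.
  1: function(doc) {
    var parts = (doc.name || '').split(' ');
    return {
      _id: doc._id,
      firstName: parts[0],
      lastName: parts.slice(1).join(' '),
      schemaVersion: 2
    };
  }
};

function upgrade(doc) {
  var version = doc.schemaVersion || 1;  // documents without a version are v1
  while (version < CURRENT_VERSION) {
    doc = upgrades[version](doc);
    version = doc.schemaVersion;
  }
  return doc;
}

// Old and new documents can live side by side in the same collection.
var stored = [
  { _id: 1, name: 'Valeri Karpov' },
  { _id: 2, firstName: 'Emily', lastName: 'Stolfo', schemaVersion: 2 }
];
var readable = stored.map(upgrade);
console.log(readable[0].firstName, readable[0].lastName); // Valeri Karpov
```

In a real application you would typically write the upgraded document back on its next save. The collection then converges on the new shape over time without a blocking migration.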
This is a guest post by NYU Information Systems (MSIS) Graduate students Kyle Galloway, Pravish Sood and Dylan Kelemen.
We are pleased to announce the Mongo-ODBC project. As NYU MSIS students in Courant Institute’s Information Technology Projects course, we are working under the guidance of 10gen and our Professor Evan Korth to develop an ODBC (Open-Database-Connectivity) driver for MongoDB.
ODBC was created to facilitate the movement of data between applications with different file structures. It is not as popular as it once was, in part due to more flexible alternatives like MongoDB, but many programs maintain ODBC compliance. The goal of our project is to create an ODBC driver that supports the ODBC functions that can be carried out on MongoDB. This will give users of programs that don’t yet offer MongoDB support some access to data in MongoDB databases. We believe this will be particularly useful for new users and for those who depend on programs like Excel and Tableau for simple business analysis reporting.
This is a guest post from Valeri Karpov, a MongoDB Hacker and co-founder of the Ascot Project.
A few weeks ago, a friend of mine asked me for help with PostgreSQL. As someone who’s been blissfully SQL-free for a year, I was quite curious to find out why he wasn’t just using MongoDB instead. It turns out that he thinks MongoDB is too difficult to use for a quick weekend hack, and this couldn’t be farther from the truth. I just finished my second 24 hour hackathon using Mongo and NodeJS (the FinTech Hackathon cosponsored by 10gen) and can confidently say that there is no reason to use anything else for your next hackathon or REST API hack.
This is a guest post from Dharshan Rangegowda, founder of Scalegrid, creators of MongoDirector. This originally appeared on the MongoDirector blog.
Are you hosting your production MongoDB instances on Amazon AWS? At MongoDirector.com we manage hundreds of production MongoDB instances on AWS and have learnt a few things along the way. Here are 10 questions you need to ask yourself, and answer, as you continue to manage your deployment. Almost all of the information below is applicable to other cloud service providers as well.
Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collection’s documents. For certain common workloads, though, like key/value lookup, the natural choice of _id as a shard key isn’t optimal. Default ObjectIds are ascending, which results in poor write distribution. Creating randomized _ids or choosing another well-distributed field is always possible, but this adds complexity to an app and is another place where something could go wrong.
To help keep these simple workloads simple, MongoDB 2.4 added the new hash-based shard key feature. The idea behind hash-based shard keys is that MongoDB will do the work to randomize data distribution for you, based on whatever kind of document identifier you like. So long as the identifier has a high cardinality, the documents in your collection will be spread evenly across the shards of your cluster. For heavy workloads with lots of individual document writes or reads (e.g. key/value), this is usually the best choice. For workloads where getting ranges of documents is more important (e.g. finding recent documents from all users), other choices of shard key may be better suited.
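The effect on write distribution can be simulated in a few lines. This sketch assumes four shards and uses increasing integers in place of ObjectIds. It maps each key to a shard either by fixed ranges or by a slice of an MD5 digest of the key. These are simplifications for illustration; MongoDB's actual chunk splitting, balancing, and hashing details differ.

```javascript
// Simulate why ascending keys concentrate writes and how hashing spreads them.
var crypto = require('crypto');
var SHARDS = 4;

// Range partitioning: keys 0-999 go to shard 0, ..., and the last shard owns
// [3000, +infinity), so every ever-increasing new key lands on it.
function rangeShard(key) {
  return Math.min(Math.floor(key / 1000), SHARDS - 1);
}

// Hash partitioning: use the first 32 bits of an MD5 digest of the key.
function hashedShard(key) {
  var digest = crypto.createHash('md5').update(String(key)).digest('hex');
  var value = parseInt(digest.slice(0, 8), 16);  // 0 .. 2^32 - 1
  return Math.floor(value / 0x100000000 * SHARDS);
}

function countWrites(shardFn, keys) {
  var counts = new Array(SHARDS).fill(0);
  keys.forEach(function(k) { counts[shardFn(k)]++; });
  return counts;
}

// New inserts arriving after existing keys 0..3999: keys 4000..4999.
var newKeys = [];
for (var k = 4000; k < 5000; k++) newKeys.push(k);

console.log('range :', countWrites(rangeShard, newKeys));  // [ 0, 0, 0, 1000 ]
console.log('hashed:', countWrites(hashedShard, newKeys)); // roughly 250 per shard
```

Under range partitioning, one shard absorbs every new write. Hashing spreads the same keys roughly evenly, at the cost of turning range queries on the key into scatter-gather queries.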
When you’re preparing a MongoDB deployment, you should try to understand how your application is going to hold up in production. It’s a good idea to develop a consistent, repeatable approach to managing your deployment environment so that you can minimize any surprises once you’re in production.
The best approach incorporates prototyping your setup, conducting load testing, monitoring key metrics, and using that information to scale your setup. The key part of the approach is to proactively monitor your entire system - this will help you understand how your production system will hold up before deploying, and determine where you’ll need to add capacity. Having insight into potential spikes in your memory usage, for example, could help put out a write-lock fire before it starts.
The upcoming release of MongoDB 2.4 brings an exciting change to the JavaScript engine. Previously, MongoDB ran SpiderMonkey 1.7, but going forward, MongoDB will be running V8, the open-source high-performance JavaScript engine from Google. This means that from now on, whenever JavaScript is executed, V8 will be running the show.
In this post we’ll examine the following primary impacts of this change:
Previously, every query/command that used the JS interpreter had to acquire a mutex, thus serializing all JS work. Now, with V8 we have improved concurrency by allowing each JavaScript job to run on a separate core.
For example, if a user’s workload commonly involved 24 concurrent $where queries (each from a unique client), and they have a server with 24 cores, they should expect query execution times to be reduced by (roughly) a factor of 24.
The MongoDB Engineering Team is pleased to announce the release of MongoDB 2.4. This is the latest stable release, following the September 2012 release of MongoDB 2.2. This release contains key new features along with performance improvements and bug fixes. We have outlined some of the key features below. For additional details about the release:
Highlights of MongoDB 2.4 include:
MongoDB 2.2 introduced the touch command, which loads data from the data storage layer into memory. The touch command will load a collection’s documents, indexes or both into memory. This can be ideal to preheat a newly started server, in order to avoid page faults and slow performance once the server is brought into production. You can also use this when adding a new secondary to an existing replica set to ensure speedy subsequent reads.
Note that while the touch command is running, a replica set member will enter into a RECOVERING state to prevent reads from clients. When the operation completes, the secondary will return to the SECONDARY(2) state.
You invoke the touch command through the following syntax:
db.runCommand({ touch: "collection_name", data: true, index: true })