<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description></description><title>The MongoDB NoSQL Database Blog</title><generator>Tumblr (3.0; @mongodb)</generator><link>http://blog.mongodb.org/</link><item><title>Ruby, Rails, MongoDB and the Object-Relational Mismatch</title><description>&lt;p&gt;&lt;em&gt;by &lt;a href="https://github.com/estolfo"&gt;Emily Stolfo&lt;/a&gt;, Ruby Engineer and Evangelist at 10gen&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;MongoDB is a popular choice among developers in part because it permits a one-to-one mapping between object-oriented (OO) software objects and database entities.  Ruby developers are at a great advantage in using MongoDB because they are already used to working with and designing software that is purely object-oriented.&lt;/p&gt;

&lt;p&gt;Most of the discussions I’ve had about MongoDB and Ruby assume Ruby knowledge and explain why MongoDB is a good fit for the Rubyist.  This post will do the opposite; I’m going to assume you know a few things about MongoDB but not much about Ruby.  In showing the Rubyist’s OO advantage, I’ll share a bit about the Ruby programming language and its popularity, explain specifically how the majority of Ruby developers are using MongoDB, and then talk about the future of the 10gen Ruby driver in the context of the Rails community.&lt;/p&gt;

&lt;h4&gt;Ruby who?&lt;/h4&gt;

&lt;p&gt;The Ruby programming language was created 20 years ago by Yukihiro Matsumoto, “Matz” to the Ruby community.  Although the language did not become immensely popular until the introduction of the Rails web framework many years later, it has long been known for its founding philosophy.  Matz has said many times that he strove to create a language that follows the principles of good user interface design.  Namely, Ruby is intended to make the developer experience more pleasant and to facilitate programmer productivity.  Matz has said that he wanted to combine the flexibility of Perl with the object-orientation of Smalltalk.  The result is an elegant, flexible, and practical language that is indeed a pleasure to use.&lt;/p&gt;

&lt;p&gt;Ruby is a purely object-oriented language.  This means that while other languages would have primitive types for programming “atoms” such as integers, booleans, and null, Ruby has base types.  Classes, once instantiated, are objects that have properties (instance variables) and performable actions (methods). Even classes themselves are instances of the class, Class.  Let’s look at an example in the Ruby interpreter:&lt;/p&gt;

&lt;pre&gt;&amp;gt; 2.object_id
 =&amp;gt; 5 
&amp;gt; false.object_id
 =&amp;gt; 0 
&amp;gt; true.object_id
 =&amp;gt; 20
&amp;gt; nil.object_id
 =&amp;gt; 8&lt;/pre&gt;

&lt;p&gt;As you can see, even integers, booleans, and Ruby’s null object “nil” all have object ids.  This implies that they are more than just primitives; they are objects complete with a class, properties, and methods.  Going further, we can see that nil has methods!&lt;/p&gt;

&lt;pre&gt;&amp;gt; nil.methods
 =&amp;gt; [:to_i, :to_f, :to_s, :to_a, :to_h, :inspect, :&amp;amp;, :|, :^, :nil?, :to_r, :rationalize, :to_c, :===, :=~, :!~, :eql?, :hash, :&amp;lt;=&amp;gt;, :class, :singleton_class, :clone, :dup, :taint, :tainted?, :untaint, :untrust, :untrusted?, :trust, :freeze, :frozen?, :methods, :singleton_methods, :protected_methods, :private_methods, :public_methods, :instance_variables...etc] &lt;/pre&gt;

&lt;p&gt;Integers can have methods too:&lt;/p&gt;

&lt;pre&gt;&amp;gt; i = 0
&amp;gt; 4.times do
&amp;gt;   puts "#{i%2 == 0 ? 'heeey' : 'hoooo'}"
&amp;gt;   i += 1
&amp;gt; end
heeey
hoooo
heeey
hoooo&lt;/pre&gt;

&lt;p&gt;In addition to the expected “primitive” types, Ruby provides other base types such as arrays and hashes.  The hash will become particularly relevant later on in this post, but for now, I’ll just share the syntax:&lt;/p&gt;

&lt;pre&gt;&amp;gt; document = {}
 =&amp;gt; {} 
&amp;gt; document["id"] = "emilys"
 =&amp;gt; "emilys" 
&amp;gt; document
 =&amp;gt; {"id"=&amp;gt;"emilys"} &lt;/pre&gt;

&lt;p&gt;Ruby software engineers strive to embrace the OO nature of the language, sometimes to the extreme.  The language is strongly and dynamically typed.  This allows the Rubyist to design software that is highly modular and that focuses on duck typing, i.e. taking advantage of object behavior rather than statically defined types.  A good Rubyist aims to reduce dependencies and increase the flexibility of code.  For example, the language doesn’t support multiple inheritance, but it does provide something called modules, which are essentially groupings of common methods that cannot be instantiated on their own.  A module can be “mixed in” to a class to give it extra functionality.  Together, these characteristics (dynamic, object-oriented, flexible, modular) make Ruby code a pleasure to write and maintain.&lt;/p&gt;
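
&lt;p&gt;Here is a minimal sketch of mixins and duck typing together.  The names &lt;code&gt;Greeter&lt;/code&gt;, &lt;code&gt;Student&lt;/code&gt;, &lt;code&gt;Robot&lt;/code&gt;, and &lt;code&gt;welcome&lt;/code&gt; are made up for illustration; the point is only that two unrelated classes can share behavior through a module, and that a method can rely on what an object does rather than on what class it is.&lt;/p&gt;

&lt;pre&gt;# A module groups shared behavior; it cannot be instantiated on its own.
module Greeter
  def greet
    "Hello, #{name}!"
  end
end

# Two unrelated classes mix in the same module.
class Student
  include Greeter
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Robot
  include Greeter

  def name
    "RX-78"
  end
end

# Duck typing: anything that responds to greet will do, whatever its class.
def welcome(guest)
  guest.greet
end

puts welcome(Student.new("Emily"))
puts welcome(Robot.new)&lt;/pre&gt;

&lt;p&gt;Neither class inherits from the other, yet both can be passed to &lt;code&gt;welcome&lt;/code&gt; because both respond to &lt;code&gt;greet&lt;/code&gt;.&lt;/p&gt;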

&lt;h4&gt;Rails&lt;/h4&gt;

&lt;p&gt;I mentioned above that Ruby’s popularity really blossomed with the creation of Ruby on Rails in 2003.  The web framework was created by a web programmer, David Heinemeier Hansson (“DHH”), who noticed that the web stack was a given and developers were repeating themselves over and over in building technology to wrap it.  He decided to extract the common elements of web engineering into a modular, reusable framework.  DHH unveiled his framework in a presentation that has become iconic in web programming.  He uses a DSL to create a web application in one command and then starts up a server.&lt;/p&gt;

&lt;pre&gt;&amp;gt; rails new my_app
&amp;gt; cd my_app
&amp;gt; rails server&lt;/pre&gt;

&lt;p&gt;So why did Rails become so popular?  Rails was built to make web programming faster, easier, and more manageable.  By introducing a number of conventions and sticking to OO, web programmers could go from 0 to a full working app in a relatively short amount of time with little configuration.  By the same token, they could take an existing app and quickly understand the codebase enough to maintain and develop it.  We’ve already discussed the approachability of the Ruby programming language, and Rails follows many of the same principles.  Rails is a solution for making web programming simpler.&lt;/p&gt;

&lt;p&gt;I teach a Ruby on Rails class at Columbia University, and I often tell my students that Ruby on Rails is the gateway drug to web programming. It makes web development more accessible to newcomers.&lt;/p&gt;

&lt;h4&gt;MongoDB&lt;/h4&gt;

&lt;p&gt;MongoDB is a document database that focuses on developer needs.  (Notice a common theme yet?)  There’s no need for an army of database administrators to maintain a MongoDB cluster and the database’s flexibility allows for application developers to define and manipulate a schema themselves instead of relying on a separate team of dedicated engineers.  Assuming that the many advantages of using MongoDB are familiar to you, it might seem natural for all Rails and Ruby developers to choose MongoDB as their first choice of datastore.  Unfortunately, MongoDB is far from the default.&lt;/p&gt;

&lt;h4&gt;The Object-relational impedance mismatch and Active Record&lt;/h4&gt;

&lt;p&gt;The Active Record pattern describes the mapping of an object instance to a row in a relational database table, using accessor methods to retrieve columns/properties, and the ability to create, update, read, and delete entities from the database.  It was first named by Martin Fowler in his book, Patterns of Enterprise Application Architecture.&lt;br/&gt;
The pattern has numerous limitations, collectively referred to as the object-relational impedance mismatch.  Some of these technical difficulties are structural.  In OO programming, objects may be composed of other objects.  The Active Record pattern maps these sub-objects to separate tables, thus introducing issues concerning the representation of relationships and encapsulated data.  Rails’ biggest contribution to web programming was arguably not the framework itself, but rather its object-relational mapper, Active Record.  Active Record uses macros to create relationships between objects and single table inheritance to represent inheritance.&lt;br/&gt;
The best solution to date for the object-relational impedance mismatch is Active Record, but this assumes your datastore is relational.  It’s also, fundamentally, a hack.  What if we were to use a more OO datastore?&lt;/p&gt;

&lt;h4&gt;MongoDB and Rails take on the Object relational impedance mismatch&lt;/h4&gt;

&lt;p&gt;As we see massive growth in data, an increased diversity of content, and a demand for shorter development cycles, the infrastructure developers rely upon must rise to meet new challenges that traditional technologies were not designed to address.  MongoDB has gained immense popularity because it fills many of the strongest modern technical demands, while still being developer-friendly and low-cost.&lt;/p&gt;

&lt;p&gt;Given all that has been discussed regarding Rails and Ruby, wouldn’t it make sense to use MongoDB with Rails?  The answer is yes: it makes a lot of sense.  Nevertheless, Rails wasn’t originally built to use a document database so you must use a separate gem in place of Active Record.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://mongomapper.com/"&gt;MongoMapper&lt;/a&gt; and &lt;a href="http://mongoid.org/en/mongoid/index.html"&gt;Mongoid&lt;/a&gt; are the two leading gems that make it possible to use MongoDB as a datastore with Rails. MongoMapper, a project by John Nunemaker from GitHub, is a simple ORM for MongoDB. Mongoid, in particular, has become quite popular since its creation 4 years ago by Durran Jordan.  Mongoid’s goal is to provide an API familiar to Active Record users, while still leveraging MongoDB’s schema flexibility, document design, atomic modifiers, and rich query interface.&lt;/p&gt;

&lt;p&gt;MongoDB owes its traction in the Rails and Ruby community largely to these two gems.  In the past, Rails developers had to jump through a number of hoops in order to use one of these alternate ODMs with Rails, but the web framework then further modularized the database abstraction layer (the M component in MVC) to make it possible for a Rails developer to create an app without Active Record.  Now all you have to do is:&lt;/p&gt;

&lt;pre&gt;&amp;gt; rails new my_app --skip-active-record
&amp;gt; cd my_app
&amp;gt; [edit Gemfile and add mongoid or mongo_mapper]
&amp;gt; bundle install
&amp;gt; rails server&lt;/pre&gt;

&lt;p&gt;As a brief example using Mongoid: model classes no longer inherit from ActiveRecord::Base.  Instead, they include the Mongoid::Document module and define their schema directly in the model file.  Database migrations are not necessary!&lt;/p&gt;

&lt;pre&gt;class Course
  include Mongoid::Document
  field :name, type: String
  embeds_many :lectures
end&lt;/pre&gt;

&lt;p&gt;Additionally, you have a number of configuration options available, such as &lt;code&gt;allow_dynamic_fields&lt;/code&gt;, which lets you set attributes on an object that aren’t in the model file’s schema.  You can then add some logic in your model file if you need to do something different depending on the presence or absence of such a field.&lt;/p&gt;

&lt;p&gt;I’m not going to go into too much detail on using Rails with MongoDB, because that’s a whole blog post in itself and both MongoMapper and Mongoid’s docs are fantastic.  Instead, it’d be worth devoting a paragraph or two talking about the future of Ruby and MongoDB.&lt;/p&gt;

&lt;h4&gt;Future&lt;/h4&gt;

&lt;p&gt;Rails is not required to use MongoDB with Ruby.  You can use either of those two gems or the 10gen Ruby driver directly in another framework, such as Sinatra, or in the context of any other Ruby program.  This is where the base Hash class in the Ruby language is relevant: one of the many roles of a MongoDB driver is to serialize/deserialize BSON documents into some native representation of a document in the given language.  Luckily, Ruby’s Hash class is both idiomatically familiar to Rubyists and a very close representation of a document.  See for yourself:&lt;/p&gt;

&lt;p&gt;MongoDB document representing a tweet:&lt;/p&gt;

&lt;pre&gt;{
    "_id" : ObjectId("51073d4c4eeb4f4247b5c8f9"),
    "text" : "Just saw Star Trek.  It was the best Star Trek movie yet!",
    "created_at" : "Wed May 15 19:06:41 +0000 2010",
    "entities" : {
        "user_mentions" : [
            {
                "indices" : [
                    7,
                    20
                ],
                "screen_name" : "davess",
                "id" : 17916546
            }
        ],
        "urls" : [ ],
        "hashtags" : [ ]
    },
    "retweeted" : false,
    "user" : {
        "location" : "United States",
        "created_at" : "Wed Apr 01 10:14:11 +0000 2009",
        "description" : "MongoDB Ruby driver engineer, Adjunct faculty at Columbia",
        "time_zone" : "New York",
        "screen_name" : "EmilyS",
        "lang" : "en",
        "followers_count" : 152
    },
    "favorited" : false
}&lt;/pre&gt;

&lt;p&gt;The corresponding Ruby hash:&lt;/p&gt;

&lt;pre&gt;{
    "_id" =&amp;gt; ObjectId("51073d4c4eeb4f4247b5c8f9"),
    "text" =&amp;gt; "Just saw Star Trek.  It was the best Star Trek movie yet!",
    "created_at" =&amp;gt; "Wed May 15 19:06:41 +0000 2010",
    "entities" =&amp;gt; {
        "user_mentions" =&amp;gt; [
            {
                "indices" =&amp;gt; [
                    7,
                    20
                ],
                "screen_name" =&amp;gt; "davess",
                "id" =&amp;gt; 17916546
            }
        ],
        "urls" =&amp;gt; [ ],
        "hashtags" =&amp;gt; [ ]
    },
    "retweeted" =&amp;gt; false,
    "user" =&amp;gt; {
        "location" =&amp;gt; "United States",
        "created_at" =&amp;gt; "Wed Apr 01 10:14:11 +0000 2009",
        "description" =&amp;gt; "MongoDB Ruby driver engineer, Adjunct faculty at Columbia",
        "time_zone" =&amp;gt; "New York",
        "screen_name" =&amp;gt; "EmilyS",
        "lang" =&amp;gt; "en",
        "followers_count" =&amp;gt; 152,
    },
    "favorited" =&amp;gt; false,
}&lt;/pre&gt;

&lt;p&gt;It can’t get any closer than that.  Whether they are simple hashes serialized using the driver directly or instantiated classes persisted through an ODM, Ruby objects map seamlessly to MongoDB documents.&lt;/p&gt;
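
&lt;p&gt;To make the hash side concrete, here is a small sketch in plain Ruby using a trimmed-down version of the tweet above.  It assumes nothing about the driver’s API: the standard library’s JSON parser is used purely as a stand-in for BSON deserialization so that the snippet runs without a database.&lt;/p&gt;

&lt;pre&gt;require 'json'

# Stand-in for a document coming back from the database. A real driver
# deserializes BSON; JSON is used here only so the sketch runs on its own.
raw = '{"text": "Just saw Star Trek.", "retweeted": false, "user": {"screen_name": "EmilyS", "followers_count": 152}}'

tweet = JSON.parse(raw)

puts tweet.class                           # Hash
puts tweet["user"]["screen_name"]          # nested documents are nested hashes
puts tweet["user"]["followers_count"] + 1  # values arrive as native Ruby types&lt;/pre&gt;

&lt;p&gt;Once the document is a Hash, everything Rubyists already know about hashes (nested lookups, iteration, merging) applies to it directly.&lt;/p&gt;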

&lt;p&gt;Note: Mongoid has its own driver, called &lt;em&gt;moped&lt;/em&gt;, as of version 3.x.  Therefore, if you’re on Mongoid 3.x, you’re not using 10gen’s Ruby driver.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/adebe5811ac0287464b19a083418cc66/tumblr_inline_mok7whsNAH1qz4rgp.jpg" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;MongoMapper, on the other hand, does use the 10gen driver in all versions.&lt;/p&gt;

&lt;h4&gt;Mongoid, MongoMapper and Beyond&lt;/h4&gt;

&lt;p&gt;Given how passionate the Ruby team at 10gen feels about the language being one of the best fits for MongoDB, we want to strengthen our relationship with the Ruby community.  We are always looking for opportunities to support even more Rubyists and thus have been working with Durran Jordan to build new bson and mongo gems that Mongoid will use in the near future.  We’ve also seen great adoption of MongoMapper and hope that more Rubyists, specifically Rails developers, will benefit from the continued improvement and collaboration on these open source projects.&lt;/p&gt;</description><link>http://blog.mongodb.org/post/53271876885</link><guid>http://blog.mongodb.org/post/53271876885</guid><pubDate>Tue, 18 Jun 2013 08:00:00 -0400</pubDate></item><item><title>The MEAN Stack: Mistakes You're Probably Making With MongooseJS, And How To Fix Them</title><description>&lt;p&gt;&lt;em&gt;This is a guest post from Valeri Karpov, a MongoDB Hacker and co-founder of &lt;/em&gt;&lt;a href="http://ascotproject.com/" target="_blank"&gt;the Ascot Project&lt;/a&gt;&lt;span&gt;.&lt;/span&gt;&lt;em&gt;  For more MEAN Stack wisdom, check out his blog at &lt;/em&gt;&lt;a href="https://thecodebarbarian.wordpress.com/" target="_blank"&gt;TheCodeBarbarian.com&lt;/a&gt;&lt;span&gt;.  &lt;/span&gt;&lt;em&gt;Valeri originally coined the term MEAN Stack while writing for the MongoDB blog, and you can find that post &lt;/em&gt;&lt;a href="http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and" target="_blank"&gt;here&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you’re familiar with Ruby on Rails and are using MongoDB to build a NodeJS app, you might miss some slick ActiveRecord features, such as declarative validation. Diving into most of the basic tutorials out there, you’ll find that many basic web development tasks are more work than you’d like. For example, if we borrow the style of &lt;a href="http://howtonode.org/express-mongodb"&gt;http://howtonode.org/express-mongodb&lt;/a&gt;, a route that pulls a document by its ID will look something like this:&lt;/p&gt;
&lt;pre&gt;app.get('/document/:id', function(req, res) { 
  db.collection('documents', function(error, collection) {
    collection.findOne({ _id : collection.db.bson_serializer.ObjectID.createFromHexString(req.params.id) },
        function(error, document) {
          if (error || !document) {
            res.render('error', {});
          } else { 
            res.render('document', { document : document });
          }
        });
  });
});
&lt;/pre&gt;
&lt;p&gt;In my &lt;a href="http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and"&gt;last guest post&lt;/a&gt; for the MongoDB blog, I touched on MongooseJS, a schema and usability wrapper for MongoDB in NodeJS. MongooseJS was developed by &lt;a href="http://learnboost.com/"&gt;LearnBoost&lt;/a&gt;, an education startup based in San Francisco, and is maintained by 10gen. MongooseJS lets us take advantage of MongoDB’s flexibility and performance benefits while using development paradigms similar to Ruby on Rails and ActiveRecord. In this post, I’ll go into more detail about how &lt;a href="http://www.ascotproject.com"&gt;The Ascot Project&lt;/a&gt; uses Mongoose for our data, some best practices we’ve learned, and some pitfalls we’ve found that aren’t clearly documented.&lt;/p&gt;
&lt;p&gt;Before we dive into the details of working with Mongoose, let’s take a second to define the primary objects that we will be using. Loosely speaking, Mongoose’s schema setup is defined by 3 types: Schema, Connection, and Model. A fourth, Document, is what you get when you instantiate a Model.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
&lt;p&gt;A Schema is an object that defines the structure of any documents that will be stored in your MongoDB collection; it enables you to define types and validators for all of your data items.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A Connection is a fairly standard wrapper around a database connection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A Model is an object that gives you easy access to a named collection, allowing you to query the collection and use the Schema to validate any documents you save to that collection. It is created by combining a Schema, a Connection, and a collection name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, a Document is an instantiation of a Model that is tied to a specific document in your collection.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Okay, now we can jump into the dirty details of MongooseJS. Most MongooseJS apps will start something like this:&lt;/p&gt;
&lt;pre&gt;var Mongoose = require('mongoose');
var myConnection = Mongoose.createConnection('localhost', 'mydatabase');

var MySchema = new Mongoose.Schema({
  name : {
    type : String,
    default : 'Val',
    enum : ['Val', 'Valeri', 'Valeri Karpov']
  },
  created : {
    type : Date,
    default : Date.now
  }
});

var MyModel = myConnection.model('mycollection', MySchema);
var myDocument = new MyModel({});
&lt;/pre&gt;
&lt;p&gt;What makes this code so magical? There are 4 primary advantages that Mongoose has over the default MongoDB wrapper:&lt;/p&gt;
&lt;p&gt;1. MongoDB uses named collections of arbitrary objects, and a MongooseJS Model abstracts away this layer. Because of this, we don’t have to deal with tasks such as asynchronously telling MongoDB to switch to that collection, or work with the annoying createFromHexString function. For example, with the model defined above, loading and displaying a document would look more like:&lt;/p&gt;
&lt;pre&gt;app.get('/document/:id', function(req, res) { 
  MyModel.findOne({ _id : req.params.id }, function(error, document) {
    if (error || !document) {
      res.render('error', {});
    } else { 
      res.render('document', { document : document });
    }
  });
});&lt;/pre&gt;
&lt;p&gt;2. Mongoose Models handle the grunt work of setting default values and validating data. In the above example, myDocument.name is ‘Val’, and if we try to save with a name that’s not in the provided enum, Mongoose will give us back a nice error. If you want to learn a bit more about the cool things you can do with Mongoose validation, you can check out my blog post on how to integrate Mongoose validation with &lt;a href="http://thecodebarbarian.wordpress.com/2013/05/12/how-to-easily-validate-any-form-ever-using-angularjs/"&gt;AngularJS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;3. Mongoose lets us attach functions to our models:&lt;/p&gt;
&lt;pre&gt;MySchema.methods.greet = function() {
  return 'Hello, ' + this.name;
};&lt;/pre&gt;
&lt;p&gt;4. Mongoose handles limited sub-document population using manual references (i.e. no MongoDB DBRefs), which gives us the ability to mimic a familiar SQL join. For example:&lt;/p&gt;
&lt;pre&gt;var UserGroupSchema = new Mongoose.Schema({
  users : [{ type : Mongoose.Schema.ObjectId, ref : 'mycollection' }]
}); 

var UserGroup = myConnection.model('usergroups', UserGroupSchema);
var group = new UserGroup({ users : [myDocument._id] });
group.save(function() {
  UserGroup.find().populate('users').exec(function(error, groups) { 
    // groups contains every document in usergroups, with the users field populated
    console.log(groups[0].users[0].name); // Prints 'Val'
  });
});
&lt;/pre&gt;
&lt;p&gt;In the last few months, my team and I have learned a great deal about working with Mongoose and using it to open up the true power of MongoDB. Like most powerful tools, it can be used well and it can be used poorly, and unfortunately a lot of the examples you can find online fall into the latter category. Through trial and error over the course of Ascot’s development, my team has settled on some key principles for using Mongoose the right way:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1 Schema = 1 file&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A schema should never be declared in app.js, and you should never have multiple schemas in a single file (even if you intend to nest one schema in another). While it is often expedient to inline everything into app.js, not keeping schemas in separate files makes things more difficult in the long run. Keeping each schema in its own file lowers the barrier to entry for understanding your code base and makes tracking changes much easier.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Mongoose can’t handle multi-level population yet, and populated fields are not Documents. Nesting schemas is helpful but it’s an incomplete solution. Design your schemas accordingly.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let’s say we have a few interconnected Models:&lt;/p&gt;
&lt;pre&gt;var ImageSchema = new Mongoose.Schema({
  url : { type : String}, 
  created : { type : Date, default : Date.now }
}); 
var Image = db.model('images', ImageSchema);

var UserSchema = new Mongoose.Schema({ 
  username : { type : String }, 
  image : { type : Mongoose.Schema.ObjectId, ref : 'images' }
}); 

UserSchema.methods.greet = function() {
  return 'Hello, ' + this.username;
};

var User = db.model('users', UserSchema);

var GroupSchema = new Mongoose.Schema({ 
  users : [{ type : Mongoose.Schema.ObjectId, ref : 'users' }]
});

var Group = db.model('groups', GroupSchema);
&lt;/pre&gt;
&lt;p&gt;Our Group Model contains a list of Users, which in turn each have a reference to an Image. Can MongooseJS resolve these references for us? The answer, it turns out, is yes and no.&lt;/p&gt;
&lt;pre&gt;Group.
  find({}).
  populate('users').
  populate('users.image').
  exec(function(error, groups) {
    groups[0].users[0].username; // OK 
    groups[0].users[0].greet(); // ERROR – greet is undefined
    
    groups[0].users[0].image; // Is still an object id, doesn't get populated
    groups[0].users[0].image.created; // Undefined
  });
&lt;/pre&gt;
&lt;p&gt;In other words, you can call ‘populate’ to easily resolve an ObjectID into the associated object, but you can’t call ‘populate’ to resolve an ObjectID that’s contained in that object. Furthermore, since the populated object is not technically a Document, you can’t call any functions you attached to the schema. Although this is definitely a severe limitation, it can often be avoided by the use of nested schemas. For example, we can define our UserSchema like this:&lt;/p&gt;
&lt;pre&gt;var UserSchema = new Mongoose.Schema({
  username : { type : String }, 
  image : [ImageSchema]
});
&lt;/pre&gt;
&lt;p&gt;In this case, we don’t have to call ‘populate’ to resolve the image. Instead, we can do this:&lt;/p&gt;
&lt;pre&gt;Group.
  find({}).
  populate('users').
  exec(function(error, groups) {
    groups[0].users[0].image[0].created; // Date associated with the (first) nested image
  });&lt;/pre&gt;
&lt;p&gt;However, nested schemas don’t solve all of our problems, because we still don’t have a good way to handle many-to-many relationships. Nested schemas are an excellent solution for cases where the nested document can only exist as part of exactly one parent document. In the above example, we implicitly assume that a single image belongs to exactly one user – no other user can reference the exact same image object.&lt;/p&gt;
&lt;p&gt;For instance, we shouldn’t have UserSchema as a nested schema of Group’s schema, because a User can be a part of multiple Groups, and thus we’d have to store separate copies of a single User object in multiple Groups. Furthermore, a User ought to be able to exist in our database without being part of any groups.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Declare your models exactly once and use dependency injection; never declare them in a routes file.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is best expressed in an example:&lt;/p&gt;
&lt;pre&gt;// GOOD
exports.listUsers = function(User) {
  return function(req, res) {
    User.find({}, function(error, users) {
      res.render('list_users', { users : users });
    });
  }
};
&lt;/pre&gt;
&lt;pre&gt;// BAD 
var db = Mongoose.createConnection('localhost', 'database');
var Schema = require('../models/User.js').UserSchema; 
var User = db.model('users', Schema); 

exports.listUsers = function(req, res) {
  User.find({}, function(error, users) {
    res.render('list_users', { users : users });
  });
};
&lt;/pre&gt;
&lt;p&gt;The biggest problem with the “bad” version of listUsers shown above is that if you declare your model at the top of this particular file, you have to define it in every file where you use the User model. This leads to a lot of error-prone find-and-replace work for you, the programmer, whenever you want to do something like rename the Schema or change the collection name that underlies the User model.&lt;/p&gt;
&lt;p&gt;Early in Ascot’s development we made this mistake with a single file, and ended up with a particularly annoying bug when we changed our MongoDB password several months later. The proper way to do this is to declare your Models exactly once, include them in your app.js, and pass them to your routes as necessary.&lt;/p&gt;
&lt;p&gt;In addition, note that the “bad” listUsers is impossible to unit test. The User schema in the “bad” example is inaccessible through calls to require, so we can’t mock it out for testing. In the “good” example, we can write a test easily using Nodeunit:&lt;/p&gt;
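&lt;pre&gt;var UserRoutes = require('./routes/user.js');

// Minimal stand-in for the User model: find() hands back the canned collection
var mockUser = {
  collection : [],
  find : function(query, callback) {
    callback(null, mockUser.collection);
  }
};

exports.testListUsers = function(test) {
  mockUser.collection = [{ name : 'Val' }];
  var fnToTest = UserRoutes.listUsers(mockUser);
  fnToTest({},
    { render : function(view, params) {
        test.equals(mockUser.collection, params.users);
        test.done();
      }
    });
};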
&lt;/pre&gt;
&lt;p&gt;And speaking of Nodeunit:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Unit tests catch mistakes, encourage you to write modular code, and allow you to easily make sure your logic works. They are your friend.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I’ll be the first to say that writing unit tests can be very annoying. Some tests can seem trivial, they don’t necessarily catch all bugs, and often you write way more test code than actual production code. However, a good suite of tests can save you a lot of worry; you can make changes and then quickly verify that you haven’t broken any of your modules. Ascot Project currently uses Nodeunit for our backend unit tests; Nodeunit is simple, flexible, and works well for us.&lt;/p&gt;
&lt;p&gt;And there you have it! Mongoose is an excellent library, and if you’re using MongoDB and NodeJS, you should definitely consider using it. It will save you from writing a lot of extra code, it’ll handle some basic population, and it’ll handle all your validation and object creation grunt work. This adds up to more time spent building awesome stuff, and less time trying to figure out how to get your database interface to work.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Have any questions about the code featured in this post? Want to suggest a better approach? Feel like telling me why the MEAN Stack is the worst thing that ever happened in the history of the world and how horrible I am? Go ahead and leave a comment below, or shoot me an email at valkar207@gmail.com and I’ll do my best to answer any questions you might have. You can also find me on GitHub at &lt;a href="https://github.com/vkarpov15"&gt;https://github.com/vkarpov15&lt;/a&gt;. My current venture is called The Ascot Project, and you can find that over at &lt;a href="http://www.AscotProject.com"&gt;www.AscotProject.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/52299826008</link><guid>http://blog.mongodb.org/post/52299826008</guid><pubDate>Thu, 06 Jun 2013 09:57:00 -0400</pubDate><category>node.js</category><category>node.js tutorial</category><category>node.js database</category><category>mongoose model</category><category>mongoose</category></item><item><title>Integrating MongoDB Text Search with a Python App</title><description>&lt;p&gt;&lt;em&gt;By Mike O’Brien, 10gen software engineer and maintainer of &lt;a href="https://github.com/mongodb/mongo-hadoop"&gt;Mongo-Hadoop&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add new features that take advantage of text search. Prior to 2.4, adding text search to a MongoDB app would have required writing code to interface with another system like Solr, Lucene, ElasticSearch, or something else. Now that it’s integrated with the database we are already using, we can accomplish the same result with reduced complexity, and fewer moving parts in the deployment.&lt;/p&gt;
&lt;p&gt;Here we’ll go through a practical example of adding text search to &lt;a href="http://planet.mongodb.org/"&gt;Planet MongoDB&lt;/a&gt;, our blog aggregator site.&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Planet MongoDB is built in Python, uses the excellent Flask web framework, and stores feed content in a collection called &lt;code&gt;posts&lt;/code&gt;. We’ll add some code that enables us to search over &lt;code&gt;posts&lt;/code&gt; for any keyword terms we want. As you’ll see, the amount of code and configuration that needs to be added to accomplish this is quite small.&lt;/p&gt;
&lt;h4&gt;Initial Setup&lt;/h4&gt;
&lt;p&gt;Before you can use any text search features, you have to explicitly enable them. You can do this by restarting &lt;code&gt;mongod&lt;/code&gt; with the additional command line option &lt;code&gt;--setParameter textSearchEnabled=true&lt;/code&gt;, or from the mongo shell by running &lt;code&gt;db.adminCommand({setParameter:1, textSearchEnabled:true})&lt;/code&gt;. Since you’re hopefully developing and testing on a different database than you use for production, don’t forget to do this on both.&lt;/p&gt;
&lt;h4&gt;Creating Indexes&lt;/h4&gt;
&lt;p&gt;The next critical step is to create the text search index on the fields you want to make searchable. In our case, we want our searches to be able to find hits in the article titles as well as the content. However, since the article titles are more prominent, we want matches in the title to rank higher in the search results than matches in the content body. We can do this by setting &lt;strong&gt;weights&lt;/strong&gt; on the fields.&lt;/p&gt;
&lt;p&gt;To do this, we’ll add a line of Python code to the application that is executed upon startup and creates the index we need, if it doesn’t already exist:&lt;/p&gt;
&lt;pre&gt;db.posts.ensure_index([
      ('body', 'text'),
      ('title', 'text'),
  ],
  name="search_index",
  weights={
      'title':100,
      'body':25
  }
)
&lt;/pre&gt;
&lt;h4&gt;Running searches&lt;/h4&gt;
&lt;p&gt;At this point, we now have a collection of data, and we’ve created a text index that can be used to do searches on arbitrary keywords. We just need to write some code that will actually run searches and render the results.&lt;/p&gt;
&lt;p&gt;Unlike regular MongoDB queries, text search is implemented as a special command that returns a document containing a ‘results’ field, an array of the highest-scoring documents that matched. To use it, run the command with the additional field &lt;code&gt;search&lt;/code&gt; which contains the keywords to match against. To use this in the app, we just grab the request parameter containing what the user typed into the search box and pass it as an argument to the text search command, and then render a page containing the search results.&lt;/p&gt;
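&lt;pre&gt;@app.route('/search')
def search():
    # The route only accepts GET by default, so read the keywords from the query string.
    query = request.args.get('q', '')
    text_results = db.command('text', 'posts', search=query, limit=SEARCH_LIMIT)
    # Each entry in 'results' wraps the matching document in its 'obj' field.
    doc_matches = [res['obj'] for res in text_results['results']]
    return render_template("search.html", results=doc_matches)
&lt;/pre&gt;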
&lt;h4&gt;Filtering&lt;/h4&gt;
&lt;p&gt;In addition to finding docs that match text queries, you may want to filter the result set even further based on other criteria and fields in the documents. To do this, add a &lt;code&gt;filter&lt;/code&gt; field to the text search command containing the additional filtering logic, in the exact same style as a regular &lt;code&gt;find()&lt;/code&gt; query. In this case, we want to restrict the results to only the blog posts that are related to MongoDB, which is determined by a field in the posts called &lt;code&gt;related&lt;/code&gt;. Modifying the call to &lt;code&gt;db.command&lt;/code&gt; to include this, we get:&lt;/p&gt;
&lt;pre&gt;text_results = db.command('text', 'posts', search=query, filter={'related':True}, limit=SEARCH_LIMIT)
&lt;/pre&gt;
&lt;h4&gt;Pagination&lt;/h4&gt;
&lt;p&gt;In practice, most applications want to just show a few results on a page at a time, and then provide some kind of “previous/next” links to navigate through multiple pages of matches. We can tweak the existing code to accomplish this too, by adding a parameter &lt;code&gt;page&lt;/code&gt; to indicate where we are in the results, and rendering 10 results at a time.&lt;/p&gt;
&lt;p&gt;So now, we’ll parse out the &lt;code&gt;page&lt;/code&gt; param and slice out the necessary items from the array returned in &lt;code&gt;results&lt;/code&gt;, using an additional arg &lt;code&gt;limit&lt;/code&gt; to return only as many documents as needed. On the results page, we can then just generate a link to the next page of results by constructing the same search link but incrementing &lt;code&gt;page&lt;/code&gt; in the Jinja template.&lt;/p&gt;
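&lt;pre&gt;PAGE_SIZE = 10
try:
    page = int(request.args.get("page", 0))
except ValueError:
    # Fall back to the first page if "page" is not a number.
    page = 0

start = page * PAGE_SIZE
end = (page + 1) * PAGE_SIZE
# Ask for just enough results to cover the requested page.
text_results = db.command('text', 'posts', search=query, filter={'related':True}, limit=end)
doc_matches = [res['obj'] for res in text_results['results'][start:end]]
&lt;/pre&gt;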
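&lt;p&gt;The slicing step can also be pulled into a small helper that is easy to unit test without a running database. This is only a sketch: the function name &lt;code&gt;page_of_results&lt;/code&gt; and the hand-built &lt;code&gt;fake_reply&lt;/code&gt; are made up for illustration, and the reply is assumed to have the shape described above (a &lt;code&gt;results&lt;/code&gt; array whose entries carry the document in &lt;code&gt;obj&lt;/code&gt;).&lt;/p&gt;

```python
def page_of_results(text_results, page, page_size=10):
    """Return the documents for one zero-based page of a text command reply."""
    # Each entry in 'results' is assumed to look like {'score': ..., 'obj': {...}}.
    matches = text_results.get('results', [])
    # Negative page numbers are treated as the first page.
    start = max(page, 0) * page_size
    return [match['obj'] for match in matches[start:start + page_size]]


# A hand-built reply standing in for db.command('text', ...), for illustration only.
fake_reply = {
    'results': [{'score': 1.0, 'obj': {'title': 'post %d' % i}} for i in range(25)]
}
first_page = page_of_results(fake_reply, 0)
last_page = page_of_results(fake_reply, 2)
```

&lt;p&gt;A page past the end of the results simply comes back empty, which the template can use to decide whether to show a “next” link.&lt;/p&gt;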
&lt;h4&gt;Wrap-up&lt;/h4&gt;
&lt;p&gt;The rest of the work to be done to finish up is all on the user-interface side. We add a &lt;code&gt;form&lt;/code&gt; with a single &lt;code&gt;input&lt;/code&gt; element for the user to type in the query, and write the code to display the posts returned in the text search command, and it’s already up and running. Although it was very quick and easy to add a functional text-search feature to the app, this only scratches the surface of how it all works. To learn more, refer to the &lt;a href="http://docs.mongodb.org/manual/core/text-search/"&gt;docs on text search&lt;/a&gt;.&lt;/p&gt;</description><link>http://blog.mongodb.org/post/52139821470</link><guid>http://blog.mongodb.org/post/52139821470</guid><pubDate>Tue, 04 Jun 2013 10:14:00 -0400</pubDate><category>python</category><category>text search</category><category>text</category><category>indexing</category><category>blog</category><category>aggregator</category><category>mongodb search</category><category>mongodb</category><category>mongodb mongodb mongodb</category></item><item><title>Go Agent, Go</title><description>&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=5786605" target="_blank"&gt;&lt;em&gt;Discuss on Hacker News&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;10gen introduced &lt;a href="http://www.10gen.com/products/mongodb-backup-service"&gt;MongoDB Backup Service&lt;/a&gt; in early May. Creating a backup service for MongoDB was a new challenge, and we used the opportunity to explore new technologies for our stack. The final implementation of the MongoDB Backup Service agent is written in &lt;a href="http://golang.org/"&gt;Go&lt;/a&gt;, an open-source, natively executable language initiated and maintained by Google.&lt;/p&gt;
&lt;h4&gt;Why did we Go with Go?&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://www.10gen.com/products/mongodb-backup-service"&gt;The Backup Service&lt;/a&gt; started as a Java project, but as the project matured, the team wanted to move to a language that compiled natively on the machine. After considering a few options, the team decided that Go was the best fit for its C-like syntax, strong standard library, the resolution of concurrency problems via goroutines, and painless multi-platform distribution.&lt;!-- more --&gt;&lt;/p&gt;
&lt;h4&gt;mgo&lt;/h4&gt;
&lt;p&gt;As an open-source company, 10gen is fortunate to work with MongoDB developers around the world who build open-source tools for new and emerging languages to provide users with a breadth of options to access MongoDB. One of the &lt;a href="http://www.mongodb.org/about/community/masters/"&gt;MongoDB Masters&lt;/a&gt;, &lt;a href="https://twitter.com/gniemeyer"&gt;Gustavo Niemeyer&lt;/a&gt;, has spent over two years building &lt;a href="http://labix.org/mgo"&gt;mgo&lt;/a&gt;, the MongoDB driver for Go. In that time he’s developed a great framework for accessing MongoDB through Go and Gustavo has been a valuable resource as we’ve built out the Backup Service. &lt;a href="http://labix.org/tmp/gopher-powered-backups.html"&gt;In his own words&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“It’s great to see not only 10gen making good use of the Go language for first-class services, but contributing to that community of developers by providing its support for the development of the Go driver in multiple ways.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Programming the backup agent with Go and the mgo driver has been extremely satisfying. Between the lightweight syntax, the first-class concurrency and the well-documented, idiomatic libraries such as mgo, Go is a great choice for writing anything from small scripts to large distributed applications.&lt;/p&gt;
&lt;p&gt;Starting a Java project often begins with a group debate: “Maven or Ant? JUnit or TestNG? Spring or Guice?” Go has a number of conventions through which the Go team has created a sensible, uniform development experience with the holy trinity of tools: go build, go test and go fmt.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
&lt;p&gt;The organization of source code and libraries is standardized to allow using the &lt;code&gt;go build&lt;/code&gt; tool. &lt;a href="http://golang.org/doc/code.html#Organization"&gt;See details here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test files named XXX_test.go that contain functions named TestXXX are run automatically with &lt;code&gt;go test&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Braces are required on if statements and the first brace goes along with the if condition. E.g.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;pre&gt;if x {
     doSomething()
}&lt;/pre&gt;
&lt;p&gt;instead of:&lt;/p&gt;
&lt;pre&gt;if x 
{
    doSomething()
}&lt;/pre&gt;
&lt;ul&gt;&lt;li&gt;Functions whose names end with an f (e.g. Printf, Fatalf) take a format string, and &lt;code&gt;go vet&lt;/code&gt; validates that the number of substitutions (e.g. %v) matches the number of arguments passed to the function.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;mgo is a real pleasure to use, with high-quality code, thorough documentation and an API that is a thoughtful, natural blend of idiomatic Go and MongoDB. &lt;span&gt;Our team owes a lot of thanks to Gustavo for his hard work on this project.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;There are other Go projects being explored at the moment and we hope to see more people using mgo in production going forward.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;By the 10gen Backup Team&lt;/em&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/51643994762</link><guid>http://blog.mongodb.org/post/51643994762</guid><pubDate>Wed, 29 May 2013 10:32:00 -0400</pubDate><category>golang</category><category>mongodb</category><category>concurrency</category><category>conventions</category><category>backup</category><category>go language</category><category>go programming language</category><category>programming language go</category></item><item><title>MongoDB's New Matcher</title><description>&lt;p&gt;&lt;em&gt;&lt;a href="https://news.ycombinator.com/item?id=5781774" target="_blank"&gt;Discuss on Hacker News&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;MongoDB 2.5.0 (an unstable dev build) has a &lt;/span&gt;&lt;a href="https://jira.mongodb.org/browse/SERVER-6400"&gt;new implementation of the “Matcher”&lt;/a&gt;&lt;span&gt;. The Matcher is the bit of code in Mongo that takes a query and decides whether a document matches a query expression. It also has to understand indexes so that it can do things like create subsets of queries suitable for index covering. However, the structure of the Matcher code hasn’t changed significantly in more than four years, and until this release it could not be easily extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces: an abstract syntax tree (hereafter ‘AST’) for match expressions, a parser from BSON into said AST, and a Matcher API layer that simulates the old Matcher interface while using all new internals. This new version is much easier to extend, easier to reason about, and will allow us to use the same structure for matching as for query analysis and rewriting.&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;This matcher rewrite is part of a larger project to restructure query execution, to optimize queries, and to lay the groundwork for more advanced queries in the future. One planned optimization is index intersection. For example, if you have an index on each of the ‘a’ and ‘b’ attributes, we want a query of the form { a&amp;#160;: 5 , b&amp;#160;: 6 } to intersect the two indexes rather than use just one index and discard the documents from that index that don’t match. Index intersection would also be suitable for merging geo-spatial, text and regular indexes together in fun and interesting ways (e.g. a query to return all the users within a 3.5 mile radius of a location, with a reputation greater than x, who have RSVP’ed ‘yes’ for an event).&lt;/p&gt;
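&lt;p&gt;To make the idea concrete, here is a toy illustration of index intersection in plain Python. It is not how the server implements it: each “index” is just a dictionary from a field value to the set of matching document ids, and the helper names &lt;code&gt;build_index&lt;/code&gt; and &lt;code&gt;intersect_lookup&lt;/code&gt; are invented for this sketch.&lt;/p&gt;

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the set of ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def intersect_lookup(index_a, a_value, index_b, b_value):
    """Ids that appear under a_value in one index and b_value in the other."""
    ids_a = index_a.get(a_value, set())
    ids_b = index_b.get(b_value, set())
    return ids_a.intersection(ids_b)


docs = {
    1: {'a': 5, 'b': 6},
    2: {'a': 5, 'b': 7},
    3: {'a': 4, 'b': 6},
    4: {'a': 5, 'b': 6},
}
index_on_a = build_index(docs, 'a')
index_on_b = build_index(docs, 'b')
# Equivalent of the query { a : 5 , b : 6 } answered from the two indexes alone.
matching_ids = sorted(intersect_lookup(index_on_a, 5, index_on_b, 6))
```

&lt;p&gt;Only the ids present in both lookups need to be fetched, instead of fetching every document with a = 5 and discarding the ones whose b is not 6.&lt;/p&gt;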
&lt;p&gt;A good example of an extension we’d like to enable is self referential queries, such as finding all documents where a = b + c. (This would be written { a&amp;#160;: { $sum&amp;#160;: [ “$b” , “$c” ] } }). With the new Matcher, such queries are easy to implement as a native part of the language.&lt;/p&gt;
&lt;p&gt;Now that the Matcher re-write is ready for testing, we’d love people to help test it by trying out MongoDB 2.5.0. (&lt;a href="http://docs.mongodb.org/manual/release-notes/2.6/"&gt;Release Notes&lt;/a&gt;)&lt;/p&gt;
&lt;h3&gt;Code&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/matcher/expression.h"&gt;AST Root&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/matcher/expression_parser.h"&gt;Parser Root&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;em&gt;By Eliot Horowitz, 10gen CTO, MongoDB core contributor. You can find the original post on &lt;a href="http://www.eliothorowitz.com/blog/2013/05/24/mongos-new-matcher/" target="_blank"&gt;his personal blog&lt;/a&gt;. &lt;/em&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/51574091391</link><guid>http://blog.mongodb.org/post/51574091391</guid><pubDate>Tue, 28 May 2013 14:33:00 -0400</pubDate><category>mongodb</category><category>mongodb 2.5.0</category><category>matcher</category><category>query</category><category>query optimization</category><category>BSON</category></item><item><title>New Geo Features in MongoDB 2.4 </title><description>&lt;h3&gt;&lt;span&gt;Motivation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Geometric processing as a field of study has many applications, and has resulted in lots of research and powerful tools. Many modern web applications have location-based components, and require a data storage engine capable of managing geometric information. Typically this requires the introduction of an additional storage engine into your infrastructure, which can be a time-consuming and expensive operation.&lt;/p&gt;
&lt;p&gt;MongoDB has a set of geometric storage and search &lt;a href="http://docs.mongodb.org/manual/core/geospatial-indexes/"&gt;features&lt;/a&gt;. The &lt;a href="http://blog.mongodb.org/post/45754637343/mongodb-2-4-released"&gt;MongoDB 2.4 release&lt;/a&gt; brought several improvements to MongoDB’s existing geo capabilities and the introduction of the &lt;code&gt;2dsphere&lt;/code&gt; &lt;a href="http://docs.mongodb.org/manual/core/2dsphere/"&gt;index&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The primary conceptual difference (though there are also many functional differences) between the &lt;code&gt;2d&lt;/code&gt; and &lt;code&gt;2dsphere&lt;/code&gt; indexes is the type of coordinate system that they consider. &lt;a href="http://en.wikipedia.org/wiki/Cartesian_coordinate_system" target="_blank"&gt;Planar coordinate systems&lt;/a&gt; are useful for certain applications, and can serve as a simplifying approximation of spherical coordinates. As you consider larger geometries, or geometries near the meridians and poles, however, proper &lt;a href="http://en.wikipedia.org/wiki/Spherical_coordinates" target="_blank"&gt;spherical coordinates&lt;/a&gt; become important.&lt;/p&gt;
&lt;p&gt;In addition to this major conceptual difference, there are also significant functional differences, which are outlined in some depth in the &lt;a href="http://docs.mongodb.org/manual/reference/operator/query-geospatial/"&gt;Geospatial Indexes and Queries&lt;/a&gt; section of the MongoDB documentation. This post will discuss the new features that have been added in the 2.4 release.&lt;!-- more --&gt;&lt;/p&gt;
&lt;h3&gt;What’s New&lt;/h3&gt;
&lt;h4&gt;Storing non-point geometries&lt;/h4&gt;
&lt;p&gt;Unlike the &lt;code&gt;2d&lt;/code&gt; index, which only allowed the storage of points, the &lt;code&gt;2dsphere&lt;/code&gt; index allows the storage and querying of points, lines, and polygons. To support the storage of different geometries, instead of introducing a proprietary format, MongoDB conforms to the &lt;a href="http://geojson.org/"&gt;GeoJSON&lt;/a&gt; standard. GeoJSON is a collaborative community project that produced a specification for encoding geometric entities in JSON. It has garnered significant support, including from the &lt;a href="http://openlayers.org/dev/examples/vector-formats.html" target="_blank"&gt;OpenLayers project&lt;/a&gt; and &lt;a href="http://www.postgis.org/"&gt;PostGIS&lt;/a&gt;, and has growing language support in &lt;a href="https://pypi.python.org/pypi/geojson/1.0.1/"&gt;Python&lt;/a&gt; and &lt;a href="http://rubyforge.org/projects/georuby/"&gt;Ruby&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here are a few simple examples of GeoJSON embedded documents:&lt;/p&gt;
&lt;p&gt;A BSON Document with a GeoJSON Point embedded in the &lt;code&gt;geo&lt;/code&gt; field:&lt;/p&gt;
&lt;pre&gt;    {
        geo: {
            type: "Point",
            coordinates: [100.0, 0.0]
        }
    }
&lt;/pre&gt;
&lt;p&gt;A BSON Document with a GeoJSON LineString embedded in the &lt;code&gt;geo&lt;/code&gt; field:&lt;/p&gt;
&lt;pre&gt;    {
        geo: {
            type: "LineString",
            coordinates: [ [100.0, 0.0], [101.0, 1.0] ]
        }
    }
&lt;/pre&gt;
&lt;p&gt;A BSON Document with a GeoJSON Polygon embedded in the &lt;code&gt;geo&lt;/code&gt; field:&lt;/p&gt;
&lt;pre&gt;    {
        geo: {
            type: "Polygon",
            coordinates: [
                [ [100.0, 0.0], [101.0, 0.0],
                  [101.0, 1.0], [100.0, 1.0],
                  [100.0, 0.0] ]
            ]
        }
    }
&lt;/pre&gt;
&lt;p&gt;Note: A GeoJSON Polygon’s coordinates are an array of arrays of point specifications. Each array of point specifications should have the same starting and ending point to form a closed loop. The first array of point specifications defines the polygon’s exterior geometry, and each subsequent array of point specifications defines a “hole” in the polygon. Polygons should be non self-intersecting, and holes should be fully contained by the polygon.&lt;/p&gt;
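&lt;p&gt;The closed-loop rule is easy to check before a document is ever inserted. The sketch below builds a Polygon dictionary from an exterior ring plus optional holes and rejects rings that are not closed. The helper names are invented for this example, and it deliberately does not check the harder rules (self-intersection, holes lying inside the exterior).&lt;/p&gt;

```python
def is_closed_ring(ring):
    """A ring needs at least four positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def make_polygon(exterior, holes=()):
    """Build a GeoJSON Polygon dict: exterior ring first, then any holes."""
    rings = [exterior] + list(holes)
    for ring in rings:
        if not is_closed_ring(ring):
            raise ValueError('each ring must start and end at the same point')
    return {'type': 'Polygon', 'coordinates': rings}


square = make_polygon(
    [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]
)

# An open ring (last point missing) is rejected.
try:
    make_polygon([[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0]])
    open_ring_rejected = False
except ValueError:
    open_ring_rejected = True
```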
&lt;h4&gt;Inclusion searches on a sphere&lt;/h4&gt;
&lt;p&gt;The new &lt;code&gt;$geoWithin&lt;/code&gt; operator, which takes a Polygon geometry as a specifier, returns any geometries of any type that are fully contained within the polygon. It works without any index, but must then look at every document in the collection to do so.&lt;/p&gt;
&lt;h4&gt;Intersecting geometries on a sphere&lt;/h4&gt;
&lt;p&gt;The new &lt;code&gt;$geoIntersects&lt;/code&gt; operator, which takes any geometry as a specifier, returns any geometries that have a non-empty intersection with the specifier. &lt;code&gt;$geoIntersects&lt;/code&gt; also works without an index, and in that case must also look at each document in the collection.&lt;/p&gt;
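&lt;p&gt;Both operators wrap a GeoJSON geometry in a &lt;code&gt;$geometry&lt;/code&gt; field. As a rough sketch, the query documents could be built in Python like this; the helper names, the &lt;code&gt;places&lt;/code&gt; collection mentioned in the comment and the sample coordinates are all assumptions made for the example.&lt;/p&gt;

```python
def geo_within(field, polygon):
    """Query document: geometries in `field` fully contained in `polygon`."""
    return {field: {'$geoWithin': {'$geometry': polygon}}}


def geo_intersects(field, geometry):
    """Query document: geometries in `field` that intersect `geometry`."""
    return {field: {'$geoIntersects': {'$geometry': geometry}}}


# Sample geometries, reusing the coordinates from the documents above.
box = {
    'type': 'Polygon',
    'coordinates': [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
                     [100.0, 1.0], [100.0, 0.0]]],
}
line = {'type': 'LineString', 'coordinates': [[100.0, 0.0], [101.0, 1.0]]}

within_query = geo_within('geo', box)
intersects_query = geo_intersects('geo', line)
# With PyMongo these would be passed to find(), e.g. db.places.find(within_query),
# where "places" is a hypothetical collection name.
```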
&lt;h4&gt;Better support for compound indexes&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;2d&lt;/code&gt; index can only be used in a compound index if (1) it is the first field, (2) there are exactly two fields in the compound index, and (3) the second field isn’t a &lt;code&gt;2d&lt;/code&gt; index. &lt;code&gt;2dsphere&lt;/code&gt; indexes aren’t limited in this way, which allows us to pre-filter based on a non-geo field, which is often more efficient.&lt;/p&gt;
&lt;p&gt;Consider the following queries:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Find me Hot Dog Stands in New York state, i.e. use the compound index (business_type, location).&lt;/li&gt;
&lt;li&gt;Find me geometries in New York state that are Hot Dog Stands, i.e. use the compound index (location, business_type).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first query will be much more efficient than the second, because business_type is a simple text field, and greatly reduces the set of geometries to search.&lt;/p&gt;
&lt;p&gt;Additionally, we can have multiple &lt;code&gt;2dsphere&lt;/code&gt; indexes in the same compound index. This allows queries like: “Find routes with a start location within 50 miles of JFK, and an end location within 100 miles of YYC”.&lt;/p&gt;
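&lt;p&gt;Expressed as PyMongo-style index specifications, the two compound indexes discussed above might look like the sketch below. The field names (&lt;code&gt;business_type&lt;/code&gt;, &lt;code&gt;location&lt;/code&gt;, &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;), the collection names in the comments and the value &lt;code&gt;'hot_dog_stand'&lt;/code&gt; are illustrative assumptions. The two constants are defined locally so the snippet runs without a driver installed; they hold the same values PyMongo exposes as &lt;code&gt;pymongo.ASCENDING&lt;/code&gt; and &lt;code&gt;pymongo.GEOSPHERE&lt;/code&gt;.&lt;/p&gt;

```python
ASCENDING = 1            # same value as pymongo.ASCENDING
GEOSPHERE = '2dsphere'   # same value as pymongo.GEOSPHERE

# Non-geo field first, so the index can pre-filter by business type.
business_index = [('business_type', ASCENDING), ('location', GEOSPHERE)]

# Two 2dsphere fields in one compound index, for the "routes" example.
route_index = [('start', GEOSPHERE), ('end', GEOSPHERE)]


def hot_dog_query(region):
    """Hot dog stands whose location lies inside the given GeoJSON polygon."""
    return {
        'business_type': 'hot_dog_stand',
        'location': {'$geoWithin': {'$geometry': region}},
    }


# With PyMongo (collection names are hypothetical):
#   db.businesses.ensure_index(business_index)
#   db.routes.ensure_index(route_index)
#   db.businesses.find(hot_dog_query(new_york_polygon))
```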
&lt;h3&gt;How it Works&lt;/h3&gt;
&lt;p&gt;Everything starts when you insert a geometry into a &lt;code&gt;2dsphere&lt;/code&gt; index. We use the open source &lt;a href="https://code.google.com/p/s2-geometry-library/" target="_blank"&gt;&lt;code&gt;s2&lt;/code&gt; C++ library&lt;/a&gt; from Google to select a minimal set of cells that fully cover a geometry. This set of grid cells is called a covering, and the size of the cells is dynamic (between 500m and 100km on a side), based upon the size of the polygon being covered.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="http://media.tumblr.com/e39f546479a86456868450b1c484ee02/tumblr_inline_mn4hfrRnhd1qz4rgp.png"/&gt;&lt;/p&gt;
&lt;p&gt;fig 3 - A very low granularity covering of the entire United Kingdom&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="http://media.tumblr.com/8ff6c16d3c9c27fd8090c9dc84573efd/tumblr_inline_mn4hg3y3Ml1qz4rgp.png"/&gt;&lt;/p&gt;
&lt;p&gt;fig 4 - A fairly granular covering of the A4 around Trafalgar Square. &lt;/p&gt;
&lt;p&gt;Each cell in these coverings is now added to a standard B-tree index, with a key that is easily calculated from its location on the surface of the sphere. More granular (smaller) cells will have the same prefix as a larger cell that occupies the same area of the surface of the sphere.&lt;/p&gt;
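&lt;p&gt;The prefix property is the important part, and it can be demonstrated with a much simpler scheme than the one &lt;code&gt;s2&lt;/code&gt; actually uses. The sketch below swaps in a plain quadtree over a longitude/latitude rectangle: each level of subdivision appends one digit to the key, so a small cell’s key always starts with the key of every larger cell that contains it. The function name &lt;code&gt;cell_key&lt;/code&gt; is invented for this illustration.&lt;/p&gt;

```python
def cell_key(lon, lat, level):
    """Quadtree key of the cell containing (lon, lat) after `level` splits."""
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(level):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        quadrant = 0
        # Keep the half of the rectangle that contains the point.
        if lon >= mid_lon:
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return ''.join(digits)


# Roughly Trafalgar Square, at a coarse and a fine level.
coarse = cell_key(-0.128, 51.508, 4)
fine = cell_key(-0.128, 51.508, 12)
# A point on the other side of the world lands under a different prefix.
far_away = cell_key(151.2, -33.9, 4)
```

&lt;p&gt;Because keys sharing a prefix sort next to each other, a range scan over a B-tree can retrieve a cell together with all of the smaller cells inside it.&lt;/p&gt;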
&lt;h4&gt;Intersection &amp;amp; Within searches&lt;/h4&gt;
&lt;p&gt;Finding geometries that may intersect or lie within a search polygon becomes as easy as generating a covering for the search specifier and, for each cell in that covering, querying the B-tree for any geometries that interact with that cell. Once the list of possibly interacting geometries has been retrieved from the index, each geometry is checked in turn to see if it should be included in the result set.&lt;/p&gt;
&lt;h4&gt;Near searches&lt;/h4&gt;
&lt;p&gt;The near search provided by the &lt;code&gt;$near&lt;/code&gt; operator is implemented by doing &lt;code&gt;$within&lt;/code&gt; searches on concentrically growing donuts (circular polygons with circular holes).&lt;/p&gt;
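&lt;p&gt;A simplified version of the growing-donut idea fits in a few lines of Python. The sketch below uses flat, planar distances over an in-memory list of points instead of spherical geometry and an index, and the doubling of the ring radius is an arbitrary choice for the example; the function name and parameters are likewise made up.&lt;/p&gt;

```python
import math


def near(points, center, limit=None, first_radius=1.0, max_radius=1024.0):
    """Return points nearest-first by searching ever larger rings around center."""
    def distance(point):
        return math.hypot(point[0] - center[0], point[1] - center[1])

    found = []
    inner, outer = 0.0, first_radius
    while max_radius > inner:
        # One "donut": everything at least `inner` and less than `outer` away.
        ring = [p for p in points if outer > distance(p) >= inner]
        found.extend(sorted(ring, key=distance))
        if limit is not None and len(found) >= limit:
            break
        inner, outer = outer, outer * 2.0
    return found if limit is None else found[:limit]


points = [(3.0, 4.0), (0.0, 0.5), (10.0, 0.0), (1.0, 1.0)]
ordered = near(points, (0.0, 0.0))
closest_two = near(points, (0.0, 0.0), limit=2)
```

&lt;p&gt;Since every point in an inner ring is closer than every point in an outer ring, results come out in distance order, and the search can stop as soon as enough matches have been found.&lt;/p&gt;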
&lt;p&gt;&lt;img alt="image" src="http://media.tumblr.com/125360009ea4a7260c418fd39c2fd47b/tumblr_inline_mn4hgsoDGr1qz4rgp.gif"/&gt; &lt;/p&gt;
&lt;p&gt;We encourage user feedback and testing on these new Geo features and are excited to see what the community builds.&lt;/p&gt;

&lt;p&gt;Map images ⓒ &lt;a href="http://openstreetmap.org/copyright"&gt;OpenStreetMap&lt;/a&gt; contributors, licensed under the Creative &lt;a href="http://creativecommons.org/licenses/by-sa/2.0/"&gt;Commons Attribution-ShareAlike 2.0&lt;/a&gt; license (CC-BY-SA). &lt;/p&gt;
&lt;p&gt;Map data ⓒ &lt;a href="http://openstreetmap.org/copyright"&gt;OpenStreetMap&lt;/a&gt; contributors, licensed under the &lt;a href="http://opendatacommons.org/licenses/odbl/"&gt;Open Data Commons Open Database License&lt;/a&gt; (ODbL).&lt;/p&gt;</description><link>http://blog.mongodb.org/post/50984169045</link><guid>http://blog.mongodb.org/post/50984169045</guid><pubDate>Tue, 21 May 2013 08:00:00 -0400</pubDate><category>geospatial</category><category>MongoDB</category><category>2dsphere</category><category>geojson</category><category>bson</category></item><item><title>MongoDB, build parties, and deploying your web application at 11am on a Wednesday</title><description>&lt;p&gt;&lt;em&gt;This is a guest post by Sean Reilly. Release your applications with MongoDB more often and get closer to the ultimate goal of deploying applications anytime and why not at 11am on Wednesday mornings?&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;What you will learn&amp;#8230;&lt;/h4&gt;
&lt;p&gt;This article explores how to make use of MongoDB characteristics in order to avoid the downtime traditionally required by migration scripts in the SQL world. This is in order to get closer to the goal of being able to deploy applications with no downtime.&lt;!-- more --&gt;&lt;/p&gt;
&lt;h4&gt;What you should know&lt;/h4&gt;
&lt;p&gt;The basics of MongoDB&lt;/p&gt;
&lt;p&gt;Many software developers reading this article will be familiar with the concept of the “build party”. For those who aren’t familiar with the term, this story should explain:&lt;/p&gt;
&lt;p&gt;In another life, I worked for a medium sized startup in Canada, as part of a web application development team. The product had a typical software stack for the mid 2000’s — a monolithic application written in managed code (c# in this case) deployed to a cluster of application servers, backed by a fairly large relational database. As we were actively developing the product, we released new versions of it quite often.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;span&gt;Our development team was in the middle of North America (time zone: UTC-6). As the application became more popular throughout North America, we had to account for users on the West Coast (UTC-8), and the East Coast (UTC-5). This shrunk our “maintenance window” to approximately 9 hours. Then the application started winning users in the United Kingdom (UTC-0). This caused moved the end of our maintenance window pretty drastically. Then we started getting traction in Hawaii (UTC-10), and the &lt;em&gt;beginning &lt;/em&gt;of our maintenance window moved drastically.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The details of the product aren’t particularly relevant, except that the target market was businesses, so we originally started with a fairly wide “maintenance window”. The policy was that our “maintenance window” (the times when we could deploy new releases of the application) was “outside business hours”, which was defined as approximately 8pm-8am. So, a few times each month we would have some developers stay late, order some pizza, put the application into maintenance mode, and deploy the new version of the web application. This is the build party.&lt;/p&gt;
&lt;p&gt;Originally, our maintenance window gave us plenty of time to do whatever we needed to deploy the app, without inconveniencing our users. And for a few years, this pattern served us fairly well. Then our application became more popular. Popularity in terms of sheer numbers of users was never a problem — what hurt us was when we gained a significant user base across different time zones.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;span&gt;Then Australian users (UTC+10) started to complain when we put the application into maintenance mode in the middle of their day.&lt;/span&gt;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;span&gt;Various discussions and strategies were considered. In the end we settled on a maintenance window from 2:30am-5am local time; performing deployments between the end of business in Hawaii, and the beginning of business on the eastern edge of Australia.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span&gt;If you can release a new version of your software Wednesday morning at 11am without inconveniencing your customers, you can release it whenever you want.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;At the same time, our organisation was becoming more Agile, which means that the natural cadence for us to release a new version of the app was every sprint — in our case, weekly.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;span&gt;This is when I realised that the best time to release a web application to production is &lt;strong&gt;Wednesday morning at 11am&lt;/strong&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;So we were faced with the spectre of having a small team of developers come into the office after midnight, prepping for a time sensitive operation that started downtime at 2:30am, and absolutely needed to be done by 4:30am (if it wasn’t, we had to rollback the deployment in time for the end of our maintenance window at 5am). Every week.&lt;/p&gt;
&lt;p&gt;This is when I realised that the best time to release a web application to production is Wednesday morning at 11am.&lt;/p&gt;
&lt;p&gt;That may sound crazy, but there are a few reasons why:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Wednesday is the day of the week least likely to be a holiday. When your team is located within a single time zone, by 11am everyone on the team should be in the office, well caffeinated, with morning email checks and standup meetings out of the way. Should something go wrong with a release, Wednesday at 11am gives you great odds that the entire team will be on hand to help out.&lt;/li&gt;
&lt;li&gt;Developers are just as happy to eat pizza at 11am as they are between 2:30-5am, and despite what many of them will claim, they do better work when they are not sleep deprived.&lt;/li&gt;
&lt;li&gt;If you can release a new version of your software Wednesday morning at 11am without inconveniencing your customers, you can release it whenever you want.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Our original “maintenance window” was based on the idea that a release usually required the application to be taken offline. If an upgrade could be performed without users noticing, then we could do it whenever we liked (within reason).&lt;/p&gt;
&lt;p&gt;As the years went by, our deployment process became quite sophisticated. Rolling deploys to application servers behind a load balancer eliminated the possibility of broken pages when an individual server was upgraded, and automating the process kept it as quick as possible. The one hurdle that we never managed to overcome was the downtime imposed by database migration scripts.&lt;/p&gt;
&lt;p&gt;All SQL databases divide their language into two kinds of statements. Data Manipulation Language (or DML) covers the traditional INSERT, SELECT, UPDATE, and DELETE statements. Data Definition Language (or DDL) covers statements like ALTER TABLE that are used to modify the database schema. While DML operations can be written to affect a single row at a time, DDL operations by their very nature affect (and usually exclusively lock) an entire table at once. While that lock is held, a SELECT statement against the table cannot successfully return data until the ALTER TABLE statement finishes. If your migration script is unlucky enough to have to run an ALTER TABLE statement on a table that contains several million rows, that entire table will be unavailable for quite some time.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Fast forward several years, into the brave new NoSQL world where products like MongoDB are available, and we now have other options. Despite some of the hype around NoSQL products being “schema less”, in most cases where you are writing software that uses MongoDB you will want to enforce constraints on how your data is stored. There are, however, some significant differences:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;MongoDB itself is usually unaware of the schema requirements for a table or collection — schema is now usually enforced by the application.&lt;/li&gt;
&lt;li&gt;It is possible for different documents in a single MongoDB collection to each conform to a different schema.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;We can take advantage of these differences to avoid the downtime required by a SQL migration script, by having our application migrate data itself without downtime.&lt;/p&gt;
&lt;h4&gt;Data Migration as a responsibility of the Application&lt;/h4&gt;
&lt;p&gt;With this pattern, when a new version of the application is released, it is responsible for migrating individual documents one at a time from the previous schema to the new schema. This happens naturally as each document is loaded by the application in the course of normal user activity.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Design the application so that each collection is only accessed from a single class (or highly cohesive set of classes). The repository pattern is a good example of how to achieve this (&lt;a href="http://martinfowler.com/eaaCatalog/repository.html"&gt;http://martinfowler.com/eaaCatalog/repository.html&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;When the class retrieves data from the MongoDB collection, documents that conform to the new schema are loaded normally. Documents that conform to the previous schema are transparently migrated to the new format, before being converted into objects. This behaviour can (and should) be unit tested to ensure it is working as expected.&lt;/li&gt;
&lt;li&gt;When the class saves data back to the MongoDB collection, documents are always written in the new format.&lt;/li&gt;
&lt;li&gt;Cross-table migration issues such as foreign keys and joins are not a problem. Since MongoDB doesn’t have these concepts, they will not trip you up.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;With this pattern, data migration will start to happen automatically as soon as the application accesses data. The change will be invisible to the users of the application, and the most frequently used (or updated) data will be migrated first. As the new version of the application is used, more and more data will be upgraded, although it is likely that not all data will be migrated right away.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Once the application deployment is complete, you can wait for all of the documents to be migrated naturally, or you can migrate leftover data in a background thread. I suggest at least considering this option, as in the long term it will become onerous if you have to support many different schema versions for documents in the same collection.&lt;/p&gt;
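&lt;p&gt;As a rough illustration of the pattern described above, here is a minimal JavaScript sketch of a repository that migrates documents as they are loaded and always saves them in the new format. It is a sketch under stated assumptions, not a definitive implementation: an in-memory Map stands in for a MongoDB collection, the names PersonRepository, migrateToCurrent and migrateLeftovers are hypothetical, and the document shape (flat name and lastName strings becoming a nested name object) is assumed only for this example.&lt;/p&gt;

```javascript
// Minimal sketch: an in-memory Map stands in for a MongoDB collection.
// PersonRepository and migrateToCurrent are hypothetical names.

// Upgrade a legacy document (flat "name"/"lastName" strings) to the new shape.
function migrateToCurrent(doc) {
  if (typeof doc.name === 'object') {
    return doc; // already in the new format
  }
  return { _id: doc._id, name: { first: doc.name, last: doc.lastName } };
}

function PersonRepository(store) {
  this.store = store; // the only place that touches the "collection"
}

// Load: old-format documents are migrated transparently before use.
PersonRepository.prototype.findById = function (id) {
  var doc = this.store.get(id);
  return doc ? migrateToCurrent(doc) : null;
};

// Save: documents are always written in the new format.
PersonRepository.prototype.save = function (person) {
  this.store.set(person._id, migrateToCurrent(person));
};

// Optional background pass for documents that no user has touched yet.
PersonRepository.prototype.migrateLeftovers = function () {
  var self = this;
  Array.from(this.store.keys()).forEach(function (id) {
    self.save(self.store.get(id));
  });
};

var store = new Map([[12345, { _id: 12345, name: 'Sean', lastName: 'Reilly' }]]);
var repo = new PersonRepository(store);
var person = repo.findById(12345);
console.log(person.name.first); // "Sean"
```

&lt;p&gt;Because migrateToCurrent is a plain function, it can be unit tested on its own, and the same function serves both the load path and the background pass.&lt;/p&gt;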
&lt;h4&gt;Special cases that require a little more attention:&lt;/h4&gt;
&lt;p&gt;When the migration edits the primary key of the document: In a normal migration operation, data is loaded, the format is changed, and data is saved back to the same place. When the primary key is changed as part of a migration, the document is also &lt;em&gt;moved&lt;/em&gt;. This means that the new document needs to be saved, and the old version must be deleted. It’s also necessary to check in two places in order to be sure that a document does not exist. Your code will need to issue two find operations (or perhaps one, with an $or clause) when retrieving documents, to be sure to catch both documents that have been migrated already and documents that have not.&lt;/p&gt;
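&lt;p&gt;The primary key case can be sketched as follows. Everything here is illustrative: the key change (a numeric _id becoming a prefixed string), the helper names newIdFor, findEitherKeyFilter and moveDocument, and the in-memory Map standing in for the collection are all assumptions made for the example. Only the $or operator itself is a real MongoDB query operator.&lt;/p&gt;

```javascript
// Hypothetical example: the migration changes _id from a number (12345)
// to a prefixed string ("person:12345"). All helpers are illustrative.

function newIdFor(oldId) {
  return 'person:' + oldId;
}

// Filter for a single find that matches the document whether or not it
// has been moved yet; it would be passed to find or findOne.
function findEitherKeyFilter(oldId) {
  return { $or: [{ _id: newIdFor(oldId) }, { _id: oldId }] };
}

// Moving a document: save under the new key first, then delete the old copy.
// An in-memory Map stands in for the collection here.
function moveDocument(store, oldId) {
  var doc = store.get(oldId);
  if (!doc) {
    return; // already moved, or never existed
  }
  var moved = Object.assign({}, doc, { _id: newIdFor(oldId) });
  store.set(moved._id, moved);
  store.delete(oldId);
}

var store = new Map([[12345, { _id: 12345, name: 'Sean' }]]);
console.log(JSON.stringify(findEitherKeyFilter(12345)));
moveDocument(store, 12345);
console.log(store.has(12345), store.has('person:12345')); // false true
```

&lt;p&gt;Saving the new copy before deleting the old one means a failure between the two steps leaves a duplicate rather than a lost document, which is usually the easier problem to repair.&lt;/p&gt;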
&lt;p&gt;Incompatible unique indexes: Unique indexes are a valuable tool for maintaining data integrity in MongoDB. However, unique indexes are problematic when a migration renames document fields, as a document missing the indexed field has an implied field value of null, and only one document is allowed the value null with a unique index. Fortunately, MongoDB introduced the concept of sparse indexes in version 1.8, and unique indexes can also be sparse. With a sparse unique index, documents that don’t contain the indexed field at all are exempt from the unique requirement. Sparse unique indexes are a valuable tool for document-at-a-time migration solutions.&lt;/p&gt;
&lt;p&gt;Expensive migrations: Some migration operations (hopefully most) are so cheap that they can be performed over and over at very little cost. Renaming a field is the classic example of this sort of operation. In these cases, migrating whenever a document is loaded is fine, even if the document isn’t saved — the cost of performing the migration multiple times isn’t an issue. However, some migrations might be more expensive. A migration might need to call an external service or perform some other costly or complicated operation. In these cases, it’s probably best to save the migrated document back to MongoDB immediately.&lt;/p&gt;
&lt;p&gt;Map/Reduce or Aggregation Framework operations: Since these operations occur within the database itself, they cannot take advantage of a layer of the application transparently migrating documents so that the entire collection has the same schema. In a future release, MongoDB might include computed views that will make this situation easier to deal with, but until then the best solution is probably to perform the operation twice: once with the data that is still in the legacy schema, and once with the data that has been converted into the new schema. Then perform a final aggregation of the two outputs into a single result in your application code.&lt;/p&gt;
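&lt;p&gt;The final merge step might look something like the sketch below. It assumes each of the two operations returns rows shaped like a typical grouped count (an _id key plus a count); the function name mergeCounts and the sample data are invented for illustration.&lt;/p&gt;

```javascript
// Sketch: combine the results of two separate aggregations (one over
// legacy-schema documents, one over migrated documents) in application code.
// The row shape ({ _id: key, count: n }) and the sample data are assumptions.

function mergeCounts(legacyResults, currentResults) {
  var totals = {};
  legacyResults.concat(currentResults).forEach(function (row) {
    totals[row._id] = (totals[row._id] || 0) + row.count;
  });
  return Object.keys(totals).sort().map(function (key) {
    return { _id: key, count: totals[key] };
  });
}

// Counts by last name from the legacy documents (grouped on "lastName") ...
var fromLegacy = [{ _id: 'Reilly', count: 2 }, { _id: 'Smith', count: 1 }];
// ... and from the migrated documents (grouped on "name.last").
var fromCurrent = [{ _id: 'Reilly', count: 3 }];

console.log(mergeCounts(fromLegacy, fromCurrent));
```

&lt;p&gt;This only works directly for results that can be combined after the fact, such as counts and sums; an average, for example, would need the sum and the count carried separately through both operations.&lt;/p&gt;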
&lt;p&gt;One final tip that can make this pattern easier is to use a schemaVersion field. When the collection can have documents that conform to many different schemas, each document should track which schema it conforms to.&lt;/p&gt;
&lt;p&gt;So a document that looks like this before it is migrated:&lt;/p&gt;
&lt;pre&gt;{
    "_id": 12345,
    "name": "Sean",
    "lastName": "Reilly"
}
&lt;/pre&gt;
&lt;p&gt;might look like this after migration:&lt;/p&gt;
&lt;pre&gt;{
    "_id": 12345,
    "name": {
        "first": "Sean",
        "last": "Reilly"
    },
    "schemaVersion": 1
}
&lt;/pre&gt;
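&lt;p&gt;One way to use the schemaVersion field in code is a table of per-version upgrade functions, sketched below for the document shown above. The names CURRENT_VERSION, upgraders and migrate are hypothetical, and a missing schemaVersion is treated as version 0.&lt;/p&gt;

```javascript
// Sketch of version-by-version upgrades keyed on schemaVersion.
// A missing schemaVersion is treated as version 0. Names are illustrative.

var CURRENT_VERSION = 1;

// upgraders[n] turns a version-n document into a version-(n + 1) document.
var upgraders = {
  0: function (doc) {
    return {
      _id: doc._id,
      name: { first: doc.name, last: doc.lastName },
      schemaVersion: 1
    };
  }
};

// Apply upgrade steps until the document reaches the current version.
function migrate(doc) {
  var current = doc;
  while ((current.schemaVersion || 0) !== CURRENT_VERSION) {
    current = upgraders[current.schemaVersion || 0](current);
  }
  return current;
}

var legacy = { _id: 12345, name: 'Sean', lastName: 'Reilly' };
console.log(JSON.stringify(migrate(legacy)));
// {"_id":12345,"name":{"first":"Sean","last":"Reilly"},"schemaVersion":1}
```

&lt;p&gt;Adding a later schema then means adding one more entry to the table and raising CURRENT_VERSION, while older documents are still upgraded one step at a time. This sketch does not handle a document whose version is newer than the running application, which a real implementation would need to reject explicitly.&lt;/p&gt;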
&lt;p&gt;If the current version of your application doesn’t have a schemaVersion field in every document, then treat the absence of the field as an implied version 0. This makes it very simple to find non-migrated documents, and might be especially valuable with Map/Reduce or Aggregation Framework queries.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;span&gt;This pattern seems like more work, and at first, it is. But it can free you from a number of headaches over time, and the benefits are clear:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span&gt;Avoid the IO and CPU load of a massive migration that loads and migrates an entire collection at once.&lt;/span&gt;&lt;/li&gt;
&lt;li class="MsoNormal"&gt;&lt;span&gt;Avoid downtime due to “stop the world” migration scripts.&lt;/span&gt;&lt;/li&gt;
&lt;li class="MsoNormal"&gt;&lt;span&gt;Migration responsibilities are part of the application itself, and can even be unit tested!&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;I hope that this is a pattern that you will find useful, and even more than that, I hope that it’s a pattern that will save your customers from maintenance windows, save your development team from some sleepless nights, and allow you to release your applications more often… and get closer to the ultimate goal of deploying applications at 11am on Wednesday mornings.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sean Reilly is a software developer and consultant for Equal Experts, one of the fastest growing technology companies in the UK. He specialises in lightweight, Agile, enterprise web application development, and has been using MongoDB in anger since version 1.4.&lt;/em&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/50303776825</link><guid>http://blog.mongodb.org/post/50303776825</guid><pubDate>Sun, 12 May 2013 20:42:00 -0400</pubDate><category>agile</category><category>software development</category><category>release schedule</category><category>releases</category></item><item><title>ODBC Connector for MongoDB </title><description>&lt;p&gt;&lt;em&gt;This is a guest post by &lt;a href="http://cs.nyu.edu/webapps/content/academic/graduate/msis" target="_blank"&gt;NYU Information Systems (MSIS)&lt;/a&gt; Graduate students Kyle Galloway, Pravish Sood and Dylan Kelemen.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We are pleased to announce &lt;a href="https://github.com/NYUITP/sp13_10g" target="_blank"&gt;the Mongo-ODBC project&lt;/a&gt;. As NYU MSIS students in Courant Institute&amp;#8217;s Information Technology Projects course, we are working under the guidance of 10gen and our Professor Evan Korth to develop an &lt;a href="http://en.wikipedia.org/wiki/ODBC" target="_blank"&gt;ODBC (Open-Database-Connectivity)&lt;/a&gt; driver for MongoDB.&lt;/p&gt;
&lt;p&gt;ODBC was created to facilitate the movement of data between applications with different file structures. Although it is not as popular as it once was, in part due to more flexible alternatives like MongoDB, many programs maintain ODBC compliance. The goal of our project is to create an ODBC driver that supports the ODBC functions that can be carried out on MongoDB. This will give users of programs that don&amp;#8217;t yet offer MongoDB support some access to data in MongoDB databases. We believe this will be particularly useful for new users and those dependent on programs like Excel and Tableau for simple business analysis reporting.&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Developing the driver has presented some interesting and at times frustrating issues, many but not all of which are due to the fundamental differences between relational and document-based databases. At the moment we are working on parsing SQL WHERE clauses into MongoDB queries, mapping SQL statements to MongoDB C++ driver functions, and handling the ODBC return format specifications.&lt;/p&gt;
&lt;p&gt;The NYU MSIS program is composed of course work on the core concepts of computing and business in order to provide students the knowledge necessary for successful careers in technically demanding management roles. Information Technology Projects (ITP) is the final piece of our program, where students work to apply their technical skills in a practical team-oriented context to build real world IT solutions for businesses, government agencies, or non-profit organizations.&lt;/p&gt;
&lt;p&gt;This is the third collaboration between the &lt;a href="http://cs.nyu.edu/webapps/content/academic/graduate/msis" target="_blank"&gt;NYU Information Systems Master&amp;#8217;s program&lt;/a&gt; and 10gen. Prior student groups worked with 10gen on the &lt;a href="https://github.com/mongodb/mongo-hadoop" target="_blank"&gt;MongoDB adapter for Hadoop&lt;/a&gt; and the &lt;a href="https://github.com/mongodb/mongo-disco" target="_blank"&gt;MongoDB Disco Adapter&lt;/a&gt;. We are excited to be working on an open source project with 10gen and look forward to continuing the successful cooperation between 10gen and &lt;a href="http://www.cs.nyu.edu/web/index.html" target="_blank"&gt;NYU&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="https://github.com/NYUITP/sp13_10g" target="_blank"&gt;&lt;em&gt;Find the MongoDB ODBC Project on Github&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.mongodb.org/post/49852036514</link><guid>http://blog.mongodb.org/post/49852036514</guid><pubDate>Tue, 07 May 2013 08:59:00 -0400</pubDate><category>odbc</category></item><item><title>The MEAN Stack: MongoDB, ExpressJS, AngularJS and Node.js</title><description>&lt;p&gt;&lt;em&gt;This is a guest post from Valeri Karpov, a MongoDB Hacker and co-founder of &lt;a href="http://ascotproject.com/" target="_blank"&gt;the Ascot Project&lt;/a&gt;. &lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A few weeks ago, a friend of mine asked me for help with PostgreSQL. As someone who&amp;#8217;s been blissfully SQL-free for a year, I was quite curious to find out why he wasn’t just using MongoDB instead. It turns out that he thinks MongoDB is too difficult to use for a quick weekend hack, and this couldn’t be farther from the truth. I just finished my second 24-hour hackathon using MongoDB and Node.js (&lt;a href="http://francescak.me/blog/2013/04/09/fintech-hackathon-recap/" target="_blank"&gt;the FinTech Hackathon&lt;/a&gt; co-sponsored by &lt;a href="http://10gen.com" target="_blank"&gt;10gen&lt;/a&gt;) and can confidently say that there is no reason to use anything else for your next hackathon or REST API hack.&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;First of all, there are huge advantages to using a uniform language throughout your stack. My team uses a set of tools that we affectionately call the MEAN stack: MongoDB, &lt;a href="http://expressjs.com/" target="_blank"&gt;ExpressJS&lt;/a&gt;, &lt;a href="http://angularjs.org/" target="_blank"&gt;AngularJS&lt;/a&gt;, and &lt;a href="http://nodejs.org/" target="_blank"&gt;Node.js&lt;/a&gt;. By coding with JavaScript throughout, we are able to realize performance gains in both the software itself and in the productivity of our developers. With MongoDB, we can store our documents in a JSON-like format, write JSON queries on our ExpressJS and Node.js based server, and seamlessly pass JSON documents to our AngularJS frontend. Debugging and database administration become a lot easier when the objects stored in your database are essentially identical to the objects your client JavaScript sees. Even better, somebody working on the client side can easily understand the server side code and database queries; using the same syntax and objects the whole way through frees you from having to consider multiple sets of language best practices and reduces the barrier to entry for understanding your codebase. This is especially important in a hackathon setting: the team may not have much experience working together, and with so little time to integrate all the pieces of your project, anything that makes the development process easier is gold.&lt;/p&gt;
&lt;p&gt;Another big reason to go with MongoDB is that you can use it in the same way you would a MySQL database (at least at a high level). My team likes to describe MongoDB as a “gateway drug” for NoSQL databases because it is so easy to make the transition from SQL to MongoDB. I wish someone had told me this when I first started looking into NoSQL databases, because it would have saved me a lot of headaches. Like many people, I was under the impression that CouchDB would be easier to use, while the performance improvements from MongoDB were something I could take advantage of only once I had gotten my feet wet with CouchDB. Instead CouchDB ended up being much more difficult to work with than I anticipated, largely because it uses custom MapReduce functions to query data, rather than the more traditional SQL queries I was used to. When I finally switched I was surprised to find that with MongoDB I could still write queries and build indices; the only difference is that the queries are written in JSON and query a flexible NoSQL database.&lt;/p&gt;
&lt;p&gt;As a NoSQL database, MongoDB also allows us to define our schema entirely on the code side. With a SQL database you’re faced with the inescapable fact that the objects in your database are stored in a format that is unusable by your frontend and vice versa. This wastes precious time and mental energy when you inevitably run into a data issue or need to do some database administration. For example, if you change your ActiveRecord schema in Ruby on Rails, you have to run the “rake” command to make sure your SQL columns stay in sync with your schemas. This is a clear violation of the age-old programming principle D.R.Y. (Don’t Repeat Yourself). In contrast, MongoDB doesn’t care what format the documents in your collections take (for the most part anyway). This means that you spend a lot less time worrying about schema migrations, because adding or removing data items from your schema doesn’t really require you to do anything on the database side.&lt;/p&gt;
&lt;p&gt;At this point I should note that to get the most out of MongoDB in your MEAN stack, you’re going to want to take advantage of &lt;a href="http://mongoosejs.com/" target="_blank"&gt;MongooseJS&lt;/a&gt;. Mongoose is a schema and general usability tool for Node that lets you use MongoDB while being as lazy as you want. For example, with Mongoose we can define a schema in JSON:&lt;/p&gt;
&lt;pre&gt;var UserSchema = new Mongoose.Schema({
  // The regular expression rejects usernames that contain whitespace
  username : { type : String, validate : /^\S+$/, index : { unique : true } },
  profile : {
    name : {
      first : { type : String, default : "" },
      last : { type : String, default : "" }
    }
  }
});
&lt;/pre&gt;
&lt;p&gt;We can then create a model by mapping our schema to our MongoDB collection:&lt;/p&gt;
&lt;pre&gt; var User = db.model('users', UserSchema);&lt;/pre&gt;
&lt;p&gt;For all of the Ruby on Rails + ActiveRecord fans out there, note that this User object we’ve created above now allows us easy access to the basic features you would normally associate with ActiveRecord. For example, we can do things like:&lt;/p&gt;
&lt;pre&gt;User.findOne({ username : 'vkarpov' }, function(error, user) {
  /* user is either null or the user with username vkarpov */
});

User.findOne({ _id : req.params.id }, function(error, user) {
  /* user with the ID defined by the hex string in req.params.id */
});

User.find({ 'profile.name.first' : 'Valeri' }, function(error, users) {
  /* users is a list of users with first name Valeri */
});

var user = new User({ username : 'vkarpov' });
user.save(function(error, user) {
  /* Saves user with default values for profile.name.first and .last into the 'users' collection */
});

var user2 = new User({ username : 'v karpov' });
user2.save(function(error, user) {
  /* Error: regular expression validation for username failed */
});
&lt;/pre&gt;
&lt;p&gt;Another powerful tool that MongoDB and MongooseJS provide is the ability to nest schemas. You&amp;#8217;ll notice that in the User schema above we have “profile” and “name” objects that are part of a nested schema. This is a simple and useful design choice that illustrates how powerful nested schemas can be. Suppose that we want to give our user the ability to edit their first and last name, but not their username. Assuming the website has a /profile route where our user can change their first and last names, the JavaScript frontend will get a JSON object as the result of a call to User.findOne on the backend. After the user modifies their profile, the frontend makes a POST request to /profile.json with the user object in JSON as the body. Now on the backend (using ExpressJS syntax) we can simply use:&lt;/p&gt;
&lt;pre&gt;function(req, res) {
  // user is the document loaded earlier for the current user
  user.profile = req.body.profile;
  user.save(function(error, user) {
    res.redirect('/profile');
  });
}&lt;/pre&gt;
&lt;p&gt;That’s it. Mongoose takes care of validating the profile information, so we don&amp;#8217;t have to change the POST /profile.json route if we change the User schema, and there is no way the username field can be modified. We could do the same thing when using Ruby on Rails and ActiveRecord, but this would require having a separate Profile schema in a separate table, meaning that among other things we&amp;#8217;d incur a performance penalty because of the extra underlying join operation.&lt;/p&gt;
&lt;p&gt;MongoDB is the bedrock of our MEAN stack, and you should strongly consider using it for your next project. You can write your entire stack in one language, have schemas for ease of use on top of a flexible and performant NoSQL database, and nest schemas when you don&amp;#8217;t truly need to have separate collections. All of this adds up to you spending more of your precious hackathon hours building the other cool aspects of your product and less time figuring out how your objects translate between different levels of the stack.&lt;/p&gt;
&lt;p&gt;By the way, MongoDB and MEAN are useful well beyond hackathons; we use this approach for all of our commercial products, most recently &lt;a href="http://www.ascotproject.com/" target="_blank"&gt;The Ascot Project&lt;/a&gt;. Want to read more about the MEAN stack or how we use MongoDB? Check out &lt;a href="http://thecodebarbarian.wordpress.com/" target="_blank"&gt;my development blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Check out the Ascot Project at the next &lt;a href="http://www.meetup.com/New-York-MongoDB-User-Group/events/115028512/" target="_blank"&gt;MongoDB User Group in New York City&lt;/a&gt;. &lt;/em&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/49262866911</link><guid>http://blog.mongodb.org/post/49262866911</guid><pubDate>Tue, 30 Apr 2013 11:49:00 -0400</pubDate><category>nodejs</category><category>mongodb</category><category>angularjs</category><category>expressjs</category><category>mean stack</category><category>prototype</category><category>development</category><category>agile</category><category>agile development</category></item><item><title>10 questions to ask (and answer) when hosting MongoDB on AWS</title><description>&lt;p&gt;This is a guest post from &lt;em&gt;&lt;span&gt;Dharshan Rangegowda, founder of Scalegrid, creators of MongoDirector. This originally appeared on the &lt;a href="http://blog.mongodirector.com/10-questions-to-ask-and-answer-when-hosting-mongodb-on-aws/" target="_blank"&gt;MongoDirector blog&lt;/a&gt;. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/em&gt;&lt;span&gt;Are you hosting your production MongoDB instances on Amazon AWS? At&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;a href="http://www.mongodirector.com/"&gt;MongoDirector.com&lt;/a&gt;&lt;span&gt; we manage hundreds of production MongoDB instances on AWS and have learnt a few things along the way. Here is a set of 10 questions you need to ask yourself and answer as you continue to manage your deployment. Almost all of the information below is applicable to other cloud service providers as well.&lt;!-- more --&gt;&lt;/span&gt;&lt;/p&gt;
&lt;div class="entry-content"&gt;
&lt;p&gt;&lt;strong&gt;1. What is your high availability (HA) plan?&lt;/strong&gt;&lt;br/&gt;If you are using a single instance, it might be time to look at replica sets. When using replica sets, take care to deploy each member of a replica set in a different availability zone.&lt;br/&gt;&lt;strong&gt;2. What is your disaster recovery (DR) plan?&lt;/strong&gt;&lt;br/&gt;If you deploy your entire replica set in one region, what happens when an entire AWS region melts down, as happened in April 2011? You might want to look into distributing the members of your replica sets across regions.&lt;br/&gt;&lt;strong&gt;3. Have you tested your DR plan?&lt;/strong&gt;&lt;br/&gt;Simulate machine, network and disk failures to understand your cluster behavior under failure conditions. You don’t want to encounter your first failover in production.&lt;br/&gt;&lt;strong&gt;4. Are you backing up your instances?&lt;/strong&gt;&lt;br/&gt;Yes, you need backups even if you have replica sets. Backups are necessary to deal with accidental erasure or with a new version of your app that corrupts all your data. Make sure you are backing up regularly, preferably every few hours. You can back up from a secondary so that there is not a big impact on the primary.&lt;br/&gt;&lt;strong&gt;5. Do your backups work?&lt;/strong&gt;&lt;br/&gt;Have you tried a recovery? How long does it take to recover and have all your replicas resync? If you don’t know the answer, now is a good time to do a dry run and try an end-to-end recovery.&lt;br/&gt;&lt;strong&gt;6. How do you test application upgrades with production data?&lt;/strong&gt;&lt;br/&gt;One of the trickiest parts of an application upgrade is testing with existing production data. Build a sequestered, production-like environment in which you can test your application upgrades with production data.&lt;br/&gt;&lt;strong&gt;7. What sort of EBS volumes are you using?&lt;/strong&gt;&lt;br/&gt;If you are using standard EBS volumes, consider switching to the new provisioned IOPS volumes. They are a little more expensive but worth every penny. You will see a lot less fluctuation in IO performance and sleep easier through the night.&lt;br/&gt;&lt;strong&gt;8. Have you benchmarked the performance of your MongoDB instances?&lt;/strong&gt;&lt;br/&gt;If you haven’t, you can benchmark using mongoperf or the Yahoo! Cloud Serving Benchmark (YCSB). It’s good to know what you’re getting from your databases.&lt;br/&gt;&lt;strong&gt;9. How do you monitor your instances?&lt;/strong&gt;&lt;br/&gt;If you are not monitoring your instances, now would be a good time to start. 10gen has a freely available MongoDB Monitoring Service (MMS) that you can start using to monitor your MongoDB clusters.&lt;br/&gt;&lt;strong&gt;10. Are you exposing your databases to the internet?&lt;/strong&gt;&lt;br/&gt;Today’s powerful CPUs and password-cracking tools can crack a weak password in a matter of hours. Use Amazon security groups to lock down access to your database and only give your front/mid tier access to the DB.&lt;/p&gt;
&lt;p&gt;At MongoDirector.com we have helped answer a number of these questions for our customers. We provide single-click deployment of MongoDB replica sets across availability zones or regions. We have an automated backup and recovery process. We only use provisioned IOPS and provide easy ways for our customers to benchmark MongoDB and also simulate failover in MongoDB clusters. If you have other questions, comments or feature requests, we would love to hear from you. You can email us at support@MongoDirector.com.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;span&gt;Dharshan Rangegowda is the founder of &lt;/span&gt;&lt;a href="http://www.scalegrid.net/" target="_blank"&gt;Scalegrid&lt;/a&gt;&lt;span&gt;, which develops database-as-a-service solutions for service providers. ScaleGrid owns and operates &lt;/span&gt;&lt;a href="http://www.mongodirector.com/" target="_blank"&gt;MongoDirector.com&lt;/a&gt;&lt;span&gt;, a &lt;/span&gt;&lt;a href="http://www.mongodirector.com/" target="_blank"&gt;MongoDB hosting&lt;/a&gt;&lt;span&gt; platform and management solution for the public and private cloud. Dharshan worked extensively with several public cloud infrastructure providers in his previous role at Microsoft.&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;</description><link>http://blog.mongodb.org/post/48612482609</link><guid>http://blog.mongodb.org/post/48612482609</guid><pubDate>Mon, 22 Apr 2013 10:16:00 -0400</pubDate></item><item><title>New Hash-based Sharding Feature in MongoDB 2.4</title><description>&lt;p&gt;&lt;span&gt;Lots of MongoDB users enjoy the flexibility of custom shard keys in organizing a sharded collection’s documents. For certain common workloads though, like key/value lookup, using the natural choice of _id as a shard key isn’t optimal because default ObjectIds are ascending, resulting in poor write distribution. Creating randomized _ids or choosing another well-distributed field is always possible, but this adds complexity to an app and is another place where something could go wrong.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;To help keep these simple workloads simple, in 2.4 MongoDB added the new Hash-based shard key feature. The idea behind Hash-based shard keys is that MongoDB will do the work to randomize data distribution for you, based on whatever kind of document identifier you like. So long as the identifier has a &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Cardinality"&gt;&lt;span&gt;high cardinality&lt;/span&gt;&lt;/a&gt;&lt;span&gt;, the documents in your collection will be spread evenly across the shards of your cluster. For heavy workloads with lots of individual document writes or reads (e.g. key/value), this is usually the best choice. For workloads where getting ranges of documents is more important (e.g. finding recent documents from all users), other choices of shard key may be better suited.&lt;!-- more --&gt;&lt;/span&gt;&lt;/p&gt;
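&lt;p&gt;To see why hashing helps with ascending identifiers, consider the small Node.js sketch below. It is an illustration only: MD5 from Node’s crypto module is used as a stand-in hash, a simple modulo replaces MongoDB’s range-based chunks, and the function name bucketFor is hypothetical. It does not reproduce MongoDB’s internal hashing.&lt;/p&gt;

```javascript
// Illustration only: MD5 from Node's crypto module is a stand-in hash,
// not MongoDB's internal hashing code, and a simple modulo replaces
// MongoDB's range-based chunks.
var crypto = require('crypto');

// Map a key to one of numShards buckets using the first 4 bytes of its hash.
function bucketFor(key, numShards) {
  var digest = crypto.createHash('md5').update(String(key)).digest();
  return digest.readUInt32BE(0) % numShards;
}

// 1,000 ascending keys, similar in spirit to ascending ObjectIds.
var counts = [0, 0, 0, 0];
Array.from({ length: 1000 }, function (_, i) { return i; }).forEach(function (key) {
  counts[bucketFor(key, counts.length)] += 1;
});

console.log(counts); // four counts that sum to 1000
```

&lt;p&gt;Sharding on the raw ascending key would send every new write to whichever shard owns the top of the key range, whereas the hashed values scatter consecutive keys across all four buckets.&lt;/p&gt;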
&lt;h4&gt;&lt;span&gt;Hash-based sharding in an existing collection&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span&gt;To start off with Hash-based sharding, you need the name of the collection you’d like to shard and the name of the hashed &amp;#8220;identifier&amp;#8221; field for the documents in the collection.  For example, we might want to create a sharded &amp;#8220;mydb.webcrawler&amp;#8221; collection, where each document is usually found by a &amp;#8220;url&amp;#8221; field.  We can populate the collection with sample data using:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;shell$ wget &lt;a href="http://en.wikipedia.org/wiki/Web_crawler"&gt;http://en.wikipedia.org/wiki/Web_crawler&lt;/a&gt; -O web_crawler.html

shell$ mongo 

connecting to: /test

&amp;gt; use mydb

switched to db mydb

&amp;gt; cat("web_crawler.html").split("\n").forEach( function(line){

... var regex = /a href=\"([^\"]*)\"/; if (regex.test(line)) { db.webcrawler.insert({ "url" : regex.exec(line)[1] }); }})

&amp;gt; db.webcrawler.find()

...

{ "_id" : ObjectId("5162fba3ad5a8e56b7b36020"), "url" : "/wiki/OWASP" }

{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603d"), "url" : "/wiki/Image_retrieval" }

{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603e"), "url" : "/wiki/Video_search_engine" }

{ "_id" : ObjectId("5162fba3ad5a8e56b7b3603f"), "url" : "/wiki/Enterprise_search" }

{ "_id" : ObjectId("5162fba3ad5a8e56b7b36040"), "url" : "/wiki/Semantic_search" }

...

&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;Just for this example, we multiply this data roughly 4000-fold, since each of the 12 passes below doubles the collection (otherwise we won’t get any pre-splitting in the collection because it’s too small):&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;&amp;gt; &lt;/span&gt;&lt;span&gt;for (var i = 0; i &amp;lt; 12; i++) { db.webcrawler.find().toArray().forEach( function(doc) { db.webcrawler.insert({ url : doc.url }) }) }&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;Next, we create a hashed index on this field:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;&amp;gt; db.webcrawler.ensureIndex({ url : "hashed" })&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;As usual, the creation of the hashed index doesn’t prevent other types of indices from being created as well.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Then we shard the &amp;#8220;mydb.webcrawler&amp;#8221; collection using the same field as a Hash-based shard key:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&amp;gt; sh.shardCollection("mydb.webcrawler", { url : "hashed" })

&amp;gt; db.printShardingStatus(true)

--- Sharding Status ---

sharding version: {

  "_id" : 1,

  "version" : 3,

  "minCompatibleVersion" : 3,

  "currentVersion" : 4,

  "clusterId" : ObjectId("5163032a622c051263c7b8ce")

}

shards:

  {  "_id" : "test-rs0",  "host" : "test-rs0/nuwen:31100,nuwen:31101" }

  {  "_id" : "test-rs1",  "host" : "test-rs1/nuwen:31200,nuwen:31201" }

  {  "_id" : "test-rs2",  "host" : "test-rs2/nuwen:31300,nuwen:31301" }

  {  "_id" : "test-rs3",  "host" : "test-rs3/nuwen:31400,nuwen:31401" }

databases:

  {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

  {  "_id" : "mydb",  "partitioned" : true,  "primary" : "test-rs0" }

      mydb.webcrawler

          shard key: { "url" : "hashed" }

          chunks:

              test-rs0    4

          { "url" : { "$minKey" : 1 } } --&amp;gt;&amp;gt; { "url" : NumberLong("-4837773290201122847") } on : test-rs0 { "t" : 1, "i" : 3 }

          { "url" : NumberLong("-4837773290201122847") } --&amp;gt;&amp;gt; { "url" : NumberLong("-2329535691089872938") } on : test-rs0 { "t" : 1, "i" : 4 }

          { "url" : NumberLong("-2329535691089872938") } --&amp;gt;&amp;gt; { "url" : NumberLong("3244151849123193853") } on : test-rs0 { "t" : 1, "i" : 1 }

          { "url" : NumberLong("3244151849123193853") } --&amp;gt;&amp;gt; { "url" : { "$maxKey" : 1 } } on : test-rs0 { "t" : 1, "i" : 2 }

&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;In the sharding status output above, you can see that the chunk boundaries are 64-bit integers (generated by hashing the &amp;#8220;url&amp;#8221; field).  When inserts or queries target particular urls, the query can get routed using the url hash to the correct chunk.&lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;&lt;span&gt;Sharding a new collection&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span&gt;Above we’ve sharded an existing collection, which will result in all the chunks of a collection &lt;/span&gt;&lt;span&gt;initially&lt;/span&gt;&lt;span&gt; living on the same shard.  The balancer takes care of moving the chunks around, as usual, until we get an even distribution of data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Much of the time though, it’s better to shard the collection &lt;/span&gt;&lt;span&gt;before&lt;/span&gt;&lt;span&gt; we add our data - this way MongoDB doesn’t have to worry about moving around existing data.  Users of sharded collections are familiar with &lt;/span&gt;&lt;a href="http://docs.mongodb.org/manual/tutorial/manage-chunks-in-sharded-cluster/#create-chunks-pre-splitting"&gt;&lt;span&gt;pre-splitting&lt;/span&gt;&lt;/a&gt;&lt;span&gt; - where empty chunks can be quickly balanced around a cluster before data is added.  When sharding a new collection using Hash-based shard keys, MongoDB will take care of the pre-splitting for you. Similarly sized ranges of the Hash-based key are distributed to each existing shard, which means that no initial balancing is needed (unless of course new shards are added).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Let’s see what happens when we shard a new collection webcrawler_empty the same way:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&amp;gt; sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
&amp;gt; db.webcrawler_empty.ensureIndex({ url : "hashed" })
&amp;gt; sh.shardCollection("mydb.webcrawler_empty", { url : "hashed" })
{ "collectionsharded" : "mydb.webcrawler_empty", "ok" : 1 }
&amp;gt; db.printShardingStatus(true)
--- Sharding Status ---
...
      mydb.webcrawler_empty
          shard key: { "url" : "hashed" }
          chunks:
              test-rs0    2
              test-rs1    2
              test-rs2    2
              test-rs3    2
          { "url" : { "$minKey" : 1 } } --&amp;gt;&amp;gt; { "url" : NumberLong("-6917529027641081850") } on : test-rs0 { "t" : 4, "i" : 2 }
          { "url" : NumberLong("-6917529027641081850") } --&amp;gt;&amp;gt; { "url" : NumberLong("-4611686018427387900") } on : test-rs0 { "t" : 4, "i" : 3 }
          { "url" : NumberLong("-4611686018427387900") } --&amp;gt;&amp;gt; { "url" : NumberLong("-2305843009213693950") } on : test-rs1 { "t" : 4, "i" : 4 }
          { "url" : NumberLong("-2305843009213693950") } --&amp;gt;&amp;gt; { "url" : NumberLong(0) } on : test-rs1 { "t" : 4, "i" : 5 }
          { "url" : NumberLong(0) } --&amp;gt;&amp;gt; { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
          { "url" : NumberLong("2305843009213693950") } --&amp;gt;&amp;gt; { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
          { "url" : NumberLong("4611686018427387900") } --&amp;gt;&amp;gt; { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
          { "url" : NumberLong("6917529027641081850") } --&amp;gt;&amp;gt; { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }
&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;As you can see, the new empty collection is already well-distributed and ready to use.  Be aware though - any balancing currently in progress can interfere with moving the empty initial chunks off the initial shard, because that balancing will take priority (hence the initial stopBalancer step). As before, the balancer will eventually distribute all empty chunks anyway, but if you are preparing for an immediate data load it’s probably best to stop the balancer beforehand.&lt;/span&gt;&lt;/p&gt;
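&lt;p&gt;To see where boundaries like these come from, here is a minimal sketch that divides the signed 64-bit hash space into equally sized ranges. It is plain JavaScript (run with Node.js), and both the helper name &lt;code&gt;evenSplitPoints&lt;/code&gt; and the even-division rule are assumptions made for illustration; the post does not show how MongoDB actually picks its initial split points. For eight chunks, the values it prints differ from the boundaries above only in the last two digits.&lt;/p&gt;

```javascript
// Illustration only: divide the signed 64-bit hash space into equal ranges.
// The even-division rule is an assumption, not MongoDB's documented algorithm.
const HASH_MIN = -(2n ** 63n);   // smallest signed 64-bit value
const HASH_SPAN = 2n ** 64n;     // number of distinct 64-bit values

// Returns the (numChunks - 1) boundaries between equally sized chunks.
function evenSplitPoints(numChunks) {
  const n = BigInt(numChunks);
  const points = [];
  for (let i = 1n; n > i; i += 1n) {
    points.push(HASH_MIN + (HASH_SPAN * i) / n);
  }
  return points;
}

// Four shards with two chunks each gives eight chunks.
const splitPoints = evenSplitPoints(8);
console.log(splitPoints.map(String).join("\n"));
```

&lt;p&gt;The middle boundary is exactly 0, which matches the &lt;code&gt;NumberLong(0)&lt;/code&gt; split in the sharding status.&lt;/p&gt;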
&lt;p&gt;&lt;span&gt;That’s it - you now have a pre-split collection on four shards using Hash-based shard keys.  Queries and updates on exact urls go to randomized shards and are balanced across the cluster:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&amp;gt; db.webcrawler_empty.find({ url: "/wiki/OWASP" }).explain()
{
  "clusteredType" : "ParallelSort",
  "shards" : {
      "test-rs2/nuwen:31300,nuwen:31301" : [ ... ]
...
&lt;/pre&gt;
&lt;p&gt;However, the trade-off with Hash-based shard keys is that ranged queries and multi-updates must hit all shards:&lt;/p&gt;
&lt;pre&gt;&amp;gt; db.webcrawler_empty.find({ url: /^\/wiki\/OWASP/ }).explain()
{
  "clusteredType" : "ParallelSort",
  "shards" : {
      "test-rs0/nuwen:31100,nuwen:31101" : [ ... ],
      "test-rs1/nuwen:31200,nuwen:31201" : [ ... ],
      "test-rs2/nuwen:31300,nuwen:31301" : [ ... ],
      "test-rs3/nuwen:31400,nuwen:31401" : [ ... ]
...
&lt;/pre&gt;
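&lt;p&gt;The routing difference between the two queries can be sketched in plain JavaScript (Node.js). The chunk table below is copied from the &lt;code&gt;mydb.webcrawler_empty&lt;/code&gt; sharding status; &lt;code&gt;standInHash&lt;/code&gt; and &lt;code&gt;shardFor&lt;/code&gt; are hypothetical helpers, and the MD5-of-the-string hash is only a stand-in, so it will not reproduce the hash values MongoDB computes. An exact url hashes to one value and therefore one chunk, while urls that share a prefix hash to unrelated values, so a prefix query cannot be narrowed to a subset of chunks.&lt;/p&gt;

```javascript
const crypto = require("crypto");

// Lower bound of each chunk and the shard that owns it.
// The first entry stands for $minKey.
const chunks = [
  { min: -(2n ** 63n), shard: "test-rs0" },
  { min: -6917529027641081850n, shard: "test-rs0" },
  { min: -4611686018427387900n, shard: "test-rs1" },
  { min: -2305843009213693950n, shard: "test-rs1" },
  { min: 0n, shard: "test-rs2" },
  { min: 2305843009213693950n, shard: "test-rs2" },
  { min: 4611686018427387900n, shard: "test-rs3" },
  { min: 6917529027641081850n, shard: "test-rs3" }
];

// Stand-in hash (assumption): first 8 bytes of MD5 as a signed 64-bit integer.
function standInHash(value) {
  return crypto.createHash("md5").update(value).digest().readBigInt64LE(0);
}

// The owning chunk is the last one whose lower bound is at or below the hash.
function shardFor(hash) {
  let owner = chunks[0].shard;
  for (const chunk of chunks) {
    if (hash >= chunk.min) owner = chunk.shard;
  }
  return owner;
}

// Urls with a common prefix get unrelated hashes and can land anywhere.
["/wiki/OWASP", "/wiki/OWASP_Top_Ten", "/wiki/OWASP_ZAP"].forEach(function (url) {
  console.log(url, "->", shardFor(standInHash(url)));
});
```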
&lt;h4&gt;&lt;span&gt;Manual chunk assignment and other caveats&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span&gt;The core benefits of the new Hash-based shard keys are:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
&lt;p&gt;&lt;span&gt;Easy setup of randomized shard key&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;span&gt;Automated pre-splitting of empty collections&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;span&gt;Better distribution of chunks on shards for isolated document writes and reads&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;The standard split and moveChunk functions do work with Hash-based shard keys, so it’s still possible to balance your collection’s chunks in any way you like.  However, the usual “find” mechanism used to select chunks can behave a bit unexpectedly since the specifier is a document which is hashed to get the containing chunk.  To keep things simple, just use the new “bounds” parameter when manually manipulating chunks of hashed collections (or all collections, if you prefer):&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&amp;gt; use admin
&amp;gt; db.runCommand({ split : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }] })
&amp;gt; db.runCommand({ moveChunk : "mydb.webcrawler_empty", bounds : [{ "url" : NumberLong("2305843009213693950") }, { "url" : NumberLong("4611686018427387900") }], to : "test-rs3" })
&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;There are a few other caveats as well - in particular with &lt;/span&gt;&lt;a href="http://docs.mongodb.org/manual/core/tag-aware-sharding/"&gt;&lt;span&gt;tag-aware sharding&lt;/span&gt;&lt;/a&gt;&lt;span&gt;.  Tag-aware sharding is a feature we released in MongoDB 2.2, which allows you to attach labels to a subset of shards in a cluster. This is valuable for &amp;#8220;pinning&amp;#8221; collection data to particular shards (which might be hosted on more powerful hardware, for example).  You can also tag ranges of a collection differently, such that a collection sharded by { &amp;#8220;countryCode&amp;#8221;&amp;#160;: 1 } would have chunks only on servers in that country.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Hash-based shard keys are compatible with tag-aware sharding.  As in any sharded collection, you may assign chunks to specific shards, but since the chunk ranges are based on the value of the randomized hash of the shard key instead of the shard key itself, this is usually only useful for tagging the whole range to a specific set of shards:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;&amp;gt; sh.addShardTag("test-rs2", "DC1")&lt;/span&gt;
&amp;gt; sh.addShardTag("test-rs3", "DC1")&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;The above commands assign a hypothetical data center tag “DC1” to shards -rs2 and -rs3, which could indicate that -rs2 and -rs3 are in a particular location.  Then, by running:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;&amp;gt; sh.addTagRange("mydb.webcrawler_empty", { url : MinKey }, { url : MaxKey }, "DC1" )&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;we indicate to the cluster that the mydb.webcrawler_empty collection should only be stored on “DC1” shards.  After letting the balancer work:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&amp;gt; db.printShardingStatus(true)
--- Sharding Status ---
...
      mydb.webcrawler_empty
          shard key: { "url" : "hashed" }
          chunks:
              test-rs2    4
              test-rs3    4
          { "url" : { "$minKey" : 1 } } --&amp;gt;&amp;gt; { "url" : NumberLong("-6917529027641081850") } on : test-rs2 { "t" : 5, "i" : 0 }
          { "url" : NumberLong("-6917529027641081850") } --&amp;gt;&amp;gt; { "url" : NumberLong("-4611686018427387900") } on : test-rs3 { "t" : 6, "i" : 0 }
          { "url" : NumberLong("-4611686018427387900") } --&amp;gt;&amp;gt; { "url" : NumberLong("-2305843009213693950") } on : test-rs2 { "t" : 7, "i" : 0 }
          { "url" : NumberLong("-2305843009213693950") } --&amp;gt;&amp;gt; { "url" : NumberLong(0) } on : test-rs3 { "t" : 8, "i" : 0 }
          { "url" : NumberLong(0) } --&amp;gt;&amp;gt; { "url" : NumberLong("2305843009213693950") } on : test-rs2 { "t" : 4, "i" : 6 }
          { "url" : NumberLong("2305843009213693950") } --&amp;gt;&amp;gt; { "url" : NumberLong("4611686018427387900") } on : test-rs2 { "t" : 4, "i" : 7 }
          { "url" : NumberLong("4611686018427387900") } --&amp;gt;&amp;gt; { "url" : NumberLong("6917529027641081850") } on : test-rs3 { "t" : 4, "i" : 8 }
          { "url" : NumberLong("6917529027641081850") } --&amp;gt;&amp;gt; { "url" : { "$maxKey" : 1 } } on : test-rs3 { "t" : 4, "i" : 9 }
           tag: DC1  { "url" : { "$minKey" : 1 } } --&amp;gt;&amp;gt; { "url" : { "$maxKey" : 1 } }
&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;Again, it doesn’t usually make a lot of sense to tag anything other than the full range of a hashed shard key collection to particular shards - by design, there’s no real way to know or control what data is in what range.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Finally, remember that Hash-based shard keys can (right now) only distribute documents based on the value of a single field.  So, continuing the example above, it isn’t directly possible to use &amp;#8220;url&amp;#8221; + &amp;#8220;timestamp&amp;#8221; as a Hash-based shard key without storing the combination in a single field in your application, for example:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;span&gt;url_and_ts : { url : &amp;lt;url&amp;gt;, timestamp : &amp;lt;timestamp&amp;gt; }&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;The sub-document will be hashed as a unit.&lt;/span&gt;&lt;/p&gt;
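&lt;p&gt;In application code, that combined field can be built just before the insert. The sketch below is plain JavaScript (Node.js); &lt;code&gt;withHashKey&lt;/code&gt; is a hypothetical helper and the sample values are made up. It always writes the fields in the same order, since a sub-document with the same fields in a different order is generally treated as a different value.&lt;/p&gt;

```javascript
// Hypothetical helper: copy the document and add the combined field
// that a hashed index on "url_and_ts" would then cover.
function withHashKey(doc) {
  return Object.assign({}, doc, {
    // Keep a fixed field order: url first, then timestamp.
    url_and_ts: { url: doc.url, timestamp: doc.timestamp }
  });
}

// Made-up sample document.
const crawled = withHashKey({ url: "/wiki/OWASP", timestamp: 1365604440 });
console.log(JSON.stringify(crawled));
```

&lt;p&gt;The resulting document could then be inserted as usual, with the collection indexed and sharded on &lt;code&gt;{ url_and_ts : "hashed" }&lt;/code&gt;.&lt;/p&gt;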
&lt;p&gt;&lt;strong&gt;If you&amp;#8217;re interested in learning more about Hash-based sharding, register for the &lt;a href="http://www.10gen.com/events/webinar/mongodb-2-4-demo-sharding" target="_blank"&gt;Hash-based sharding feature demo&lt;/a&gt; on May 2. &lt;/strong&gt;&lt;br/&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/47633823714</link><guid>http://blog.mongodb.org/post/47633823714</guid><pubDate>Wed, 10 Apr 2013 14:34:00 -0400</pubDate></item><item><title>Deployment Best Practices: Monitor your resources </title><description>&lt;p&gt;&lt;span&gt;When you’re preparing a MongoDB deployment, you should try to understand how your application is going to hold up in production. It’s a good idea to develop a consistent, repeatable approach to managing your deployment environment so that you can minimize any surprises once you’re in production.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The best approach incorporates prototyping your setup, conducting load testing, monitoring key metrics, and using that information to scale your setup. The key part of the approach is to proactively monitor your entire system - this will help you understand how your production system will hold up before deploying, and determine where you’ll need to add capacity. Having insight into potential spikes in your memory usage, for example, could help put out a write-lock fire before it starts.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;To monitor your deployment you can use several different tools. 10gen provides a free, hosted monitoring service, the &lt;a href="http://www.10gen.com/products/mongodb-monitoring-service" target="_blank"&gt;MongoDB Monitoring Service (MMS)&lt;/a&gt;, that provides a dashboard and gives you a view of the metrics from your entire cluster. Alternatively, you can build your own tools with Nagios, Munin or SNMP. Several tools are also provided along with MongoDB that allow you to gain insight into the performance of your deployment.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;strong&gt;mongostat: &lt;/strong&gt;&lt;span&gt;this utility will check the status of all running mongod and mongos instances and will capture and return counters of database operations. These include inserts, queries, updates, deletes, and cursors. mongostat will also show when you’re hitting page faults, and report your lock percentage. A high value for either typically means you’re running low on memory, are hitting write capacity or have a similar performance issue.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mongotop: &lt;/strong&gt;&lt;span&gt;this will track and report the read and write activity of your MongoDB instance on a per-collection basis. mongotop returns information each second by default, but you can force mongotop to return information less frequently by specifying an interval in seconds: &lt;/span&gt;&lt;span&gt;&lt;code&gt;mongotop 20&lt;/code&gt; &lt;/span&gt;&lt;span&gt;will return values every 20 seconds. You should check that this read and write activity matches what your application intends, and that you’re not firing too many writes at the database at a time, reading too frequently from disk, or exceeding your working set size.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;iostat: &lt;/strong&gt;&lt;span&gt;On Linux, use iostat to monitor your storage system performance. This will help identify any bottlenecks in your disk I/O and subsequently in your database. Metrics like %util will tell you the percentage of time your drive is being used, and avgrq-sz will indicate the average request size. There are several others that may also be important to monitor for your deployment.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;If you’re using MMS or another monitoring service, you should also closely monitor the following:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Op Counters:&lt;/strong&gt;&lt;span&gt; These include inserts, updates, deletes, reads, and cursor usage.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;strong&gt;Resident Memory:&lt;/strong&gt; &lt;/span&gt;&lt;span&gt;You should always keep an eye on your memory allocation. Resident memory should always be lower than physical memory. If you run out of memory, you’ll experience page faults and index misses, and queries will return much more slowly.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Working set size: &lt;/strong&gt;Keep a close eye on your working set, which is the total body of data used by your application. For optimal performance, your active working set should fit into RAM. In MongoDB 2.4, there is a new &lt;a href="http://docs.mongodb.org/manual/reference/server-status/#server-status-workingset" target="_blank"&gt;working set analyzer&lt;/a&gt; which will help reveal when documents are being “paged out,” or removed from physical memory by the operating system. You can decrease your working set size by optimizing your queries and indexing patterns to prevent large scans, or plan to add larger RAM when you expect your working set to increase.&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Queues: &lt;/strong&gt;&lt;span&gt;MongoDB’s concurrency model uses a readers-writer lock to provide simultaneous reads but exclusive access to a single write operation. Given that approach, queues can often form behind a single writer, with those queues containing readers, writers or both. During lengthy write operations MongoDB will periodically yield to allow readers to get through in order to avoid starvation. Monitoring this metric along with “Lock Percentage” will give you an idea of the concurrency your deployment is seeing. If the “Lock Percentage” and the queues are trending upwards (e.g. spiking) then you may be dealing with contention within the database. Data model changes or “batch” operations can have a significant positive impact on concurrency.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;Your testing period is critical to preparing your application for success. By monitoring these metrics closely during pre-launch, you’ll be better prepared for when your application hits heavy usage in the future. If you’re already in production, monitoring your current application usage with a tool like MMS will give you insight into production patterns. Reviewing your indexing patterns and CRUD behavior will help you better understand your application’s flow when there is a hiccup.&lt;/span&gt;&lt;/p&gt;
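&lt;p&gt;If you collect the op counters yourself, keep in mind that they are cumulative totals, so a rate comes from the difference between two samples. The sketch below is plain JavaScript (Node.js); &lt;code&gt;opRates&lt;/code&gt; is a hypothetical helper, the numbers are made up, and the objects are assumed to have the shape of the &lt;code&gt;opcounters&lt;/code&gt; section of &lt;code&gt;db.serverStatus()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical helper: per-second rates from two cumulative counter samples.
function opRates(before, after, seconds) {
  var rates = {};
  Object.keys(after).forEach(function (op) {
    rates[op] = (after[op] - (before[op] || 0)) / seconds;
  });
  return rates;
}

// Made-up samples taken 60 seconds apart.
var sampleStart = { insert: 1000, query: 5000, update: 300, delete: 20, getmore: 40, command: 900 };
var sampleEnd = { insert: 1600, query: 5300, update: 360, delete: 20, getmore: 100, command: 1200 };

var rates = opRates(sampleStart, sampleEnd, 60);
console.log(rates);
```

&lt;p&gt;Plotting these rates next to lock percentage and queue lengths during a load test makes it easier to tell which kind of operation is behind a spike.&lt;/p&gt;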
&lt;ul&gt;&lt;li&gt;For more insights into Deployment Best Practices register for Sandeep Parikh’s &lt;a href="http://www.10gen.com/webinar/deployment-best-practices" target="_blank"&gt;webinar on April 9&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Read the &lt;a href="http://info.10gen.com/rs/10gen/images/10gen-MongoDB_Operations_Best_Practices.pdf" target="_blank"&gt;Operations Best Practices White Paper&lt;/a&gt; for more insights into orchestrating a successful MongoDB deployment. &lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.mongodb.org/post/46504096409</link><guid>http://blog.mongodb.org/post/46504096409</guid><pubDate>Thu, 28 Mar 2013 09:37:00 -0400</pubDate></item><item><title>MongoDB 2.4 Javascript Changes</title><description>&lt;p&gt;The upcoming release of MongoDB 2.4 brings an exciting change to the JavaScript engine. Previously, MongoDB ran &lt;a href="https://developer.mozilla.org/en-US/docs/SpiderMonkey"&gt;Spidermonkey&lt;/a&gt; 1.7, but going forward, MongoDB will be running &lt;a href="https://developers.google.com/v8/"&gt;V8&lt;/a&gt;, the open-source high-performance JavaScript engine from Google. This means that from now on, whenever JavaScript is executed, V8 will be running the show.&lt;/p&gt;

&lt;p&gt;In this post we’ll examine the following primary impacts of this change:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;concurrency improvements&lt;/li&gt;
&lt;li&gt;modernized JavaScript implementation&lt;/li&gt;
&lt;li&gt;impacted features&lt;/li&gt;
&lt;/ol&gt;&lt;h3&gt;Concurrency improvements&lt;/h3&gt;

&lt;p&gt;Previously, every query/command that used the JS interpreter had to acquire a mutex, thus serializing all JS work. Now, with V8 we have improved concurrency by allowing each JavaScript job to run on a separate core.&lt;/p&gt;

&lt;p&gt;For example, if a user’s workload commonly involved 24 concurrent &lt;code&gt;$where&lt;/code&gt; queries (each from a unique client), and they have a server with 24 cores, they should expect query execution times to be reduced by (roughly) a factor of 24.&lt;!-- more --&gt;&lt;/p&gt;

&lt;h3&gt;Modernized JavaScript Implementation (ES5)&lt;/h3&gt;

&lt;p&gt;Today’s modern JavaScript environments like Google Chrome, Node.js, Mozilla Firefox, Microsoft Internet Explorer and Apple Safari all implement the 5th edition of &lt;a href="http://www.ecma-international.org/publications/standards/Ecma-262.htm"&gt;ECMAScript&lt;/a&gt;. This edition, abbreviated as &lt;a href="http://www.ecma-international.org/publications/standards/Ecma-262.htm"&gt;ES5&lt;/a&gt;, adds many new language features like standardized &lt;a href="http://www.ecma-international.org/ecma-262/5.1/#sec-15.12.1"&gt;JSON&lt;/a&gt;, &lt;a href="http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2"&gt;strict mode&lt;/a&gt;, &lt;a href="http://www.ecma-international.org/ecma-262/5.1/#sec-15.3.4.5"&gt;Function.prototype.bind()&lt;/a&gt;, &lt;a href="http://www.ecma-international.org/ecma-262/5.1/#sec-15.4.4.16"&gt;array extensions&lt;/a&gt;, getters/setters and much more. With the switch to V8, MongoDB now also supports ES5, making it easier to develop and maintain migration scripts, MapReduce jobs, and more. Read the &lt;a href="http://docs.mongodb.org/manual/release-notes/2.4-javascript/"&gt;full release notes&lt;/a&gt; for more details.&lt;/p&gt;
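&lt;p&gt;As a quick tour, the sketch below touches several of these ES5 features in one place. It is written to run under Node.js, so &lt;code&gt;console.log&lt;/code&gt; stands in for the shell’s &lt;code&gt;print()&lt;/code&gt;; the sample data is made up.&lt;/p&gt;

```javascript
"use strict"; // ES5 strict mode

var pages = [
  { url: "/wiki/OWASP", hits: 12 },
  { url: "/wiki/MongoDB", hits: 30 },
  { url: "/wiki/V8", hits: 7 }
];

// Array extensions: filter, map and reduce.
var popular = pages.filter(function (p) { return p.hits > 10; });
var urls = popular.map(function (p) { return p.url; });
var total = pages.reduce(function (sum, p) { return sum + p.hits; }, 0);

// Function.prototype.bind fixes `this` for a detached method.
var counter = {
  count: 0,
  add: function (n) { this.count += n; return this.count; }
};
var addToCounter = counter.add.bind(counter);
pages.forEach(function (p) { addToCounter(p.hits); });

// Getter syntax in an object literal.
var stats = { get average() { return total / pages.length; } };

// Standardized JSON.
var roundTrip = JSON.parse(JSON.stringify(urls));

console.log(roundTrip, total, counter.count, stats.average);
```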

&lt;h3&gt;Impacted features&lt;/h3&gt;

&lt;p&gt;Moving to V8 brings with it a few changes to be aware of when migrating applications to MongoDB version &amp;gt;= 2.4.&lt;/p&gt;

&lt;h4&gt;Additional Limitations for Map-Reduce and $where Operations&lt;/h4&gt;

&lt;p&gt;In MongoDB 2.4 a number of global functions and properties available in the shell, such as &lt;strong&gt;db&lt;/strong&gt;, are no longer available to &lt;code&gt;Map/Reduce/Finalize&lt;/code&gt;, &lt;code&gt;$where&lt;/code&gt; and &lt;code&gt;group&lt;/code&gt;. When upgrading to MongoDB 2.4, you will need to refactor your code if you are using any global shell functions or properties that are no longer available.&lt;/p&gt;
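&lt;p&gt;One way to approach that refactoring is to write map and reduce functions that depend only on &lt;code&gt;this&lt;/code&gt; (the current document), &lt;code&gt;emit()&lt;/code&gt; and their arguments, never on &lt;code&gt;db&lt;/code&gt;. The sketch below runs under plain Node.js: the &lt;code&gt;emit&lt;/code&gt; function and the small driver loop are stand-ins written for this example, not MongoDB’s implementation, and the documents are made up. Values that used to be looked up through &lt;code&gt;db&lt;/code&gt; can usually be fetched beforehand and passed in through the &lt;code&gt;scope&lt;/code&gt; option of &lt;code&gt;mapReduce&lt;/code&gt;.&lt;/p&gt;

```javascript
// Stand-in for the emit() that MongoDB provides to map functions.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// map: uses only `this` and emit(); no access to `db`.
function map() {
  var section = this.url.split("/")[1] || "(root)";
  emit(section, 1);
}

// reduce: sums the counts emitted for one key.
function reduce(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Tiny stand-in driver so the functions can be exercised outside MongoDB.
var docs = [{ url: "/wiki/OWASP" }, { url: "/wiki/V8" }, { url: "/blog/2.4" }];
docs.forEach(function (doc) { map.call(doc); });

var grouped = {};
emitted.forEach(function (e) {
  grouped[e.key] = grouped[e.key] || [];
  grouped[e.key].push(e.value);
});

var result = {};
Object.keys(grouped).forEach(function (key) {
  result[key] = reduce(key, grouped[key]);
});
console.log(result);
```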

&lt;p&gt;The following &lt;strong&gt;are available&lt;/strong&gt; to &lt;code&gt;MapReduce&lt;/code&gt;, &lt;code&gt;group&lt;/code&gt;, and &lt;code&gt;$where&lt;/code&gt; in MongoDB 2.4:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;properties&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;args&lt;/li&gt;
&lt;li&gt;MaxKey&lt;/li&gt;
&lt;li&gt;MinKey&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;functions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;assert()&lt;/li&gt;
&lt;li&gt;BinData()&lt;/li&gt;
&lt;li&gt;DBPointer()&lt;/li&gt;
&lt;li&gt;DBRef()&lt;/li&gt;
&lt;li&gt;doassert()&lt;/li&gt;
&lt;li&gt;emit()&lt;/li&gt;
&lt;li&gt;gc()&lt;/li&gt;
&lt;li&gt;HexData()&lt;/li&gt;
&lt;li&gt;hex_md5()&lt;/li&gt;
&lt;li&gt;isNumber()&lt;/li&gt;
&lt;li&gt;isObject()&lt;/li&gt;
&lt;li&gt;ISODate()&lt;/li&gt;
&lt;li&gt;isString()&lt;/li&gt;
&lt;li&gt;Map()&lt;/li&gt;
&lt;li&gt;MD5()&lt;/li&gt;
&lt;li&gt;NumberInt()&lt;/li&gt;
&lt;li&gt;NumberLong()&lt;/li&gt;
&lt;li&gt;ObjectId()&lt;/li&gt;
&lt;li&gt;print()&lt;/li&gt;
&lt;li&gt;sleep()&lt;/li&gt;
&lt;li&gt;Timestamp()&lt;/li&gt;
&lt;li&gt;UUID()&lt;/li&gt;
&lt;li&gt;version()&lt;/li&gt;
&lt;/ul&gt;&lt;h4&gt;Non-standard Spidermonkey features removed&lt;/h4&gt;

&lt;p&gt;Spidermonkey implemented several non-standard JavaScript language extensions not found in V8. These removed features were not documented so minimal impact is expected. To find out more about the differences between the V8 and Spidermonkey MongoDB implementations, please read &lt;a href="http://docs.mongodb.org/manual/release-notes/2.4-javascript/#removed-non-standard-spidermonkey-features"&gt;this&lt;/a&gt;.&lt;/p&gt;

&lt;h5&gt;With V8, MongoDB supports the ES5 implementation of JavaScript with the following exceptions:&lt;/h5&gt;

&lt;p&gt;The following do not work on documents &lt;strong&gt;returned from MongoDB queries&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="https://jira.mongodb.org/browse/SERVER-8216"&gt;Object.seal()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jira.mongodb.org/browse/SERVER-8223"&gt;Object.freeze()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jira.mongodb.org/browse/SERVER-8215"&gt;Object.preventExtensions()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jira.mongodb.org/browse/SERVER-8214"&gt;enumerable properties&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Additional detail is available in the &lt;a href="http://docs.mongodb.org/manual/release-notes/2.4-javascript/"&gt;release notes&lt;/a&gt;. Learn more about the 2.4 release in the upcoming &lt;a href="http://www.10gen.com/events/webinar/mongodb-24-webinar-series"&gt;webinar series&lt;/a&gt;.&lt;/p&gt;</description><link>http://blog.mongodb.org/post/46419580374</link><guid>http://blog.mongodb.org/post/46419580374</guid><pubDate>Wed, 27 Mar 2013 09:51:00 -0400</pubDate><category>mongodb</category><category>node.js</category><category>javascript</category><category>v8</category><category>spidermonkey</category><category>performance</category><category>high performance</category></item><item><title>MongoDB 2.4 Released</title><description>&lt;p&gt;&lt;span&gt;The MongoDB Engineering Team is pleased to announce the release of MongoDB 2.4. This is the latest stable release, following the September 2012 release of MongoDB 2.2. This release contains key new features along with performance improvements and bug fixes. We have outlined some of the key features below. For additional details about the release:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;span&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/" target="_blank"&gt;2.4 release notes&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.mongodb.org/downloads" target="_blank"&gt;&lt;span&gt;MongoDB Downloads&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.10gen.com/events/webinar/mongodb-24-webinar-series" target="_blank"&gt;&lt;span&gt;MongoDB 2.4 webinars&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;Highlights of MongoDB 2.4 include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;span&gt;Hash-based Sharding&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;/span&gt;&lt;span&gt;Capped Arrays&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Text Search (Beta)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Geospatial Enhancements&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Faster Counts&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;/span&gt;&lt;span&gt;Working Set Analyzer&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;V8 JavaScript engine&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Security&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#release-hashed-indexes"&gt;Hash-based Sharding&lt;/a&gt;:&lt;/strong&gt;&lt;span&gt; MongoDB 2.4 adds Hash-based Sharding, built on top of our range based sharding. Using a hashed shard key allows users to get a good distribution of load and data in a simple manner, in cases where documents are accessed randomly through the key space, or if the access patterns may not be totally predictable.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#limit-number-of-elements-in-an-array"&gt;Capped Arrays&lt;/a&gt;:&lt;/strong&gt;&lt;span&gt; Capped arrays declare a fixed size array inside of a document. On a $push operation, users can now specify a $slice modifier, which trims the array to the last N items. You can also specify a sort, which will first sort the array, and then apply the trim.&lt;/span&gt;&lt;/p&gt;
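&lt;p&gt;The order of operations described above (append, then sort, then trim) can be modelled in a few lines. The sketch below is plain JavaScript (Node.js) that simulates the effect on an in-memory array; &lt;code&gt;pushCapped&lt;/code&gt; is a hypothetical helper and the scores are made up. The &lt;code&gt;update&lt;/code&gt; object shows the shape of the update document that would be sent to the server.&lt;/p&gt;

```javascript
// Shape of the update document: push two scores, sort ascending by score,
// then keep only the last three elements.
var update = {
  $push: {
    scores: {
      $each: [{ score: 80 }, { score: 60 }],
      $sort: { score: 1 },
      $slice: -3
    }
  }
};

// Hypothetical simulation of what the modifiers do to the stored array.
function pushCapped(array, modifiers) {
  var result = array.concat(modifiers.$each);        // append the new items
  if (modifiers.$sort) {
    var field = Object.keys(modifiers.$sort)[0];
    var direction = modifiers.$sort[field];          // 1 ascending, -1 descending
    result.sort(function (a, b) {
      if (a[field] === b[field]) return 0;
      return (a[field] > b[field] ? 1 : -1) * direction;
    });
  }
  if (modifiers.$slice === 0) return [];             // a slice of 0 empties the array
  return result.slice(modifiers.$slice);             // negative N keeps the last N
}

var stored = [{ score: 70 }, { score: 90 }];
var capped = pushCapped(stored, update.$push.scores);
console.log(JSON.stringify(capped));
```

&lt;p&gt;Because the sort runs before the trim, the array ends up holding the three highest scores rather than the three most recently pushed ones.&lt;/p&gt;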

&lt;p&gt;&lt;strong&gt;&lt;a href="http://test.docs.10gen.cc/manual/release-notes/2.4/#text-search"&gt;Text Search&lt;/a&gt;: &lt;/strong&gt;&lt;span&gt;Text Search has been one of the all time most requested features in MongoDB. Text indexing will offer native, real-time text search with stemming and tokenization in 15 languages. For more details on Text Search and its implementation see the docs and &lt;/span&gt;&lt;a href="http://blog.mongodb.org/post/40513621310/mongodb-text-search-experimental-feature-in-mongodb"&gt;&lt;span&gt;blog post&lt;/span&gt;&lt;/a&gt;&lt;span&gt;.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#release-geospatial"&gt;Geo Capabilities&lt;/a&gt;&lt;/strong&gt;&lt;span&gt;&lt;strong&gt;:&lt;/strong&gt; MongoDB 2.4 introduces GeoJSON support, a more accurate spherical model and enhanced search including polygon intersection.  Currently 2dsphere supports the Point, LineString and Polygon GeoJSON shapes.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Faster Counts&lt;/strong&gt;&lt;span&gt;&lt;strong&gt;:&lt;/strong&gt; In many cases, counts in MongoDB 2.4 are an order of magnitude faster than previous versions. We made numerous optimizations to the query execution engine in order to improve common access patterns. One example is in a single b-tree bucket: if the first and last entry in the bucket match a count range, we know the middle keys do as well, thus we do not have to check them individually.&lt;/span&gt;&lt;/p&gt;
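&lt;p&gt;The bucket shortcut is easy to picture with a simplified model. In the sketch below (plain JavaScript for Node.js), each sorted array stands in for a b-tree bucket; &lt;code&gt;countInRange&lt;/code&gt; is a hypothetical function written for this illustration and is not MongoDB’s implementation. When both the first and last key of a bucket fall inside the range, the whole bucket is counted without looking at the keys in between.&lt;/p&gt;

```javascript
// Simplified model: each inner array is one bucket of sorted keys.
function countInRange(buckets, low, high) {
  function inRange(key) {
    return key >= low ? high >= key : false;
  }

  var count = 0;
  var wholeBuckets = 0; // buckets counted without checking every key

  buckets.forEach(function (bucket) {
    if (bucket.length === 0) return;
    var first = bucket[0];
    var last = bucket[bucket.length - 1];
    var bothEndsMatch = inRange(first) ? inRange(last) : false;
    if (bothEndsMatch) {
      // Keys are sorted, so everything between first and last matches too.
      count += bucket.length;
      wholeBuckets += 1;
    } else {
      count += bucket.filter(inRange).length;
    }
  });

  return { count: count, wholeBuckets: wholeBuckets };
}

var buckets = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]];
var counted = countInRange(buckets, 3, 10);
console.log(counted);
```

&lt;p&gt;Only the buckets at the edges of the range need a key-by-key check, which is where the savings come from when buckets are large.&lt;/p&gt;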

&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#changes-to-serverstatus-output-including-additional-metrics"&gt;Working Set Analyzer:&lt;/a&gt;&lt;/strong&gt;&lt;span&gt; Capacity planning is critical to running a MongoDB cluster. In MongoDB 2.4 we added a working set size analyzer, making it easy to measure the percentage of resources used. It will tell you how many unique pages the server has needed in the last 15 minutes, so that you can track usage over time. When the amount of data needed in 15 minutes is approaching the size of RAM, it’s probably time to add more capacity to your cluster.&lt;/span&gt;&lt;/p&gt;
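&lt;p&gt;Turning a page count into a share of RAM is simple arithmetic. The sketch below is plain JavaScript (Node.js); &lt;code&gt;workingSetEstimate&lt;/code&gt; is a hypothetical helper, the sample numbers are made up, and it assumes that the &lt;code&gt;workingSet&lt;/code&gt; section of &lt;code&gt;serverStatus&lt;/code&gt; reports &lt;code&gt;pagesInMemory&lt;/code&gt; and &lt;code&gt;overSeconds&lt;/code&gt; and that pages are 4096 bytes. Check both assumptions against your own server.&lt;/p&gt;

```javascript
// Hypothetical helper: estimate working set bytes and its share of RAM.
function workingSetEstimate(workingSet, pageSizeBytes, ramBytes) {
  var bytes = workingSet.pagesInMemory * pageSizeBytes;
  return {
    bytes: bytes,
    overSeconds: workingSet.overSeconds,
    fractionOfRam: bytes / ramBytes
  };
}

// Made-up sample in the assumed shape of the workingSet section.
var sample = { pagesInMemory: 1500000, overSeconds: 900 };

var PAGE_SIZE = 4096;                    // assumed page size in bytes
var RAM = 8 * 1024 * 1024 * 1024;        // 8 GiB of physical memory

var estimate = workingSetEstimate(sample, PAGE_SIZE, RAM);
console.log(estimate.bytes, estimate.fractionOfRam.toFixed(2));
```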
&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#javascript-engine-changed-to-v8"&gt;New V8 Engine:&lt;/a&gt;&lt;/strong&gt;&lt;span&gt; MongoDB 2.4 changed the JavaScript engine used for MapReduce, $where and the shell. We have switched to V8, the JavaScript engine from Google Chrome, which improves concurrency.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#security-improvements"&gt;Security:&lt;/a&gt;&lt;/strong&gt;&lt;span&gt; MongoDB 2.4 adds two major security enhancements: Kerberos Authentication and Role Based Access Control. Kerberos is part of MongoDB Enterprise and allows integration with enterprise level user management systems. Role Based Access Control allows more fine grained privilege management.  &lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;There were hundreds (692) of improvements, so we encourage those interested to look at the &lt;/span&gt;&lt;a href="https://jira.mongodb.org/secure/IssueNavigator.jspa?reset=true&amp;amp;jqlQuery=project+%3D+SERVER+AND+fixVersion+in+%28%222.3.2%22,+%222.3.1%22,+%222.3.0%22,+%222.4.0-rc0%22,+%222.4.0-rc1%22,+%222.4.0-rc2%22,+%222.4.0-rc3%22%29+ORDER+BY+votes+DESC,+status+DESC,+priority+DESC"&gt;&lt;span&gt;changelog&lt;/span&gt;&lt;/a&gt;&lt;span&gt;. A great deal of important work was done that cannot all be put into one post, so please ask questions about other tickets.  Even with everything in MongoDB 2.4, there is still a lot to do, and the 10gen engineering team is already hard at work on MongoDB 2.6.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;&lt;a href="http://www.mongodb.org/downloads" target="_blank"&gt;Downloads&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;span&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4" target="_blank"&gt;Release Notes&lt;/a&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;span&gt;&lt;a href="https://jira.mongodb.org/secure/IssueNavigator.jspa?reset=true&amp;amp;jqlQuery=project+%3D+SERVER+AND+fixVersion+in+%28%222.3.2%22,+%222.3.1%22,+%222.3.0%22,+%222.4.0-rc0%22,+%222.4.0-rc1%22,+%222.4.0-rc2%22,+%222.4.0-rc3%22%29+ORDER+BY+votes+DESC,+status+DESC,+priority+DESC" target="_blank"&gt;Full changelog&lt;/a&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;span&gt;For the full scoop on new features in MongoDB version 2.4, register for our &lt;a href="http://www.10gen.com/events/webinar/mongodb-24-webinar-series" target="_blank"&gt;webinar series&lt;/a&gt; on new features Text Search, Geo, Kerberos&lt;/span&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span&gt;We encourage you to provide feedback and testing. Thank you for your ongoing support of MongoDB.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;-Eliot and the MongoDB Engineering Team&lt;/p&gt;</description><link>http://blog.mongodb.org/post/45754637343</link><guid>http://blog.mongodb.org/post/45754637343</guid><pubDate>Tue, 19 Mar 2013 10:00:00 -0400</pubDate></item><item><title>MongoDB Tip: The touch Command</title><description>&lt;p&gt;MongoDB 2.2 introduced the touch command, which loads data from the data storage layer into memory. The touch command will load a collection’s documents, indexes or both into memory. This can be ideal to preheat a newly started server, in order to avoid page faults and slow performance once the server is brought into production. You can also use this when adding a new secondary to an existing replica set to ensure speedy subsequent reads. &lt;br/&gt;&lt;br/&gt;Note that while the touch command is running, a replica set member will enter into a RECOVERING state to prevent reads from clients. When the operation completes, the secondary will return to the SECONDARY(2) state. &lt;br/&gt;&lt;br/&gt;You invoke the touch command through the following syntax:&lt;/p&gt;
&lt;pre&gt;db.runCommand({ touch: "collection_name", data: true, index: true })&lt;/pre&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;br/&gt;Here you indicate which collection to touch and whether you want documents (data), indexes, or both to be loaded into memory. Both index and data are off by default, so you will need to set at least one to “true” to have any effect on your server. Otherwise you will see the following message: &lt;br/&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;db.test.runCommand("touch")&lt;br/&gt;&lt;br/&gt;        "ok" : 0,&lt;br/&gt;        "errmsg" : "must specify at least one of (data:true, index:true)"&lt;/pre&gt;

&lt;p&gt;touch is non-blocking on a mongod process, so you can run it concurrently with other commands.&lt;/p&gt;
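&lt;p&gt;As a rough sketch of how the two options combine, here is a small Python helper that builds the command document and rejects the case that produces the error message shown above. build_touch_command is a hypothetical name chosen for illustration and is not part of any driver; "records" is just an example collection name.&lt;/p&gt;
&lt;pre&gt;def build_touch_command(collection, data=False, index=False):
    """Build the document for the touch command (hypothetical helper)."""
    if not (data or index):
        # mirrors the server-side check described above
        raise ValueError("must specify at least one of (data:true, index:true)")
    # the command name goes first, followed by its options
    return {"touch": collection, "data": data, "index": index}

# Preheat only the indexes of a collection named "records" (an example name)
print(build_touch_command("records", index=True))
&lt;/pre&gt;
&lt;p&gt;The resulting document has the same shape as the argument to db.runCommand in the shell example.&lt;/p&gt;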
&lt;p&gt;&lt;br/&gt;For a full list of MongoDB commands, check out the Database commands in the &lt;a href="http://docs.mongodb.org/manual/reference/command/"&gt;MongoDB Manual&lt;/a&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/44706549534</link><guid>http://blog.mongodb.org/post/44706549534</guid><pubDate>Wed, 06 Mar 2013 09:39:00 -0500</pubDate></item><item><title>MongoDB and Hadoop: A Step-by Step Tutorial Using the Mortar Development Framework</title><description>&lt;p&gt;&lt;em&gt;The following is a guest post from Jeremy Karn. This article is excerpted from &amp;#8216;&lt;a href="http://blog.mortardata.com/post/43080668046/mongodb-hadoop-why-how"&gt;MongoDB + Hadoop: A Step-by-Step Tutorial&lt;/a&gt;&amp;#8217;. Jeremy is a cofounder at &lt;a href="http://mortardata.com/"&gt;Mortar Data&lt;/a&gt;, a Hadoop-as-a-service provider, and creator of mortar, an open source framework for data processing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;People who are worried about scalability often find themselves looking at two tools: MongoDB for storing large amounts of data easily and Hadoop for processing that data. But a common question is: “How do I combine these two to really get the most out of my data?”&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;Here&amp;#8217;s a step-by-step tutorial that will get you up and running with MongoDB and Hadoop in a matter of minutes. And the best part about this tutorial is that at the end you&amp;#8217;ll be ready to jump right into using your own MongoDB data with Hadoop.&lt;/p&gt;
&lt;p&gt;For this tutorial you’ll be using &lt;a href="http://pig.apache.org/"&gt;Apache Pig&lt;/a&gt;, a high-level data flow language that compiles down into Hadoop MapReduce jobs. It was designed to be easy to learn and simple to write. If you’ve written SQL, Pig will feel familiar: it reads like procedural SQL.&lt;/p&gt;
&lt;p&gt;To run your Hadoop jobs, you&amp;#8217;re going to use a free &lt;a href="http://www.mortardata.com/"&gt;Mortar&lt;/a&gt; account. &lt;a href="http://www.mortardata.com/"&gt;Mortar&lt;/a&gt; provides Hadoop as a service, which means you can run your jobs without worrying about how to set up and manage a multi-node Hadoop cluster.&lt;/p&gt;
&lt;p&gt;To get started, we&amp;#8217;ve already set up a small MongoDB instance on MongoLab, populated it with a random sampling of Twitter data from a single day (around 120,000 tweets), and created a read-only user for you.&lt;/p&gt;
&lt;p&gt;We&amp;#8217;ve also set up a public GitHub repo with a Mortar project that has three Pig scripts ready to run. Here&amp;#8217;s what you need to do:&lt;/p&gt;
&lt;p&gt;&lt;span&gt;If you don’t already have a free GitHub account, &lt;/span&gt;&lt;span&gt;&lt;a href="https://github.com/"&gt;create&lt;/a&gt;&lt;/span&gt;&lt;span&gt; one. You’ll need a GitHub username in step 4.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;span&gt;Sign into (or &lt;/span&gt;&lt;span&gt;&lt;a href="https://app.mortardata.com/signup"&gt;create&lt;/a&gt;&lt;/span&gt;&lt;span&gt;) your free Mortar account.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;After you receive the confirmation email, log in to Mortar at &lt;/span&gt;&lt;span&gt;&lt;a href="https://app.mortardata.com"&gt;https://app.mortardata.com&lt;/a&gt;&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://help.mortardata.com/#!/install_mortar_development_framework"&gt;Install&lt;/a&gt; the Mortar Development Framework: 
&lt;pre&gt;gem install mortar&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Clone the example git project and register it as a mortar project: &lt;br/&gt;&lt;/span&gt;
&lt;pre&gt;git clone git@github.com:mortardata/mongo-pig-examples.git&lt;/pre&gt;
&lt;pre&gt;cd mongo-pig-examples&lt;/pre&gt;
&lt;pre&gt;mortar register mongo-pig-examples&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;h4&gt;Script 1 - Characterize Collection&lt;/h4&gt;
&lt;p&gt;If you’re like most MongoDB users, you may not have a great sense of the different fields, data types, or values in your collection. We built characterize_collection.pig to deeply inspect your collection to extract that information.&lt;/p&gt;
&lt;p&gt;From the base directory of the mongo-pig-examples project you just cloned take a look at pigscripts/characterize_collection.pig. It loads all the data in the collection as a map, sends the map to Python (udfs/python/mongo_util.py) to gather a bunch of metadata, calculates some basic information about the collection, and then it writes the results out to an S3 bucket.&lt;/p&gt;
&lt;p&gt;To see this script in action let&amp;#8217;s run it on a 4 node Hadoop cluster. In your terminal (from the base directory of your mongo-pig-examples project) run:&lt;/p&gt;
&lt;pre&gt;mortar run characterize_collection --clustersize 4&lt;/pre&gt;
&lt;p&gt;This job will take about 10 minutes to finish. You can monitor the job&amp;#8217;s status on the command line or by going to &lt;a href="https://app.mortardata.com/jobs"&gt;https://app.mortardata.com/jobs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once the job has finished, you&amp;#8217;ll receive an email with a link to your job results. Clicking on this link will bring you into the Mortar web app, where you can download the results from S3. The output is described at the top of the characterize_collection script; as an example, you can scroll down the output and find:&lt;/p&gt;
&lt;pre&gt;…
user.is_translator	2	false	unicode	118806
user.is_translator	2	true	unicode	31
user.lang	26	en	unicode	114108
user.lang	26	es	unicode	3462
user.lang	26	fr	unicode	532
user.lang	26	pt	unicode	281
user.lang	26	ja	unicode	79
user.listed_count	398	0	int	73757
user.listed_count	398	1	int	18518
&lt;/pre&gt;
&lt;p&gt;Looking at the values for user.lang, we see that there are 26 unique values for the field in our dataset. The most common was “en” with 114108 occurrences, the next most common was “es” with 3462 occurrences, and so on. To see the full results without running the job you can view the output file here.&lt;/p&gt;
&lt;h4&gt;Script 2 - MongoDB Schema Generator&lt;/h4&gt;
&lt;p&gt;It can be tricky to properly declare MongoDB’s highly nested schemas in Pig. Now, Pig is graceful: it can roll without a schema, or with inconsistent or incorrect schemas. But it’s easier to read and write your Pig code if you have a schema because it allows you (and the Pig optimizer) to focus on just the relevant data.&lt;/p&gt;
&lt;p&gt;So this next script automatically generates a Pig schema by examining your MongoDB collection. If you don’t need the whole schema, you can easily edit it to keep just the fields you want.&lt;/p&gt;
&lt;p&gt;Running this script is similar to running the previous one. If you ran the Characterize Collection script in the past hour, the same cluster you used for that job should still be running. In that case, you can just run:&lt;/p&gt;
&lt;pre&gt;mortar run mongo_schema_generator&lt;/pre&gt;
&lt;p&gt;If you don&amp;#8217;t have a cluster that’s still running, just run the job on a new 4 node cluster like this:&lt;/p&gt;
&lt;pre&gt;mortar run mongo_schema_generator --clustersize 4&lt;/pre&gt;
&lt;h4&gt;Script 3 – Twitter Hourly Coffee Tweets&lt;/h4&gt;
&lt;p&gt;Using a Twitter coffee tweets script (pigscripts/hourly_coffee_tweets.pig), we&amp;#8217;re going to demonstrate how we can use a small subset of the fields in our MongoDB collection. For our example, we’ll look at how often the word “coffee” is tweeted throughout the day. As with the Mongo Schema Generator script, you can run this job on an existing cluster or start up a new one.&lt;/p&gt;
&lt;h4&gt;Next Steps&lt;/h4&gt;
&lt;p&gt;If you already have a mongo instance/cluster based in US-East EC2, the first two example scripts should run on one of your collections with only minor modifications. You&amp;#8217;ll just need to:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Update the MongoLoader connection strings in the pig scripts to connect to your MongoDB collections with one of your own users. If your mongo instance is on a non-standard port (any port other than 27017), just email us at &lt;a href="mailto:support@mortardata.com"&gt;support@mortardata.com&lt;/a&gt; to allow your Mortar account to access that port.&lt;/li&gt;
&lt;li&gt;If you&amp;#8217;d like your jobs to write to one of your own S3 buckets, you can update the AWS keys associated with your Mortar account by following these &lt;a href="http://help.mortardata.com/#!/create_a_new_web_project"&gt;instructions to enable s3 access&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you run out of free cluster hours with Mortar, you can &lt;a href="http://help.mortardata.com/#!/create_a_new_web_project"&gt;upgrade your account&lt;/a&gt; to get additional free hours each month.&lt;/li&gt;
&lt;li&gt;You can find more resources for learning Pig &lt;a href="http://help.mortardata.com/#!/pig_help_and_resources"&gt;here&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you have any questions or feedback, please contact us at &lt;a href="mailto:support@mortardata.com"&gt;support@mortardata.com&lt;/a&gt; or ping us on in-app chat at app.mortardata.com&lt;/li&gt;
&lt;/ol&gt;</description><link>http://blog.mongodb.org/post/43495291219</link><guid>http://blog.mongodb.org/post/43495291219</guid><pubDate>Tue, 19 Feb 2013 12:44:00 -0500</pubDate></item><item><title>Analyzing Your MongoDB Data with Analytica</title><description>&lt;p&gt;&lt;span&gt;&lt;em&gt;This is a guest post by Nosh Petigara, president of Analytica&lt;/em&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://analytica.com"&gt;&lt;span&gt;Analytica&lt;/span&gt;&lt;/a&gt;&lt;span&gt; is an analytics platform that makes it easy to analyze and report on data like user profiles, event logs, product catalogs, user-generated content, financial assets, or anything else you may have stored in your MongoDB database.&lt;/span&gt;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://analytica.com"&gt;Analytica&lt;/a&gt;&lt;span&gt; is built from the ground up for rich document type data and uses a JSON-like representation throughout its architecture. You use Analytica Script, &lt;/span&gt;a declarative expression language tailored for JSON data, to tell Analytica how to perform calculations, filter, group, and transform your documents into the results you want. You can interact with Analytica using a plug-in to &lt;a href="http://www.analytica.com/technology/excel-client/"&gt;Microsoft Excel&lt;/a&gt; or a &lt;a href="http://www.analytica.com/technology/shell-client/"&gt;command line shell&lt;/a&gt;.  Analytica can also be used through its &lt;a href="http://www.analytica.com/technology/rest-interface/"&gt;REST API&lt;/a&gt;. Browser-based and mobile interfaces are coming soon. &lt;!-- more --&gt;&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;div&gt;To show some of Analytica&amp;#8217;s capabilities, we downloaded all of the tweets sent by the @mongodb twitter account over the last 4 years into a MongoDB database using the Twitter API. Using Analytica, we then developed a dashboard which shows &lt;a href="https://twitter.com/mongodb"&gt;@mongodb&amp;#8217;s&lt;/a&gt; entire twitter history:&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;div&gt;&lt;br/&gt;&lt;a href="https://s3.amazonaws.com/analyticaexample/mongodbtwitterhistory.html"&gt;&lt;img alt="image" src="http://media.tumblr.com/d9e30d4895ba572e8df8821ca379502e/tumblr_inline_mhe9ro5lEn1qz4rgp.png"/&gt;&lt;/a&gt;
&lt;p&gt;&lt;a href="https://s3.amazonaws.com/analyticaexample/mongodbtwitterhistory.html"&gt;&lt;img alt="image" src="http://media.tumblr.com/ba5aa8eeb0bf60aea2a632ec94f17480/tumblr_inline_mhe9s8MQyD1qz4rgp.png"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://s3.amazonaws.com/analyticaexample/mongodbtwitterhistory.html"&gt;&lt;img alt="image" src="http://media.tumblr.com/7f5b3064adac23555f9d0e986194bc09/tumblr_inline_mhe9wzuTh31qz4rgp.png"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div&gt;

&lt;p&gt;&lt;span&gt;Assuming you have a database called &amp;#8216;twitter&amp;#8217; and a collection called &amp;#8216;tweets&amp;#8217; that contains the JSON documents for @mongodb&amp;#8217;s tweets from the Twitter API, here is how you&amp;#8217;d use Analytica to calculate the most commonly used hashtags with three commands:&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;pre&gt;SET twitter.byHashtag = group(tweets.by(entities.hashtags.text)) //group our tweets by hashtag and store them in a calculated (virtual) collection called 'byHashtag'
SET twitter.byHashtag.count = count(tweets) // count the number of tweets for each hashtag in our virtual collection
SET twitter.tophashtags = orderdesc(byHashtag.by(count)) //sort the results in descending order&lt;br/&gt;&lt;br/&gt;&lt;/pre&gt;
&lt;p&gt;Analytica uses dot notation to specify which collections, documents, or properties to operate on. Each SET command in Analytica results in a computation or the transformation of a set of documents, the results of which are stored in what we call calculated properties or calculated collections. These are intermediate results, stored in Analytica (at the database, collection, or document level, depending on how you specify them), which can be used in subsequent computations. Finally, the command &amp;#8216;twitter.tophashtags.(text, count)&amp;#8217; retrieves the text of the hashtags along with the count of how many tweets use each hashtag.&lt;/p&gt;
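&lt;p&gt;For readers who think in general-purpose languages, the grouping, counting and sorting steps are roughly what the following Python sketch does on a list of in-memory documents. It is only an analogy and says nothing about how Analytica works internally; the sample tweets are made up for illustration, and top_hashtags is a name we chose.&lt;/p&gt;
&lt;pre&gt;from collections import Counter

# Made-up sample documents, shaped like tweets from the Twitter API
tweets = [
    {"entities": {"hashtags": [{"text": "mongodb"}, {"text": "nosql"}]}},
    {"entities": {"hashtags": [{"text": "mongodb"}]}},
    {"entities": {"hashtags": []}},
]

def top_hashtags(docs):
    """Count tweets per hashtag and return (text, count) pairs, most used first."""
    counts = Counter()
    for doc in docs:
        # a set, so a hashtag repeated within one tweet counts that tweet once
        for text in set(tag["text"] for tag in doc["entities"]["hashtags"]):
            counts[text] += 1
    return counts.most_common()

for text, count in top_hashtags(tweets):
    print(text, count)
&lt;/pre&gt;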
&lt;p&gt;Since we wanted to graph out our results, we used Analytica&amp;#8217;s plug in for Excel to enter a series of Analytica script expressions. In addition to calculating the most tweeted hashtags, we also looked at the frequency of tweets per month from the @mongodb account, analyzed the content of @mongodb&amp;#8217;s tweets to see how hashtags and URLs were being used, and computed a few other metrics. With this quick analysis, we saw that @mongodb&amp;#8217;s tweeting patterns have changed over time (a lot more tweets recently!), figured out that over 80% of @mongodb&amp;#8217;s tweets are retweeted at least once, and learnt (perhaps not surprisingly!) that the most popular tweets are about new releases. We graphed out the results and generated the &lt;a href="https://s3.amazonaws.com/analyticaexample/mongodbtwitterhistory.html"&gt;HTML page&lt;/a&gt; to share with the MongoDB community.&lt;/p&gt;
&lt;p&gt;We&amp;#8217;re holding a &lt;a href="http://www.10gen.com/events/webinar/analytics-and-bi-with-mongodb-and-analytica"&gt;webinar with 10gen&lt;/a&gt;&lt;span&gt; on February 12 so that you can learn more about Analytica and ask questions. In the &lt;a href="http://www.10gen.com/events/webinar/analytics-and-bi-with-mongodb-and-analytica"&gt;webinar&lt;/a&gt;, we&amp;#8217;ll go through how you can use Analytica on your own data to produce in-depth analyses, dashboards and reports and become a data whiz! In the meantime you can&lt;/span&gt;&lt;span&gt; learn more and download the beta version of &lt;a href="http://analytica.com"&gt;Analytica&lt;/a&gt;. You&amp;#8217;ll be able to run Analytica against your own datasets or in &lt;a href="http://analytica.com/learn/5steps"&gt;an example&lt;/a&gt; we&amp;#8217;ve put together on data from StackOverflow&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If you are looking for other datasets to try, I&amp;#8217;d recommend checking out &lt;a href="https://dev.twitter.com/"&gt;Twitter&amp;#8217;s API&lt;/a&gt;&lt;span&gt;, &lt;a href="https://developer.foursquare.com/"&gt;Foursquare&amp;#8217;s API&lt;/a&gt;&lt;/span&gt;&lt;span&gt;, the &lt;a href="http://developer.nytimes.com/docs"&gt;NYTimes API&lt;/a&gt;&lt;/span&gt;&lt;span&gt;, or &lt;a href="http://services.sunlightlabs.com/"&gt;Sunlight Labs API&lt;/a&gt;&lt;/span&gt;&lt;span&gt;. Each of these has JSON, CSV or XML data that you can easily import into MongoDB to start analyzing with Analytica or MongoDB&amp;#8217;s query language and aggregation framework. We&amp;#8217;ll also post a step-by-step guide soon, which will describe how you can run an analysis on your own twitter history. We&amp;#8217;d love to hear from you - you can &lt;a href="mailto:info@analytica.com"&gt;email&lt;/a&gt;&lt;/span&gt;&lt;span&gt; with questions or feedback.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.analytica.com/learn/"&gt;Analytica Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.10gen.com/events/webinar/analytics-and-bi-with-mongodb-and-analytica"&gt;Learn more about MongoDB and Analytica in the Webinar on Data Analytics and Business Intelligence with MongoDB and Analytica February 12&lt;/a&gt;  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/analytica_inc"&gt;Follow Analytica on Twitter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.mongodb.org/post/41788952252</link><guid>http://blog.mongodb.org/post/41788952252</guid><pubDate>Tue, 29 Jan 2013 11:13:00 -0500</pubDate></item><item><title>Checking Disk Performance with the mongoperf Utility</title><description>&lt;p&gt;&lt;em&gt;Note: while this blog post uses some Linux commands in its examples, mongoperf runs and is useful on just about all operating systems.&lt;/em&gt;&lt;strong&gt;&lt;br/&gt;&lt;br/&gt;&lt;/strong&gt;mongoperf is a utility for checking disk i/o performance of a server independent of MongoDB. It performs simple timed random disk i/o&amp;#8217;s. &lt;br/&gt;&lt;br/&gt;mongoperf has a couple of modes: mmf:false and mmf:true  &lt;/p&gt;
&lt;p&gt;mmf:false mode is a completely generic random physical I/O test &amp;#8212; there is effectively no MongoDB code involved.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;br/&gt;With mmf:true mode, the test is a benchmark of memory-mapped file based I/O.  The code is not the MongoDB code but the actions are analogous.  Thus this is a good baseline test of a system including the operating system virtual memory manager&amp;#8217;s behavior.&lt;/p&gt;
&lt;p&gt;To build the mongoperf tool:&lt;/p&gt;
&lt;pre&gt;scons mongoperf&lt;/pre&gt;
&lt;p&gt;&lt;strong id="internal-source-marker_0.7024937095120549"&gt;&lt;span&gt;(Or, “scons mongoperf.exe” on Windows.)&lt;/span&gt;&lt;br/&gt;&lt;span&gt;&lt;br/&gt;&lt;/span&gt;&lt;/strong&gt;or grab a prebuilt binary &lt;a href="http://www.mongodb.org/display/DOCS/mongoperf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;br/&gt;&lt;/strong&gt;Then try&lt;/p&gt;
&lt;h3&gt;mmf:false mode&lt;/h3&gt;
&lt;p&gt;Here&amp;#8217;s an example of a test run with 32 threads performing random physical reads. Note that mongoperf gradually adds more threads so that you can see the difference in performance with more concurrency.&lt;/p&gt;
&lt;pre&gt; $ echo "{nThreads:32,fileSizeMB:1000,r:true}" | mongoperf 
mongoperf
use -h for help
parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true }
creating test file size:1000MB ...
testing...
options:{ nThreads: 32, fileSizeMB: 1000, r: true }
wthr 32
new thread, total running : 1
read:1 write:0
4759 ops/sec 18 MB/sec
4752 ops/sec 18 MB/sec
4760 ops/sec 18 MB/sec
4758 ops/sec 18 MB/sec
4752 ops/sec 18 MB/sec
4754 ops/sec 18 MB/sec
4758 ops/sec 18 MB/sec
4755 ops/sec 18 MB/sec
new thread, total running : 2
9048 ops/sec 35 MB/sec
9039 ops/sec 35 MB/sec
9056 ops/sec 35 MB/sec
9029 ops/sec 35 MB/sec
9047 ops/sec 35 MB/sec
9072 ops/sec 35 MB/sec
9040 ops/sec 35 MB/sec
9042 ops/sec 35 MB/sec
new thread, total running : 4
15116 ops/sec 59 MB/sec
15346 ops/sec 59 MB/sec
15401 ops/sec 60 MB/sec
15448 ops/sec 60 MB/sec
15450 ops/sec 60 MB/sec
15502 ops/sec 60 MB/sec
15474 ops/sec 60 MB/sec
15480 ops/sec 60 MB/sec
read:1 write:0
read:1 write:0
new thread, total running : 8
read:1 write:0
read:1 write:0
15999 ops/sec 62 MB/sec
21811 ops/sec 85 MB/sec
21888 ops/sec 85 MB/sec
21964 ops/sec 85 MB/sec
21876 ops/sec 85 MB/sec
22058 ops/sec 86 MB/sec
21966 ops/sec 85 MB/sec
21976 ops/sec 85 MB/sec
new thread, total running : 16
24316 ops/sec 94 MB/sec
24949 ops/sec 97 MB/sec
25239 ops/sec 98 MB/sec
25032 ops/sec 97 MB/sec
25020 ops/sec 97 MB/sec
25331 ops/sec 98 MB/sec
25175 ops/sec 98 MB/sec
25081 ops/sec 97 MB/sec
new thread, total running : 32
24314 ops/sec 94 MB/sec
24991 ops/sec 97 MB/sec
24779 ops/sec 96 MB/sec
24743 ops/sec 96 MB/sec
24932 ops/sec 97 MB/sec
24947 ops/sec 97 MB/sec
24831 ops/sec 96 MB/sec
24750 ops/sec 96 MB/sec
24843 ops/sec 97 MB/sec
&lt;/pre&gt;
&lt;p&gt;The above test was run on an SSD volume on a 64-bit Red Hat Enterprise Linux server. Notice how the ops/second increase as we add more threads (to a point). It&amp;#8217;s interesting to look at the output of iostat while this was running:&lt;/p&gt;
&lt;pre&gt;iostat -xm 2

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s   avgrq-sz avgqu-sz   await  svctm  %util
dm-0              0.00     0.00  1532.00  4104.00     5.98    16.03     8.00  2354.34  517.87   0.17  96.30
dm-0              0.00     0.00  4755.00     0.00    18.57     0.00     8.00     0.93    0.19   0.19  92.55
dm-0              0.00     0.00  4755.50     0.00    18.58     0.00     8.00     0.93    0.20   0.20  93.20
dm-0              0.00     0.00  4753.50     0.00    18.57     0.00     8.00     0.93    0.20   0.20  93.30
dm-0              0.00     0.00  6130.50     1.50    23.95     0.01     8.00     1.23    0.20   0.16  95.15
dm-0              0.00     0.00  9047.50     0.00    35.34     0.00     8.00     1.84    0.20   0.11 100.05
dm-0              0.00     0.00  9033.50     0.00    35.29     0.00     8.00     1.84    0.20   0.11  99.95
dm-0              0.00     0.00  9053.50     9.50    35.37     0.04     8.00     2.00    0.22   0.11 100.00
dm-0              0.00     0.00 10901.00     0.00    42.58     0.00     8.00     2.43    0.22   0.09 100.05
dm-0              0.00     0.00 15404.50     0.00    60.17     0.00     8.00     3.56    0.23   0.06 100.05
dm-0              0.00     0.00 15441.50     0.00    60.32     0.00     8.00     3.58    0.23   0.06 100.20
dm-0              0.00     0.00 15476.50     0.00    60.46     0.00     8.00     3.56    0.23   0.06 100.00
dm-0              0.00     0.00 15433.00     0.00    60.29     0.00     8.00     4.87    0.23   0.06 100.05
dm-0              0.00     0.00 21024.00     0.00    82.12     0.00     8.00     7.06    0.39   0.05 100.40
dm-0              0.00     0.00 21917.00     0.00    85.62     0.00     8.00     6.91    0.31   0.05 100.35
dm-0              0.00     0.00 21964.00     0.00    85.80     0.00     8.00     6.96    0.32   0.05 100.30
dm-0              0.00     0.00 22738.00     0.00    88.82     0.00     8.00     8.07    0.34   0.04 100.25
dm-0              0.00     0.00 24893.00     0.00    97.24     0.00     8.00    10.05    0.41   0.04 100.60
dm-0              0.00     0.00 25060.00     0.00    97.89     0.00     8.00    10.21    0.40   0.04 100.20
dm-0              0.00     0.00 25236.50     0.00    98.58     0.00     8.00    10.34    0.40   0.04 100.70
dm-0              0.00     0.00 24802.00     0.00    96.88     0.00     8.00    11.28    0.40   0.04 100.60
dm-0              0.00     0.00 24859.00     0.00    97.11     0.00     8.00    10.08    0.45   0.04 100.70
dm-0              0.00     0.00 24793.50     0.00    96.85     0.00     8.00     9.89    0.39   0.04 101.10
dm-0              0.00     0.00 24881.00     0.00    97.19     0.00     8.00     9.93    0.39   0.04 100.90
dm-0              0.00     0.00 24823.00     0.00    96.96     0.00     8.00     9.79    0.39   0.04 100.50
dm-0              0.00     0.00 24805.00     0.00    96.89     0.00     8.00     9.92    0.40   0.04 100.40
dm-0              0.00     0.00 24901.00     0.00    97.27     0.00     8.00     9.97    0.39   0.04 100.90
&lt;/pre&gt;
&lt;p&gt;A few things stand out.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;span&gt;First, the reads per second (&amp;#8220;r/s&amp;#8221;) numbers match our mongoperf results. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Second, it&amp;#8217;s clear that the &amp;#8220;%util&amp;#8221; column is fairly meaningless in this particular case: we were able to increase r/s even after %util hit 100. I assume this is because the %util value is a modeled value and the assumptions involved don&amp;#8217;t hold for this device.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Third, note that if you multiply the r/s value by 4KB, you get the rMB/s value &amp;#8212; so we are really doing 4KB reads in this case. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span&gt;We can now try some writes:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;$ echo "{nThreads:32,fileSizeMB:1000,w:true}" | mongoperf
new thread, total running : 1
549 ops/sec 2 MB/sec
439 ops/sec 1 MB/sec
270 ops/sec 1 MB/sec
295 ops/sec 1 MB/sec
281 ops/sec 1 MB/sec
371 ops/sec 1 MB/sec
235 ops/sec 0 MB/sec
379 ops/sec 1 MB/sec
new thread, total running : 2
243 ops/sec 0 MB/sec
354 ops/sec 1 MB/sec
310 ops/sec 1 MB/sec
2491 ops/sec 9 MB/sec
2293 ops/sec 8 MB/sec
2077 ops/sec 8 MB/sec
2559 ops/sec 9 MB/sec
1099 ops/sec 4 MB/sec
new thread, total running : 4
2676 ops/sec 10 MB/sec
2667 ops/sec 10 MB/sec
2536 ops/sec 9 MB/sec
2600 ops/sec 10 MB/sec
2612 ops/sec 10 MB/sec
2498 ops/sec 9 MB/sec
2506 ops/sec 9 MB/sec
2492 ops/sec 9 MB/sec
new thread, total running : 8
2463 ops/sec 9 MB/sec
2439 ops/sec 9 MB/sec
2445 ops/sec 9 MB/sec
2401 ops/sec 9 MB/sec
2271 ops/sec 8 MB/sec
2202 ops/sec 8 MB/sec
2206 ops/sec 8 MB/sec
2181 ops/sec 8 MB/sec
new thread, total running : 16
2105 ops/sec 8 MB/sec
2263 ops/sec 8 MB/sec
2305 ops/sec 9 MB/sec
2408 ops/sec 9 MB/sec
2324 ops/sec 9 MB/sec
2244 ops/sec 8 MB/sec
2013 ops/sec 7 MB/sec
2004 ops/sec 7 MB/sec
new thread, total running : 32
read:0 write:1
2088 ops/sec 8 MB/sec
2091 ops/sec 8 MB/sec
2365 ops/sec 9 MB/sec
2278 ops/sec 8 MB/sec
2322 ops/sec 9 MB/sec
2241 ops/sec 8 MB/sec
2105 ops/sec 8 MB/sec
2241 ops/sec 8 MB/sec
2040 ops/sec 7 MB/sec
1997 ops/sec 7 MB/sec
2062 ops/sec 8 MB/sec
2111 ops/sec 8 MB/sec
2150 ops/sec 8 MB/sec
2253 ops/sec 8 MB/sec
2246 ops/sec 8 MB/sec
2188 ops/sec 8 MB/sec
&lt;/pre&gt;
&lt;p&gt;This relatively old SSD drive can only do about 2K random writes per second. It appears we need more than one thread to saturate it, too; we could run with nThreads:1 for a long time to verify that is true. Here are some mongoperf statistics from a test run on an Amazon EC2 machine with internal SSD storage:&lt;/p&gt;
&lt;pre&gt;         iops, thousands
threads  read test  write test
-------  ---------  ----------
1                4           8
2                8           8
4               16           8
8               32           8
16              64           8
32              70           8
&lt;/pre&gt;
&lt;p&gt;Here&amp;#8217;s a read test on a RAID-10 volume comprised of four spinning disks (SATA):&lt;/p&gt;
&lt;pre&gt;parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true }
creating test file size:1000MB ...
new thread, total running : 1
150 ops/sec 0 MB/sec
174 ops/sec 0 MB/sec
169 ops/sec 0 MB/sec
new thread, total running : 2
351 ops/sec 1 MB/sec
333 ops/sec 1 MB/sec
347 ops/sec 1 MB/sec
new thread, total running : 4
652 ops/sec 2 MB/sec
578 ops/sec 2 MB/sec
715 ops/sec 2 MB/sec
new thread, total running : 16
719 ops/sec 2 MB/sec
722 ops/sec 2 MB/sec
493 ops/sec 1 MB/sec
new thread, total running : 32
990 ops/sec 3 MB/sec
955 ops/sec 3 MB/sec
842 ops/sec 3 MB/sec
&lt;/pre&gt;
&lt;p&gt;Note that when testing a volume using spinning disks it is important to make your test file large &amp;#8212; much larger than the 1GB test file in the examples above. Otherwise the test will only be hitting a few adjacent cylinders on the disk and report results that are faster than you would achieve if the disk is used in its entirety. Let&amp;#8217;s try a larger file:&lt;/p&gt;
&lt;pre&gt;{ nThreads: 32, fileSizeMB: 20000, r: true }
new thread, total running : 1
86 ops/sec 0 MB/sec
98 ops/sec 0 MB/sec
91 ops/sec 0 MB/sec
new thread, total running : 2
187 ops/sec 0 MB/sec
188 ops/sec 0 MB/sec
192 ops/sec 0 MB/sec
new thread, total running : 4
295 ops/sec 1 MB/sec
296 ops/sec 1 MB/sec
233 ops/sec 0 MB/sec
new thread, total running : 8
307 ops/sec 1 MB/sec
429 ops/sec 1 MB/sec
414 ops/sec 1 MB/sec
new thread, total running : 16
554 ops/sec 2 MB/sec
501 ops/sec 1 MB/sec
455 ops/sec 1 MB/sec
new thread, total running : 32
893 ops/sec 3 MB/sec
603 ops/sec 2 MB/sec
814 ops/sec 3 MB/sec
&lt;/pre&gt;
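&lt;p&gt;To compare runs like the two above without eyeballing the numbers, you could average the samples for each thread count. The Python sketch below does that for mongoperf output in the format shown in this post. mean_ops_by_threads is a name chosen for illustration, and the two strings are short excerpts of the 1GB and 20GB runs above.&lt;/p&gt;
&lt;pre&gt;import re

def mean_ops_by_threads(output):
    """Average the ops/sec samples in mongoperf output for each thread count."""
    samples = {}
    threads = None
    for line in output.splitlines():
        line = line.strip()
        started = re.match(r"new thread, total running : (\d+)", line)
        sample = re.match(r"(\d+) ops/sec", line)
        if started:
            threads = int(started.group(1))
            samples[threads] = []
        elif sample and threads is not None:
            samples[threads].append(int(sample.group(1)))
    return {n: sum(v) / float(len(v)) for n, v in samples.items() if v}

# Excerpts from the two RAID-10 read runs above (one thread)
small_file = """new thread, total running : 1
150 ops/sec 0 MB/sec
174 ops/sec 0 MB/sec
169 ops/sec 0 MB/sec"""
large_file = """new thread, total running : 1
86 ops/sec 0 MB/sec
98 ops/sec 0 MB/sec
91 ops/sec 0 MB/sec"""

print("1GB file:  %.0f ops/sec" % mean_ops_by_threads(small_file)[1])
print("20GB file: %.0f ops/sec" % mean_ops_by_threads(large_file)[1])
&lt;/pre&gt;
&lt;p&gt;With one thread, the 1GB file averages roughly 164 ops/sec against roughly 92 ops/sec for the 20GB file, which is the effect of the small test file described above.&lt;/p&gt;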
&lt;p&gt;Let&amp;#8217;s now try a write test on the RAID-10 spinning disks:&lt;/p&gt;
&lt;pre&gt;parsed options:
{ nThreads: 32, fileSizeMB: 1000, w: true }
creating test file size:1000MB ...
new thread, total running : 1
113 ops/sec 0 MB/sec
117 ops/sec 0 MB/sec
113 ops/sec 0 MB/sec
new thread, total running : 2
120 ops/sec 0 MB/sec
113 ops/sec 0 MB/sec
115 ops/sec 0 MB/sec
new thread, total running : 4
115 ops/sec 0 MB/sec
115 ops/sec 0 MB/sec
112 ops/sec 0 MB/sec
new thread, total running : 8
111 ops/sec 0 MB/sec
110 ops/sec 0 MB/sec
111 ops/sec 0 MB/sec
new thread, total running : 16
116 ops/sec 0 MB/sec
110 ops/sec 0 MB/sec
105 ops/sec 0 MB/sec
new thread, total running : 32
115 ops/sec 0 MB/sec
111 ops/sec 0 MB/sec
114 ops/sec 0 MB/sec
&lt;/pre&gt;
&lt;p&gt;The write result above seems slower than one would expect &amp;#8212; this is an example where more investigation and analysis would then be appropriate, and an example of a case where running mongoperf might prove useful.&lt;/p&gt;
&lt;h3&gt;mmf:true mode&lt;/h3&gt;
&lt;p&gt;mongoperf has another test mode where instead of using direct (physical) i/o, it tests random reads and writes via memory mapped file regions. In this case caching will come into effect &amp;#8212; you should see very high read speeds if the datafile is small, and speeds that begin to approach physical random I/O speed as the datafile becomes larger than RAM. For example:&lt;/p&gt;
&lt;pre&gt;parsed options:
{ recSizeKB: 8, nThreads: 8, fileSizeMB: 1000, r: true, mmf: true }
creating test file size:1000MB ...
new thread, total running : 1
read:1 write:0
65 ops/sec
79 ops/sec
92 ops/sec
107 ops/sec
111 ops/sec
87 ops/sec
125 ops/sec
141 ops/sec
new thread, total running : 2
273 ops/sec
383 ops/sec
422 ops/sec
594 ops/sec
1220 ops/sec
2598 ops/sec
36578 ops/sec
489132 ops/sec
new thread, total running : 4
183926 ops/sec
171128 ops/sec
173286 ops/sec
172908 ops/sec
173187 ops/sec
173322 ops/sec
173961 ops/sec
175195 ops/sec
new thread, total running : 8
389256 ops/sec
396595 ops/sec
398382 ops/sec
402393 ops/sec
400701 ops/sec
404904 ops/sec
400571 ops/sec
&lt;/pre&gt;
&lt;p&gt;The numbers start low because, at the beginning of the test, the test file is not yet in the file system cache (in the Linux version of mongoperf, anyway). Data faults into the cache quite quickly because the readahead for the volume is quite large. Once the entire file is in RAM, the number of accesses per second is quite high.&lt;/p&gt;
&lt;p&gt;We can look at the readahead settings for the device with &amp;#8220;sudo blockdev --report&amp;#8221;. Note that the value reported by this utility in the &amp;#8220;RA&amp;#8221; field is the number of 512-byte sectors.&lt;/p&gt;
&lt;p&gt;During the above test, if we look at iostat, we see large reads occurring because of the readahead setting that was used (see the avgrq-sz column, which specifies the number of sectors requested):&lt;/p&gt;
&lt;pre&gt;Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdc             148.33     0.00  116.00    0.00    22.30     0.00   393.63     1.68   14.48   7.19  83.40
sdd             130.67     0.00  113.00    0.00    20.38     0.00   369.35     1.54   13.58   7.19  81.23
sde             154.00     0.00  113.67    0.00    21.85     0.00   393.64     1.84   16.23   7.38  83.87
sdb             140.00     0.00  107.00    0.00    20.27     0.00   387.91     1.88   17.58   7.84  83.87
md0               0.00     0.00 1025.33    0.00    85.34     0.00   170.45     0.00    0.00   0.00   0.00
&lt;/pre&gt;
&lt;p&gt;Thus we are apparently reading ahead approximately 200KB from each spindle on each physical random read I/O (an avgrq-sz of roughly 390 sectors at 512 bytes per sector).&lt;/p&gt;
&lt;p&gt;Note that if your database is much larger than RAM and you expect cache misses on a regular basis, this readahead setting might be too large: if the object to be fetched from disk is only 8KB, another ~200KB is being read ahead with it in this case. This is good for cache preheating, but that readahead could eject other data from the file system cache; if the data read ahead were &amp;#8220;cold&amp;#8221; and unlikely to be used, that would be bad. In that situation, make the readahead setting for your volume smaller. 32KB might be a good setting, perhaps 16KB on a solid state disk. (It is likely never helpful to go below 8KB, or 16 sectors of 512 bytes, as MongoDB b-tree buckets are 8KB.)&lt;/p&gt;
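&lt;p&gt;A hedged sketch of lowering the readahead to the 32KB suggested above follows. The device name /dev/md0 is an assumption (substitute your own volume), and a value set this way typically does not survive a reboot.&lt;/p&gt;
&lt;pre&gt;# Readahead is given in 512-byte sectors: 32KB = 32 * 1024 / 512 = 64 sectors
sudo blockdev --setra 64 /dev/md0
# Confirm the new value (printed in sectors)
sudo blockdev --getra /dev/md0
&lt;/pre&gt;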
&lt;p&gt;One trade-off with readahead is that cache preheating will take a long time if the readahead setting is tiny. Consider the following run, where there was no readahead (just 4KB reads on page faults):&lt;/p&gt;
&lt;pre&gt;parsed options:
{ nThreads: 32, fileSizeMB: 1000, r: true, mmf: true }
creating test file size:1000MB ...
testing...
new thread, total running : 1
67 ops/sec
110 ops/sec
184 ops/sec
167 ops/sec
174 ops/sec
159 ops/sec
189 ops/sec
190 ops/sec
new thread, total running : 2
362 ops/sec
393 ops/sec
371 ops/sec
354 ops/sec
374 ops/sec
388 ops/sec
384 ops/sec
394 ops/sec
new thread, total running : 4
486 ops/sec
400 ops/sec
570 ops/sec
589 ops/sec
567 ops/sec
545 ops/sec
576 ops/sec
412 ops/sec
new thread, total running : 8
666 ops/sec
601 ops/sec
499 ops/sec
731 ops/sec
618 ops/sec
448 ops/sec
508 ops/sec
547 ops/sec
new thread, total running : 16
815 ops/sec
802 ops/sec
917 ops/sec
580 ops/sec
955 ops/sec
1006 ops/sec
1048 ops/sec
938 ops/sec
new thread, total running : 32
1993 ops/sec
1186 ops/sec
1331 ops/sec
1317 ops/sec
1298 ops/sec
991 ops/sec
1431 ops/sec
1406 ops/sec
1395 ops/sec
1099 ops/sec
1265 ops/sec
1400 ops/sec
1484 ops/sec
1436 ops/sec
1352 ops/sec
1438 ops/sec
1380 ops/sec
1350 ops/sec
1565 ops/sec
1440 ops/sec
1015 ops/sec
1253 ops/sec
1414 ops/sec
1443 ops/sec
1478 ops/sec
1405 ops/sec
1305 ops/sec
1518 ops/sec
1217 ops/sec
1573 ops/sec
1605 ops/sec
1476 ops/sec
1130 ops/sec
1362 ops/sec
1463 ops/sec
1740 ops/sec
1682 ops/sec
1653 ops/sec
1135 ops/sec
1521 ops/sec
1821 ops/sec
1708 ops/sec
1701 ops/sec
1631 ops/sec
1195 ops/sec
1752 ops/sec
1701 ops/sec

... time passes ...

353038 ops/sec
353508 ops/sec
353159 ops/sec
&lt;/pre&gt;
&lt;p&gt;Near the end of the run, the entire test file is in the file system cache:&lt;/p&gt;
&lt;pre&gt;  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28564 dwight    20   0 1292m 1.0g 1.0g S 609.1  3.3   0:47.11 mongoperf
&lt;/pre&gt;
&lt;script src="https://gist.github.com/4557233.js" type="text/javascript"&gt;&lt;/script&gt;&lt;p&gt;Note, though, that if we fetch only 4KB at a time at 400 physical random reads per second, we will need up to 1GB / 4KB per page / 400 pages fetched per second = 655 seconds to heat up the cache. (And 1GB is a small cache; imagine a machine with 128GB of RAM and a database that large or larger.) There are ways to preheat a cache other than readahead; for more info see &lt;a href="http://blog.mongodb.org/post/10407828262/cache-reheating-not-to-be-ignored"&gt;http://blog.mongodb.org/post/10407828262/cache-reheating-not-to-be-ignored&lt;/a&gt;. On Linux, we suggest using a recSizeKB of 8 or larger when using mmf:true: it seems that when only a single 4KB page is touched, certain kernel versions may not perform readahead. (At least the way mongoperf is coded&amp;#8230;)&lt;/p&gt;
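&lt;p&gt;The heat-up estimate can be reproduced with shell integer arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 400 reads per second figure is the rough rate used in the estimate, not a measured constant.&lt;/p&gt;
&lt;pre&gt;# seconds to warm the cache = (cache size / page size) / random reads per second
cache_kb=$(( 1024 * 1024 ))   # 1GB expressed in KB
page_kb=4                     # one 4KB page fetched per fault, no readahead
reads_per_sec=400             # assumed physical random read rate
echo "$(( cache_kb / page_kb / reads_per_sec )) seconds"   # 262144 pages / 400 = 655
&lt;/pre&gt;
&lt;p&gt;Setting cache_kb to 128 times that value gives roughly 83,886 seconds, or about 23 hours, for a 128GB cache at the same rate.&lt;/p&gt;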
&lt;h3&gt;Writes with mmf:true&lt;/h3&gt;
&lt;p&gt;We can also do load testing and simulations of writes via memory-mapped files (analogous to what MongoDB does in its storage engine) with mongoperf. Use mmf:true and w:true for this.&lt;/p&gt;
&lt;p&gt;MongoDB writes are written to the crash recovery log (journal) by mongod almost immediately; however, the datafile writes can be deferred up to a minute. mongoperf simulates this behavior by fsync&amp;#8217;ing its test datafile once a minute. Since writes are only allowed to be lazy by that amount, even data that fits in RAM will be written to disk fairly soon (within a minute). Thus you may see a good amount of random write I/O when mongoperf is running even if the test datafile fits in RAM. This is one reason SSDs are often popular in MongoDB deployments.&lt;/p&gt;
&lt;p&gt;For example, consider a scenario where we run the following:&lt;/p&gt;
&lt;pre&gt;$ echo "{recSizeKB:8,nThreads:32,fileSizeMB:1000,w:true,mmf:true}" | mongoperf&lt;/pre&gt;
&lt;p&gt;If our drive can write 1GB (the test datafile size) sequentially in less than a minute (not unusual), the test will likely report a very high sustained write rate, even after running for more than a minute. However, if we then make the file far larger than 1GB, we will likely see a significant slowdown in write speed, as the background flushing of data at least one minute old becomes a factor (at least on spinning disks).&lt;/p&gt;
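&lt;p&gt;To observe that slowdown, the same command can be rerun with a much larger test file. The fileSizeMB value of 20000 below is an assumption chosen only for illustration; pick a size that your drive cannot write sequentially within a minute, and make sure the volume has room for it.&lt;/p&gt;
&lt;pre&gt;# Same write test as above, but with a ~20GB test file (illustrative size)
$ echo "{recSizeKB:8,nThreads:32,fileSizeMB:20000,w:true,mmf:true}" | mongoperf&lt;/pre&gt;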
&lt;h3&gt;Mixed mode&lt;/h3&gt;
&lt;p&gt;Note that mongoperf has some other options; see the --help option for more info. In particular, you can run concurrent reads and writes in the same test, and you can specify the read and write rates to explicitly simulate a scenario you would like to test.&lt;/p&gt;
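&lt;p&gt;A minimal sketch of a mixed run combines the r and w options used in the earlier examples. The thread count and file size here are illustrative assumptions, and the rate-limiting options are left out; check the help output for their exact names.&lt;/p&gt;
&lt;pre&gt;# List the available options
$ mongoperf --help
# Concurrent random reads and writes against a memory-mapped test file
$ echo "{recSizeKB:8,nThreads:16,fileSizeMB:1000,r:true,w:true,mmf:true}" | mongoperf&lt;/pre&gt;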
&lt;h3&gt;Conclusions and Caveats&lt;/h3&gt;
&lt;p&gt;Note that mongoperf is not MongoDB. mmf:false mode tests physical disk I/O with no caching; because of caching, MongoDB will usually perform vastly better than that. Additionally, mmf:true is &lt;em&gt;not&lt;/em&gt; a perfect simulation of MongoDB. You might get better performance in MongoDB than mongoperf indicates.&lt;/p&gt;
&lt;p&gt;P.S. The mongoperf utility is very simple (a couple hundred lines of code), so you may wish to take a look at its &lt;a href="https://github.com/mongodb/mongo/blob/master/src/mongo/client/examples/mongoperf.cpp"&gt;source code&lt;/a&gt;.&lt;/p&gt;</description><link>http://blog.mongodb.org/post/40769806981</link><guid>http://blog.mongodb.org/post/40769806981</guid><pubDate>Thu, 17 Jan 2013 12:14:00 -0500</pubDate><category>Disk</category><category>performance</category><category>ops</category><category>operations</category></item><item><title>MongoDB Text Search: Experimental Feature in MongoDB 2.4</title><description>&lt;p&gt;&lt;span&gt;Text search (SERVER-380) is one of the most requested features for MongoDB. &lt;/span&gt;&lt;span&gt;10gen is working on an experimental text-search feature, to be released in v2.4, &lt;/span&gt;&lt;span&gt;and we’re already seeing some talk in the community about the native implementation within the server&lt;/span&gt;&lt;span&gt;. We view this as an important step towards fulfilling a community need. &lt;/span&gt;&lt;span&gt;&lt;br/&gt;&lt;br/&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#text-indexes"&gt;MongoDB text search&lt;/a&gt; is still in its infancy and we encourage you to try it out on your datasets. Many applications use both MongoDB and Solr/Lucene, but realize that there is still a feature gap. For some applications, the basic text search that we are introducing may be sufficient. As you get to know text search, you can determine when MongoDB has crossed the threshold for what you need. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;Setting up Text Search&lt;/h4&gt;
&lt;p&gt;&lt;span&gt;&lt;br/&gt;&lt;span&gt;You can enable text search from the mongo shell:&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;db.adminCommand( { setParameter : 1, textSearchEnabled : true } )&lt;/pre&gt;
&lt;p&gt;&lt;span&gt;&lt;br/&gt;&lt;span&gt;Or enable it with a command-line option when starting mongod: &lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;mongod --setParameter textSearchEnabled=true&lt;/pre&gt;
&lt;h4&gt;A Simple Example:&lt;/h4&gt;
&lt;p&gt;&lt;span&gt;&lt;br/&gt;&lt;span&gt;In this example, we will insert three documents into a collection, add text indexes, and then query for the word &amp;#8220;Australian&amp;#8221;. &lt;/span&gt;&lt;br/&gt;&lt;script src="https://gist.github.com/4549025.js" type="text/javascript"&gt;&lt;/script&gt;&lt;br/&gt;&lt;span&gt;We’ll be organizing a series of project nights through the MongoDB User Group network for anyone interested in trying out the feature and providing feedback. Here is a list of the upcoming testing sessions:&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;span&gt;&lt;a href="http://www.meetup.com/MongoDB-SV-User-Group/events/96720372/"&gt;Palo Alto&lt;/a&gt; - January 23&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;a href="http://www.meetup.com/London-MongoDB-User-Group/events/96814272/"&gt;London&lt;/a&gt; - January 24&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;span id="internal-source-marker_0.03641563840210438"&gt;&lt;br/&gt;&lt;span&gt;If you’re interested in organizing a project night for text search, get in touch with the community team, who can help you get set up. All you need are some computers and a few data sets and you’ll be ready to test.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.mongodb.org/downloads"&gt;Download the 2.3.2 development release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://docs.mongodb.org/manual/release-notes/2.4/#text-indexes"&gt;View the docs on text indexes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.mongodb.org/post/40513621310</link><guid>http://blog.mongodb.org/post/40513621310</guid><pubDate>Mon, 14 Jan 2013 07:00:00 -0500</pubDate><category>mongodb</category><category>text search</category><category>full text search</category><category>testing</category><category>2.4</category><category>release</category></item><item><title>MongoDB Schema Design: Insights and Tradeoffs from...</title><description>&lt;object id="kaltura_player_1355508638" name="kaltura_player_1355508638" type="application/x-shockwave-flash" height="333" width="400" data="http://www.kaltura.com/index.php/kwidget/cache_st/1355508638/wid/_1067742/uiconf_id/11169042/entry_id/1_ncdvtiqj"&gt;&lt;param name="allowFullScreen" value="true" /&gt;&lt;param name="allowNetworking" value="all" /&gt;&lt;param name="allowScriptAccess" value="always" /&gt;&lt;param name="bgcolor" value="#000000" /&gt;&lt;param name="flashVars" value="&amp;" /&gt;&lt;param name="movie" value="http://www.kaltura.com/index.php/kwidget/cache_st/1355508638/wid/_1067742/uiconf_id/11169042/entry_id/1_ncdvtiqj" /&gt;        &lt;/object&gt;&lt;br/&gt;&lt;br/&gt;&lt;h1&gt;MongoDB Schema Design: Insights and Tradeoffs from Jetlore&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;&lt;span&gt;&lt;br/&gt;MongoDB’s flexible schema is a powerful feature, and to build a successful first application you need to know how to leverage this feature to its full extent. In this presentation, Montse Medina outlines lessons learned from building Jetlore, a social content marketing platform. Some performance tips from this video: &lt;/span&gt;&lt;br/&gt;&lt;span&gt;&lt;/span&gt;&lt;br/&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;span&gt;Sometimes it’s OK to randomize your shard key. When you have lots of users who want to read from other users, you’ll need to randomize it in order to have fewer disk seeks per shard. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Reduce collection size by always using short field names as a convention. This will help you save memory over time. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;Always test your queries with .explain() to check that you’re hitting the right index.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;strong id="internal-source-marker_0.3174995433073491"&gt;&lt;span&gt;&lt;/span&gt;&lt;br/&gt;&lt;/strong&gt;&lt;/p&gt;</description><link>http://blog.mongodb.org/post/38467892360</link><guid>http://blog.mongodb.org/post/38467892360</guid><pubDate>Fri, 21 Dec 2012 10:47:00 -0500</pubDate><category>mongodb schema design</category><category>schema</category><category>schemaless</category><category>schema free</category><category>indexing</category><category>index</category><category>.explain()</category></item></channel></rss>
