

September Blog, Release and 2.2 Roundup

Oct 2 • Posted 1 year ago

Fast datetimes in MongoDB

Oct 1 • Posted 2 years ago

This was originally posted to Mike Friedman’s blog. Mike is a Perl Evangelist at 10gen, working on the Perl Driver for MongoDB.

One of the most common complaints about the Perl MongoDB driver is that it tries to be a little too clever. In the current production release of the driver (version 0.46.2 as of this writing), all datetime values retrieved by a query are automatically instantiated as DateTime objects. DateTime is a remarkable CPAN distribution. In fact, I would say that DateTime and its related distributions on CPAN comprise one of the best date and time manipulation libraries in any programming language. But that power comes with a cost. The DateTime codebase is large, and instantiating DateTime objects is expensive. The constructor performs a great deal of validation and creates a large amount of metadata which is stored inside the object. Upcoming changes to the Perl MongoDB driver solve this problem; read more below.

If you need to perform a series of complex arithmetic operations with dates, then the cost of DateTime is justified. But frequently, all you want is a simple read-only value that is sufficient for displaying to a user or saving elsewhere. If you are running queries involving a large number of documents, the automatic instantiation of thousands of complex objects becomes a barrier to performance.
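The upcoming fix boils down to deferring that expensive construction. As a rough illustration (in Python rather than Perl, and not the driver's actual code), a lazy wrapper can keep the raw epoch value cheap and only build a full datetime object on demand:

```python
from datetime import datetime, timezone

class LazyDateTime:
    """Hold a raw epoch value; build the expensive object only when asked."""
    __slots__ = ("_epoch", "_dt")

    def __init__(self, epoch_seconds):
        self._epoch = epoch_seconds
        self._dt = None          # no datetime object built yet

    @property
    def epoch(self):
        return self._epoch       # cheap read-only access, enough for display

    def as_datetime(self):
        if self._dt is None:     # construct at most once, on first use
            self._dt = datetime.fromtimestamp(self._epoch, tz=timezone.utc)
        return self._dt

raw = LazyDateTime(1349049600)
print(raw.epoch)                  # -> 1349049600 (cheap path)
print(raw.as_datetime().year)     # -> 2012 (full object only when needed)
```

A query returning thousands of documents then pays the construction cost only for the handful of values actually used as datetime objects.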


How MongoDB makes custom e-commerce easy

Sep 17 • Posted 2 years ago

The market for open source e-commerce software has already gone through a lot of stages, as you may know from popular platforms like osCommerce, Magento, Zen Cart, PrestaShop, and Spree, just to name a few. These platforms are frequently used as a basis for custom e-commerce apps, and they all require a SQL database. Given the inherent challenge in adapting open source software to custom features, it would seem that MongoDB is poised to play an important role in the next wave of e-commerce innovation.

Kyle Banker was one of the first to blog about MongoDB and e-commerce in April 2010, and there’s been surprisingly little written about it since then. In his blog, Kyle writes about Magento and other SQL based platforms: “What you’ll see is a flurry of tables working together to provide a flexible schema on top of a fundamentally inflexible style of database system.”

To this we must ask, why is a flexible schema so important in e-commerce?

Open source platforms are meant to be adapted to many different designs, conversion flows, and business processes. A flexible schema helps by giving developers a way to relate custom data structures to the platform’s existing model. Without a flexible schema, the developer has to get over high hurdles to make a particular feature possible. When the cost of creating and maintaining a custom feature is too high, the options are: give up the feature, start over with a different platform, or build a platform from scratch. That’s an expensive proposition.

There is a better way

For the past year we’ve been developing Forward, a new open source e-commerce platform built on MongoDB. It’s been in production use since March 2012, and has finally reached a point where we can demonstrate the benefits that MongoDB’s schema-less design brings to custom feature development.

The following examples demonstrate Forward’s REST-like ORM conventions, which are only available in the platform itself, but the underlying concepts map directly to MongoDB’s document structure. In this case, think of get() as db.collection.find() — put() as insert/update() — post() as insert() — and delete() as… delete().
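Forward's actual ORM is PHP, but the verb-to-operation mapping can be sketched in Python against an in-memory stand-in for a collection (all names here are hypothetical, not Forward's API): get maps to find, put to update, post to insert, and delete to remove.

```python
# Hypothetical sketch of the REST-verb mapping over an in-memory "collection".
collection = {}          # _id -> document
next_id = [1]

def post(doc):           # insert
    doc = dict(doc, _id=next_id[0]); next_id[0] += 1
    collection[doc["_id"]] = doc
    return doc

def get(_id=None, **query):   # find / find_one
    if _id is not None:
        return collection.get(_id)
    return [d for d in collection.values()
            if all(d.get(k) == v for k, v in query.items())]

def put(_id, fields):    # update (merge new fields into the document)
    doc = collection.setdefault(_id, {"_id": _id})
    doc.update(fields)
    return doc

def delete(_id):         # remove
    return collection.pop(_id, None)

p = post({"name": "Widget"})
put(p["_id"], {"spec": "10cm x 10cm"})
print(get(p["_id"])["spec"])   # -> 10cm x 10cm
```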

Prototype faster

The majority of e-commerce sites represent small businesses, where moving fast can be the most important aspect of a web platform. When the flexible document structure of MongoDB is carried through the platform’s model interface, adding custom fields becomes easier than ever.

For example, let’s say you need a simple administrative view for adding a couple of custom attributes to a product. Here’s a basic example for that purpose, written in Forward’s template syntax:

{args $product_id}

{if $}
    {$product = put("/products/$product_id", [
        spec => $params.spec,
        usage => $params.usage
    ])}
    {flash notice="Saved" refresh=true}
{/if}

{$product = get("/products/$product_id")}

<form method="post">
    <div class="field">
        <label>Product specification</label>
        <textarea name="spec">{$product.spec|escape}</textarea>
    </div>
    <div class="field">
        <label>Product usage instructions</label>
        <textarea name="usage">{$product.usage|escape}</textarea>
    </div>
    <button type="submit">Save product</button>
</form>

It might be obvious what this template does, but what might be less obvious is that the platform knows nothing about the “spec” or “usage” fields, and yet they are treated as if the e-commerce data model was designed for them. No database migration necessary, just code.

You may argue this can be accomplished with a loosely structured SQL schema, and you would be correct, but it’s not pretty or readable with standard database tools, and ad-hoc queries on custom fields would become difficult.

Query on custom fields

If all we needed were custom key/value storage, you might not benefit that much from a flexible schema. Where MongoDB really shines is in its ability to query on any document field, even embedded documents.

{get $oversized_products from "/products" [
    oversized => true,
    active => true
]}

There are {$oversized_products.count} active oversized products

These fields may or may not be known by the e-commerce API, but in this case MongoDB’s query syntax finds only the documents with matching fields.
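To see why this works, here is a toy Python matcher in the spirit of MongoDB's query engine (illustrative only, not MongoDB's implementation): each key in the query is a field path, possibly dotted for embedded documents, that must equal the given value.

```python
def matches(doc, query):
    """Minimal MongoDB-style equality matcher over (possibly dotted) paths."""
    for path, expected in query.items():
        value = doc
        for part in path.split("."):          # walk into embedded documents
            if not isinstance(value, dict) or part not in value:
                return False                  # field absent -> no match
            value = value[part]
        if value != expected:
            return False
    return True

products = [
    {"sku": "A1", "oversized": True,  "active": True},
    {"sku": "B2", "oversized": False, "active": True},
    {"sku": "C3", "dims": {"oversized": True}},
]
hits = [p for p in products if matches(p, {"oversized": True, "active": True})]
print(len(hits))                                        # -> 1
print(matches(products[2], {"dims.oversized": True}))   # -> True
```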

No more relational complexity

For those who have spent years writing relational SQL queries, this is a big change. How do we create data relationships without joins? There are many different strategies, but Forward defines a field as either a static value or a callback method. This allows a field to return another document or collection based on a query. The result is a data model that can walk through relationships without joins. For example (PHP):

// class Accounts extends AppModel
$this->fields = array(
    'orders' => function ($account) {
        return get("/orders", array('account_id' => $account['id']));
    }
);

This relationship can be used in a template like this:

{get $account from "/accounts/$session.account_id"}

You’ve placed:

<table>
    {foreach $account.orders as $order}
        <tr>
            <td>{$order.items|count} item(s)</td>
        </tr>
    {/foreach}
</table>

Relationships can be defined by simple or complex queries. Results are lazy-loaded, making this example possible:

{get $order from "/orders/123"}

{$} placed {$order.account.orders.count} orders since {$order.account.orders.first.date_created|date_format}

// Output: John Smith placed 3 orders since Jun 14, 2012

What about transactions?

Many people bring up MongoDB’s lack of atomic transactions across collections as evidence that it’s not suitable for e-commerce applications. This has not been a significant barrier in our experience so far.

There are other ways to approach data integrity. In systems with low to moderate data contention, optimistic locking is sufficient. We’ll share more details about these strategies as things progress.
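Optimistic locking itself is simple: read a version stamp along with the document, and make the write conditional on that stamp being unchanged. A minimal sketch over an in-memory document (with MongoDB you would express the same compare-and-swap as an update whose filter includes the expected version):

```python
class StaleWrite(Exception):
    pass

store = {"_id": 1, "qty": 5, "version": 3}   # stands in for a MongoDB document

def update_if_unchanged(doc_id, expected_version, changes):
    """Compare-and-swap: apply changes only if nobody else bumped `version`."""
    if store["_id"] != doc_id or store["version"] != expected_version:
        raise StaleWrite("document changed since it was read")
    store.update(changes)
    store["version"] += 1
    return store

update_if_unchanged(1, 3, {"qty": 4})
print(store["version"])     # -> 4
try:
    update_if_unchanged(1, 3, {"qty": 99})   # stale: version is now 4
except StaleWrite:
    print("retry with a fresh read")
```

On a StaleWrite the application re-reads the document and retries, which is exactly the loop an optimistic-locking layer automates.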

In conclusion

The future of e-commerce software looks bright with MongoDB. It’s time to blaze new trails where convoluted schemas, complex relational queries, and hair-raising database migrations are a thing of the past. If you’re interested in working with Forward before public release, please consider joining the private beta and help us reinvent open source e-commerce as the world knows it.

A guest post from Eric Ingram, developer/founder @getfwd

MongoDB Sharding Visualizer

Sep 14 • Posted 2 years ago

We’re happy to share with you the initial release of the MongoDB sharding visualizer. The visualizer is a Google Chrome app that provides an intuitive overview of a sharded cluster. This project provides an alternative to the printShardingStatus() utility function available in the MongoDB shell.


The visualizer provides two different perspectives of the cluster’s state.

The collections view is a grid where each rectangle represents a collection. Each rectangle’s area is proportional to that collection’s size relative to the other collections in the cluster. Inside each rectangle a pie chart shows the distribution of that collection’s chunks over all the shards in the cluster.

The shards view is a bar graph where each bar represents a shard and each segment inside the shard represents a collection. The size of each segment is relative to the other collections on that shard.
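The proportions behind both views can be computed from chunk metadata of the kind stored in the cluster's config database. A rough Python sketch, with hypothetical sample data:

```python
from collections import Counter

# Hypothetical chunk metadata, shaped roughly like documents in config.chunks
chunks = [
    {"ns": "shop.products", "shard": "shard0"},
    {"ns": "shop.products", "shard": "shard0"},
    {"ns": "shop.products", "shard": "shard1"},
    {"ns": "shop.orders",   "shard": "shard1"},
]

def distribution(ns):
    """Fraction of a collection's chunks on each shard (the pie-chart data)."""
    counts = Counter(c["shard"] for c in chunks if c["ns"] == ns)
    total = sum(counts.values())
    return {shard: n / total for shard, n in counts.items()}

print(sorted(distribution("shop.products").items()))
```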

Additionally, the slider underneath each view allows rewinding the state of the cluster, so you can select and view the state of the cluster at a specific time.


To install the plugin, download and unzip the source code from 10gen labs. In Google Chrome, go to Preferences > Extensions, enable Developer Mode, and click “Load unpacked extension…”. When prompted, select the “plugin” directory. Then, open a new tab in Chrome and navigate to the Apps page and launch the visualizer.


We very much look forward to hearing feedback and encourage everyone to look at the source code, which is available from 10gen labs.

Perl Driver 0.46.1 Released

Sep 5 • Posted 2 years ago

This was originally posted to Mike Friedman’s personal blog

I’m happy to announce that after a long delay, version 0.46.1 of the Perl MongoDB driver has now been uploaded to CPAN, and should be available on your friendly local CPAN mirror soon.

This release is mostly a series of minor fixes and housekeeping, in preparation for developing a more detailed roadmap for more frequent releases down the line. Here’s what’s new so far:

Most of the distribution has been successfully transitioned to Dist::Zilla for automated building, tagging, and releasing to CPAN. This has vastly reduced the amount of effort needed to get releases out the door.

The behind-the-scenes algorithm for validating UTF-8 strings has been replaced with a more compliant and much faster version. Thanks to Jan Anderssen for contributing the fix.

Serialization of regexes has been improved and now supports proper stripping of unsupported regex flags across all recent Perl versions. Thanks to Arkadiy Kukarkin for reporting the bug and @ikegami for help with figuring out how to serialize regexes properly via the Perl API.

The driver will now reject document key names with NULL bytes, a possible source of serious bugs. Additionally, much of the distribution metadata has been cleaned up, thanks to the automation provided by Dzil. In particular, the official distribution repository and bug-tracker links now point to our GitHub and JIRA sites. Hopefully more bugs will now come in via those channels instead of RT.
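The NULL-byte check is easy to picture; here is a Python sketch of the same guard (the driver implements this on the Perl side, so this is only an illustration):

```python
def validate_keys(doc):
    """Reject document key names containing NUL bytes, recursing into
    embedded documents."""
    for key, value in doc.items():
        if "\x00" in key:
            raise ValueError("document key contains a NULL byte: %r" % key)
        if isinstance(value, dict):
            validate_keys(value)

validate_keys({"name": "ok", "nested": {"also": "fine"}})
try:
    validate_keys({"bad\x00key": 1})
except ValueError as exc:
    print("rejected:", exc)
```

A NUL byte would silently truncate the key in BSON's C-string encoding, which is why rejecting it outright is the safe behavior.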

Looking ahead, there is a lot of work yet to be done. I have prioritized the following tasks for version 0.47, which should help us move toward an eventual 1.0 release.

  • Eliminating the dependency on Module::Install
  • Significantly re-working the documentation to include better organization and more examples.
  • Additionally, much of the current documentation will be refactored via Pod::Weaver.
  • Replacing AUTOLOADed database and collection methods with safer generated symbols upon connection. Beginning with 0.48, these will have a deprecation warning added and will be removed entirely before the 1.0 release in favor of the get_database and get_collection methods. The docs will be updated to reflect this change.

I’m very excited about the future of MongoDB support for Perl, and looking forward to improving the CPAN distribution in concert with the Perl community!

Mike Friedman is the Perl Engineer and Evangelist at 10gen, working on the Perl Driver for MongoDB. You can follow his blog.

Motor: Asynchronous Driver for MongoDB and Python

Sep 5 • Posted 2 years ago

Tornado is a popular asynchronous Python web server. Alas, connecting to MongoDB from a Tornado app requires a tradeoff: you can either use PyMongo and give up the advantages of an async web server, or use AsyncMongo, which is non-blocking but lacks key features.

I decided to fill the gap by writing a new async driver called Motor (for “MOngo + TORnado”), and it’s reached the public alpha stage. Please try it out and tell me what you think. I’ll maintain a homepage for it here, including basic documentation.


Motor is alpha. It is certainly buggy. Its implementation and possibly its API will change in the coming months. I hope you’ll help me by reporting bugs, requesting features, and pointing out how it could be better.


Two good projects, AsyncMongo and APyMongo, took the straightforward approach to implementing an async MongoDB driver: they forked PyMongo and rewrote it to use callbacks. But this approach creates a maintenance headache: now every improvement to PyMongo must be manually ported over. Motor sidesteps the problem. It uses a Gevent-like technique to wrap PyMongo and run it asynchronously, while presenting a classic callback interface to Tornado applications. This wrapping means Motor reuses all of PyMongo’s code and, aside from GridFS support, Motor is already feature-complete. Motor can easily keep up with PyMongo development in the future.
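The wrapping idea can be illustrated with a small sketch. This version uses a worker thread rather than greenlets, so it is an analogy for the interface, not Motor's actual mechanism: a blocking driver call runs off the caller's thread, and its result or error is delivered through a callback.

```python
import threading

def blocking_find(query):
    """Stand-in for a blocking PyMongo call (no real server involved)."""
    return [{"_id": 1, "matched": query}]

def async_find(query, callback):
    """Run the blocking call off the caller's thread and hand the result to
    a callback. Motor presents this same callback interface, but suspends
    and resumes greenlets inside Tornado's IOLoop instead of using threads."""
    def worker():
        try:
            result, error = blocking_find(query), None
        except Exception as exc:            # deliver errors via the callback too
            result, error = None, exc
        callback(result, error)
    threading.Thread(target=worker).start()

done = threading.Event()
received = []

def on_result(result, error):
    received.append((result, error))
    done.set()

async_find({"i": 1}, on_result)
done.wait(timeout=5)
print(received[0][0][0]["matched"])   # -> {'i': 1}
```

Because the wrapper never touches PyMongo's internals, every PyMongo bug fix or feature lands in the async layer for free, which is the maintenance win described above.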


Motor depends on greenlet and, of course, Tornado. It is compatible with CPython 2.5, 2.6, 2.7, and 3.2; and PyPy 1.9. You can get the code from my fork of the PyMongo repo, on the motor branch:

pip install tornado greenlet
pip install git+

To keep up with development, watch my repo and do

pip install -U git+

when you want to upgrade.


Here’s an example of an application that can create and display short messages:

Other examples are Chirp, a Twitter-like demo app, and Motor-Blog, which runs this site.

Support

For now, email me directly if you have any questions or feedback.

Roadmap

In the next week I’ll implement the PyMongo feature I’m missing, GridFS. Once the public alpha and beta stages have shaken out the bugs and revealed missing features, Motor will be included as a module in the official PyMongo distribution.

A. Jesse Jiryu Davis

August MongoDB Releases and Blogroll

Sep 2 • Posted 2 years ago

This August saw a number of new MongoDB releases, including MongoDB 2.2 and compatible driver releases.

Blog posts on MongoDB 2.2

Noteworthy Blog Posts of the Month

Have a blog post you’d like to be included in our next update? Send us a note.

MongoDB 2.2 Released

Aug 29 • Posted 2 years ago

We are pleased to announce the release of MongoDB version 2.2.  This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details on the release:

New Features

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside of MongoDB, without needing to use MapReduce or separate application processes for data manipulation.

See the aggregation documentation for more information.

Additional “Data Center Awareness” Functionality

2.2 also brings a cluster of features that make it easier to use MongoDB in larger, more geographically distributed contexts. The first change is a standardization of read preferences across all drivers and sharded (i.e. mongos) interfaces. The second is the addition of “tag aware sharding,” which makes it possible to ensure that data in a geographically distributed sharded cluster is always closest to the application that will use it the most.

Improvements to Concurrency

v2.2 eliminates the global lock in the mongod process. Locking is now per database. In addition, a new subsystem avoids locks under most page-fault events, so concurrency improves even on systems with a single database. Parallelism in the application of writes on secondaries has also been enhanced. See this video for more details.

We’re looking forward to your feedback on 2.2. Keep the Jira Issues, blog posts, user group posts, and tweets coming.

- Eliot and the 10gen/MongoDB team

Like what you see? Get MongoDB updates straight to your inbox

Hosting and Developing the HTML5 Game Cobalt Calibur with MongoDB, Node.js and OpenShift

Aug 22 • Posted 2 years ago

This was originally posted on the OpenShift blog by Thomas Hunter.

So, you’re interested in getting the HTML5 Game Cobalt Calibur hosted for free? Look no further, Red Hat’s OpenShift can do that for you. Follow this guide and you’ll be up and running in no time. Cobalt Calibur is a multiplayer browser-based game which uses a bunch of HTML5 features to run on the frontend, and requires a Node.js and MongoDB server on the backend. Luckily OpenShift will satisfy these requirements for you.

The first thing you’ll want to do is create an OpenShift account. It’s quite easy and painless, I promise. Once you’re done getting it set up, be sure to click any email validation links and then log in to the website.

Once you’ve got your account set up, you’re going to want to create an SSH key for your computer (if you haven’t done so previously). To create your SSH key, you will want to open up a terminal emulator and run some commands. These commands should work fine for both OS X and Linux computers. If you’ve already got an SSH key (which you should if you’re a GitHub user), you can skip these steps.

If you’re on a Mac, you’ll want to go to your list of applications and run Terminal. You can get to this app quickly by pressing Cmd+Space, typing in Terminal, and pressing enter.

Below is what your terminal window will end up looking like. You’ll want to type the command ssh-keygen -t rsa, and press enter. You will then be prompted a few questions; just leave everything blank and keep hitting enter.

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/USERNAME/.ssh/id_rsa): <press enter>
Created directory '/home/USERNAME/.ssh'.
Enter passphrase (empty for no passphrase): <press enter>
Enter same passphrase again: <press enter>
Your identification has been saved in /home/USERNAME/.ssh/id_rsa.
Your public key has been saved in /home/USERNAME/.ssh/id_rsa.pub.

Congrats, you’ve now got an SSH public/private key. This is a file which can be used to prove to a remote computer that you are who you say you are. We need to give a copy of this file to OpenShift so that you can use git to push changes to your code to them.

To get a copy of your key file, you’ll want to copy the text from ~/.ssh/id_rsa.pub. You can run the command

cat ~/.ssh/id_rsa.pub

which will display the contents of that file to your screen. Select the text and copy the output into your clipboard (everything from ssh-rsa to the username@hostname part):

$ cat ~/.ssh/id_rsa.pub

Once you’ve copied that text, on the OpenShift website, click My Account > Add a new key to visit the Add a Public Key page, and paste the contents of the output into the big text box. In the small text box above it you can name your key (such as Living Room Desktop or Developer MacBook). You’ll want to use a descriptive name, because if your key is ever compromised, you’ll want to know which one to disable.

Now, click the Create button. OpenShift is now aware of your SSH public key, and you can interact with the git server they provide without problems. Feel free to repeat this process from other machines you plan on working from.

If you get an error when you save the key, you might not have copied the whole thing. If so, you might need to open it in an editor. On a Mac try open ~/.ssh/id_rsa.pub, and on Linux you might try gedit ~/.ssh/id_rsa.pub.

Now it is time to create our OpenShift application. To do this, visit the Create Application page from the main OpenShift navigation. On this page, you will see a big list of all the types of applications supported by OpenShift. Scroll down until you see the Node.js option and select that.

On the next screen you will be prompted for some very basic information. Specifically, you will be asked to name your application. Since we are uploading the Cobalt Calibur engine, it makes sense to name it something like cobaltcalibur.

You will also be prompted to create a “namespace” for your account. This is basically a way to associate all of your app URLs with your account, so that multiple people can have apps named “cobaltcalibur” without stepping on each other’s toes. I already entered a namespace name before, so I didn’t need to this time.

After you click Create Application, OpenShift will work its cloud magic behind the scenes. During this time it is probably creating some DNS entries, copying some skeleton files, creating a git repository, the works. After the process is done, you will be taken to a new screen:

If you like, you can click the blue link to see the skeleton application OpenShift has created for you. It will be a pretty boring, static page which is displayed by a very simple Node.js app.

What you will want to do, though, is copy the commands in green and paste them into your terminal. This will pull the skeleton code from your application’s git repository and make a local copy. There are some (probably) important files in here that we will want to keep.

If you see the same listing of files, then congratulations, you’ve checked out your application from OpenShift.

Now that you’ve got your application created and checked out, we want to add MongoDB support to the application. OpenShift calls these Cartridges.

To add MongoDB, first browse to the All Applications page, and then click the title of the application you created:

On this screen you can see the information for accessing your git repository again, but more importantly, there is a big Add Cartridge button.

Click that big blue button, and the next screen will prompt you for the type of cartridge to be added. Click the MongoDB option:

Once you do this, it will prompt you to make sure you want to add MongoDB. Click the Add Cartridge button again, and after some processing happens in the background it will be added to your application. You will want to copy all of the information you are provided with on this screen, notably the user, password, database, and connection URL which contains the IP address and port number for the database. We’ll give this information to the Cobalt Calibur game later on.

Now that we’ve got the MongoDB cartridge added to our application, we want to actually start the MongoDB server. To do this, you will first need to install the rhc command line utility. You’ll want to follow steps 1 and 2 on that page; you can ignore the other steps. The rhc utility gives you more control over your OpenShift applications than the website does, and is needed to start up the MongoDB server. Run the command rhc app cartridge start -a APPNAME -c mongodb-2.0 and this will start the server for you:

You are now ready to download the Cobalt Calibur source code, configure it to work with your OpenShift account, and upload it to the server. To do this, browse to the Cobalt Calibur GitHub page and simply download the ZIP file.

Extract it to the same folder that the Node.js application was checked out into. This will overwrite the index.html page, the server.js file, and the node_modules/ folder; that is all fine.

Now, it’s time to update the server.js file so that it is able to connect to your MongoDB daemon, as well as bind to the proper ip address and port number that OpenShift requires. You can open up server.js in whatever your favorite editor is. Here is what the old code looks like:

// Web Server Configuration
var server_port = 80; // most OS's will require sudo to listen on 80
var server_address = '';

// MongoDB Configuration
var mongo_host = '';
var mongo_port = 27017;
var mongo_req_auth = false; // Does your MongoDB require authentication?
var mongo_user = 'admin';
var mongo_pass = 'password';
var mongo_collection = 'terraformia';

And here is what you will want to change it to:

// Web Server Configuration
var server_port = process.env.OPENSHIFT_INTERNAL_PORT; // provided by OpenShift
var server_address = process.env.OPENSHIFT_INTERNAL_IP;

// MongoDB Configuration
var mongo_host = 'MONGO IP ADDRESS';
var mongo_port = 27017;
var mongo_req_auth = true; // Does your MongoDB require authentication?
var mongo_user = 'admin';
var mongo_pass = 'MONGO PASSWORD';
var mongo_collection = 'MONGO DATABASE NAME';

Notice how OpenShift provides some environment variables for the web server port and ip address. It might also provide these same variables for the mongo connection, but I didn’t see this information.

The application is now configured properly. You’ll want to now add your files to git, commit the files into git, and push your changes to the server.

git add -A .
git commit -m "Adding Cobalt Calibur files"
git push

You’ll see a bunch of messages from all of the git hooks performing various actions; this is probably a good thing.

Now, if you browse to the URL for your game instance and refresh the page, it should load for you. If not, you might need to run the following command (I needed to for some reason):

rhc app restart -a game

Congratulations, you’ve now got your own personal instance of Cobalt Calibur running on OpenShift for free!

There is one big bug with OpenShift though: they don’t support websockets yet. My guess is that the different apps are hosted in a shared environment, and each application gets one port number to the outside world. Websockets require a bunch of random high ports for different clients, so this doesn’t really work with the shared host environment. Luckily, the socket library the game uses will fall back to long-polling AJAX. The game doesn’t always run perfectly under these conditions, e.g. the monsters or corruption might not load. OpenShift is planning on adding this feature sooner or later; you can vote on it in the meantime.

Thomas Hunter is an evented Node.js hacker transitioning from the world of request/response PHP web development, building everything from hardware control software to traditional web apps. Follow him on Twitter at @tlhunter.

Getting going quickly with Python, MongoDB, and Spatial data on OpenShift: Part II

Aug 18 • Posted 2 years ago

This post originally appeared on the OpenShift blog

As a follow up to my last post about getting spatial going in MongoDB on OpenShift, today we are going to put a web service in front of it using Python. There are several goals for this article:

  • Learn a little bit about Flask - a Python web framework
  • Learn about how to connect to MongoDB from Python
  • Create a REST Style web service to use in our SoLoMo application

I hope by the end you can see how using a Platform as a Service can get you going with Python, MongoDB, and Spatial faster than you can say…“Awesome Sauce”. We have a lot of ground to cover so let’s dig right in.

Creating the Python application

Here is the OpenShift command line to create the Python app:

rhc app create -t python-2.6 -a pythonws 

Using the flask quickstart from GitHub

We have already put together a flask quickstart in the openshift github space. To get the framework into your application, all you have to do is run the following (from the directory where you created the app):

cd pythonws
git remote add upstream -m master git://
git pull -s recursive -X theirs upstream master

We now have a flask app whose source code we can modify.

If you want to just check out the source code I used in the app you can see it on Github and follow the instructions to clone it into your OpenShift account

Adding MongoDB and importing data

Time to add MongoDB to our application:

 rhc app cartridge add -a pythonws -t mongodb-2.0 

The previous post in this series covers how to import the data from a JSON file of the national parks into your mongodb database and prepare it for spatial queries. Please follow those instructions to import the data into the pythonws DB, into a collection called parkpoints.

Quick digression to explain Flask

Before we get into our specific application I am going to take a moment to explain the Python framework for this demo. Flask basically allows you to map URL patterns to methods (it also does a lot more, like templating, but this is the only part we are using today). For example, in the file that is now in your project you can find the lines:

@route('/')
def index():
    return 'Hello World!'

This says that when a request comes in for the base URL, the function named index gets executed. In this case the function just returns the string “Hello World!”, and returning has the effect of sending the string to the requestor.

@route('/name/<name>')
def nameindex(name='Stranger'):
    return 'Hello, %s!' % name

We can also grab pieces of the requested URL and pass them into the function. Enclosing a part of the URL in < > indicates that we want to access it within our function. Here you can see that if the URL looks like /name/steve, then the response will be Hello, steve!

Or, if the URL is just /name/, the response will be Hello, Stranger!

We are going to define URL mappings for some basic REST like functionality to interact with our spatial MongoDB data store.

Modify the source code

The first function we are going to write will simply return all the records in the database. In a more full-featured app you would probably want to add pagination and other features to this query, but we won’t be doing that today.

@app.route("/ws/parks")
def parks():
    # setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks

    # query the DB for all the parkpoints
    result = db.parkpoints.find()

    # now turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

I chose to put the web services under the URL /ws/parks so that we could use other parts of the URL namespace for other functionality. You can now go to your application URL and you should be able to see all the documents in the DB.

Using MongoDB in Python

In the code above we simply make a connection to the MongoDB instance for this application and then execute a query. The pymongo package provides all the functionality to interact with the MongoDB instance from our Python code. The pymongo commands are very similar to the MongoDB command-line interaction, except that two-word commands like db.collection.findOne are split with an underscore, such as db.collection.find_one. Please go to the pymongo site to read the documentation.

Notice we use the environment variables to specify the connection URL. While not hard coding database connection parameters is good practice in non-cloud apps, in our case you MUST use the environment variables. Since your app can be idled and then spun up or it could be autoscaled, the IP and ports are not always guaranteed. By using the environment variables we make our code portable.

We pass the result set (a cursor, which becomes a list of Python dictionaries) into json.dumps so we can return JSON straight to the client. Since the documents contain BSON-specific types, such as ObjectId, that the standard json module cannot serialize, we need to pass json_util.default from the bson library into the json.dumps command.
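In other words, json_util.default acts as the default= hook that json.dumps consults for values it cannot serialize on its own. A minimal stand-in (handling datetime instead of bson's full type set) shows the mechanism:

```python
import json
from datetime import datetime

def mongo_default(value):
    """Minimal stand-in for bson.json_util.default: tell json.dumps how to
    render types it does not understand (here, just datetime)."""
    if isinstance(value, datetime):
        return {"$date": value.isoformat()}
    raise TypeError("not JSON serializable: %r" % (value,))

doc = {"name": "Yellowstone", "created": datetime(2012, 8, 18)}
encoded = json.dumps(doc, default=mongo_default)
print(encoded)
# -> {"name": "Yellowstone", "created": {"$date": "2012-08-18T00:00:00"}}
```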

This is probably the easiest experience I have ever had writing a web service. I love Flask, Pymongo, and Python for the simplicity of “Just Getting Stuff Done”.

Grab just one park

Next we will implement the code to get back a park given the park’s unique ID. For the ID we will just use the ID generated by MongoDB on document insertion (_id). The ID looks like a long random sequence, and that is what we will pass into the URL.

return a specific park given its mongo _id

@app.route("/ws/parks/park/<parkId>")
def onePark(parkId):
    # setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks

    # query based on the ObjectId
    result = db.parkpoints.find({'_id': objectid.ObjectId(parkId)})

    # turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

Here you have to use another class from the bson library - ObjectId. The actual _id in MongoDB is an object, so we have to take the ID string passed in on the URL and create an ObjectId from it. The ObjectId class allows us to create one of these objects to pass into the query. Other than that, the code is the same as above.
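One production note: passing a malformed string to the ObjectId constructor raises an exception, so you might pre-validate the URL fragment first. A minimal sketch (looks_like_objectid is a hypothetical helper, not part of bson) relies on the fact that a serialized ObjectId is exactly 24 hexadecimal characters:

```python
import re

OBJECTID_RE = re.compile(r'^[0-9a-fA-F]{24}$')

def looks_like_objectid(s):
    # A serialized MongoDB ObjectId is exactly 24 hex characters.
    # Checking first lets the web service return a clean 404 instead of
    # letting the ObjectId constructor raise on garbage input.
    return bool(OBJECTID_RE.match(s))

print(looks_like_objectid("500c680c1fe9193b67b898a3"))  # True
print(looks_like_objectid("not-an-id"))                 # False
```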

This little snippet also shows an example of grabbing part of the URL and passing it to a function. I explained this concept above but here we can see it in practice.

Time for the spatial query

Here we do a query to find national parks near a latitude/longitude pair

find parks near a lat and long passed in as query parameters (near?lat=45.5&lon=-82)

    @app.route("/ws/parks/near")
    def near():
        # setup the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # get the request parameters
        lat = float(request.args.get('lat'))
        lon = float(request.args.get('lon'))
        # use the request parameters in the query
        result = db.parkpoints.find({"pos": {"$near": [lon, lat]}})
        # turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

This piece of code shows how to get request parameters from the URL. We capture the lat and lon from the request URL and then cast them to floats to use in our query. Remember, everything in a URL comes across as a string, so it needs to be converted before being used in the query. In a production app you would need to make sure that the values you were passed could actually be parsed as floating-point numbers. Since this app is just for demo purposes, I am not going to show that here.
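If you did want that parameter checking, a sketch might look like this (parse_coord is a hypothetical helper; the bounds passed in are the valid ranges for longitude and latitude):

```python
def parse_coord(value, lo, hi):
    # Parse a query-string value as a float and range-check it.
    # Returns the float, or None if the value is missing, malformed,
    # or outside the [lo, hi] range.
    try:
        coord = float(value)
    except (TypeError, ValueError):
        return None
    return coord if lo <= coord <= hi else None

print(parse_coord("-82.5", -180, 180))  # -82.5
print(parse_coord("abc", -180, 180))    # None
print(parse_coord(None, -90, 90))       # None
```

In the Flask handler you would return a 400 response whenever the helper returns None.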

Once we have the coordinates, we pass them into the query just like we did from the command-line MongoDB client. The results come back in distance order from the point passed into the query. Remember, the ordering of the coordinates passed into the query needs to match the ordering of the coordinates in your MongoDB collection.

Finish it off with a Regex query with spatial goodness

The final piece of code we are going to write allows for a query based both on the name and the location of interest.

find parks with a certain name (using regex) near a lat long pair such as above

    @app.route("/ws/parks/name/near/<name>")
    def nameNear(name):
        # setup the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # get the request parameters
        lat = float(request.args.get('lat'))
        lon = float(request.args.get('lon'))
        # compile the regex we want to search for and make it case insensitive
        myregex = re.compile(name, re.I)
        # use the request parameters in the query along with the regex
        result = db.parkpoints.find({"Name": myregex, "pos": {"$near": [lon, lat]}})
        # turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

Just like the example above, we parse the lat and lon out of the URL query parameters. Looking back at my architecture, I think it might have been better to add the name as a query parameter as well, but this will still work for this article. We grab the name from the end of the URL path and then compile it into a standard Python regular expression (regex). I added re.I to make the regex case-insensitive. I then use the regex to search against the Name field in the document collection and do a geo search against the pos field. Again, the results will come back in distance order from the point passed into the query.
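One more production note: compiling raw user input as a regex means any metacharacters in the name are interpreted as pattern syntax. The standard library’s re.escape avoids that (safe_name_regex is a hypothetical helper, not part of the app above):

```python
import re

def safe_name_regex(user_input):
    # Escape regex metacharacters so the user's text matches literally,
    # then compile case-insensitively, mirroring the re.I flag used above.
    return re.compile(re.escape(user_input), re.I)

pattern = safe_name_regex("yellow")
print(bool(pattern.search("Yellowstone National Park")))  # True
# Metacharacters no longer change the meaning of the search:
print(bool(safe_name_regex("a+b").search("a+b trailhead")))  # True
```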


And with that we have wrapped up our little web service code - simple and easy using Python and MongoDB. Again, some further changes are required for going to production, such as request parameter checking, better URL patterns, exception handling, and perhaps a check-in URL - but overall this should put you well on your way. There are examples of:

  • Using Flask to write some nice REST style services in Python
  • Various methods to get URL information so you can use it in your code
  • How to interact with your MongoDB in Python using PyMongo and BSON libraries
  • Getting spatial data out of your application

Give it all a try on OpenShift and drop me a line to show me what you built. I can’t wait to see all the interesting spatial apps built by shifters.

Designing MongoDB Schemas with Embedded, Non-Embedded and Bucket Structures

Aug 10 • Posted 2 years ago

This was originally posted to the Red Hat OpenShift blog

With the rapid adoption of schema-less, NoSQL data stores like MongoDB, Cassandra and Riak in the last few years, developers now have the ability to enjoy greater agility when it comes to their application’s persistence model. However, just because a datastore is schema-less doesn’t mean the structure of the stored documents won’t play an important role in the overall performance and resilience of the application. In this first of a four-part blog series about MongoDB, we’ll explore a few strategies you should consider when designing your document structure.

Application requirements should drive schema design

If you ask a dozen experienced developers to design the relational database structure of an application, such as a book review site, it’s likely that each of the structures will be very similar. You’ll likely see tables for authors, books, commenters and comments, and so on. The likelihood of having varied relational structures is small because relational database structures are generally well understood. However, if you ask a dozen experienced NoSQL developers to create a similar structure, you’re likely to get a dozen different answers.

Why is there so much variability when it comes to designing a NoSQL schema? To optimize application performance and reliability, a NoSQL schema must be driven by the application’s use case. It’s a novel idea, but it works. Luckily, there are only a few key factors you need to understand when deriving your schema from application requirements. These factors include:

  • How your documents reference child collections
  • The structure and the use of indexes
  • How your data will be sharded

Elements of MongoDB Schemas

Of these factors, how your documents reference child collections, or embedding, is the most important decision you need to make. This point is best demonstrated with an example.

Suppose we’re building the book review site we mentioned in the introduction. Our application will have authors and books, as well as reviews with threaded comments. How should we structure the collections? Unfortunately, the answer depends on the number of comments we’re expecting per book and how frequently comments are read vs. written. Let’s look at our possible use cases.

The first possibility is where we’re only going to have a few dozen reviews per book, and each review is likely to have a few hundred comments. In this case, embedding the reviews and comments with the book is a viable possibility. Here’s what that might look like:

Listing 1 – Embedded

    // Books
    {
        "_id": ObjectId("500c680c1fe9193b67b898a3"),
        "publisher": "O'Reilly Media",
        "isbn": "978-1-4493-8156-1",
        "description": "How does MongoDB help you…",
        "title": "MongoDB: The Definitive Guide",
        "formats": ["Print", "Ebook", "Safari Books Online"],
        "authors": [
            { "lastName": "Chodorow", "firstName": "Kristina" },
            { "lastName": "Dirolf", "firstName": "Michael" }
        ],
        "pages": "210"
    }

    // Reviews
    {
        "_id": ObjectId("500c680c1fe9193b67b898a4"),
        "rating": 5,
        "description": "The Authors made an excellent work…",
        "title": "One of O'Reilly excellent books",
        "created": ISODate("2012-07-04T09:48:17Z"),
        "book_id": { "$ref": "books", "$id": ObjectId("500c680c1fe9193b67b898a3") },
        "reviewer": "Giuseppe"
    }

    // Comments
    {
        "_id": ObjectId("500c680c1fe9193b67b898a5"),
        "comment": "This review helped me choose the correct book.",
        "commenter": "Nick",
        "review_id": { "$ref": "reviews", "$id": ObjectId("500c680c1fe9193b67b898a4") },
        "created": ISODate("2012-07-20T13:15:37Z")
    }

While simple, this method does have some trade-offs. First, our reviews and comments are strewn throughout the disk. We’re potentially loading thousands of documents to display a page. This leads us to another common embedding strategy – “buckets”.

By bucketing review comments, we can maintain the benefit of fewer reads to display substantial amounts of content, while at the same time maintaining fast writes to smaller documents. An example of a bucketed structure is presented below:

Figure 1 – Hybrid Structure

In this example, the bucket, or hybrid, structure breaks the comments into chunks of roughly 100 comments. Each comment bucket maintains a reference to the parent review, as well as its page number and current count of contained comments.
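As a sketch, a single comment bucket document might look like the following (the field names and bucket size here are illustrative, not prescribed by MongoDB):

```python
# One bucket of comments for a review: a back-reference to the parent,
# the bucket's page number, and a count used for bookkeeping.
BUCKET_SIZE = 100  # roughly how many comments each bucket holds

comment_bucket = {
    "review_id": "500c680c1fe9193b67b898a4",  # parent review's _id
    "page": 1,
    "count": 2,
    "comments": [
        {"commenter": "Nick", "comment": "This review helped me choose."},
        {"commenter": "Dana", "comment": "Agreed - a great summary."},
    ],
}

# The application appends comments until the bucket fills, then starts page 2.
print(comment_bucket["count"] >= BUCKET_SIZE)  # False
```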

Of course, as software developers, we’re painfully aware there’s no free lunch. The downside to buckets is the increased complexity your application has to deal with. The previous strategies were trivial to implement from an application perspective, but suffered from inefficiencies at scale. Buckets address these inefficiencies, but your application has to do a bit more bookkeeping, such as keeping track of the number of comment buckets for a given review.


My own personal projects with MongoDB have used each one of these strategies at one point or another, but I’ve always grown into more complicated strategies from the most basic, as the application requirements changed. One of the benefits of MongoDB is the ability to change your storage strategy at will and you shouldn’t be afraid to take advantage of this flexibility. By starting simple, you can maintain development velocity early and migrate to a more scalable strategy as the need arises. Stay tuned for additional blogs in this series covering the use of MongoDB indexes, sharding and replica sets.

If you are interested in experimenting with a few of the concepts without having to download and install MongoDB, try it on Red Hat’s OpenShift. It’s FREE to sign up, and all it takes is an email - you’re minutes away from having a MongoDB instance running in the cloud.


Introducing Mongo Connector

Aug 10 • Posted 2 years ago

MongoDB is a great general-purpose data store, but for some workflows, you may want to use another tool or integrate data from MongoDB into another system. To address this common need, we built Mongo Connector, a generic connection system that you can use to integrate MongoDB with any system that supports simple CRUD operational semantics (i.e., insert, update, delete, and search operations).

Use cases for this system include:

  • Connecting MongoDB to search engines for more advanced search.
  • Creating a secondary, backup MongoDB cluster that uses Mongo Connector to keep both clusters in sync.
  • Storing specific collections or specific information in other, possibly relational, database systems.
  • Connecting MongoDB to integration platforms such as Mule
  • Dumping your data from MongoDB to any other storage systems, with support to stop and restart the dump at any point.

On startup, Mongo Connector copies your documents from MongoDB to your target system. Afterwards, it constantly performs updates on the target system to keep MongoDB and the target in sync. The connector supports both Sharded Clusters and standalone Replica Sets, hiding the internal complexities such as rollbacks and chunk migrations. Mongo Connector abstracts the MongoDB internals so you only have to implement one class: the DocManager.

The DocManager is a small, lightweight, and, most importantly, simple-to-write class that defines a limited number of CRUD operations for the target system. The DocManager API explains what functions must be implemented, and Mongo Connector uses those functions to link up MongoDB and the target system.
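To get a feel for the shape of that interface, here is a toy DocManager targeting an in-memory dict. The method names follow the CRUD operations described above, but the real DocManager API defines the exact signatures, so treat this as a sketch rather than a drop-in implementation:

```python
class InMemoryDocManager:
    # A toy target system: documents stored in a dict keyed by _id.
    def __init__(self):
        self.store = {}

    def upsert(self, doc):
        # Insert the document, or replace it if the _id already exists.
        self.store[doc["_id"]] = doc

    def remove(self, doc):
        # Delete by _id; ignore documents we never stored.
        self.store.pop(doc["_id"], None)

    def search(self, **criteria):
        # Return every document matching all given field/value pairs.
        return [d for d in self.store.values()
                if all(d.get(k) == v for k, v in criteria.items())]

dm = InMemoryDocManager()
dm.upsert({"_id": 1, "name": "Yellowstone"})
dm.upsert({"_id": 1, "name": "Yosemite"})  # second upsert updates in place
print([d["name"] for d in dm.search(_id=1)])  # ['Yosemite']
```

A real implementation would translate these calls into the target system’s own API (Solr commits, ElasticSearch index requests, and so on).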

For the first release, we have implementations of the DocManager for Solr, ElasticSearch, and, of course, MongoDB (if you want to connect your MongoDB to another MongoDB instance).

To install Mongo Connector, issue the following command at your system’s shell:

pip install mongo-connector

After that, start the Mongo Connector. For example, suppose there is a Sharded Cluster with a mongos running on localhost:27217, a Solr search server running on localhost:8080, and the Solr access URL being http://localhost:8080/solr. Then, use the following command to have Mongo Connector sync the MongoDB cluster with Solr:

mongo-connector -m localhost:27217 -t http://localhost:8080/solr

The connector will start syncing the data to the Solr connection at http://localhost:8080/solr.

Check out our GitHub repo for requests for new doc managers, bug reports, and documentation on Mongo Connector.

About us: Mongo Connector was designed, coded, tested, packaged, and released by Leonardo Stedile and Aayush Upadhyay, two of 10gen’s summer interns. Special thanks to Spencer Brody and Randolph Tan, our two mentors. We hope you find Mongo Connector useful, and that it helps you build awesome things with MongoDB.

MacOSX Preferences Pane for MongoDB

Aug 7 • Posted 2 years ago

This is a guest post from

Rémy Saissy of OCTO Technology

In my work as a developer, I keep a full development environment with several MongoDB instances and data sets on my laptop. As an OS X user, I love having beautiful and efficient applications to do everything.

Today, I have the pleasure of announcing the release of the MacOSX Preferences Pane for MongoDB.

What is it for?

The MacOSX preferences pane for MongoDB aims to provide a simple and efficient user interface to control the status of a local MongoDB server, just like the MySQL Preferences Pane.

My focus has been on simplicity, and it has the following features:

  • It runs on MacOSX Snow Leopard, Lion and Mountain Lion
  • You can manually start and stop the MongoDB server from your system control panel.
  • You can configure MongoDB to start and stop automatically with your system.

If you use Homebrew and you have customized your system’s launchd plist, the MacOSX Preferences Pane for MongoDB will:

  • migrate your existing launchd configuration for use with the preferences pane
  • keep all your launchd configuration customizations through enable/disable cycles

To prevent upgrade issues from taking time and attention, the preferences pane comes with an automatic update mechanism. Once a new version has been installed, the preferences pane will simply ask you to restart it to start using the new version.


Sounds good but I am not an English speaker

The preferences pane for MongoDB comes in several languages:

  • English
  • French
  • Simplified Chinese
  • Spanish
  • Brazilian Portuguese

Feel free to contribute by adding a new language!


Since it is only a preferences pane, it does not embed a MongoDB server. Therefore, the first thing you have to do is install MongoDB.

A simple way to accomplish this is to use Homebrew:

$ brew install mongodb


The MongoDB Preferences Pane is available on GitHub:

    1. Download the latest version:
    2. Unzip
    3. Double click on MongoDB.prefPane

That’s all.

I hope this will be useful. Do not hesitate to contribute and send me your feedback!

MongoDB Blogroll: The Best of July 2012 

Aug 2 • Posted 2 years ago

Every month, we’ll be publishing the best community blog posts from the month. Here is the digest for July:

Want your blog post to be included in the next update? Tweet it out with the #mongodb hashtag or send it to us directly.

MongoDB on Windows Azure

Jul 19 • Posted 2 years ago

This post originally appeared on the Microsoft Interoperability Blog.  

Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.

People have been using MongoDB on Windows Azure for some time (for example), but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!


MongoDB is a very popular NoSQL database that stores data in collections of BSON (binary JSON) objects. It is very easy to learn if you have JavaScript (or Node.js) experience, featuring a JavaScript interpreter shell for administrating databases, JSON syntax for data updates and queries, and JavaScript-based map/reduce operations on the server. It is also known for a simple but flexible replication architecture based on replica sets, as well as sharding capabilities for load balancing and high availability. MongoDB is used in many high-volume web sites including Craigslist, FourSquare, Shutterfly, The New York Times, MTV, and others.

If you’re new to MongoDB, the best way to get started is to jump right in and start playing with it. Follow the instructions for your operating system from the list of Quickstart guides, and within a couple of minutes you’ll have a live MongoDB installation ready to use on your local machine. Then you can go through the tutorial to learn the basics of creating databases and collections, inserting and updating documents, querying your data, and other common operations.

MongoDB Installer for Windows Azure

The MongoDB Installer for Windows Azure is a command-line tool (Windows PowerShell script) that automates the provisioning and deployment of MongoDB replica sets on Windows Azure virtual machines. You just need to specify a few options such as the number of nodes and the DNS prefix, and the installer will provision virtual machines, deploy MongoDB to them, and configure a replica set.

Once you have a replica set deployed, you’re ready to build your application or service. The tutorial How to deploy a PHP application using MongoDB on Windows Azure takes you through the steps involved for a simple demo app, including the details of configuring and deploying your application as a cloud service in Windows Azure. If you’re a PHP developer who is new to MongoDB, you may want to also check out the MongoDB tutorial.

Developer Choice

MongoDB is also supported by a wide array of programming languages, as you can see on the Drivers page. The example above is PHP-based, but if you’re a Node.js developer you can find the tutorial Node.js Web Application with Storage on MongoDB over on the Developer Center, and for .NET developers looking to take advantage of MongoDB (either on Windows Azure or Windows), be sure to register for the free July 19 webinar that will cover the latest features of the MongoDB .NET driver in detail.

The team at Microsoft Open Technologies is looking forward to working closely with 10gen to continue to improve the MongoDB developer experience on Windows Azure going forward. We’ll keep you updated here as that collaboration continues!
