







We are pleased to announce the initial release of Edda. Edda is a tool for MongoDB that takes mongod log files and generates easy-to-parse pictures of the represented servers.

Edda showing a five-member set with replication paths and member states.
MongoDB servers generate some pretty substantial log files. These lengthy logs are one of the more important tools we have for diagnosing issues with MongoDB servers. However, correlating logs from multiple servers can be time-consuming. Enter Edda, a log visualizer for MongoDB. We hope that this tool will be helpful to MongoDB administrators.

Possible states represented.
For Edda’s first release, we focused on visualizing replica sets. We plan to support visualizing logs from sharded clusters in the future.

A three-member set with one primary, one secondary, and one down node.
Want to try Edda? Install it with pip!
$ pip install edda
Then run Edda from the command line, giving one or more log files for it to parse:
$ edda server1.log server2.log server3.log
Edda requires a mongod to be running. Once Edda has parsed the logs, it will pop up a browser window with a timeline of the events.

You can run Edda on any subset of log files available. This is an example of running Edda on one log file from a seven-member replica set.
Check out our GitHub repo for feature requests, bug reports, and further documentation on Edda: https://github.com/kchodorow/edda
A bit about the team: Edda was designed, coded, tested, packaged, and released by Samantha Ritter and Kaushal Parikh, two of 10gen’s summer interns. We are so happy to have the chance to build a tool for MongoDB and see it through its first release.
This is a guest post from Rick Copeland of Arborian.
Ming is a Python toolkit providing schema enforcement, an object/document mapper, an in-memory database, and various other goodies developed at SourceForge during our rewrite of the site from a PHP/Postgres stack to a Python/MongoDB one.
Why Ming?
If you’ve come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we’ll gloss over that for now), everything is just Python dictionaries, and it’s so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.
MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn’t tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it’s easy to iterate on your schema quickly in development, it’s also easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.
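To make that risk concrete, here is a minimal plain-Python sketch of what happens when documents in one collection disagree about their fields, and how a simple type map can catch it. This is not Ming's API: the `validate` helper, the toy `schema` dictionary, and the sample documents are all hypothetical and exist only for illustration.

```python
# Hypothetical documents from one collection; the second lacks "author".
posts = [
    {"title": "First post", "author": "rick446"},
    {"title": "Second post"},
]

# A toy schema: field name -> required Python type. This is an illustrative
# stand-in, not how Ming represents schemas internally.
schema = {"title": str, "author": str}


def validate(doc, schema):
    """Return a list of problems found in doc; an empty list means it conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append("missing field: " + field)
        elif not isinstance(doc[field], expected):
            problems.append("wrong type for field: " + field)
    return problems


# Filtering on "author" silently skips the document that has no such field.
by_author = [p for p in posts if p.get("author") == "rick446"]

# Checking every document against the schema surfaces the inconsistency.
report = [validate(p, schema) for p in posts]
```

The query itself never fails; it just returns fewer documents than you might expect, which is why checking structure at the application layer is useful.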
The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. At SourceForge, we created Ming (as in “…the Merciless”, the villain who ruled the planet Mongo in Flash Gordon) to deal with precisely this problem. We wanted a (thin) layer on top of PyMongo that would do a couple of things for you:
Ming’s architecture is based on the excellent SQL toolkit SQLAlchemy. While much younger than SQLAlchemy and not including any of its code, Ming takes its design inspiration from there.
Ming actually consists of a number of components, including:
pymongo driver used for testing your application without needing to have access to a MongoDB server.
Let’s take a look at each of these components in turn…
Ming Schema Enforcement
A Ming schema is fairly straightforward. Below is an example containing the schema for a blog post in both the imperative and declarative syntaxes:
from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()  # ming abstraction for database

# Set up the User schema ahead-of-time
User = dict(username=str, display_name=str)

# "Imperative" style
BlogPost = collection(
    'blog.posts', session,
    Field('_id', S.ObjectId),
    Field('posted', datetime, if_missing=datetime.utcnow),
    Field('title', str),
    Field('author', User),
    Field('text', str),
    Field('comments', [
        dict(author=User,
             posted=S.DateTime(if_missing=datetime.utcnow),
             text=str) ]))

# "Declarative" style
from ming.declarative import Document

class BlogPost(Document):
    class __mongometa__:
        session=session
        name='blog.posts'
        indexes=['author.name', 'comments.author.name']

    _id=Field(str)
    title=Field(str)
    posted=Field(datetime, if_missing=datetime.utcnow)
    author=Field(User)
    text=Field(str)
    comments=Field([
        dict(author=User,
             posted=datetime,
             text=str) ])
Once you have your schema set up, you can use it to perform all the same operations you can do in pymongo using the manager object attached to the attribute m:
# Bind the session to the database
from ming.datastore import DataStore
session.bind = DataStore(
    'mongodb://localhost:27017', database='test')
# Queries
BlogPost.m.find(...) # equiv. to db.blog.posts.find(...)
# Inserts
post0 = BlogPost(dict(... fields here ... ))
post0.m.insert()
# Updates using save()
post1 = BlogPost.m.find({'author.username': 'rick446'}).first()
post1.author.username = 'rick447'
post1.m.save()
# Updates using update_partial()
BlogPost.m.update_partial(
    { '_id': ... },
    { '$push': { 'comments': {... comment data...} } })
# Deletes
post1.m.delete() # single document
BlogPost.m.remove({...query...}) # delete by query
The Object-Document Mapper
Building on the schema enforcement layer is the object-document mapper, which provides two useful patterns:
flush() them all to the database at once.
Ming also allows you to model relationships between your documents via ForeignIdProperty and RelationProperty. Here is an example schema for a blog hosting site with multiple blogs:
from ming import schema as S
from ming.odm.declarative import MappedClass
from ming.odm.property import FieldProperty, RelationProperty
from ming.odm.property import ForeignIdProperty
from ming.odm import ODMSession

# wrap the session from the schema layer
odm_session = ODMSession(session)

class Blog(MappedClass):
    class __mongometa__:
        session = odm_session
        name = 'blog.blog'

    _id = FieldProperty(S.ObjectId)
    name = FieldProperty(str)
    posts = RelationProperty('Post')

class Post(MappedClass):
    class __mongometa__:
        session = odm_session
        name = 'blog.posts'

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    blog_id = ForeignIdProperty(Blog)
    blog = RelationProperty(Blog)
Once you have the classes defined, you can load and modify the objects, using the odm_session to save your changes to MongoDB:
# Queries
Blog.query.find(...)  # equiv. to db.blog.blog.find(...)
blog = Blog.query.get(name='MongoDB Blog')
blog.posts  # returns a list of post objects for the blog
blog.posts[0].blog  # returns the blog object
# Inserts
post = Post(blog=blog, ...)  # automatically sets blog_id
# Updates
post.title = 'The cool post'
# Save your changes
odm_session.flush()
# Mark post for deletion
post.delete()
# Actually delete
odm_session.flush()
MongoDB-in-Memory
The third main component of Ming is MongoDB-in-Memory (MIM), an implementation of the pymongo API that allows you to test your application without having a dependency on a MongoDB server. To use MIM, you can swap out the creation of your pymongo connection:
from ming import mim
import unittest

class TestCase(unittest.TestCase):
    def setUp(self):
        # self.connection = Connection()
        self.connection = mim.Connection()
MIM’s support of the pymongo API and MongoDB query syntax has largely been driven by the various APIs and queries used internally at SourceForge, so there are some gaps, but these are rapidly filled when reported. For instance, MIM does provide support for gridfs and mapreduce already (mapreduce JavaScript support provided by python-spidermonkey). And of course MIM integrates well with the rest of Ming, allowing you to substitute a mim:// URL for the normal mongodb:// URL in your datastore:
from ming import mim
from ming.datastore import DataStore
import unittest

class TestCase(unittest.TestCase):
    def setUp(self):
        # mim:// in place of mongodb:// selects the in-memory implementation
        self.ds = DataStore(
            'mim://localhost:27017', database='test')
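The idea behind an in-memory stand-in can be sketched without Ming at all. The following is a rough illustration only: `FakeCollection` and `count_posts_by` are hypothetical names, the fake supports nothing beyond exact-match queries, and it makes no attempt to mirror MIM's real coverage of the pymongo API.

```python
import unittest


class FakeCollection:
    """A toy in-memory stand-in for a collection (hypothetical, not MIM itself).

    It supports only insert() and exact-match find().
    """

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Store a copy so later changes by the caller do not leak in.
        self._docs.append(dict(doc))

    def find(self, query):
        # A document matches when every queried field has the requested value.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]


def count_posts_by(collection, username):
    """Application code under test: it only needs a find() method."""
    return len(collection.find({"author": username}))


class CountPostsTest(unittest.TestCase):
    def setUp(self):
        # No server needed: a fresh fake is created for each test.
        self.posts = FakeCollection()
        self.posts.insert({"author": "rick446", "title": "Ming"})
        self.posts.insert({"author": "someone", "title": "Other"})

    def test_count(self):
        self.assertEqual(count_posts_by(self.posts, "rick446"), 1)
```

Because the application code depends only on the methods it calls, the test can hand it either the fake or a real collection.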
Conclusion
There are other good bits in Ming, including lazy and eager migrations, support for the MongoDB filesystem gridfs, WSGI auto-flushing middleware for the ODMSession, and more. We’re also experimenting with support for GQL, Google’s query language for the Google App Engine (GAE), to facilitate porting apps from GAE to MongoDB. Ming is actively maintained and is a mission-critical part of the SourceForge application stack, where it’s been in production use for over 2 years.
So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let us know in the comments below!
To learn more about development with Ming, check out Rick’s ebook MongoDB with Python and Ming or visit the Atlanta MongoDB User Group on Wednesday, where Rick is presenting.
This post originally appeared on the Microsoft Interoperability Blog.
Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.
People have been using MongoDB on Windows Azure for some time, but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!
MongoDB
MongoDB is a very popular NoSQL database that stores data in collections of BSON (binary JSON) objects. It is very easy to learn if you have JavaScript (or Node.js) experience, featuring a JavaScript interpreter shell for administering databases, JSON syntax for data updates and queries, and JavaScript-based map/reduce operations on the server. It is also known for a simple but flexible replication architecture based on replica sets, as well as sharding capabilities for load balancing and high availability. MongoDB is used in many high-volume web sites including Craigslist, Foursquare, Shutterfly, The New York Times, MTV, and others.
If you’re new to MongoDB, the best way to get started is to jump right in and start playing with it. Follow the instructions for your operating system from the list of Quickstart guides on MongoDB.org, and within a couple of minutes you’ll have a live MongoDB installation ready to use on your local machine. Then you can go through the MongoDB.org tutorial to learn the basics of creating databases and collections, inserting and updating documents, querying your data, and other common operations.
MongoDB Installer for Windows Azure
The MongoDB Installer for Windows Azure is a command-line tool (Windows PowerShell script) that automates the provisioning and deployment of MongoDB replica sets on Windows Azure virtual machines. You just need to specify a few options such as the number of nodes and the DNS prefix, and the installer will provision virtual machines, deploy MongoDB to them, and configure a replica set.
Once you have a replica set deployed, you’re ready to build your application or service. The tutorial How to deploy a PHP application using MongoDB on Windows Azure takes you through the steps involved for a simple demo app, including the details of configuring and deploying your application as a cloud service in Windows Azure. If you’re a PHP developer who is new to MongoDB, you may want to also check out the MongoDB tutorial on php.net.
Developer Choice
MongoDB is also supported by a wide array of programming languages, as you can see on the Drivers page of MongoDB.org. The example above is PHP-based, but if you’re a Node.js developer you can find the tutorial Node.js Web Application with Storage on MongoDB over on the Developer Center, and for .NET developers looking to take advantage of MongoDB (either on Windows Azure or Windows), be sure to register for the free July 19 webinar that will cover the latest features of the MongoDB .NET driver in detail.
The team at Microsoft Open Technologies is looking forward to working closely with 10gen to continue to improve the MongoDB developer experience on Windows Azure going forward. We’ll keep you updated here as that collaboration continues!
The MongoDB shell (mongo) is an extended SpiderMonkey (JavaScript) shell, so you can use it to execute JavaScript code just like you’re used to writing.
The shell is best at things like testing out queries, examining specific records, configuring replica sets and sharding, and administrative tasks like creating indexes.
Many objects in the shell have help functions in case you forget how to do things. When in doubt:
> help
> db.help()
DB methods:
db.addUser(username, password[, readOnly=false])
db.auth(username, password)
...
> db.demo.help()
DBCollection help
db.demo.find().help() - show DBCursor help
...
Some of the most common tasks the shell is needed for involve sending commands to the MongoDB server, such as changing profiler settings. These kinds of functions are performed by using the shell to send database commands, for example db.runCommand("shutdown") or db.runCommand({profile:-1}). You can get info on these commands and how to use them from the shell like this:
> db.listCommands() // print a listing of all the available commands
> db.commandHelp("compact") // show details about how to use the "compact" database command.
Sometimes, it’s useful to see how a particular shell function works - you can do this by leaving off the ( ), which forces the shell to print the source code of the function instead of just executing it. For example:
> db.printReplicationInfo() // print info about current replication status
...
> db.printReplicationInfo // print the source code of the printReplicationInfo function
function () {
...
}
In the shell, the default behavior when executing a query is to print the output, unformatted.
db.posts.find()
{ "_id" : ObjectId("4e697832c67f0623d40000ad"), "content" : "lorem ipsum" }
. . .
{ "_id" : ObjectId("4e697832c67f0623d40000f0"), "content" : "four score and 7 years ago" }
has more
Try adding .pretty() to the function to format the output in a more readable manner, like this:
db.posts.find().pretty()
By default, queries with lots of results only print 20 results at a time. Type "it" at the shell to show the next batch of 20. You can adjust this size limit by setting the value of DBQuery.shellBatchSize.
When the mongo shell starts, it will read and execute any JavaScript code in the file .mongorc.js in your home directory. By adding your settings, utility functions, or tweaks into this file you can make them available in every shell session. For example, add this snippet of code to the file, and it will allow you to run the inspect() function on any JavaScript object in the shell, which will print information about its properties:
function inspect(o, i) {
    if (typeof i == "undefined") {
        i = "";
    }
    if (i.length > 50) {
        return "[MAX ITERATIONS]";
    }
    var r = [];
    for (var p in o) {
        var t = typeof o[p];
        // Recurse into nested objects, indenting one more level each time
        r.push(i + "\"" + p + "\" (" + t + ") => " + (t == "object" ? "object:" + inspect(o[p], i + " ") : o[p] + ""));
    }
    return r.join(i + "\n");
}
You can also add snippets of code in your .mongorc.js file to do other cool stuff, like customize your shell prompt. In addition, you can execute any file containing JavaScript code from within the shell by calling load(filename).
Frequently it’s useful to execute some mongo shell commands without leaving your operating system shell, for example to pipe the output to another process or redirect to a file. This can be done easily by just calling the shell command with the --eval option followed by the JavaScript you want to execute. Just be aware that the output isn’t automatically printed by the shell process with this approach, so if you need to print the JSON representation of documents in your queries, you will need to explicitly use the printjson() function to generate correct output. For example:
$ mongo --eval "db.posts.find().forEach(function(x){printjson(x)})"
{ "_id" : ObjectId("4fbec0b9f3ecac6f43bc1c13"), "x" : 10 }
...
When writing statements in the shell that leave an open bracket, parenthesis, or quote, hitting enter will prompt you with “…” for more input. So if you need to write a long block of code, you can let it span multiple lines:
> for(var i=0;i<100;i++){
…
If you screw up or want to cancel it and start over, just hit enter twice - the entire block of code will be aborted.
A new shell feature available in versions 2.1.x and later is the ability to edit blocks of code using a text editor. Use the “edit” keyword with the name of a function, and it will invoke your editor with the block of source code for that function:
> edit testfunc
// now we get dropped into an editor where we can edit code for the function
> testfunc // show the source of the function we just wrote
function testfunc() {
print("hello world!");
}
> testfunc()
hello world!
This is part 2 of a series, with part 1 covering the bare essentials to get you going. In this post we are going to take a closer look at queries and how indexes work in MongoDB.
Introduction
I’d like to kick off this post with a thanks to the folks behind the PHP extension for MongoDB, who have done a fantastic job of matching the functionality of the Mongo shell client. This is important when you start to see how similarly the two function, and you might find that you can tweak your logic using the shell and quickly implement the same logic from within PHP.
The PHP extension supports something that is rather new to a lot of folks in the PHP world, a feature called method chaining: the ability to call several methods in a row on one object in a single expression. For example, you might want to run a query and then apply a limit to it. Most folks would think that this is two operations, and they are correct; however, with method chaining, you can do both in one statement, like this:
$result = $songs->find()->limit(2);
Of course this works in the Mongo shell too. You would basically do the same thing:
result = db.songs.find().limit(2);
For more reading on method chaining, there’s an excellent blog post about method chaining in PHP 5.
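The mechanism behind chaining is not specific to PHP or the shell: each modifier method returns the object itself, so the next call has something to act on. As a rough illustration in Python (used here only to keep the sketch self-contained), the `Cursor` class below is a hypothetical toy and not a class from any MongoDB driver.

```python
class Cursor:
    """A toy cursor (hypothetical) whose modifier methods return self."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._limit = None
        self._skip = 0

    def skip(self, n):
        self._skip = n
        return self  # returning self is what makes chaining possible

    def limit(self, n):
        self._limit = n
        return self

    def to_list(self):
        # Modifiers are applied only when the results are finally requested.
        docs = self._docs[self._skip:]
        return docs if self._limit is None else docs[:self._limit]


songs = ["a", "b", "c", "d"]

# Two modifiers and the final fetch, written as a single expression.
result = Cursor(songs).skip(1).limit(2).to_list()
```

Note that the toy records the modifiers and applies them only when the results are requested, which is one way a chained limit can take effect before any documents are returned.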
Before we dig deep into finding and manipulating your data, let’s discuss the different data types that MongoDB supports.
MongoDB Data Types
All databases have their own data types, and MongoDB is no different. A summary of MongoDB’s available data types is as follows:
_id. It is 12 bytes long, and is automatically created by the database when you insert a document without an _id property set. You can also set your own values, but remember that this is used as the primary key and so must be unique.
Thoughtful consideration needs to be given to the MongoId data type, as it is used as a primary key for most documents. It is recommended that you allow this feature to run automatically unless you have very specific needs and your own naturally unique primary keys.
A common mistake is the assumption by PHP engineers that MongoIds are strings. They are not. A MongoId is stored as an object. So if you are working with a document whose _id property is set as a MongoId instance with the value of 4cb4ab6d7addf98506010000, you will need to search for that document with an instance of the MongoId class with that value. For instance, imagine you are looking for the document with the previously mentioned MongoId as the _id property:
// This is only a string, this is NOT a MongoId
$mongoid = '4cb4ab6d7addf98506010000';
// You will not find anything by searching by string alone
$nothing = $collection->find(array('_id' => $mongoid));
echo $nothing->count(); // This should echo 0
// THIS is how you find something by MongoId
$realmongoid = new MongoId($mongoid);
// Pass the actual instance of the MongoId object to the query
$something = $collection->find(array('_id' => $realmongoid));
echo $something->count(); // This should echo 1
Always keep this in mind when working with MongoDB. Types are important here, just like PostgreSQL, which will punish you if you attempt to join using columns with slightly different data types.
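To see why the string and the identifier are different things, it can help to look at the bytes. The sketch below uses only the Python standard library (no driver) and relies on one fact about the format: the first four of the 12 bytes are a big-endian timestamp in seconds since the Unix epoch. The variable names are illustrative.

```python
import struct

hex_id = "4cb4ab6d7addf98506010000"  # the string form used in the example above
raw_id = bytes.fromhex(hex_id)       # the 12 raw bytes the identifier holds

# The first four bytes are a big-endian timestamp (seconds since the epoch).
(timestamp,) = struct.unpack(">I", raw_id[:4])

# The 24-character string and the 12-byte value are different types, which is
# why a query by string does not match a stored identifier.
same_type = type(hex_id) is type(raw_id)
```

The 24-character hexadecimal string is just a printable rendering of those 12 bytes; a query has to supply the typed value, not its rendering.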
Another note on the previous example: I assigned the result of the find() to variables $nothing and $something, and then called methods on them. That is because the find() method returns a recordset called a MongoCursor, which provides its own methods. You can get a count on the number of documents returned by a query, as well as iterate through them, and even get an explain plan to see how the query is being executed.
Here is a very common question: So what should I store for _id values in other collections such as user_id or article_id? The solution is simple: Always use MongoId instances. I’ve made the mistake of storing _id values as strings in another collection, and was rewarded by having to always instantiate a new MongoId object for every query. Apt punishment for not thinking things all the way through.
Simply put, if you have a users collection where a given user has an _id property, and they need to store that same value in the posts collection as author_id, then make sure you save author_id as a MongoId object and not a string. Otherwise, every time you wish to display the details of an author to a post, you have to manually instantiate author_id as a MongoId object so you can find the user document by _id primary key.
MongoDate is also stored as an object, as opposed to an integer or string. Like MongoIds, you need to treat MongoDate objects with additional care; however it’s then possible to do some neat things like find a document that has a MongoDate between 1971 and 1999 for instance:
// Instantiate dates for the range of the query
$start = new MongoDate(strtotime('1971-01-01 00:00:00'));
$end = new MongoDate(strtotime('1999-12-31 23:59:59'));
// Now find documents with create_date between 1971 and 1999
$collection->find(array("create_date" => array('$gt' => $start, '$lte' => $end)));
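One detail worth noticing in that query is that its two bounds behave differently: `$gt` excludes the start value itself, while `$lte` includes the end value. A small Python sketch of the same comparison (plain `datetime` objects standing in for MongoDate; the `in_range` helper is hypothetical) shows the effect at the edges:

```python
from datetime import datetime

start = datetime(1971, 1, 1, 0, 0, 0)
end = datetime(1999, 12, 31, 23, 59, 59)


def in_range(value):
    """Mirror {'$gt': start, '$lte': end}: exclusive lower, inclusive upper."""
    return start < value <= end


checks = [
    in_range(datetime(1971, 1, 1, 0, 0, 0)),       # exactly start: excluded
    in_range(datetime(1985, 6, 15)),               # inside the range
    in_range(datetime(1999, 12, 31, 23, 59, 59)),  # exactly end: included
    in_range(datetime(2000, 1, 1)),                # after end: excluded
]
```

If a document stamped at exactly midnight on 1971-01-01 should match, `$gte` would be the operator to use for the lower bound.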
Queries from PHP
Now it is time to construct some more complex documents to demonstrate how you can find and manipulate your data in MongoDB. Let’s create several documents with a few properties, including a nested array, nested document and a variety of data types discussed earlier in this post. Note the deliberate difference between strings and numbers, as I spell out numbers as strings for simplicity. I’m using the shell to insert these documents quickly and easily, and suggest you follow along:
one = {
"string" : "This is not my beautiful house",
"number" : 42,
"boolean" : true,
"list" : ["one", "two", "three"],
"doc" : {"one" : 1, "two" : 2}
};
db.things.save(one);
two = {
"string" : "This is not my beautiful wife",
"number" : 666,
"boolean" : false,
"list" : [1, 2, 3],
"doc" : {"1" : "one", "2" : "two"}
};
db.things.save(two);
three = {
"string" : "Same as it ever was",
"number" : 117,
"boolean" : true,
"list" : ["one", "two", "four"],
"doc" : {"one" : 1, "four" : 4}
};
db.things.save(three);
You probably want to see how this went, so you can get a nicely formatted list of what is in your things collection thusly:
> db.things.find().pretty()
{
"_id" : ObjectId("4fdc77f74e300a45bea9897a"),
"string" : "This is not my beautiful house",
"number" : 42,
"boolean" : true,
"list" : ["one", "two", "three"],
"doc" : {"one" : 1, "two" : 2}
}
{
"_id" : ObjectId("4fdc77f74e300a45bea9897b"),
"string" : "This is not my beautiful wife",
"number" : 666,
"boolean" : false,
"list" : [1, 2, 3],
"doc" : {"1" : "one", "2" : "two"}
}
{
"_id" : ObjectId("4fdc77f94e300a45bea9897c"),
"string" : "Same as it ever was",
"number" : 117,
"boolean" : true,
"list" : ["one", "two", "four"],
"doc" : {"one" : 1, "four" : 4}
}
Notice that each of your new documents has an _id property. Chances are your _id values are different than mine, as they have been designed to be unique based on hardware, time and other aspects. This is greatly useful when you are running hundreds (thousands!) of MongoDB servers and need a single value to be unique across all of them.
You can now do some interesting things both from the shell and PHP. I’m hopping back to PHP as, um, this is a series on PHP…
// Connect to MongoDB on localhost and select the test database
$m = new Mongo('mongodb://localhost');
$db = $m->test;
// Get the things collection
$c_things = $db->things;
// Get a count of documents in the things collection
$count_things = $c_things->count();
echo "There are $count_things documents in the things collection.\n";
// How many have the boolean property set to true?
$count_things = $c_things->count(array('boolean' => true));
echo "There are $count_things true documents in the things collection.\n";
// How many have a string property set, regardless of value?
$count_things = $c_things->count(array('string' => array('$exists' => true)));
echo "There are $count_things documents with strings in the things collection.\n";
// How many have a list property whose array values include "one" or "two"? ($in matches any listed value; $all would require both)
$count_things = $c_things->count(array('list' => array('$in' => array('one','two'))));
echo "There are $count_things documents with 'one' and 'two' as list array values in the things collection.\n";
// How many have a list property with array values not including 'three'?
$count_things = $c_things->count(array('list' => array('$nin' => array('three'))));
echo "There are $count_things documents not including the string 'three' in list array values in the things collection.\n";
// How many include 'ever was' in the string property? Using a regular expression:
$regex = new MongoRegex("/ever was/");
$count_things = $c_things->count(array('string' => $regex));
echo "There are $count_things documents including the string 'ever was' in string property in the things collection.\n";
This is what you should see when running this script on your machine:
$ php -f example.php
There are 3 documents in the things collection.
There are 2 true documents in the things collection.
There are 3 documents with strings in the things collection.
There are 2 documents with 'one' and 'two' as list array values in the things collection.
There are 2 documents not including the string 'three' in list array values in the things collection.
There are 1 documents including the string 'ever was' in string property in the things collection.
Most importantly, notice that we searched on values inside an embedded array as well as on top-level properties. We were able to search by the existence of a property, by the value set for a property, and by passing an array to see if any of those values were present in an embedded array.
That last example illustrates how MongoDB can search with regular expressions, which are case sensitive by default. There are a great many more query options available; the documentation on advanced queries is the best place to start.
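If the counts above are not obvious, it can help to work them out by hand. The following Python sketch re-creates the three sample documents and a toy matcher covering only the operators used in this post ($exists, $in, $nin, a regular expression, and plain equality). It is a simplified model for reasoning about the results, not how MongoDB evaluates queries, and the `matches` and `count` helpers are hypothetical.

```python
import re

things = [
    {"string": "This is not my beautiful house", "number": 42,
     "boolean": True, "list": ["one", "two", "three"]},
    {"string": "This is not my beautiful wife", "number": 666,
     "boolean": False, "list": [1, 2, 3]},
    {"string": "Same as it ever was", "number": 117,
     "boolean": True, "list": ["one", "two", "four"]},
]


def as_list(value):
    """Treat a scalar as a one-element list so arrays and scalars match alike."""
    return value if isinstance(value, list) else [value]


def matches(doc, field, cond):
    """Toy matcher for one field; handles only the cases used in this post."""
    present = field in doc
    if isinstance(cond, dict):
        if "$exists" in cond:
            return present == cond["$exists"]
        values = as_list(doc[field]) if present else []
        if "$in" in cond:
            # True when ANY element of the field appears in the given list.
            return any(v in cond["$in"] for v in values)
        if "$nin" in cond:
            # True when NO element of the field appears in the given list.
            return not any(v in cond["$nin"] for v in values)
    if hasattr(cond, "search"):  # a compiled regular expression
        return present and cond.search(doc[field]) is not None
    return present and doc[field] == cond


def count(field, cond):
    return sum(1 for d in things if matches(d, field, cond))


counts = [
    len(things),
    count("boolean", True),
    count("string", {"$exists": True}),
    count("list", {"$in": ["one", "two"]}),
    count("list", {"$nin": ["three"]}),
    count("string", re.compile("ever was")),
]

# Regular expressions are case sensitive unless told otherwise.
upper_case_count = count("string", re.compile("EVER WAS"))
```

Under this model the six counts come out as 3, 2, 3, 2, 2 and 1, matching the script output, and the upper-case pattern matches nothing.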
Returning Documents to PHP
So far we’ve stuck to the command line and simple counts as our results. Now we will take a look at how MongoDB returns documents to your PHP applications. The last item in the previous example used a regular expression, but returned just the count. What if you wanted the document instead?
// Find a document that includes 'ever was' in the string property using a regular expression:
$regex = new MongoRegex("/ever was/");
$ever_was = $c_things->findOne(array('string' => $regex));
var_dump($ever_was);
Running this script should look like this, which will probably look very familiar to many of you who have been working in PHP with other databases:
$ php -f example.php
array(6) {
'_id' =>
class MongoId#7 (1) {
public $$id =>
string(24) "4fdc77f94e300a45bea9897c"
}
'string' =>
string(19) "Same as it ever was"
'number' =>
double(117)
'boolean' =>
bool(true)
'list' =>
array(3) {
[0] =>
string(3) "one"
[1] =>
string(3) "two"
[2] =>
string(4) "four"
}
'doc' =>
array(2) {
'one' =>
double(1)
'four' =>
double(4)
}
}
The result was an array, including all the elements of the document returned by the query. By calling findOne() we ensured only one document would be returned, and for multiple documents you could shorten this to just find() and iterate over the results like your ordinary database query.
This section is not PHP specific, but critical if you want your PHP apps to perform adequately.
Indexes in MongoDB are similar to what you are familiar with for other databases. When you reach a certain number of documents (and data set size) indexes will become necessary to ensure your queries execute fast and efficiently. Being a document database, however, means that you can index array values and even embedded objects.
Creating an index is simple, as the following shell example illustrates:
> db.things.ensureIndex({"string":1});
> db.things.ensureIndex({"number":1});
> db.things.ensureIndex({"boolean":1});
We just plopped indexes on the string, number and boolean properties for all documents in this collection. What is interesting about this is that there could be documents in this collection that do not have any of these properties set. With a relational database you would be forced to allow NULL values for those columns, which is not always accurate.
What about those nested arrays and objects? We can index the properties at the top level, and we can even index an embedded property if we wanted to. Look at the following example:
> db.things.ensureIndex({"list":1});
> db.things.ensureIndex({"docs.two":1});
We just indexed a property that has an embedded array. What this means is that MongoDB is smart enough to figure out how to provide index values for each element in that array, per document. So if you search for all documents that have the string one in their list property, MongoDB will still use the index. Type conversion is handled as well, so you can use the same index to search for all documents that have the number 2 instead. MongoDB refers to indexed arrays as multikeys.
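The second example demonstrates one of the many powers of BSON: reaching inside a document for embedded information. We just created an index on the two property embedded in docs, using dot notation. The index applies to all documents, including those that do not have that property set. (The sample documents above store their embedded object under doc, so an index meant to cover them would use the key "doc.two".)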
This can be extremely useful when embedding arrays in your documents. For example, I have an application where I have multiple third parties that have users with special privileges only for their applications that are running within the main website. I can now store an embedded object called partners and store each partner name and a value based on their access levels for their own applications. All of this can live happily in the users collection, making maintenance and reporting a breeze!
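The multikey behaviour described earlier, where an indexed array contributes one index entry per element, can be pictured with a small Python sketch. This is a conceptual model only (a dictionary of sets, with made-up document ids), not MongoDB's actual B-tree implementation.

```python
from collections import defaultdict

# The "list" values from the three sample documents, keyed by made-up ids.
docs = {
    "doc1": {"list": ["one", "two", "three"]},
    "doc2": {"list": [1, 2, 3]},
    "doc3": {"list": ["one", "two", "four"]},
}

# One index entry per array element, each pointing back at its document.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for element in doc["list"]:
        index[element].add(doc_id)

# Looking up a single element finds every document whose array contains it.
with_string_one = sorted(index["one"])
with_number_two = sorted(index[2])
```

The string "two" and the number 2 are separate keys in the same index, which is how one index can serve lookups for either kind of value.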
But what about compound indexes? If you are doing a ton of queries based on the values of two properties, you can create a single index that includes both:
> db.things.ensureIndex({"string":1,"boolean":1});
Of course, you can search on the first property and still use the index, so you don’t have to create separate indexes for each property in the compound index. That said, this only works left-to-right: using the above index, you can search on string alone, or on string and boolean together, but if you search solely on boolean the index will not be used.
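The left-to-right rule can be expressed as a short function. The Python sketch below is a deliberately simplified model (the `usable_prefix` helper is hypothetical, and real query planning considers much more): it walks the index keys in order and stops at the first one the query does not constrain.

```python
def usable_prefix(index_keys, query_fields):
    """Return the leading index keys a query on query_fields can use.

    Simplified model: walk the index keys left to right and stop at the
    first key the query does not constrain.
    """
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break
        prefix.append(key)
    return prefix


index_keys = ["string", "boolean"]

on_string = usable_prefix(index_keys, {"string"})
on_both = usable_prefix(index_keys, {"string", "boolean"})
on_boolean = usable_prefix(index_keys, {"boolean"})
```

An empty result for the boolean-only query corresponds to the case where the compound index is of no help.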
You’re probably wondering what the numbers after the property names in the index creation statements are. Those numbers (1 and -1) tell MongoDB whether this is an ascending or descending index, respectively. Index order is irrelevant for single-key indexes; it mainly comes into play when sorting on compound indexes.
Let’s take a quick look at what we’ve done to the things collection today, using the shell:
>db.things.stats()
{
"ns" : "test.things",
"count" : 3,
"size" : 656,
"avgObjSize" : 218.66666666666666,
"storageSize" : 12288,
"numExtents" : 1,
"nindexes" : 7,
"lastExtentSize" : 12288,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 57232,
"indexSizes" : {
"_id_" : 8176,
"string_1" : 8176,
"number_1" : 8176,
"boolean_1" : 8176,
"list_1" : 8176,
"docs.two_1" : 8176,
"string_1_boolean_1" : 8176
},
"ok" : 1
}
This is the collStats feature in the shell, which gives you statistics about the mentioned collection. This can be useful if you are experiencing unexpected behavior with your collections or indexes.
A new index feature in MongoDB is the sparse index. Imagine having a users collection with around 300 million documents, with only 35 of them also having a specific property set. Do you really want to have an index that includes entries for all 300 million documents when searching for those with that property? A sparse index basically only includes documents that have that indexed property. So if you only have 35 documents in your users collection with that property set, you will want to use a sparse index.
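The difference between the two kinds of index can be sketched in a few lines of Python. This is a toy model with made-up documents and a hypothetical `build_index` helper: a regular index records an entry (with a null key) even for documents that lack the field, while a sparse one skips them.

```python
users = [
    {"_id": 1, "name": "ann"},
    {"_id": 2, "name": "bob", "twitter": "@bob"},
    {"_id": 3, "name": "cy"},
]


def build_index(docs, field, sparse):
    """Toy index as a list of (key, _id) pairs in document order.

    A non-sparse index records a None key for documents missing the field;
    a sparse one skips those documents entirely.
    """
    entries = []
    for doc in docs:
        if field in doc:
            entries.append((doc[field], doc["_id"]))
        elif not sparse:
            entries.append((None, doc["_id"]))
    return entries


full_index = build_index(users, "twitter", sparse=False)
sparse_index = build_index(users, "twitter", sparse=True)
```

Here the regular index has three entries and the sparse one has a single entry; scaled up to the 300 million users in the example, that is the saving a sparse index offers.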
One final word on indexes in MongoDB: You can do many more things like dropping duplicates, unique indexes, and indexing geospatial data. A great place to read in detail is indexing advice and FAQ where a lot of common questions are answered.
Outro, or What’s Coming Next
There are a few more posts coming in this series, including detailed coverage on document data design, a comparison between ODM/ORM/driver approaches and frameworks, advanced queries and how they relate to PHP, taking advantage of map reduce, and a few sample applications demonstrating a few common use cases that I’ll be sharing on GitHub. It is safe to say that there is a lot more coming for the PHP universe on this blog!
In June, the kernel team released MongoDB 2.1.2, which will evolve into the 2.2 release, as well as the 2.0.6 stable release.
At the same time the drivers team has been hard at work improving the drivers and adding support for new features in the upcoming 2.2 release. These releases are:
Additional improvements continue for MongoDB wrapper for Azure:
For up to date information on new MongoDB releases join the MongoDB Announcements mailing list.
Every month, we’ll be publishing the best community blog posts from the month. Here is the digest for June:
Rick Copeland wrote two posts on Python development with Ming, an ODM developed at SourceForge: Schema Maintenance with Ming and MongoDB and Declarative Schemas for MongoDB in Python using Ming.
After his talk on Geolocation, Maps and MongoDB at MongoDB UK, Derick Rethans wrote a post on Indexing Freeform-Tagged Data.
MongoDB developer Kristina Chodorow continued her series on replica sets internals with a post on initial sync.
With the Mongoose version 3 release imminent, Aaron Heckmann started a series of posts on the newest features for the node.js ODM. The first post covers versioning and the second covers FindAndModify.
Ryan McGeary wrote a post on schema design at BusyConf, winning 10gen’s recent blogging contest. To see the rest of the entries, check out blog.10gen.com.
10gen interviewed several MongoDB users on their blog, including Xperscore, Merchpin, iBE.net, and Equal Experts.
Dwight Merriman, one of the MongoDB committers, wrote a post on Why MongoDB is (somewhat) feature-heavy.
The team at Buffer, a social media management tool, blogged about using MongoDB for real-time analytics.
5in5NYC recently published their “10gen and friends” episode, featuring five New York City based startups working with MongoDB: 10gen, Art.sy, Bitfloor, Crowdtap, and NextBigSound.