The MongoDB NoSQL Database Blog

August 2012

MongoDB 2.2 Released
We are pleased to announce the release of MongoDB version 2.2. This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details on the release: 2.2 Release Notes MongoDB Downloads MongoDB Manual Version 2.2 Online Conference

New Features

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside of MongoDB, without needing to use Map/Reduce or separate application processes for data manipulation. See the aggregation documentation for more information.

Additional “Data Center Awareness” Functionality

2.2 also brings a cluster of features that make it easier to use MongoDB in larger, more geographically distributed contexts. The first change is a standardization of read preferences across all drivers and sharded (i.e. mongos) interfaces. The second is the addition of “tag aware sharding,” which makes it possible to ensure that data in a geographically distributed sharded cluster is always closest to the application that will use that data the most.

Improvements to Concurrency

v2.2 eliminates the global lock in the mongod process. Locking is now per database. In addition, a new subsystem avoids locks under most page-fault events; thus concurrency improves even on systems with a single database. Parallelism in the application of writes on secondaries is also enhanced. See this video for more details.

We’re looking forward to your feedback on 2.2. Keep the Jira issues, blog posts, user group posts, and tweets coming.

- Eliot and the 10gen/MongoDB team
Aug 29, 2012 (4 notes)
#mongodb #mongo #mongodb 2.2 #release #upgrade #data model #concurrency #TTL collections #data center awareness
Hosting and Developing the HTML5 Game Cobalt Calibur with MongoDB, Node.js and OpenShift

This was originally posted on the OpenShift blog by Thomas Hunter.

So, you’re interested in getting the HTML5 Game Cobalt Calibur hosted for free? Look no further, Red Hat’s OpenShift can do that for you. Follow this guide and you’ll be up and running in no time. Cobalt Calibur is a multiplayer browser-based game which uses a bunch of HTML5 features to run on the frontend, and requires a Node.js and MongoDB server on the backend. Luckily OpenShift will satisfy these requirements for you.

The first thing you’ll want to do is create an OpenShift account. It’s quite easy and painless, I promise. Once you’re done getting it set up, be sure to click any email validation links and then log in to the website.

Once you’ve got your account set up, you’re going to want to create an SSH key for your computer (if you haven’t done so previously). To create your SSH key, you will want to open up a terminal emulator and run some commands. These commands should work fine on both OS X and Linux computers. If you’ve already got an SSH key (which you should if you’re a GitHub user), you can skip these steps.

If you’re on a Mac, you’ll want to go to your list of applications and run Terminal. You can get to this app quickly by pressing Cmd+Space, typing in Terminal, and pressing enter.

Below is what your terminal window will end up looking like. You’ll want to type the command ssh-keygen -t rsa, and press enter. You will then be asked a few questions; just leave everything blank and keep hitting enter.

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/USERNAME/.ssh/id_rsa): <press enter>
Created directory '/home/USERNAME/.ssh'.
Enter passphrase (empty for no passphrase): <press enter>
Enter same passphrase again: <press enter>
Your identification has been saved in /home/USERNAME/.ssh/id_rsa.
Your public key has been saved in /home/USERNAME/.ssh/id_rsa.pub.

Congrats, you’ve now got an SSH public/private key pair. It can be used to prove to a remote computer that you are who you say you are. We need to give a copy of the public key to OpenShift so that you can use git to push changes to your code to them.

To get a copy of your key file, you’ll want to copy the text from ~/.ssh/id_rsa.pub. You can run the command cat ~/.ssh/id_rsa.pub, which will display the contents of that file on your screen. Select the output and copy it to your clipboard (everything from ssh-rsa to the username@hostname part):

$ cat ~/.ssh/id_rsa.pub ssh-rsa AAAAB3NzaC1yc2BLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAHBLAH== USERNAME@HOSTNAME

Once you’ve copied that text, on the OpenShift website, click My Account > Add a new key to visit the Add a Public Key page, and paste the contents of the output into the big text box. In the small text box above it you can name your key (such as Living Room Desktop or Developer MacBook). You’ll want to use a descriptive name, because if your key is ever compromised, you’ll want to know which one to disable.

Now, click the Create button. OpenShift is now aware of your SSH public key, and you can interact with the git server they provide without problems. Feel free to repeat this process from other machines you plan on working from.

If you get an error when you save the key, you might not have copied the whole thing. If so, you might need to open it in an editor. On a Mac try open ~/.ssh/id_rsa.pub, and on Linux, you might try gedit ~/.ssh/id_rsa.pub.

Now it is time to create our OpenShift application. To do this, visit the Create Application page from the main OpenShift navigation. On this page, you will see a big list of all the types of applications supported by OpenShift. Scroll down until you see the Node.js option and select that.

On the next screen you will be prompted for some very basic information. Specifically, you will be asked to name your application. Since we are uploading the Cobalt Calibur engine, it makes sense to name it something like cobaltcalibur.

You will also be prompted to create a “namespace” for your account. This is basically a way to associate all of your app URLs with your account, so that multiple people can have apps named “cobaltcalibur” without stepping on each other’s toes. I had already entered a namespace before, so I didn’t need to this time.

After you click Create Application, OpenShift will work its cloud magic behind the scenes. During this time it is probably creating some DNS entries, copying some skeleton files, creating a git repository, the works. After the process is done, you will be taken to a new screen:

If you like, you can click the blue link to see the skeleton application OpenShift has created for you. It will be a pretty boring, static page which is displayed by a very simple Node.js app.

What you will want to do though is copy the commands in green and paste them into your terminal. This will pull the skeleton code from your application’s git repository and make a local copy. There are some (probably) important files in here that we will want to keep.

If you see the same listing of files, then congratulations, you’ve checked out your application from OpenShift.

Now that you’ve got your application created and checked out, we want to add MongoDB support to the application. OpenShift calls these Cartridges.

To add MongoDB, first browse to the All Applications page, and then click the title of the application you created:

On this screen you can see the information for accessing your git repository again, but more importantly, there is a big Add Cartridge button.

Click that big blue button, and the next screen will prompt you for the type of cartridge to be added. Click the MongoDB option:

Once you do this, it will prompt you to make sure you want to add MongoDB. Click the Add Cartridge button again, and after some processing happens in the background it will be added to your application. You will want to copy all of the information you are provided with on this screen, notably the user, password, database, and connection URL which contains the IP address and port number for the database. We’ll give this information to the Cobalt Calibur game later on.

Now that we’ve got the MongoDB cartridge added to our application, we want to actually start the MongoDB server. To do this, you will first need to install the rhc command line utility. You’ll want to follow steps 1 and 2 on that page; you can ignore the other steps. The rhc utility gives you more control over your OpenShift applications than the website does, and is needed to start up the MongoDB server. Run the command rhc app cartridge start -a APPNAME -c mongodb-2.0 and this will start the server for you:

You are now ready to download the Cobalt Calibur source code, configure it to work with your OpenShift account, and upload it to the server. To do this, browse to the Cobalt Calibur GitHub page and simply download the ZIP file.

Extract it to the same folder that the Node.js application was checked out into. This will overwrite the index.html page, the server.js file, and the node_modules/ folder; that is all fine.

Now, it’s time to update the server.js file so that it is able to connect to your MongoDB daemon, as well as bind to the IP address and port number that OpenShift requires. You can open up server.js in your favorite editor. Here is what the old code looks like:

// Web Server Configuration
var server_port = 80; // most OS's will require sudo to listen on 80
var server_address = '127.0.0.1';

// MongoDB Configuration
var mongo_host = '127.0.0.1';
var mongo_port = 27017;
var mongo_req_auth = false; // Does your MongoDB require authentication?
var mongo_user = 'admin';
var mongo_pass = 'password';
var mongo_collection = 'terraformia';

And here is what you will want to change it to:

// Web Server Configuration
var server_port = process.env.OPENSHIFT_INTERNAL_PORT; // port assigned by OpenShift
var server_address = process.env.OPENSHIFT_INTERNAL_IP;

// MongoDB Configuration
var mongo_host = 'MONGO IP ADDRESS';
var mongo_port = 27017;
var mongo_req_auth = true; // Does your MongoDB require authentication?
var mongo_user = 'admin';
var mongo_pass = 'MONGO PASSWORD';
var mongo_collection = 'MONGO DATABASE NAME';

Notice how OpenShift provides some environment variables for the web server port and ip address. It might also provide these same variables for the mongo connection, but I didn’t see this information.

The application is now configured properly. You’ll want to now add your files to git, commit the files into git, and push your changes to the server.

git add -A .
git commit -m "Adding Cobalt Calibur files"
git push

You’ll see a bunch of messages from all of the git hooks performing various actions; this is probably a good thing.

Now, if you browse to the URL for your game instance and refresh the page, it should load for you. If not, you might need to run the following command (I needed to for some reason):

rhc app restart -a game

Congratulations, you’ve now got your own personal instance of Cobalt Calibur running on OpenShift for free!

There is one big bug with OpenShift though: they don’t support websockets yet. My guess is that the different apps are hosted in a shared environment, and each application gets one port number to the outside world. Websockets require a bunch of random high ports for different clients, so this doesn’t really work with the shared host environment. Luckily, socket.io will fall back to using long-polling AJAX. The game doesn’t always run perfectly under these conditions, e.g. the monsters or corruption might not load. OpenShift is planning on adding this feature sooner or later; you can vote on it in the meantime.

Thomas Hunter is an evented Node.js hacker transitioning from the world of request/response PHP web development, building everything from hardware control software to traditional web apps. Follow him on Twitter at @tlhunter.

Aug 22, 2012 (3 notes)
#mongodb #node.js #nodejs #openshift #source code #games #gaming #calibur
Interview with David Mytton, organiser of the London MongoDB User Group
The London MongoDB User Group was founded in March 2011, and since then has grown to approximately 650 members. The group meets the last Tuesday of every month at 10gen’s new London office in Shoreditch. A very short interview with David Mytton, organiser of the London MongoDB User Group and founder of Server Density.

What are the biggest challenges you have running the London MongoDB User Group? How do you find your speakers for the group?

Finding speakers is the hardest part. It’s like running a conference every month because you have to find several people to provide talks on interesting topics to encourage members to come each month. There’s only so many people using MongoDB in each meetup area so it’s not like a yearly conference which allows time between each event for new people to start up and existing users to create new projects or learn new things.

How have you helped and encouraged the user group to grow? What advice would you give to someone who was starting their own user group?

Making sure we have interesting speakers is the best way to do it. Then using your own promotional channels (Twitter, blog, telling friends) and connecting with companies using the project. 10gen help with this as well because they’re doing this kind of activity on a full-time basis.

Aside from your work and MongoDB, tell me about something you are passionate about?

I particularly enjoy cycling and just returned from a 3 week cycling trip in Japan.

The London MongoDB User Group was recently featured on the Meetup HQ blog, and next meets on Aug 28. RSVP here
Aug 20, 2012
Getting going quickly with Python, MongoDB, and Spatial data on OpenShift: Part II
This post originally appeared on the OpenShift blog.

As a follow-up to my last post about getting spatial going in MongoDB on OpenShift, today we are going to put a web service in front of it using Python. There are several goals for this article:

• Learn a little bit about Flask - a Python web framework
• Learn how to connect to MongoDB from Python
• Create a REST-style web service to use in our SoLoMo application

I hope by the end you can see how using a Platform as a Service can get you going with Python, MongoDB, and Spatial faster than you can say…“Awesome Sauce”. We have a lot of ground to cover so let’s dig right in.

Creating the Python application

Here is the OpenShift command line to create the Python app:

rhc app create -t python-2.6 -a pythonws

Using the flask quickstart from GitHub

We have already put together a flask quickstart in the openshift github space. To get the framework into your application all you have to do is (from the README.md):

cd pythonws
git remote add upstream -m master git://github.com/openshift/openshift-mongo-flask-example.git
git pull -s recursive -X theirs upstream master

We now have a flask app whose source code we can modify. If you want to just check out the source code I used in the app, you can see it on GitHub and follow the README.md instructions to clone it into your OpenShift account.

Adding MongoDB and importing data

Time to add MongoDB to our application:

rhc app cartridge add -a pythonws -t mongodb-2.0

The previous post in this series covers how to import the data from a JSON file of the national parks into your mongodb database and prepare it for spatial queries. Please follow those instructions to import the data into the pythonws DB, into a collection called parkpoints.

Quick digression to explain Flask

Before we get into our specific application I am going to take a moment to explain the Python framework for this demo.
Flask basically allows you to map URL patterns to methods (it also does a lot more, like templating, but this is the only part we are using today). For example, in the mybottleapp.py file that is now in your project you can find the lines:

@route('/')
def index():
    return 'Hello World!'

This says that when a request comes in for the base URL, the function named index gets executed. In this case the function just returns the string “Hello World!”, and returning has the effect of sending the string to the requestor.

@route('/name/<name>')
def nameindex(name='Stranger'):
    return 'Hello, %s!' % name

We can also grab pieces of the requested URL and pass them into the function. Enclosing a part of the URL in < > indicates that we want to access it within our function. Here you can see that if the URL looks like:

http://www.mysite.com/name/steve

then the response will be

Hello, steve!

Or the URL could be

http://www.mysite.com/name

Hello, Stranger!

We are going to define URL mappings for some basic REST-like functionality to interact with our spatial MongoDB data store.

Modify the source code

The first function we are going to write will simply return all the records in the database. In a more full featured app you would probably want to add pagination and other features to this query, but we won’t be doing that today.

import os, json
import pymongo
from bson import json_util

@app.route("/ws/parks")
def parks():
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #query the DB for all the parkpoints
    result = db.parkpoints.find()
    #Now turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

I chose to put the web services under the URL /ws/parks so that we could use other parts of the URL namespace for other functionality. You can now go to your application URL (http://pythonws-.rhcloud.com/ws/parks) and you should be able to see all the documents in the DB.
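The default= argument passed to json.dumps in the handler above matters because MongoDB documents usually contain values, such as the _id field, that Python's json module cannot serialize on its own. The sketch below illustrates the mechanism with the standard library only. FakeObjectId, fallback_encoder, to_json and the sample park are hypothetical names and data invented for this illustration; they stand in for bson's ObjectId and json_util.default and are not part of pymongo.

```python
import json
from datetime import datetime


class FakeObjectId(object):
    """Hypothetical stand-in for bson's ObjectId, for illustration only."""

    def __init__(self, hex_string):
        self.hex_string = hex_string

    def __str__(self):
        return self.hex_string


def fallback_encoder(obj):
    """Called by json.dumps for any value it cannot serialize itself."""
    if isinstance(obj, FakeObjectId):
        # Same {"$oid": ...} shape that MongoDB's extended JSON uses for ids
        return {"$oid": str(obj)}
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("%r is not JSON serializable" % (obj,))


def to_json(documents):
    """Wrap an iterable of documents the same way the handler above does."""
    return json.dumps({"results": list(documents)}, default=fallback_encoder)


# Sample document with made-up values
park = {"_id": FakeObjectId("500c680c1fe9193b67b898a3"), "Name": "Example Park"}

# Without a default handler, json.dumps raises TypeError on the _id value.
try:
    json.dumps(park)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

encoded = to_json([park])
```

In the real handler none of this needs to be written by hand, since json_util.default already plays the role of fallback_encoder for the BSON types.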
Using MongoDB in Python

In the code above we simply make a connection to the MongoDB instance for this application and then execute a query. The pymongo package provides all the functionality to interact with the MongoDB instance from our Python code. The pymongo commands are very similar to the MongoDB command line interaction, except that two-word commands like db.collection.findOne are split with a _, such as db.collection.find_one. Please see the pymongo site for the full documentation.

Notice we use the environment variables to specify the connection URL. While not hard coding database connection parameters is good practice in non-cloud apps, in our case you MUST use the environment variables. Since your app can be idled and then spun up, or it could be autoscaled, the IP and ports are not guaranteed to stay the same. By using the environment variables we make our code portable.

We pass the result set (each document comes back as a Python dictionary) into json.dumps so we can return JSON straight to the client. Since the documents contain BSON types such as ObjectId that the json module cannot serialize on its own, we need to pass json_util.default from the bson library into the json.dumps call.

This is probably the easiest experience I have ever had writing a web service. I love Flask, Pymongo, and Python for the simplicity of “Just Getting Stuff Done”.

Grab just one park

Next we will implement the code to get back a park given the park’s unique ID. For the ID we will just use the ID generated by MongoDB on document insertion (_id). The ID looks like a long random sequence and that is what we will pass into the URL.
from bson import objectid

#return a specific park given its mongo _id
@app.route("/ws/parks/park/<parkId>")
def onePark(parkId):
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #query based on the objectid
    result = db.parkpoints.find({'_id': objectid.ObjectId(parkId)})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

Here you have to use another class from the bson library - ObjectId. The actual ObjectId in MongoDB is an object, so we have to take the ID passed in on the URL and create an object from it. The ObjectId class allows us to create one of these objects to pass into the query. Other than that the code is the same as above.

This little snippet also shows an example of grabbing part of the URL and passing it to a function. I explained this concept above but here we can see it in practice.

Time for the spatial query

Here we do a query to find national parks near a latitude/longitude pair.

from flask import request

#find parks near a lat and long passed in as query parameters (near?lat=45.5&lon=-82)
@app.route("/ws/parks/near")
def near():
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #get the request parameters
    lat = float(request.args.get('lat'))
    lon = float(request.args.get('lon'))
    #use the request parameters in the query
    result = db.parkpoints.find({"pos": {"$near": [lon, lat]}})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

This piece of code shows how to get request parameters from the URL. We capture the lat and lon from the request URL and then cast them to floats to use in our query. Remember, everything in a URL comes across as a string, so it needs to be converted before being used in the query. In a production app you would need to make sure that you were actually passed strings that could be parsed as floating point numbers.
But since this app is just for demo purposes I am not going to show that here. Once we have the coordinates, we pass them into the query just like we did from the command line MongoDB client. The results come back in distance order from the point passed into the query. Remember, the ordering of the coordinates passed into the query needs to match the ordering of the coordinates in your MongoDB collection.

Finish it off with a Regex query with spatial goodness

The final piece of code we are going to write allows for a query based on both the name and the location of interest.

import re

#find parks with a certain name (using regex) near a lat long pair such as above
@app.route("/ws/parks/name/near/<name>")
def nameNear(name):
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #get the request parameters
    lat = float(request.args.get('lat'))
    lon = float(request.args.get('lon'))
    #compile the regex we want to search for and make it case insensitive
    myregex = re.compile(name, re.I)
    #use the request parameters in the query along with the regex
    result = db.parkpoints.find({"Name": myregex, "pos": {"$near": [lon, lat]}})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

Just like the example above, we parse out the lat and lon from the URL query parameters. In looking at my architecture I do think it might have been better to add the name as a query parameter as well, but this will still work for this article. We grab the name from the end of the URL path and then compile it into a standard Python regular expression (regex). I added the re.I to make the regex case-insensitive. I then use the regex to search against the Name field in the document collection and do a geo search against the pos field. Again, the results will come back in distance order from the point passed into the query.
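The request parameter checking that the two spatial handlers skip could be sketched roughly as follows. This is only one possible approach, using the standard library: parse_lat_lon and build_near_query are hypothetical helper names that are not part of the quickstart, and the coordinate ranges are an assumption about what counts as valid input. Note one deliberate difference from the handler above: the name is passed through re.escape, so it is matched as literal text instead of being compiled as a user-supplied regular expression.

```python
import re


def parse_lat_lon(args):
    """Return (lat, lon) as floats, or raise ValueError with a readable message.

    `args` is any mapping of query-parameter names to strings,
    such as Flask's request.args.
    """
    try:
        lat = float(args.get("lat"))
        lon = float(args.get("lon"))
    except (TypeError, ValueError):
        # TypeError: parameter missing (None); ValueError: not a number
        raise ValueError("lat and lon must both be present and numeric")
    # NaN fails every comparison, so this range check rejects it as well
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("lat must be in [-90, 90] and lon in [-180, 180]")
    return lat, lon


def build_near_query(args, name=None):
    """Build the query document used by the near and nameNear handlers."""
    lat, lon = parse_lat_lon(args)
    # Coordinate order [lon, lat] matches the order used in the handlers above
    query = {"pos": {"$near": [lon, lat]}}
    if name is not None:
        # re.escape makes the name match literally, case-insensitively
        query["Name"] = re.compile(re.escape(name), re.I)
    return query
```

Inside a handler, the result would be passed straight to db.parkpoints.find(...), and a ValueError could be caught and turned into an HTTP 400 response instead of letting the request fail with a server error.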
Conclusion

And with that we have wrapped up our little web service code - simple and easy using Python and MongoDB. Again, there are some further changes required for going to production, such as request parameter checking, maybe better URL patterns, exception catching, and perhaps a checkin URL - but overall this should put you well on your way. There are examples of:

• Using Flask to write some nice REST-style services in Python
• Various methods to get URL information so you can use it in your code
• How to interact with your MongoDB in Python using the PyMongo and BSON libraries
• Getting spatial data out of your application

Give it all a try on OpenShift and drop me a line to show me what you built. I can’t wait to see all the interesting spatial apps built by shifters.
Aug 18, 2012 (4 notes)
#mongodb #openshift #python #spatial #cloud #Cloud Hosting
Pub/sub with MongoDB
There are plenty of existing messaging systems out there (Redis, AMQP, ØMQ, etc.) but I’ve recently found MongoDB to be a very compelling alternative, especially if you’re already running MongoDB somewhere in your setup. Using MongoDB’s capped collections and tailable cursors we can build a simple pub/sub system to communicate messages (documents) between processes.

Tailable Cursors

When retrieving records from a tailable cursor we’re able to instruct the MongoDB server to block until some data becomes available (at which point it will be returned by the cursor). It’s worth noting here that the server will time out after a few seconds of waiting for data and return nothing. In this case the driver you’re using will most likely initiate another blocking call behind the scenes, giving us the impression that the cursor is “listening” for data. This process may sound reminiscent of HTTP long polling in the way that data can be “pushed” to the listener. While we could achieve something similar by constantly re-querying for new data, using tailable cursors like this offers a much nicer solution.

Example

I put together a very basic example to demonstrate this functionality using Node.js. You can grab it here if you want to follow along: https://gist.github.com/3210919 It assumes that you already have MongoDB installed and running locally.

First we need to create the capped collection in which messages will be stored. Unfortunately, it turns out that MongoDB won’t keep a tailable cursor open if the collection is empty, so let’s also create a blank document to “prime” the collection. We’ll fire up the Mongo shell to do this:

$ mongo
use pubsub
db.messages.insert({ message: 'Hello world', time: Date.now() })

Without anyone listening for these message inserts, though, we haven’t accomplished anything terribly exciting.

Subscribe

When subscribing to newly inserted messages we first need to find the last document currently in the messages collection.
We’ll then use the _id of that document to ensure that our tailable cursor only returns messages created in the future. Beware that since a capped collection does not have a unique index on _id by default, this initial query requires scanning the entire collection. Depending on the size of your capped collection it may be wise to create an index on _id.

var query = { _id: { $gt: doc._id }, message: { $regex: /foo/i }};

I find the ability to perform complex queries like this an incredibly powerful feature and a big selling point of using this setup. With our tailable cursor created, we can then repeatedly “poll” the cursor for any new messages, keeping in mind that the callback passed to nextObject will not be called until data is available:

node-mongodb-native module to connect with MongoDB. Install it and then start up the subscriber:

Mubsub.

Honestly, I’d love to see this sort of functionality baked right into MongoDB itself. Until then, though, I think the amount of effort required is pretty minimal for what we get. If you’re using MongoDB for messaging like this I’d be curious to hear about it. Hit me up on Twitter (@scttnlsn) or discuss it in the comment section below.

Scott Nelson is a JavaScript developer from Ithaca, NY. He is an open source enthusiast, freelancer, and fervent practitioner of Node.js and MongoDB!

Aug 15, 2012 (10 notes)
Designing MongoDB Schemas with Embedded, Non-Embedded and Bucket Structures
This was originally posted to the Red Hat OpenShift blog.

With the rapid adoption of schema-less, NoSQL data stores like MongoDB, Cassandra and Riak in the last few years, developers now have the ability to enjoy greater agility when it comes to their application’s persistence model. However, just because a datastore is schema-less doesn’t mean the structure of the stored documents won’t play an important role in the overall performance and resilience of the application. In this first post of a four-part blog series about MongoDB, we’ll explore a few strategies you should consider when designing your document structure.

Application requirements should drive schema design

If you ask a dozen experienced developers to design the relational database structure of an application, such as a book review site, it’s likely that each of the structures will be very similar. You’ll likely see tables for authors, books, commenters, comments and so on. The likelihood of having varied relational structures is small because relational database structures are generally well understood. However, if you ask a dozen experienced NoSQL developers to create a similar structure, you’re likely to get a dozen different answers.

Why is there so much variability when it comes to designing a NoSQL schema? To optimize application performance and reliability, a NoSQL schema must be driven by the application’s use case. It’s a novel idea, but it works. Luckily, there are only a few key factors you need to understand when deriving your schema from application requirements. These factors include:

• How your documents reference child collections
• The structure and the use of indexes
• How your data will be sharded

Elements of MongoDB Schemas

Of these factors, how your documents reference child collections, or embedding, is the most important decision you need to make. This point is best demonstrated with an example. Suppose we’re building the book review site we mentioned in the introduction.
Our application will have authors and books, as well as reviews with threaded comments. How should we structure the collections? Unfortunately, the answers depend on the number of comments we’re expecting per book and how frequently comments are read vs. written. Let’s look at our possible use cases.

The first possibility is where we’re only going to have a few dozen reviews per book, and each review is likely to have a few hundred comments. In this case, embedding the reviews and comments with the book is a viable possibility. Here’s what that might look like:

Listing 1 – Embedded

// Books
{
  "_id": ObjectId("500c680c1fe9193b67b898a3"),
  "publisher": "O'Reilly Media",
  "isbn": "978-1-4493-8156-1",
  "description": "How does MongoDB help you…",
  "title": "MongoDB: The Definitive Guide",
  "formats": ["Print", "Ebook", "Safari Books Online"],
  "authors": [
    { "lastName": "Chodorow", "firstName": "Kristina" },
    { "lastName": "Dirolf", "firstName": "Michael" }
  ],
  "pages": "210"
}

// Reviews
{
  "_id": ObjectId("500c680c1fe9193b67b898a4"),
  "rating": 5,
  "description": "The Authors made an excellent work…",
  "title": "One of O'Reilly excellent books",
  "created": ISODate("2012-07-04T09:48:17Z"),
  "book_id": { "$ref": "books", "$id": ObjectId("500c680c1fe9193b67b898a3") },
  "reviewer": "Giuseppe"
}

// Comments
{
  "_id": ObjectId("500c680c1fe9193b67b898a5"),
  "comment": "This review helped me choose the correct book.",
  "commenter": "Nick",
  "review_id": { "$ref": "reviews", "$id": ObjectId("500c680c1fe9193b67b898a4") },
  "created": ISODate("2012-07-20T13:15:37Z")
}

While simple, this method does have some trade-offs. First, our reviews and comments are strewn throughout the disk. We’re potentially loading thousands of documents to display a page. This leads us to another common embedding strategy – “buckets”. By bucketing review comments, we can maintain the benefit of fewer reads to display substantial amounts of content, while at the same time maintaining fast writes to smaller documents.
An example of a bucketed structure is presented below: Figure 1 – Hybrid Structure 
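As a rough illustration of bucketing, the sketch below models comment buckets as plain Python dictionaries and shows the bookkeeping needed when a comment is added. The field names (review_id, page, count, comments), the BUCKET_SIZE constant and the add_comment helper are assumptions made for this example rather than details taken from the figure, and in a real application the buckets would be documents in a MongoDB collection instead of items in a list.

```python
BUCKET_SIZE = 100  # assumed maximum number of comments per bucket


def add_comment(buckets, review_id, comment):
    """Append a comment to the newest bucket for a review, creating a new
    bucket (the next "page") once the current one is full.

    `buckets` is a plain list standing in for a MongoDB collection.
    """
    pages = [b for b in buckets if b["review_id"] == review_id]
    latest = max(pages, key=lambda b: b["page"]) if pages else None
    if latest is None or latest["count"] >= BUCKET_SIZE:
        latest = {
            "review_id": review_id,  # reference to the parent review
            "page": (latest["page"] + 1) if latest else 1,  # bucket position
            "count": 0,  # number of comments currently held
            "comments": [],
        }
        buckets.append(latest)
    latest["comments"].append(comment)
    latest["count"] += 1
    return latest


# Simulate 250 comments arriving on a single review
buckets = []
for i in range(250):
    add_comment(buckets, "review-1",
                {"commenter": "Nick", "comment": "comment %d" % i})
```

With this layout, showing one page of comments means reading a single bucket instead of up to a hundred separate comment documents, at the cost of the extra bookkeeping in add_comment.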

In this example, the bucket, or hybrid, structure breaks the comments into chunks of roughly 100 comments. Each comment bucket maintains a reference to the parent review, as well as its page and current number of contained comments.

Of course, as software developers, we’re painfully aware there’s no free lunch. The downside to buckets is the increased complexity your application has to deal with. The previous strategies were trivial to implement from an application perspective, but suffered from inefficiencies at scale. Buckets address these inefficiencies, but your application has to do a bit more bookkeeping, such as keeping track of the number of comment buckets for a given review.

Conclusion

My own personal projects with MongoDB have used each one of these strategies at one point or another, but I’ve always grown into more complicated strategies from the most basic, as the application requirements changed. One of the benefits of MongoDB is the ability to change your storage strategy at will, and you shouldn’t be afraid to take advantage of this flexibility. By starting simple, you can maintain development velocity early and migrate to a more scalable strategy as the need arises.

Stay tuned for additional blogs in this series covering the use of MongoDB indexes, sharding and replica sets. If you are interested in experimenting with a few of the concepts without having to download and install MongoDB, try it on Red Hat’s OpenShift. It’s FREE to sign up: all it takes is an email, and you’re minutes from having a MongoDB instance running in the cloud.

References

  • http://www.mongodb.org/display/DOCS/Schema+Design
  • http://www.10gen.com/presentations/mongosf2011/schemascale
Aug 10, 2012 · 9 notes
#mongodb #nosql #cloud #Cloud Hosting #paas
Introducing Mongo Connector
MongoDB is a great general purpose data store, but for some workflows, you may want to use another tool or integrate data from MongoDB into another system. To address this common interest, we built Mongo Connector, a generic connection system that you can use to integrate MongoDB with another system with simple CRUD operational semantics (i.e. insert, update, delete, and search operations). Use cases for this system could include:

  • Connecting MongoDB to search engines for more advanced search.
  • Creating a secondary, backup MongoDB cluster that uses Mongo Connector to keep both clusters in sync.
  • Storing specific collections or specific information in other, possibly relational, database systems.
  • Connecting MongoDB to integration platforms such as Mule.
  • Dumping your data from MongoDB to any other storage system, with support to stop and restart the dump at any point.

On startup, Mongo Connector copies your documents from MongoDB to your target system. Afterwards, it constantly performs updates on the target system to keep MongoDB and the target in sync. The connector supports both Sharded Clusters and standalone Replica Sets, hiding the internal complexities such as rollbacks and chunk migrations.

Mongo Connector abstracts the MongoDB internals so you only have to implement one class: the DocManager. The DocManager is a lightweight and, most importantly, simple to write class that defines a limited number of CRUD operations for the target system. The DocManager API explains what functions must be implemented, and Mongo Connector uses those functions to link up MongoDB and the target system. For the first release, we have implementations of the DocManager for Solr, ElasticSearch and, of course, MongoDB (if you want to connect your MongoDB to another MongoDB instance).

To install Mongo Connector, issue the following command at your system’s shell:

    pip install mongo-connector

After that, start the Mongo Connector.
For example, suppose there is a Sharded Cluster with a mongos running on localhost:27217 and a Solr search server running on localhost:8080, with the Solr access URL being http://localhost:8080/solr. Then, use the following command to have Mongo Connector sync the MongoDB cluster with Solr:

    python mongo_connector.py -m localhost:27217 -t http://localhost:8080/solr

The connector will start syncing the data to the Solr connection at http://localhost:8080/solr.

Check out our GitHub repo for requests for new doc managers, bug reports, and documentation on Mongo Connector: https://github.com/10gen-labs/mongo-connector

About us: Mongo Connector was designed, coded, tested, packaged, and released by Leonardo Stedile and Aayush Upadhyay, two of 10gen’s summer interns. Special thanks to Spencer Brody and Randolph Tan, our two mentors. We hope you find Mongo Connector useful, and that it helps you build awesome things with MongoDB.
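
To make the DocManager idea from this post more concrete, here is a minimal sketch of a class exposing a handful of CRUD operations for a toy in-memory target, plus a function that replays changes onto it. The class name, the method names (`upsert`, `remove`, `search`) and the `apply_change` helper are hypothetical and chosen only for illustration; the DocManager API documentation in the repo defines the functions a real implementation must provide.

```python
class InMemoryDocManager:
    """Toy target system that keeps documents in a dict keyed by _id.

    The method names are illustrative assumptions, not the actual
    Mongo Connector DocManager API.
    """

    def __init__(self):
        self.docs = {}

    def upsert(self, doc):
        """Insert the document, or replace the stored copy with the same _id."""
        self.docs[doc["_id"]] = dict(doc)

    def remove(self, doc_id):
        """Delete a document; an unknown _id is ignored."""
        self.docs.pop(doc_id, None)

    def search(self, **criteria):
        """Return copies of all documents whose fields equal the criteria."""
        return [
            dict(d) for d in self.docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]


def apply_change(manager, op, doc):
    """Replay one change from the source onto the target.

    `op` is a hypothetical change type: "insert", "update" or "delete".
    """
    if op in ("insert", "update"):
        manager.upsert(doc)
    elif op == "delete":
        manager.remove(doc["_id"])
    else:
        raise ValueError("unknown operation: %r" % (op,))


# Hypothetical usage: replay a short stream of changes.
target = InMemoryDocManager()
apply_change(target, "insert", {"_id": 1, "title": "MongoDB: The Definitive Guide"})
apply_change(target, "update", {"_id": 1, "title": "MongoDB: The Definitive Guide", "pages": 210})
apply_change(target, "delete", {"_id": 2})
print(target.search(pages=210))
```

Treating inserts and updates as the same upsert keeps the target-side class small, which fits the post's point that only a limited number of operations need to be implemented per target system.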
Aug 10, 2012 · 4 notes
#interns #mongodb
MacOSX Preferences Pane for MongoDB
This is a guest post from Rémy SAISSY of OCTO Technology.

In my work as a developer, I keep a full development environment with several MongoDB instances and data sets on my laptop. As an OS X user, I love having beautiful and efficient applications to do everything. Today, I have the pleasure to announce the release of the MacOSX Preferences Pane for MongoDB.

What is it for?

The MacOSX Preferences Pane for MongoDB aims to provide a simple and efficient user interface to control the status of a local MongoDB server, just like the MySQL Preferences Pane. My focus has been on simplicity, and it has the following features:

  • It runs on MacOSX Snow Leopard, Lion and Mountain Lion.
  • You can manually start and stop the MongoDB server from your system control panel.
  • You can configure MongoDB to start and stop automatically with your system.
  • If you use Homebrew and have customized your system’s launchd plist, the MacOSX Preferences Pane for MongoDB will:
      • migrate your existing launchd configuration for use with the preferences pane
      • keep all of your launchd customizations through all enable/disable cycles
  • To prevent upgrade issues from taking time and attention, the preferences pane comes with an automatic update mechanism. Once a new version has been installed, the preferences pane will simply ask you to restart your preferences pane to start using the new version.

Sounds good but I am not an English speaker

The preferences pane for MongoDB comes in several languages:

  • English
  • French
  • Simplified Chinese
  • Spanish
  • Brazilian Portuguese

Feel free to contribute by adding a new language!

Prerequisites

Since it is only a preferences pane, it does not embed a MongoDB server. Therefore, the first thing you have to do is install MongoDB. A simple way to accomplish this is to use Homebrew:

    $ brew install mongodb

Installation

The MongoDB Preferences Pane is available on GitHub: https://github.com/remysaissy/mongodb-macosx-prefspane.

  • Download the latest version: https://github.com/remysaissy/mongodb-macosx-prefspane/raw/master/download/MongoDB.prefPane.zip
  • Unzip MongoDB.prefPane.zip
  • Double click on MongoDB.prefPane

That’s all.

I hope this will be useful. Do not hesitate to contribute and send me your feedback!

Aug 7, 2012 · 2 notes
#MongoDB
July 2012 Release Summary
  • MongoDB 2.0.7 rc1
  • MongoDB 2.2.0 rc0

At the same time, the drivers team has been hard at work improving the drivers and adding support for new features in the upcoming 2.2 release. These releases are:

  • C# Driver 1.5.0
  • PyMongo 2.2.1
  • MongoEngine 0.6.18
  • PHP Driver 1.2.11

For up-to-date information on new MongoDB releases join the MongoDB announcements mailing list.

Aug 3, 2012
MongoDB Blogroll: The Best of July 2012
Every month, we’ll be publishing the best community blog posts from the month. Here is the digest for July:

  • Mike O’Brien, Node.js and Python engineer at 10gen, wrote an overview of tips and tricks for using mongo, the MongoDB shell.
  • Jesse Jiryu Davis published an overview of Motor, the asynchronous Python driver for MongoDB.
  • Kristina Chodorow wrote an overview of Shard Tagging, a new feature in the 2.2.0 series.
  • Juhi Bhatia, a member of the Pune MongoDB User Group, wrote an article on how she improved on her Schema Design for Entrib’s EMG PaaS.
  • Tobias Trelle, organizer of the Dusseldorf MongoDB User Group, published a post on using GridFS with Spring Data.
  • Rick Copeland wrote an overview of the Ming toolkit for Python with tips on how it can accelerate your development.
  • Stephen Bronstein shared some tactics in continuous deployment in his blog post on Feature flipping with MongoDB and Node.js.
  • 10gen interns Samantha Ritter and Kaushal Parikh released their log visualization tool, Edda.
  • Database as a service company ObjectRocket launched their blog with the post In Memory MongoDB concurrency testing on ObjectRocket.
  • Comsysto’s Johannes Brandstetter showcased how to use MongoDB to create a real-time Twitter Heatmap using MongoDB’s capped collections.

Want your blog post to be included in the next update? Tweet it out with the #mongodb hashtag or send it to us directly.
Aug 2, 2012
#mongodb #nosql #mongo #javascript #python #logging #logs