MongoDB Connector for Hadoop

Aug 7 • Posted 1 year ago

by Mike O’Brien, MongoDB Kernel Tools Lead and maintainer of Mongo-Hadoop, the Hadoop Adapter for MongoDB

Hadoop is a powerful, JVM-based platform for running Map/Reduce jobs on clusters of many machines, and it excels at doing analytics and processing tasks on very large data sets.

Since MongoDB excels at storing large operational data sets for applications, it makes sense to explore using these together - MongoDB for storage and querying, and Hadoop for batch processing.

The MongoDB Connector for Hadoop

We recently released the 1.1 release of the MongoDB Connector for Hadoop. The MongoDB Connector for Hadoop makes it easy to use Mongo databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, Hadoop can process the data in parallel so that very large datasets can be processed quickly.

Read more

Securing MongoDB on Windows Azure

Aug 2 • Posted 1 year ago

By Sridhar Nanjundeswaran, Windows Azure lead at 10gen

I have used the MongoDB Installer for Windows Azure to deploy my MongoDB instance on a Windows Virtual Machine on Windows Azure. It is not my production environment but I would still like to secure it. What do I need to do to secure this standalone instance?

Let us take a look at the possible issues and how you would resolve each of them.

  • Password
  • Administrator username
  • Endpoints

Read more

The Most Popular Pub Names

Jul 30 • Posted 1 year ago

By Ross Lawley, MongoEngine maintainer and Scala Engineer at 10gen

Earlier in the year I gave a talk at MongoDB London about the different aggregation options with MongoDB. The topic recently came up again in conversation at a user group, so I thought it deserved a blog post.

Gathering ideas for the talk

I wanted to give a more interesting aggregation talk than the standard “counting words in text”, and as the aggregation framework gained shiny 2dsphere geo support in 2.4, I figured I’d use that. I just needed a topic…

What is top of mind for us Brits?

Two things immediately sprang to mind: weather and beer.

Read more

Introducing the MongoDB Driver for the Rust Programming Language

Jul 25 • Posted 1 year ago

Discuss on Hacker News

This is a guest post by Jao-ke Chin-Lee and Jed Estep, who are currently interns at 10gen. This summer they were tasked with building a Rust driver for MongoDB.

Today we are open sourcing the alpha release of a MongoDB driver for the Rust programming language. This is the culmination of two months of work with help from the Rust community, and we’re excited to be sharing our initial version. We are looking forward to feedback from both the Rust and MongoDB communities.

Read more

NeDB: a lightweight Javascript database using MongoDB’s API

Jul 17 • Posted 1 year ago

This is a guest post by Louis Chatriot

Sometimes you need database functionality but want to avoid the constraints that come with installing a full-blown solution. Maybe you are writing a Node service or web application that needs to be easily packageable, such as a continuous integration server. Maybe you’re writing a desktop application with Node Webkit, and don’t want to ask your users to install an external database. That’s when you need NeDB.

Read more

Morphia Version 0.101 Released

Jul 11 • Posted 1 year ago

The Java Team has released Morphia, version 0.101. Rumors of Morphia’s demise have been greatly exaggerated.

This release formalizes the 0.99-SNAPSHOT code that many have been using for years. Apart from some formatting and minor style changes, there is no functional change from what’s been on Google Code for the past few years.

Read more


Jul 9 • Posted 1 year ago

Libbson is a new shared library written in C for developers wanting to work with the BSON serialization format.

Its API will feel natural to C programmers but can also be used as the base of a C extension in higher-level MongoDB drivers.

The library contains everything you would expect from a BSON implementation. It has the ability to work with documents in their serialized form, iterating elements within a document, overwriting fields in place, Object Id generation, JSON conversion, data validation, and more. Some lessons were learned along the way that are beneficial for those choosing to implement BSON themselves.

Read more

Leafblower: Winner of the MongoDB March Madness Hackathon

Jul 1 • Posted 1 year ago

In March, 10gen hosted a Global Hackathon called MongoDB March Madness, and challenged the community to build tools to support the growing MongoDB community. Leafblower, a project built at the Sydney March Madness Hackathon, was the global winner of the Hackathon

Who are rdrkt?

Andy Sharman and Josh Stevenson work together on independent projects under their brand rdrkt (pronounced “redirect”) and capitalize on their largely overlapping skillset, but unique passions to build cutting-edge web applications. Josh is passionate about flexible, scalable back-end systems and Andy is passionate about accessible, responsive user interfaces.

Read more

Real-time Profiling a MongoDB Cluster

Jun 25 • Posted 1 year ago

by A. Jesse Jiryu Davis, Python Evangelist at 10gen

In a sharded cluster of replica sets, which server or servers handle each of your queries? What about each insert, update, or command? If you know how a MongoDB cluster routes operations among its servers, you can predict how your application will scale as you add shards and add members to shards.

Operations are routed according to the type of operation, your shard key, and your read preference. Let’s set up a cluster and use the system profiler to see where each operation is run. This is an interactive, experimental way to learn how your cluster really behaves and how your architecture will scale.

Read more

2dsphere, GeoJSON, and Doctrine MongoDB

Jun 24 • Posted 1 year ago

By Jeremy Mikola, 10gen software engineer and maintainer of Doctrine MongoDB ODM.

It seems that GeoJSON is all the rage these days. Last month, Ian Bentley shared a bit about the new geospatial features in MongoDB 2.4. Derick Rethans, one of my PHP driver teammates and a renowned OpenStreetMap aficionado, recently blogged about importing OSM data into MongoDB as GeoJSON objects. A few days later, GitHub added support for rendering .geojson files in repositories, using a combination of Leaflet.js, MapBox, and OpenStreetMap data. Coincidentally, I visited a local CloudCamp meetup last week to present on geospatial data, and for the past two weeks I’ve been working on adding support for MongoDB 2.4’s geospatial query operators to Doctrine MongoDB.

Doctrine MongoDB is an abstraction for the PHP driver that provides a fluent query builder API among other useful features. It’s used internally by Doctrine MongoDB ODM, but is completely usable on its own. One of the challenges in developing the library has been supporting multiple versions of MongoDB and the PHP driver. The introduction of read preferences last year is one such example. We wanted to still allow users to set slaveOk bits for older server and driver versions, but allow read preferences to apply for newer versions, all without breaking our API and abiding by semantic versioning. Now, the setSlaveOkay() method in Doctrine MongoDB will invoke setReadPreference() if it exists in the driver, and fall back to the deprecated setSlaveOkay() driver method otherwise.

Read more

Ruby, Rails, MongoDB and the Object-Relational Mismatch

Jun 18 • Posted 1 year ago

by Emily Stolfo, Ruby Engineer and Evangelist at 10gen

MongoDB is a popular choice among developers in part because it permits a one-to-one mapping between object-oriented (OO) software objects and database entities. Ruby developers are at a great advantage in using MongoDB because they are already used to working with and designing software that is purely object-oriented.

Most of the discussions I’ve had about MongoDB and Ruby assume Ruby knowledge and explain why MongoDB is a good fit for the Rubyist. This post will do the opposite; I’m going to assume you know a few things about MongoDB but not much about Ruby. In showing the Rubyist’s OO advantage, I’ll share a bit about the Ruby programming language and its popularity, explain specifically how the majority of Ruby developers are using MongoDB, and then talk about the future of the 10gen Ruby driver in the context of the Rails community.

Read more

The MEAN Stack: Mistakes You’re Probably Making With MongooseJS, And How To Fix Them

Jun 6 • Posted 1 year ago

This is a guest post from Valeri Karpov, a MongoDB Hacker and co-founder of the Ascot Project. _If you’re interesting in learning about how to use MongoDB with Node.js, sign up for a free, [introductory 7-week course on MongoDB and Node.js](   For more MEAN Stack wisdom, check out his blog at  Valeri originally coined the term MEAN Stack while writing for the MongoDB blog, and you can find that post here.

If you’re familiar with Ruby on Rails and are using MongoDB to build a NodeJS app, you might miss some slick ActiveRecord features, such as declarative validation. Diving into most of the basic tutorials out there, you’ll find that many basic web development tasks are more work than you like. For example, if we borrow the style of, a route that pulls a document by its ID will look something like this:

app.get('/document/:id', function(req, res) { 
  db.collection('documents', function(error, collection) {
    collection.findOne({ _id : collection.db.bson_serializer.ObjectID.createFromHexString( },
        function(error, document) {
          if (error || !document) {
            res.render('error', {});
          } else {

            res.render('document', { document : document });

Read more

Integrating MongoDB Text Search with a Python App

Jun 4 • Posted 1 year ago

By Mike O’Brien, 10gen Software engineer and maintainer of Mongo-Hadoop

With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add new features that take advantage of text search. Prior to 2.4, adding text search to a MongoDB app would have required writing code to interface with another system like Solr, Lucene, ElasticSearch, or something else. Now that it’s integrated with the database we are already using, we can accomplish the same result with reduced complexity, and fewer moving parts in the deployment.

Here we’ll go through a practical example of adding text search to Planet MongoDB, our blog aggregator site.

Read more

Go Agent, Go

May 29 • Posted 1 year ago

Discuss on Hacker News

10gen introduced MongoDB Backup Service in early May. Creating a backup service for MongoDB was a new challenge, and we used the opportunity to explore new technologies for our stack. The final implementation of the MongoDB Backup Service agent is written in Go, an open-source, natively executable language initiated and maintained by Google.

Why did we Go with Go?

The Backup Service started as a Java project, but as the project matured, the team wanted to move to a language that compiled natively on the machine. After considering a few options, the team decided that Go was the best fit for its C-like syntax, strong standard library, the resolution of concurrency problems via goroutines, and painless multi-platform distribution.

Read more

MongoDB’s New Matcher

May 28 • Posted 1 year ago

Discuss on Hacker News

MongoDB 2.5.0 (an unstable dev build) has a new implementation of the “Matcher”. The old Matcher is the bit of code in Mongo that takes a query and decides if a document matches a query expression. It also has to understand indexes so that it can do things like create a subsets of queries suitable for index covering. However, the structure of the Matcher code hasn’t changed significantly in more than four years and until this release, it lacked the ability to be easily extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.

The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces: an abstract syntax tree (hereafter ‘AST’) for expression match expressions, a parser from BSON into said AST, and a Matcher API layer that simulates the old Matcher interface while using all new internals. This new version is much easier to extend, easier to reason about, and will allow us to use the same structure for matching as for query analysis and rewriting.

Read more
blog comments powered by Disqus