

MongoDB’s New Matcher

May 28 • Posted 1 year ago


MongoDB 2.5.0 (an unstable dev build) has a new implementation of the “Matcher”. The Matcher is the bit of code in Mongo that takes a query and decides whether a document matches the query expression. It also has to understand indexes so that it can do things like create subsets of queries suitable for index covering. However, the structure of the Matcher code hadn’t changed significantly in more than four years, and until this release it could not be easily extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.

The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces: an abstract syntax tree (hereafter ‘AST’) for match expressions, a parser from BSON into that AST, and a Matcher API layer that simulates the old Matcher interface while using entirely new internals. This new version is much easier to extend and to reason about, and it will allow us to use the same structure for matching as for query analysis and rewriting.
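To make the three-piece design concrete, here is a minimal Python sketch of the idea: a tiny AST of match nodes, a parser from a query document (a plain dict standing in for BSON) into that tree, and a second pass that walks the same tree for analysis rather than matching. All class and function names here (MatchExpression, EqualityMatch, ComparisonMatch, AndMatch, parse, referenced_paths) are hypothetical and only illustrate the approach; this is not MongoDB’s C++ implementation, and it handles only top-level fields and a few operators.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

Document = Dict[str, Any]

# Comparison operators supported by this sketch (a small subset of MongoDB's).
_COMPARATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


class MatchExpression:
    """Base class for AST nodes (hypothetical names, not MongoDB's classes)."""

    def matches(self, doc: Document) -> bool:
        raise NotImplementedError


@dataclass
class EqualityMatch(MatchExpression):
    path: str
    value: Any

    def matches(self, doc: Document) -> bool:
        # Top-level fields only; dotted paths and array semantics are out of scope.
        return doc.get(self.path) == self.value


@dataclass
class ComparisonMatch(MatchExpression):
    path: str
    op: str
    value: Any

    def matches(self, doc: Document) -> bool:
        if self.path not in doc:
            return False
        try:
            return _COMPARATORS[self.op](doc[self.path], self.value)
        except TypeError:  # e.g. comparing a string with a number
            return False


@dataclass
class AndMatch(MatchExpression):
    children: List[MatchExpression] = field(default_factory=list)

    def matches(self, doc: Document) -> bool:
        return all(child.matches(doc) for child in self.children)


def parse(query: Document) -> MatchExpression:
    """Turn a query document into an AST; top-level fields are implicitly ANDed."""
    children: List[MatchExpression] = []
    for path, spec in query.items():
        if isinstance(spec, dict) and spec and all(k.startswith("$") for k in spec):
            for op, operand in spec.items():
                if op == "$eq":
                    children.append(EqualityMatch(path, operand))
                elif op in _COMPARATORS:
                    children.append(ComparisonMatch(path, op, operand))
                else:
                    raise ValueError(f"unsupported operator: {op}")
        else:
            children.append(EqualityMatch(path, spec))
    return AndMatch(children)


def referenced_paths(expr: MatchExpression) -> Set[str]:
    """An analysis pass over the same tree, e.g. to ask which fields an index could serve."""
    if isinstance(expr, AndMatch):
        paths: Set[str] = set()
        for child in expr.children:
            paths |= referenced_paths(child)
        return paths
    return {expr.path}  # leaf nodes carry a field path
```

For example, parse({"status": "A", "qty": {"$gt": 10, "$lte": 50}}) yields an AndMatch of three leaves that accepts {"status": "A", "qty": 20} and rejects {"status": "A", "qty": 60}. Because the parsed query is an ordinary tree rather than logic baked into the matching loop, the same structure can be inspected by referenced_paths (or a planner) without re-parsing.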


New Geo Features in MongoDB 2.4 

May 21 • Posted 1 year ago


Geometric processing as a field of study has many applications and has produced a great deal of research and many powerful tools. Many modern web applications have location-based components and require a data storage engine capable of managing geometric information. Typically this requires introducing an additional storage engine into your infrastructure, which can be a time-consuming and expensive operation.

MongoDB has a set of geometric storage and search features. The MongoDB 2.4 release brought several improvements to MongoDB’s existing geo capabilities and introduced the 2dsphere index.

The primary conceptual difference (though there are also many functional differences) between the 2d and 2dsphere indexes is the type of coordinate system that they consider. Planar coordinate systems are useful for certain applications and can serve as a simplifying approximation of spherical coordinates. However, as you consider larger geometries, or geometries near the meridians and poles, using proper spherical coordinates becomes essential.
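The gap between the two models is easy to see numerically. The sketch below compares a flat-plane distance, which treats (longitude, latitude) degrees as x/y coordinates, with a great-circle distance computed by the haversine formula. It is only an illustration of the geometry, not how either MongoDB index is implemented; the mean Earth radius of 6371 km and the function names are assumptions made for this example. Points are given as (longitude, latitude), the same order GeoJSON uses.

```python
import math

# Assumed mean Earth radius in kilometres (a spherical-Earth approximation).
EARTH_RADIUS_KM = 6371.0


def planar_distance_deg(a, b):
    """Treat (lon, lat) in degrees as flat x/y coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def spherical_distance_km(a, b):
    """Great-circle distance between (lon, lat) points via the haversine formula."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# The same 10-degree longitude gap at two latitudes.
at_equator = ((0.0, 0.0), (10.0, 0.0))
near_pole = ((0.0, 80.0), (10.0, 80.0))
# Two points on either side of the 180th meridian.
across_antimeridian = ((179.0, 0.0), (-179.0, 0.0))

for label, (p, q) in [("equator", at_equator),
                      ("80N", near_pole),
                      ("antimeridian", across_antimeridian)]:
    print(f"{label}: planar {planar_distance_deg(p, q):.1f} deg, "
          f"spherical {spherical_distance_km(p, q):.0f} km")
```

The planar model reports the same 10 degrees for the first two pairs, while the real distances are roughly 1112 km and 193 km. For the last pair it reports 358 degrees for two points only about 222 km apart, because a flat plane has no notion of wrapping around the sphere.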

In addition to this major conceptual difference, there are also significant functional differences, which are outlined in some depth in the Geospatial Indexes and Queries section of the MongoDB documentation. This post discusses the new features added in the 2.4 release.



May 28 • Posted 5 years ago

MongoDB stores documents (objects) in a format called BSON. BSON is a binary serialization of JSON-like documents. BSON stands for “Binary JSON”, but it also contains extensions that allow representation of data types that are not part of JSON. For example, BSON has a Date data type and a BinData type.
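As a rough illustration of what “binary serialization” means here, the following minimal Python encoder follows the layout in the BSON specification: a document is an int32 total length, a sequence of elements (a type byte, a NUL-terminated field name, then the value, all little-endian), and a trailing NUL byte. It covers only a handful of types, including the Date (UTC datetime) and BinData extensions mentioned above; it is a sketch for explanation, not a replacement for a driver’s BSON library, and the function names are made up for this example.

```python
import struct
from datetime import datetime, timezone


def _cstring(s: str) -> bytes:
    """BSON field names are NUL-terminated UTF-8 strings."""
    data = s.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("field names may not contain NUL bytes")
    return data + b"\x00"


def _element(name: str, value) -> bytes:
    """Encode one element: type byte, field name, then the value (little-endian)."""
    key = _cstring(name)
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)      # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # string: int32 byte count (including the trailing NUL), then the bytes
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, datetime):
        # UTC datetime: int64 milliseconds since the Unix epoch (use aware datetimes)
        return b"\x09" + key + struct.pack("<q", round(value.timestamp() * 1000))
    if isinstance(value, (bytes, bytearray)):
        # BinData: int32 length, subtype byte (0x00 = generic), raw bytes
        return b"\x05" + key + struct.pack("<i", len(value)) + b"\x00" + bytes(value)
    if isinstance(value, dict):
        return b"\x03" + key + encode(value)                  # embedded document
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode(doc: dict) -> bytes:
    """Encode a dict as a BSON document: int32 total length, elements, trailing NUL."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"  # +4 length, +1 NUL


hello = encode({"hello": "world"})
stamped = encode({"created": datetime(1970, 1, 1, tzinfo=timezone.utc), "blob": b"\x01\x02"})
```

Encoding {"hello": "world"} this way yields the 22-byte sequence \x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00, which matches the example given with the BSON specification.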

The MongoDB client drivers perform the serialization and unserialization. For a given language, the driver translates from the language’s “object” (ordered associative array) data representation to BSON, and back. While the client performs this work, the database understands the internals of the format and can “reach into” BSON objects when appropriate: for example, to build index keys or to match an object against a query expression. That is to say, MongoDB is not just a blob store.
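“Reaching into” a BSON object works because every element starts with its type byte and every variable-length value carries a length prefix, so a reader can jump over values it does not care about instead of decoding the whole document. The sketch below finds one top-level field that way. It is an illustration of the format, not MongoDB server code; find_field and the size table are hypothetical names, and only a subset of BSON types is handled.

```python
import struct

# Payload sizes for fixed-width BSON types, keyed by type byte.
_FIXED_SIZES = {
    0x01: 8,   # double
    0x07: 12,  # ObjectId
    0x08: 1,   # boolean
    0x09: 8,   # UTC datetime
    0x0A: 0,   # null
    0x10: 4,   # int32
    0x11: 8,   # timestamp
    0x12: 8,   # int64
}


def _value_size(buf: bytes, type_byte: int, pos: int) -> int:
    """Number of bytes the value starting at pos occupies."""
    if type_byte in _FIXED_SIZES:
        return _FIXED_SIZES[type_byte]
    if type_byte == 0x02:  # string: int32 length (includes trailing NUL) + bytes
        return 4 + struct.unpack_from("<i", buf, pos)[0]
    if type_byte in (0x03, 0x04):  # document / array: prefix covers the whole value
        return struct.unpack_from("<i", buf, pos)[0]
    if type_byte == 0x05:  # binary: int32 length + subtype byte + data
        return 4 + 1 + struct.unpack_from("<i", buf, pos)[0]
    raise ValueError(f"type 0x{type_byte:02x} not handled in this sketch")


def find_field(buf: bytes, name: str):
    """Return (type_byte, raw_value_bytes) for a top-level field, or None."""
    total = struct.unpack_from("<i", buf, 0)[0]
    target = name.encode("utf-8")
    pos = 4  # skip the document's own length prefix
    while pos < total - 1:  # the last byte is the document terminator
        type_byte = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        value_start = name_end + 1
        size = _value_size(buf, type_byte, value_start)
        if buf[pos + 1:name_end] == target:
            return type_byte, buf[value_start:value_start + size]
        pos = value_start + size  # skip the value without decoding it
    return None
```

Looking up "hello" in a document that also holds an int32 field "a" first skips the 4-byte integer using the fixed-size table, then returns the raw string value, never materializing the rest of the document as language objects.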

Thus, BSON is a language-independent data interchange format.

The BSON serialization code from any MongoDB driver can be used to serialize and unserialize BSON, even for applications where the Mongo database proper is completely uninvolved.  This usage is encouraged and we would be happy to work with others on making the format as generically useful as possible.

Other Formats

The key advantage of BSON over XML and JSON is efficiency (both in space and in compute time), as it is a binary format.

BSON can be compared to binary interchange formats such as Protocol Buffers. BSON is more “schemaless” than Protocol Buffers; this is both an advantage in flexibility and a slight disadvantage in space, as BSON carries a little overhead for field names within the serialized data.
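The field-name overhead is easy to quantify. A BSON document stores each field name inline, so every document is self-describing. Protocol Buffers instead writes a small numeric key, (field_number << 3) | wire_type, encoded as a varint, and relies on the .proto schema to map that number back to a name. The sketch below encodes a single small integer both ways. It is deliberately simplified (one field, non-negative varints only), and the helper names are invented for this comparison.

```python
import struct


def bson_one_int32(name: str, value: int) -> bytes:
    """A BSON document with a single int32 field; the field name is stored inline."""
    element = b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    return struct.pack("<i", len(element) + 5) + element + b"\x00"


def varint(n: int) -> bytes:
    """Protocol Buffers base-128 varint (non-negative integers only in this sketch)."""
    if n < 0:
        raise ValueError("negative values are out of scope here")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def protobuf_one_varint(field_number: int, value: int) -> bytes:
    """A protobuf message with one varint field: only the field number is on the wire."""
    key = (field_number << 3) | 0  # wire type 0 = varint
    return varint(key) + varint(value)


for name in ("a", "temperature"):
    print(f"{name!r}: {len(bson_one_int32(name, 1))} bytes in BSON")
print(f"field 1: {len(protobuf_one_varint(1, 1))} bytes in Protocol Buffers")
```

Storing the value 1 takes 12 bytes in BSON under the name "a" and 22 bytes under "temperature", versus 2 bytes (\x08\x01) for field number 1 in Protocol Buffers. That difference is the price BSON pays for not needing a schema to read a document.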

See Also

BSON Specification
