MongoDB’s New Matcher

May 28 • Posted 11 months ago

Discuss on Hacker News

MongoDB 2.5.0 (an unstable dev build) has a new implementation of the “Matcher”. The old Matcher is the bit of code in Mongo that takes a query and decides if a document matches a query expression. It also has to understand indexes so that it can do things like create a subsets of queries suitable for index covering. However, the structure of the Matcher code hasn’t changed significantly in more than four years and until this release, it lacked the ability to be easily extended. It was also structured in such a way that its knowledge could not be reused for query optimization. It was clearly ready for a rewrite.

The “New Matcher” in 2.5.0 is a total rewrite. It contains three separate pieces: an abstract syntax tree (hereafter ‘AST’) for expression match expressions, a parser from BSON into said AST, and a Matcher API layer that simulates the old Matcher interface while using all new internals. This new version is much easier to extend, easier to reason about, and will allow us to use the same structure for matching as for query analysis and rewriting.

This matcher rewrite is part of a larger project to restructure query execution, to optimize them, and to lay the groundwork for more advanced queries in the future. One planned optimization is index intersection. For example, if you have an index on each of ‘a’ and ‘b’ attributes, we want a query of the form { a : 5 , b : 6 } to do an index intersection of the two indexes rather than just use one index and discard the documents from that index that don’t match. Index intersection would also be suitable for merging geo-spatial, text and regular indexes together in fun and interesting ways (i.e. a query to return all the users in a 3.5 mile radius of a location with a greater than #x# reputation who are RSVP’ed ‘yes’ for an event).

A good example of an extension we’d like to enable is self referential queries, such as finding all documents where a = b + c. (This would be written { a : { $sum : [ “$b” , “$c” ] } }). With the new Matcher, such queries are easy to implement as a native part of the language.

Now that the Matcher re-write is ready for testing, we’d love people to help test it by trying out MongoDB 2.5.0. (Release Notes)

Code

By Eliot Horowitz, 10gen CTO, MongoDB core contributor. You can find the original post on his personal blog