Using the Python toolkit Ming to accelerate your MongoDB development

Jul 24

This is a guest post from Rick Copeland of Arborian.

Ming is a Python toolkit developed at SourceForge during our rewrite of the site from a PHP/Postgres stack to a Python/MongoDB one. It provides schema enforcement, an object/document mapper, an in-memory database, and various other goodies.

Why Ming?

If you’ve come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we’ll gloss over that for now), everything is just Python dictionaries, and it’s so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.

MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn’t tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it’s easy to iterate on your schema quickly in development, it’s also easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.
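The pitfall is easy to demonstrate without a server at all. Here is a minimal sketch in plain Python (the documents and field names are made up for illustration) of what happens when a query touches a field that only some documents define:

```python
# Documents in one collection need not share fields, so a query on a
# field that only some documents define silently skips the rest.
docs = [
    {"_id": 1, "title": "First post", "views": 10},
    {"_id": 2, "title": "Second post"},  # written before 'views' existed
]

# The MongoDB query {"views": {"$lt": 100}} behaves roughly like this:
matches = [d for d in docs if "views" in d and d["views"] < 100]
print(len(matches))  # only doc 1 matches; doc 2 is invisible to the query
```

The second document is not wrong, exactly; it just predates the `views` field, and the query quietly ignores it.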

The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. At SourceForge, we created Ming (as in “…the Merciless”, the villain who ruled the planet Mongo in Flash Gordon) to deal with precisely this problem. We wanted a (thin) layer on top of PyMongo that would do a couple of things for you:

  • Make sure that we don’t put malformed data into the database
  • Try to ‘fix’ malformed data coming back from the database
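These two goals can be sketched in a few lines of hand-rolled Python. This is not Ming's actual implementation, just the shape of the idea: validate documents on the way in, patch up missing fields on the way out.

```python
# A rough sketch of the two goals above (not Ming's internals):
# validate inbound documents, repair outbound ones.
SCHEMA = {"title": str, "text": str}

def validate_inbound(doc):
    """Reject malformed documents before they reach the database."""
    for field, typ in SCHEMA.items():
        if field not in doc or not isinstance(doc[field], typ):
            raise ValueError("bad or missing field: %s" % field)
    return doc

def fix_outbound(doc):
    """Fill in missing fields on documents read back from the database."""
    return {field: doc.get(field, typ()) for field, typ in SCHEMA.items()}

validate_inbound({"title": "ok", "text": "body"})    # passes
print(fix_outbound({"title": "legacy doc"}))         # 'text' filled with ''
```

Ming does considerably more than this (type conversion, defaults via `if_missing`, nested documents), but the in/out split is the core of the design.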

Ming’s Architecture

Ming’s architecture is based on the excellent SQL toolkit SQLAlchemy. While Ming is much younger than SQLAlchemy and shares none of its code, it takes its design inspiration from there.

Ming actually consists of a number of components, including:

  • The schema enforcement layer - This is ‘basic’ Ming, providing validation and conversion of documents on their way in and out of MongoDB. There are actually two APIs at this layer, the imperative syntax and a more declarative syntax.
  • The object/document mapper - The ODM layer extends the schema enforcement layer by providing a unit of work, an identity map, and pseudo-relational concepts (one-to-many joins, for instance).
  • MongoDB-in-Memory - This is a layer designed as a drop-in replacement for the native pymongo driver, used for testing your application without needing access to a MongoDB server.

Let’s take a look at each of these components in turn…

Ming Schema Enforcement

A Ming schema is fairly straightforward. Below is an example containing the schema for a blog post in both the imperative and declarative syntaxes:

from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session() # ming abstraction for database

# Set up the User schema ahead-of-time
User = dict(username=str, display_name=str)

# "Imperative" style
BlogPost = collection(
   'blog.posts', session, 
   Field('_id', S.ObjectId),
   Field('posted', datetime, if_missing=datetime.utcnow),
   Field('title', str),
   Field('author', User),
   Field('text', str),
   Field('comments', [ 
       dict(author=User,
            posted=S.DateTime(if_missing=datetime.utcnow),
            text=str) ]))

# "Declarative" style
from ming.declarative import Document

class BlogPost(Document):
    class __mongometa__:
        session=session
        name='blog.posts'
        indexes=['author.name', 'comments.author.name']
    _id=Field(S.ObjectId)
    title=Field(str)
    posted=Field(datetime, if_missing=datetime.utcnow)
    author=Field(User)
    text=Field(str)
    comments=Field([
        dict(author=User, 
             posted=datetime,
             text=str) ])

Once you have your schema set up, you can use it to perform all the same operations you can do in pymongo, via the manager object attached to the m attribute:

# Bind the session to the database
from ming.datastore import DataStore 
session.bind = DataStore(
    'mongodb://localhost:27017', database='test')

# Queries
BlogPost.m.find(...) # equiv. to db.blog.posts.find(...)

# Inserts
post0 = BlogPost(dict(... fields here ... ))
post0.m.insert()

# Updates using save()
post1 = BlogPost.m.find({'author.username': 'rick446'}).first()
post1.author.username = 'rick447'
post1.m.save()

# Updates using update_partial()
BlogPost.m.update_partial(
  { '_id': ... },
  { '$push': { 'comments': {... comment data...} } })

# Deletes
post1.m.delete() # single document
BlogPost.m.remove({...query...}) # delete by query

The Object-Document Mapper

Building on the schema enforcement layer is the object-document mapper, which provides two useful patterns:

  • Unit of Work - This pattern collects the changes to your objects in memory until a point at which you flush() them all to the database at once.
  • Identity Map - This guarantees that if you load the same database document twice, you’ll get the same object in memory. This keeps you from accidentally loading the object twice, modifying it twice, and having your two sets of changes overwrite one another.
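The identity map in particular is easy to illustrate. The following is a toy sketch of the pattern, not Ming's implementation (the class and the pretend backend dict are invented for the example): loading the same _id twice hands back the same in-memory object.

```python
# A minimal identity-map sketch: repeated loads of one _id share
# a single cached object, so two edits can never overwrite each other.
class IdentityMap:
    def __init__(self, backend):
        self._backend = backend   # pretend database: {_id: document}
        self._cache = {}

    def get(self, _id):
        if _id not in self._cache:
            # First load: copy the document out of the "database".
            self._cache[_id] = dict(self._backend[_id])
        return self._cache[_id]

backend = {1: {"_id": 1, "title": "A post"}}
imap = IdentityMap(backend)
a = imap.get(1)
b = imap.get(1)
print(a is b)  # True: both loads are the same object
```

A unit of work then sits on top of a map like this, tracking which cached objects were modified so that a single flush() can write them all back.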

Ming also allows you to model relationships between your documents via ForeignIdProperty and RelationProperty. Here is an example schema for a blog hosting site with multiple blogs:

from ming import schema as S
from ming.odm.declarative import MappedClass
from ming.odm.property import FieldProperty, RelationProperty
from ming.odm.property import ForeignIdProperty
from ming.odm import ODMSession

# wrap the session from the schema layer
odm_session = ODMSession(session)

class Blog(MappedClass):
    class __mongometa__:
        session = odm_session
        name = 'blog.blog'

    _id = FieldProperty(S.ObjectId)
    name = FieldProperty(str)
    posts = RelationProperty('Post')

class Post(MappedClass):
    class __mongometa__:
        session = odm_session
        name = 'blog.posts'

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    blog_id = ForeignIdProperty(Blog)
    blog = RelationProperty(Blog)

Once you have the classes defined, you can load and modify the objects, using the odm_session to save your changes to MongoDB:

# Queries
Blog.query.find(...) # equiv. to db.blog.blog.find(...)
blog = Blog.query.get(name='MongoDB Blog')
blog.posts # returns a list of post objects for the blog
blog.posts[0].blog # returns the blog object

# Inserts
post = Post(blog=blog, ...) # automatically sets blog_id

# Updates 
post.title = 'The cool post'

# Save your changes
odm_session.flush()

# Mark post for deletion
post.delete()

# Actually delete
odm_session.flush()

MongoDB-in-Memory

The third main component of Ming is an implementation of the pymongo API that allows you to perform testing of your application without having a dependency on a MongoDB server. To use MIM, you can swap out the creation of your pymongo connection:

from ming import mim
import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        # self.connection = Connection()
        self.connection = mim.Connection()
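The idea behind MIM is the classic in-memory test double. Here is a tiny sketch of that pattern (a made-up class, not MIM itself, and covering only a sliver of the pymongo collection API) to show why the swap above works:

```python
# A toy in-memory stand-in for a pymongo collection: enough API surface
# (insert + exact-match find) to let simple tests run with no server.
class FakeCollection:
    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(dict(doc))

    def find(self, query=None):
        query = query or {}
        # Exact-match semantics only; real MongoDB (and MIM) support
        # operators like $lt, $push, dotted paths, and so on.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

coll = FakeCollection()
coll.insert({"title": "hello", "author": "rick"})
print(len(coll.find({"author": "rick"})))  # 1
```

MIM does the same thing at full scale, implementing the query operators and update modifiers your application actually uses.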

MIM’s support of the pymongo API and MongoDB query syntax has largely been driven by the APIs and queries used internally at SourceForge, so there are some gaps, but these are filled quickly when reported. MIM already supports GridFS and mapreduce, for instance (mapreduce JavaScript support is provided by python-spidermonkey). And of course MIM integrates well with the rest of Ming, allowing you to substitute a mim:// URL for the normal mongodb:// URL in your datastore:

from ming import mim
from ming.datastore import DataStore
import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        self.ds = DataStore(
            'mim://', database='test')

Conclusion

There are other good bits in Ming, including lazy and eager migrations, support for MongoDB’s GridFS filesystem, WSGI auto-flushing middleware for the ODMSession, and more. We’re also experimenting with support for GQL, Google’s query language for Google App Engine (GAE), to facilitate porting apps from GAE to MongoDB. Ming is actively maintained and is a mission-critical part of the SourceForge application stack, where it’s been in production use for over 2 years.

So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let us know in the comments below!

To learn more about development with Ming, check out Rick’s ebook MongoDB with Python and Ming or visit the Atlanta MongoDB User Group on Wednesday, where Rick is presenting.
