Storing Large Objects and Files in MongoDB

Sep 9 • Posted 5 years ago

Large objects, or “files”, are easily stored in MongoDB.  It is no problem to store 100MB videos in the database.  For example, MusicNation uses MongoDB to store its videos.

This has a number of advantages over files stored in a file system.  Unlike a file system, the database will have no problem dealing with millions of objects.  Additionally, we get the power of the database when dealing with this data: we can do advanced queries to find a file, using indexes; we can also do neat things like replication of the entire file set.

MongoDB stores objects in a binary format called BSON.  BinData is a BSON data type for a binary byte array.  However, MongoDB objects are typically limited to 4MB in size.  To deal with this, files are “chunked” into multiple objects that are less than 4MB each.  This has the added advantage of letting us efficiently retrieve a specific range of the given file.
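The chunking idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the GridFS format or any driver's code: the names `split_into_chunks` and `read_range`, the `n` and `data` keys, and the 256KB chunk size are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical chunk size for this sketch; anything under the object size limit works.
CHUNK_SIZE = 256 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Dict]:
    """Split a byte string into numbered chunk objects of at most chunk_size bytes."""
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def read_range(chunks: List[Dict], start: int, end: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return bytes [start, end) using only the chunks that overlap that range."""
    if start >= end:
        return b""
    first = start // chunk_size        # index of the chunk holding byte `start`
    last = (end - 1) // chunk_size     # index of the chunk holding byte `end - 1`
    piece = b"".join(chunk["data"] for chunk in chunks[first:last + 1])
    skip = start - first * chunk_size  # bytes to drop from the front of the first chunk
    return piece[skip:skip + (end - start)]
```

Because the chunk index can be computed from the byte offset, a range read only needs to fetch the few chunk objects that cover the requested bytes rather than the whole file.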

While we could write our own chunking code, a standard format for this chunking is predefined, called GridFS.  GridFS support is included in many MongoDB drivers and also in the mongofiles command line utility.

A good way to do a quick test of this facility is to try out the mongofiles utility.  See the MongoDB documentation for more information on GridFS.


May 28 • Posted 5 years ago

MongoDB stores documents (objects) in a format called BSON.  BSON is a binary serialization of JSON-like documents. BSON stands for “Binary JSON”, but also  contains extensions that allow representation of data types that are not part of JSON.  For example, BSON has a Date data type and BinData type.

The MongoDB client drivers perform the serialization and unserialization.  For a given language, the driver performs translation from the language’s “object” (ordered associative array) data representation to BSON, and back. While the client performs this work, the database understands the internals of the format and can “reach into” BSON objects when appropriate: for example to build index keys, or to match an object against a query expression.  That is to say, MongoDB is not just a blob store.

Thus, BSON is a language-independent data interchange format.

The BSON serialization code from any MongoDB driver can be used to serialize and unserialize BSON, even for applications where the Mongo database proper is completely uninvolved.  This usage is encouraged and we would be happy to work with others on making the format as generically useful as possible.
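To make the format concrete, here is a minimal sketch of a BSON encoder written from the published BSON specification, using only Python's standard library. It is not the code from any MongoDB driver; the function names are made up for this example, and it handles only four value types (string, boolean, 32-bit integer, and generic binary data).

```python
import struct


def _cstring(text: str) -> bytes:
    """Encode a field name as a NUL-terminated UTF-8 string."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    """Encode one field as: type byte, field name, then the typed value."""
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        # String length prefix counts the trailing NUL byte.
        return b"\x02" + _cstring(name) + struct.pack("<i", len(raw)) + raw
    if isinstance(value, int):
        # Only 32-bit integers in this sketch; struct.pack raises struct.error otherwise.
        return b"\x10" + _cstring(name) + struct.pack("<i", value)
    if isinstance(value, bytes):
        # BinData: int32 length, one subtype byte (0x00 = generic), then the bytes.
        return b"\x05" + _cstring(name) + struct.pack("<i", len(value)) + b"\x00" + value
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Encode a dict as: int32 total size, the elements in order, and a trailing NUL."""
    body = b"".join(encode_element(key, value) for key, value in doc.items())
    total = 4 + len(body) + 1  # size prefix + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"
```

Every element starts with a type byte and the field name, which is what lets the database walk a document and pick out a single field without help from the client.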

Other Formats

BSON's key advantage over XML and JSON is efficiency (both in space and compute time), as it is a binary format.

BSON can be compared to binary interchange formats, such as Protocol Buffers.  BSON is more “schemaless” than Protocol Buffers — this being both an advantage in flexibility, and a slight disadvantage in space as BSON has a little overhead for fieldnames within the serialized BSON data.
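The field name overhead is easy to estimate by hand. The sketch below follows the layout in the BSON specification for a document whose values are all 32-bit integers; the function name and the sample field names are made up for this example.

```python
from typing import Iterable


def int32_document_size(field_names: Iterable[str]) -> int:
    """Size in bytes of a BSON document holding one 32-bit integer per field name."""
    elements = sum(
        1                              # element type byte
        + len(name.encode("utf-8"))    # field name, stored in every document
        + 1                            # NUL terminator after the name
        + 4                            # the int32 value itself
        for name in field_names
    )
    return 4 + elements + 1            # int32 size prefix + elements + trailing NUL


short_size = int32_document_size(["x"])
long_size = int32_document_size(["temperature_celsius"])
```

Both documents carry the same four bytes of payload, yet the one with the longer field name is larger, because the name is repeated inside every serialized document instead of living in a shared schema.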

See Also

BSON Specification
