Posts tagged:

data model

Aggregation Options on Big Data Sets Part 1: Basic Analysis using a Flights Data Set

Aug 21 • Posted 11 months ago

By Daniel Alabi and Sweet Song, MongoDB Summer Interns

Flights Dataset Overview

This is the first of three blog posts from our summer internship project, showing how to answer questions about big datasets stored in MongoDB using MongoDB’s frameworks and connectors.

The first dataset we explored was a domestic flights dataset. The Bureau of Transportation Statistics provides information on every commercial flight since 1987, but we narrowed our focus to the most recent year of available data (April 2012-March 2013).

We were particularly attracted to this dataset because it contains a lot of fields that are well suited for manipulation using the MongoDB aggregation framework.


MongoDB 2.2 Released

Aug 29 • Posted 1 year ago

We are pleased to announce the release of MongoDB version 2.2. This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details, see the release notes.

New Features

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside MongoDB, without needing to use Map/Reduce or separate application processes for data manipulation.

See the aggregation documentation for more information.
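As a rough illustration of what a pipeline computes, the following pure-Python sketch mimics a simple `$match` followed by `$group` over some flight-delay documents. The documents, field names, and stages here are invented for the example; they are not from the release itself.

```python
# Conceptual sketch of a $match -> $group aggregation pipeline,
# written in plain Python to show what the framework computes.
# The sample documents and field names are invented for illustration.

docs = [
    {"airline": "AA", "delay": 12},
    {"airline": "AA", "delay": 3},
    {"airline": "UA", "delay": 7},
    {"airline": "UA", "delay": -2},  # early arrival; filtered out below
]

# Stage 1: $match -- keep only delayed flights,
# like {"$match": {"delay": {"$gt": 0}}}.
matched = [d for d in docs if d["delay"] > 0]

# Stage 2: $group -- total delay per airline,
# like {"$group": {"_id": "$airline", "total": {"$sum": "$delay"}}}.
totals = {}
for d in matched:
    totals[d["airline"]] = totals.get(d["airline"], 0) + d["delay"]

print(totals)  # {'AA': 15, 'UA': 7}
```

The real framework runs these stages server-side, close to the data, which is what removes the need for a separate application process.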

Additional “Data Center Awareness” Functionality

2.2 also brings a cluster of features that make it easier to use MongoDB in larger, more geographically distributed deployments. The first change is a standardization of read preferences across all drivers and sharded (i.e. mongos) interfaces. The second is the addition of “tag aware sharding,” which makes it possible to ensure that data in a geographically distributed sharded cluster is always closest to the application that will use that data the most.
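The core idea behind tag-aware sharding is that ranges of the shard key are pinned to tagged shards. The pure-Python sketch below illustrates just the routing decision; the tags, key ranges, and shard names are invented for the example and do not reflect any real cluster configuration.

```python
# Conceptual sketch of tag-aware routing: ranges of the shard key are
# pinned to tagged shards, so documents land near the applications that
# use them most. Tags, ranges, and shard names are invented.

tag_ranges = [
    # (lower bound inclusive, upper bound exclusive, tag)
    ("A", "N", "us-east"),
    ("N", "Z", "us-west"),
]

shards_by_tag = {"us-east": "shard0", "us-west": "shard1"}

def route(shard_key):
    """Pick the shard whose tagged key range covers the shard key."""
    for lo, hi, tag in tag_ranges:
        if lo <= shard_key < hi:
            return shards_by_tag[tag]
    raise ValueError("no tagged range covers key: %r" % shard_key)

print(route("Boston"))   # routed to the us-east shard
print(route("Seattle"))  # routed to the us-west shard
```

In a real cluster the balancer does this continuously, migrating chunks so that each tagged range lives on a shard carrying that tag.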

Improvements to Concurrency

v2.2 eliminates the global lock in the mongod process. Locking is now per database. In addition, a new subsystem avoids locks under most page-fault events, so concurrency improves even on systems with a single database. The application of writes on secondaries is also more parallel. See this video for more details.

We’re looking forward to your feedback on 2.2. Keep the Jira Issues, blog posts, user group posts, and tweets coming.

- Eliot and the 10gen/MongoDB team


What is the Right Data Model?

Jul 16 • Posted 5 years ago

There is certainly plenty of activity in the nonrelational (“NoSQL”) database space right now. We know that for these projects the data model is not relational. But what is the data model? What is the right model?

There are many possibilities, the most popular of which are:

Key/Value. Pure key/value stores are blobs stored by key.

Tabular. Some projects use a Google BigTable-like data model which we call “tabular” here — or one can think of it as “multidimensional tabular”.

Document-Oriented. Typical of these are JSON-style data stores.
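To make the three models concrete, here is the same record expressed in each, sketched in Python. The record and its fields are invented purely for illustration.

```python
# The same record expressed in each of the three data models above.
# Purely illustrative; the fields are invented.

# Key/value: an opaque blob stored under a key. The store cannot
# query inside the value.
kv = {"user:42": '{"name": "Ann", "tags": ["admin", "ops"]}'}

# Tabular: fixed columns, so the multi-valued "tags" field needs
# either a join table or a packed representation.
columns = ("id", "name", "tags")
row = (42, "Ann", "admin,ops")

# Document-oriented: a JSON-style document keeps the structure
# intact and queryable.
doc = {"_id": 42, "name": "Ann", "tags": ["admin", "ops"]}

print(doc["tags"])  # ['admin', 'ops']
```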

We think this is a very important topic.  What is the right data model?  Should there be standardization?

Below are some thoughts on the approaches above.  Of course, as MongoDB committers, we are biased — you know which one we’re going to like.

Key/value has the advantage of being simple.  It is easy to make such systems fast and scalable.  The con is that it is too simple to implement some real-world problems easily.  We’d like to see something more general purpose.

The tabular space brings more flexibility.  But why are we sticking to tables?  Shouldn’t we do something closer to the data model of our programming languages?  Tabular jettisons the theoretical underpinnings of relational algebra, yet we still have significant mapping work from program objects to “tables”.  If I were going to work with tables, I’d really like to have full relational power.

We really like the document-oriented approach.  The programming languages we use today, not to mention web services, map very nicely to say, JSON.  A JSON store gives us an object-like representation, yet also is not tied too tightly to any one single language, which seems wrong for a database.
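The “maps very nicely” claim is easy to demonstrate: a typical program object round-trips through JSON with no mapping layer. A small sketch (the class and its fields are invented for the example):

```python
# How naturally program objects map to a JSON-style document store:
# a round trip through JSON loses nothing and needs no ORM-style
# mapping code. The Flight class and its fields are invented.
import json
from dataclasses import dataclass, asdict

@dataclass
class Flight:
    origin: str
    dest: str
    delays: list

f = Flight("BOS", "SFO", [5, 0, 12])
doc = json.dumps(asdict(f))        # object -> JSON document
back = Flight(**json.loads(doc))   # JSON document -> object
print(back == f)  # True
```

Contrast this with the tabular case, where the nested `delays` list would need its own table and a join to reassemble.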

Would love to hear the thoughts of others.

See also: the BSON blog post
