Data Science on MongoDB…At Last!

Nov 7 • Posted 1 year ago

Today, I’m excited to announce the launch of Precog for MongoDB, a release that bundles all of the really cool Precog technology into a free package that anyone can download and deploy on their existing MongoDB database.

Precog is a data science platform that lets developers and data scientists do advanced analytics and statistics using Quirrel, the “R for big data” language. You can analyze data programmatically with a REST API (or client library) or interactively with Labcoat, an easy-to-use HTML5 application built on the REST API. We provide a cloud-hosted version of Precog, but we’ve known for a long time that we were going to bring a standalone version of our data science platform to a NoSQL database.
MongoDB makes the perfect choice for many reasons:
  • MongoDB developers share our passion for creating software that developers love to use.
  • Quirrel is designed to analyze JSON, which is natively supported by MongoDB.
  • MongoDB has a basic query and aggregation framework, but to do more advanced analytics, you have to write lots of custom code or export the data into an RDBMS, both of which are very painful.
  • We’re great friends of some of the 10gen developers and have released open source software for MongoDB.
Precog for MongoDB gives you the ability to analyze all the data in your MongoDB database, without forcing you to export data into another tool or write any custom code.

We’re really excited about the release and encourage you to download it from the official product page and start using it today.

In the remainder of this post, I’m going to quickly walk you through installation and configuration of the Precog for MongoDB release.
Step 1: Unpack the Download

The download is a ZIP file that includes the files described below.
The file precog.jar is the Java JAR that bundles all of the Precog dependencies into a single (really big!) file. The files precog.sh and precog.bat are scripts that launch precog.jar.

The file config.cfg contains configuration information.

Step 2: Configure Precog

All the configuration settings for Precog are stored in the file config.cfg, which ships with reasonable defaults.

There are two things you need to do at a minimum before you can launch Precog:

  1. Tell Precog where to find the MongoDB server.
  2. Tell Precog what the master account is.

To tell Precog where to find the MongoDB server, simply edit the following settings:

Change the “localhost:27017” portion to the host and port of your MongoDB server. For optimal performance, you should launch Precog on the same machine that is running the MongoDB server.
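
Before you edit the setting, it can help to confirm that the host and port you plan to use are actually reachable. The following is only a rough sketch in Python, not part of Precog: the function names parse_host_port and is_reachable are made up for illustration, and the fallback to 27017 assumes the standard MongoDB port.

```python
import socket


def parse_host_port(address, default_port=27017):
    """Split 'host:port' into (host, port).

    Falls back to default_port (the standard MongoDB port) when no
    port is given. Hypothetical helper, not part of Precog.
    """
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, is_reachable(*parse_host_port("localhost:27017")) should return True when a MongoDB server is listening locally. A successful TCP connection only shows that something is listening on the port, not that it is MongoDB.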

Precog maps your MongoDB databases and collections into its file system: each database sits at the top level, and that database’s collections are nested beneath it (e.g. /mydb/mycollection/).
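
To make the mapping concrete, here is a tiny Python sketch that builds the path for a given database and collection. The function name precog_path is hypothetical and exists only to illustrate the layout described above; it is not part of any Precog client library.

```python
def precog_path(database, collection=None):
    """Build the file-system path for a MongoDB database or collection.

    Illustrative only: databases sit at the top level and collections
    are nested under their database.
    """
    if not database:
        raise ValueError("database name must not be empty")
    if collection is None:
        return f"/{database}/"
    if not collection:
        raise ValueError("collection name must not be empty")
    return f"/{database}/{collection}/"
```

With this sketch, precog_path("mydb", "mycollection") gives /mydb/mycollection/, matching the example above.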

To tell Precog what the master account is, edit config.cfg and add the following settings:

The API key for the master account can be anything you like, but you should keep it secret, because whoever has it has full access to all of your MongoDB data.
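
Since the key can be any string, one reasonable option is to generate a long random value rather than inventing one by hand. This is just a sketch using Python’s standard library; the function name generate_api_key and the 32-byte length are assumptions of this example, not a Precog requirement.

```python
import secrets


def generate_api_key(num_bytes=32):
    """Return a random hex string suitable for use as a secret key.

    secrets.token_hex draws from the operating system's secure random
    source; 32 bytes yields a 64-character string.
    """
    return secrets.token_hex(num_bytes)
```

You would then paste the generated value into config.cfg as the master account’s API key and store it wherever you keep your other credentials.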

You may also want to tweak the ports that Precog uses for the web server that exposes the Precog REST API and to serve Labcoat:



Step 3: Launch Precog

To run precog.jar, you will need to install JRE 6 or later (many systems already have Java installed). If you’re on an OS X or Linux machine, just run the precog script, which automatically launches Java:

precog.sh


If you’re on a Windows machine, you can launch Precog with the precog.bat script.
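
If you’re not sure whether your installed Java is new enough, you can check it before launching. Below is a rough Python sketch that reads the output of java -version; the function names are hypothetical, and the parsing assumes the usual version banner format (for example java version "1.6.0_45" on older releases, or openjdk version "11.0.2" on newer ones).

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    first, second = match.group(1), match.group(2)
    # Releases before Java 9 report "1.x" (e.g. "1.6.0_45" is Java 6).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Return the installed major Java version, or None if java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # The JVM prints its version banner to stderr.
    return parse_java_major(result.stderr or result.stdout)
```

A result of 6 or higher from installed_java_major() meets the requirement stated above; None means no java executable was found on your PATH.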

Once Precog has been launched, it will start a web server that exposes the REST API as well as Labcoat.

Step 4: Try the API

Once Precog is running, you have full access to the Precog REST API. You can find a large number of open source client libraries on GitHub, and the Precog developers site contains documentation and tutorials for interacting with the API.


Step 5: Try Labcoat

Labcoat is an HTML5 application that comes bundled in the download. You don’t have to use Labcoat, of course, since Precog has a REST API, but Labcoat is the best way to interactively explore your data and develop Quirrel queries.

The precog.jar file comes with a bundled web server for Labcoat, so once it’s running just point your browser at http://localhost:8000/ (or whatever port you’ve configured it for) and you’ll have a new Labcoat IDE pointing at your local Precog REST API.
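
If the page doesn’t load, a quick script can tell you whether anything is answering on that port at all. The sketch below is an assumption-laden helper, not something shipped with Precog: labcoat_url and is_serving are made-up names, and the default of port 8000 simply mirrors the URL mentioned above.

```python
import urllib.error
import urllib.request


def labcoat_url(port=8000, host="localhost"):
    """Build the Labcoat URL for the given host and port."""
    return f"http://{host}:{port}/"


def is_serving(url, timeout=3.0):
    """Return True if the URL answers an HTTP request with any status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Nothing is listening, the host is unknown, or the request timed out.
        return False
```

Calling is_serving(labcoat_url()) after launch gives a simple yes or no; if you changed the Labcoat port in config.cfg, pass that port to labcoat_url instead.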

Step 6: Analyze Data!

Once you’ve got Labcoat running, you’re all set! You should see your MongoDB collections in the file system explorer, and you can query data from the collections, develop queries to analyze the data, and export queries as code that runs against your Precog server.

Precog is a beta product, and Precog for MongoDB is hot off the press. You may encounter a few rough edges, and if so, we’d love to hear about them (just send an email to support@precog.com).

If you end up doing something cool with Precog for MongoDB, or if you just want to say hello, feel free to reach out to us via our website, or to me personally at john@precog.com.

Have fun analyzing!

John A. De Goes, CEO/Founder of Precog