Posts tagged:

Python

Managing the web nuggets with MongoDB and MongoKit

Sep 27 • Posted 10 months ago

This is a guest post by Nicolas Clairon, maintainer of MongoKit and founder of Elkorado

MongoKit is a Python ODM for MongoDB. I created it in 2009 (when the ODM acronym wasn't even in use) for my startup project called Elkorado. Now that the service is live, I realize that I never wrote about MongoKit. I'd like to introduce it to you with this quick tutorial based on real use cases from Elkorado.

Elkorado: a place to store web nuggets

Elkorado is a collaborative, interest-based curation tool. It was born out of the frustration that there is no single place to find quality resources about a particular topic of interest. There are so many blogs, forums, videos and websites out there that it is very difficult to find our way through this massive wealth of information.

Elkorado aims to help people centralize quality content, so they can easily find it later and discover new resources.

MongoDB to the rescue

Rapid prototyping is one of the most important things in the startup world, and it is an area where MongoDB shines.

The web is changing fast, and so are web resources and their metadata. MongoDB's schemaless design is a perfect fit for storing this kind of data. After losing hair trying to model polymorphism in SQL databases, I moved to MongoDB… and I fell in love with it.

While playing with the data, I needed a validation layer and wanted to add some methods to my documents. Back then, there was no ODM for Python. And so I created MongoKit.

MongoKit: MongoDB ODM for Python

MongoKit is a thin layer on top of PyMongo. It brings field validation, inheritance, polymorphism and a bunch of other features. Let's see how it is used in Elkorado.

Elkorado is a collection of quality web resources called nuggets. This is how we could fetch a nugget discovered by the user “namlook” with Pymongo:
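A minimal sketch of what that looks like (the database, collection and field names here are assumptions for illustration):

    import pymongo

    connection = pymongo.Connection()
    nugget = connection.elkorado.nuggets.find_one({'discoverer': 'namlook'})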

nugget here is a regular Python dict.

Here’s a simple nugget definition with MongoKit:
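Something like the following, assuming a structure with url, discoverer and popularity fields (the exact fields and the popularity threshold are illustrative):

    from mongokit import Document, Connection

    connection = Connection()

    @connection.register
    class Nugget(Document):
        __database__ = 'elkorado'
        __collection__ = 'nuggets'
        structure = {
            'url': unicode,
            'discoverer': unicode,
            'popularity': int,
        }

        def is_popular(self):
            # hypothetical popularity threshold, for illustration only
            return self['popularity'] > 100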

Fetching a nugget with MongoKit is pretty much the same:
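Reusing the connection and model registered above:

    nugget = connection.Nugget.find_one({'discoverer': 'namlook'})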

However, this time, nugget is a Nugget object and we can call the is_popular method on it:
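For example:

    if nugget.is_popular():
        print 'this is a popular nugget!'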

One of the main advantages of MongoKit is that all your models are registered and accessible via the connection instance. MongoKit looks at the __database__ and __collection__ attributes to know which database and which collection to use. This is useful because those variables are specified in only one place.

Inheritance

MongoKit was first built to natively support inheritance:
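A sketch of such a base model, with the date fields assumed from the example later in this post:

    import datetime

    from mongokit import Document

    class Core(Document):
        __database__ = 'elkorado'
        structure = {
            'created_at': datetime.datetime,
            'updated_at': datetime.datetime,
        }
        default_values = {'created_at': datetime.datetime.utcnow}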

In this Core object, we are defining the database name and some fields that will be shared by other models.

If one wants a Nugget object to have date metadata, one just has to make it inherit from Core:
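For instance, reusing the fields from the earlier sketch:

    @connection.register
    class Nugget(Core):
        __collection__ = 'nuggets'
        structure = {
            'url': unicode,
            'discoverer': unicode,
            'popularity': int,
        }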

It's all about PyMongo

With MongoKit, you are still very close to PyMongo. In fact, MongoKit's connection, database and collection are subclasses of PyMongo's. If, at some point in an algorithm, you need raw performance, you can use PyMongo's layer directly, which is blazing fast:
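For example, querying the raw collection through the same connection (names as in the sketches above):

    raw_nuggets = connection.elkorado.nuggets.find({'popularity': {'$gte': 100}})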

Here, connection is a MongoKit connection, but it can be used like a PyMongo connection. Note that to keep the benefit of DRY, we can call the PyMongo layer from a MongoKit document:
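Assuming the registered document exposes its raw PyMongo collection as a collection attribute:

    raw_nuggets = connection.Nugget.collection.find({'popularity': {'$gte': 100}})

Documents fetched this way come back as plain dicts, skipping the wrapping and validation layer.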

A real life “simplified” example

Let’s see an example of CRUD done with MongoKit.

On Elkorado, each nugget is unique, but multiple users can share a nugget with different metadata. Each time a user picks up a nugget, a UserNugget is created with user-specific information. If this is the first time the nugget is discovered, a Nugget object is created; otherwise, it is updated. Here is a simplified UserNugget structure:
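A sketch of that structure, with the overloaded save described below (the field names are assumptions):

    @connection.register
    class UserNugget(Core):
        __collection__ = 'user_nuggets'
        structure = {
            'url': unicode,
            'user_id': unicode,
            'topics': [unicode],
        }
        required_fields = ['url', 'user_id']

        def save(self, *args, **kwargs):
            # create the related Nugget the first time this URL is seen
            nugget = connection.Nugget.find_one({'url': self['url']})
            if nugget is None:
                nugget = connection.Nugget()
                nugget['url'] = self['url']
                nugget['discoverer'] = self['user_id']
                nugget.save()
            super(UserNugget, self).save(*args, **kwargs)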

This example illustrates well what can be done with MongoKit. Here, the save method has been overloaded to check if a nugget exists (remember, each nugget is unique by its URL). It will create the nugget if it does not already exist, and update it otherwise.

Updating data with MongoKit is similar to PyMongo. Use save on the object, or use PyMongo's layer directly to make atomic updates. Here, we use atomic updates to push new topics and increase the popularity:
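Something along these lines, going through the raw collection (the update document is a sketch):

    connection.Nugget.collection.update(
        {'url': user_nugget['url']},
        {'$addToSet': {'topics': {'$each': user_nugget['topics']}},
         '$inc': {'popularity': 1}})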

Getting live

Let’s play with our model:
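For instance (the values are illustrative):

    user_nugget = connection.UserNugget()
    user_nugget['url'] = u'http://www.mongodb.org'
    user_nugget['user_id'] = u'namlook'
    user_nugget['topics'] = [u'database', u'nosql']
    user_nugget.save()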

When calling the save method, the document is validated against the UserNugget structure. As expected, the fields created_at and updated_at have been added:
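Something like this, with illustrative values:

    >>> user_nugget
    {'_id': ObjectId('...'), 'url': u'http://www.mongodb.org',
     'user_id': u'namlook', 'topics': [u'database', u'nosql'],
     'created_at': datetime.datetime(...), 'updated_at': datetime.datetime(...)}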

and the related nugget has been created:
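Again with illustrative values:

    >>> connection.Nugget.find_one({'url': u'http://www.mongodb.org'})
    {'_id': ObjectId('...'), 'url': u'http://www.mongodb.org',
     'discoverer': u'namlook', 'popularity': 1}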

Conclusion

MongoKit is a central piece of Elkorado. It has been written to be small and minimalist but powerful. There is much more to say about features like inherited queries, i18n and GridFS, so take a look at the wiki to read more about how this tool can help you.

Check the documentation for more information about MongoKit. And if you register on Elkorado, check out the nuggets about MongoDB. Don't hesitate to share your nuggets as well; the more the merrier.

Integrating MongoDB Text Search with a Python App

Jun 4 • Posted 1 year ago

By Mike O’Brien, 10gen Software engineer and maintainer of Mongo-Hadoop

With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add new features that take advantage of text search. Prior to 2.4, adding text search to a MongoDB app would have required writing code to interface with another system like Solr, Lucene, ElasticSearch, or something else. Now that it’s integrated with the database we are already using, we can accomplish the same result with reduced complexity, and fewer moving parts in the deployment.

Here we’ll go through a practical example of adding text search to Planet MongoDB, our blog aggregator site.

Read more

Lessons Learnt Building mongoengine

Nov 29 • Posted 1 year ago

Recently, I attended both Pycon UK and Pycon Ireland to talk about the lessons I have learnt while maintaining mongoengine. The conferences were both excellent and surprisingly different. Pycon UK had quite an “unconference” feel, with some exciting sprint rooms - I wish I had more time as by all reports the educational jam was inspirational. Pycon Ireland in contrast felt more slick with booths from DemonWare, Amazon and Facebook. If you can, I’d advise going to both conferences as they really complement each other.

Read more

Motor: Asynchronous Driver for MongoDB and Python

Sep 5 • Posted 1 year ago

Tornado is a popular asynchronous Python web server. Alas, connecting to MongoDB from a Tornado app requires a tradeoff: you can either use PyMongo and give up the advantages of an async web server, or use AsyncMongo, which is non-blocking but lacks key features.

I decided to fill the gap by writing a new async driver called Motor (for “MOngo + TORnado”), and it’s reached the public alpha stage. Please try it out and tell me what you think. I’ll maintain a homepage for it here, including basic documentation.

Status

Motor is alpha. It is certainly buggy. Its implementation and possibly its API will change in the coming months. I hope you’ll help me by reporting bugs, requesting features, and pointing out how it could be better.

Advantages

Two good projects, AsyncMongo and APyMongo, took the straightforward approach to implementing an async MongoDB driver: they forked PyMongo and rewrote it to use callbacks. But this approach creates a maintenance headache: now every improvement to PyMongo must be manually ported over. Motor sidesteps the problem. It uses a Gevent-like technique to wrap PyMongo and run it asynchronously, while presenting a classic callback interface to Tornado applications. This wrapping means Motor reuses all of PyMongo’s code and, aside from GridFS support, Motor is already feature-complete. Motor can easily keep up with PyMongo development in the future.

Installation

Motor depends on greenlet and, of course, Tornado. It is compatible with CPython 2.5, 2.6, 2.7, and 3.2; and PyPy 1.9. You can get the code from my fork of the PyMongo repo, on the motor branch:

    pip install tornado greenlet
    pip install git+https://github.com/ajdavis/mongo-python-driver.git@motor

To keep up with development, watch my repo and do:

    pip install -U git+https://github.com/ajdavis/mongo-python-driver.git@motor

when you want to upgrade.

Example

Here’s an example of an application that can create and display short messages:
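A minimal sketch of such an app, written against Motor's early callback-style API (the handler, collection and field names here are illustrative):

    import motor
    import tornado.ioloop
    import tornado.web

    db = motor.MotorConnection().open_sync().test


    class MessagesHandler(tornado.web.RequestHandler):
        @tornado.web.asynchronous
        def get(self):
            # fetch the ten most recent messages, newest first
            (db.messages.find()
                .sort([('_id', -1)])
                .limit(10)
                .to_list(callback=self.on_messages))

        def on_messages(self, messages, error):
            if error:
                raise tornado.web.HTTPError(500, str(error))
            for message in messages:
                self.write(message['msg'] + '<br>')
            self.finish()

        @tornado.web.asynchronous
        def post(self):
            # insert a new message, then redirect to the list
            db.messages.insert(
                {'msg': self.get_argument('msg')},
                callback=self.on_inserted)

        def on_inserted(self, result, error):
            if error:
                raise tornado.web.HTTPError(500, str(error))
            self.redirect('/')


    application = tornado.web.Application([('/', MessagesHandler)])
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()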

Other examples are Chirp, a Twitter-like demo app, and Motor-Blog, which runs this site.

Support

For now, email me directly if you have any questions or feedback.

Roadmap

In the next week I'll implement the PyMongo feature I'm missing, GridFS. Once the public alpha and beta stages have shaken out the bugs and revealed missing features, Motor will be included as a module in the official PyMongo distribution.

A. Jesse Jiryu Davis

Getting going quickly with Python, MongoDB, and Spatial data on OpenShift: Part II

Aug 18 • Posted 2 years ago

This post originally appeared on the OpenShift blog

As a follow-up to my last post about getting spatial going in MongoDB on OpenShift, today we are going to put a web service in front of it using Python. There are several goals for this article:

  • Learn a little bit about Flask - a Python web framework
  • Learn about how to connect to MongoDB from Python
  • Create a REST Style web service to use in our SoLoMo application

I hope by the end you can see how using a Platform as a Service can get you going with Python, MongoDB, and Spatial faster than you can say…“Awesome Sauce”. We have a lot of ground to cover so let’s dig right in.

Creating the Python application

Here is the OpenShift command line to create the Python app:

rhc app create -t python-2.6 -a pythonws 

Using the flask quickstart from GitHub

We have already put together a Flask quickstart in the OpenShift GitHub space. To get the framework into your application, all you have to do is run (from the README.md):

    cd pythonws
    git remote add upstream -m master git://github.com/openshift/openshift-mongo-flask-example.git
    git pull -s recursive -X theirs upstream master

We now have a Flask app whose source code we can modify.

If you want to check out the source code I used in the app, you can see it on GitHub and follow the README.md instructions to clone it into your OpenShift account.

Adding MongoDB and importing data

Time to add MongoDB to our application:

 rhc app cartridge add -a pythonws -t mongodb-2.0 

The previous post in this series covers how to import the data from a JSON file of the national parks into your MongoDB database and prepare it for spatial queries. Please follow those instructions to import the data into the pythonws DB, into a collection called parkpoints.

Quick digression to explain Flask

Before we get into our specific application I am going to take a moment to explain the Python framework for this demo. Flask basically allows you to map URL patterns to methods (it also does a lot more, like templating, but this is the only part we are using today). For example, in the mybottleapp.py file that is now in your project you can find the line:

    @route('/')
    def index():
        return 'Hello World!'

This says that when a request comes in for the base URL, the function named index gets executed. In this case the function just returns the string 'Hello World!', and returning has the effect of sending the string to the requestor.

    @route('/name/<name>')
    def nameindex(name='Stranger'):
        return 'Hello, %s!' % name

We can also grab pieces of the requested URL and pass them into the function. By enclosing a part of the URL in < >, we indicate that we want to access it within our function. Here you can see that if the URL looks like:

http://www.mysite.com/name/steve

Then the response will be Hello, steve!

Or, if the URL is http://www.mysite.com/name, the response will be:

Hello, Stranger!

We are going to define URL mappings for some basic REST like functionality to interact with our spatial MongoDB data store.

Modify the source code

The first function we are going to write will simply return all the records in the database. In a more full-featured app you would probably want to add pagination and other features to this query, but we won't be doing that today.

    @app.route("/ws/parks")
    def parks():
        # set up the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # query the DB for all the parkpoints
        result = db.parkpoints.find()
        # now turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

I chose to put the web services under the URL /ws/parks so that we could use other parts of the URL namespace for other functionality. You can now go to your application URL (http://pythonws-<your namespace>.rhcloud.com/ws/parks) and you should be able to see all the documents in the DB.

Using MongoDB in Python

In the code above we simply make a connection to the MongoDB instance for this application and then execute a query. The pymongo package provides all the functionality to interact with the MongoDB instance from our Python code. The pymongo commands are very similar to the MongoDB command-line interaction, except that two-word commands like db.collection.findOne are written with an underscore, as in db.collection.find_one. Please go to the PyMongo site to read more of the documentation.

Notice we use the environment variables to specify the connection URL. While not hard coding database connection parameters is good practice in non-cloud apps, in our case you MUST use the environment variables. Since your app can be idled and then spun up or it could be autoscaled, the IP and ports are not always guaranteed. By using the environment variables we make our code portable.

We pass the result set into json.dumps so we can return JSON straight to the client (each document comes back from PyMongo as a Python dictionary). Since the documents contain BSON types, such as ObjectId, that the standard json module cannot serialize, we pass json_util.default from the bson library into the json.dumps call.

This is probably the easiest experience I have ever had writing a web service. I love Flask, Pymongo, and Python for the simplicity of “Just Getting Stuff Done”.

Grab just one park

Next we will implement the code to get back a park given the park's unique ID. For the ID we will just use the ID generated by MongoDB on document insertion (_id). The ID looks like a long random sequence, and that is what we will pass into the URL.

    # return a specific park given its mongo _id
    @app.route("/ws/parks/park/<parkId>")
    def onePark(parkId):
        # set up the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # query based on the objectid
        result = db.parkpoints.find({'_id': objectid.ObjectId(parkId)})
        # turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

Here we use another class from the bson library: ObjectId. The _id stored in MongoDB is an object, not a plain string, so we have to take the ID passed in on the URL and create an ObjectId from it. The ObjectId class allows us to create one of these objects to pass into the query. Other than that, the code is the same as above.

This little snippet also shows an example of grabbing part of the URL and passing it to a function. I explained this concept above but here we can see it in practice.

Time for the spatial query

Here we do a query to find national parks near a latitude/longitude pair:

    # find parks near a lat and long passed in as query parameters
    # (near?lat=45.5&lon=-82)
    @app.route("/ws/parks/near")
    def near():
        # set up the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # get the request parameters
        lat = float(request.args.get('lat'))
        lon = float(request.args.get('lon'))
        # use the request parameters in the query
        result = db.parkpoints.find({"pos": {"$near": [lon, lat]}})
        # turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

This piece of code shows how to get request parameters from the URL. We capture the lat and lon from the request url and then cast them to floats to use in our query. Remember, everything in a URL comes across as a string so it needs to be converted before being used in the query. In a production app you would need to make sure that you were actually passed strings that could be parsed as floating point numbers. But since this app is just for demo purposes I am not going to show that here.

Once we have the coordinates, we pass them into the query just like we did from the command-line MongoDB client. The results come back ordered by distance from the point passed into the query. Remember, the ordering of the coordinates passed into the query (longitude, then latitude) needs to match the ordering of the coordinates in your MongoDB collection.

Finish it off with a Regex query with spatial goodness

The final piece of code we are going to write allows for a query based both on the name and the location of interest.

    # find parks with a certain name (using regex) near a lat/long pair, as above
    @app.route("/ws/parks/name/near/<name>")
    def nameNear(name):
        # set up the connection
        conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
        db = conn.parks
        # get the request parameters
        lat = float(request.args.get('lat'))
        lon = float(request.args.get('lon'))
        # compile the regex we want to search for and make it case insensitive
        myregex = re.compile(name, re.I)
        # use the request parameters in the query along with the regex
        result = db.parkpoints.find({"Name": myregex, "pos": {"$near": [lon, lat]}})
        # turn the results into valid JSON
        return str(json.dumps({'results': list(result)}, default=json_util.default))

Just like the example above, we parse the lat and lon from the URL query parameters. In looking at my architecture, I do think it might have been better to add the name as a query parameter as well, but this will still work for this article. We grab the name from the end of the URL path and then compile it into a standard Python regular expression (regex). I added re.I to make the regex case-insensitive. I then use the regex to search against the Name field in the document collection and do a geo search against the pos field. Again, the results will come back in distance order from the point passed into the query.

Conclusion

And with that we have wrapped up our little web service code - simple and easy using Python and MongoDB. Again, there are some further changes required for going to production, such as request parameter checking, maybe better URL patterns, exception catching, and perhaps a checkin URL - but overall this should put you well on your way. There are examples of:

  • Using Flask to write some nice REST style services in Python
  • Various methods to get URL information so you can use it in your code
  • How to interact with your MongoDB in Python using PyMongo and BSON libraries
  • Getting spatial data out of your application

Give it all a try on OpenShift and drop me a line to show me what you built. I can’t wait to see all the interesting spatial apps built by shifters.

MongoDB Blogroll: The Best of July 2012 

Aug 2 • Posted 2 years ago

Every month, we’ll be publishing the best community blog posts from the month. Here is the digest for July:

Want your blog post to be included in the next update? Tweet it out with the #mongodb hashtag or send it to us directly.

Hadoop Streaming Support for MongoDB

Jun 7 • Posted 2 years ago

MongoDB has some native data processing tools, such as the built-in JavaScript-oriented MapReduce framework, and a new Aggregation Framework in MongoDB v2.2. That said, there will always be a need to decouple persistence and computational layers when working with Big Data.

Enter MongoDB+Hadoop: an adapter that allows Apache’s Hadoop platform to integrate with MongoDB.

Using this adapter, it is possible to use MongoDB as a real-time datastore for your application while shifting large aggregation, batch processing, and ETL workloads to a platform better suited for the task.


Well, the engineers at 10gen have taken it one step further with the introduction of the streaming assembly for Mongo-Hadoop.

What does all that mean?

The streaming assembly lets you write MapReduce jobs in languages like Python, Ruby, and JavaScript instead of Java, making it easy for developers who are familiar with MongoDB and popular dynamic programming languages to leverage the power of Hadoop.


It works like this:

Once a developer has Java installed and Hadoop ready to rock they download and build the adapter. With the adapter built, you compile the streaming assembly, load some data into Mongo, and get down to writing some MapReduce jobs.

The assembly streams data from MongoDB into Hadoop and back out again, running it through the mappers and reducers defined in a language you feel at home with. Cool right?

Ruby support was recently added and is particularly easy to get started with. Let's take a look at an example where we analyze Twitter data.

Import some data into MongoDB from twitter:
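A sketch of that import pipeline (the exact stream endpoint of the era is an assumption):

    curl https://stream.twitter.com/1/statuses/sample.json -u<twitter_user> | mongoimport -d twitter -c in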

This script curls the Twitter status stream and pipes the JSON into MongoDB using mongoimport. The mongoimport binary takes a couple of flags: "-d", which specifies the database ("twitter"), and "-c", which specifies the collection ("in").

Next, write a Mapper and save it in a file called mapper.rb:
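A sketch of mapper.rb, assuming the streaming assembly's Ruby support exposes the MongoHadoop.map helper described below:

    #!/usr/bin/env ruby
    require 'mongo-hadoop'

    MongoHadoop.map do |document|
      { :_id => document['user']['time_zone'], :count => 1 }
    end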

The mapper calls the MongoHadoop.map function and passes it a block. This block takes an argument "document" and emits a hash containing the user's timezone and a count of 1.

Now, write a Reducer and save it in a file called reducer.rb:
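A matching sketch of reducer.rb, under the same assumptions:

    #!/usr/bin/env ruby
    require 'mongo-hadoop'

    MongoHadoop.reduce do |key, values|
      count = values.reduce(0) { |sum, v| sum + v['count'] }
      { :_id => key, :count => count }
    end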

The reducer calls the MongoHadoop.reduce function and passes it a block. This block takes two parameters, a key and an array of values for that key, reduces the values into a single aggregate and emits a hash with the same key and the newly reduced value.

To run it all, create a shell script that executes hadoop with the streaming assembly jar and tells it how to find the mapper and reducer files as well as where to retrieve and store the data:
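A sketch of such a script; the jar name and flags here are assumptions based on the streaming assembly's conventions:

    #!/bin/sh
    hadoop jar mongo-hadoop-streaming-assembly*.jar \
        -mapper mapper.rb \
        -reducer reducer.rb \
        -inputURI mongodb://127.0.0.1/twitter.in \
        -outputURI mongodb://127.0.0.1/twitter.out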

Make them all executable by running chmod +x on all the scripts, then run twit.sh to have Hadoop process the job.

MongoDB Driver Releases: April

May 8 • Posted 2 years ago

We’ve had a big month with updates and improvements to our drivers.  Here’s a summary:
