Getting going quickly with Python, MongoDB, and Spatial data on OpenShift: Part II

Aug 18 • Posted 2 years ago

This post originally appeared on the OpenShift blog

As a follow up to my last post about getting spatial going in MongoDB on OpenShift, today we are going to put a web service in front of it using Python. There are several goals for this article:

  • Learn a little bit about Flask - a Python web framework
  • Learn about how to connect to MongoDB from Python
  • Create a REST-style web service to use in our SoLoMo application

I hope by the end you can see how using a Platform as a Service can get you going with Python, MongoDB, and Spatial faster than you can say…“Awesome Sauce”. We have a lot of ground to cover so let’s dig right in.

Creating the Python application

Here is the OpenShift command line to create the Python app:

rhc app create -t python-2.6 -a pythonws 

Using the flask quickstart from GitHub

We have already put together a Flask quickstart in the OpenShift GitHub space. To get the framework into your application, all you have to do is run the following (from the directory where you created the application):

cd pythonws
git remote add upstream -m master git://
git pull -s recursive -X theirs upstream master

We now have a Flask app whose source code we can modify.

If you want to check out the source code I used in the app, you can see it on GitHub and follow the instructions there to clone it into your OpenShift account.

Adding MongoDB and importing data

Time to add MongoDB to our application:

 rhc app cartridge add -a pythonws -t mongodb-2.0 

The previous post in this series covers how to import the data from a JSON file of the national parks into your MongoDB database and prepare it for spatial queries. Please follow those instructions to import the data into the pythonws DB, into a collection called parkpoints.

Quick digression to explain Flask

Before we get into our specific application, I am going to take a moment to explain the Python framework for this demo. Flask basically allows you to map URL patterns to methods (it also does a lot more, like templating, but this is the only part we are using today). For example, in the file that is now in your project you can find the lines:

@app.route('/')
def index():
    return 'Hello World!'

This says that when a request comes in for the base URL, the function named

index gets executed. In this case the function just returns the string "Hello World!", and returning has the effect of sending the string to the requestor.

@app.route('/name/')
@app.route('/name/<name>')
def nameindex(name='Stranger'):
    return 'Hello, %s!' % name

We can also grab pieces of the requested URL and pass them into the function. Enclosing a part of the URL pattern in < > indicates that we want to access that piece within our function. Here you can see that if the URL looks like /name/steve, then the response will be Hello, steve!

Or the URL could be just /name/, in which case the response will be Hello, Stranger!

We are going to define URL mappings for some basic REST-like functionality to interact with our spatial MongoDB data store.

Modify the source code

The first function we are going to write will simply return all the records in the database. In a more full-featured app you would probably want to add pagination and other features to this query, but we won't be doing that today.

import os
import json
import re

import pymongo
from bson import json_util, objectid
from flask import Flask, request

app = Flask(__name__)

@app.route("/ws/parks")
def parks():
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #query the DB for all the parkpoints
    result = db.parkpoints.find()
    #Now turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

I chose to put the web services under the URL /ws/parks so that we could use other parts of the URL namespace for other functionality. You can now go to your application URL at /ws/parks and you should be able to see all the documents in the DB.

Using MongoDB in Python

In the code above we simply make a connection to the MongoDB instance for this application and then execute a query. The pymongo package provides all the functionality to interact with the MongoDB instance from our Python code. The pymongo commands are very similar to the MongoDB command-line interaction, except that two-word commands like db.collection.findOne are split with an underscore, such as db.collection.find_one. Please go to the pymongo site to read the documentation.
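As a rough illustration of that naming convention, here is a tiny, hypothetical helper (not part of pymongo) that converts the shell's camelCase names into the snake_case names pymongo uses:

```python
import re

def camel_to_snake(name):
    """Convert a camelCase shell command name to pymongo's snake_case form."""
    # Insert an underscore before each interior capital letter, then lowercase
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()

print(camel_to_snake("findOne"))      # find_one
print(camel_to_snake("ensureIndex"))  # ensure_index
```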

Notice we use environment variables to specify the connection URL. While avoiding hard-coded database connection parameters is good practice even in non-cloud apps, in our case you MUST use the environment variables. Since your app can be idled and then spun up, or it could be autoscaled, the IP and ports are not always guaranteed. By using the environment variables we make our code portable.

We pass the result set (which comes back as a list of Python dictionaries) into json.dumps so we can return JSON straight to the client. Since pymongo returns results containing BSON types (such as ObjectId and dates) that the standard json module cannot serialize on its own, we need to pass json_util.default from the bson library into the json.dumps command.
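json_util.default works like any default= hook that json.dumps calls for objects it cannot serialize natively. Here is a stdlib-only sketch of the same mechanism; to_jsonable is a hypothetical stand-in, not the actual bson implementation:

```python
import json
import datetime

def to_jsonable(obj):
    # Called by json.dumps for any object it cannot serialize natively
    if isinstance(obj, datetime.datetime):
        return {"$date": obj.isoformat()}
    raise TypeError("not serializable: %r" % obj)

doc = {"created": datetime.datetime(2012, 7, 4, 9, 48, 17)}
print(json.dumps(doc, default=to_jsonable))
# {"created": {"$date": "2012-07-04T09:48:17"}}
```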

This is probably the easiest experience I have ever had writing a web service. I love Flask, Pymongo, and Python for the simplicity of “Just Getting Stuff Done”.

Grab just one park

Next we will implement the code to get back a park given the park's unique ID. For the ID we will just use the ID generated by MongoDB on document insertion (_id). The ID looks like a long random sequence, and that is what we will pass into the URL.

#return a specific park given its mongo _id
@app.route("/ws/parks/park/<parkId>")
def onePark(parkId):
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #query based on the objectid
    result = db.parkpoints.find({'_id': objectid.ObjectId(parkId)})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

Here you have to use another class from the bson library - ObjectId. The actual _id in MongoDB is an object, so we have to take the ID passed in on the URL and create an object from it. The ObjectId class allows us to create one of these objects to pass into the query. Other than that, the code is the same as above.
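Since an ObjectId is a 12-byte value usually written as 24 hex characters, a production app would also want to reject malformed IDs before constructing the object. A minimal, hypothetical check (not part of bson) might look like:

```python
def looks_like_object_id(s):
    """True if s has the shape of a MongoDB ObjectId: exactly 24 hex characters."""
    return len(s) == 24 and all(c in "0123456789abcdef" for c in s.lower())

print(looks_like_object_id("500c680c1fe9193b67b898a3"))  # True
print(looks_like_object_id("not-a-valid-id"))            # False
```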

This little snippet also shows an example of grabbing part of the URL and passing it to a function. I explained this concept above but here we can see it in practice.

Time for the spatial query

Here we do a query to find national parks near a latitude/longitude pair.

#find parks near a lat and long passed in as query parameters (e.g. /ws/parks/near?lat=45.5&lon=-82)
@app.route("/ws/parks/near")
def near():
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #get the request parameters
    lat = float(request.args.get('lat'))
    lon = float(request.args.get('lon'))
    #use the request parameters in the query
    result = db.parkpoints.find({"pos": {"$near": [lon, lat]}})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

This piece of code shows how to get request parameters from the URL. We capture the lat and lon from the request url and then cast them to floats to use in our query. Remember, everything in a URL comes across as a string so it needs to be converted before being used in the query. In a production app you would need to make sure that you were actually passed strings that could be parsed as floating point numbers. But since this app is just for demo purposes I am not going to show that here.
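That validation could be as small as a helper that returns None for anything that does not parse as a float; parse_coord below is a hypothetical sketch, not code from the demo app:

```python
def parse_coord(value):
    """Parse a query-string value into a float, or None if it is missing/invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers a missing parameter (None); ValueError covers bad text
        return None

print(parse_coord("45.5"))   # 45.5
print(parse_coord("north"))  # None
```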

Once we have the coordinates, we pass them into the query just like we did from the command-line MongoDB client. The results come back in distance order from the point passed into the query. Remember, the ordering of the coordinates passed into the query needs to match the ordering of the coordinates in your MongoDB collection.

Finish it off with a Regex query with spatial goodness

The final piece of code we are going to write allows for a query based both on the name and the location of interest.

#find parks with a certain name (using regex) near a lat/long pair, as above
@app.route("/ws/parks/name/near/<name>")
def nameNear(name):
    #setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_NOSQL_DB_URL'])
    db = conn.parks
    #get the request parameters
    lat = float(request.args.get('lat'))
    lon = float(request.args.get('lon'))
    #compile the regex we want to search for and make it case insensitive
    myregex = re.compile(name, re.I)
    #use the request parameters in the query along with the regex
    result = db.parkpoints.find({"Name": myregex, "pos": {"$near": [lon, lat]}})
    #turn the results into valid JSON
    return str(json.dumps({'results': list(result)}, default=json_util.default))

Just like the example above, we parse the lat and lon from the URL query parameters. Looking back at my architecture, I think it might have been better to add the name as a query parameter as well, but this will still work for this article. We grab the name from the end of the URL path and then compile it into a standard Python regular expression (regex). I added re.I to make the regex case-insensitive. I then use the regex to search against the Name field in the document collection and do a geo search against the pos field. Again, the results will come back in distance order from the point passed into the query.
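To see what that regex matching does on its own, here is a small standalone sketch (the park names and the search fragment are made up for illustration):

```python
import re

# Compile a user-supplied name fragment case-insensitively, as the service does
myregex = re.compile("yellow", re.I)

print(bool(myregex.search("Yellowstone National Park")))  # True
print(bool(myregex.search("Grand Teton National Park")))  # False
```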


And with that we have wrapped up our little web service code - simple and easy using Python and MongoDB. Again, there are some further changes required for going to production, such as request-parameter checking, maybe better URL patterns, exception catching, and perhaps a check-in URL - but overall this should put you well on your way. There are examples of:

  • Using Flask to write some nice REST style services in Python
  • Various methods to get URL information so you can use it in your code
  • How to interact with your MongoDB in Python using PyMongo and BSON libraries
  • Getting spatial data out of your application

Give it all a try on OpenShift and drop me a line to show me what you built. I can’t wait to see all the interesting spatial apps built by shifters.

Designing MongoDB Schemas with Embedded, Non-Embedded and Bucket Structures

Aug 10 • Posted 2 years ago

This was originally posted to the Red Hat OpenShift blog

With the rapid adoption of schema-less, NoSQL data stores like MongoDB, Cassandra and Riak in the last few years, developers now have the ability to enjoy greater agility when it comes to their application's persistence model. However, just because a datastore is schema-less doesn't mean the structure of the stored documents won't play an important role in the overall performance and resilience of the application. In this first of a four-part blog series about MongoDB, we'll explore a few strategies you should consider when designing your document structure.

Application requirements should drive schema design

If you ask a dozen experienced developers to design the relational database structure of an application, such as a book review site, it's likely that each of the structures will be very similar. You'll likely see tables for authors, books, commenters, comments and so on. The likelihood of having varied relational structures is small because relational database structures are generally well understood. However, if you ask a dozen experienced NoSQL developers to create a similar structure, you're likely to get a dozen different answers.

Why is there so much variability when it comes to designing a NoSQL schema? To optimize application performance and reliability, a NoSQL schema must be driven by the application's use case. It's a novel idea, but it works. Luckily, there are only a few key factors you need to understand when deriving your schema from application requirements. These factors include:

  • How your documents reference child collections
  • The structure and the use of indexes
  • How your data will be sharded

Elements of MongoDB Schemas

Of these factors, how your documents reference child collections, or embedding, is the most important decision you need to make. This point is best demonstrated with an example.

Suppose we're building the book review site we mentioned in the introduction. Our application will have authors and books, as well as reviews with threaded comments. How should we structure the collections? Unfortunately, the answer depends on the number of comments we're expecting per book and how frequently comments are read vs. written. Let's look at our possible use cases.

The first possibility is where we're only going to have a few dozen reviews per book, and each review is likely to have a few hundred comments. In this case, embedding the reviews and comments with the book is a viable possibility. Here's what that might look like:

Listing 1 – Embedded

// Books
{
  "_id": ObjectId("500c680c1fe9193b67b898a3"),
  "publisher": "O'Reilly Media",
  "isbn": "978-1-4493-8156-1",
  "description": "How does MongoDB help you…",
  "title": "MongoDB: The Definitive Guide",
  "formats": ["Print", "Ebook", "Safari Books Online"],
  "authors": [
    { "lastName": "Chodorow", "firstName": "Kristina" },
    { "lastName": "Dirolf", "firstName": "Michael" }
  ],
  "pages": "210"
}

// Reviews
{
  "_id": ObjectId("500c680c1fe9193b67b898a4"),
  "rating": 5,
  "description": "The Authors made an excellent work…",
  "title": "One of O'Reilly excellent books",
  "created": ISODate("2012-07-04T09:48:17Z"),
  "book_id": { "$ref": "books", "$id": ObjectId("500c680c1fe9193b67b898a3") },
  "reviewer": "Giuseppe"
}

// Comments
{
  "_id": ObjectId("500c680c1fe9193b67b898a5"),
  "comment": "This review helped me choose the correct book.",
  "commenter": "Nick",
  "review_id": { "$ref": "reviews", "$id": ObjectId("500c680c1fe9193b67b898a4") },
  "created": ISODate("2012-07-20T13:15:37Z")
}

While simple, this method does have some trade-offs. First, our reviews and comments are strewn throughout the disk. We’re potentially loading thousands of documents to display a page. This leads us to another common embedding strategy – “buckets”.

By bucketing review comments, we can maintain the benefit of fewer reads to display substantial amounts of content, while at the same time maintaining fast writes to smaller documents. An example of a bucketed structure is presented below:

Figure 1 – Hybrid Structure

In this example, the bucket, or hybrid, structure breaks the comments into chunks of roughly 100 comments. Each comment collection maintains a reference to the parent review, as well as its page and current number of contained comments.

Of course, as software developers, we’re painfully aware there’s no free lunch. The downside to buckets is the increased complexity your application has to deal with. The previous strategies were trivial to implement from an application perspective, but suffered from inefficiencies at scale. Buckets address these inefficiencies, but your application has to do a bit more bookkeeping, such as keeping track of the number of comment buckets for a given review.
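As a rough, hypothetical sketch of that bookkeeping (the field names follow the bucket structure described above, but this is not code from the article):

```python
BUCKET_SIZE = 100  # roughly 100 comments per bucket, as in the hybrid structure

def add_comment(buckets, review_id, comment):
    """Append a comment, starting a new bucket document when the last one is full."""
    if not buckets or buckets[-1]["count"] >= BUCKET_SIZE:
        buckets.append({"review_id": review_id,
                        "page": len(buckets) + 1,  # which bucket this is
                        "count": 0,                # comments contained so far
                        "comments": []})
    buckets[-1]["comments"].append(comment)
    buckets[-1]["count"] += 1
    return buckets

buckets = add_comment([], "review42", {"commenter": "Nick", "comment": "Great book."})
print(len(buckets), buckets[0]["count"])  # 1 1
```

The application reads a page of comments by fetching one bucket document instead of hundreds of individual comment documents.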


My own personal projects with MongoDB have used each one of these strategies at one point or another, but I've always grown into the more complicated strategies from the most basic as the application requirements changed. One of the benefits of MongoDB is the ability to change your storage strategy at will, and you shouldn't be afraid to take advantage of this flexibility. By starting simple, you can maintain development velocity early and migrate to a more scalable strategy as the need arises. Stay tuned for additional blogs in this series covering the use of MongoDB indexes, sharding and replica sets.

If you are interested in experimenting with a few of these concepts without having to download and install MongoDB, try it on Red Hat's OpenShift. It's FREE to sign up, and all it takes is an email and you're minutes away from having a MongoDB instance running in the cloud.

