Integrating MongoDB Text Search with a Python App

Jun 4

By Mike O’Brien, 10gen software engineer and maintainer of Mongo-Hadoop

With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add features that take advantage of text search. Before 2.4, adding text search to a MongoDB app would have required writing code to interface with a separate system such as Solr, Lucene, or ElasticSearch. Now that text search is built into the database we’re already using, we can get the same result with less complexity and fewer moving parts in the deployment.

Here we’ll go through a practical example of adding text search to Planet MongoDB, our blog aggregator site.

Planet MongoDB is built in Python, uses the excellent Flask web framework, and stores feed content in a collection called posts. We’ll add some code that enables us to search over posts for any keyword terms we want. As you’ll see, the amount of code and configuration that needs to be added to accomplish this is quite small.

Initial Setup

Before you can use any text search features, you have to enable them explicitly. You can do this by restarting mongod with the additional command line option --setParameter textSearchEnabled=true, or from the mongo shell by running db.adminCommand({setParameter: 1, textSearchEnabled: true}), because setParameter has to be run against the admin database. If, as is hopefully the case, you develop and test on a different server than the one you use in production, don’t forget to do this on both.
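If you would rather have the app switch the parameter on itself, the same setParameter command can be sent through PyMongo. This is a minimal sketch, assuming `client` is a PyMongo `MongoClient` connected to a 2.4 server with admin privileges; `enable_text_search` is a hypothetical helper name, not part of Planet MongoDB. A runtime setParameter change is lost when mongod restarts, so the command line option remains the more durable choice.

```python
def enable_text_search(client):
    """Enable MongoDB 2.4 text search at runtime (hypothetical helper).

    Assumes `client` is a pymongo.MongoClient. setParameter must be sent to
    the admin database, and the change lasts only until mongod restarts.
    """
    # Builds and sends the command document {setParameter: 1, textSearchEnabled: true}
    return client.admin.command("setParameter", 1, textSearchEnabled=True)
```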

Creating Indexes

The next critical step is to create a text index on the fields you want to make searchable. In our case, we want searches to find hits in the article titles as well as in the content. However, since titles are more prominent, we want a match in the title to count for more in the ranking than a match in the body. We can do this by setting weights on the fields.

To do this, we add a call to the application’s startup code that creates the index we need if it doesn’t already exist:

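# Create the text index at startup; ensure_index skips it if it already exists
db.posts.ensure_index(
    [
        ('body', 'text'),
        ('title', 'text'),
    ],
    name="search_index",
    # A title match counts four times as much as a body match (100 vs 25)
    weights={
        'title': 100,
        'body': 25,
    },
)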
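ensure_index comes from the PyMongo 2.x era; it was deprecated in PyMongo 3 and removed in PyMongo 4, where create_index is the equivalent call. The sketch below assumes `posts` is a PyMongo `Collection` on a server with text search available; `create_search_index` is a hypothetical helper name. The weights are relative: MongoDB multiplies the matches in each field by that field’s weight, so with 100 for title and 25 for body a matching term in the title contributes roughly four times as much to the score as the same term in the body.

```python
# Same value as pymongo.TEXT; spelled out so this sketch has no hard dependency.
TEXT = "text"


def create_search_index(posts):
    """Create the weighted text index on a pymongo Collection (PyMongo 3+).

    Creating an index that already exists with the same spec is a no-op,
    so this is safe to call on every application start, as ensure_index was.
    """
    return posts.create_index(
        [("body", TEXT), ("title", TEXT)],
        name="search_index",
        # Relative weights: title matches count four times as much as body matches
        weights={"title": 100, "body": 25},
    )
```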

Running Searches

At this point we have a collection of data and a text index that can be used to search on arbitrary keywords. All that’s left is to write code that actually runs searches and renders the results.

Unlike regular MongoDB queries, text search in 2.4 is implemented as a special command. It returns a document containing a results field: an array of the highest-scoring matches, ordered best first, where each entry holds the matched document under obj and its relevance score under score. To use it, run the text command with an additional search field containing the keywords to match against. In the app, we just grab the request parameter holding what the user typed into the search box, pass it to the text search command, and render a page containing the search results.

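# app (the Flask app), db (a PyMongo Database) and SEARCH_LIMIT are defined elsewhere
from flask import render_template, request

@app.route('/search')
def search():
    # The search form submits via GET, so the terms arrive in the query string
    query = request.args.get('q', '')
    text_results = db.command('text', 'posts', search=query, limit=SEARCH_LIMIT)
    # Each entry in 'results' wraps the matched document in an 'obj' field
    doc_matches = [res['obj'] for res in text_results['results']]
    return render_template("search.html", results=doc_matches)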
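Since each entry in the results array pairs a matched document (obj) with its relevance score (score), it can be handy to keep both, for example to show the score next to each hit while tuning the weights. This is a minimal sketch; `unpack_text_results` is a hypothetical name, and the sample response is hand-written for illustration with only the fields the helper reads, not real server output.

```python
def unpack_text_results(text_results):
    """Turn a 2.4 `text` command response into (document, score) pairs.

    Assumes the response shape {'results': [{'score': float, 'obj': doc}, ...]},
    already ordered from highest to lowest score by the server.
    """
    return [(res["obj"], res["score"]) for res in text_results.get("results", [])]


# Hand-written stand-in for db.command('text', 'posts', search=...)
sample = {
    "results": [
        {"score": 2.5, "obj": {"title": "Text search in 2.4"}},
        {"score": 0.75, "obj": {"title": "Replica set tips"}},
    ],
    "ok": 1.0,
}
for doc, score in unpack_text_results(sample):
    print(f"{score:.2f}  {doc['title']}")
```

In the route, passing `results=unpack_text_results(text_results)` to render_template would give the template both the document and its score.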

Filtering

In addition to finding docs that match text queries, you may want to filter the result set even further based on other criteria and fields in the documents. To do this, add a filter field to the text search command containing the additional filtering logic, in the exact same style as a regular find() query. In this case, we want to restrict the results to only the blog posts that are related to MongoDB, which is determined by a field in the posts called related. Modifying the call to db.command to include this, we get:

text_results = db.command('text', 'posts', search=query, filter={'related':True}, limit=SEARCH_LIMIT)
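The text command is specific to MongoDB 2.4; 2.6 introduced the $text query operator and deprecated the command in favor of it. On those newer servers the filter simply becomes part of an ordinary find() query. The sketch below builds the arguments for that style of query, assuming PyMongo 3 or later; `build_text_query` is a hypothetical helper, not part of the original app.

```python
def build_text_query(query, extra_filter=None):
    """Return (criteria, projection, sort) for a $text search (MongoDB 2.6+).

    extra_filter plays the role of the 2.4 `filter` argument, e.g. {'related': True}.
    """
    criteria = dict(extra_filter or {})
    criteria["$text"] = {"$search": query}
    # Expose the relevance score as a 'score' field and sort on it, best first
    projection = {"score": {"$meta": "textScore"}}
    sort = [("score", {"$meta": "textScore"})]
    return criteria, projection, sort
```

With a PyMongo collection this would be used roughly as `db.posts.find(criteria, projection).sort(sort).limit(SEARCH_LIMIT)`; the matched documents come back directly, without the obj wrapper.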

Pagination

In practice, most applications show only a few results per page and provide “previous/next” links to navigate through multiple pages of matches. We can tweak the existing code to do this too, by adding a page request parameter that indicates where we are in the results and rendering 10 results at a time.

Next, we parse the page parameter, pass limit=end so the command returns only as many documents as the requested page needs, and slice that page out of the results array. On the results page, we can then generate a link to the next page of results by constructing the same search link with page incremented in the Jinja template.

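PAGE_SIZE = 10
try:
    page = max(int(request.args.get("page", 0)), 0)
except ValueError:
    # Fall back to the first page if the parameter isn't a number
    page = 0

start = page * PAGE_SIZE
end = (page + 1) * PAGE_SIZE
# Fetch only as many results as this page needs, then slice out the page
text_results = db.command('text', 'posts', search=query, filter={'related':True}, limit=end)
doc_matches = [res['obj'] for res in text_results['results'][start:end]]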
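One limitation of fetching exactly limit=end results is that the page cannot tell whether a next page exists. A common trick, sketched below, is to request one extra result (limit=end + 1) and show the next link only if that extra result came back. The helper names are hypothetical, and `results` is assumed to be the results array from the text command (any list works).

```python
PAGE_SIZE = 10


def parse_page(raw):
    """Parse the ?page= parameter, falling back to page 0 on bad input."""
    try:
        return max(int(raw), 0)
    except (TypeError, ValueError):
        return 0


def page_bounds(page, page_size=PAGE_SIZE):
    """Return (start, end) slice indices for a zero-based page number."""
    start = page * page_size
    return start, start + page_size


def slice_page(results, page, page_size=PAGE_SIZE):
    """Slice one page out of results fetched with limit=end + 1.

    Returns (page_items, has_next); has_next is True only when the server
    returned the extra look-ahead result past the end of this page.
    """
    start, end = page_bounds(page, page_size)
    return results[start:end], len(results) > end


# Stand-in for text_results['results'] fetched with limit=end + 1 = 21
fake_results = list(range(21))
items, has_next = slice_page(fake_results, page=1)
print(items[0], items[-1], has_next)  # -> 10 19 True
```

In the template, the next link can then be built with Flask’s url_for, for example url_for('search', q=query, page=page + 1), and rendered only when has_next is true.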

Wrap-up

The rest of the work is all on the user-interface side: we add a form with a single input element for the user to type a query into, write the template code that displays the posts returned by the text search command, and the feature is up and running. Although it was quick and easy to add a functional text search feature to the app, this only scratches the surface of how text search works. To learn more, refer to the docs on text search.
