Introducing Semantic Caching and a Dedicated MongoDB LangChain Package for Gen AI Apps

Prakul Agarwal, Erick Friis, and Jacob Lee

#genAI

We are in an unprecedented time: developers can build transformative AI applications quickly, without being AI experts themselves. This ability is enabling new classes of applications that better serve customers, with conversational AI for assistance and automation, advanced reasoning and analysis using AI-powered retrieval, and recommendation systems.

Behind this revolution are large language models (LLMs) that can be prompted to address a wide range of use cases. However, LLMs have various limitations, such as knowledge cutoffs and a tendency to hallucinate. To overcome these limitations, they must be integrated with proprietary enterprise data sources to build reliable, relevant, and high-quality generative AI applications. That’s where MongoDB plays a critical role in the modern generative AI stack.

Developers use MongoDB Atlas Vector Search as a vital part of the generative AI technique known as retrieval-augmented generation (RAG). RAG is the process of feeding LLMs the supplementary data necessary to ground their responses, ensuring they're dependable and precise. LangChain has been a critical part of this journey since the public launch of Atlas Vector Search, enabling developers to build better retriever systems powered by vector search and store conversation history in the operational database.

Today, we are excited to announce support for two enhancements:

  1. Semantic cache powered by Atlas Vector Search, which improves the performance of your apps

  2. A dedicated LangChain-MongoDB package for Python and JS/TS developers, enabling them to build advanced applications even more efficiently

The MongoDB Atlas integration with LangChain can now power all the database requirements for building modern generative AI applications: vector search, semantic caching (currently only available in Python), and conversation history.

Earlier, we announced the launch of MongoDB LangChain Templates, which enable developers to quickly deploy RAG applications. We provided a reference implementation of a basic RAG template using MongoDB Atlas Vector Search and OpenAI, as well as a more advanced parent-document retrieval RAG template using MongoDB Atlas Vector Search. We are excited about our partnership with LangChain and will continue innovating.

Improve LLM application performance with semantic cache

Semantic caching improves the performance of LLM applications by caching responses based on the semantic meaning or context of the queries themselves, rather than on the exact keyword matching a traditional cache relies on. In the era of LLMs, the value of semantic caching is increasing tremendously, enabling sophisticated user experiences that closely mimic human interactions. For example, if two different users enter the prompts “give me suggestions for a comedy movie” and “recommend a comedy movie”, the semantic cache can recognize that the intent behind the queries is the same and return a similar response, even though different keywords are used, whereas a traditional cache would fail.

Figure 1: Semantic cache using MongoDB Atlas Vector Search. Without the cache, every user query is sent to the LLM, costing time and money; with the cache, the system checks whether a semantically similar query has already been answered, and only new queries are sent to the LLM.
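
Here is a minimal sketch of what enabling the cache looks like in Python. The connection string and the database, collection, and index names below are placeholders; it assumes an Atlas Vector Search index on the cache collection and an OpenAI API key in the environment:

```python
from langchain_core.globals import set_llm_cache
from langchain_mongodb.cache import MongoDBAtlasSemanticCache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

MONGODB_URI = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net"  # placeholder

# Every LLM call in this process now goes through the semantic cache.
set_llm_cache(
    MongoDBAtlasSemanticCache(
        connection_string=MONGODB_URI,
        embedding=OpenAIEmbeddings(),
        database_name="langchain_db",      # hypothetical names
        collection_name="semantic_cache",
        index_name="vector_index",         # Atlas Vector Search index on the cache collection
    )
)

llm = ChatOpenAI(model="gpt-3.5-turbo")
llm.invoke("give me suggestions for a comedy movie")  # cache miss: calls the LLM
llm.invoke("recommend a comedy movie")                # semantically similar: served from the cache
```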

Check out this video walkthrough for the semantic cache:

Accelerate development with a dedicated package

With a dedicated LangChain-MongoDB package, MongoDB is even more deeply integrated with LangChain. The Python and JavaScript packages contain the following LangChain integrations: MongoDBAtlasVectorSearch (vector stores) and MongoDBChatMessageHistory (chat message memory). In addition, the Python package includes MongoDBAtlasSemanticCache (LLM caching).
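
As a quick sketch of the Python integrations (the connection string, namespace, and index name below are placeholders for your own Atlas setup):

```python
from langchain_mongodb import MongoDBAtlasVectorSearch, MongoDBChatMessageHistory
from langchain_openai import OpenAIEmbeddings

MONGODB_URI = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net"  # placeholder

# Vector store backed by an Atlas Vector Search index
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    MONGODB_URI,
    namespace="langchain_db.docs",  # "<database>.<collection>", hypothetical
    embedding=OpenAIEmbeddings(),
    index_name="vector_index",      # name of the Atlas Vector Search index
)
retriever = vector_store.as_retriever()  # drop-in retriever for a RAG chain

# Conversation history stored in the same operational database
history = MongoDBChatMessageHistory(
    connection_string=MONGODB_URI,
    session_id="user-123",  # one history per conversation/session
)
history.add_user_message("recommend a comedy movie")
```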

The new langchain-mongodb package contains all the MongoDB-specific implementations and is installed separately from langchain, which provides the core abstractions. Previously, everything lived in the same package, making it challenging to version the integration correctly and to communicate which version to use and whether a release contained breaking changes.
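
In practice, that means installing the dedicated package and updating imports. The “before” path below is the pre-split location in langchain-community, shown for comparison:

```python
# pip install langchain-mongodb

# Before the split: bundled with the community integrations
# from langchain_community.vectorstores import MongoDBAtlasVectorSearch

# After the split: imported from the dedicated, independently versioned package
from langchain_mongodb import MongoDBAtlasVectorSearch
```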

Find out more about the langchain-mongodb package:

Get started today

  • Check out this accompanying tutorial and notebook on building advanced RAG with MongoDB and LangChain, which contains a walkthrough and use cases for semantic caching, vector search, and chat message history.

  • Check out the “PDFtoChat” app to see langchain-mongodb JS in action. It allows you to have a conversation with your proprietary PDFs using AI and is built with MongoDB Atlas, LangChain.js, and TogetherAI. It’s an end-to-end SaaS-in-a-box app and includes user authentication, saving PDFs, and saving chats per PDF.

  • Read the excellent overview of semantic caching using LangChain and MongoDB.