Introduction to LangChain for Database Queries

LangChain is a powerful framework designed to help developers build applications with language models. In this beginner’s guide, we’ll explore how to use LangChain with Ollama (a local AI model runner) to query a MySQL database using natural language.

Rather than writing complex SQL queries, we’ll create a system that allows users to ask questions in plain English and get meaningful answers from their database.

What You’ll Learn

Setting up LangChain with Ollama
Connecting to a MySQL database
Creating database query chains
Handling natural language questions
Building a simple but powerful database assistant

Prerequisites

Python 3.8+
MySQL database
Basic Python knowledge
Ollama installed locally

Our Sample Database

We’ll be working with a simple e-commerce database with two tables:

Products:
- id (INT): Primary key
- name (VARCHAR): Product name
- price (DECIMAL): Product price
- category_id (INT): Foreign key to categories table
- stock (INT): Available inventory
- description (TEXT): Product description
Categories:
- id (INT): Primary key
- name (VARCHAR): Category name
- parent_id (INT): Optional parent category

This structure allows for questions like “What are the most expensive products in the Electronics category?” or “Which categories have products with low stock?”.

Code Implementation

Let’s build our LangChain database assistant step by step:

Step 1: Install Required Libraries

1
pip install langchain langchain-community langchain-core langchain-openai python-dotenv sqlalchemy pymysql

Step 2: Set Up Your Database

First, let’s create our sample database:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
CREATE DATABASE ecommerce;

USE ecommerce;

CREATE TABLE categories (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES categories(id)
);

CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10, 2) NOT NULL,
  category_id INT NOT NULL,
  stock INT NOT NULL DEFAULT 0,
  description TEXT,
  FOREIGN KEY (category_id) REFERENCES categories(id)
);

-- Insert some sample data
INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Clothing', NULL),
(3, 'Books', NULL),
(4, 'Laptops', 1),
(5, 'Smartphones', 1),
(6, 'T-shirts', 2),
(7, 'Jeans', 2),
(8, 'Fiction', 3),
(9, 'Non-Fiction', 3);

INSERT INTO products (name, price, category_id, stock, description) VALUES
('MacBook Pro', 1299.99, 4, 45, 'Powerful laptop for professionals'),
('iPhone 15', 899.99, 5, 120, 'Latest smartphone with advanced features'),
('Samsung Galaxy S23', 849.99, 5, 85, 'Android smartphone with great camera'),
('Basic T-shirt', 19.99, 6, 200, 'Comfortable cotton t-shirt'),
('Designer Jeans', 89.99, 7, 75, 'Stylish jeans for all occasions'),
('Python Programming', 49.99, 9, 30, 'Comprehensive guide to Python programming'),
('Science Fiction Anthology', 29.99, 8, 15, 'Collection of award-winning sci-fi stories'),
('Gaming Laptop', 1499.99, 4, 12, 'High-performance laptop for gaming enthusiasts');

Step 3: Create the LangChain Database Assistant

Here’s our main script:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
# db_assistant.py

import os
from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chains.sql_database.query import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage

# Load environment variables
load_dotenv()

# Setup database connection
db_user = os.environ.get("DB_USER", "root")
db_password = os.environ.get("DB_PASSWORD", "")
db_host = os.environ.get("DB_HOST", "localhost")
db_name = "ecommerce"

# Create SQLDatabase instance
db = SQLDatabase.from_uri(
    f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}"
)

# Setup Ollama model (local)
model = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ANYTHING",
    model="mistral",  # Specify your model name here
)

# Create a chain for generating SQL queries from natural language
sql_chain = create_sql_query_chain(model, db)

# Create a more conversational response template
response_prompt = ChatPromptTemplate.from_messages([
    ("system", """
    You are a helpful assistant that explains database query results in a friendly way.

    Given the following:
    1. A user question
    2. The SQL query used to retrieve the data
    3. The query result

    Provide a helpful, conversational response that answers the user's question.
    Focus on insights, not just repeating data. Format data as needed for readability.
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{question}"),
    ("system", "SQL Query: {query}"),
    ("system", "SQL Result: {result}"),
])

# Create a function to execute SQL and get results
def execute_query(query):
    try:
        return db.run(query)
    except Exception as e:
        return f"Error executing query: {str(e)}"

# Create the full chain
def create_db_chain():
    chain = (
        RunnablePassthrough.assign(query=sql_chain)
        .assign(
            result=lambda vars: execute_query(vars["query"])
        )
        | response_prompt
        | model
        | StrOutputParser()
    )
    return chain

# Main interaction loop
def main():
    print("🤖 Database Assistant")
    print("Ask questions about products and categories in the e-commerce database")
    print("Type 'exit' to quit\n")

    db_chain = create_db_chain()
    chat_history = []

    while True:
        question = input("\nYour question: ")
        if question.lower() in ("exit", "quit"):
            break

        # Process the question
        try:
            response = db_chain.invoke({
                "question": question,
                "chat_history": chat_history
            })
            print("\n" + response)

            # Update chat history
            chat_history.extend([
                HumanMessage(content=question),
                AIMessage(content=response)
            ])

        except Exception as e:
            print(f"Error: {str(e)}")

if __name__ == "__main__":
    main()

Step 4: Environment Setup

Create a .env file in your project directory:

DB_USER=your_mysql_username
DB_PASSWORD=your_mysql_password
DB_HOST=localhost

How It Works

Let’s break down how this system works:

Database Connection: We use SQLAlchemy to connect to our MySQL database through the LangChain SQLDatabase utility.
Language Model: We configure LangChain to use Ollama’s local model (Mistral in this example).
SQL Chain: The create_sql_query_chain() function is a LangChain utility that creates a chain specifically for turning natural language into SQL queries. It automatically:
- Examines your database schema
- Creates a prompt with your database structure
- Generates valid SQL based on the user’s question
Response Generation: After getting the SQL results, we pass them through another prompt to generate human-friendly responses.
Conversation History: We maintain a chat history to enable follow-up questions.

Example Questions

Once you’ve set up the system, you can ask questions like:

“What are all the products in the Electronics category?”
“What’s the average price of laptops?”
“Which product has the lowest stock?”
“How many items do we have in each category?”
“List all categories with their parent categories”

Security Considerations

When implementing this in a real-world scenario, keep these security aspects in mind:

1. Database Security

Restricted User: Create a database user with limited permissions (READ-only)
Connection Security: Use SSL for database connections in production
Environment Variables: Never hardcode credentials in your script

2. SQL Injection Prevention

LangChain’s SQL chain helps prevent SQL injection by using SQLAlchemy’s parameterized queries, but you should still:

Validate any direct inputs if you modify the code
Consider implementing additional validation for generated SQL
If possible, restrict SQL to only SELECT statements

3. Data Privacy

Be careful about what data is exposed through this interface
Consider masking sensitive fields in responses
Implement user authentication before deploying publicly

4. Local Model Benefits

Using Ollama locally provides several advantages:

Data stays on your machine
No API costs
Works offline
Greater privacy

Enhancing the Application

Here are some ways to improve your LangChain database assistant:

Add More Tables: Expand the database schema for more complex queries
Custom Tools: Add specialized functions for specific business logic
Better Error Handling: Improve error messages and fallback mechanisms
User Interface: Add a web interface with Streamlit or Flask
Query Logging: Log questions and responses for analysis
Fine-tuning: Fine-tune the model on your specific domain

Conclusion

LangChain provides a powerful framework for connecting language models to databases. By combining it with Ollama’s local AI capabilities and a MySQL database, you’ve created a system that makes data accessible through natural language.

This approach democratizes database access, allowing non-technical team members to get insights without knowing SQL. As language models continue to improve, these types of applications will become increasingly powerful tools in your data toolkit.

The next steps would be to expand the database schema, refine the prompts, and potentially add a user-friendly interface to make this tool accessible to more people in your organization.