How to Use GPT + SQLite to Query Your Life Like a Database


Life generates data. Every task completed, every book read, every expense logged, every thought jotted down. What if you could query all of it with natural language, just like you query a database? With GPT and SQLite, you absolutely can.

This post will walk you through building a system to store your personal data in a local SQLite database and then use GPT to translate your natural language questions into SQL queries, letting you “query your life.”

We’ll focus on a pragmatic, command-line driven approach, using Python to bridge GPT and SQLite.

Why GPT + SQLite for “Life Data”?

  • Local & Private: SQLite keeps your data on your machine. No cloud sync, no third-party servers. Crucial for personal information.
  • Structured: SQLite provides the robust structure of a relational database, making your data queryable and organized.
  • Natural Language Interface: GPT removes the need to remember SQL syntax. Just ask, and GPT tries to generate the right query.
  • Actionable Insights: Want to know how many books you read last quarter? Or what topics you journaled about most frequently in a specific month? This setup makes it trivial.
  • Extensible: Easily add new data types (tables) as your needs evolve.

Setting Up Your Environment

First, let’s get our tools in order.

Prerequisites

  • Python 3.8+: Essential for scripting. Most systems have it pre-installed or easily installable.
  • OpenAI API Key: Generate one from your OpenAI account. Keep it secure!

Install Python Libraries

We’ll need openai for interacting with GPT and python-dotenv for securely loading our API key. sqlite3 is built into Python.

pip install openai python-dotenv

Configure Your OpenAI API Key

Create a file named .env in the root of your project directory and add your API key:

OPENAI_API_KEY="sk-your-super-secret-key-here"

Note: Never commit your .env file to version control (e.g., Git). Add it to your .gitignore.
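
For example, a minimal .gitignore for this project (covering both the key file and the database we're about to create, since it will hold personal data):

# .gitignore
.env
life_data.db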

Designing Your SQLite Database for Life Data

The core of “querying your life” is having your life data in a queryable format. SQLite is perfect for this. Let’s design a couple of simple tables: journal_entries and books_read.

1. Define Your Schema

Think about the data you want to store and how it relates.

  • journal_entries: A simple table to store daily thoughts, ideas, or events.
    • id: Primary key, unique identifier for each entry.
    • entry_date: The date of the journal entry (e.g., TEXT in YYYY-MM-DD format; see the note after this list on why that format works well).
    • title: A short title for the entry.
    • content: The actual text of the journal entry.
  • books_read: To track books you’ve completed.
    • id: Primary key.
    • title: Book title.
    • author: Book author.
    • genre: Book genre.
    • start_date: When you started reading (TEXT YYYY-MM-DD).
    • end_date: When you finished reading (TEXT YYYY-MM-DD).
    • rating: Your rating (INTEGER, e.g., 1-5).
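
Why TEXT in YYYY-MM-DD format? ISO-style date strings sort lexicographically in chronological order, so plain string comparisons double as date comparisons. For example, this finds every book finished in Q3 2023:

SELECT title, end_date
FROM books_read
WHERE end_date BETWEEN '2023-07-01' AND '2023-09-30'
ORDER BY end_date;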

2. Create the Database and Tables

Let’s write a small Python script to set up our database and create these tables.

db_setup.py:

import sqlite3
import os

DB_FILE = 'life_data.db'

def setup_database():
    """Connects to or creates the SQLite database and sets up tables."""
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()

    # Create journal_entries table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS journal_entries (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            entry_date TEXT NOT NULL,
            title TEXT,
            content TEXT NOT NULL
        )
    ''')

    # Create books_read table
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS books_read (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            author TEXT,
            genre TEXT,
            start_date TEXT,
            end_date TEXT,
            rating INTEGER
        )
    ''')

    conn.commit()
    conn.close()
    print(f"Database '{DB_FILE}' and tables created/verified successfully.")

    # Add some sample data
    add_sample_data()

def add_sample_data():
    """Adds some initial sample data to the tables."""
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()

    # Sample Journal Entries
    journal_data = [
        ('2023-10-25', 'Morning Thoughts', 'Woke up feeling refreshed, had a great idea for a new blog post. Decided to write about GPT and SQLite.'),
        ('2023-10-26', 'Project Update', 'Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.'),
        ('2023-10-26', 'Evening Walk', 'Took a long walk in the park. Saw some interesting birds. Good for clearing my head.'),
        ('2023-10-27', 'Blog Post Progress', 'Made significant progress on the GPT-SQLite blog post. Examples are coming along nicely.'),
        ('2023-09-15', 'Old Idea', 'Had an idea for a simple CLI tool for tracking habits. Maybe revisit later.')
    ]
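    # Note: OR IGNORE only skips rows that violate a constraint. With no UNIQUE
    # columns on these tables, re-running this script will insert duplicate rows.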
    cursor.executemany("INSERT OR IGNORE INTO journal_entries (entry_date, title, content) VALUES (?, ?, ?)", journal_data)

    # Sample Books Read
    books_data = [
        ('Dune', 'Frank Herbert', 'Science Fiction', '2023-09-01', '2023-09-20', 5),
        ('Sapiens', 'Yuval Noah Harari', 'History', '2023-08-01', '2023-08-30', 4),
        ('The Hitchhikers Guide to the Galaxy', 'Douglas Adams', 'Science Fiction', '2023-10-01', '2023-10-10', 5),
        ('Factfulness', 'Hans Rosling', 'Non-fiction', '2023-07-01', '2023-07-25', 4)
    ]
    cursor.executemany("INSERT OR IGNORE INTO books_read (title, author, genre, start_date, end_date, rating) VALUES (?, ?, ?, ?, ?, ?)", books_data)


    conn.commit()
    conn.close()
    print("Sample data added/verified.")

if __name__ == '__main__':
    setup_database()

Run this script once to initialize your database:

python db_setup.py
Database 'life_data.db' and tables created/verified successfully.
Sample data added/verified.

You should now have a life_data.db file in your directory.
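
If you have the sqlite3 command-line shell installed, you can quickly confirm the tables exist:

sqlite3 life_data.db ".tables"
books_read       journal_entries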

The GPT-SQL Connection: Bridging Natural Language and Data

This is where the magic happens. We’ll build a Python script that does the following:

  1. Gets the schema of your SQLite tables. GPT needs to know what tables and columns exist.
  2. Takes a natural language query from you.
  3. Sends the schema and your query to GPT, asking it to generate a SQL query.
  4. Executes the generated SQL query against your SQLite database.
  5. Presents the results.

1. Getting Database Schema for GPT

GPT needs context. It can’t generate correct SQL if it doesn’t know your table names, column names, and their types. We’ll write a helper function to extract this.

# Part of your main script (e.g., query_tool.py)
import sqlite3

DB_FILE = 'life_data.db'

def get_table_schema(table_name):
    """
    Retrieves the schema for a given table from the SQLite database.
    Returns a string representation suitable for GPT.
    """
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
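    # PRAGMA statements don't support parameter binding, so the table name is
    # interpolated directly; here it only ever comes from sqlite_master.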
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()
    conn.close()

    schema_parts = []
    for col in columns:
        cid, name, col_type, not_null, default_val, pk = col
        pk_str = " PRIMARY KEY" if pk else ""
        nn_str = " NOT NULL" if not_null and not pk else "" # Don't double-state NOT NULL for PK
        schema_parts.append(f"{name} {col_type}{nn_str}{pk_str}")
    return f"CREATE TABLE {table_name} ({', '.join(schema_parts)});"

def get_full_db_schema():
    """
    Retrieves the schema for all tables in the database.
    Returns a dictionary mapping table names to their CREATE TABLE statements.
    """
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = [row[0] for row in cursor.fetchall()]
    conn.close()

    full_schema = {}
    for table in tables:
        if table != 'sqlite_sequence': # Ignore internal SQLite table
            full_schema[table] = get_table_schema(table)
    return full_schema
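
For our tables, get_table_schema('journal_entries') produces a compact statement like:

CREATE TABLE journal_entries (id INTEGER PRIMARY KEY, entry_date TEXT NOT NULL, title TEXT, content TEXT NOT NULL);

(PRAGMA table_info doesn't report the AUTOINCREMENT keyword; that's fine, since GPT doesn't need it to write SELECT queries.)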

2. Crafting the GPT Prompt

This is critical. We need to tell GPT:

  • Its role: A SQL expert.
  • The format: ONLY SQL. No explanations, no extra text.
  • The available schema.
  • The user’s question.

We’ll use the gpt-3.5-turbo model for this.

# Part of your main script (e.g., query_tool.py)
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv() # Load environment variables from .env file

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def generate_sql_query(user_question, db_schema):
    """
    Sends the user's question and database schema to GPT
    to generate an SQL query.
    """
    schema_str = "\n".join(db_schema.values())

    system_message = {
        "role": "system",
        "content": (
            "You are an expert SQLite database administrator. "
            "Your task is to convert natural language questions into valid SQLite SQL queries. "
            "You MUST ONLY return the SQL query, and nothing else. "
            "Do not include any explanations, preambles, or additional text. "
            "Always include a LIMIT clause in your SELECT queries to prevent excessively large results. "
            "The available tables and their schemas are:\n"
            f"{schema_str}"
        )
    }

    user_message = {
        "role": "user",
        "content": user_question
    }

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo", # You can try "gpt-4" for better accuracy if available
            messages=[system_message, user_message],
            temperature=0.1, # Keep temperature low for deterministic SQL generation
            max_tokens=200 # Sufficient for most queries
        )
        sql_query = response.choices[0].message.content.strip()
        # Basic cleanup: Sometimes GPT might add ```sql ... ```
        if sql_query.startswith("```sql") and sql_query.endswith("```"):
            sql_query = sql_query[len("```sql"): -len("```")].strip()
        elif sql_query.startswith("```") and sql_query.endswith("```"):
            sql_query = sql_query[len("```"): -len("```")].strip()

        return sql_query
    except Exception as e:
        print(f"Error generating SQL: {e}")
        return None

Note on temperature: A low temperature (e.g., 0.1) makes GPT’s output more deterministic and focused, which is ideal when you want precise SQL. A higher temperature might lead to more creative but less accurate queries.
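
As a quick sanity check, you can wire these two functions together (the SQL in the comment is illustrative; the model's exact output can vary):

schema = get_full_db_schema()
print(generate_sql_query("how many books did I read?", schema))
# Possible output: SELECT COUNT(*) FROM books_read LIMIT 1;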

3. Executing SQL and Displaying Results

Finally, we need a function to execute the generated SQL against our database and present the results in a readable format.

# Part of your main script (e.g., query_tool.py)
def execute_sql_query(sql_query):
    """Executes a given SQL query against the SQLite database and returns results."""
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
    try:
        cursor.execute(sql_query)
        rows = cursor.fetchall()
        # Get column names for better output. cursor.description is None for
        # statements that return no rows (e.g. INSERT), so guard against it.
        column_names = [d[0] for d in cursor.description] if cursor.description else []
        return column_names, rows
    except sqlite3.Error as e:
        print(f"SQLite error: {e}")
        return [], []
    finally:
        conn.close()

def format_results(column_names, rows):
    """Formats query results for display."""
    if not column_names:
        return "No results or query failed."

    # Determine maximum width for each column
    col_widths = [len(col) for col in column_names]
    for row in rows:
        for i, val in enumerate(row):
            col_widths[i] = max(col_widths[i], len(str(val)))

    # Header
    header = " | ".join(col.ljust(width) for col, width in zip(column_names, col_widths))
    separator = "-+-".join("-" * width for width in col_widths)

    # Data rows
    data_rows = []
    for row in rows:
        data_rows.append(" | ".join(str(val).ljust(width) for val, width in zip(row, col_widths)))

    return "\n".join([header, separator] + data_rows)

4. Putting It All Together: The Interactive Query Tool

Let’s combine all the pieces into an interactive command-line tool.

query_tool.py:

import sqlite3
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv() # Load environment variables from .env file

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

DB_FILE = 'life_data.db'

def get_table_schema(table_name):
    """
    Retrieves the schema for a given table from the SQLite database.
    Returns a string representation suitable for GPT.
    """
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
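    # PRAGMA statements don't support parameter binding, so the table name is
    # interpolated directly; here it only ever comes from sqlite_master.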
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()
    conn.close()

    schema_parts = []
    for col in columns:
        cid, name, col_type, not_null, default_val, pk = col
        pk_str = " PRIMARY KEY" if pk else ""
        nn_str = " NOT NULL" if not_null and not pk else ""
        schema_parts.append(f"{name} {col_type}{nn_str}{pk_str}")
    return f"CREATE TABLE {table_name} ({', '.join(schema_parts)});"

def get_full_db_schema():
    """
    Retrieves the schema for all tables in the database.
    Returns a dictionary mapping table names to their CREATE TABLE statements.
    """
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = [row[0] for row in cursor.fetchall()]
    conn.close()

    full_schema = {}
    for table in tables:
        if table != 'sqlite_sequence': # Ignore internal SQLite table
            full_schema[table] = get_table_schema(table)
    return full_schema

def generate_sql_query(user_question, db_schema):
    """
    Sends the user's question and database schema to GPT
    to generate an SQL query.
    """
    schema_str = "\n".join(db_schema.values())

    system_message = {
        "role": "system",
        "content": (
            "You are an expert SQLite database administrator. "
            "Your task is to convert natural language questions into valid SQLite SQL queries. "
            "You MUST ONLY return the SQL query, and nothing else. "
            "Do not include any explanations, preambles, or additional text. "
            "Always include a LIMIT clause in your SELECT queries unless specifically asked for all records. "
            "If the query involves dates, ensure they are handled correctly for TEXT columns (e.g., 'YYYY-MM-DD'). "
            "The available tables and their schemas are:\n"
            f"{schema_str}"
        )
    }

    user_message = {
        "role": "user",
        "content": user_question
    }

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo", # You can try "gpt-4" for better accuracy if available
            messages=[system_message, user_message],
            temperature=0.1, # Keep temperature low for deterministic SQL generation
            max_tokens=200 # Sufficient for most queries
        )
        sql_query = response.choices[0].message.content.strip()
        # Basic cleanup: Sometimes GPT might add ```sql ... ```
        if sql_query.startswith("```sql") and sql_query.endswith("```"):
            sql_query = sql_query[len("```sql"): -len("```")].strip()
        elif sql_query.startswith("```") and sql_query.endswith("```"):
            sql_query = sql_query[len("```"): -len("```")].strip()

        return sql_query
    except Exception as e:
        print(f"Error generating SQL: {e}")
        return None

def execute_sql_query(sql_query):
    """Executes a given SQL query against the SQLite database and returns results."""
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
    try:
        cursor.execute(sql_query)
        rows = cursor.fetchall()
        # Get column names for better output. cursor.description is None for
        # statements that return no rows (e.g. INSERT), so guard against it.
        column_names = [d[0] for d in cursor.description] if cursor.description else []
        return column_names, rows
    except sqlite3.Error as e:
        print(f"SQLite error: {e}")
        return [], []
    finally:
        conn.close()

def format_results(column_names, rows):
    """Formats query results for display."""
    if not column_names:
        return "No results or query failed."

    # Determine maximum width for each column
    col_widths = [len(col) for col in column_names]
    for row in rows:
        for i, val in enumerate(row):
            col_widths[i] = max(col_widths[i], len(str(val)))

    # Header
    header = " | ".join(col.ljust(width) for col, width in zip(column_names, col_widths))
    separator = "-+-".join("-" * width for width in col_widths)

    # Data rows
    data_rows = []
    for row in rows:
        data_rows.append(" | ".join(str(val).ljust(width) for val, width in zip(row, col_widths)))

    return "\n".join([header, separator] + data_rows)

def main():
    print("Welcome to your Life Data Query Tool!")
    print("Type your questions in natural language. Type 'exit' to quit.")

    db_schema = get_full_db_schema()

    while True:
        user_question = input("\nYour question: ").strip()
        if user_question.lower() == 'exit':
            break

        print(f"Thinking... (Generating SQL for: '{user_question}')")
        sql_query = generate_sql_query(user_question, db_schema)

        if sql_query:
            print(f"Generated SQL: {sql_query}")
            column_names, results = execute_sql_query(sql_query)
            print("\n--- Results ---")
            print(format_results(column_names, results))
            print("---------------")
        else:
            print("Could not generate a valid SQL query. Please try rephrasing.")

    print("Exiting. Goodbye!")

if __name__ == '__main__':
    main()

Running Your Life Data Query Tool

Make sure you have your life_data.db and .env files in the same directory as query_tool.py.

python query_tool.py

Now, let’s try some queries!

Example 1: Basic Journal Query

python query_tool.py
Welcome to your Life Data Query Tool!
Type your questions in natural language. Type 'exit' to quit.

Your question: show me all my journal entries
Thinking... (Generating SQL for: 'show me all my journal entries')
Generated SQL: SELECT id, entry_date, title, content FROM journal_entries LIMIT 10;

--- Results ---
id | entry_date | title              | content
---+------------+--------------------+----------------------------------------------------------------------------------------------------------
1  | 2023-10-25 | Morning Thoughts   | Woke up feeling refreshed, had a great idea for a new blog post. Decided to write about GPT and SQLite.
2  | 2023-10-26 | Project Update     | Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.
3  | 2023-10-26 | Evening Walk       | Took a long walk in the park. Saw some interesting birds. Good for clearing my head.
4  | 2023-10-27 | Blog Post Progress | Made significant progress on the GPT-SQLite blog post. Examples are coming along nicely.
5  | 2023-09-15 | Old Idea           | Had an idea for a simple CLI tool for tracking habits. Maybe revisit later.
---------------

Example 2: Filtered Journal Query

Your question: journal entries from October 26, 2023
Thinking... (Generating SQL for: 'journal entries from October 26, 2023')
Generated SQL: SELECT id, entry_date, title, content FROM journal_entries WHERE entry_date = '2023-10-26' LIMIT 10;

--- Results ---
id | entry_date | title          | content
---+------------+----------------+----------------------------------------------------------------------------------------------------
2  | 2023-10-26 | Project Update | Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.
3  | 2023-10-26 | Evening Walk   | Took a long walk in the park. Saw some interesting birds. Good for clearing my head.
---------------

Example 3: Querying Books

Your question: what science fiction books have I read?
Thinking... (Generating SQL for: 'what science fiction books have I read?')
Generated SQL: SELECT title, author, end_date, rating FROM books_read WHERE genre = 'Science Fiction' LIMIT 10;

--- Results ---
title                               | author         | end_date   | rating
------------------------------------+----------------+------------+-------
Dune                                | Frank Herbert  | 2023-09-20 | 5
The Hitchhikers Guide to the Galaxy | Douglas Adams  | 2023-10-10 | 5
---------------

Example 4: Aggregation Query

Your question: how many books did I read in September 2023?
Thinking... (Generating SQL for: 'how many books did I read in September 2023?')
Generated SQL: SELECT COUNT(*) FROM books_read WHERE end_date LIKE '2023-09-%' LIMIT 10;

--- Results ---
COUNT(*)
--------
1
---------------

This demonstrates GPT’s ability to understand date patterns and aggregate functions when given the schema context.

Advanced Considerations and Tips

  • Error Handling and Robustness: The current script provides basic error messages. For production use, you’d want more sophisticated error handling, including retries for API calls, more precise SQLite error reporting, and validation of generated SQL before execution (e.g., using a dry run or checking for destructive commands like DROP TABLE).
  • Prompt Engineering: Experiment with your system_message.
    • You might add instructions like “Prioritize searching by date if a date is mentioned.”
    • “If an exact date is given, use ‘=’, otherwise use ‘LIKE’ for year/month patterns.”
    • “Always order results by date descending if a date column is present.”
    • Caution: The more complex your instructions, the more tokens they consume, potentially increasing cost and inference time.
  • Preventing Destructive Queries: GPT is powerful. It could generate DROP TABLE or DELETE FROM statements. The current system message constrains the output format, but nothing in it actually forbids destructive SQL.
    • Safety Tip: If this were a user-facing tool, you’d implement a SQL parser to specifically disallow destructive commands, run queries over a read-only connection, or require user confirmation for any UPDATE/DELETE/INSERT queries. For personal use, be mindful of what you type and review the Generated SQL before interpreting results. A minimal guard is sketched just after this list.
  • Data Types and Formatting: SQLite is flexible, but dates (TEXT) can be tricky for GPT. Explicitly telling GPT the format (YYYY-MM-DD) in the prompt helps.
  • Adding More Tables: Want to track expenses, habits, or fitness data? Just define new tables in db_setup.py, re-run it, and get_full_db_schema() will automatically include them in the context sent to GPT.
  • Offline Mode / Caching: For common queries, you might cache GPT’s SQL generation or pre-define certain queries to reduce API calls and latency (a small cache sketch also follows this list).
  • Data Privacy: By keeping life_data.db local and only sending the schema plus a single natural language question to OpenAI, you minimize data exposure. Your stored records never leave your machine (though your questions themselves are sent to the API).
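
Here’s a minimal sketch of the safety guard mentioned above, as a drop-in addition to query_tool.py. It combines a blunt allow-list check with SQLite’s read-only URI mode; it is not a real SQL parser, just a backstop:

def execute_sql_query_safely(sql_query):
    """A stricter execute_sql_query: read-only connection, single SELECT only."""
    stripped = sql_query.strip().rstrip(';').strip()
    # Blunt allow-list: reject multi-statement input and anything that isn't a SELECT.
    # (This also rejects legitimate WITH ... SELECT queries; refine as needed.)
    if ';' in stripped or not stripped.lower().startswith('select'):
        raise ValueError(f"Refusing to run non-SELECT SQL: {sql_query!r}")
    # mode=ro opens the database read-only, so even a query that slips past
    # the check above cannot modify your data.
    conn = sqlite3.connect(f"file:{DB_FILE}?mode=ro", uri=True)
    try:
        cursor = conn.cursor()
        cursor.execute(stripped)
        rows = cursor.fetchall()
        column_names = [d[0] for d in cursor.description] if cursor.description else []
        return column_names, rows
    finally:
        conn.close()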
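
And the promised cache sketch: a tiny in-memory memo around generate_sql_query, keyed on the exact question text. Failed generations aren’t cached, so a transient API error won’t be remembered:

_sql_cache = {}  # maps question text -> generated SQL

def generate_sql_query_cached(user_question, db_schema):
    """Memoized wrapper around generate_sql_query."""
    sql = _sql_cache.get(user_question)
    if sql is None:
        sql = generate_sql_query(user_question, db_schema)
        if sql:
            _sql_cache[user_question] = sql
    return sql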

Conclusion

You’ve now built a powerful, private, and extensible system to query your personal data using natural language. This combination of SQLite’s robust local storage and GPT’s intelligent language processing opens up entirely new ways to interact with your own information.

From tracking habits to analyzing past decisions, your life data is no longer siloed or hard to access. It’s a database, and you’re the natural language wizard querying it. This is just the beginning of how AI can empower us to better understand our own lives. Go forth and query!
