How to Use GPT + SQLite to Query Your Life Like a Database
Life generates data. Every task completed, every book read, every expense logged, every thought jotted down. What if you could query all of it with natural language, just like you query a database? With GPT and SQLite, you absolutely can.
This post will walk you through building a system to store your personal data in a local SQLite database and then use GPT to translate your natural language questions into SQL queries, letting you “query your life.”
We’ll focus on a pragmatic, command-line driven approach, using Python to bridge GPT and SQLite.
Why GPT + SQLite for “Life Data”?
- Local & Private: SQLite keeps your data on your machine. No cloud sync, no third-party servers. Crucial for personal information.
- Structured: SQLite provides the robust structure of a relational database, making your data queryable and organized.
- Natural Language Interface: GPT removes the need to remember SQL syntax. Just ask, and GPT tries to generate the right query.
- Actionable Insights: Want to know how many books you read last quarter? Or what topics you journaled about most frequently in a specific month? This setup makes it trivial.
- Extensible: Easily add new data types (tables) as your needs evolve.
Setting Up Your Environment
First, let’s get our tools in order.
Prerequisites
- Python 3.8+: Essential for scripting. Most systems have it pre-installed or easily installable.
- OpenAI API Key: You’ll need an API key from OpenAI. Keep it secure!
Install Python Libraries
We’ll need openai for interacting with GPT and python-dotenv for securely loading our API key. sqlite3 is built into Python, so there’s nothing extra to install for the database.
pip install openai python-dotenv
Configure Your OpenAI API Key
Create a file named .env in the root of your project directory and add your API key:
OPENAI_API_KEY="sk-your-super-secret-key-here"
Note: Never commit your .env file to version control (e.g., Git). Add it to your .gitignore.
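If you want to confirm the key is picked up before going any further, a quick check with python-dotenv is enough (this only verifies that the variable loads; it does not call the API):
import os
from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY from the .env file in the current directory
print("Key loaded:", bool(os.environ.get("OPENAI_API_KEY")))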
Designing Your SQLite Database for Life Data
The core of “querying your life” is having your life data in a queryable format. SQLite is perfect for this. Let’s design a couple of simple tables: journal_entries and books_read.
1. Define Your Schema
Think about the data you want to store and how it relates.
- journal_entries: A simple table to store daily thoughts, ideas, or events.
  - id: Primary key, unique identifier for each entry.
  - entry_date: The date of the journal entry (e.g., TEXT in YYYY-MM-DD format).
  - title: A short title for the entry.
  - content: The actual text of the journal entry.
- books_read: To track books you’ve completed.
  - id: Primary key.
  - title: Book title.
  - author: Book author.
  - genre: Book genre.
  - start_date: When you started reading (TEXT, YYYY-MM-DD).
  - end_date: When you finished reading (TEXT, YYYY-MM-DD).
  - rating: Your rating (INTEGER, e.g., 1-5).
2. Create the Database and Tables
Let’s write a small Python script to set up our database and create these tables.
db_setup.py:
import sqlite3
import os
DB_FILE = 'life_data.db'
def setup_database():
"""Connects to or creates the SQLite database and sets up tables."""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
# Create journal_entries table
cursor.execute('''
CREATE TABLE IF NOT EXISTS journal_entries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entry_date TEXT NOT NULL,
title TEXT,
content TEXT NOT NULL
)
''')
# Create books_read table
cursor.execute('''
CREATE TABLE IF NOT EXISTS books_read (
id INTEGER PRIMARY KEY AUTOINCREMENT,
title TEXT NOT NULL,
author TEXT,
genre TEXT,
start_date TEXT,
end_date TEXT,
rating INTEGER
)
''')
conn.commit()
conn.close()
print(f"Database '{DB_FILE}' and tables created/verified successfully.")
# Add some sample data
add_sample_data()
def add_sample_data():
"""Adds some initial sample data to the tables."""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
# Sample Journal Entries
journal_data = [
('2023-10-25', 'Morning Thoughts', 'Woke up feeling refreshed, had a great idea for a new blog post. Decided to write about GPT and SQLite.'),
('2023-10-26', 'Project Update', 'Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.'),
('2023-10-26', 'Evening Walk', 'Took a long walk in the park. Saw some interesting birds. Good for clearing my head.'),
('2023-10-27', 'Blog Post Progress', 'Made significant progress on the GPT-SQLite blog post. Examples are coming along nicely.'),
('2023-09-15', 'Old Idea', 'Had an idea for a simple CLI tool for tracking habits. Maybe revisit later.')
]
cursor.executemany("INSERT OR IGNORE INTO journal_entries (entry_date, title, content) VALUES (?, ?, ?)", journal_data)
# Sample Books Read
books_data = [
('Dune', 'Frank Herbert', 'Science Fiction', '2023-09-01', '2023-09-20', 5),
('Sapiens', 'Yuval Noah Harari', 'History', '2023-08-01', '2023-08-30', 4),
('The Hitchhikers Guide to the Galaxy', 'Douglas Adams', 'Science Fiction', '2023-10-01', '2023-10-10', 5),
('Factfulness', 'Hans Rosling', 'Non-fiction', '2023-07-01', '2023-07-25', 4)
]
cursor.executemany("INSERT OR IGNORE INTO books_read (title, author, genre, start_date, end_date, rating) VALUES (?, ?, ?, ?, ?, ?)", books_data)
conn.commit()
conn.close()
print("Sample data added/verified.")
if __name__ == '__main__':
setup_database()
Run this script once to initialize your database:
python db_setup.py
Database 'life_data.db' and tables created/verified successfully.
Sample data added/verified.
You should now have a life_data.db file in your directory.
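If you’d like to verify what was created, a few lines of Python will list the tables (the sqlite3 command-line shell works just as well):
import sqlite3
conn = sqlite3.connect('life_data.db')
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
conn.close()
print(tables)  # should include ('journal_entries',) and ('books_read',)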
The GPT-SQL Connection: Bridging Natural Language and Data
This is where the magic happens. We’ll build a Python script that does the following:
- Gets the schema of your SQLite tables. GPT needs to know what tables and columns exist.
- Takes a natural language query from you.
- Sends the schema and your query to GPT, asking it to generate a SQL query.
- Executes the generated SQL query against your SQLite database.
- Presents the results.
1. Getting Database Schema for GPT
GPT needs context. It can’t generate correct SQL if it doesn’t know your table names, column names, and their types. We’ll write a helper function to extract this.
# Part of your main script (e.g., query_tool.py)
import sqlite3
DB_FILE = 'life_data.db'
def get_table_schema(table_name):
"""
Retrieves the schema for a given table from the SQLite database.
Returns a string representation suitable for GPT.
"""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
conn.close()
schema_parts = []
for col in columns:
cid, name, col_type, not_null, default_val, pk = col
pk_str = " PRIMARY KEY" if pk else ""
nn_str = " NOT NULL" if not_null and not pk else "" # Don't double-state NOT NULL for PK
schema_parts.append(f"{name} {col_type}{nn_str}{pk_str}")
return f"CREATE TABLE {table_name} ({', '.join(schema_parts)});"
def get_full_db_schema():
"""
Retrieves the schema for all tables in the database.
Returns a dictionary mapping table names to their CREATE TABLE statements.
"""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = [row[0] for row in cursor.fetchall()]
conn.close()
full_schema = {}
for table in tables:
if table != 'sqlite_sequence': # Ignore internal SQLite table
full_schema[table] = get_table_schema(table)
return full_schema
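To confirm the extraction works, print what get_full_db_schema() returns for the database created earlier; with the two tables above, the output should look roughly like this:
if __name__ == '__main__':
    for table, ddl in get_full_db_schema().items():
        print(ddl)
    # Roughly:
    # CREATE TABLE journal_entries (id INTEGER PRIMARY KEY, entry_date TEXT NOT NULL, title TEXT, content TEXT NOT NULL);
    # CREATE TABLE books_read (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT, genre TEXT, start_date TEXT, end_date TEXT, rating INTEGER);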
2. Crafting the GPT Prompt
This is critical. We need to tell GPT:
- Its role: A SQL expert.
- The format: ONLY SQL. No explanations, no extra text.
- The available schema.
- The user’s question.
We’ll use the gpt-3.5-turbo model for this.
# Part of your main script (e.g., query_tool.py)
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv() # Load environment variables from .env file
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
)
def generate_sql_query(user_question, db_schema):
"""
Sends the user's question and database schema to GPT
to generate an SQL query.
"""
schema_str = "\n".join(db_schema.values())
system_message = {
"role": "system",
"content": (
"You are an expert SQLite database administrator. "
"Your task is to convert natural language questions into valid SQLite SQL queries. "
"You MUST ONLY return the SQL query, and nothing else. "
"Do not include any explanations, preambles, or additional text. "
"Always include a LIMIT clause in your SELECT queries to prevent excessively large results. "
"The available tables and their schemas are:\n"
f"{schema_str}"
)
}
user_message = {
"role": "user",
"content": user_question
}
try:
response = client.chat.completions.create(
model="gpt-3.5-turbo", # You can try "gpt-4" for better accuracy if available
messages=[system_message, user_message],
temperature=0.1, # Keep temperature low for deterministic SQL generation
max_tokens=200 # Sufficient for most queries
)
sql_query = response.choices[0].message.content.strip()
# Basic cleanup: Sometimes GPT might add ```sql ... ```
if sql_query.startswith("```sql") and sql_query.endswith("```"):
sql_query = sql_query[len("```sql"): -len("```")].strip()
elif sql_query.startswith("```") and sql_query.endswith("```"):
sql_query = sql_query[len("```"): -len("```")].strip()
return sql_query
except Exception as e:
print(f"Error generating SQL: {e}")
return None
Note on temperature: A low temperature (e.g., 0.1) makes GPT’s output more deterministic and focused, which is ideal when you want precise SQL. A higher temperature might lead to more creative but less accurate queries.
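With the schema helpers and the client above in the same file, a quick smoke test might look like the following; the exact SQL you get back will vary from run to run and by model:
if __name__ == '__main__':
    schema = get_full_db_schema()
    print(generate_sql_query("Which books did I rate 5 stars?", schema))
    # Typically something along the lines of:
    # SELECT title, author FROM books_read WHERE rating = 5 LIMIT 10;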
3. Executing SQL and Displaying Results
Finally, we need a function to execute the generated SQL against our database and present the results in a readable format.
# Part of your main script (e.g., query_tool.py)
def execute_sql_query(sql_query):
"""Executes a given SQL query against the SQLite database and returns results."""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
try:
cursor.execute(sql_query)
rows = cursor.fetchall()
# Get column names for better output
column_names = [description[0] for description in cursor.description]
return column_names, rows
except sqlite3.Error as e:
print(f"SQLite error: {e}")
return [], []
finally:
conn.close()
def format_results(column_names, rows):
"""Formats query results for display."""
if not column_names:
return "No results or query failed."
# Determine maximum width for each column
col_widths = [len(col) for col in column_names]
for row in rows:
for i, val in enumerate(row):
col_widths[i] = max(col_widths[i], len(str(val)))
# Header
header = " | ".join(col.ljust(width) for col, width in zip(column_names, col_widths))
separator = "-+-".join("-" * width for width in col_widths)
# Data rows
data_rows = []
for row in rows:
data_rows.append(" | ".join(str(val).ljust(width) for val, width in zip(row, col_widths)))
return "\n".join([header, separator] + data_rows)
4. Putting It All Together: The Interactive Query Tool
Let’s combine all the pieces into an interactive command-line tool.
query_tool.py:
import sqlite3
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv() # Load environment variables from .env file
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
)
DB_FILE = 'life_data.db'
def get_table_schema(table_name):
"""
Retrieves the schema for a given table from the SQLite database.
Returns a string representation suitable for GPT.
"""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
conn.close()
schema_parts = []
for col in columns:
cid, name, col_type, not_null, default_val, pk = col
pk_str = " PRIMARY KEY" if pk else ""
nn_str = " NOT NULL" if not_null and not pk else ""
schema_parts.append(f"{name} {col_type}{nn_str}{pk_str}")
return f"CREATE TABLE {table_name} ({', '.join(schema_parts)});"
def get_full_db_schema():
"""
Retrieves the schema for all tables in the database.
Returns a dictionary mapping table names to their CREATE TABLE statements.
"""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = [row[0] for row in cursor.fetchall()]
conn.close()
full_schema = {}
for table in tables:
if table != 'sqlite_sequence': # Ignore internal SQLite table
full_schema[table] = get_table_schema(table)
return full_schema
def generate_sql_query(user_question, db_schema):
"""
Sends the user's question and database schema to GPT
to generate an SQL query.
"""
schema_str = "\n".join(db_schema.values())
system_message = {
"role": "system",
"content": (
"You are an expert SQLite database administrator. "
"Your task is to convert natural language questions into valid SQLite SQL queries. "
"You MUST ONLY return the SQL query, and nothing else. "
"Do not include any explanations, preambles, or additional text. "
"Always include a LIMIT clause in your SELECT queries unless specifically asked for all records. "
"If the query involves dates, ensure they are handled correctly for TEXT columns (e.g., 'YYYY-MM-DD'). "
"The available tables and their schemas are:\n"
f"{schema_str}"
)
}
user_message = {
"role": "user",
"content": user_question
}
try:
response = client.chat.completions.create(
model="gpt-3.5-turbo", # You can try "gpt-4" for better accuracy if available
messages=[system_message, user_message],
temperature=0.1, # Keep temperature low for deterministic SQL generation
max_tokens=200 # Sufficient for most queries
)
sql_query = response.choices[0].message.content.strip()
# Basic cleanup: Sometimes GPT might add ```sql ... ```
if sql_query.startswith("```sql") and sql_query.endswith("```"):
sql_query = sql_query[len("```sql"): -len("```")].strip()
elif sql_query.startswith("```") and sql_query.endswith("```"):
sql_query = sql_query[len("```"): -len("```")].strip()
return sql_query
except Exception as e:
print(f"Error generating SQL: {e}")
return None
def execute_sql_query(sql_query):
"""Executes a given SQL query against the SQLite database and returns results."""
conn = sqlite3.connect(DB_FILE)
cursor = conn.cursor()
try:
cursor.execute(sql_query)
rows = cursor.fetchall()
# Get column names for better output
column_names = [description[0] for description in cursor.description]
return column_names, rows
except sqlite3.Error as e:
print(f"SQLite error: {e}")
return [], []
finally:
conn.close()
def format_results(column_names, rows):
"""Formats query results for display."""
if not column_names:
return "No results or query failed."
# Determine maximum width for each column
col_widths = [len(col) for col in column_names]
if rows: # Only iterate if there are rows to avoid IndexError on empty result sets
for row in rows:
for i, val in enumerate(row):
col_widths[i] = max(col_widths[i], len(str(val)))
# Header
header = " | ".join(col.ljust(width) for col, width in zip(column_names, col_widths))
separator = "-+-".join("-" * width for width in col_widths)
# Data rows
data_rows = []
for row in rows:
data_rows.append(" | ".join(str(val).ljust(width) for val, width in zip(row, col_widths)))
return "\n".join([header, separator] + data_rows)
def main():
print("Welcome to your Life Data Query Tool!")
print("Type your questions in natural language. Type 'exit' to quit.")
db_schema = get_full_db_schema()
while True:
user_question = input("\nYour question: ").strip()
if user_question.lower() == 'exit':
break
print(f"Thinking... (Generating SQL for: '{user_question}')")
sql_query = generate_sql_query(user_question, db_schema)
if sql_query:
print(f"Generated SQL: {sql_query}")
column_names, results = execute_sql_query(sql_query)
print("\n--- Results ---")
print(format_results(column_names, results))
print("---------------")
else:
print("Could not generate a valid SQL query. Please try rephrasing.")
print("Exiting. Goodbye!")
if __name__ == '__main__':
main()
Running Your Life Data Query Tool
Make sure you have your life_data.db and .env files in the same directory as query_tool.py.
python query_tool.py
Now, let’s try some queries!
Example 1: Basic Journal Query
python query_tool.py
Welcome to your Life Data Query Tool!
Type your questions in natural language. Type 'exit' to quit.
Your question: show me all my journal entries
Thinking... (Generating SQL for: 'show me all my journal entries')
Generated SQL: SELECT id, entry_date, title, content FROM journal_entries LIMIT 10;
--- Results ---
id | entry_date | title | content
---+------------+-----------------+--------------------------------------------------------------------------------------
1 | 2023-10-25 | Morning Thoughts| Woke up feeling refreshed, had a great idea for a new blog post. Decided to write about GPT and SQLite.
2 | 2023-10-26 | Project Update | Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.
3 | 2023-10-26 | Evening Walk | Took a long walk in the park. Saw some interesting birds. Good for clearing my head.
4 | 2023-10-27 | Blog Post Progress| Made significant progress on the GPT-SQLite blog post. Examples are coming along nicely.
5 | 2023-09-15 | Old Idea | Had an idea for a simple CLI tool for tracking habits. Maybe revisit later.
---------------
Example 2: Filtered Journal Query
Your question: journal entries from October 26, 2023
Thinking... (Generating SQL for: 'journal entries from October 26, 2023')
Generated SQL: SELECT id, entry_date, title, content FROM journal_entries WHERE entry_date = '2023-10-26' LIMIT 10;
--- Results ---
id | entry_date | title | content
---+------------+---------------+-----------------------------------------------------------------------
2 | 2023-10-26 | Project Update| Spent most of the day coding the backend for the new project. Ran into a few bugs but fixed them.
3 | 2023-10-26 | Evening Walk | Took a long walk in the park. Saw some interesting birds. Good for clearing my head.
---------------
Example 3: Querying Books
Your question: what science fiction books have I read?
Thinking... (Generating SQL for: 'what science fiction books have I read?')
Generated SQL: SELECT title, author, end_date, rating FROM books_read WHERE genre = 'Science Fiction' LIMIT 10;
--- Results ---
title | author | end_date | rating
------------------------------------+----------------+------------+-------
Dune | Frank Herbert | 2023-09-20 | 5
The Hitchhikers Guide to the Galaxy | Douglas Adams | 2023-10-10 | 5
---------------
Example 4: Aggregation Query
Your question: how many books did I read in September 2023?
Thinking... (Generating SQL for: 'how many books did I read in September 2023?')
Generated SQL: SELECT COUNT(*) FROM books_read WHERE end_date LIKE '2023-09-%' LIMIT 10;
--- Results ---
COUNT(*)
--------
1
---------------
This demonstrates GPT’s ability to understand date patterns and aggregate functions when given the schema context.
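If you ever want to double-check an aggregate like this without another API call, you can run an equivalent hand-written query through the same helpers (using the sample data from db_setup.py):
# Equivalent hand-written check, bypassing GPT
cols, rows = execute_sql_query(
    "SELECT COUNT(*) AS books_finished FROM books_read "
    "WHERE end_date BETWEEN '2023-09-01' AND '2023-09-30';"
)
print(format_results(cols, rows))  # with the sample data, only 'Dune' was finished in September 2023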
Advanced Considerations and Tips
- Error Handling and Robustness: The current script provides basic error messages. For production use, you’d want more sophisticated error handling, including retries for API calls, more precise SQLite error reporting, and validation of generated SQL before execution (e.g., using a dry run or checking for destructive commands like DROP TABLE).
- Prompt Engineering: Experiment with your system_message.
  - You might add instructions like “Prioritize searching by date if a date is mentioned.”
  - “If an exact date is given, use ‘=’, otherwise use ‘LIKE’ for year/month patterns.”
  - “Always order results by date descending if a date column is present.”
  - Caution: The more complex your instructions, the more tokens they consume, potentially increasing cost and inference time.
- Preventing Destructive Queries: GPT is powerful. It could generate DROP TABLE or DELETE FROM statements. The current system message (“You MUST ONLY return the SQL query, and nothing else.”) is a deterrent, but not a guarantee.
  - Safety Tip: If this were a user-facing tool, you’d implement a SQL parser to specifically disallow destructive commands, run queries in read-only mode, or require user confirmation for any UPDATE/DELETE/INSERT queries. For personal use, be mindful of what you type and review the Generated SQL before interpreting results. A minimal guard is sketched after this list.
- Data Types and Formatting: SQLite is flexible, but dates stored as TEXT can be tricky for GPT. Explicitly telling GPT the expected format (YYYY-MM-DD) in the prompt helps.
- Adding More Tables: Want to track expenses, habits, or fitness data? Just create new tables in db_setup.py, add them to life_data.db, and get_full_db_schema() will automatically include them in the context sent to GPT.
- Offline Mode / Caching: For common queries, you might cache GPT’s SQL generation or pre-define certain queries to reduce API calls and latency.
- Data Privacy: By keeping life_data.db local and sending only the schema plus a single natural language question to OpenAI, you minimize data exposure. No actual life data leaves your machine.
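As promised above, here is a minimal sketch of such a guard; the function name execute_select_only is mine, not part of the script above. It refuses anything that doesn’t look like a single SELECT statement and opens the database through SQLite’s read-only URI mode as a backstop. It is deliberately crude (it would also reject legitimate WITH ... SELECT queries), so treat it as a starting point rather than a complete SQL validator.
import sqlite3
DB_FILE = 'life_data.db'
def execute_select_only(sql_query):
    """Run a query only if it looks like a single SELECT, on a read-only connection."""
    stripped = sql_query.strip().rstrip(';').strip()
    if not stripped.lower().startswith('select') or ';' in stripped:
        raise ValueError(f"Refusing to run non-SELECT or multi-statement SQL: {sql_query!r}")
    # mode=ro makes SQLite reject any write, even if a statement slips past the check above
    conn = sqlite3.connect(f"file:{DB_FILE}?mode=ro", uri=True)
    try:
        cursor = conn.execute(stripped)
        return [d[0] for d in cursor.description], cursor.fetchall()
    finally:
        conn.close()
You could drop this in next to execute_sql_query() in query_tool.py and call it instead when you want the extra safety net.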
Conclusion
You’ve now built a powerful, private, and extensible system to query your personal data using natural language. This combination of SQLite’s robust local storage and GPT’s intelligent language processing opens up entirely new ways to interact with your own information.
From tracking habits to analyzing past decisions, your life data is no longer siloed or hard to access. It’s a database, and you’re the natural language wizard querying it. This is just the beginning of how AI can empower us to better understand our own lives. Go forth and query!