Personal Expense Tracker with AI

what you build

An intelligent expense tracker where you enter expenses in natural language and the AI automatically categorises them and generates financial insights.

Natural language expense entry: "coffee at Starbucks ₹250" → auto-categorised
LLM-powered categorisation (Food, Transport, Entertainment, etc.)
SQLite database for persistent storage across sessions
Monthly summary with interactive plotly charts (pie, bar, trend)
AI-generated insights: top spending category, comparison to last month, savings tips
CSV export of all transactions

prerequisites

Before You Start

Python basics (functions, file I/O)
Basic SQL understanding (SELECT, INSERT)
Familiarity with OpenAI API or willingness to learn

architecture

How It Works

📝 Natural Language
Expense Input

→

🤖 LLM Categorisation
(GPT-4o-mini)

→

🗄 SQLite Storage
(persistent)

→

📊 Plotly Charts
+ AI Insights

3-week plan

Milestone Breakdown

Week 1

Database + LLM Categorisation

Set up SQLite database (expenses table: date, description, amount, category)
Build LLM categorisation function with 10 standard categories
Test with 50 diverse expense descriptions

Week 2

Streamlit UI + Charts

Build expense entry form with date, description, amount
Display expense history table with edit/delete
Monthly pie chart and weekly trend bar chart with plotly

Week 3

AI Insights + Deploy

Generate monthly summary with LLM (top categories, unusual spends)
Add budget alerts (warn when category exceeds threshold)
Deploy to Streamlit Cloud

key code

Core Implementation

categorise.py

import sqlite3
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["Food", "Transport", "Shopping", "Entertainment",
              "Utilities", "Health", "Education", "Other"]

def categorise_expense(description: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role":"user",
                   "content":f"Categorise: '{description}'. Choose from: {CATEGORIES}. Reply with category only."}]
    )
    category = resp.choices[0].message.content.strip()
    return category if category in CATEGORIES else "Other"

deployment

Deploy to Streamlit Cloud (Free)

deploy.sh

# requirements.txt: openai streamlit pandas plotly
# Set OPENAI_API_KEY in Streamlit secrets:
# share.streamlit.io → App → Settings → Secrets → OPENAI_API_KEY = "sk-..."

Create a free account at share.streamlit.io, connect your GitHub repo, and deploy in one click. Your app gets a public URL instantly.

viva prep

10 Viva Questions with Answers

Q1. Why use SQLite for storage instead of a CSV file?

SQLite supports concurrent reads, ACID transactions, SQL queries for aggregation, and handles large datasets efficiently. CSV requires loading everything into memory for every query.

Q2. How do you prevent the LLM from returning an invalid category?

Validate: if response not in CATEGORIES, default to "Other". Add few-shot examples in the prompt showing correct category selection. Use response_format=json_object for structured output.

Q3. What is the cost of categorising 100 expenses with gpt-4o-mini?

Each categorisation: ~50 input tokens + 5 output tokens. 100 expenses: ~5500 tokens total. At $0.15/1M input: approximately $0.001. Very cheap for personal use.

Q4. How would you make this work without an OpenAI API key?

Use a free local model via Ollama (llama3.2:1b). Or use a rule-based fallback: keyword matching (coffee→Food, uber→Transport). Or HuggingFace zero-shot classification.

Q5. How does your app handle multi-user scenarios?

SQLite is single-writer — fine for personal use. For multi-user: PostgreSQL (Supabase), add user_id column, Supabase auth for per-user data isolation.

Q6. What data visualisations did you implement and why?

Pie chart for category breakdown (proportion), bar chart for monthly comparison (trend), line chart for daily spending (pattern). Each tells a different story about spending behavior.

Q7. How would you add budget tracking?

Budget table in SQLite: category, monthly_limit. Compare actual spending per category to limit. Show traffic light indicator (green/amber/red) based on % of budget used.

Q8. What are the security considerations for storing financial data?

Encrypt the SQLite database (sqlcipher). Never log transaction descriptions to console. Use Supabase RLS if multi-user. Don't include API keys in source code (use environment variables/secrets).

Q9. How would you test this application?

Unit tests for categorise_expense (mock OpenAI API), integration tests for SQLite operations, UI tests with Streamlit's testing framework. Test edge cases: duplicate entries, very large amounts, special characters.

Q10. What ML techniques could replace the LLM for categorisation?

Train a text classifier (TF-IDF + LogReg) on a labelled expense dataset. Rule-based with keyword matching + regex. Embedding similarity to category descriptions. LLM is easiest but costliest.