What is a Document Q&A Assistant?

A web app that lets users upload documents and ask questions in natural language. It uses Retrieval-Augmented Generation (RAG) with a vector database and an LLM to answer questions grounded in the document content.

What tech stack does this project use?

Python, FastAPI, OpenAI API (gpt-4o-mini and text-embedding-3-small), ChromaDB for vector storage, and a simple HTML frontend.

Document Q&A Assistant — Final Year AI Project Kit


What You're Building

Final-year student Arjun needs a real project he can actually explain, not code he copied from the internet. This project gives him a working RAG system with a clean API, a simple web interface, and a complete explanation path from architecture to deployment.

Core Features

Upload PDF or text documents via web interface or API
Documents are chunked, embedded, and stored in ChromaDB
Natural language Q&A powered by GPT-4o-mini
Answers cite the source chunk from the document
Admin interface to manage uploaded documents
Deployable on Render.com with a live URL for demos


Prerequisites

Skill	Level needed
Python	Intermediate (functions, classes, async)
REST APIs	Basic (what a GET/POST is)
OpenAI API	Beginner (just need an API key)
Git / GitHub	Basic (commit, push)
Terminal / Bash	Basic (cd, pip, run commands)

No ML theory required. No GPU needed. Runs on a laptop.


Milestone Breakdown

Each milestone produces working, demonstrable code. You can pause at any milestone and have something to show.

Week 1

Project Setup & Document Parsing

Create the project structure, install dependencies, set up the OpenAI API key, parse a PDF, and split it into chunks. Output: a Python script that prints all text chunks from a PDF.

Set up a virtual environment and install fastapi, uvicorn, python-multipart (needed for file uploads), chromadb, openai, and pypdf (the maintained successor to PyPDF2)
Write a PDF loader that extracts and cleans text
Implement recursive text splitting with a chunk size of 500 characters and an overlap of 100 characters
Print the first 5 chunks to verify correctness
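The splitting step can be written without any extra library. The sketch below is an illustration, not the kit's own code. It borrows the idea behind LangChain's RecursiveCharacterTextSplitter without reusing that library's code. The names `recursive_split` and `_split_pieces` and the separator list are assumptions. The splitter first cuts on paragraphs, then lines, then sentences, then words, and only as a last resort on raw characters. It then greedily packs the pieces into chunks of at most 500 characters. Each new chunk starts with up to 100 characters carried over from the end of the previous one.

```python
# Assumed separator hierarchy: paragraphs, lines, sentences, words, then raw characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ", "")


def _split_pieces(text: str, chunk_size: int, separators: tuple[str, ...]) -> list[str]:
    """Recursively cut `text` into pieces of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text]
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    # Re-attach the separator to each part so no characters are lost.
    parts = [p + sep for p in parts[:-1]] + [parts[-1]]
    pieces: list[str] = []
    for part in parts:
        if part:
            # Parts that are still too long fall through to the next, finer separator.
            pieces.extend(_split_pieces(part, chunk_size, rest))
    return pieces


def recursive_split(text: str, chunk_size: int = 500, overlap: int = 100,
                    separators: tuple[str, ...] = DEFAULT_SEPARATORS) -> list[str]:
    """Split text into chunks of at most chunk_size characters that overlap by up to
    `overlap` characters (counted in whole pieces, so the overlap is approximate)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks: list[str] = []
    window: list[str] = []  # pieces in the chunk being built
    size = 0                # total characters in `window`
    for piece in _split_pieces(text, chunk_size, separators):
        if window and size + len(piece) > chunk_size:
            chunks.append("".join(window))
            # Drop pieces from the front until the remainder fits in the overlap
            # and leaves room for the new piece; the remainder starts the next chunk.
            while window and (size > overlap or size + len(piece) > chunk_size):
                size -= len(window.pop(0))
        window.append(piece)
        size += len(piece)
    if window:
        chunks.append("".join(window))
    # Strip surrounding whitespace and drop whitespace-only chunks.
    return [c.strip() for c in chunks if c.strip()]
```

To keep page numbers for the metadata stored in Week 2, you could run `recursive_split` separately on each page's `page.extract_text()` result from pypdf's `PdfReader(path).pages`.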

Week 2

Embedding & Vector Storage

Embed all document chunks using OpenAI's text-embedding-3-small model. Store in ChromaDB with document metadata. Output: a persistent ChromaDB collection you can query.

Set up ChromaDB with persistent storage
Batch-embed chunks (handle API rate limits)
Store embeddings with doc_id, chunk_id, doc_name, and page_num metadata (the query step filters on doc_id)
Verify: run a similarity query and print top-3 results
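Batch embedding with rate-limit handling can be sketched in plain Python. This is a minimal sketch, not the kit's code. `embed_in_batches` and `build_records` are hypothetical helpers, and the batch size of 100 and the backoff delays are arbitrary assumptions. The embedding call is passed in as a function, so the retry logic can be tested without an API key. The metadata includes `doc_id` next to chunk_id, doc_name and page_num because the query step filters chunks with `where={"doc_id": doc_id}`.

```python
import random
import time
from typing import Callable, Sequence


def embed_in_batches(
    chunks: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
    max_retries: int = 5,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Embed chunks in fixed-size batches, retrying a failed batch with exponential backoff."""
    vectors: list[list[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except retry_on:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                # Wait base_delay * 2^attempt seconds plus random jitter before retrying.
                sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return vectors


def build_records(doc_id: str, doc_name: str, chunks: Sequence[tuple[str, int]]):
    """Return (ids, documents, metadatas) for ChromaDB; `chunks` holds (text, page_num) pairs."""
    ids = [f"{doc_id}-{i}" for i in range(len(chunks))]
    documents = [text for text, _ in chunks]
    metadatas = [
        {"doc_id": doc_id, "chunk_id": i, "doc_name": doc_name, "page_num": page}
        for i, (_, page) in enumerate(chunks)
    ]
    return ids, documents, metadatas
```

With the OpenAI SDK, `embed_batch` could be `lambda batch: [d.embedding for d in client.embeddings.create(input=batch, model="text-embedding-3-small").data]`. Passing `retry_on=(openai.RateLimitError,)` limits retries to rate-limit errors. The SDK also retries a few times on its own by default. The results can then be stored with `collection.add(ids=ids, documents=documents, embeddings=vectors, metadatas=metadatas)`.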

Week 3

RAG Query Engine

Build the retrieval + generation pipeline. Output: a Python function that takes a question string and returns a grounded answer with cited source chunk.

Embed the query and retrieve top-k chunks from ChromaDB
Construct a prompt that injects the retrieved context and instructs the model to answer only from it
Call GPT-4o-mini with the context-enriched prompt
Return answer + source chunk reference in JSON format
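The prompt and the response shape from these steps can be isolated from the API calls. The following is one possible variant, not the kit's method. It numbers each retrieved passage so the model can cite it as [1], [2] and so on, and returns those numbers with each source. `build_messages`, `build_response` and the prompt wording are hypothetical. The returned list can be passed as `messages=` to `client.chat.completions.create`.

```python
SYSTEM_PROMPT = (
    "Answer ONLY using the numbered context passages. "
    "Cite every passage you use by its number, like [1]. "
    "If the answer is not in the context, say \"I don't know\"."
)


def build_messages(question: str, passages: list[str]) -> list[dict]:
    """Number each retrieved passage so the model can cite it, then build chat messages."""
    context = "\n---\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


def build_response(answer: str, passages: list[str], metadatas: list[dict]) -> dict:
    """Pair the answer with its numbered sources; FastAPI serializes the dict to JSON."""
    sources = [
        {"ref": i, "text": p, **meta}
        for i, (p, meta) in enumerate(zip(passages, metadatas), start=1)
    ]
    return {"answer": answer, "sources": sources}
```

Returning a dict rather than a JSON string lets FastAPI do the JSON encoding once, so the /query response isn't double-encoded.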

Week 4

FastAPI Backend

Wrap the RAG engine in a REST API. Output: a running FastAPI app with /upload, /query, and /documents endpoints.

POST /upload — accept a PDF file, run indexing pipeline, return doc_id
POST /query — accept question + doc_id, return answer with citations
GET /documents — list all indexed documents
DELETE /documents/{doc_id} — remove a document and its vectors

Week 5

Frontend Interface

Build a minimal HTML/CSS/JS frontend. Output: a web page that lets you upload a document and ask questions — without needing cURL or Postman.

File upload form with drag-and-drop support
Chat-style Q&A interface
Display answer with source chunk highlighted
Serve frontend as static files from FastAPI

Week 6

Deployment & Demo Polish

Deploy to Render.com or Railway. Add a Dockerfile. Write the README. Prepare the demo script. Output: a live public URL you can share with your college.

Dockerfile with Python 3.11-slim base
Environment variables for OPENAI_API_KEY
Deploy to Render on the free tier
Write README with setup, API docs, and demo GIF
Record a 2-minute demo screencast
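Reading OPENAI_API_KEY from the environment, instead of hard-coding it, lets the same Docker image run locally and on Render. Below is a minimal sketch that assumes a hypothetical `load_settings` helper. Render passes the port the service should bind to in a `PORT` environment variable. The fallback of 8000 for local runs is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from environment variables and fail fast if the key is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the service's environment variables")
    # PORT is supplied by the hosting platform; 8000 is an assumed local default.
    port = int(env.get("PORT", "8000"))
    return Settings(openai_api_key=key, port=port)
```

Calling this at startup turns a missing key into an immediate, readable error instead of a failure on the first query. The container's start command can bind uvicorn to the same port, for example `uvicorn main:app --host 0.0.0.0 --port $PORT`.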


Key Code Patterns

rag_engine.py — query function

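from openai import OpenAI
import chromadb

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable
client     = OpenAI()
chroma     = chromadb.PersistentClient(path="./chroma_db")
# rank by cosine distance (ChromaDB's default distance is L2)
collection = chroma.get_or_create_collection(
    "documents", metadata={"hnsw:space": "cosine"}
)

SYSTEM_PROMPT = (
    "Answer ONLY using the context provided in the user message. "
    "If the answer is not in the context, say \"I don't know\"."
)

def embed_text(text: str) -> list[float]:
    """Return the embedding vector for a single string."""
    res = client.embeddings.create(
        input=text, model="text-embedding-3-small"
    )
    return res.data[0].embedding

def query(question: str, doc_id: str, k: int = 3) -> dict:
    """Retrieve the top-k chunks of one document and answer from them."""
    q_vec = embed_text(question)
    results = collection.query(
        query_embeddings=[q_vec],
        n_results=k,
        where={"doc_id": doc_id}   # only search chunks of the selected document
    )
    # results hold one list per query embedding; we sent one, so take index 0
    context = "\n---\n".join(results["documents"][0])
    sources = [
        {"text": text, **meta}     # chunk text plus its stored metadata
        for text, meta in zip(results["documents"][0], results["metadatas"][0])
    ]

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.1,           # low temperature: more deterministic answers
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content":
                f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return {
        "answer": resp.choices[0].message.content,
        "sources": sources
    }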

main.py — FastAPI endpoints

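from fastapi import FastAPI, UploadFile, File
from fastapi.concurrency import run_in_threadpool
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
# index_document (not shown here) runs the Week 1-2 pipeline
# (parse, chunk, embed, store) and returns the new doc_id;
# it is assumed to be a regular synchronous function, like query()
from rag_engine import query, index_document

app = FastAPI(title="Document Q&A API")
# html=True serves frontend/index.html at /ui/
app.mount("/ui", StaticFiles(directory="frontend", html=True), name="ui")

class QueryRequest(BaseModel):
    question: str
    doc_id:   str

@app.post("/upload")
async def upload(file: UploadFile = File(...)):  # requires python-multipart
    content = await file.read()
    # indexing makes blocking PDF and OpenAI calls, so run it in a worker thread
    doc_id  = await run_in_threadpool(index_document, content, file.filename)
    return {"doc_id": doc_id, "filename": file.filename}

@app.post("/query")
def ask(req: QueryRequest):
    # query() is synchronous; a plain `def` endpoint makes FastAPI run it
    # in a thread pool instead of blocking the event loop
    return query(req.question, req.doc_id)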


20 Viva Questions with Answers

These are the questions your evaluator is most likely to ask; the first 10 are answered below. Understand them rather than memorizing the answers word for word.

Q1: What is RAG and why did you choose it over direct LLM answering?

RAG stands for Retrieval-Augmented Generation. Instead of asking the LLM to answer from its training data (which may be outdated or wrong), RAG retrieves relevant context from your own documents and injects it into the prompt. This grounds the answer in the actual document, reduces hallucinations, and allows domain-specific Q&A without retraining the model.

Q2: What is a vector embedding? Why do we use cosine similarity instead of keyword matching?

A vector embedding is a dense numerical representation (an array of floats) of text, where semantically similar texts are close in vector space. Cosine similarity measures the angle between two vectors, so it finds semantically similar text even when the exact words differ. Keyword matching (like Ctrl+F) fails when the user asks "What is the penalty for late payment?" but the document says "charges apply for delayed remittance": the two phrasings mean the same thing yet share no meaningful keywords.
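Cosine similarity is simple to compute directly. The sketch below is a brute-force illustration that uses hand-made 3-dimensional toy vectors, not real embeddings. `cosine_similarity` and `top_k` are hypothetical helpers. ChromaDB returns a similar ranking approximately, through its HNSW index, instead of scoring every chunk.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|): 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k chunk vectors most similar to the query (brute force)."""
    scores = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]


query = [1.0, 1.0, 0.0]
chunks = [[2.0, 2.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(top_k(query, chunks, k=2))  # [0, 2]
```

Chunk 0 scores 1.0 even though it is twice as long as the query vector. Only the angle matters, not the magnitude. OpenAI's documentation states that its embeddings are normalized to length 1, so for them cosine similarity equals the plain dot product.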

Q3: What is ChromaDB and why did you use it?

ChromaDB is an open-source vector database optimized for storing and searching embedding vectors. It supports persistent local storage, metadata filtering, and fast approximate nearest-neighbor search (using HNSW). I used it because it's Python-native, requires no separate server setup, and is free — ideal for a student project and small-to-medium deployments.

Q4: How did you chunk the document? What chunk size and overlap did you use?

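I used recursive character text splitting with a chunk size of 500 characters and an overlap of 100 characters. The splitter tries to break at paragraph boundaries first, then lines, then sentences, then words, so chunks stay coherent. Chunking is needed because each chunk gets its own embedding. Small, focused chunks retrieve more precisely, and several of them still fit in the LLM's context window alongside the question. Overlap keeps context at chunk boundaries from being lost: a sentence of up to about 100 characters that straddles a boundary still appears whole in at least one chunk.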
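To estimate how many chunks a document produces, assume a fixed-size sliding window with chunk size C = 500, overlap O = 100, and a document of N characters. Recursive splitting cuts at separators rather than at exact offsets, so real counts differ somewhat. The last chunk must reach the end of the text, which gives:

```latex
\begin{aligned}
S &= C - O = 500 - 100 = 400 && \text{(stride between chunk starts)} \\
(n-1)\,S + C &\ge N && \text{(last chunk reaches the end)} \\
n &= \left\lceil \frac{N - O}{S} \right\rceil && (N \ge C) \\
N = 10\,000 \;\Rightarrow\; n &= \left\lceil \frac{9\,900}{400} \right\rceil = \lceil 24.75 \rceil = 25
\end{aligned}
```

Each window adds only S new characters but embeds C, so the overlap costs about C/S = 1.25 times the document length in embedded text.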

Q5: What model did you use for embeddings and why?

I used OpenAI's text-embedding-3-small. It produces 1536-dimensional embeddings and costs about one-fifth as much as text-embedding-ada-002 ($0.02 vs $0.10 per million tokens at launch pricing). It also scores higher on OpenAI's published benchmarks (MTEB and MIRACL). For a system with cost constraints, it's the recommended choice. text-embedding-3-large would give somewhat better quality at about 6.5× the price ($0.13 per million tokens).
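A quick calculation shows how small the embedding bill is for a project like this. The sketch makes several assumptions. OpenAI's rule of thumb of about 4 characters per English token stands in for real token counts. The price of $0.02 per million tokens is text-embedding-3-small's launch price, so check the current pricing page. The overlap factor comes from the 500/100 chunking above. `estimate_embedding_cost` is a hypothetical helper.

```python
def estimate_embedding_cost(num_chars: int, chunk_size: int = 500, overlap: int = 100,
                            chars_per_token: float = 4.0,
                            usd_per_million_tokens: float = 0.02) -> float:
    """Rough embedding cost in USD for a document of num_chars characters."""
    # With overlap, about chunk_size / (chunk_size - overlap) times the text is embedded.
    embedded_chars = num_chars * chunk_size / (chunk_size - overlap)
    tokens = embedded_chars / chars_per_token  # ~4 characters per token (rule of thumb)
    return tokens / 1_000_000 * usd_per_million_tokens


# Assumed: a 200-page PDF at ~3,000 characters per page = 600,000 characters.
print(f"${estimate_embedding_cost(600_000):.5f}")  # $0.00375
print(f"${estimate_embedding_cost(600_000, usd_per_million_tokens=0.13):.5f}")  # $0.02437
```

Under these assumptions, indexing a full textbook with text-embedding-3-small costs well under a cent. The larger model would still cost only a few cents. For this project, most of the API spend goes to the chat model answering questions, not to embeddings.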

Q6: How does your system handle hallucinations?

Three ways: First, the system prompt explicitly instructs the LLM to answer ONLY from the provided context and to say "I don't know" if the answer isn't in the document. Second, I return the source chunk alongside the answer so the user can verify. Third, I use a low temperature (0.1) to make the model more deterministic and less creative.

Q7: What is FastAPI and why not Flask?

FastAPI is a modern Python web framework that runs on ASGI with native async support, generates OpenAPI documentation automatically, and validates request/response data using Pydantic models. For I/O-bound work like waiting on OpenAI API calls, async lets a single worker keep serving other requests while a call is in flight. This usually gives higher throughput than Flask's default model, where each request occupies a worker until it finishes. Flask is synchronous (WSGI) by default and has no built-in request validation.
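A short experiment makes the async advantage concrete. The sketch below uses `asyncio.sleep` as a hypothetical stand-in for an awaited API call. `fake_api_call` and `answer_all` are illustrative names. Five 0.2-second waits run concurrently finish in about the time of one, because the event loop switches to other tasks while each call is waiting.

```python
import asyncio
import time


async def fake_api_call(question: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for an awaited network call such as an LLM request."""
    await asyncio.sleep(delay)  # the event loop can run other tasks meanwhile
    return f"answer to {question!r}"


async def answer_all(questions: list[str]) -> tuple[list[str], float]:
    """Run all calls concurrently and report the wall-clock time taken."""
    start = time.perf_counter()
    answers = await asyncio.gather(*(fake_api_call(q) for q in questions))
    return list(answers), time.perf_counter() - start


if __name__ == "__main__":
    answers, elapsed = asyncio.run(answer_all(["q1", "q2", "q3", "q4", "q5"]))
    # concurrent: close to one 0.2s delay, versus 1.0s for a sequential loop
    print(f"{len(answers)} answers in {elapsed:.2f}s")
```

The benefit only appears when the slow call is actually awaited. A blocking call inside an `async def` endpoint stalls every other request. For awaited requests the OpenAI SDK offers an `AsyncOpenAI` client, and blocking code can be moved off the event loop with a plain `def` endpoint or a thread pool.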

Q8: What is the difference between gpt-4o-mini and gpt-4o?

gpt-4o-mini is a smaller, much cheaper version of GPT-4o. It's approximately 15× cheaper per token and responds faster. For simple Q&A tasks where the context is already retrieved and structured, it performs nearly as well as gpt-4o. I used gpt-4o-mini to keep costs low — for a student project with limited API budget, this is the right choice.

Q9: What are the limitations of your system?

Key limitations: (1) It can only answer from uploaded documents; it has no internet access and is instructed not to use general knowledge beyond what's in the docs. (2) Chunking can split related information across chunks, occasionally losing context for complex multi-sentence answers. (3) Render's free tier spins a service down after 15 minutes without traffic, so the first request after an idle period can take up to a minute. The free tier's filesystem is also ephemeral, so the local ChromaDB data is lost when the service restarts or redeploys. (4) ChromaDB's local storage isn't horizontally scalable; for production at scale, Pinecone or pgvector would be better.

Q10: How would you extend this project?

Several directions: (1) Cross-document search: let users query several documents at once by widening the doc_id filter (ChromaDB's where filters support $in). (2) Conversation memory: keep recent question/answer turns and include them in the prompt (for example with LangChain's ConversationBufferMemory) so users can ask follow-up questions. (3) Hybrid search: combine vector search with BM25 keyword search for better recall. (4) Answer evaluation: add a faithfulness score using Ragas to measure answer quality automatically. (5) User authentication: add Supabase Auth so each user sees only their own documents.
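For the hybrid search extension, one common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). It works on rank positions, so the two systems' incompatible scores never need to be compared. This sketch assumes both searches return ranked lists of chunk ids, and that the BM25 list comes from any BM25 implementation. `reciprocal_rank_fusion` is a hypothetical helper, and k = 60 is the constant typically used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk ids: each list adds 1 / (k + rank) to a chunk's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


vector_hits = ["c1", "c2", "c3"]  # from ChromaDB, best first
bm25_hits = ["c3", "c1", "c4"]    # from a keyword index, best first
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['c1', 'c3', 'c2', 'c4']
```

Chunks that rank well in both lists rise to the top, so c1 and c3 come first. A chunk found by only one method (c2 or c4) is still kept, which is where the recall gain comes from.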

Full Viva Guide: The complete kit includes 20 viva questions with detailed answers, plus common follow-up questions for each answer.
