Featured Major Project Kit ~6 Weeks Intermediate

Document Q&A Assistant

Build a web app that answers questions from uploaded PDF and text documents using Retrieval-Augmented Generation (RAG). Suitable for final-year B.Tech, BCA, or MCA submission.

Tech stack: Python 3.11+, FastAPI, OpenAI API, ChromaDB, RAG, LangChain, HTML/CSS, Docker
Kit includes: Source code + README, Architecture diagram, 6-milestone breakdown, Deployment guide (Render/Railway), Report template (A4), PPT deck (10 slides), 20 viva Q&A, Testing checklist

What You're Building

Final-year student Arjun needs a real project he can actually explain — not a code copy he found online. This project gives him a working RAG system with a clean API, a simple web interface, and a complete explanation path from architecture to deployment.

Core Features

  • Upload PDF or text documents via web interface or API
  • Documents are chunked, embedded, and stored in ChromaDB
  • Natural language Q&A powered by GPT-4o-mini
  • Answers cite the source chunk from the document
  • Admin interface to manage uploaded documents
  • Deployable on Render.com with a live URL for demo

Prerequisites

Skill | Level needed
Python | Intermediate (functions, classes, async)
REST APIs | Basic (what a GET/POST is)
OpenAI API | Beginner (just need an API key)
Git / GitHub | Basic (commit, push)
Terminal / Bash | Basic (cd, pip, run commands)

No ML theory required. No GPU needed. Runs on a laptop.

Architecture

// Indexing Pipeline (runs once per document)

PDF / TXT Upload → PyPDF2 / langchain text splitter → text-embedding-3-small → ChromaDB (vector store)

// Query Pipeline (runs on each question)

User Question → Embed Query → Top-3 Chunks (similarity search) → GPT-4o-mini + Context → Cited Answer

// API Layer

HTML Frontend → FastAPI Backend → ChromaDB + OpenAI

Milestone Breakdown

Each milestone produces working, demonstrable code. You can pause at any milestone and have something to show.

01
Week 1

Project Setup & Document Parsing

Create the project structure, install dependencies, set up the OpenAI API key, parse a PDF, and split it into chunks. Output: a Python script that prints all text chunks from a PDF.

  • Set up a virtual environment and install fastapi, uvicorn, python-multipart (FastAPI needs it for file uploads), chromadb, openai, pypdf2
  • Write a PDF loader that extracts and cleans text
  • Implement recursive text splitting with chunk size 500, overlap 100
  • Print the first 5 chunks to verify correctness
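
The chunk size and overlap in this milestone can be illustrated without any library. The sketch below is a simplified fixed-window splitter, not the recursive splitter the milestone names (a recursive splitter also tries to break on paragraph and sentence boundaries first). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(1200))
    for i, chunk in enumerate(chunk_text(sample)[:5]):
        print(i, len(chunk), chunk[:20])
```

With the defaults, each window starts 400 characters after the previous one, so the last 100 characters of one chunk are the first 100 of the next.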
02
Week 2

Embedding & Vector Storage

Embed all document chunks using OpenAI's text-embedding-3-small model. Store in ChromaDB with document metadata. Output: a persistent ChromaDB collection you can query.

  • Set up ChromaDB with persistent storage
  • Batch-embed chunks (handle API rate limits)
  • Store embeddings with doc_id, chunk_id, doc_name, page_num metadata (the query step filters on doc_id)
  • Verify: run a similarity query and print top-3 results
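
One way to handle rate limits while batch-embedding is to retry each batch with exponential backoff. The sketch below is an assumption about how that could look, not the kit's implementation: `embed_in_batches`, `batched`, and the `embed_batch` callable are hypothetical names. In a real project `embed_batch` would wrap `client.embeddings.create(input=batch, model="text-embedding-3-small")`, and the `except` clause would catch `openai.RateLimitError` rather than every exception.

```python
import time
from typing import Callable, Iterator, Sequence

Vector = list[float]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[Vector]],  # hypothetical wrapper around the API call
    batch_size: int = 64,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Vector]:
    """Embed all chunks batch by batch, retrying a failed batch with exponential backoff."""
    vectors: list[Vector] = []
    for batch in batched(chunks, batch_size):
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:  # assumption: narrow this to the rate-limit error in real code
                if attempt == max_retries - 1:
                    raise  # give up after the last allowed attempt
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return vectors
```

The `sleep` parameter exists only so the backoff can be tested without actually waiting.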
03
Week 3

RAG Query Engine

Build the retrieval + generation pipeline. Output: a Python function that takes a question string and returns a grounded answer with its cited source chunk.

  • Embed the query and retrieve top-k chunks from ChromaDB
  • Construct a system prompt that injects the retrieved context
  • Call GPT-4o-mini with the context-enriched prompt
  • Return answer + source chunk reference in JSON format
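
The prompt-construction and JSON steps above can be sketched as two small pure functions. This is a minimal illustration under assumptions: the names `build_messages` and `format_response`, the numbered `[1]`, `[2]` context labels, and the chunk dictionary keys (`text`, `doc_name`, `page_num`) are hypothetical choices, not something the kit prescribes.

```python
import json

SYSTEM_PROMPT = "Answer ONLY using the context below. If unsure, say so."


def build_messages(question: str, chunks: list[dict]) -> list[dict]:
    """Number each retrieved chunk and inject it into the user message."""
    context = "\n---\n".join(
        f"[{i}] {chunk['text']}" for i, chunk in enumerate(chunks, start=1)
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


def format_response(answer: str, chunks: list[dict]) -> str:
    """Serialize the answer plus one source reference per retrieved chunk as JSON."""
    sources = [
        {"ref": i, "doc_name": c["doc_name"], "page_num": c["page_num"]}
        for i, c in enumerate(chunks, start=1)
    ]
    return json.dumps({"answer": answer, "sources": sources})
```

The list returned by `build_messages` has the shape expected by the `messages` argument of a chat completion call. Numbering the chunks gives the user a way to match each source reference back to the text the model saw.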
04
Week 4

FastAPI Backend

Wrap the RAG engine in a REST API. Output: a running FastAPI app with /upload, /query, and /documents endpoints.

  • POST /upload: accept a PDF or text file, run the indexing pipeline, return doc_id
  • POST /query: accept question + doc_id, return answer with citations
  • GET /documents: list all indexed documents
  • DELETE /documents/{doc_id}: remove a document and its vectors
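
For the DELETE endpoint to be correct, every chunk vector belonging to the document has to be removed, not just the document's entry in a list. A minimal sketch of that logic follows, assuming each chunk was stored with a `doc_id` metadata field. The helper name `delete_document` is hypothetical; `collection` is expected to behave like a ChromaDB collection, whose `get(where=...)` and `delete(ids=...)` methods are used here.

```python
def delete_document(collection, doc_id: str) -> int:
    """Delete every chunk stored for doc_id and return how many were removed."""
    # Look up the ids of all chunks whose metadata matches this document.
    matching_ids = collection.get(where={"doc_id": doc_id})["ids"]
    if matching_ids:
        collection.delete(ids=matching_ids)
    return len(matching_ids)
```

Returning the count lets the endpoint respond with 404 when nothing matched, instead of reporting success for a document that was never indexed.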
05
Week 5

Frontend Interface

Build a minimal HTML/CSS/JS frontend. Output: a web page that lets you upload a document and ask questions — without needing cURL or Postman.

  • File upload form with drag-and-drop support
  • Chat-style Q&A interface
  • Display answer with source chunk highlighted
  • Serve frontend as static files from FastAPI
06
Week 6

Deployment & Demo Polish

Deploy to Render.com or Railway. Add a Dockerfile. Write the README. Prepare the demo script. Output: a live public URL you can share with your college.

  • Dockerfile with Python 3.11-slim base
  • Environment variables for OPENAI_API_KEY
  • Deploy to Render using the Free tier
  • Write README with setup, API docs, and demo GIF
  • Record a 2-minute demo screencast

Key Code Patterns

rag_engine.py — query function
from openai import OpenAI
import chromadb

client     = OpenAI()
chroma     = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection(
    "documents",
    metadata={"hnsw:space": "cosine"},  # cosine distance; Chroma's default is squared L2
)

def embed_text(text: str) -> list[float]:
    res = client.embeddings.create(
        input=text, model="text-embedding-3-small"
    )
    return res.data[0].embedding

def query(question: str, doc_id: str, k: int = 3) -> dict:
    q_vec = embed_text(question)
    results = collection.query(
        query_embeddings=[q_vec],
        n_results=k,
        where={"doc_id": doc_id}
    )
    context = "\n---\n".join(results["documents"][0])
    sources = results["metadatas"][0]

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.1,  # low temperature for more deterministic answers
        messages=[
            {"role": "system", "content":
                "Answer ONLY using the context below. If unsure, say so."},
            {"role": "user", "content":
                f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return {
        "answer": resp.choices[0].message.content,
        "sources": sources
    }
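
The query function filters with `where={"doc_id": doc_id}`, so the indexing side has to write a `doc_id` into every chunk's metadata. The `index_document` function is not shown here, so the sketch below is only an assumption about one piece of it: a hypothetical helper, `build_records`, that prepares the ids, documents, and metadata for a ChromaDB `collection.add(...)` call. The `"{doc_id}-{i}"` id scheme is also an illustrative choice.

```python
def build_records(doc_id: str, doc_name: str, chunks: list[str]) -> dict:
    """Prepare ids, documents, and metadatas for all chunks of one document."""
    return {
        # One stable, unique id per chunk, derived from the document id.
        "ids": [f"{doc_id}-{i}" for i in range(len(chunks))],
        "documents": list(chunks),
        # doc_id in the metadata is what the query-time `where` filter matches on.
        "metadatas": [
            {"doc_id": doc_id, "doc_name": doc_name, "chunk_id": i}
            for i in range(len(chunks))
        ],
    }
```

The result could then be passed along with the embeddings, for example `collection.add(embeddings=vectors, **build_records(doc_id, name, chunks))`.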
main.py — FastAPI endpoints
from fastapi import FastAPI, UploadFile, File
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from rag_engine import query, index_document

app = FastAPI(title="Document Q&A API")
app.mount("/ui", StaticFiles(directory="frontend"), name="ui")

class QueryRequest(BaseModel):
    question: str
    doc_id:   str

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    content = await file.read()
    doc_id  = await index_document(content, file.filename)
    return {"doc_id": doc_id, "filename": file.filename}

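@app.post("/query")
def ask(req: QueryRequest):
    # Plain def: FastAPI runs it in a worker thread, so the blocking
    # OpenAI and ChromaDB calls inside query() don't stall the event loop.
    return query(req.question, req.doc_id)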

Deploy to Render.com (Free)

1. Add Dockerfile

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

2. Render Configuration

  • Connect GitHub repo to Render
  • Add environment variable: OPENAI_API_KEY
  • Select "Docker" as runtime
  • Free tier: deploy in ~3 minutes
  • Live URL: yourapp.onrender.com
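
Because the API key arrives through an environment variable, it can help to fail fast at startup when it is missing rather than on the first request. A minimal sketch, where `require_env` is a hypothetical helper name (the `OpenAI()` client itself reads `OPENAI_API_KEY` from the environment):

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it in the hosting dashboard or your local shell"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` once near the top of main.py turns a misconfigured deployment into a clear startup error in the logs.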

20 Viva Questions with Answers

These are the questions your evaluator is most likely to ask. Understand these — don't memorize them word-for-word.

Q1: What is RAG and why did you choose it over direct LLM answering?
RAG stands for Retrieval-Augmented Generation. Instead of asking the LLM to answer from its training data (which may be outdated or wrong), RAG retrieves relevant context from your own documents and injects it into the prompt. This grounds the answer in the actual document, reduces hallucinations, and allows domain-specific Q&A without retraining the model.
Q2: What is a vector embedding? Why do we use cosine similarity instead of keyword matching?
A vector embedding is a dense numerical representation (an array of floats) of text, where semantically similar texts are close in vector space. Cosine similarity measures the angle between two vectors — it finds semantically similar text even when the exact words are different. Keyword matching (like Ctrl+F) fails when the user asks "What is the penalty for late payment?" if the document says "charges apply for delayed remittance."
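
To make the answer above concrete, cosine similarity can be computed in a few lines. The sketch below uses made-up 2-dimensional vectors purely for illustration (real embeddings from text-embedding-3-small have 1536 dimensions), and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]
```

Only the direction of the vectors matters, not their length: `[1, 0]` and `[2, 0]` score 1.0 even though one is twice as long. A vector database performs the same ranking, but with an approximate index instead of scoring every chunk.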
Q3: What is ChromaDB and why did you use it?
ChromaDB is an open-source vector database optimized for storing and searching embedding vectors. It supports persistent local storage, metadata filtering, and fast approximate nearest-neighbor search (using HNSW). I used it because it's Python-native, requires no separate server setup, and is free — ideal for a student project and small-to-medium deployments.
Q4: How did you chunk the document? What chunk size and overlap did you use?
I used recursive character text splitting with a chunk size of 500 characters and an overlap of 100 characters. Chunking splits the document into pieces small enough that several retrieved chunks fit in the prompt together. Overlap reduces the context lost at chunk boundaries: a sentence that straddles a boundary will usually still appear whole in at least one chunk, as long as it is shorter than the overlap.
Q5: What model did you use for embeddings and why?
I used OpenAI's text-embedding-3-small. It produces 1536-dimensional embeddings by default, is significantly cheaper than text-embedding-ada-002, and benchmarks higher on most retrieval tasks. For a system with cost constraints, it's the recommended choice. text-embedding-3-large would give somewhat better quality at roughly 6.5× the price per token (based on OpenAI's launch list prices).
Q6: How does your system handle hallucinations?
Three ways: First, the system prompt explicitly instructs the LLM to answer ONLY from the provided context and to say "I don't know" if the answer isn't in the document. Second, I return the source chunk alongside the answer so the user can verify. Third, I use a low temperature (0.1) to make the model more deterministic and less creative.
Q7: What is FastAPI and why not Flask?
FastAPI is a modern Python web framework with native async support. It generates OpenAPI documentation automatically and validates request/response data using Pydantic models. Async endpoints let one worker serve other requests while it waits on I/O (like OpenAI API calls), which typically gives better throughput than a synchronous Flask app for I/O-bound workloads. Flask is synchronous (WSGI) by default and doesn't have built-in data validation.
Q8: What is the difference between gpt-4o-mini and gpt-4o?
gpt-4o-mini is a smaller, much cheaper version of GPT-4o. It's approximately 15× cheaper per token and responds faster. For simple Q&A tasks where the context is already retrieved and structured, it performs nearly as well as gpt-4o. I used gpt-4o-mini to keep costs low — for a student project with limited API budget, this is the right choice.
Q9: What are the limitations of your system?
Key limitations: (1) It can only answer from uploaded documents; it has no internet access and is instructed not to use general knowledge beyond what's in the docs. (2) Chunking can split related context across chunks, occasionally losing context for complex multi-sentence answers. (3) The free Render tier spins down when idle, so the first request afterwards hits a cold start that can take from ~10 seconds up to about a minute. (4) ChromaDB's local storage isn't horizontally scalable; for production at scale, Pinecone or pgvector would be better.
Q10: How would you extend this project?
Several directions: (1) Cross-document search: let one question search several documents at once instead of a single doc_id. (2) Conversation memory: use a LangChain ConversationBufferMemory to let users ask follow-up questions. (3) Hybrid search: combine vector search with BM25 keyword search for better recall. (4) Answer evaluation: add a faithfulness score using Ragas to automatically measure answer quality. (5) User authentication: add Supabase auth so each user sees only their own documents.
Full Viva Guide: The complete kit includes 20 viva questions with detailed answers, plus common follow-up questions for each answer.

Quiz — Document Q&A Assistant

1. Why does the system prompt tell the LLM to answer "ONLY using the context below"?

2. What does chunk overlap in text splitting prevent?

3. The RAG system uses vector similarity search (not keyword search) because:

4. Why use gpt-4o-mini instead of gpt-4o for this project?

5. What does the DELETE /documents/{doc_id} endpoint need to do for correctness?
