Build a web app that answers questions from uploaded PDF and text documents using Retrieval-Augmented Generation (RAG). Suitable for final-year B.Tech, BCA, or MCA submission.
Final-year student Arjun needs a real project he can actually explain — not a code copy he found online. This project gives him a working RAG system with a clean API, a simple web interface, and a complete explanation path from architecture to deployment.
| Skill | Level needed |
|---|---|
| Python | Intermediate (functions, classes, async) |
| REST APIs | Basic (what a GET/POST is) |
| OpenAI API | Beginner (just need an API key) |
| Git / GitHub | Basic (commit, push) |
| Terminal / Bash | Basic (cd, pip, run commands) |
No ML theory required. No GPU needed. Runs on a laptop.
Architecture (diagram not reproduced): an indexing pipeline that runs once per document, a query pipeline that runs on each question, and an API layer in front of both.
Each milestone produces working, demonstrable code. You can pause at any milestone and have something to show.
Create the project structure, install dependencies, set up the OpenAI API key, parse a PDF, and split it into chunks. Output: a Python script that prints all text chunks from a PDF.
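The splitting step of this milestone can be sketched in plain Python. This is a minimal illustration, assuming the PDF text has already been extracted into one string by whichever PDF library you choose. The function name `chunk_text` and the default sizes are hypothetical, not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters.

    Hypothetical helper: the name and default sizes are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Because consecutive windows share `overlap` characters, a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.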
Embed all document chunks using OpenAI's text-embedding-3-small model. Store in ChromaDB with document metadata. Output: a persistent ChromaDB collection you can query.
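One way to picture the storage step is the sketch below. It assumes `collection` is a ChromaDB collection (only its `add` method is used) and `embed` is any function that turns a string into a vector, such as the `embed_text` function shown later. The function name `store_chunks`, the ID format, and the metadata keys other than `doc_id` are assumptions made for illustration.

```python
import uuid
from typing import Callable


def store_chunks(
    collection,
    embed: Callable[[str], list[float]],
    chunks: list[str],
    filename: str,
) -> str:
    """Embed each chunk and add it to the collection under a fresh doc_id.

    Hypothetical helper: the ID scheme and metadata layout are assumptions.
    """
    if not chunks:
        raise ValueError("no chunks to store")

    doc_id = uuid.uuid4().hex  # one ID shared by every chunk of this document
    ids = [f"{doc_id}-{i}" for i in range(len(chunks))]
    metadatas = [
        {"doc_id": doc_id, "filename": filename, "chunk_index": i}
        for i in range(len(chunks))
    ]
    embeddings = [embed(chunk) for chunk in chunks]

    collection.add(
        ids=ids,
        embeddings=embeddings,
        documents=chunks,
        metadatas=metadatas,
    )
    return doc_id
```

Storing `doc_id` in every chunk's metadata is what lets a later query restrict retrieval to a single document with a `where` filter.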
Build the retrieval + generation pipeline. Output: a Python function that takes a question string and returns a grounded answer together with the source chunks it was drawn from.
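To explain retrieval in a viva, it helps to see what "closest chunks" means. The sketch below ranks chunk vectors by cosine similarity with a linear scan. It is only an illustration: a vector database does the equivalent ranking with an index and its own configured distance metric, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The question and every chunk live in the same vector space, so a chunk can rank highly even when it shares no exact keywords with the question.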
Wrap the RAG engine in a REST API. Output: a running FastAPI app with /upload, /query, and /documents endpoints.
Build a minimal HTML/CSS/JS frontend. Output: a web page that lets you upload a document and ask questions — without needing cURL or Postman.
Deploy to Render.com or Railway. Add a Dockerfile. Write the README. Prepare the demo script. Output: a live public URL you can share with your college.
from openai import OpenAI
import chromadb

client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("documents")


def embed_text(text: str) -> list[float]:
    """Return the embedding vector for a single string."""
    res = client.embeddings.create(
        input=text, model="text-embedding-3-small"
    )
    return res.data[0].embedding


def query(question: str, doc_id: str, k: int = 3) -> dict:
    """Retrieve the k closest chunks of one document and answer from them."""
    q_vec = embed_text(question)
    results = collection.query(
        query_embeddings=[q_vec],
        n_results=k,
        where={"doc_id": doc_id}
    )
    # Results hold one list per query embedding; index 0 is our single query.
    context = "\n---\n".join(results["documents"][0])
    sources = results["metadatas"][0]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content":
             "Answer ONLY using the context below. If unsure, say so."},
            {"role": "user", "content":
             f"Context:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return {
        "answer": resp.choices[0].message.content,
        "sources": sources
    }
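The `query` function above joins whatever was retrieved, even if nothing matched the `doc_id`. A small guard can skip the LLM call in that case. This is a sketch under the assumption that the result is shaped as `query` already reads it (one inner list per query under `"documents"` and `"metadatas"`). The helper name `build_context` is hypothetical.

```python
from typing import Optional


def build_context(results: dict) -> tuple[Optional[str], list[dict]]:
    """Join retrieved chunks into one context string, or return None if none came back.

    Hypothetical helper; assumes one inner list per query, as query() reads it.
    """
    documents = (results.get("documents") or [[]])[0]
    metadatas = (results.get("metadatas") or [[]])[0]
    if not documents:
        return None, []  # nothing retrieved: caller can answer without calling the LLM
    return "\n---\n".join(documents), list(metadatas)
```

A caller could check for `None` and return a fixed "nothing relevant found" answer, which saves an API call and avoids asking the model to answer from an empty context.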
from fastapi import FastAPI, UploadFile, File
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

from rag_engine import query, index_document

app = FastAPI(title="Document Q&A API")
app.mount("/ui", StaticFiles(directory="frontend"), name="ui")


class QueryRequest(BaseModel):
    question: str
    doc_id: str


@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    content = await file.read()
    # index_document is awaited, so rag_engine must define it as a coroutine.
    doc_id = await index_document(content, file.filename)
    return {"doc_id": doc_id, "filename": file.filename}


@app.post("/query")
def ask(req: QueryRequest):
    # query() makes blocking network calls; a plain def lets FastAPI
    # run it in its threadpool instead of stalling the event loop.
    return query(req.question, req.doc_id)
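For a quick check from a script, the `/query` endpoint can be called with only the standard library. The sketch below builds the request and leaves sending it to you. The JSON fields mirror `QueryRequest`, while the helper name `build_query_request` and the base URL in the usage note are assumptions.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str, doc_id: str) -> urllib.request.Request:
    """Build a POST request for /query with a JSON body matching QueryRequest.

    Hypothetical helper; base_url is wherever your API happens to be running.
    """
    payload = json.dumps({"question": question, "doc_id": doc_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running locally, `urllib.request.urlopen(build_query_request("http://localhost:8000", "What is chunking?", doc_id))` would send it, where `doc_id` is the value returned by `/upload`.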
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0"]
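On the hosting platform, set OPENAI_API_KEY as an environment variable. The deployed app is then reachable at a URL of the form yourapp.onrender.com.
These are the questions your evaluator is most likely to ask. Understand them; don't memorize them word for word.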
1. Why does the system prompt tell the LLM to answer "ONLY using the context below"?
2. What does chunk overlap in text splitting prevent?
3. The RAG system uses vector similarity search (not keyword search) because:
4. Why use gpt-4o-mini instead of gpt-4o for this project?
5. What does the DELETE /documents/{doc_id} endpoint need to do for correctness?
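As a hint for question 5, here is one possible shape for the deletion logic behind that endpoint. It assumes every chunk carries a `doc_id` in its metadata (as the `where` filter in `query` implies) and that `collection` is a ChromaDB collection, using only its `get` and `delete` methods. The function name `delete_document` is hypothetical.

```python
def delete_document(collection, doc_id: str) -> int:
    """Remove every stored chunk of one document and report how many were removed.

    Hypothetical helper; assumes each chunk's metadata includes its doc_id.
    """
    existing = collection.get(where={"doc_id": doc_id})
    ids = existing["ids"]
    if ids:
        collection.delete(ids=ids)  # delete all chunks, not just the first match
    return len(ids)
```

Returning the count lets the endpoint respond with 404 when nothing was deleted. If any chunk were left behind, later queries could still surface text from a document the user believes is gone.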
The Generative AI course covers RAG architecture, embeddings, vector databases, and LLM evaluation in depth — before you build.