Mini Project · ~2–3 Weeks · Beginner Friendly

Sentiment Analyzer Web App

Build a Streamlit app that classifies text sentiment (positive or negative, with neutral available through a 3-class model) using HuggingFace transformers, with CSV upload for batch analysis.

Python · Streamlit · HuggingFace · transformers · pandas · plotly
Includes: source code + README, milestone breakdown, deployment guide, 10 viva Q&A

A text sentiment classification tool that works on single texts or uploaded CSV files.

  • Single text input: type text, get instant sentiment + confidence score
  • Batch CSV upload: process thousands of reviews, tweets, or feedback at once
  • Interactive charts: sentiment distribution pie chart, score histogram
  • Download results as CSV with sentiment labels and confidence scores
  • Supports multiple pre-trained models from HuggingFace hub

Before You Start

  • Python basics (functions, file I/O)
  • Basic pandas for data manipulation
  • Understanding of what a classification model does

How It Works

📝 Text Input or CSV Upload → 🤗 HuggingFace Transformer Model → 📊 Sentiment Score (Positive/Negative) → 📈 Streamlit Dashboard + CSV Export

Milestone Breakdown

Week 1: Core Classifier
  • Install transformers and load the distilbert-base-uncased-finetuned-sst-2-english sentiment model
  • Build single text classification function with confidence score
  • Test with 20 example sentences to verify accuracy
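The single-text step in Week 1 might look like the sketch below. `classify_text` is a hypothetical helper name, not part of the project source. It takes the classifier as an argument so it can be tried with a stand-in before the real model is downloaded. `fake_classifier` is illustrative only and just mimics the list-of-dicts shape that a transformers sentiment pipeline returns.

```python
def classify_text(text, classifier):
    """Return (label, confidence) for one piece of text.

    `classifier` is any callable that behaves like a transformers
    sentiment pipeline: it takes a string and returns a list holding
    one dict with 'label' and 'score' keys.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    result = classifier(cleaned)[0]
    return result["label"], round(float(result["score"]), 4)


def fake_classifier(text):
    # Stand-in for the real model (an assumption for this sketch only):
    # it flags a few obviously positive words and nothing else.
    positive_words = {"amazing", "great", "love"}
    words = {w.strip("!.,?").lower() for w in text.split()}
    label = "POSITIVE" if words & positive_words else "NEGATIVE"
    return [{"label": label, "score": 0.99}]


label, confidence = classify_text("This product is absolutely amazing!", fake_classifier)
print(label, confidence)
```

Swapping `fake_classifier` for the loaded pipeline should leave the rest unchanged, which makes it easy to run the 20 test sentences through the same function.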
Week 2: Batch Processing + Streamlit UI
  • Implement CSV upload with pandas (handle text column detection)
  • Process batch with progress bar (@st.cache_data for speed)
  • Build Streamlit UI: text input tab + CSV upload tab
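One possible approach to the text column detection mentioned above is sketched here. The function name `detect_text_column`, the `NAME_HINTS` list, and the "longest average value" fallback are all assumptions made for illustration; the project may use a different heuristic, or simply let the user pick the column.

```python
import pandas as pd

# Column-name fragments that usually indicate free text
# (an assumption for this sketch; extend as needed).
NAME_HINTS = ("text", "review", "comment", "tweet", "feedback")


def detect_text_column(df):
    """Guess which column of an uploaded CSV holds the text to classify."""
    # First pass: trust an obviously named column.
    for col in df.columns:
        if any(hint in str(col).lower() for hint in NAME_HINTS):
            return col
    # Second pass: pick the non-numeric column with the longest average value.
    best_col, best_len = None, -1.0
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        values = df[col].dropna().astype(str)
        if values.empty:
            continue
        avg_len = values.str.len().mean()
        if avg_len > best_len:
            best_col, best_len = col, avg_len
    if best_col is None:
        raise ValueError("no text-like column found")
    return best_col


df = pd.DataFrame({
    "id": [1, 2],
    "category": ["a", "b"],
    "body": ["Great product, would buy again", "Stopped working after a week"],
})
print(detect_text_column(df))
```

In the app, `df` would come from `pd.read_csv(uploaded_file)`, where `uploaded_file` is the object returned by `st.file_uploader`. Showing the guessed column in a select box lets the user override a wrong guess.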
Week 3: Charts + Deploy
  • Add plotly pie chart for sentiment distribution
  • Add confidence score histogram
  • Download button for results CSV
  • Deploy to Streamlit Cloud
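The two charts in Week 3 could be built roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and `results` is taken to be a list of dicts with 'label' and 'score' keys, as returned by the classifier. plotly is imported inside the chart functions only so that the counting helper can be tried without plotly installed.

```python
from collections import Counter


def sentiment_counts(results):
    """Count how many results fall under each sentiment label."""
    return Counter(r["label"] for r in results)


def sentiment_pie(results):
    """Pie chart of the label distribution."""
    import plotly.express as px  # local import: only needed for charting

    counts = sentiment_counts(results)
    return px.pie(
        names=list(counts.keys()),
        values=list(counts.values()),
        title="Sentiment distribution",
    )


def confidence_histogram(results, bins=10):
    """Histogram of confidence scores across all results."""
    import plotly.express as px  # local import: only needed for charting

    scores = [r["score"] for r in results]
    return px.histogram(x=scores, nbins=bins, title="Confidence scores")


sample = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.91},
    {"label": "POSITIVE", "score": 0.75},
]
print(sentiment_counts(sample))
```

Each figure can then be passed to `st.plotly_chart` in the Streamlit app.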

Core Implementation

sentiment.py
from transformers import pipeline
import streamlit as st

# Load model once at startup
@st.cache_resource
def load_model():
    return pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

classifier = load_model()

# Classify text
result = classifier("This product is absolutely amazing!")[0]
# {'label': 'POSITIVE', 'score': 0.9998}
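For the CSV path, the same classifier can be called on lists of texts rather than one string at a time. The sketch below shows one way to do that in fixed-size batches while reporting progress. `classify_batch` and `on_progress` are hypothetical names, and `fake_classifier` is a stand-in that only mimics the pipeline's return shape for a list input.

```python
def classify_batch(texts, classifier, batch_size=32, on_progress=None):
    """Classify a list of texts in chunks.

    `classifier` takes a list of strings and returns one dict per string.
    `on_progress`, if given, is called with a fraction between 0 and 1
    after each chunk.
    """
    results = []
    total = len(texts)
    for start in range(0, total, batch_size):
        chunk = [str(t) for t in texts[start:start + batch_size]]
        results.extend(classifier(chunk))
        if on_progress is not None:
            on_progress(min(start + batch_size, total) / total)
    return results


def fake_classifier(batch):
    # Stand-in for the real pipeline (illustration only).
    return [
        {"label": "POSITIVE" if "good" in t.lower() else "NEGATIVE", "score": 0.9}
        for t in batch
    ]


progress_log = []
reviews = ["Good value", "Broke in a day", "Really good", "Meh", "Not for me"]
labelled = classify_batch(reviews, fake_classifier, batch_size=2,
                          on_progress=progress_log.append)
print([r["label"] for r in labelled])
print(progress_log)
```

In Streamlit, `on_progress` could be the `progress` method of the object returned by `st.progress(0.0)`, which accepts a float from 0.0 to 1.0.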

Deploy to Streamlit Cloud (Free)

deploy.sh
# requirements.txt (one package per line)
transformers
streamlit
pandas
plotly
torch

# share.streamlit.io → New app → connect GitHub repo
# Main file path: app.py → Deploy

Create a free account at share.streamlit.io, connect your GitHub repo, and deploy in one click. Your app gets a public URL once the build finishes.

10 Viva Questions with Answers

Q1. What is sentiment analysis?
NLP task that classifies text as positive, negative, or neutral. Used for: customer feedback analysis, brand monitoring, product review processing.
Q2. What model are you using and why?
DistilBERT fine-tuned on SST-2 (Stanford Sentiment Treebank). DistilBERT is about 40% smaller than BERT and 60% faster, while retaining roughly 97% of BERT's performance on language-understanding benchmarks. Good balance of speed and quality.
Q3. What is @st.cache_resource in Streamlit?
Caches a resource (like a loaded ML model) so it is not reloaded on every script rerun. Models are expensive to load, so caching means the model loads once per server process and is shared across reruns and user sessions.
Q4. How would you handle non-English text?
Use a multilingual model like nlptown/bert-base-multilingual-uncased-sentiment (it predicts a 1 to 5 star rating rather than positive/negative labels), or add a translation step, for example with googletrans, before passing text to the English sentiment model.
Q5. What are the limitations of this sentiment analyzer?
The model is fine-tuned on movie reviews (SST-2), so it may not generalise well to technical, medical, or other domain-specific text. It struggles with sarcasm and, having been trained on single sentences, is less reliable at tracking context across multiple sentences.
Q6. How would you evaluate if the model is accurate for your use case?
Collect 200-500 domain-specific examples, manually label them, run the model, compute precision/recall/F1. If accuracy < 80%, fine-tune on domain data.
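The metrics in the answer above can be computed by hand for a binary model, as in this minimal sketch. The function name and the tiny example labels are made up for illustration. In practice scikit-learn's `precision_recall_fscore_support` does the same job.

```python
def precision_recall_f1(y_true, y_pred, positive="POSITIVE"):
    """Precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hand-labelled ground truth vs. model output (made-up example data).
y_true = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here there are 2 true positives, 1 false positive and 1 false negative, so all three metrics come out at 2/3.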
Q7. What is the difference between rule-based and ML-based sentiment analysis?
Rule-based: predefined word lists (VADER, TextBlob). Fast, interpretable, but misses context. ML-based: learns patterns from data and handles context and sarcasm better than word lists, but needs training data.
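To make the rule-based side concrete, here is a toy word-list scorer. It is not how VADER or TextBlob work internally; the word lists and the function name are invented for this sketch. The second example shows the kind of context it misses.

```python
# Tiny hand-made lexicons (illustration only, far smaller than a real one).
POSITIVE_WORDS = {"good", "great", "amazing", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "awful", "hate", "poor"}


def rule_based_sentiment(text):
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


print(rule_based_sentiment("The food was great"))
# Negation is ignored, so "good" alone decides the label here.
print(rule_based_sentiment("not good at all"))
```

The second call returns POSITIVE because the scorer sees "good" and has no notion of the preceding "not", which is the context problem the answer describes.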
Q8. How do you handle very long texts?
BERT-family models such as DistilBERT have a 512-token input limit. For longer texts: chunk into sentences, classify each, then aggregate (majority vote or average confidence). Alternatively, use a long-input model such as Longformer.
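The chunk-and-aggregate idea can be sketched as below. The regex sentence splitter is deliberately naive, the function names are hypothetical, and `fake_classifier` stands in for the real pipeline. Aggregation here is a majority vote over sentence labels, with the confidence averaged over the sentences that voted for the winner.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def classify_long_text(text, classifier):
    """Classify each sentence, then aggregate by majority vote."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text must not be empty")
    results = classifier(sentences)
    votes = Counter(r["label"] for r in results)
    label, _ = votes.most_common(1)[0]  # ties go to the label seen first
    winning_scores = [r["score"] for r in results if r["label"] == label]
    return label, sum(winning_scores) / len(winning_scores)


def fake_classifier(batch):
    # Stand-in for the real pipeline (illustration only).
    out = []
    for sentence in batch:
        if "bad" in sentence.lower():
            out.append({"label": "NEGATIVE", "score": 0.8})
        else:
            out.append({"label": "POSITIVE", "score": 0.9})
    return out


review = "I love it. It is great! The box was bad."
print(classify_long_text(review, fake_classifier))
```

A single very long sentence could still exceed the token limit, so a production version would also need truncation or a token-based splitter.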
Q9. What would you add to make this production-ready?
Authentication, rate limiting, API endpoint (FastAPI), database for storing results, monitoring for model drift, batch processing queue (Celery), cost management.
Q10. How would you add neutral sentiment?
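Switch to a 3-class model like cardiffnlp/twitter-roberta-base-sentiment-latest, which predicts positive/neutral/negative. Or apply a threshold to the binary model: treat a positive-class probability between 0.4 and 0.6 as neutral. Since the pipeline reports the score of the winning label, that means any prediction with a score below 0.6.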
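The threshold option can be sketched as a small post-processing step. `with_neutral` is a hypothetical helper. It assumes a two-class model whose probabilities sum to 1, so a NEGATIVE result with score s implies a positive-class probability of 1 - s.

```python
def with_neutral(result, low=0.4, high=0.6):
    """Map a binary pipeline result to POSITIVE / NEUTRAL / NEGATIVE.

    `result` is a dict with 'label' ('POSITIVE' or 'NEGATIVE') and
    'score' (the probability of that label).
    """
    if result["label"] == "POSITIVE":
        p_positive = result["score"]
    else:
        p_positive = 1.0 - result["score"]
    if low <= p_positive <= high:
        return "NEUTRAL"
    return "POSITIVE" if p_positive > high else "NEGATIVE"


print(with_neutral({"label": "POSITIVE", "score": 0.98}))
print(with_neutral({"label": "NEGATIVE", "score": 0.55}))
```

The 0.4 to 0.6 band comes from the answer above. It is worth tuning on labelled examples, since low confidence from a binary model is only a rough proxy for genuinely neutral text.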