Sentiment Analyzer Web App

what you build

A text sentiment classification tool that works on single texts or uploaded CSV files.

Single text input: type text, get instant sentiment + confidence score
Batch CSV upload: process thousands of reviews, tweets, or feedback at once
Interactive charts: sentiment distribution pie chart, score histogram
Download results as CSV with sentiment labels and confidence scores
Supports multiple pre-trained models from HuggingFace hub

prerequisites

Before You Start

Python basics (functions, file I/O)
Basic pandas for data manipulation
Understanding of what a classification model does

architecture

How It Works

📝 Text Input or CSV Upload → 🤗 HuggingFace Transformer Model → 📊 Sentiment Score (Positive/Negative) → 📈 Streamlit Dashboard + CSV Export

3-week plan

Milestone Breakdown

Week 1

Core Classifier

Install transformers and load the distilbert-base-uncased-finetuned-sst-2-english sentiment model
Build single text classification function with confidence score
Test with 20 example sentences to verify accuracy
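
The Week 1 classification function could look roughly like the sketch below. It assumes the classifier is any callable that behaves like the Hugging Face sentiment pipeline (takes a string, returns a list of dicts with `label` and `score`). The names `classify_text` and `fake_classifier` are hypothetical, and the stub only stands in for the real model so the example runs without a download.

```python
from typing import Callable, Dict, List

# Anything that behaves like a Hugging Face sentiment pipeline:
# it takes a string and returns [{"label": ..., "score": ...}].
Classifier = Callable[[str], List[Dict[str, object]]]


def classify_text(text: str, classifier: Classifier) -> Dict[str, object]:
    """Return the sentiment label and confidence score for one text."""
    cleaned = text.strip()
    if not cleaned:
        # Empty input: skip the model call and report no prediction.
        return {"text": text, "label": None, "score": None}
    prediction = classifier(cleaned)[0]
    return {
        "text": cleaned,
        "label": prediction["label"],
        "score": round(float(prediction["score"]), 4),
    }


def fake_classifier(text: str) -> List[Dict[str, object]]:
    """Hypothetical stand-in for the real pipeline, for offline testing only."""
    label = "POSITIVE" if "amazing" in text.lower() else "NEGATIVE"
    return [{"label": label, "score": 0.99}]


result = classify_text("  This product is absolutely amazing!  ", fake_classifier)
```

In the app, pass the object returned by `load_model()` in place of the stub.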

Week 2

Batch Processing + Streamlit UI

Implement CSV upload with pandas (handle text column detection)
Process the batch with a progress bar (st.progress); cache results with @st.cache_data so reruns on the same file skip repeat work
Build Streamlit UI: text input tab + CSV upload tab
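
Text column detection and batching for Week 2 might be sketched as follows. This is one possible heuristic, not a fixed rule: the list of likely column names, the "longest average string" fallback, and the function names are all assumptions. The columns are shown as a plain dict of lists so the sketch runs without pandas.

```python
from typing import Dict, Iterator, List, Optional

# Column names that commonly hold free text (an assumed list; extend as needed).
LIKELY_TEXT_COLUMNS = ("text", "review", "comment", "tweet", "feedback")


def detect_text_column(columns: Dict[str, List[object]]) -> Optional[str]:
    """Pick the column most likely to contain the text to classify."""
    # 1. Prefer a column whose name matches a known text-like name.
    for name in columns:
        if name.strip().lower() in LIKELY_TEXT_COLUMNS:
            return name
    # 2. Otherwise choose the string column with the longest average length.
    best_name, best_length = None, 0.0
    for name, values in columns.items():
        strings = [v for v in values if isinstance(v, str)]
        if not strings:
            continue
        average = sum(len(s) for s in strings) / len(strings)
        if average > best_length:
            best_name, best_length = name, average
    return best_name


def batches(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


columns = {
    "id": [1, 2, 3],
    "body": ["Great phone, love it", "Battery died in a week", "Okay for the price"],
    "stars": ["5", "1", "3"],
}
text_column = detect_text_column(columns)
chunks = list(batches(columns[text_column], batch_size=2))
```

With pandas, `df.to_dict("list")` produces the same dict-of-lists shape. Each batch can be passed to the pipeline as a list of strings, with the progress bar updated after every batch.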

Week 3

Charts + Deploy

Add plotly pie chart for sentiment distribution
Add confidence score histogram
Download button for results CSV
Deploy to Streamlit Cloud
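
The data behind the Week 3 download button and pie chart can be prepared with the standard library alone. The sketch below assumes each result is a dict with `text`, `label`, and `score` keys; that shape and the function names are illustrative choices, not something the project prescribes.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def results_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize classification results to CSV text for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["text", "label", "score"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def label_distribution(rows: List[Dict[str, object]]) -> Counter:
    """Count how many results fall under each sentiment label."""
    return Counter(row["label"] for row in rows)


rows = [
    {"text": "Great phone", "label": "POSITIVE", "score": 0.98},
    {"text": "Broke in a week", "label": "NEGATIVE", "score": 0.95},
    {"text": "Love it", "label": "POSITIVE", "score": 0.91},
]
csv_text = results_to_csv(rows)
distribution = label_distribution(rows)
```

In the Streamlit app, `csv_text` can be handed to `st.download_button` as its data, the label counts can feed the plotly pie chart, and the list of scores can feed the histogram.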

key code

Core Implementation

sentiment.py

from transformers import pipeline
import streamlit as st

# Load model once at startup
@st.cache_resource
def load_model():
    return pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

classifier = load_model()

# Classify text
result = classifier("This product is absolutely amazing!")[0]
# {'label': 'POSITIVE', 'score': 0.9998}

deployment

Deploy to Streamlit Cloud (Free)

deploy.sh

# requirements.txt (one package per line)
transformers
streamlit
pandas
plotly
torch

# share.streamlit.io → New app → connect GitHub repo
# Main file path: app.py → Deploy

Create a free account at share.streamlit.io, connect your GitHub repo, and deploy in one click. Your app gets a public URL once the first build finishes, which can take a few minutes while large dependencies such as torch install.

viva prep

10 Viva Questions with Answers

Q1. What is sentiment analysis?

NLP task that classifies text as positive, negative, or neutral. Used for: customer feedback analysis, brand monitoring, product review processing.

Q2. What model are you using and why?

DistilBERT fine-tuned on SST-2 (Stanford Sentiment Treebank). DistilBERT is 40% smaller than BERT and 60% faster, and retains about 97% of BERT's language-understanding performance. Good balance of speed and quality.

Q3. What is @st.cache_resource in Streamlit?

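Caches a resource (like a loaded ML model) so it is not reloaded on every user interaction. Models are expensive to load, and Streamlit reruns the whole script on each interaction; with caching, the model loads once per server process and is shared across reruns and user sessions.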
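
The load-once behaviour can be illustrated without Streamlit. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for `@st.cache_resource`; it is an analogy for the caching idea, not Streamlit's implementation, and `load_model_stub` is a hypothetical placeholder for the real loader.

```python
from functools import lru_cache

load_calls = []  # records every time the expensive load actually runs


@lru_cache(maxsize=None)
def load_model_stub() -> dict:
    """Pretend to load a heavy model; the body runs only on the first call."""
    load_calls.append("loaded")
    return {"name": "stub-model"}


first = load_model_stub()
second = load_model_stub()  # served from the cache, no second load
```

Both calls return the same object, and the load body runs once.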

Q4. How would you handle non-English text?

Use a multilingual model like nlptown/bert-base-multilingual-uncased-sentiment, or add a translation step using googletrans before passing to the English sentiment model.

Q5. What are the limitations of this sentiment analyzer?

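Fine-tuned on movie reviews (SST-2), so it may not generalise well to technical, medical, or domain-specific text. Often misreads sarcasm and irony. Trained on single sentences, so it is less reliable when the sentiment depends on context spread across several sentences. Binary output only: no neutral class.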

Q6. How would you evaluate if the model is accurate for your use case?

Collect 200-500 domain-specific examples, manually label them, run the model, compute precision/recall/F1. If accuracy < 80%, fine-tune on domain data.
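
The metrics for such an evaluation can be computed in a few lines. This is a minimal sketch, assuming two equal-length lists of labels (manual labels and model predictions) and treating `POSITIVE` as the positive class; the function name and the tiny sample are made up for illustration.

```python
from typing import Dict, List


def binary_metrics(
    y_true: List[str], y_pred: List[str], positive: str = "POSITIVE"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


P, N = "POSITIVE", "NEGATIVE"
manual_labels = [P, P, P, N, N, N, N, P]
model_labels = [P, P, N, N, N, P, N, P]
metrics = binary_metrics(manual_labels, model_labels)
```

On this toy sample the model makes one false positive and one false negative out of eight examples, so all four metrics come out at 0.75.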

Q7. What is the difference between rule-based and ML-based sentiment analysis?

Rule-based: predefined word lists (VADER, TextBlob). Fast, interpretable, but misses context. ML-based: learns patterns from data, handles context, sarcasm better, but needs training data.
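
A toy word-list scorer shows why rule-based methods miss context. The sketch below is far simpler than VADER or TextBlob (the word lists and function name are invented for illustration), but it demonstrates the core idea and its weakness with negation.

```python
import re

# Tiny illustrative word lists; real lexicons contain thousands of entries.
POSITIVE_WORDS = {"good", "great", "love", "amazing", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "poor"}


def lexicon_sentiment(text: str) -> str:
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


plain = lexicon_sentiment("The camera is great")
negated = lexicon_sentiment("This is not good at all")
```

The second sentence is labelled positive because the scorer sees "good" and ignores "not", which is the kind of context a learned model is better placed to pick up.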

Q8. How do you handle very long texts?

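BERT-family models such as DistilBERT accept at most 512 tokens per input. For longer texts: chunk into sentences, classify each, aggregate (majority vote or average confidence). Or use a long-input architecture such as Longformer, which handles up to 4,096 tokens but needs a checkpoint fine-tuned for sentiment.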
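
The chunk-and-aggregate approach can be sketched as below. It assumes a binary POSITIVE/NEGATIVE classifier with the pipeline's output shape, uses a deliberately naive regex sentence splitter, and aggregates by averaging the positive-class probability. The stub classifier and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def classify_long_text(
    text: str, classifier: Callable[[str], List[Dict[str, object]]]
) -> Dict[str, object]:
    """Classify each sentence, then average the positive-class probability."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")
    positive_probabilities = []
    for sentence in sentences:
        prediction = classifier(sentence)[0]
        score = float(prediction["score"])
        # Convert the winning label's score into P(positive).
        positive_probabilities.append(
            score if prediction["label"] == "POSITIVE" else 1.0 - score
        )
    mean_positive = sum(positive_probabilities) / len(sentences)
    return {
        "label": "POSITIVE" if mean_positive >= 0.5 else "NEGATIVE",
        "score": mean_positive,
        "sentences": len(sentences),
    }


def stub_classifier(sentence: str) -> List[Dict[str, object]]:
    """Hypothetical stand-in for the real model, for offline testing only."""
    if any(word in sentence.lower() for word in ("great", "love")):
        return [{"label": "POSITIVE", "score": 0.9}]
    return [{"label": "NEGATIVE", "score": 0.8}]


review = "The screen is great. I love the camera. The battery is terrible."
summary = classify_long_text(review, stub_classifier)
```

Averaging probabilities keeps low-confidence sentences from counting as much as confident ones; a majority vote over the per-sentence labels is the simpler alternative.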

Q9. What would you add to make this production-ready?

Authentication, rate limiting, API endpoint (FastAPI), database for storing results, monitoring for model drift, batch processing queue (Celery), cost management.

Q10. How would you add neutral sentiment?

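Switch to a 3-class model like cardiffnlp/twitter-roberta-base-sentiment-latest, which predicts positive/neutral/negative. Or threshold the binary model: treat a positive-class probability of 0.4-0.6 as neutral. The pipeline reports the winning label's score, which is always at least 0.5 for two classes, so this is the same as calling any prediction with a score below 0.6 neutral.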
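
The thresholding option can be sketched as a small post-processing step. It assumes the binary pipeline's output shape, where `score` belongs to the winning label, so a NEGATIVE score is first converted to a positive-class probability. The 0.4 and 0.6 cut-offs are the ones suggested above, and the function name is hypothetical.

```python
from typing import Dict


def to_three_class(
    prediction: Dict[str, object], low: float = 0.4, high: float = 0.6
) -> str:
    """Map a binary prediction to POSITIVE, NEUTRAL, or NEGATIVE."""
    score = float(prediction["score"])
    # The pipeline's score is for the winning label; convert to P(positive).
    positive_probability = score if prediction["label"] == "POSITIVE" else 1.0 - score
    if low <= positive_probability <= high:
        return "NEUTRAL"
    return "POSITIVE" if positive_probability > high else "NEGATIVE"


confident_positive = to_three_class({"label": "POSITIVE", "score": 0.9998})
borderline = to_three_class({"label": "NEGATIVE", "score": 0.55})
confident_negative = to_three_class({"label": "NEGATIVE", "score": 0.97})
```

The cut-offs are worth tuning on labelled examples from the target domain, since this model was never trained on a neutral class and its scores tend to sit close to 0 or 1.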