Q1. What is sentiment analysis?
NLP task that classifies text as positive, negative, or neutral. Used for: customer feedback analysis, brand monitoring, product review processing.
Q2. What model are you using and why?
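DistilBERT fine-tuned on SST-2 (Stanford Sentiment Treebank). DistilBERT is about 40% smaller than BERT and about 60% faster, while retaining roughly 97% of BERT's performance on the GLUE language-understanding benchmark (figures reported by its authors). Good balance of speed and quality.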
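A minimal sketch of loading such a model with the Hugging Face transformers pipeline. The checkpoint name below is an assumption (the notes only say "DistilBERT fine-tuned on SST-2"), and describe() is a hypothetical helper added for illustration.

```python
# Assumed checkpoint; swap in whichever SST-2 DistilBERT model the app actually uses.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"


def load_classifier(model_name: str = MODEL_NAME):
    """Build a sentiment pipeline. Weights are downloaded on first use."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_name)


def describe(prediction: dict) -> str:
    """Format one pipeline prediction, e.g. {"label": "POSITIVE", "score": 0.75}."""
    return f"{prediction['label'].lower()} ({prediction['score']:.1%})"
```

Typical use would be classifier = load_classifier() followed by describe(classifier("Great film")[0]); the pipeline returns a list with one dict per input text.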
Q3. What is @st.cache_resource in Streamlit?
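Caches a resource (like a loaded ML model) so it is not reloaded on every script rerun. Streamlit reruns the whole script on each user interaction, and models are expensive to load. With caching, the model loads once per server process and the same object is shared across all users and sessions.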
Q4. How would you handle non-English text?
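Use a multilingual model like nlptown/bert-base-multilingual-uncased-sentiment (it predicts 1 to 5 star ratings, so map the stars to positive/negative), or add a translation step using googletrans before passing to the English sentiment model. googletrans is an unofficial Google Translate client, so expect occasional breakage and plan a fallback.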
Q5. What are the limitations of this sentiment analyzer?
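Fine-tuned on movie reviews (SST-2), so it may not generalise well to technical, medical, or other domain-specific text. Struggles with sarcasm and irony. Trained on single sentences and classifies each input independently, so it does not track context across a longer document. Binary output only (positive/negative), with no neutral class.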
Q6. How would you evaluate if the model is accurate for your use case?
Collect 200-500 domain-specific examples, manually label them, run the model, and compute accuracy plus per-class precision/recall/F1. If accuracy falls below a target such as 80%, fine-tune on domain data.
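The metric step can be sketched in plain Python as below. The function name and the "POSITIVE"/"NEGATIVE" label strings are assumptions; in practice a library such as scikit-learn would usually be used instead.

```python
def binary_metrics(y_true, y_pred, positive="POSITIVE"):
    """Accuracy, precision, recall and F1 for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Here y_true holds the manual labels and y_pred the model's labels for the same examples, in the same order.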
Q7. What is the difference between rule-based and ML-based sentiment analysis?
Rule-based: predefined word lists and scoring rules (VADER, TextBlob's default analyzer). Fast, interpretable, but misses context. ML-based: learns patterns from data and handles context and sarcasm better, but needs labelled training data.
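To make the "misses context" point concrete, here is a toy rule-based scorer. The word lists are made up for illustration and are far smaller than a real lexicon; VADER, for instance, also applies negation and intensity rules that this sketch omits.

```python
import re

# Tiny illustrative lexicon, not taken from VADER or TextBlob.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}


def lexicon_sentiment(text: str) -> str:
    """Count positive minus negative words and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because words are scored in isolation, "The movie was not good" comes out as positive: the scorer sees "good" and ignores the negation.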
Q8. How do you handle very long texts?
BERT-style models such as DistilBERT accept at most 512 tokens per input. For longer texts: chunk into sentences, classify each, aggregate (majority vote or average confidence). Or use a long-input model such as Longformer, which handles up to 4,096 tokens.
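The chunk-and-aggregate approach could look roughly like this. The classifier is passed in as a function returning (label, confidence), and the regex sentence splitter and the signed-score averaging are assumptions chosen for illustration. Splitting by sentence does not by itself guarantee each chunk fits the token limit, so a real version would also truncate or re-split overly long sentences.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify_long_text(text: str, classify: Callable[[str], Prediction]) -> Dict:
    """Classify each sentence, then aggregate by vote and by mean signed score."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")

    predictions = [classify(s) for s in sentences]

    # Majority vote over the per-sentence labels.
    votes = Counter(label for label, _ in predictions)
    majority = votes.most_common(1)[0][0]

    # Average confidence, counted as negative for NEGATIVE sentences.
    signed = [score if label == "POSITIVE" else -score for label, score in predictions]

    return {
        "majority": majority,
        "mean_signed_score": sum(signed) / len(signed),
        "n_chunks": len(sentences),
    }
```

With a transformers pipeline named clf, the classify argument could be lambda s: (clf(s)[0]["label"], clf(s)[0]["score"]).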
Q9. What would you add to make this production-ready?
Authentication, rate limiting, API endpoint (FastAPI), database for storing results, monitoring for model drift, batch processing queue (Celery), cost management.
Q10. How would you add neutral sentiment?
Switch to a 3-class model like cardiffnlp/twitter-roberta-base-sentiment-latest, which predicts positive/neutral/negative. Or threshold the binary model: treat a positive-class probability between 0.4 and 0.6 as neutral.
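The thresholding option can be sketched as follows, assuming the binary model returns its top label ("POSITIVE" or "NEGATIVE") with that label's confidence, as the transformers pipeline does. The function name is hypothetical and the 0.4/0.6 cut-offs are the ones given above.

```python
def to_three_class(label: str, score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a binary prediction to positive/neutral/negative by thresholding."""
    # Convert "confidence in the predicted label" to "probability of positive".
    p_positive = score if label == "POSITIVE" else 1.0 - score

    if low <= p_positive <= high:
        return "neutral"
    return "positive" if p_positive > high else "negative"
```

Low confidence is only a proxy for neutrality. Binary classifiers are often very confident even on neutral text, so it is worth checking on labelled examples how many inputs actually land in the neutral band.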