Q1. How does conversation memory work in the OpenAI API?
The API is stateless — it has no memory. You implement memory by sending the full conversation history (list of role/content dicts) with every request. The model "remembers" by reading the history.
Q2. What is the difference between system, user, and assistant roles?
system: sets the AI's persona and behaviour (not shown to user). user: the human's input. assistant: the AI's previous responses. All are sent in the messages array.
Q3. How does increasing conversation length affect cost?
You pay for every token sent, including the full history. A 100-message conversation sends all 100 messages with every new request. Cost grows O(n) where n is conversation length.
Q4. How would you handle very long conversations to control cost?
Sliding window: keep only the last N messages. Summarisation: periodically summarise old messages into a compact summary (using the LLM). Truncation: drop oldest messages when approaching context limit.
Q5. What is the context window and why does it matter?
The maximum tokens a model can process in one request. GPT-4o-mini: 128K tokens. If your history exceeds this, you get an error. Track token count with tiktoken library.
Q6. How would you add streaming to show responses word by word?
stream=True in the API call returns a generator. Iterate over chunks: for chunk in stream: print(chunk.choices[0].delta.content, end=""). Users see immediate feedback.
Q7. How do you save and load conversation sessions?
json.dump(history, file) to save. json.load(file) to restore. Include metadata: session_id, created_at, total_tokens. Store in ~/.chatbot/sessions/ for per-user isolation.
Q8. What persona examples would make this useful for students?
ML Tutor (explains concepts with examples), Code Reviewer (reviews Python code, suggests improvements), Study Buddy (quizzes on ML topics, gives hints without answers), Viva Coach (asks hard project questions).
Q9. How would you add a GUI version of this chatbot?
Use Gradio (gr.ChatInterface — 5 lines of code for a web chat UI) or Streamlit with st.chat_message. Both provide browser-based interfaces using the same backend logic.
Q10. What error handling would you add for production?
RateLimitError (exponential backoff retry), APIConnectionError (retry with timeout), AuthenticationError (clear error message). Track and display remaining budget. Gracefully handle empty responses.