CLI Chatbot with Memory

what you build

A terminal-based AI chatbot with persistent memory, multiple personas, and session management — demonstrates LLM API usage without a web framework.

Multi-turn conversation with full history sent to the API
Custom system personas: teacher, code reviewer, study buddy
Save and load conversation sessions to/from JSON files
Commands: /save, /load, /clear, /history, /persona, /quit
Rich terminal UI with colored output (rich library)
Token usage tracking and cost estimation per session

prerequisites

Before You Start

Python basics (functions, JSON, argparse)
Basic understanding of LLM APIs (OpenAI)
Command line comfort

architecture

How It Works

⌨ User Input (terminal)
→ 📝 Message History (list of dicts)
→ 🤖 OpenAI API (chat.completions)
→ 💬 Response Display (rich library)
→ 💾 JSON Session (save/load)

3-week plan

Milestone Breakdown

Week 1

Core Chat Loop

Build basic REPL loop: input → OpenAI API → print response
Implement conversation history with role: user/assistant
Add system prompt and persona support
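
Persona support can be sketched as a lookup from persona name to system prompt. The prompt wording, the dictionary keys, and the helper names `new_history` and `switch_persona` below are illustrative assumptions, not part of the project spec.

```python
# Hypothetical persona prompts; the wording is illustrative only.
PERSONAS = {
    "teacher": "You are a patient teacher. Explain concepts step by step with examples.",
    "code_reviewer": "You are a code reviewer. Point out bugs and suggest improvements.",
    "study_buddy": "You are a study buddy. Quiz the user and give hints, not answers.",
}

def new_history(persona: str = "teacher") -> list:
    """Start a conversation whose first message is the persona's system prompt."""
    if persona not in PERSONAS:
        raise ValueError(f"Unknown persona: {persona!r}")
    return [{"role": "system", "content": PERSONAS[persona]}]

def switch_persona(history: list, persona: str) -> list:
    """Return a new history with the system prompt replaced and the turns kept."""
    turns = [m for m in history if m["role"] != "system"]
    return new_history(persona) + turns
```

Keeping the system prompt as the first list element means switching personas only swaps one message, and the rest of the conversation carries over.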

Week 2

Commands + Session Management

Implement /save, /load, /clear, /history commands
JSON serialisation of conversation history
Token counting and cost tracking
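
One possible shape for the command handling is a function that inspects each input line before it reaches the model. This is a minimal sketch: `handle_command` is a hypothetical helper, and only /clear and /history are shown, with the remaining commands left as further branches.

```python
from typing import Optional

def handle_command(line: str, history: list) -> Optional[str]:
    """Return a status message if `line` is a slash command, else None."""
    if not line.startswith("/"):
        return None  # ordinary chat message, pass it on to the model
    command = line.split(maxsplit=1)[0]
    if command == "/clear":
        # Keep system messages so the persona survives the reset.
        history[:] = [m for m in history if m["role"] == "system"]
        return "History cleared."
    if command == "/history":
        return "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return f"Unknown command: {command}"
```

In the chat loop, a `None` result means the line should be sent to the API as a normal message.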

Week 3

Polish + Documentation

Add rich library for colored terminal output
argparse for --persona and --session CLI flags
README with usage examples
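
The two CLI flags could be parsed roughly as follows. The persona names other than `teacher`, the default values, and the `parse_args` wrapper are assumptions made for this sketch.

```python
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    """Parse --persona and --session; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(description="CLI chatbot with memory")
    parser.add_argument(
        "--persona",
        default="teacher",
        choices=["teacher", "code_reviewer", "study_buddy"],  # assumed names
        help="system persona to start with",
    )
    parser.add_argument(
        "--session",
        default=None,
        help="path to a saved JSON session to resume",
    )
    return parser.parse_args(argv)
```

Accepting an explicit `argv` list keeps the parser easy to test without touching the real command line.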

key code

Core Implementation

chatbot.py

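from openai import OpenAI
import json  # used later for saving and loading sessions

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

def chat(history: list, user_message: str) -> str:
    """Send the full history plus the new message; record and return the reply."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history  # the API is stateless, so every request carries all prior turns
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# The system message sets the persona and stays first in the history.
history = [{"role": "system", "content": "You are a helpful ML tutor."}]
# REPL: keep chatting until the user types /quit.
while (msg := input("You: ")) != "/quit":
    print(f"Bot: {chat(history, msg)}")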

deployment

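Run Locally or Package as a Wheel

deploy.sh

# requirements.txt: openai rich
# Run: python chatbot.py --persona teacher
# Package as a wheel (needs a pyproject.toml and the build package): python -m build && pip install dist/*.whl

This is a terminal program, so it has no public URL to deploy. Share it through the GitHub repo or as an installable wheel. Streamlit Cloud (share.streamlit.io) applies only if you build the browser version described in Q9: create a free account there, connect the GitHub repo, and deploy that version.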

viva prep

10 Viva Questions with Answers

Q1. How does conversation memory work in the OpenAI API?

The API is stateless — it has no memory. You implement memory by sending the full conversation history (list of role/content dicts) with every request. The model "remembers" by reading the history.

Q2. What is the difference between system, user, and assistant roles?

system: sets the AI's persona and behaviour (not shown to user). user: the human's input. assistant: the AI's previous responses. All are sent in the messages array.

Q3. How does increasing conversation length affect cost?

You pay for every token sent, including the full history. The 100th request in a conversation resends all earlier messages along with the new one. Per-request cost therefore grows O(n) in the conversation length n, and the cumulative cost of the whole session grows O(n²).
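
A small calculation makes the growth concrete. It assumes every message has the same token length and ignores the system prompt and output tokens, so the numbers are illustrative rather than a billing estimate. `tokens_billed` is a hypothetical helper.

```python
def tokens_billed(num_turns: int, tokens_per_message: int):
    """Input tokens sent on each request, and the total for the session."""
    per_request = []
    for turn in range(1, num_turns + 1):
        # Request number `turn` carries `turn` user messages
        # and `turn - 1` earlier assistant replies.
        messages_sent = 2 * turn - 1
        per_request.append(messages_sent * tokens_per_message)
    return per_request, sum(per_request)

per_request, total = tokens_billed(num_turns=4, tokens_per_message=50)
print(per_request)  # [50, 150, 250, 350]
print(total)        # 800
```

Each request grows linearly with the turn number, so the running total grows with the square of the number of turns (here 50 × 4² = 800).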

Q4. How would you handle very long conversations to control cost?

Sliding window: keep only the last N messages. Summarisation: periodically summarise old messages into a compact summary (using the LLM). Truncation: drop oldest messages when approaching context limit.
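
The sliding-window option can be sketched as below. The helper name `sliding_window` and the choice to always keep system messages are assumptions. Summarisation would need an extra API call and is not shown.

```python
def sliding_window(history: list, max_messages: int) -> list:
    """Keep system messages plus the most recent `max_messages` other messages."""
    system = [m for m in history if m["role"] == "system"]
    turns = [m for m in history if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + turns[-max_messages:]
```

Passing the trimmed list to the API while keeping the full list on disk preserves the complete transcript for /history and /save.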

Q5. What is the context window and why does it matter?

The maximum number of tokens a model can process in one request, counting both the input and the generated output. GPT-4o-mini: 128K tokens. If your history exceeds this, you get an error. Track the token count with the tiktoken library.

Q6. How would you add streaming to show responses word by word?

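Passing stream=True in the API call returns an iterator of chunks instead of a single response. Iterate over it: for chunk in stream: print(chunk.choices[0].delta.content or "", end="", flush=True). The or "" guards against chunks whose content is None. Users see immediate feedback.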
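
A slightly fuller sketch of the streaming loop, which also returns the complete reply so it can be appended to the history. `print_stream` is a hypothetical helper. It only assumes that each chunk exposes `choices[0].delta.content`, which may be None.

```python
def print_stream(stream) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None, e.g. on the final chunk
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

With the client from chatbot.py, the stream would come from client.chat.completions.create(model="gpt-4o-mini", messages=history, stream=True).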

Q7. How do you save and load conversation sessions?

json.dump(history, file) to save. json.load(file) to restore. Include metadata: session_id, created_at, total_tokens. Store in ~/.chatbot/sessions/ for per-user isolation.
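
A minimal sketch of the save and load functions with the metadata fields named above. The file layout, the use of a random hex string as session_id, and the function names are assumptions.

```python
import json
import time
import uuid
from pathlib import Path

# Directory taken from the answer above; the file naming is an assumption.
SESSION_DIR = Path.home() / ".chatbot" / "sessions"

def save_session(history: list, total_tokens: int = 0,
                 directory: Path = SESSION_DIR) -> Path:
    """Write the history plus metadata to <directory>/<session_id>.json."""
    directory.mkdir(parents=True, exist_ok=True)
    session_id = uuid.uuid4().hex
    payload = {
        "session_id": session_id,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "total_tokens": total_tokens,
        "history": history,
    }
    path = directory / f"{session_id}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path

def load_session(path) -> list:
    """Read a saved session file and return its message history."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["history"]
```

Wrapping the history in a dictionary rather than dumping the bare list leaves room for more metadata later without breaking old files.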

Q8. What persona examples would make this useful for students?

ML Tutor (explains concepts with examples), Code Reviewer (reviews Python code, suggests improvements), Study Buddy (quizzes on ML topics, gives hints without answers), Viva Coach (asks hard project questions).

Q9. How would you add a GUI version of this chatbot?

Use Gradio (gr.ChatInterface — 5 lines of code for a web chat UI) or Streamlit with st.chat_message. Both provide browser-based interfaces using the same backend logic.

Q10. What error handling would you add for production?

RateLimitError (exponential backoff retry), APIConnectionError (retry with timeout), AuthenticationError (clear error message). Track and display remaining budget. Gracefully handle empty responses.
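
The exponential backoff mentioned above might look like this. `with_backoff` is a hypothetical helper that takes the exception types to retry on as a parameter, so the sketch does not depend on any particular client library. The delay values are assumptions.

```python
import random
import time

def with_backoff(call, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on the given exception types with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Wait about 1s, 2s, 4s, ... plus jitter so retries do not synchronise.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

With the OpenAI client, retry_on would be (openai.RateLimitError, openai.APIConnectionError). AuthenticationError should not be retried, since a bad key will fail the same way every time.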