Quick Start — Get Your Chatbot Running in 5 Minutes
Picture this: it’s late, you have a client demo tomorrow morning, and you need a working chatbot prototype — now. Here’s the fastest path.
First, install the dependency:
pip install openai
Grab your API key from platform.openai.com → API Keys, then write this:
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # or set OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Hello, who are you?"}
    ]
)
print(response.choices[0].message.content)
Run it. You should see a response. One API call, one reply — that’s the core. The demo works. Now let’s make it useful.
Deep Dive: How Conversation Context Really Works
The single biggest mistake with OpenAI chatbots is treating each call as isolated. The API is stateless — it remembers nothing between requests. Sending conversation history every time is your job.
Here’s a minimal but functional chatbot loop:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant specializing in Linux and DevOps."}
]

def chat(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1024,
        temperature=0.7
    )
    assistant_reply = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": assistant_reply})
    return assistant_reply

# Simple interactive loop
while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ("exit", "quit"):
        break
    reply = chat(user_input)
    print(f"Bot: {reply}\n")
conversation_history grows with every turn. The system message stays at position 0, then user/assistant messages alternate. The model sees the full thread on each call.
Understanding the Three Roles
- system — sets the chatbot’s persona and rules. Define this once at startup.
- user — messages from the human side.
- assistant — the model’s previous replies, appended to maintain multi-turn context.
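Put together, a short thread looks like this (the message contents are placeholders for illustration):

```python
# A two-turn conversation as the API sees it: system prompt first,
# then alternating user/assistant messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant specializing in Linux and DevOps."},
    {"role": "user", "content": "How do I check disk usage?"},
    {"role": "assistant", "content": "Use df -h for filesystems and du -sh <dir> for directories."},
    # The model can only resolve "And" below because the prior turns are included:
    {"role": "user", "content": "And sorted by size?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```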
Keeping the Token Budget in Check
Every message in history counts toward the context limit. With gpt-4o-mini you have a 128K token window — generous, but long sessions accumulate cost fast. A simple trim strategy prevents unbounded growth:
MAX_HISTORY_MESSAGES = 20  # keep the last 20 messages (10 exchanges), excluding the system prompt

def trim_history():
    system_msg = conversation_history[0]
    recent = conversation_history[1:][-MAX_HISTORY_MESSAGES:]
    conversation_history.clear()
    conversation_history.append(system_msg)
    conversation_history.extend(recent)
Call trim_history() before each API call; one extra line that saves you from a nasty billing surprise.
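Count-based trimming treats a one-line question and a 2,000-word paste the same. If you want to trim by size instead, a rough character-based estimate works as a sketch — the ~4 characters/token ratio and the MAX_PROMPT_TOKENS value below are illustrative assumptions (use tiktoken for exact counts):

```python
# Rough token-budget trim: ~4 chars per token is a crude English-text
# heuristic, good enough for a safety margin, not for billing math.
MAX_PROMPT_TOKENS = 8000

def estimate_tokens(message: dict) -> int:
    return len(message["content"]) // 4 + 4  # +4 for per-message overhead

def trim_to_budget(history: list) -> list:
    """Keep the system prompt plus as many recent messages as fit the budget."""
    system_msg, rest = history[0], history[1:]
    budget = MAX_PROMPT_TOKENS - estimate_tokens(system_msg)
    kept = []
    for msg in reversed(rest):          # walk newest-first
        cost = estimate_tokens(msg)
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(msg)
    return [system_msg] + kept[::-1]    # restore chronological order

history = [{"role": "system", "content": "x" * 40}] + \
          [{"role": "user", "content": "y" * 4000} for _ in range(20)]
print(len(trim_to_budget(history)))  # 8: system prompt + the 7 newest messages that fit
```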
Advanced Usage: Streaming, Error Handling, and Persistence
Streaming Responses for Real-Time Feel
Streaming is what makes a chatbot feel fast, not just be fast. Tokens print as they arrive:
def chat_stream(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        stream=True
    )
    full_response = ""
    print("Bot: ", end="", flush=True)
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        full_response += delta
    print()  # newline after stream ends
    conversation_history.append({"role": "assistant", "content": full_response})
    return full_response
I ran this in production for a customer support interface. The first words appeared almost immediately after the user hit Enter — even on 600-token replies. Users stopped asking “is it loading?”
Retry Logic for Rate Limits and Network Hiccups
When something breaks at 2 AM, it’s usually rate limits or a flaky connection. A simple exponential backoff wrapper:
import time
import openai

def chat_with_retry(user_message: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            return chat(user_message)
        except openai.RateLimitError:
            conversation_history.pop()  # drop the user message chat() appended, so the retry doesn't duplicate it
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limit hit. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except openai.APIConnectionError as e:
            conversation_history.pop()
            print(f"Connection error: {e}")
            time.sleep(1)
        except openai.APIStatusError as e:
            print(f"API error {e.status_code}: {e.message}")
            break
    return "Sorry, I'm having trouble connecting right now."
Saving and Restoring Conversation State
Need persistence across sessions? Think support bots that remember context between page reloads. The solution is simpler than most people expect:
import json

def save_history(filepath: str):
    with open(filepath, "w") as f:
        json.dump(conversation_history, f, ensure_ascii=False, indent=2)

def load_history(filepath: str):
    global conversation_history
    try:
        with open(filepath, "r") as f:
            conversation_history = json.load(f)
    except FileNotFoundError:
        pass  # start fresh
No external database needed for single-user deployments. JSON on disk is plenty.
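A quick sanity check of the round trip (the helpers are reproduced here so the snippet runs on its own, and the temp-file path is arbitrary):

```python
import json
import os
import tempfile

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]

def save_history(filepath: str):
    with open(filepath, "w") as f:
        json.dump(conversation_history, f, ensure_ascii=False, indent=2)

def load_history(filepath: str):
    global conversation_history
    try:
        with open(filepath, "r") as f:
            conversation_history = json.load(f)
    except FileNotFoundError:
        pass  # start fresh

path = os.path.join(tempfile.gettempdir(), "session.json")
save_history(path)
saved = list(conversation_history)
conversation_history = []      # simulate a fresh process
load_history(path)
print(conversation_history == saved)  # True
os.remove(path)
```

In a real deployment you would call load_history() once at startup and save_history() after each turn (or register it with atexit).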
Practical Tips That Actually Matter in Production
Always Use Environment Variables for API Keys
Never hardcode credentials. Use environment variables or a .env file:
pip install python-dotenv
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Pick the Right Model for the Job
gpt-4o-mini handles most chatbot use cases well — fast, cheap (~$0.15/1M input tokens), and sharp enough for general Q&A. Save gpt-4o for tasks that need deeper reasoning. A minimal router:
def get_model(task_complexity: str) -> str:
    if task_complexity == "complex":
        return "gpt-4o"
    return "gpt-4o-mini"
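How you decide task_complexity is up to you. One entirely heuristic sketch flags long messages or reasoning-heavy keywords; the marker list below is an assumption to tune against your own traffic:

```python
# Crude router input: long prompts or reasoning keywords go to the bigger model.
COMPLEX_MARKERS = ("debug", "architect", "refactor", "prove", "step by step")

def classify_complexity(user_message: str) -> str:
    text = user_message.lower()
    if len(text) > 500 or any(marker in text for marker in COMPLEX_MARKERS):
        return "complex"
    return "simple"

print(classify_complexity("What's the capital of France?"))      # simple
print(classify_complexity("Refactor this module step by step"))  # complex
```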
Write a Specific System Prompt
Vague prompts get vague results. Be explicit about scope, tone, and format:
SYSTEM_PROMPT = """You are a technical support assistant for Linux server issues.
- Answer only questions related to Linux, shell scripting, and server administration.
- When providing commands, always wrap them in code blocks.
- If you don't know the answer, say so directly — do not guess.
- Keep responses concise unless the user explicitly asks for detail."""
Log Inputs, Outputs, and Token Usage
When production breaks at 3 AM, you’ll want the conversation log and cost data ready. Add logging directly into your chat() function:
import logging

logging.basicConfig(
    filename="chatbot.log",
    level=logging.INFO,
    format="%(asctime)s | %(message)s"
)

def chat(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})
    logging.info(f"USER: {user_message}")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history
    )
    assistant_reply = response.choices[0].message.content
    usage = response.usage
    conversation_history.append({"role": "assistant", "content": assistant_reply})
    logging.info(f"ASSISTANT: {assistant_reply}")
    logging.info(f"TOKENS: prompt={usage.prompt_tokens}, completion={usage.completion_tokens}, total={usage.total_tokens}")
    return assistant_reply
Set a daily alert if cumulative tokens exceed, say, 500K — that’s roughly $0.075 in input costs on gpt-4o-mini, a reasonable sanity check. Month-end billing surprises are far worse than a noisy alert.
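A minimal in-process version of that alert, fed from response.usage after each call. The prices and threshold below are illustrative assumptions; check OpenAI's current pricing page before relying on them:

```python
# Running usage/cost tracker. Call record_usage() with the token counts
# from each response; it returns the cumulative estimated cost in dollars.
INPUT_PRICE_PER_M = 0.15    # assumed gpt-4o-mini $/1M input tokens
OUTPUT_PRICE_PER_M = 0.60   # assumed gpt-4o-mini $/1M output tokens
ALERT_TOKENS = 500_000      # daily sanity-check threshold

totals = {"prompt": 0, "completion": 0}

def record_usage(prompt_tokens: int, completion_tokens: int) -> float:
    totals["prompt"] += prompt_tokens
    totals["completion"] += completion_tokens
    if totals["prompt"] + totals["completion"] > ALERT_TOKENS:
        print("WARNING: daily token budget exceeded")
    return (totals["prompt"] * INPUT_PRICE_PER_M +
            totals["completion"] * OUTPUT_PRICE_PER_M) / 1_000_000

cost = record_usage(prompt_tokens=400_000, completion_tokens=50_000)
print(f"${cost:.4f}")  # $0.0900
```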
You now have a working multi-turn chatbot: streaming, retry logic, persistence, and cost controls. Add a FastAPI layer to expose it as an API, wire it into a Telegram bot, or connect it to a vector database for RAG. The conversation loop stays exactly the same.

