Building a Python Chatbot with the OpenAI API: From Zero to Production

Quick start: a working chatbot in 5 minutes

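Picture this: it is late at night, and you have a client demo first thing tomorrow morning. You need a working chatbot prototype right now. Here is the fastest way to get one.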

First, install the dependency:

pip install openai

Get an API key from platform.openai.com → API Keys, then write the following code:

from openai import OpenAI

client = OpenAI(api_key="sk-...")  # or set the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Hello, who are you?"}
    ]
)

print(response.choices[0].message.content)

Run it. You should get a response back. One API call, one reply: that is the core of it. That is enough for the demo. Now let's make it practical.

Deep dive: how conversation context works

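The most common mistake with OpenAI chatbots is assuming the model remembers earlier API calls. It does not: the API is stateless and keeps nothing between requests. Sending the conversation history with every call is your job.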

A minimal but practical chatbot loop looks like this:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant specializing in Linux and DevOps."}
]

def chat(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1024,
        temperature=0.7
    )

    assistant_reply = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": assistant_reply})
    return assistant_reply

# Simple interactive loop
while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ("exit", "quit"):
        break
    reply = chat(user_input)
    print(f"Bot: {reply}\n")

conversation_history grows with every turn. The system message stays fixed at index 0, followed by alternating user and assistant messages. On every API call, the model sees the entire thread.

Understanding the three roles

system: sets the chatbot's persona and rules. Define it once at startup.
user: messages from the user.
assistant: the model's previous replies. Append them to preserve multi-turn context.
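
To make the three roles concrete, here is a small sketch of what the message list might look like partway through a conversation. The message contents and the `role_sequence` helper are invented for illustration and are not part of the OpenAI SDK.

```python
# Illustrative history after one full exchange plus a follow-up question.
# The contents are made-up examples.
history = [
    {"role": "system", "content": "You are a helpful assistant specializing in Linux and DevOps."},
    {"role": "user", "content": "How do I list open ports?"},
    {"role": "assistant", "content": "Try `ss -tulpn`."},
    {"role": "user", "content": "And only TCP?"},
]

def role_sequence(messages: list) -> list:
    """Hypothetical helper: return just the roles, in order."""
    return [message["role"] for message in messages]

print(role_sequence(history))  # ['system', 'user', 'assistant', 'user']
```

The system message appears once at the front, and the last entry is the user message the model is about to answer.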

Managing the token budget

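Every message in the history counts toward the context limit. gpt-4o-mini has a 128K-token window, but because the whole history is resent on every call, costs climb quickly in long sessions. A simple trimming strategy keeps growth bounded: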

MAX_HISTORY_MESSAGES = 20  # keep the 20 most recent messages (about 10 exchanges), excluding the system prompt

def trim_history():
    system_msg = conversation_history[0]
    recent = conversation_history[1:][-MAX_HISTORY_MESSAGES:]
    conversation_history.clear()
    conversation_history.append(system_msg)
    conversation_history.extend(recent)

Call trim_history() before each API call. A few lines of code that spare you a nasty surprise on the bill.
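
trim_history() counts messages, not tokens, so a few very long messages can still blow the budget. One possible refinement is to trim against an estimated token count instead. The sketch below is a rough illustration under stated assumptions: `estimate_tokens` and `trim_to_budget` are hypothetical helpers, and the "about 4 characters per token" rule is only a crude heuristic that fits English text better than Japanese. For real accounting, a proper tokenizer would be the better choice.

```python
def estimate_tokens(message: dict) -> int:
    """Crude estimate (assumption: ~4 characters per token, plus 4 tokens of per-message overhead)."""
    return len(message["content"]) // 4 + 4

def trim_to_budget(messages: list, budget: int) -> list:
    """Return the system message plus the newest messages that fit in `budget` estimated tokens."""
    system_msg, rest = messages[0], messages[1:]
    remaining = budget - estimate_tokens(system_msg)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if cost > remaining:
            break  # this message and everything older is dropped
        kept.append(msg)
        remaining -= cost
    return [system_msg] + kept[::-1]  # restore chronological order
```

Unlike trim_history(), this version returns a new list and leaves the original untouched, so the full history can still be saved to disk while only the trimmed copy is sent to the API.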

Advanced: streaming, error handling, and persistence

Streaming responses for a real-time feel

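Streaming is the difference between a chatbot that is fast and one that feels fast. The total generation time is unchanged, but tokens are shown as soon as they arrive: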

def chat_stream(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        stream=True
    )

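    full_response = ""
    print("Bot: ", end="", flush=True)

    for chunk in stream:
        if not chunk.choices:  # defensive: a chunk can arrive with an empty choices list (e.g. a usage-only chunk)
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        full_response += delta

    print()  # newline after the stream ends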
    conversation_history.append({"role": "assistant", "content": full_response})
    return full_response

I have run this in production behind a customer support interface. The first words appeared right after the user pressed Enter, even for 600-token replies. The "is it still loading?" questions stopped.
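
The accumulation step in chat_stream() can be tried without an API key. The sketch below feeds hand-built stand-in objects through the same logic. `fake_chunk` and `collect_stream` are hypothetical names, and the stand-ins mimic only the attributes the loop reads (`choices`, `delta`, `content`), not the SDK's real chunk type.

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Join the text deltas of a stream, skipping empty or choice-less chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        parts.append(chunk.choices[0].delta.content or "")  # content can be None
    return "".join(parts)

def fake_chunk(text):
    """Build a stand-in object shaped like the chunks the loop expects."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

fake_stream = [
    fake_chunk("Hel"),
    fake_chunk("lo"),
    fake_chunk(None),             # a chunk whose delta carries no text
    SimpleNamespace(choices=[]),  # a chunk with no choices at all
]

print(collect_stream(fake_stream))  # Hello
```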

Retry logic for rate limits and network failures

If something breaks at 2 a.m., it is usually a rate limit or a flaky connection. A simple exponential backoff wrapper:

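import time
import openai

def chat_with_retry(user_message: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            return chat(user_message)
        except openai.RateLimitError:
            # chat() appended the user message before failing; drop it so a retry does not duplicate it
            conversation_history.pop()
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limit hit. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except openai.APIConnectionError as e:
            conversation_history.pop()
            print(f"Connection error: {e}")
            time.sleep(1)
        except openai.APIStatusError as e:
            conversation_history.pop()
            print(f"API error {e.status_code}: {e.message}")
            break  # other API errors are not retried
    return "Sorry, we are having trouble connecting right now."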
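
When many clients hit a rate limit at the same moment, fixed delays can make them all retry in lockstep. A common refinement is to add random jitter and cap the maximum wait. The following is a minimal sketch of that idea. `backoff_delay`, `base`, and `cap` are illustrative names and defaults, not part of the openai package.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    upper_bound = min(cap, base * 2 ** attempt)
    return random.uniform(0, upper_bound)

# Upper bounds for the first few attempts with the defaults: 1s, 2s, 4s, 8s
for attempt in range(4):
    delay = backoff_delay(attempt)
    print(f"attempt {attempt}: sleeping {delay:.2f}s")
```

To try it, the line that computes the wait time in the retry wrapper could call backoff_delay(attempt) instead of using 2 ** attempt directly.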

Saving and restoring conversation state

Need persistence across sessions? Picture a support bot that remembers context across page reloads. The solution is simpler than most people expect:

import json

def save_history(filepath: str):
    # Explicit UTF-8: ensure_ascii=False writes non-ASCII text as-is
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(conversation_history, f, ensure_ascii=False, indent=2)

def load_history(filepath: str):
    global conversation_history
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            conversation_history = json.load(f)
    except FileNotFoundError:
        pass  # start fresh

A single-user deployment does not need an external database. JSON on disk is enough.
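
One caveat with plain file writes: if the process dies halfway through saving, the JSON file can be left truncated. A common way around this is to write to a temporary file and then swap it into place. The sketch below shows one way to do that. `save_history_atomic` is a hypothetical helper, and it takes the history as an argument rather than reading the global.

```python
import json
import os
import tempfile

def save_history_atomic(history: list, filepath: str) -> None:
    """Write the history to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(history, f, ensure_ascii=False, indent=2)
        os.replace(tmp_path, filepath)  # atomic when both paths are on the same filesystem
    except BaseException:
        os.unlink(tmp_path)  # remove the partial temp file before re-raising
        raise
```

Readers of the file then see either the old version or the new one, never a half-written mix.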

Practical tips that actually matter in production

Always use environment variables for API keys

Never hardcode credentials. Use environment variables or a .env file:

pip install python-dotenv

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads variables from a .env file into the environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
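
os.getenv() returns None when the variable is missing, so it can help to check for the key explicitly at startup and fail with a clear message. A minimal sketch follows. `require_api_key` is a hypothetical helper, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os

def require_api_key(env=os.environ) -> str:
    """Return OPENAI_API_KEY from `env`, or raise a descriptive error if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it or add it to your .env file."
        )
    return key
```

The result could then be passed as api_key when constructing the client, after load_dotenv() has run.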

Choose the right model for the job

gpt-4o-mini covers most chatbot use cases: it is fast, cheap (about $0.15 per 1M input tokens), and accurate enough for general Q&A. Use gpt-4o for tasks that need deeper reasoning. A minimal router:

def get_model(task_complexity: str) -> str:
    if task_complexity == "complex":
        return "gpt-4o"
    return "gpt-4o-mini"

Write a specific system prompt

Vague prompts produce vague results. Spell out scope, tone, and format:

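SYSTEM_PROMPT = """You are a technical support assistant for Linux server issues.
- Only answer questions about Linux, shell scripting, and server administration.
- Always wrap commands in code blocks.
- If you do not know the answer, say so honestly. Do not guess.
- Keep answers concise unless the user asks for more detail."""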

Log inputs, outputs, and token usage

When production goes down at 3 a.m., conversation logs and cost data will save you. Add logging directly to the chat() function:

import logging

logging.basicConfig(
    filename="chatbot.log",
    level=logging.INFO,
    format="%(asctime)s | %(message)s"
)

def chat(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})
    logging.info(f"User: {user_message}")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history
    )

    assistant_reply = response.choices[0].message.content
    usage = response.usage
    conversation_history.append({"role": "assistant", "content": assistant_reply})
    logging.info(f"Assistant: {assistant_reply}")
    logging.info(f"Token usage: prompt={usage.prompt_tokens}, completion={usage.completion_tokens}, total={usage.total_tokens}")
    return assistant_reply

Set a daily alert for when cumulative tokens exceed 500K. On gpt-4o-mini that is about $0.075 in input cost, a reasonable sanity check. End-of-month bill shock hurts far more than a noisy alert.
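
The arithmetic behind that threshold can be sketched in a few lines. The `UsageTracker` class below is a hypothetical illustration. The price constant reuses the approximate $0.15 per 1M input tokens figure quoted earlier and should be checked against current pricing before relying on it.

```python
INPUT_PRICE_PER_MILLION = 0.15  # USD per 1M input tokens (approximate figure; verify current pricing)
DAILY_TOKEN_ALERT = 500_000     # alert threshold from the text

class UsageTracker:
    """Hypothetical helper that accumulates token counts across API calls."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def input_cost(self) -> float:
        """Estimated input cost in USD, based on prompt tokens only."""
        return self.prompt_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION

    def over_threshold(self) -> bool:
        return self.total_tokens > DAILY_TOKEN_ALERT

tracker = UsageTracker()
tracker.add(prompt_tokens=500_000, completion_tokens=0)
print(f"${tracker.input_cost():.3f}")  # $0.075
```

In the chat() function, tracker.add(usage.prompt_tokens, usage.completion_tokens) could be called right after each response, with over_threshold() deciding when to raise the alert. Resetting the tracker once a day is left out of this sketch.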

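You now have a working multi-turn chatbot with streaming, retry logic, persistence, and cost controls. From here you can add a FastAPI layer to expose it as an API, embed it in a Telegram bot, or connect a vector database for RAG. The conversation loop stays the same.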