From Chaos to Automation: Building a Production Ticket Tagging System with LLMs and FastAPI

AI tutorial - IT technology blog

The Shift from Regex to LLMs

Scaling an IT department usually hits a snag when manual ticket triage becomes a full-time job. Six months ago, my team was buried under a mountain of support requests, spending hours every week just moving tickets to the right folders.

We experimented with several automation methods, and after running a live system for half a year, I’ve seen exactly where the pitfalls lie. Transitioning from manual sorting to an automated pipeline isn’t just a luxury; it’s how you reclaim 20% of your engineering time.

Evaluating the Three Main Approaches

Before we committed to Large Language Models (LLMs), we tested three distinct strategies. Understanding the failure points of the first two is essential for anyone building a modern support stack.

  • Rule-based (Regex and Keyword matching): Our first attempt was fast and essentially free, but it was incredibly brittle. A rule looking for “Internet” would catch “My internet is down,” but it would completely miss “I’m having connectivity issues with the remote gateway.” Maintenance quickly turned into a nightmare as we tried to manage over 200 conflicting edge cases.
  • Traditional Machine Learning (Scikit-learn/NLP): We built a Random Forest classifier using spaCy. While it was more robust than keywords, it required a clean dataset of at least 5,000 manually labeled historical tickets to break the 80% accuracy ceiling. Every time we launched a new internal tool, we had to re-label data and re-train the model from scratch.
  • LLM-based Classification: We eventually landed on models like GPT-4o-mini. This approach won because of its “Zero-shot” capability. It understands context, technical jargon, and even frustrated sarcasm without needing a single row of training data.
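The brittleness of the rule-based approach is easy to reproduce. Here is a minimal sketch using a hypothetical keyword rule like the one described above; the rule and tickets are illustrative, not our production rules:

```python
import re

# A hypothetical keyword rule from a first-generation triage system:
# it matches literal mentions of "internet" and nothing else.
NETWORK_RULE = re.compile(r"\binternet\b", re.IGNORECASE)

tickets = [
    "My internet is down",                                     # caught
    "I'm having connectivity issues with the remote gateway",  # missed
]

matches = [bool(NETWORK_RULE.search(t)) for t in tickets]
print(matches)  # prints [True, False] -- the paraphrase slips through
```

Every paraphrase the rule misses becomes another regex to write, which is how we ended up with 200+ conflicting edge cases.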

The Reality of LLMs in Production

Monitoring six months of logs and billing cycles gave us a clear picture of the trade-offs involved in this architecture.

The Wins

  • True Contextual Awareness: The model recognizes that “I can’t get into my laptop” and “Login is failing on the portal” both belong in the Authentication category.
  • Instant Multilingual Support: Our system processes tickets in English, Vietnamese, and Japanese natively. We didn’t have to write a single line of translation logic.
  • Reliable Structured Data: By pairing FastAPI with Pydantic, we force the LLM to return valid JSON. This allows us to pipe data directly into SQL databases or trigger Jira webhooks without parsing errors.

The Trade-offs

  • Latency: Traditional ML models respond in milliseconds. LLM APIs usually take 1.2 to 3.5 seconds. For background ticket triage, this delay is negligible, but it rules out “real-time” UI updates.
  • Predictable Costs: Using GPT-4o-mini is surprisingly affordable. We averaged roughly $15 USD per month to process 50,000 tickets.
  • Model Drift: Without a strict prompt, the model might occasionally “hallucinate” a new tag like “Urgent-ASAP” instead of using the standard “Urgent” tag.
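A back-of-envelope cost model makes the budget predictable before you commit. The token counts and per-million-token prices below are illustrative assumptions, not measured values; check your provider's current pricing page before relying on them:

```python
# Rough monthly cost estimate for LLM-based triage.
# All constants below are assumptions for illustration only.
TICKETS_PER_MONTH = 50_000
AVG_INPUT_TOKENS = 500     # ticket text + system prompt
AVG_OUTPUT_TOKENS = 120    # the structured JSON response

INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (assumed)

monthly_cost = TICKETS_PER_MONTH * (
    AVG_INPUT_TOKENS * INPUT_PRICE_PER_M
    + AVG_OUTPUT_TOKENS * OUTPUT_PRICE_PER_M
) / 1_000_000
print(f"${monthly_cost:.2f}")  # prints $7.35 under these assumptions
```

Real bills run higher than the naive estimate once you account for retries, longer tickets, and prompt overhead, which is how a model like this squares with our observed ~$15/month.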

The Recommended Tech Stack

For a resilient production system, I recommend this specific combination of tools:

  1. FastAPI: It is the gold standard for Python-based AI services. Its native async support is critical for handling multiple concurrent API calls to OpenAI or Anthropic.
  2. Pydantic: This handles the heavy lifting of data validation and schema enforcement.
  3. OpenAI SDK: A straightforward interface for model interaction.
  4. Python-dotenv: Essential for keeping your API keys out of your git history.

Building the Classifier

We’ll build a microservice that transforms a raw ticket into a categorized, prioritized, and summarized JSON object.

1. Environment Setup

Install the core dependencies. I recommend working inside a virtual environment and pinning the versions in a requirements.txt file.

pip install fastapi uvicorn openai pydantic python-dotenv

2. Designing the Data Schema

A strict schema prevents downstream system failures. We define exactly what we expect the LLM to produce.

from pydantic import BaseModel, Field
from typing import List

class TicketRequest(BaseModel):
    content: str

class TicketAnalysis(BaseModel):
    category: str = Field(description="Hardware, Software, Network, Access, Billing, or General")
    priority: str = Field(description="Low, Medium, High, or Urgent")
    tags: List[str] = Field(description="3-5 technical keywords")
    summary: str = Field(description="A concise one-sentence summary")
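Before wiring the schema into the API, it is worth seeing how Pydantic behaves with bad input. A standalone sanity check (the model is repeated here so the snippet runs on its own):

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class TicketAnalysis(BaseModel):
    category: str = Field(description="Hardware, Software, Network, Access, Billing, or General")
    priority: str = Field(description="Low, Medium, High, or Urgent")
    tags: List[str] = Field(description="3-5 technical keywords")
    summary: str = Field(description="A concise one-sentence summary")

# A well-formed payload validates cleanly.
good = TicketAnalysis(
    category="Network",
    priority="High",
    tags=["vpn", "gateway", "timeout"],
    summary="User cannot reach the remote gateway over VPN.",
)
print(good.category)  # prints Network

# A malformed payload (missing required fields) is rejected loudly,
# instead of silently corrupting your downstream pipeline.
try:
    TicketAnalysis(category="Network")
except ValidationError as exc:
    print(f"rejected with {len(exc.errors())} errors")
```

If you want Pydantic itself to reject off-taxonomy values, you could tighten `category` and `priority` to `typing.Literal` types instead of plain strings.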

3. The FastAPI Logic

The secret to success is the SYSTEM_PROMPT. By defining “Allowed Categories,” we prevent the model from making up its own taxonomy.

import os
import json
from fastapi import FastAPI
from openai import AsyncOpenAI
from dotenv import load_dotenv

load_dotenv()
# AsyncOpenAI keeps the event loop free while we wait on the API,
# so FastAPI can serve other requests concurrently. A synchronous
# client inside an async endpoint would block the whole loop.
client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
app = FastAPI()

SYSTEM_PROMPT = """
You are an IT Support Triage agent. Analyze the ticket and return a JSON object.
Allowed Categories: Hardware, Software, Network, Access, Billing, General.
Allowed Priorities: Low, Medium, High, Urgent.
"""

@app.post("/analyze-ticket", response_model=TicketAnalysis)
async def analyze_ticket(ticket: TicketRequest):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket.content},
        ],
        # Forces the model to emit valid JSON instead of prose
        response_format={"type": "json_object"},
    )

    # Pydantic validates the parsed payload against our schema
    return TicketAnalysis(**json.loads(response.choices[0].message.content))

Hard-Won Lessons from Production

Building the API is only half the battle. Operating it at scale taught us three specific lessons that you won’t find in the basic documentation.

Dealing with Multi-Issue Tickets

Users rarely report just one problem. A single ticket might say: “My monitor is flickering and I can’t log into the VPN.” We found that instructing the LLM to “Categorize based on the most critical blocker” is the most effective way to ensure the ticket reaches the right specialist.
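In practice this is a one-line addition to the system prompt. The exact wording below is illustrative; tune it against your own ticket data:

```python
# Triage prompt with the multi-issue rule appended at the end.
SYSTEM_PROMPT = """
You are an IT Support Triage agent. Analyze the ticket and return a JSON object.
Allowed Categories: Hardware, Software, Network, Access, Billing, General.
Allowed Priorities: Low, Medium, High, Urgent.
If a ticket reports multiple issues, categorize based on the most critical
blocker and mention the secondary issues in the summary.
"""
```

Asking the model to surface secondary issues in the summary means nothing is silently dropped even though the ticket gets a single category.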

Enforcing the JSON Format

In our first month, the system crashed several times because the LLM would prepend “Here is your analysis:” to the JSON. Always use the response_format={ "type": "json_object" } parameter. It is the single most important setting for production stability.
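If you are on a model or provider that does not support response_format, a defensive fallback helps. This is a best-effort sketch, not part of the OpenAI SDK:

```python
import json

def extract_json(raw: str) -> dict:
    """Best-effort recovery when a model wraps JSON in prose.
    With response_format enforced you should rarely need this."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])

# The failure mode we hit in month one: prose prepended to the payload.
raw = 'Here is your analysis: {"category": "Network", "priority": "High"}'
print(extract_json(raw)["category"])  # prints Network
```

Note this only strips a prose wrapper; if the JSON itself is malformed, json.loads still raises, which is exactly what you want before the payload reaches Pydantic.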

Prompt Versioning

Avoid hardcoding your prompts inside your Python files. As your support team refines their categories, you’ll need to update the prompt without redeploying your entire application. We eventually moved our prompts to external YAML files. This allows us to split a broad “Software” category into “SaaS” and “Internal Apps” in seconds, simply by updating a config file and hitting refresh.
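A minimal sketch of the loader, assuming a prompts.yaml file with a triage.system_prompt key; the file name and key layout are illustrative, not a fixed convention:

```python
# Load the triage prompt from an external YAML file so the taxonomy
# can change without a redeploy. Requires PyYAML (pip install pyyaml).
import yaml

def load_system_prompt(path: str = "prompts.yaml") -> str:
    with open(path) as f:
        return yaml.safe_load(f)["triage"]["system_prompt"]
```

Re-read the file per request, or cache it and expose a refresh endpoint, depending on how hot you need prompt changes to be.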

This architecture creates a system that is both intelligent and maintainable. You aren’t just adding an AI feature; you’re building a resilient data pipeline that evolves with your department’s needs.
