It was 2 AM when our chatbot started dropping responses mid-sentence. Users were seeing half-finished answers, the loading spinner was spinning forever, and Slack was blowing up. The root cause? We had rolled our own streaming implementation using raw fetch() and Server-Sent Events — and it was fragile in ways we never anticipated.
After rebuilding everything properly, I’ll say this: streaming is foundational. Get it wrong and users feel every dropped token.
The Problem with DIY Streaming
When you first integrate an LLM into a Next.js app, the naive path looks like this:
// The naive approach: don't do this in production
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json', // required for a JSON request body
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({ model: 'gpt-4o', messages, stream: true }),
});
const reader = res.body?.getReader();
// ... 80 more lines of brittle stream parsing
This works — until it doesn’t. You end up writing custom parsers for SSE chunks, handling backpressure, managing connection timeouts, and then doing it all over again when you swap OpenAI for Anthropic. I spent three days on this before the 2 AM incident made me throw it all out.
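To make "custom parsers for SSE chunks" concrete, here is a minimal sketch of the kind of code you end up owning. It is not our original parser; parseSSE and its return shape are hypothetical names for illustration. It assumes events are separated by a blank line and that only data: fields matter.

```typescript
// Hypothetical helper: incrementally parse SSE "data:" payloads from text chunks.
// Network chunks can end anywhere, so leftover text is carried into the next call.
interface SSEParseResult {
  events: string[]; // complete data payloads found in this call
  rest: string;     // trailing partial event to pass back in as `buffer`
}

function parseSSE(buffer: string, chunk: string): SSEParseResult {
  const text = buffer + chunk;
  // Events end with a blank line; the last piece may still be incomplete.
  const parts = text.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? '';
  const events: string[] = [];
  for (const part of parts) {
    const data = part
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, '')) // drop "data:" and one optional space
      .join('\n');
    if (data !== '') events.push(data);
  }
  return { events, rest };
}

// An event split across two chunks is only emitted once it is complete.
const firstPass = parseSSE('', 'data: {"a":1}\n\nda');
const secondPass = parseSSE(firstPass.rest, 'ta: {"b":2}\n\n');
```

Even this sketch skips things a real parser has to handle: bare carriage-return line endings, multi-byte characters split across byte chunks (decode with TextDecoder's stream option before parsing), and provider-specific sentinels or event types. That gap is where the remaining lines of brittle code come from.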
Approach Comparison: Three Ways to Handle AI Streaming
Before settling on a tool, I ran all three approaches under load. Here’s what stood out.
Option 1: Raw Fetch + Manual SSE Parsing
Maximum control, maximum pain. You handle every byte of the stream yourself. Good for learning, terrible for shipping.
Option 2: Provider SDKs (OpenAI SDK, Anthropic SDK)
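Each provider ships its own SDK with streaming helpers. The OpenAI SDK’s stream() method is genuinely well-designed:
import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

// Older SDK releases expose this helper as openai.beta.chat.completions.stream
const stream = await openai.chat.completions.stream({
  model: 'gpt-4o',
  messages,
});
for await (const chunk of stream) {
  // Each chunk carries a small delta of the assistant's reply
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}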
The catch: you write this once for OpenAI, then rewrite it entirely for Anthropic’s messages.stream(), then again for Google’s generateContentStream(). When your team decides to switch providers (and they will), you’re refactoring your entire data layer.
Option 3: Vercel AI SDK
This is what I wish I had started with. One API for every provider. Deep Next.js App Router integration. And crucially — it handles the React state management side too, which is exactly where DIY approaches tend to fall apart.
Pros and Cons: Honest Assessment
Vercel AI SDK
- Pro: One API for OpenAI, Anthropic, Google, Mistral, and more — swap providers with one config change
- Pro: useChat hook manages loading state, error handling, and streaming updates automatically
- Pro: Built-in support for tool calls, structured output, and multi-step reasoning
- Pro: Plugs directly into Next.js Route Handlers and Server Actions
- Con: Another abstraction layer — if the SDK has a bug, you’re at Vercel’s mercy
- Con: Some advanced provider-specific features require dropping down to the raw SDK
Provider SDKs Directly
- Pro: Access to every provider-specific feature the day it ships
- Pro: No abstraction overhead
- Con: Full reimplementation when switching providers
- Con: You build your own React state management layer
Raw Fetch
- Pro: Zero dependencies
- Con: You will have a 2 AM incident. I promise.
Recommended Setup
Here’s the exact stack I rebuilt with after the incident — it’s held up across three major provider updates without a single breaking change:
- Next.js 14+ with App Router
- Vercel AI SDK (ai package) for the core streaming layer
- Provider-specific adapter packages (@ai-sdk/openai, @ai-sdk/anthropic)
- TypeScript throughout: the SDK’s type safety catches provider API changes at compile time
Implementation Guide
Step 1: Install Dependencies
npm install ai @ai-sdk/openai @ai-sdk/anthropic
Google Gemini support is a separate package:
npm install @ai-sdk/google
One caveat: the code in this guide matches the AI SDK 4.x API (toDataStreamResponse(), maxTokens, useChat from ai/react). Later major versions renamed several of these, so if a plain install pulls something newer, pin the major version (for example ai@4 with the matching 1.x provider packages).
Step 2: Create the API Route Handler
Create app/api/chat/route.ts:
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages, provider = 'openai' } = await req.json();

  // Pick the model from the provider name sent by the client
  const model = provider === 'anthropic'
    ? anthropic('claude-sonnet-4-6')
    : openai('gpt-4o');

  const result = streamText({
    model,
    system: 'You are a helpful assistant. Be concise and direct.',
    messages,
  });

  return result.toDataStreamResponse();
}
That’s it for the backend. toDataStreamResponse() replaces the 80-line SSE parser we had before: it handles chunking, backpressure, and connection cleanup automatically. One note on the runtime flag: we set it to 'edge'. If the runtime or a proxy in front of it buffers the entire response before sending, the streaming effect is lost entirely.
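The handler above trusts whatever arrives in the request body. Below is a minimal sketch of checking it first, under some assumptions of mine: parseChatRequest, the size limits, and the rule that clients may only send user and assistant roles are all hypothetical choices, not part of the SDK.

```typescript
// Hypothetical request validation for the chat route. Limits are assumed values.
type Role = 'user' | 'assistant';
type Provider = 'openai' | 'anthropic';

interface ChatMessage {
  role: Role;
  content: string;
}

const MAX_MESSAGES = 50;        // assumption: cap on conversation length
const MAX_CONTENT_CHARS = 8000; // assumption: cap on a single message

function parseChatRequest(body: unknown): { messages: ChatMessage[]; provider: Provider } {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Body must be a JSON object');
  }
  const raw = body as { messages?: unknown; provider?: unknown };

  // Default to OpenAI when no provider is sent; reject anything unknown.
  const provider: Provider | null =
    raw.provider === undefined || raw.provider === 'openai'
      ? 'openai'
      : raw.provider === 'anthropic'
        ? 'anthropic'
        : null;
  if (provider === null) throw new Error('Unknown provider');

  if (!Array.isArray(raw.messages) || raw.messages.length === 0 || raw.messages.length > MAX_MESSAGES) {
    throw new Error('messages must be a non-empty array within the size limit');
  }

  const messages = raw.messages.map((m: unknown): ChatMessage => {
    if (typeof m !== 'object' || m === null) throw new Error('Invalid message');
    const { role, content } = m as { role?: unknown; content?: unknown };
    // Clients may not inject system messages; the system prompt stays server-side.
    if (role !== 'user' && role !== 'assistant') throw new Error('Invalid role');
    if (typeof content !== 'string' || content.length === 0 || content.length > MAX_CONTENT_CHARS) {
      throw new Error('Invalid content');
    }
    return { role: role as Role, content };
  });

  return { messages, provider };
}
```

In the route, this would replace the bare destructuring of await req.json(), with a 400 response returned when it throws. Rebuilding each message from only role and content also drops any extra fields the client sends.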
Step 3: Build the Chat UI
Create app/chat/page.tsx:
'use client';
import { useChat } from 'ai/react';
import { useState } from 'react';

export default function ChatPage() {
  const [provider, setProvider] = useState('openai');
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
    useChat({
      api: '/api/chat',
      body: { provider },
      onError: (err) => {
        console.error('Chat error:', err);
      },
    });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="mb-4 flex gap-2">
        <select
          value={provider}
          onChange={(e) => setProvider(e.target.value)}
          className="border rounded px-2 py-1"
        >
          <option value="openai">GPT-4o</option>
          <option value="anthropic">Claude Sonnet</option>
        </select>
        <span className="text-sm text-gray-500 self-center">Switch provider live</span>
      </div>
      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((m) => (
          <div
            key={m.id}
            className={`p-3 rounded-lg ${
              m.role === 'user' ? 'bg-blue-100 ml-8' : 'bg-gray-100 mr-8'
            }`}
          >
            <span className="font-semibold text-xs uppercase text-gray-500">
              {m.role}
            </span>
            <p className="mt-1 whitespace-pre-wrap">{m.content}</p>
          </div>
        ))}
        {isLoading && (
          <div className="text-gray-400 text-sm">Thinking...</div>
        )}
        {error && (
          <div className="text-red-500 text-sm">Error: {error.message}</div>
        )}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          className="flex-1 border rounded px-3 py-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading || !input.trim()}
          className="bg-blue-500 text-white px-4 py-2 rounded disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}
Step 4: Environment Variables
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Both provider packages detect these names automatically. Zero extra wiring.
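Automatic detection also means a missing key only surfaces when the first request fails. A small fail-fast check can catch that earlier. This is a sketch, and missingEnvVars is a hypothetical helper of mine, not something the SDK provides.

```typescript
// Hypothetical helper: report which required variables are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Example wiring (assumed to run once when the server starts):
// const missing = missingEnvVars(process.env, ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY']);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```

Taking the environment as a parameter instead of reading process.env inside the function keeps the check easy to test.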
Step 5: Production Deployment Checklist
A few things that will save you from the 2 AM call:
- Set export const runtime = 'edge' on the route handler, then confirm after deploy that tokens arrive incrementally
- Add rate limiting before the streamText call (the SDK has no built-in rate limiting)
- Validate and sanitize the messages input: never forward the raw request body to the model unchecked
- Set a maxTokens limit in streamText to prevent runaway costs
- Use Vercel’s environment variables UI, not .env files, for production secrets
const result = streamText({
  model,
  messages,
  maxTokens: 1000, // Hard cap: critical for cost control
  temperature: 0.7,
});
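For the rate-limiting item in the checklist, here is a minimal fixed-window sketch. FixedWindowLimiter is a hypothetical class of mine, and the limits in the example are assumed values. It keeps counts in memory, so each server instance counts separately; a shared store would be needed for a real multi-instance deployment.

```typescript
// Hypothetical in-memory limiter: at most `limit` requests per key per window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Assumed policy for illustration: 20 requests per minute per client key.
const chatLimiter = new FixedWindowLimiter(20, 60_000);
```

In the route handler you would call chatLimiter.allow(clientKey) before streamText and return a 429 response when it is false. How to derive clientKey (user ID, session, or IP address) depends on your auth setup.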
One Thing That Still Trips People Up
The symptom: streaming seems broken after deploy, and text arrives all at once after a long delay. Before blaming the SDK, check your runtime and whatever sits in front of it. On a standard Node.js runtime (not edge), the platform’s reverse proxy often buffers the full response. Fix it by switching to the edge runtime, or by adding X-Accel-Buffering: no to your response headers so nginx-style proxies stop buffering.
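If you go the header route, one way to add it without depending on any SDK option is to wrap the outgoing Response. This sketch uses only the standard Fetch API; withNoBuffering is a hypothetical helper name.

```typescript
// Hypothetical helper: copy a Response and add the X-Accel-Buffering header.
// The body stream is passed through untouched, so streaming is preserved.
function withNoBuffering(res: Response): Response {
  const headers = new Headers(res.headers);
  headers.set('X-Accel-Buffering', 'no');
  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}

// Example wiring in the route handler (assumed):
// return withNoBuffering(result.toDataStreamResponse());
```

The header only helps with proxies that honor it. If text still arrives in one burst, look for compression or another buffering layer between the function and the browser.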
The SDK itself holds up under real load. When streaming breaks in production, it’s almost always infrastructure: a buffering proxy, a missing header, the wrong runtime config. Start with edge and you sidestep most of those issues before they become 2 AM incidents.

