Vercel AI SDK: Build AI Streaming Chatbots in Next.js with Multi-Provider LLM Support

It was 2 AM when our chatbot started dropping responses mid-sentence. Users were seeing half-finished answers, the loading spinner was spinning forever, and Slack was blowing up. The root cause? We had rolled our own streaming implementation using raw fetch() and Server-Sent Events — and it was fragile in ways we never anticipated.

After rebuilding everything properly, I’ll say this: streaming is foundational. Get it wrong and users feel every dropped token.

The Problem with DIY Streaming

When you first integrate an LLM into a Next.js app, the naive path looks like this:

// The naive approach — don't do this in production
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json', // the API expects a JSON content type
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({ model: 'gpt-4o', messages, stream: true }),
});

const reader = res.body?.getReader();
// ... 80 more lines of brittle stream parsing

This works — until it doesn’t. You end up writing custom parsers for SSE chunks, handling backpressure, managing connection timeouts, and then doing it all over again when you swap OpenAI for Anthropic. I spent three days on this before the 2 AM incident made me throw it all out.
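To make "custom parsers for SSE chunks" concrete, here is a minimal sketch of the part that usually bites first: network chunks do not line up with event boundaries, so a parser has to carry partial lines from one chunk to the next. The helper name createSSEParser is hypothetical, and the sketch assumes OpenAI-style single-line data: payloads with a [DONE] sentinel. It skips multi-line data fields, reconnection, and byte-level decoding, all of which a real parser also needs.

```typescript
// Hypothetical helper, not part of any SDK: incremental parser for
// single-line "data:" events such as those in OpenAI-style streams.
function createSSEParser(onData: (payload: string) => void) {
  let buffer = ''; // holds a trailing partial line between chunks

  return function push(chunk: string): void {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is '' (chunk ended on a newline) or a partial line.
    buffer = lines.pop() ?? '';

    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, ''); // tolerate CRLF line endings
      if (!line.startsWith('data:')) continue; // skip blank lines, comments, other fields
      const payload = line.slice(5).trimStart();
      if (payload === '[DONE]') continue; // end-of-stream sentinel, not JSON
      onData(payload);
    }
  };
}

// Usage: one JSON event split across two network chunks is delivered once, whole.
const received: string[] = [];
const push = createSSEParser((payload) => received.push(payload));
push('data: {"delta":"Hel');
push('lo"}\n\ndata: [DONE]\n\n');
console.log(received); // a single complete payload string
```

Drop the buffer and this still passes a quick local test, because local responses tend to arrive in one chunk. It only fails once real network conditions split an event in half.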

Approach Comparison: Three Ways to Handle AI Streaming

Before settling on a tool, I ran all three approaches under load. Here’s what stood out.

Option 1: Raw Fetch + Manual SSE Parsing

Maximum control, maximum pain. You handle every byte of the stream yourself. Good for learning, terrible for shipping.

Option 2: Provider SDKs (OpenAI SDK, Anthropic SDK)

Each provider ships their own SDK with streaming helpers. The OpenAI SDK’s stream() method is genuinely well-designed:

import OpenAI from 'openai';

// The client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

const stream = await openai.chat.completions.stream({
  model: 'gpt-4o',
  messages,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

Catch: you write this once for OpenAI, then rewrite it entirely for Anthropic’s messages.stream(), then again for Google’s generateContentStream(). When your team decides to switch providers — and they will — you’re refactoring your entire data layer.

Option 3: Vercel AI SDK

This is what I wish I had started with. One API for every provider. Deep Next.js App Router integration. And crucially — it handles the React state management side too, which is exactly where DIY approaches tend to fall apart.

Pros and Cons: Honest Assessment

Vercel AI SDK

Pro: One API for OpenAI, Anthropic, Google, Mistral, and more — swap providers with one config change
Pro: useChat hook manages loading state, error handling, and streaming updates automatically
Pro: Built-in support for tool calls, structured output, and multi-step reasoning
Pro: Plugs directly into Next.js Route Handlers and Server Actions
Con: Another abstraction layer — if the SDK has a bug, you’re at Vercel’s mercy
Con: Some advanced provider-specific features require dropping down to the raw SDK

Provider SDKs Directly

Pro: Access to every provider-specific feature the day it ships
Pro: No abstraction overhead
Con: Full reimplementation when switching providers
Con: You build your own React state management layer

Raw Fetch

Pro: Zero dependencies
Con: You will have a 2 AM incident. I promise.

Recommended Setup

Here’s the exact stack I rebuilt with after the incident — it’s held up across three major provider updates without a single breaking change:

Next.js 14+ with App Router
Vercel AI SDK (ai package) for the core streaming layer
Provider-specific adapter packages (@ai-sdk/openai, @ai-sdk/anthropic)
TypeScript throughout — the SDK’s type safety catches provider API changes at compile time

Implementation Guide

Step 1: Install Dependencies

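npm install ai@4 @ai-sdk/openai@1 @ai-sdk/anthropic@1

The version pins matter. The code in this guide uses the AI SDK 4.x API (toDataStreamResponse, maxTokens, and the input helpers on useChat). AI SDK 5 renames or removes several of these, so an unpinned install may pull a major version this code does not compile against.

Google Gemini support is a separate package:

npm install @ai-sdk/google@1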

Step 2: Create the API Route Handler

Create app/api/chat/route.ts:

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages, provider = 'openai' } = await req.json();

  const model = provider === 'anthropic'
    ? anthropic('claude-sonnet-4-6')
    : openai('gpt-4o');

  const result = streamText({
    model,
    system: 'You are a helpful assistant. Be concise and direct.',
    messages,
  });

  return result.toDataStreamResponse();
}

That’s it for the backend. toDataStreamResponse() replaces the 80-line SSE parser we had before: it turns the model output into a streaming HTTP response and handles chunking, backpressure, and connection cleanup for you. One note on the runtime flag: 'edge' is what I run, but it is a choice, not a requirement. Vercel also streams from the default Node.js runtime, so if text arrives all at once, look for a buffering proxy or middleware before blaming the runtime.
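One gap in the handler above: it trusts whatever the client puts in the request body, including the provider string and the role of each message. Below is a minimal validation sketch under stated assumptions. parseChatRequest, the two limits, and the decision to reject client-supplied system messages are all illustrative choices of mine, not something the SDK provides.

```typescript
// Hypothetical validation helper; names and limits are illustrative assumptions.
type Provider = 'openai' | 'anthropic';
type ChatRole = 'user' | 'assistant';
type ChatMessage = { role: ChatRole; content: string };

const MAX_MESSAGES = 50;
const MAX_CONTENT_LENGTH = 4000;

const isProvider = (v: unknown): v is Provider => v === 'openai' || v === 'anthropic';
const isChatRole = (v: unknown): v is ChatRole => v === 'user' || v === 'assistant';

function parseChatRequest(body: unknown): { provider: Provider; messages: ChatMessage[] } {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Body must be a JSON object');
  }
  const raw = body as Record<string, unknown>;

  // Allowlist the provider instead of trusting the client-supplied string.
  const provider = raw.provider ?? 'openai';
  if (!isProvider(provider)) throw new Error('Unknown provider');

  const { messages } = raw;
  if (!Array.isArray(messages) || messages.length === 0 || messages.length > MAX_MESSAGES) {
    throw new Error('messages must be a non-empty array within the size limit');
  }

  const cleaned = messages.map((m: unknown): ChatMessage => {
    const { role, content } = (m ?? {}) as Record<string, unknown>;
    // Reject "system" from the client so users cannot override the server prompt.
    if (!isChatRole(role)) throw new Error('Invalid role');
    if (typeof content !== 'string' || content.length > MAX_CONTENT_LENGTH) {
      throw new Error('Invalid content');
    }
    return { role, content }; // drop any extra client-side fields
  });

  return { provider, messages: cleaned };
}
```

In the route you would call parseChatRequest(await req.json()) inside a try/catch and return a 400 response on failure, then pass the cleaned messages to streamText.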

Step 3: Build the Chat UI

Create app/chat/page.tsx:

'use client';

import { useChat } from 'ai/react';
import { useState } from 'react';

export default function ChatPage() {
  const [provider, setProvider] = useState('openai');

  const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
    useChat({
      api: '/api/chat',
      body: { provider },
      onError: (err) => {
        console.error('Chat error:', err);
      },
    });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="mb-4 flex gap-2">
        <select
          value={provider}
          onChange={(e) => setProvider(e.target.value)}
          className="border rounded px-2 py-1"
        >
          <option value="openai">GPT-4o</option>
          <option value="anthropic">Claude Sonnet</option>
        </select>
        <span className="text-sm text-gray-500 self-center">Switch provider live</span>
      </div>

      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((m) => (
          <div
            key={m.id}
            className={`p-3 rounded-lg ${
              m.role === 'user' ? 'bg-blue-100 ml-8' : 'bg-gray-100 mr-8'
            }`}
          >
            <span className="font-semibold text-xs uppercase text-gray-500">
              {m.role}
            </span>
            <p className="mt-1 whitespace-pre-wrap">{m.content}</p>
          </div>
        ))}
        {isLoading && (
          <div className="text-gray-400 text-sm">Thinking...</div>
        )}
        {error && (
          <div className="text-red-500 text-sm">Error: {error.message}</div>
        )}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          className="flex-1 border rounded px-3 py-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading || !input.trim()}
          className="bg-blue-500 text-white px-4 py-2 rounded disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}

Step 4: Environment Variables

# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Both provider packages detect these names automatically. Zero extra wiring.
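Automatic detection also means a missing key only surfaces when the first request fails. A small fail-fast check catches that earlier. This is a sketch: missingEnvVars is a hypothetical helper, and taking the environment as a parameter is just a choice that makes it easy to test.

```typescript
// Hypothetical startup check: returns the names of required variables
// that are unset or blank in the given environment object.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY'],
): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Usage at module load in the route file:
// const missing = missingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```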

Step 5: Production Deployment Checklist

A few things that will save you from the 2 AM call:

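Pick the route handler's runtime deliberately: export const runtime = 'edge' can reduce cold-start latency, but streaming on Vercel also works on the Node.js runtime, which you need if the handler uses Node-only APIs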
Add rate limiting before the streamText call (the SDK has no built-in rate limiting)
Validate and sanitize messages input — never pass raw user data directly to the model
Set a maxTokens limit in streamText to prevent runaway costs
Use Vercel’s environment variables UI, not .env files, for production secrets

const result = streamText({
  model,
  messages,
  maxTokens: 1000,  // Hard cap — critical for cost control
  temperature: 0.7,
});
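For the rate-limiting item in the checklist, here is a minimal fixed-window sketch. createRateLimiter is a hypothetical helper, and the limit, window, and use of the x-forwarded-for header as the key are assumptions. State lives in one process, so on serverless or edge platforms each instance keeps its own counters and the map is never pruned. Treat this as an illustration of the check, and use a shared store in production.

```typescript
// Hypothetical fixed-window rate limiter with in-memory, per-process state.
type Clock = () => number;

function createRateLimiter(limit: number, windowMs: number, now: Clock = Date.now) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allow(key: string): boolean {
    const t = now();
    const current = windows.get(key);
    // Start a fresh window on first use or once the previous one has expired.
    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (current.count >= limit) return false; // over the limit for this window
    current.count += 1;
    return true;
  };
}

// Usage in the route file (limit, window, and key are illustrative):
// const allowChat = createRateLimiter(20, 60_000); // module scope
// ...inside POST, before streamText:
// const ip = req.headers.get('x-forwarded-for') ?? 'unknown';
// if (!allowChat(ip)) return new Response('Too many requests', { status: 429 });
```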

One Thing That Still Trips People Up

Streaming seems broken after deploy: text arrives all at once after a long delay. Before blaming the SDK, check what sits between your route and the browser. Vercel streams from both the Edge and Node.js runtimes, but a reverse proxy in front of a self-hosted deployment (nginx is the usual suspect) or a compression middleware often buffers the full response. Fix it by disabling buffering at the proxy, or add X-Accel-Buffering: no to your response headers, which nginx honors.
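A sketch of the header fix, assuming your SDK version lets you pass ResponseInit-style options to toDataStreamResponse. noBufferingHeaders is a hypothetical helper, and the Cache-Control value is my own addition to discourage intermediaries from transforming the stream.

```typescript
// Hypothetical helper: headers commonly used to discourage proxies from
// buffering a streamed response. X-Accel-Buffering is honored by nginx;
// other proxies may ignore it.
function noBufferingHeaders(extra: Record<string, string> = {}): Record<string, string> {
  return {
    'Cache-Control': 'no-cache, no-transform',
    'X-Accel-Buffering': 'no',
    ...extra, // caller-supplied headers win on conflict
  };
}

// Usage in the route handler (assumes the options object accepts headers):
// return result.toDataStreamResponse({ headers: noBufferingHeaders() });
```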

The SDK itself holds up under real load. When streaming breaks in production, it’s almost always infrastructure: a buffering proxy, a missing header, the wrong runtime config. Check those first and you catch most of these issues before they become 2 AM incidents.