Streaming LLM Responses with the Vercel AI SDK + KairosRoute
The Vercel AI SDK has quietly become the default way to ship streaming LLM UIs in Next.js. streamText, generateObject, and the useChat hook handle the parts that used to be painful: token-by-token streaming over HTTP, React Server Component streaming, structured output with Zod schemas, and multi-turn tool calls.
What the AI SDK does not do is pick the right model for each request. By default you hardcode openai('gpt-5') or anthropic('claude-sonnet-4.5') and pay frontier prices on every single call — including the ones that would have been just fine on Haiku or Gemini Flash. This guide wires KairosRoute into the AI SDK so every call routes to the cheapest model that meets your quality bar. The React side does not change.
Why this migration is two lines
The AI SDK's @ai-sdk/openai package is built around a createOpenAI factory. You can pass any baseURL you want. KairosRoute speaks the OpenAI wire protocol, so the SDK treats us as "OpenAI with a different URL and key." Every feature — streaming, tool calls, JSON mode, structured output, vision — flows through unchanged.
The one thing we change is the model string. Pass "kr-auto" to get cost-aware routing, or pass any model ID from our 45-model registry ("claude-sonnet-4.5", "gpt-5", "gemini-2.5-pro", etc.) to pin a specific model. Zero markup on provider costs — we take a flat gateway fee against your plan's token allotment.
Step 1: Create the KairosRoute provider
```bash
npm install ai @ai-sdk/openai zod
```
Create a tiny module that builds a shared KairosRoute-backed OpenAI provider. Do this once; import it everywhere.
```ts
// src/lib/kr.ts
import { createOpenAI } from '@ai-sdk/openai';

export const kr = createOpenAI({
  baseURL: 'https://api.kairosroute.com/v1',
  apiKey: process.env.KAIROSROUTE_API_KEY!,
  // The SDK adds "openai" as the provider ID in traces.
  // Override if you want traces to say "kairosroute":
  name: 'kairosroute',
});
```
That is the whole provider setup. Now kr('kr-auto') returns a model object you can hand to any AI SDK function.
Step 2: Routed chat completions
Here is a standard Next.js route handler that streams a chat response to the browser. Note that nothing about this code is KairosRoute-specific beyond the import — the SDK does not know or care where the tokens come from.
```ts
// src/app/api/chat/route.ts
import { streamText } from 'ai';
import { kr } from '@/lib/kr';

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: kr('kr-auto'),
    messages,
    system: 'You are a concise, friendly assistant.',
  });

  return result.toDataStreamResponse();
}
```
Every incoming message is classified and routed server-side. A casual one-liner might land on Haiku. A dense technical question might land on Sonnet. A coding task might land on GPT-5. You pay provider cost on each, plus our thin gateway fee.
The React side stays identical
```tsx
// src/app/chat/page.tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat();

  return (
    <div className="mx-auto max-w-2xl p-6">
      {messages.map((m) => (
        <div key={m.id} className="mb-4">
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          disabled={isLoading}
          className="w-full rounded border px-3 py-2"
        />
      </form>
    </div>
  );
}
```
Token-by-token streaming works out of the box. KairosRoute streams Server-Sent Events from the upstream provider directly to your route handler, which pipes them to the browser. First-token latency is typically under 300ms on fast models and under 900ms on frontier ones — the router adds roughly 8–15ms to pick the model.
Step 3: Tool calling (multi-step)
The AI SDK makes multi-step tool-use ergonomic with tools and maxSteps. These work through KairosRoute without modification. The router notices tool schemas in the request and biases toward models that score well on tool use in our internal evals.
```ts
import { streamText, tool } from 'ai';
import { z } from 'zod';
import { kr } from '@/lib/kr';
import { db } from '@/lib/db'; // your app's data layer (not shown here)

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: kr('kr-auto'),
    messages,
    tools: {
      getWeather: tool({
        description: 'Get current weather for a city.',
        parameters: z.object({ city: z.string() }),
        execute: async ({ city }) => {
          const data = await fetch(`https://weather.example/${city}`).then(
            (r) => r.json(),
          );
          return { tempF: data.temp, conditions: data.conditions };
        },
      }),
      searchOrders: tool({
        description: 'Look up a customer order by ID.',
        parameters: z.object({ orderId: z.string() }),
        execute: async ({ orderId }) => db.orders.find(orderId),
      }),
    },
    maxSteps: 5,
  });

  return result.toDataStreamResponse();
}
```
The SDK handles the tool-call loop automatically. Each step's completion is routed independently — the first step might pick a tool-reliable model, the final synthesis step might pick something cheaper. You can inspect which model served each step in the response metadata.
Step 4: Structured output with generateObject
generateObject is the SDK's best-kept secret. You hand it a Zod schema, and it returns a validated, typed object. KairosRoute translates OpenAI's JSON-schema mode into each provider's native equivalent, so your Zod schema works regardless of which model the router picks.
```ts
import { generateObject } from 'ai';
import { z } from 'zod';
import { kr } from '@/lib/kr';

const TicketSchema = z.object({
  priority: z.enum(['low', 'medium', 'high', 'urgent']),
  category: z.enum(['billing', 'auth', 'bug', 'feature', 'other']),
  summary: z.string().max(140),
  suggestedReply: z.string(),
});

export async function classifyTicket(body: string) {
  const { object } = await generateObject({
    model: kr('kr-auto'),
    schema: TicketSchema,
    prompt: `Classify this customer message:\n\n${body}`,
  });
  return object; // fully typed: { priority, category, summary, suggestedReply }
}
```
Structured-output workloads are where routing pays off the hardest. Classification and extraction rarely need frontier reasoning, so kr-auto almost always picks a Haiku/Flash-class model — and you get the exact same validated Zod object.
Step 5: React Server Components streaming
The AI SDK's streamUI / createStreamableUI primitives let you stream React components from a server action straight into your client tree. This also works over KairosRoute.
```tsx
// src/app/actions.tsx
'use server';

import { streamUI } from 'ai/rsc';
import { z } from 'zod';
import { kr } from '@/lib/kr';
import { db } from '@/lib/db'; // your app's data layer (not shown here)
import { WeatherCard, OrderTable, Spinner } from '@/components/stream-ui';

export async function ask(question: string) {
  const result = await streamUI({
    model: kr('kr-auto'),
    prompt: question,
    text: ({ content }) => <div>{content}</div>,
    tools: {
      showWeather: {
        description: 'Render a weather card.',
        parameters: z.object({ city: z.string() }),
        generate: async function* ({ city }) {
          yield <Spinner />;
          const data = await fetch(`https://weather.example/${city}`).then(
            (r) => r.json(),
          );
          return <WeatherCard city={city} {...data} />;
        },
      },
      showOrders: {
        description: 'Render a table of recent orders.',
        parameters: z.object({ customerId: z.string() }),
        generate: async function* ({ customerId }) {
          yield <Spinner />;
          const orders = await db.orders.forCustomer(customerId);
          return <OrderTable orders={orders} />;
        },
      },
    },
  });

  return result.value;
}
```
The client calls ask('What orders did customer 42 place?'), and React streams a <Spinner /> then an <OrderTable /> into place. The LLM's decision about which component to render is routed through kr-auto — you get cost savings even on UI-generation workloads.
Before and after: a real AI SDK cost comparison
Illustrative numbers from a Next.js SaaS product using streamText for its in-app assistant (~120K calls/month):
- Before (pinned to GPT-5): ~800 tokens avg × 120K calls = 96M tokens at GPT-5's blended rate ≈ $1,900/month.
- After (kr-auto): router distribution across chat, classify, tool-use, and summarize workloads. Blended cost ≈ $380/month.
- Savings: 80%. p50 latency actually dropped — lighter models answered faster on most requests.
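The arithmetic behind those numbers, with the blended per-million-token rates as illustrative assumptions rather than quoted prices:

```ts
// Back-of-envelope for the comparison above. The blended $/1M-token
// rates are assumptions chosen to match the article's totals.
const callsPerMonth = 120_000;
const avgTokensPerCall = 800;
const totalTokens = callsPerMonth * avgTokensPerCall; // 96,000,000

const pinnedRatePerM = 19.8; // assumed GPT-5 blended $/1M tokens
const routedRatePerM = 3.96; // assumed kr-auto blended $/1M tokens

const pinnedCost = (totalTokens / 1e6) * pinnedRatePerM; // ≈ $1,900/month
const routedCost = (totalTokens / 1e6) * routedRatePerM; // ≈ $380/month
const savings = 1 - routedCost / pinnedCost; // ≈ 0.80

console.log(pinnedCost.toFixed(0), routedCost.toFixed(0), savings.toFixed(2));
```

The savings ratio is just the ratio of blended rates, which is why shifting the routing distribution toward cheaper models moves the bill linearly.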
A common follow-on change is adding generateObject-based ticket triage or email classification. Those classification jobs are where routing shines brightest; expect 90%+ savings on those specific endpoints versus pinning a frontier model.
Gotchas we have seen in the field
Edge runtime vs. Node runtime
KairosRoute works on both. runtime = 'edge' is great for latency, but your tool execute functions cannot use Node-only APIs (fs, net, most database drivers). If a tool needs the Node runtime, set runtime = 'nodejs' on that route — streaming still works.
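For example, a route whose tools hit a Postgres driver might opt out of the edge runtime with Next.js route segment config (the path and reason here are illustrative):

```ts
// src/app/api/orders/route.ts
// This route's tools use a Node-only DB driver, so run it on Node.
export const runtime = 'nodejs';
```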
Tracking the actual model picked
The SDK's onFinish callback surfaces the raw response metadata. Every KairosRoute response carries an x-kr-routed-model header naming the model that actually served the request — log it to your analytics pipe so you can slice cost and latency by actual-model-used, not just by kr-auto.
```ts
const result = streamText({
  model: kr('kr-auto'),
  messages,
  onFinish: ({ rawResponse, usage }) => {
    const routed = rawResponse?.headers?.['x-kr-routed-model'];
    analytics.track('llm_call', {
      routedModel: routed,
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
    });
  },
});
```
Environment variable on Vercel
Set KAIROSROUTE_API_KEY in the Vercel project settings, not just locally. For preview deployments, give each environment its own key so you can revoke preview keys without touching production.
Client-side vs. server-side
Never put your KairosRoute key in a client component — the AI SDK is designed so the key lives on the server, and the browser only talks to your route handlers. If you are doing something exotic, use per-user scoped keys from our POST /v1/keys endpoint and rotate them aggressively.
Request-scoped model override
If a specific route needs a pinned model (say, a legal-doc summarizer that must use Sonnet), pass the model ID directly instead of kr-auto. You can mix and match in the same app — most routes on kr-auto, a few critical ones pinned.
```ts
// Critical route: pin to Sonnet for deterministic behavior.
const result = streamText({
  model: kr('claude-sonnet-4.5'),
  messages,
});
```
Rollout checklist
- Add KAIROSROUTE_API_KEY to your environment. Create the shared kr provider in src/lib/kr.ts.
- Swap openai('gpt-5') to kr('kr-auto') in one route handler. Deploy to preview. Verify the stream still works and the dashboard shows traffic.
- Do the same for every other streamText, generateText, generateObject, and streamUI call. Keep pinned models only where you need them.
- Add onFinish logging of x-kr-routed-model so you can audit the routing distribution.
- Run your eval suite (you have one, right?). Compare quality against the pinned-GPT-5 baseline. Cut over production when the numbers look right.
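As a sketch of that audit step: once x-kr-routed-model values are flowing into analytics, a small helper (the event shape and names below are assumptions, not part of the SDK) can summarize where kr-auto is actually sending traffic:

```ts
type LlmCallEvent = {
  routedModel: string;
  promptTokens: number;
  completionTokens: number;
};

// Share of calls and share of total tokens per routed model.
function routingDistribution(events: LlmCallEvent[]) {
  const byModel = new Map<string, { calls: number; tokens: number }>();
  let totalCalls = 0;
  let totalTokens = 0;
  for (const e of events) {
    const tokens = e.promptTokens + e.completionTokens;
    const entry = byModel.get(e.routedModel) ?? { calls: 0, tokens: 0 };
    entry.calls += 1;
    entry.tokens += tokens;
    byModel.set(e.routedModel, entry);
    totalCalls += 1;
    totalTokens += tokens;
  }
  return [...byModel.entries()]
    .map(([model, { calls, tokens }]) => ({
      model,
      callShare: calls / totalCalls,
      tokenShare: tokens / totalTokens,
    }))
    .sort((a, b) => b.tokenShare - a.tokenShare); // biggest spender first
}

// Example with three logged events:
const dist = routingDistribution([
  { routedModel: 'claude-haiku', promptTokens: 100, completionTokens: 100 },
  { routedModel: 'claude-haiku', promptTokens: 150, completionTokens: 50 },
  { routedModel: 'gpt-5', promptTokens: 300, completionTokens: 300 },
]);
console.log(dist);
```

Token share, not call share, is what tracks your bill: one frontier-model call can outweigh many cheap ones.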
Related reading
If you are building long-running chains instead of per-request streams, see the LangChain cost routing guide. For multi-agent workloads, the CrewAI per-agent routing guide covers the same pattern at a higher level. The shortest possible migration is the OpenAI Migration Guide.
Try it in your Next.js app
The playground lets you paste an AI SDK prompt and watch the routed stream in real time. Full end-to-end recipes — including a copy-paste Next.js starter template — live at docs/migration.
Ready to route smarter?
KairosRoute gives you a single OpenAI-compatible endpoint that routes every request to the cheapest model meeting your quality bar — plus the observability, A/B testing, and cost analytics that turn cheaper infrastructure into a durable margin.