fragJulia
Dev

Streaming (Text Chat)

AI SDK v6 stack, streamText config, the stubbed server onFinish gap, chat history tiers, anon→auth handoff, provider migration (#358).

Status: Current as of 2026-04-18 (repo commit 06d84cb). Audience: Contributors touching chat, anyone preparing the Bedrock migration (#358), client engineers debugging broken streams.

Scope: text-chat streaming only. Voice streaming (WebRTC / LiveKit) lives in architecture §3.2 and container-system.


1. Stack

  • Vercel AI SDK v6streamText, convertToModelMessages, UIMessage types
  • Provider: @ai-sdk/openaiopenai/gpt-4o-mini (hardcoded in the route today)
  • Transport: SSE via result.toUIMessageStreamResponse()
  • Client: @ai-sdk/reactuseChat hook with DefaultChatTransport
  • Runtime: Next.js 16 App Router, server route with maxDuration = 30

Provider migration to AWS Bedrock Mistral Large (eu-central-1) tracked in epic #358.


2. Server flow — apps/web/app/api/chat/route.ts

export const maxDuration = 30;

export async function POST(req: Request) {
  // 1. Rate limit (20 msg / 60 s per IP, Upstash sliding window)
  const clientIP = getClientIP(req);
  const rateLimit = await checkRateLimit(clientIP, CHAT_RATE_LIMIT);
  if (!rateLimit.success) return 429 with X-RateLimit-* headers;

  // 2. Auth — Supabase JWT
  const { data: { user } } = await supabase.auth.getUser();
  if (!user) return 401;

  // 3. Optional: inject patient-confirmed documents
  let documentContext = "";
  if (includeDocuments) {
    const access = await checkFeatureAccess(supabase, user.id, "document_storage");
    if (access.allowed) {
      const masterMd = await getMasterMdTruncated(user.id, 8000);
      if (masterMd) documentContext = wrapPatientDocument(masterMd);
    }
  }

  // 4. Stream
  const result = streamText({
    model: "openai/gpt-4o-mini",
    system: JULIA_SYSTEM_PROMPT + documentContext,
    messages: await convertToModelMessages(messages),
    abortSignal: req.signal,
  });

  return result.toUIMessageStreamResponse({
    originalMessages: messages,
    onFinish: async ({ messages: allMessages, isAborted }) => {
      // ⚠️ Server-side persistence is STUBBED
      if (isAborted) return;
      // Could save chat history here if needed
    },
    consumeSseStream: consumeStream,
  });
}

Key properties

PropertyValue
maxDuration30 s (Vercel function timeout)
Max input tokensNot explicitly set — relies on provider defaults
Max output tokensNot set
TemperatureNot set (provider default)
toolsNone — pure text generation
System promptJULIA_SYSTEM_PROMPT from apps/web/lib/chat/julia-prompt.ts
Document injectionapps/web/lib/documents/master-md.tsgetMasterMdTruncated(userId, 8000)
Injection security wrapperapps/web/lib/documents/security-prompt.tswrapPatientDocument

The security-prompt wrapper is load-bearing: it tells the model the wrapped text is patient content and MUST NOT be treated as instructions. Do not remove or modify without reviewing the injection surface holistically.


3. The onFinish gap (known issue)

Server-side onFinish is stubbed — it only checks isAborted and returns. Chat message persistence happens client-side via useChat({ onFinish: persistNewMessages }).

Failure mode: If the user closes the tab during a stream, the partial (or complete) assistant message is lost — the server never persisted it.

Why it's like this: Historical. A proper server-side persist requires handling chunked writes (so partial streams are saved), deduplication (so client-side saves don't double-write), and tier-aware retention (chat_history_7d vs chat_history_full). Moving to Bedrock via #358 is a natural time to refactor.

Mitigation until fixed: The client saves on every stream chunk, not just onFinish. Lost messages are still possible but bounded to the last few chunks.


4. Chat history

  • Storage: Supabase tables chat_sessions and chat_messages
  • Session creation: Client-side on first message
  • Retention:
    • Free tier — last 7 days (feature chat_history_7d)
    • Plus / Premium — unlimited (feature chat_history_full)
  • Cleanup: Vercel cron /api/chat/cleanup at 03:00 UTC daily, guarded by CRON_SECRET (see configuration-system §5)
  • Related routes: chat/history (read), chat/export (PDF export, Plus gate)

5. Anonymous chat (/api/chat/anon)

Separate route with no user context. Uses:

MechanismValue
Session identifierCookie fj_anon_chat_sid (24 h TTL)
Per-session quota10 messages
Per-IP quota3 sessions / day
Upgrade CTATriggered after 5 messages

Gap: Anon messages are NOT migrated to the user's account on signup. Users who signed up after talking to Julia lose that chat history. Candidate fix: persist anon sessions keyed by cookie hash, map on first authenticated /api/chat call with same cookie. Deliberately deferred — low priority until post-BfArM.


6. Client consumption

React (web)

import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

const { messages, sendMessage, status, stop } = useChat({
  transport: new DefaultChatTransport({ api: "/api/chat" }),
  onFinish: ({ messages }) => persistNewMessages(messages),
  onError: (err) => showErrorBanner(err),
});

Render via message.parts to handle streaming text + any tool invocations (none used today, but the shape supports it).

Mobile (Expo)

The mobile app consumes chat via the shared API client in packages/shared/. Uses the same SSE protocol; @ai-sdk/react is not used — the RN side owns its own UI.


7. Error handling

HTTPMeaningClient behavior
401Not authenticatedRedirect to login
403Feature not accessible (e.g., document inject gated)Suppress inject, retry without includeDocuments
429Rate-limitedShow rate-limit banner with reset time from X-RateLimit-Reset
5xxProvider / server errorShow generic error banner; allow retry

No provider fallback today. If OpenAI is down, chat is down. The Bedrock migration (#358) should add a secondary fallback path.

No partial-stream recovery. If the SSE connection drops mid-stream, the client receives what it received; nothing resumes. Aligned with the onFinish gap in §3.


8. Provider migration path (#358)

Why and what, not when. The "when" is tracked in #358's 10 sub-tasks.

Swap shape

- model: "openai/gpt-4o-mini",
+ model: bedrock("mistral.mistral-large-2407-v1:0"),  // eu-central-1

Bedrock wrapper comes from @ai-sdk/amazon-bedrock (install alongside the swap). streamText itself is provider-agnostic — the call site shape does not change.

What moves with the provider

  • EU data residency (OpenAI US → Bedrock Frankfurt) — the main driver
  • Guardrails: Llama Guard 4 input classifier + Prompt Guard 2 + Bedrock Guardrails output + Turnstile bot defense — all new in #358, not just a model swap
  • Crisis-detection (German cancer-specific signals) — new infra tracked in #358
  • guardrail_events logging table — new Supabase migration, alignable with audit-system schema

What does NOT move

  • Client code (useChat, SSE transport)
  • Rate limits, auth middleware, wrapPatientDocument boundary
  • Session / history tables
  • chat/cleanup cron

9. Tuning knobs NOT set today (candidates for the migration)

  • maxOutputTokens — default is provider-dependent; cap explicitly to avoid runaway costs
  • temperature — fragJulia wants empathetic but grounded; 0.4–0.7 range, test per model
  • topP — leave at provider default unless reasoning changes
  • frequencyPenalty / presencePenalty — probably not needed for short patient responses
  • stopSequences — useful if you want hard cutoffs on multi-turn injections

Record the chosen values in-code (not env) so Git history shows the rationale.


#Relevance
#579Parent docs epic
#586Pillar B parent
#358Bedrock migration + guardrails (the big refactor this doc pre-documents)
architecture §3.1Higher-level data flow
api-documentation §7Short cross-reference to this doc
subscription-auth-systemchat_history_* tier mapping
audit-systemTarget schema for guardrail_events to align with
apps/web/app/api/chat/route.tsSource of truth for §2
apps/web/lib/chat/julia-prompt.tsSystem prompt
apps/web/lib/documents/master-md.tsDocument context assembly
apps/web/lib/documents/security-prompt.tsInjection wrapper

Changelog

  • 2026-04-18 — Initial version. Route shape verified against apps/web/app/api/chat/route.ts. Provider still openai/gpt-4o-mini; onFinish still stubbed; anon-migration still missing. All three are intentional until #358 lands, not oversights.

On this page