Streaming (Text Chat)
AI SDK v6 stack, streamText config, the stubbed server onFinish gap, chat history tiers, anon→auth handoff, provider migration (#358).
Status: Current as of 2026-04-18 (repo commit 06d84cb).
Audience: Contributors touching chat, anyone preparing the Bedrock migration (#358), client engineers debugging broken streams.
Scope: text-chat streaming only. Voice streaming (WebRTC / LiveKit) lives in architecture §3.2 and container-system.
1. Stack
- Vercel AI SDK v6 —
streamText,convertToModelMessages,UIMessagetypes - Provider:
@ai-sdk/openai—openai/gpt-4o-mini(hardcoded in the route today) - Transport: SSE via
result.toUIMessageStreamResponse() - Client:
@ai-sdk/react—useChathook withDefaultChatTransport - Runtime: Next.js 16 App Router, server route with
maxDuration = 30
Provider migration to AWS Bedrock Mistral Large (eu-central-1) tracked in epic #358.
2. Server flow — apps/web/app/api/chat/route.ts
export const maxDuration = 30;
export async function POST(req: Request) {
// 1. Rate limit (20 msg / 60 s per IP, Upstash sliding window)
const clientIP = getClientIP(req);
const rateLimit = await checkRateLimit(clientIP, CHAT_RATE_LIMIT);
if (!rateLimit.success) return 429 with X-RateLimit-* headers;
// 2. Auth — Supabase JWT
const { data: { user } } = await supabase.auth.getUser();
if (!user) return 401;
// 3. Optional: inject patient-confirmed documents
let documentContext = "";
if (includeDocuments) {
const access = await checkFeatureAccess(supabase, user.id, "document_storage");
if (access.allowed) {
const masterMd = await getMasterMdTruncated(user.id, 8000);
if (masterMd) documentContext = wrapPatientDocument(masterMd);
}
}
// 4. Stream
const result = streamText({
model: "openai/gpt-4o-mini",
system: JULIA_SYSTEM_PROMPT + documentContext,
messages: await convertToModelMessages(messages),
abortSignal: req.signal,
});
return result.toUIMessageStreamResponse({
originalMessages: messages,
onFinish: async ({ messages: allMessages, isAborted }) => {
// ⚠️ Server-side persistence is STUBBED
if (isAborted) return;
// Could save chat history here if needed
},
consumeSseStream: consumeStream,
});
}Key properties
| Property | Value |
|---|---|
maxDuration | 30 s (Vercel function timeout) |
| Max input tokens | Not explicitly set — relies on provider defaults |
| Max output tokens | Not set |
| Temperature | Not set (provider default) |
tools | None — pure text generation |
| System prompt | JULIA_SYSTEM_PROMPT from apps/web/lib/chat/julia-prompt.ts |
| Document injection | apps/web/lib/documents/master-md.ts — getMasterMdTruncated(userId, 8000) |
| Injection security wrapper | apps/web/lib/documents/security-prompt.ts — wrapPatientDocument |
The security-prompt wrapper is load-bearing: it tells the model the wrapped text is patient content and MUST NOT be treated as instructions. Do not remove or modify without reviewing the injection surface holistically.
3. The onFinish gap (known issue)
Server-side onFinish is stubbed — it only checks isAborted and returns. Chat message persistence happens client-side via useChat({ onFinish: persistNewMessages }).
Failure mode: If the user closes the tab during a stream, the partial (or complete) assistant message is lost — the server never persisted it.
Why it's like this: Historical. A proper server-side persist requires handling chunked writes (so partial streams are saved), deduplication (so client-side saves don't double-write), and tier-aware retention (chat_history_7d vs chat_history_full). Moving to Bedrock via #358 is a natural time to refactor.
Mitigation until fixed: The client saves on every stream chunk, not just onFinish. Lost messages are still possible but bounded to the last few chunks.
4. Chat history
- Storage: Supabase tables
chat_sessionsandchat_messages - Session creation: Client-side on first message
- Retention:
- Free tier — last 7 days (feature
chat_history_7d) - Plus / Premium — unlimited (feature
chat_history_full)
- Free tier — last 7 days (feature
- Cleanup: Vercel cron
/api/chat/cleanupat 03:00 UTC daily, guarded byCRON_SECRET(see configuration-system §5) - Related routes:
chat/history(read),chat/export(PDF export, Plus gate)
5. Anonymous chat (/api/chat/anon)
Separate route with no user context. Uses:
| Mechanism | Value |
|---|---|
| Session identifier | Cookie fj_anon_chat_sid (24 h TTL) |
| Per-session quota | 10 messages |
| Per-IP quota | 3 sessions / day |
| Upgrade CTA | Triggered after 5 messages |
Gap: Anon messages are NOT migrated to the user's account on signup. Users who signed up after talking to Julia lose that chat history. Candidate fix: persist anon sessions keyed by cookie hash, map on first authenticated /api/chat call with same cookie. Deliberately deferred — low priority until post-BfArM.
6. Client consumption
React (web)
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";
const { messages, sendMessage, status, stop } = useChat({
transport: new DefaultChatTransport({ api: "/api/chat" }),
onFinish: ({ messages }) => persistNewMessages(messages),
onError: (err) => showErrorBanner(err),
});Render via message.parts to handle streaming text + any tool invocations (none used today, but the shape supports it).
Mobile (Expo)
The mobile app consumes chat via the shared API client in packages/shared/. Uses the same SSE protocol; @ai-sdk/react is not used — the RN side owns its own UI.
7. Error handling
| HTTP | Meaning | Client behavior |
|---|---|---|
| 401 | Not authenticated | Redirect to login |
| 403 | Feature not accessible (e.g., document inject gated) | Suppress inject, retry without includeDocuments |
| 429 | Rate-limited | Show rate-limit banner with reset time from X-RateLimit-Reset |
| 5xx | Provider / server error | Show generic error banner; allow retry |
No provider fallback today. If OpenAI is down, chat is down. The Bedrock migration (#358) should add a secondary fallback path.
No partial-stream recovery. If the SSE connection drops mid-stream, the client receives what it received; nothing resumes. Aligned with the onFinish gap in §3.
8. Provider migration path (#358)
Why and what, not when. The "when" is tracked in #358's 10 sub-tasks.
Swap shape
- model: "openai/gpt-4o-mini",
+ model: bedrock("mistral.mistral-large-2407-v1:0"), // eu-central-1Bedrock wrapper comes from @ai-sdk/amazon-bedrock (install alongside the swap). streamText itself is provider-agnostic — the call site shape does not change.
What moves with the provider
- EU data residency (OpenAI US → Bedrock Frankfurt) — the main driver
- Guardrails: Llama Guard 4 input classifier + Prompt Guard 2 + Bedrock Guardrails output + Turnstile bot defense — all new in #358, not just a model swap
- Crisis-detection (German cancer-specific signals) — new infra tracked in #358
guardrail_eventslogging table — new Supabase migration, alignable with audit-system schema
What does NOT move
- Client code (
useChat, SSE transport) - Rate limits, auth middleware,
wrapPatientDocumentboundary - Session / history tables
chat/cleanupcron
9. Tuning knobs NOT set today (candidates for the migration)
maxOutputTokens— default is provider-dependent; cap explicitly to avoid runaway coststemperature— fragJulia wants empathetic but grounded; 0.4–0.7 range, test per modeltopP— leave at provider default unless reasoning changesfrequencyPenalty/presencePenalty— probably not needed for short patient responsesstopSequences— useful if you want hard cutoffs on multi-turn injections
Record the chosen values in-code (not env) so Git history shows the rationale.
10. Related
| # | Relevance |
|---|---|
| #579 | Parent docs epic |
| #586 | Pillar B parent |
| #358 | Bedrock migration + guardrails (the big refactor this doc pre-documents) |
| architecture §3.1 | Higher-level data flow |
| api-documentation §7 | Short cross-reference to this doc |
| subscription-auth-system | chat_history_* tier mapping |
| audit-system | Target schema for guardrail_events to align with |
apps/web/app/api/chat/route.ts | Source of truth for §2 |
apps/web/lib/chat/julia-prompt.ts | System prompt |
apps/web/lib/documents/master-md.ts | Document context assembly |
apps/web/lib/documents/security-prompt.ts | Injection wrapper |
Changelog
- 2026-04-18 — Initial version. Route shape verified against
apps/web/app/api/chat/route.ts. Provider stillopenai/gpt-4o-mini;onFinishstill stubbed; anon-migration still missing. All three are intentional until #358 lands, not oversights.
API Documentation
17 route domains, auth patterns, rate limiting, tier-gating matrix, mobile-app surface, streaming pointer, security headers. No per-endpoint table — see §9.
Container System (Voice Pipeline)
EC2 g6.xlarge voice stack — 5 services sharing one NVIDIA L4, host/bridge networking, GPU/VRAM allocation, healthchecks, startup order, ops runbook.