2026-04-24 — Voice stack bring-up + MEGA close-out tracker
Bring-up on the single-L4 GPU instance reached healthy state for all services; direct-endpoint TTS validated. 18 findings, 9 live-not-PR'd changes, and 2 open research questions consolidated into a single close-out tracker.
What changed
- Added apps/docs/content/docs/operations/handover-2026-04-24-voice-bringup.mdx: session-level handover for the 2026-04-24 voice-stack bring-up.
- Added apps/docs/content/docs/operations/voice-stack-architecture.mdx: three-service layout on a single NVIDIA L4 (24 GiB), VRAM accounting under vLLM-Omni and vLLM, runtime-download lifecycle gap, and the two architectural questions that seeded the research tickets below.
- Filed #672 (MEGA close-out) as a native sub-issue of R-0 (#660), consolidating: 18 findings with Tier-1 evidence citations; per-R-* comment payloads; a live-on-instance PR backlog of 9 divergences; session-memory + raw-transcript attachment slots.
- Filed #673 (RT-1: GPU VRAM architecture viability on L4) and #674 (RT-2: turn-detector / livekit-plugins HF-cache lifecycle) as native sub-issues of #672. Both are decision-only tickets: they produce writeups that feed subsequent R-*-scoped implementation PRs.
- Updated apps/docs/content/docs/operations/meta.json to list both new pages in sidebar order.
Why
The 2026-04-24 bring-up session stabilised the voice stack on the target GPU instance — all services healthy, POST /v1/audio/speech against vllm-voxtral returning a valid 24 kHz WAV — but left two kinds of debt behind:
- SSOT divergence. Seven voice/docker-compose.yml changes and two voice/agent/Dockerfile changes are live on the instance only; none have been mirrored back to main. These are the biggest single source of risk if the instance needs to be re-provisioned. Each divergence maps to an existing R-* child of #660 and will land via that child's PR.
- Open architectural questions. The stack runs at 22.4 / 23.0 GiB = 97% VRAM utilisation. It works at steady state but has almost no headroom (about 0.6 GiB). Separately, livekit-plugins-turn-detector downloads its model into a container-ephemeral HF cache and loses it on every --force-recreate. Both deserve researched decisions, not rushed fixes, hence RT-1 and RT-2.
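The utilisation figure quoted above is simple arithmetic on the two numbers in this entry. The sketch below reproduces it; the constant and function names are illustrative only and do not come from the repository.

```python
# Figures quoted in this entry (22.4 GiB used of 23.0 GiB usable).
# USED_GIB, USABLE_GIB and vram_summary are hypothetical names for illustration.
USED_GIB = 22.4
USABLE_GIB = 23.0


def vram_summary(used_gib: float, usable_gib: float) -> tuple[float, float]:
    """Return (utilisation as a percentage, headroom in GiB)."""
    if usable_gib <= 0:
        raise ValueError("usable_gib must be positive")
    utilisation_pct = 100.0 * used_gib / usable_gib
    headroom_gib = usable_gib - used_gib
    return utilisation_pct, headroom_gib


utilisation_pct, headroom_gib = vram_summary(USED_GIB, USABLE_GIB)
print(f"utilisation: {utilisation_pct:.1f}%  headroom: {headroom_gib:.2f} GiB")
# prints: utilisation: 97.4%  headroom: 0.60 GiB
```

Rounded to a whole percent this gives the 97% quoted above.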
Consolidating this into a single MEGA close-out tracker (#672) with two research sub-issues gives the next session one entry point and a clear split between "open PR now" vs "decide first, then PR."
Scope of this entry
This is documentation + tracker scaffolding. No voice/** code change lands with this entry. The nine live-on-instance divergences remain to be PR'd under their individual R-* children (#663, #664, #666, #667, #668, #669, with one healthcheck fix in #668 also touching #528).
R-10 (#670) bring-up verification — the end-to-end test through voice-agent — has NOT been run; it remains the closing gate for R-0 (#660).
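For contrast with the pending end-to-end run, the direct-endpoint check that did pass (POST /v1/audio/speech returning a valid 24 kHz WAV) could be approximated along the following lines. This is a minimal sketch, not the procedure used in the session. The function names are hypothetical. The HTTP request itself is left out because the request payload is not recorded in this entry. Python's wave module also reads PCM WAV only, so a non-PCM response would be reported as invalid here.

```python
import io
import wave


def wav_sample_rate(data: bytes) -> int:
    """Parse bytes as a RIFF/WAVE file and return the sample rate in Hz.

    Raises wave.Error or EOFError if the bytes are not a readable PCM WAV.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def is_valid_tts_wav(data: bytes, expected_rate: int = 24_000) -> bool:
    """True if data parses as a WAV file with the expected sample rate."""
    try:
        return wav_sample_rate(data) == expected_rate
    except (wave.Error, EOFError):
        return False


def make_silence(rate: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV in memory (stand-in for a response body)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(is_valid_tts_wav(make_silence(24_000)))  # 24 kHz stand-in
    print(is_valid_tts_wav(make_silence(16_000)))  # wrong sample rate
    print(is_valid_tts_wav(b"not a wav"))          # not a RIFF file
```

In real use the bytes passed to is_valid_tts_wav would be the body of the endpoint's response rather than the synthetic silence generated here.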
Follow-ups
- Open the R-5 (#666) guard-tuning PR first (smallest, most self-contained).
- Open the R-4 (#664) + R-6 (#667) atomic PR (voxtral image + stage-overlay + Dockerfile fixes).
- Close R-9 (#669) as whisper-already-not-baked (doc-only); the turn-detector sibling issue stays on RT-2.
- Close R-8 (#668) with a one-line Caddyfile fix (/healthz route) + healthcheck port 8080 → 8081.
- Close R-3 (#663) after the trivial .env line-5 text fix.
- Complete R-12 (#671) docs migration: this entry starts it; migration of the legacy voice/DEPLOY-AWS.md and voice/CREDENTIALS-CHECKLIST.md remains.
- Attach session memory files and the raw session transcript to #672 via GitHub web UI (API cannot attach files).
- Run R-10 (#670) only after RT-1 has produced its decision, since that decision may alter the stack being verified.
2026-04-25 — Changelog hygiene: backfill pr: and rebuild meta.json pages
Backfilled the pr: frontmatter on three existing changelog entries to reference their merging PRs, and rewrote apps/docs/content/docs/changelog/meta.json to list every entry in chronological order so the sidebar reflects the directory.
2026-04-22 — Ingest voice deploy handoffs into operations/
Triaged the two voice-deploy session handoffs out of OneDrive into apps/docs/content/docs/operations/ before