Handover 2026-04-22 — fragJulia voice deploy (v2, post-correction)

Post-correction session handoff anchored to the fragJulia Voice Infra Spec v2 self-hosted PDF. Corrects v1 on Voxtral variant (4B-TTS-2603) and runtime (vllm-omni). Secrets redacted during ingestion.

Provenance

Source: ~/OneDrive/Dokumente/Claude/Projects/fragJUlia/HANDOFF-2026-04-22-fragjulia-voice-v2.md (14768 bytes, mtime 2026-04-22 14:36 local)

Ingested: 2026-04-22 as R-0.5 prerequisite for the Voice Deploy Repair epic, before #644 (SSOT-4) OneDrive teardown.

Status: Current. Supersedes v1 on Voxtral weights & runtime sections.

Redactions applied during ingestion: the original contained a live HF_TOKEN value (twice) and a reference to the PR that introduced it. Both have been redacted per the SSOT discipline: token value → hf_<redacted>; PR reference → generic "a recent merged PR". Token rotation is tracked separately in #654.

Body is fenced as raw markdown so MDX does not reinterpret <secret>, { ... }, or nested-backtick patterns from the source.

# fragJulia Voice — Handoff 2026-04-22 (v2, post-correction)

Epic #513 / Sub-epic #507 · Self-hosted AWS, eu-central-1
Supersedes `HANDOFF-2026-04-22-fragjulia-voice.md` on the Voxtral weights & runtime section.
Anchored to spec document: `fragJulia-Voice-Infra-Spec-v2-Self-Hosted.pdf` (user-uploaded, 18 pages).

---

## Executive Summary

- Success #1 — `curl https://livekit.fragjulia.de/` → HTTP/2 200, Let's Encrypt leaf. **GREEN.** No change vs v1.
- Success #2 — `POST /api/voice/test-token` → LiveKit JWT. **WIRED, not end-to-end probed from this session** (sandbox egress blocks fragjulia.de). Credentials parity verified across Vercel env, EC2 `voice/.env`, and `voice/config/livekit.yaml`.
- Success #3 — `/knotencheck → Start → Julia greets`. **RED, multiple new blockers surfaced.**
- **Voxtral correction (new this session):** the TTS-capable Voxtral variant is `mistralai/Voxtral-4B-TTS-2603` per Spec v2 ADR-3 + env-var table + Day-2 step 7. This session initially pulled `Voxtral-Mini-3B-2507` (ASR variant, ~18 GB) into `/models/voxtral-4b-tts/` on EC2. Those weights are wrong for the TTS service and must be replaced before `vllm-voxtral` can start.
- **Runtime correction:** live `voice/docker-compose.yml` uses `vllm/vllm-openai:latest`. Spec v2 mandates **vLLM-Omni ≥0.18.0** (github.com/vllm-project/vllm-omni) because the audio-out / streaming TTS path is not in upstream vLLM. This is a live-vs-spec divergence that has to be reconciled in compose before Success #3.
- **License posture:** Voxtral weights are CC BY-NC 4.0. Spec's Risk Register requires a commercial self-hosting license from Mistral AI sales for production `/knotencheck`. Per prior handover the request is "in flight"; status has not advanced this session.

---

## What actually changed this session (deltas vs v1)

| Item | v1 state | v2 state | Driver |
|---|---|---|---|
| Voxtral weights | empty `/opt/models/hf-cache` | `/models/voxtral-4b-tts/` populated with **wrong** repo (`Voxtral-Mini-3B-2507`, ~18 GB) | HF token landed via a recent merged PR → download executed against misidentified repo |
| faster-whisper-large-v3 | not downloaded | `/models/faster-whisper-large-v3/` populated, 2.9 GB | HF token landed |
| Llama Guard 3 1B | not downloaded, gated | license request filed; user reports Meta approved but HF has not synced | Access request landed this session |
| HF token | not on host | written to `~/fragjulia/voice/.env` via a recent merged PR (value rotated; tracked in #654) | — |
| Spec anchoring | implicit / tribal | **explicit:** Spec v2 PDF pins model IDs, runtime version, env-var names, voice-ID | User uploaded spec during pushback ("not true") |
| Compose runtime | assumed upstream vLLM is fine for Voxtral | confirmed divergence: upstream vLLM ≠ vLLM-Omni; only vLLM-Omni serves audio-out | Spec v2 ADR-3 + vllm-omni upstream repo |

---

## Verified state on EC2 (i-0aeb3778c5078baa1, 3.64.25.163)

| Layer | State | Evidence |
|---|---|---|
| Host (Ubuntu 24.04 DLAMI, NVIDIA L4 23 GB) | Up | `nvidia-smi` returns device info inside `nvidia/cuda:12.4.1` container |
| Docker + NVIDIA Container Toolkit | Configured | `docker run --gpus all` succeeded |
| `/models/faster-whisper-large-v3/` | Done (2.9 GB) | `huggingface-cli` snapshot complete |
| `/models/voxtral-4b-tts/` | **WRONG repo** — contains `Voxtral-Mini-3B-2507` | `consolidated.safetensors` 9.3 GB + HF shards 9.3 GB + `tekken.json` |
| `/models/llama-guard-3-1b/` | LICENSE only; repo access gated | `README.md`, `LICENSE.txt`, empty `original/` — awaiting HF sync of Meta approval |
| Repo `~/fragjulia` | Present | SCPed earlier; last-written: `voice/.env` with `HF_TOKEN` |
| `voice/.env` (mode 600) | Present | Contains `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, `HF_TOKEN`; `MISTRAL_API_KEY` / `DEEPGRAM_API_KEY` removed |
| `voice/config/livekit.yaml` | Rewritten in v1 | port 7880, rtc 50000–60000/UDP, tcp 7881, TURN udp 443, region eu-central-1, single keys entry |
| `voice/config/Caddyfile` | Rewritten in v1 | global `protocols h1 h2`, reverse_proxy `localhost:7880`, CORS `https://fragjulia.de` |
| `voice/docker-compose.yml` | **Divergent from Spec v2** | Voxtral service points at `vllm/vllm-openai:latest`; spec requires `vllm-omni ≥0.18.0` |
| `voice-livekit-server-1` | Up, healthy | listening 7880/7881/443UDP/50000-60000UDP |
| `voice-caddy-1` | Up (badge unhealthy is cosmetic) | LE cert issued, external `/healthz` 200 |
| `voice-vllm-guard-1` | Down | weights gated + compose uses upstream image |
| `voice-vllm-voxtral-1` | Down | wrong weights + compose uses upstream image (needs vLLM-Omni) |
| `voice-voice-agent-1` | Down | depends on above |

---

## Voxtral variant reference (so this doesn't get confused again)

| HF repo | Role | Size (approx) | Status on `gott404` |
|---|---|---|---|
| `mistralai/Voxtral-Mini-3B-2507` | **ASR** (speech→text, multilingual) | ~18 GB on disk (dual-format) | Access granted; DOWNLOADED (but not needed — duplicates faster-whisper's role and is wrong for the TTS slot) |
| `mistralai/Voxtral-Small-24B-2507` | ASR / instruct (larger) | n/a — out of scope for this deploy | Not requested |
| `mistralai/Voxtral-4B-TTS-2603` | **TTS** (text→speech, voice cloning from 3-sec reference) | ~8 GB BF16 | **Access status unknown on `gott404` — must be probed separately** |

Spec v2 pins: `VOXTRAL_MODEL=mistralai/Voxtral-4B-TTS-2603`, `VOXTRAL_ENDPOINT=http://localhost:8001`, `VOXTRAL_VOICE_ID=julia_knotencheck`.
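
To keep these pins from drifting again, a small check along the following lines could be run against the environment the voice agent sees. This is a minimal sketch: the three names and values are the Spec v2 pins quoted above, while the helper names `pin_mismatches` and `check_process_env` are hypothetical and not part of the repo.

```python
import os

# Values pinned by Spec v2, as quoted in this handover.
SPEC_PINS = {
    "VOXTRAL_MODEL": "mistralai/Voxtral-4B-TTS-2603",
    "VOXTRAL_ENDPOINT": "http://localhost:8001",
    "VOXTRAL_VOICE_ID": "julia_knotencheck",
}


def pin_mismatches(env):
    """Return {name: (expected, actual)} for every pin that env does not match.

    A missing variable is reported with actual=None.
    """
    return {
        name: (expected, env.get(name))
        for name, expected in SPEC_PINS.items()
        if env.get(name) != expected
    }


def check_process_env():
    """Convenience wrapper (hypothetical): check the current process environment."""
    return pin_mismatches(os.environ)
```

An empty result means the environment agrees with the spec; anything else names the variable, the pinned value, and what was found instead.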

---

## Live-vs-Spec divergences to reconcile

1. **Wrong Voxtral weights on disk.**
   - Action: verify HF gate on `gott404` for `mistralai/Voxtral-4B-TTS-2603` (separate acceptance from Mini-3B-2507).
   - Then: `rm -rf /models/voxtral-4b-tts/*` and re-pull via `huggingface_hub.snapshot_download(repo_id="mistralai/Voxtral-4B-TTS-2603", local_dir="/models/voxtral-4b-tts", local_dir_use_symlinks=False)`.
   - Check EBS free space before the pull. The new pull is ~8 GB (BF16) and the old tree is ~18 GB, so capacity should not be a problem once the old tree is removed.

2. **Wrong runtime image for Voxtral.**
   - Live: `vllm/vllm-openai:latest`.
   - Spec: `vllm-omni ≥0.18.0`.
   - vllm-omni is at `github.com/vllm-project/vllm-omni`. Build locally from that repo or consume a published image if available; pin the ≥0.18.0 tag.
   - Audio-out support in upstream vLLM is still work in progress; the spec targets the GA path for Voxtral TTS, which today is vLLM-Omni.
   - Llama Guard is a text-only guard, so it can stay on upstream vLLM or be folded into the same vllm-omni runtime. Compose currently bundles both services under `vllm/vllm-openai:latest`; the simplest fix is to split them: `vllm-guard` on upstream vLLM, `vllm-voxtral` on vllm-omni.

3. **Commercial license on Voxtral weights (CC BY-NC 4.0).**
   - Dev / internal verification is fine.
   - Putting prod `/knotencheck` traffic on those weights without the Mistral sales license is a compliance break; Spec v2 Risk Register flags this.
   - Prior handover status: request "in flight". No delta this session.

4. **Llama Guard 3 1B access on HF.**
   - User reports Meta approved access but HF has not yet synced the gate.
   - No workaround — wait on HF sync, then `snapshot_download`.
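
For divergence 1, the free-space check before the re-pull could be sketched as below. It uses only the standard library; the ~8 GB figure and the `/models` path come from this handover, while the function name `has_room` and the 2 GB safety margin are assumptions chosen for illustration.

```python
import shutil

GB = 10**9  # decimal gigabytes, matching the approximate sizes quoted above


def has_room(path, needed_gb, margin_gb=2.0):
    """Return True if the filesystem holding `path` has needed_gb + margin_gb free.

    margin_gb is an assumed safety buffer, not a value from the spec.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= (needed_gb + margin_gb) * GB
```

On the host this would be called as `has_room("/models", 8)` after the old tree has been removed, and the pull skipped if it returns `False`.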

---

## External verification (Success #1 — still GREEN)

```
curl -I https://livekit.fragjulia.de/
HTTP/2 200
server: Caddy

curl https://livekit.fragjulia.de/healthz
OK
```

TLS: leaf CN=`livekit.fragjulia.de`, issuer Let's Encrypt E8, `verify return: 0`.

---

## Vercel ↔ LiveKit credential parity (Success #2 — WIRED)

Unchanged from v1. Three credentials identical across:
- Vercel project `fragjulia-web` (`prj_A7vJr0mJg0yUgTEsnxlv4qdtgQDv`), all environments, redeploy `7n48RuvBU`.
- EC2 `~/fragjulia/voice/.env` (mode 600).
- EC2 `~/fragjulia/voice/config/livekit.yaml` (`keys:` map).

Closure options for Success #2 (pick one on next session):
1. Browser path: open `/knotencheck`, click Start, observe `/api/voice/test-token` → 200 with JWT in DevTools Network.
2. EC2-side probe (independent of the actual secret value):
   ```
   curl -sS -o /dev/null -w '%{http_code}\n' \
     -X POST -H 'Content-Type: application/json' \
     -d '{"secret":"wrong"}' \
     https://fragjulia.de/api/voice/test-token
   ```
   - `401` → wiring correct, only real `KNOTENCHECK_TEST_SECRET` needed for a JWT.
   - `403` → regression: secret env var lost from Vercel.
   - `503` → regression: LiveKit credentials lost from Vercel.
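
The same probe and its interpretation can be expressed in Python, which may be convenient where `curl` is not available. This is a sketch under the assumptions stated in the list above: the status-to-verdict mapping is copied from this handover, and the function names `interpret` and `probe` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Mapping taken from the probe description in this handover.
VERDICTS = {
    401: "wiring correct; only the real KNOTENCHECK_TEST_SECRET is needed for a JWT",
    403: "regression: secret env var lost from Vercel",
    503: "regression: LiveKit credentials lost from Vercel",
}


def interpret(status):
    """Translate an HTTP status from the wrong-secret probe into a verdict."""
    return VERDICTS.get(status, f"unexpected status {status}; not covered by this handover")


def probe(url="https://fragjulia.de/api/voice/test-token"):
    """POST a deliberately wrong secret and return the HTTP status code."""
    body = json.dumps({"secret": "wrong"}).encode()
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want here.
        return err.code
```

`interpret(probe())` then yields one line suitable for the next handover. The probe never sends the real secret, so it is safe to run from any host with egress to fragjulia.de.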

---

## Fixes from v1 that still apply

1. **UDP 443 collision** — Caddy global block `protocols h1 h2` disables HTTP/3, avoids UDP 443 clash with LiveKit TURN. Do not re-enable HTTP/3 unless TURN is moved off 443.
2. **502 to LiveKit upstream** — remove any `transport http { versions h2c 1.1 }` on the reverse_proxy; Caddy must speak default HTTP/1.1 to `localhost:7880`.
3. **Do not regex-patch `livekit.yaml`** — edit in place; previous regex patch had eaten the keys-map comment and risked corruption. Authoritative file is the rewrite on EC2.
4. **Caddy "unhealthy" badge is cosmetic** — internal probe hits `https://localhost:443/healthz` and fails on SNI; external is 200.
5. **PEM/SSH plumbing on Windows** — use Git-for-Windows `ssh.exe` / `scp.exe` via `.bat` launchers (`C:\Users\dapar\AppData\Local\Temp\sshtry.bat`, `scptry.bat`). Windows-MCP PowerShell has a 60 s hard timeout, so launch long commands with `Start-Process`, redirect output to files with `-RedirectStandardOutput`, and poll those files. Always convert shell scripts from CRLF to LF before SCP, and avoid a UTF-8 BOM in front of the shebang line.
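
The line-ending and BOM hygiene from fix 5 can be sketched as a small normalizer to run over shell scripts before SCP. It uses only the standard library; the function names `normalize_script` and `normalize_file` are hypothetical.

```python
import codecs
from pathlib import Path


def normalize_script(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\n", b"\n")


def normalize_file(path):
    """Rewrite `path` in place; return True if the content changed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = normalize_script(original)
    if cleaned != original:
        p.write_bytes(cleaned)
    return cleaned != original
```

Working on bytes rather than text avoids any re-encoding on the Windows side, which is where the BOM tends to be introduced in the first place.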

---

## Open blockers

### Blocker #1 — Voxtral TTS weights (correct repo)
- Spec-required: `mistralai/Voxtral-4B-TTS-2603` (~8 GB BF16).
- Disk currently holds wrong repo (`Voxtral-Mini-3B-2507`, ~18 GB).
- Gate acceptance on `gott404` for the `-4B-TTS-2603` repo must be verified before re-pull.

### Blocker #2 — vLLM runtime for TTS
- Spec-required: `vllm-omni ≥0.18.0`.
- Live compose: `vllm/vllm-openai:latest` (upstream vLLM, no audio-out / streaming TTS path for Voxtral).
- Compose change + image pull / build required before `vllm-voxtral` service can serve TTS.

### Blocker #3 — Llama Guard 3 1B HF gate
- User reports Meta approved; HF has not synced.
- No action besides polling HF / re-attempting `snapshot_download` periodically.
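
The periodic re-attempt can be sketched as a bounded polling loop. The code below is generic and standard-library only; `poll_until` is a hypothetical helper, and the 10-minute default delay is an assumption. In practice `attempt` would wrap the `snapshot_download` call for `meta-llama/Llama-Guard-3-1B` and return `False` when the gate still rejects the request.

```python
import time


def poll_until(attempt, max_tries=10, delay_s=600.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_tries is exhausted.

    Returns the 1-based number of the try that succeeded, or None.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for n in range(1, max_tries + 1):
        if attempt():
            return n
        if n < max_tries:
            sleep(delay_s)
    return None
```

Bounding the number of tries keeps a forgotten poller from hammering HF indefinitely if the sync never arrives.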

### Blocker #4 — Voxtral commercial license (CC BY-NC 4.0)
- Prod `/knotencheck` path is commercial surface.
- Mistral sales license request "in flight" per prior handover.
- Dev bring-up can proceed; commercial go-live is gated.

### Blocker #5 — `KNOTENCHECK_TEST_SECRET` value
- Set on Vercel 2026-04-15 (redeploy `7n48RuvBU`); value itself not in any handover doc.
- Use Vercel UI to read it for an authenticated probe, or use the browser path via `NEXT_PUBLIC_KNOTENCHECK_TEST_SECRET` for `/knotencheck` Start.

---

## Outstanding work

| ID | State | Note |
|---|---|---|
| #16 Provision /models | partial | whisper done; voxtral wrong repo (redo); guard awaiting HF sync |
| #17 Process Nora Tschirner audio for voice clone | pending | File: `Nora Tschirner_ Lebt Vielfalt _ #VOXStimme.mp4`. Spec target: 3-sec reference clip, register as `julia_knotencheck`. Workflow: extract clean speech segment, transcode 16/24 kHz mono WAV, register against Voxtral voice-conditioning interface. Parked until correct Voxtral model on host. |
| #18 Bring up GPU services once `HF_TOKEN` is live | token landed; services not up | Bring-up blocked by Blockers #1, #2, and #3 |
| #19 Accept Meta Llama Guard 3 1B license on HF | awaiting HF sync | User reports Meta side approved |
| #20 (new) Fix Voxtral weights — replace Mini-3B-2507 with 4B-TTS-2603 | pending | Blocker #1 |
| #21 (new) Reconcile compose runtime: vllm-omni ≥0.18.0 for voxtral service | pending | Blocker #2 |
| #22 (new) Confirm Mistral commercial self-hosting license (CC BY-NC carve-out) | pending | Blocker #4; status inherited from prior handover |
| #23 (new) HF_TOKEN rotation (original value redacted during ingestion) | tracked in #654 | Follow the SSOT-1 (#641) pattern: revoke, regenerate, update EC2 `.env`, restart services. |
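
For task #17, the "extract and transcode" step could be sketched as an `ffmpeg` invocation built in Python. The 3-second duration and the mono 16/24 kHz WAV target come from the task note; the start offset, the 24 kHz default, the 16-bit PCM codec, and the function names are assumptions for illustration. Registering the clip with Voxtral is not covered here because that interface is not described in this handover.

```python
import subprocess


def reference_clip_cmd(src, dst, start_s, duration_s=3.0, sample_rate=24000):
    """Build an ffmpeg argv that cuts a mono WAV reference clip from `src`.

    start_s must be chosen by ear to land on a clean speech segment.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),       # seek to the start of the clean segment
        "-t", str(duration_s),     # clip length in seconds
        "-i", src,
        "-vn",                     # drop the video stream
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # 16000 or 24000 per the task note
        "-c:a", "pcm_s16le",       # 16-bit PCM WAV (assumed)
        dst,
    ]


def cut_reference_clip(src, dst, start_s, **kwargs):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(reference_clip_cmd(src, dst, start_s, **kwargs), check=True)
```

Passing the command as a list rather than a shell string matters here because the source filename contains spaces and a `#`.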

---

## Resume path for the next session

1. Probe HF gate on `gott404` for `mistralai/Voxtral-4B-TTS-2603`. If not accepted, accept it in the HF UI.
2. `ssh ubuntu@3.64.25.163`, then:
   ```
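   # Remove the wrong (Mini-3B-2507) weights; the glob does not match dotfiles.
   rm -rf /models/voxtral-4b-tts/*
   # Optional speed-up; requires the hf_transfer package to be installed.
   export HF_HUB_ENABLE_HF_TRANSFER=1
   huggingface-cli download mistralai/Voxtral-4B-TTS-2603 \
     --local-dir /models/voxtral-4b-tts --local-dir-use-symlinks False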
   ```
3. Edit `voice/docker-compose.yml`:
   - `vllm-voxtral` image → `vllm-omni` ≥0.18.0 (local build from `vllm-project/vllm-omni` or a tagged image).
   - Keep `vllm-guard` on upstream `vllm/vllm-openai:latest` (text-only guard is fine).
   - Env: `VOXTRAL_MODEL=mistralai/Voxtral-4B-TTS-2603`, `VOXTRAL_ENDPOINT=http://localhost:8001`, `VOXTRAL_VOICE_ID=julia_knotencheck`.
4. Poll HF for Llama Guard 3 1B sync; when accessible, `huggingface-cli download meta-llama/Llama-Guard-3-1B --local-dir /models/llama-guard-3-1b`.
5. `docker compose up -d vllm-guard vllm-voxtral voice-agent`; tail logs on `voice-agent` for "ready".
6. Close Success #2 via EC2 curl probe or browser path.
7. Close Success #3: `/knotencheck` → Start → expect Julia greeting within ~5 s.
8. After GREEN: process Nora Tschirner audio per task #17 (3-sec reference → register as `julia_knotencheck`).
9. Confirm HF_TOKEN rotation landed per #654.

Rollback (unchanged from v1): revert Vercel env to pre-self-host LiveKit Cloud values; redeploy previous production deployment; DNS A record can stay.

---

## Key references

- Spec v2: `fragJulia-Voice-Infra-Spec-v2-Self-Hosted.pdf` (18 pages; uploaded 2026-04-22 via user).
- vLLM-Omni: https://github.com/vllm-project/vllm-omni (audio-out / streaming TTS runtime for Voxtral).
- Voxtral TTS model: `mistralai/Voxtral-4B-TTS-2603` on Hugging Face (CC BY-NC 4.0).
- Prior handovers: `HANDOVER-2026-04-23.md` (infra IDs, credentials), v1 of this handover (superseded on Voxtral sections).
- EC2: `i-0aeb3778c5078baa1` / `3.64.25.163`, eu-central-1, g6.xlarge, NVIDIA L4.
- Vercel project: `fragjulia-web` / `prj_A7vJr0mJg0yUgTEsnxlv4qdtgQDv`.

---

Adherence: ScopeCard=Yes | Mode=Deep | Browse=No | Uncertainty=Stated (HF gate status for Voxtral-4B-TTS-2603 on `gott404` not probed this turn; Mistral commercial license status inherited) | Eckpfeiler=Not triggered