2026-04-25 — R-4 weights provisioning script + Voxtral CC BY-NC policy note
Adds voice/scripts/provision-weights.sh, an idempotent downloader for the three model weights the voice stack needs (Voxtral-4B-TTS-2603, Llama-Guard-3-1B, faster-whisper-large-v3). Closes R-4 #664; partial-closes #521 with the local-deploy CC BY-NC policy note.
What changed
- `voice/scripts/provision-weights.sh` (new, ~100 lines of bash). Downloads the three models to `/models/<subdir>` on the host using the canonical `hf` CLI (`huggingface-cli` is deprecated and partially disabled in current `huggingface_hub`). Idempotent: each model has a sentinel file (`params.json` for Voxtral, `config.json` for the other two). If the sentinel is present and non-empty, the model is skipped; otherwise the script runs `hf download <repo> --local-dir <target>`. Fails loudly if the sentinel is still missing after the download.
- `voice/scripts/README.md` (new). Operator-facing usage doc: how to source `voice/.env`, how to invoke the script (optionally for a subset of models, selected by key), the Voxtral CC BY-NC policy paragraph, secret-hygiene notes (`HF_HUB_DISABLE_IMPLICIT_TOKEN=1` plus explicit `--token`, and the on-disk-token caveat), and a prerequisite note that PR-B #681 lands the `HF_TOKEN=` template line in `voice/.env.example` (existing operators with the token in their `.env` are unaffected).
- Voxtral CC BY-NC stance (#521 partial): local inference via `vllm-omni`, output served to end-users of the in-product voice agent. Not a public TTS API, not a commercial-API offering. The policy paragraph in the README is the project's stated position.
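The sentinel-file check described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `provision-weights.sh`: the function name `provision_model`, the `DOWNLOAD_CMD` override hook, and the log wording are all assumptions introduced here so the logic can be exercised without network access.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sentinel-file idempotency check.

# DOWNLOAD_CMD is an assumed hook for testing; the script is described as
# calling `hf download` directly.
DOWNLOAD_CMD="${DOWNLOAD_CMD:-hf download}"

# provision_model <repo> <target-dir> <sentinel-file-name>
provision_model() {
  local repo="$1" target="$2" sentinel="$3"

  # -s is true only when the file exists and is non-empty.
  if [ -s "$target/$sentinel" ]; then
    echo "$repo: already provisioned, skipping"
    return 0
  fi

  mkdir -p "$target"
  # Intentionally unquoted so "hf download" splits into command + subcommand.
  $DOWNLOAD_CMD "$repo" --local-dir "$target" || return 1

  # Fail loudly if the download did not produce the sentinel.
  if [ ! -s "$target/$sentinel" ]; then
    echo "ERROR: $repo: $target/$sentinel missing or empty after download" >&2
    return 1
  fi
  echo "$repo: provisioned into $target"
}
```

Checking the sentinel both before and after the download is what makes a re-run cheap and a silent partial download visible.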
Why
R-4 (#664) is the original "swap Voxtral weights" issue. The 2026-04-24 bring-up resolved it as a one-shot manual download via hf download on EC2, then cp -rL from the HF cache to /models/voxtral-4b-tts/ (the cp dance was a workaround for a CloudShell paste-wrap bug, not a fundamental requirement). That worked but left no repo-tracked artifact: the next operator on the next box would have to reverse-engineer the procedure from the bring-up handover.
A script captures the procedure as code, idempotent enough to re-run without harm, with a known-good list of model repos. It also factors in the two ergonomic lessons from 2026-04-24:
- `hf`, not `huggingface-cli`: the latter prints a deprecation banner and refuses some commands.
- `--local-dir` directly, no cache-then-copy: avoids both the disk-doubling and the CloudShell paste-wrap class of bug.
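A direct-to-target download along these lines might look like the sketch below. It assumes the token is supplied in `HF_TOKEN`; the function name `hf_download_direct` and the `HF_BIN` override are hypothetical names added for illustration, while `hf download`, `--local-dir`, `--token`, and `HF_HUB_DISABLE_IMPLICIT_TOKEN` are the pieces named in this entry.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: download straight into the target directory,
# with no cache-then-copy step.

# hf_download_direct <repo> <target-dir>
hf_download_direct() {
  local repo="$1" target="$2"

  # Rebuild the positional parameters as the argument list for `hf`.
  set -- download "$repo" --local-dir "$target"

  # Pass the token explicitly only when one is set.
  if [ -n "${HF_TOKEN:-}" ]; then
    set -- "$@" --token "$HF_TOKEN"
  fi

  # Disable implicit pickup of any token stored on disk.
  # HF_BIN is an assumed override so the call can be stubbed in tests.
  HF_HUB_DISABLE_IMPLICIT_TOKEN=1 "${HF_BIN:-hf}" "$@"
}
```

Because the files land in the target directory directly, there is no second copy in the HF cache and no `cp -rL` step to mistype.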
Scope
- `voice/scripts/provision-weights.sh` (new).
- `voice/scripts/README.md` (new).
- Changelog entry + meta.json.
- Does NOT touch compose, Dockerfile, or `.env*` (separate PRs in the bring-up batch).
- Does NOT delete legacy doc references (`voice/DEPLOY-AWS.md`, `voice/CREDENTIALS-CHECKLIST.md`); that's PR-F (R-12 final closure).
Test plan
- Static lint: `bash -n voice/scripts/provision-weights.sh` exits 0.
- Dry-run on a fresh box (no `/models/` populated) with HF_TOKEN env set: expect three downloads, sentinel files appear, exit 0.
- Re-run on the same box: expect three "already provisioned — skipping" log lines, no network calls, exit 0.
- Subset run: `bash voice/scripts/provision-weights.sh whisper` (no token): exit 0.
- Subset run: `bash voice/scripts/provision-weights.sh voxtral` without token: exit 1 with a clear "gated repo" error.
- After provisioning, `vllm-voxtral` and `vllm-guard` start cleanly with the new compose (PR-D).
- `docker history voice-voice-agent | grep -iE 'hf_'` is empty (no token leaked into the image; unrelated to this PR but worth confirming on the same EC2 redeploy).
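The two subset cases above imply a per-key token gate. One way such a gate could be written is sketched here, under assumptions: the key names `voxtral` and `whisper` come from the test plan, but the `guard` key, the choice to treat it as gated, and the function names `check_token` and `select_models` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of subset selection by key with a gated-repo check.

# check_token <key>: 0 if the model may be downloaded, 1 if a token is
# required but missing, 2 for an unknown key.
check_token() {
  local key="$1" gated
  case "$key" in
    voxtral|guard) gated=1 ;;   # assumption: `guard` key exists and is gated
    whisper)       gated=0 ;;
    *) echo "ERROR: unknown model key: $key" >&2; return 2 ;;
  esac
  if [ "$gated" = 1 ] && [ -z "${HF_TOKEN:-}" ]; then
    echo "ERROR: $key is a gated repo; set HF_TOKEN" >&2
    return 1
  fi
  return 0
}

# select_models [key...]: print the keys to provision, one per line,
# defaulting to all three when no key is given.
select_models() {
  local key
  if [ "$#" -eq 0 ]; then
    set -- voxtral guard whisper
  fi
  for key in "$@"; do
    check_token "$key" || return $?
    echo "$key"
  done
}
```

With no token set, `select_models whisper` succeeds while `select_models voxtral` returns 1 with a "gated repo" message, matching the two subset expectations.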
Rollout / reversibility
Reversible: a revert removes the script. No infra change. The script does not modify any existing files; it only writes to /models/, which lives on the host filesystem, not in the repo.
Follow-ups
- EC2 retro-application: once this PR + PR-B + PR-C + PR-D have all merged, the EC2 redeploy step in the bring-up plan replaces the prior 2026-04-24 manual `hf download` shell history with `bash voice/scripts/provision-weights.sh`. Same outcome, repo-tracked.
- #521 final closure: the policy paragraph here closes the legal question. PR-F (R-12) cross-references this in `voice-deploy.mdx`.
2026-04-25 — Operations docs: stale-reference annotations on migrated files
Added inline 'Update 2026-04-25' annotations on plan.mdx, sprint-337-358-328.mdx, and launch-audit-2026-04-04.mdx to flag references that have changed since the docs were originally written. Annotation-only — no original content rewritten. Spot-check, not a full audit.
2026-04-25 — Voice compose canonicalization (R-5 + R-6 + R-8)
Folds the 2026-04-24 EC2 hand-edits to docker-compose.yml back into main — vllm-omni image swap for Voxtral, vllm-guard sampler tuning, voice-agent healthcheck port + whisper CPU mode + env-var rename, and a busybox-compatible caddy healthcheck against the admin API. Closes R-5 #666, R-6 #667, R-8 #668, #366, #528.