fragJulia
Changelog

2026-04-25 — VOXTRAL_VOICE_ID fixed (julia_knotencheck → de_female)

voice/.env.example shipped with VOXTRAL_VOICE_ID=julia_knotencheck, a placeholder for a custom voice that was never loaded into vllm-omni. The bundled voxtral-tts voices are fixed presets named by language or style plus gender (de_female, fr_male, …); requesting julia_knotencheck returns a 400 BadRequestError on every synth call. The default is now de_female (German female) as an interim value until a custom Julia voice is added, if one ever is.

What changed

  • voice/.env.example: VOXTRAL_VOICE_ID=julia_knotencheck → VOXTRAL_VOICE_ID=de_female. The comment block lists the 20 supported voices in vllm-omni v0.18.0 so future operators don't re-introduce a custom-voice placeholder.
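
A preflight check along these lines could catch an unsupported voice ID before the first synth call. This is a minimal sketch, not code from the repo: the helper names read_env_value and check_voice are hypothetical, the parser handles only plain KEY=VALUE lines, and the supported set is copied from the 400 response quoted in this entry (vllm-omni v0.18.0), so it would need updating if the bundle changes.

```python
# Speaker IDs copied from the 400 response quoted in this changelog entry
# (vllm-omni v0.18.0). Assumption: other versions may bundle a different set.
SUPPORTED_VOICES = frozenset(
    """
    ar_male casual_female casual_male cheerful_female de_female de_male
    es_female es_male fr_female fr_male hi_female hi_male it_female it_male
    neutral_female neutral_male nl_female nl_male pt_female pt_male
    """.split()
)


def read_env_value(env_text, key):
    """Return the last value assigned to `key` in dotenv-style text, or None."""
    value = None
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, raw = line.partition("=")
        if sep and name.strip() == key:
            value = raw.strip().strip("'\"")
    return value


def check_voice(env_text, key="VOXTRAL_VOICE_ID"):
    """Return the configured voice, or raise ValueError if it is unsupported."""
    voice = read_env_value(env_text, key)
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"{key}={voice!r} is not a bundled voxtral-tts voice")
    return voice
```

Calling check_voice on the text of voice/.env at agent startup would turn a per-call 400 into a single, early configuration error.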

Why

Discovered during R-10 #670 verification on 2026-04-25. Direct call to vllm-voxtral with the configured voice:

$ curl -X POST http://localhost:8001/v1/audio/speech \
    -d '{"input":"Hallo","model":"/models/voxtral-4b-tts","voice":"julia_knotencheck",...}'
HTTP/1.1 400 Bad Request
{"error":{"message":"Invalid speaker 'julia_knotencheck'. Supported: ar_male, casual_female, casual_male, cheerful_female, de_female, de_male, es_female, es_male, fr_female, fr_male, hi_female, hi_male, it_female, it_male, neutral_female, neutral_male, nl_female, nl_male, pt_female, pt_male", ...}}

Same call with voice=de_female:

HTTP 200 | 184364 bytes | type audio/wav | total_time 2.888s

/tmp/voxtral-test.wav: RIFF (little-endian) WAVE audio, Microsoft PCM, 16-bit mono 24 kHz

Voxtral TTS itself works fine; only the configured voice ID was wrong.
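
The format reported above (16-bit mono PCM at 24 kHz) can also be checked programmatically, which helps tell a real audio response from an error body saved under a .wav name. The sketch below uses only the standard-library wave module; the function names are hypothetical and the expected parameters are taken from the file output quoted in this entry.

```python
import io
import wave


def describe_wav(data):
    """Parse WAV bytes and return channel count, bit depth and sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
        }


def is_expected_voxtral_wav(data):
    """True if `data` is a 16-bit mono 24 kHz WAV, as observed in this entry."""
    try:
        info = describe_wav(data)
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE payload (for example a JSON error body) or truncated.
        return False
    return info == {"channels": 1, "bits": 16, "sample_rate_hz": 24000}
```

Running is_expected_voxtral_wav on the response body gives a stricter pass condition than the HTTP status alone.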

julia_knotencheck looks like an aspirational placeholder for a custom Julia voice (matching the product persona), which would require voice cloning, fine-tuning, or a custom-voice bundle for vllm-omni. None of that infrastructure exists yet. For the bring-up, de_female is the closest match (German female, the language of Julia's persona) using only the out-of-the-box voice bundle. If the custom voice work happens later, switch back to julia_knotencheck (or whatever the real custom voice ID becomes) at that point.

This bug has been on main since the file was first committed, so every TTS call from voice-agent would have returned 400. Memory's "TTS direct-endpoint validated 2026-04-24" note must have used a different voice ID during ad-hoc testing, not the configured one. The bug only surfaces under the actual agent code path.

Scope

  • voice/.env.example only. No code changes.
  • The operator must also update the VOXTRAL_VOICE_ID line in voice/.env on EC2 to de_female and run docker compose restart voice-agent for the change to take effect at runtime. (Changing the repo template alone does not propagate to a deployed .env.)
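
The manual .env edit on EC2 could be scripted roughly as follows. This is a sketch under assumptions: set_env_value is a hypothetical helper, it handles only plain KEY=VALUE lines, and it works on the file's text so that reading and writing voice/.env is left to the caller.

```python
def set_env_value(env_text, key, value):
    """Return dotenv-style text with `key` set to `value`.

    Existing assignments of `key` are replaced in place; comments and other
    keys are left untouched. If `key` is absent, it is appended at the end.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # keep commented-out examples as they are
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

The result would be written back to voice/.env before running docker compose restart voice-agent.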

Test plan

  • Direct synth with voice=de_female returns HTTP 200 + valid WAV (verified 2026-04-25).
  • After EC2 .env update + agent restart, voice-agent's full pipeline (STT → LLM → Guard → TTS) produces audio without 400 errors.
  • R-10 Probe 2 E2E pipeline test passes once .env is updated.
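
The direct-synth item in the test plan can be sketched as a small script. The endpoint and model path come from the curl call quoted in this entry; the fields that call elides with "..." are not reproduced here, the Content-Type header is an assumption, and the function names are hypothetical, so the real request may need more fields.

```python
import json
import urllib.error
import urllib.request

# Endpoint and model path as they appear in the curl call in this entry.
SPEECH_URL = "http://localhost:8001/v1/audio/speech"
MODEL = "/models/voxtral-4b-tts"


def build_speech_request(voice, text="Hallo", url=SPEECH_URL, model=MODEL):
    """Build a POST request for the speech endpoint without sending it."""
    payload = {"input": text, "model": model, "voice": voice}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def synth_status(voice, timeout=30.0):
    """Send the request; return (HTTP status, body). Needs a running server."""
    request = build_speech_request(voice)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised; the body carries the error JSON.
        return err.code, err.read()
```

Against the setup described here, synth_status("de_female") should come back with status 200 and WAV bytes, while the old placeholder should come back with 400.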

Rollout / reversibility

Reversible via revert. The change affects a running container only after the operator updates voice/.env and restarts voice-agent (~30 s downtime).

Follow-ups

  • Custom Julia voice (voice cloning or fine-tuning vllm-omni's voxtral-tts with Julia's persona) is a separate product decision. If pursued, the VOXTRAL_VOICE_ID value gets updated again — leaving the comment list in .env.example makes the relationship between the env var and vllm-omni's hardcoded set explicit.
