fragJulia
Changelog

2026-04-25 — voice plugins coherent upgrade to livekit-agents 1.5.6 (epic #700)

Description

Coherent rewrite of the two voice-agent custom plugins still carrying livekit-agents 1.5+ API drift after the bring-up's salami-fix series (PR-I/J/K/L = #692/#693/#694/#696). R-10 Probe 2 (#670) on 2026-04-25 caught the two confirmed-broken layers below; this PR closes the epic #700 plugin-side scope.

The two confirmed-broken errors from R-10 Probe 2:

File "/app/custom_plugins/bedrock_mistral_llm.py", line 95, in chat
    "name": t.name,
AttributeError: 'FunctionTool' object has no attribute 'name'

File "/app/custom_plugins/voxtral_tts.py", line 206, in __init__ ...
TypeError: VoxtralSynthesizeStream._run() takes 1 positional argument but 2 were given

Plus a third silent bug discovered while rewriting _build_messages: in 1.5.6 ChatContext.messages() returns only ChatMessage items, so function-call and function-call-output items (the tool round-trip history) were being silently dropped — every multi-turn session with a tool invocation lost context after the first turn.

What changed

voice/agent/custom_plugins/bedrock_mistral_llm.py

Two production-code fixes plus one refactor:

  • _build_tools(tools) (new helper extracted from chat()): dispatches by isinstance(t, llm.RawFunctionTool) vs isinstance(t, llm.FunctionTool), mirroring the canonical pattern in livekit.agents.llm._provider_format.openai.to_fnc_ctx. For FunctionTool, hands off to llm.utils.build_legacy_openai_schema(tool) (the framework helper that handles pydantic-model reflection of the function's signature). For RawFunctionTool, uses tool.info.raw_schema verbatim. Unrecognized types are skipped with a warning, not raised — defensive against future tool variants.
  • _build_messages(chat_ctx) rewritten: walks chat_ctx.items and dispatches by the .type discriminator ("message", "function_call", "function_call_output"). FunctionCall items attach to the most recent assistant message's tool_calls list (Mistral expects them inline on the assistant turn). If no assistant precedes a FunctionCall, an empty-content assistant message is synthesized to carry it. FunctionCallOutput items become separate role: "tool" messages keyed by call_id.
  • New _content_to_str(content) static method handles ChatMessage's content-may-be-a-string-or-a-list shape.

voice/agent/custom_plugins/voxtral_tts.py

Both _run methods rewritten to the 1.5.6 _run(self, output_emitter: AudioEmitter) signature, with proper emission via output_emitter.push(bytes). The framework's parent _main_task calls output_emitter.pushed_duration(idx=-1) after _run returns and raises APIError("no audio frames were pushed") if it sees zero — the previous self._event_ch.send_nowait(SynthesizedAudio(...)) path bypassed the emitter and would have failed that check even if the WebRTC track was populated.

  • VoxtralChunkedStream._run(output_emitter) — non-streaming REST: initialize(stream=False, mime_type="audio/pcm", ...) → POST → push(response.content)flush().
  • VoxtralSynthesizeStream._run(output_emitter) — WebSocket streaming: initialize(stream=True, ...)start_segment(segment_id=...) → delegate to _run_websocket(emitter) (each binary WS message → emitter.push(message)); on local-endpoint WS failure, fall back to _run_http_fallback(emitter) which collects the input text and posts once. end_segment() + flush() always run via try/finally.
  • Both inner methods take the emitter as a parameter; tts.SynthesizedAudio and utils.audio.AudioFrame imports removed (no remaining consumers); asyncio import removed (no longer used directly).

voice/agent/custom_plugins/faster_whisper_stt.py

No changes. The _recognize_impl(self, buffer, *, language, conn_options) delegate from PR-I (#692) already complies with the 1.5.6 STT abstract method contract; STT does not use the AudioEmitter model. Re-audited 2026-04-25 against #700.

voice/agent/main.py

Comment-only audit notes by AgentSession(...) (line ~158):

  • RoomInputOptions defaults audited and confirmed compatible with the existing @ctx.room.on("disconnected") handler — close_on_disconnect=True is the framework default and matches our use; no override needed.
  • turn_detection= deprecation warning called out with a # TODO(post-v2.0) comment. Backward-compat in 1.5.6; the v2.0 bump (and turn_handling=TurnHandlingOptions(...) migration) is a separate epic per #700 out-of-scope list.

voice/tests/conftest.py

  • New _AudioEmitter stub class (attached as _tts_mod.AudioEmitter) with initialize, start_segment, end_segment, push, flush, end_input, pushed_duration methods + pushed_data: list[bytes] accumulator.
  • New _FunctionTool / _RawFunctionTool stub classes (attached as _llm_mod.FunctionTool / RawFunctionTool) so the production code's isinstance() dispatch is exercised by tests.
  • New _ChatContext stub class (attached as _llm_mod.ChatContext) replacing the previous MagicMock — exposes .items as a list attribute and .messages() as a filtered method.
  • New _llm_mod.utils namespace with build_legacy_openai_schema(tool) returning the OpenAI-shaped {"type": "function", "function": {"name", "description", "parameters"}} dict from the tool's .info. Mirrors the framework's helper so production code can call llm.utils.build_legacy_openai_schema(tool) without runtime import gymnastics in CI.

voice/tests/test_bedrock_mistral_llm.py

  • _make_chat_context(items) rewritten to mirror 1.5.6's .items attribute shape with per-item .type discriminator. Backwards-compat shorthand: a dict with only role/content is treated as a message item.
  • Existing test_tool_call_message and test_tool_result_message updated to use the items shape (function_call + function_call_output instead of tool_calls field on a ChatMessage).
  • New test_function_call_without_assistant_synthesizes_one.
  • New test_full_tool_round_trip (user → assistant+FunctionCall → FunctionCallOutput → user) asserts the full Mistral messages array is built correctly.
  • test_request_body_with_tools split into three: test_request_body_with_function_tool, test_request_body_with_raw_function_tool, test_unrecognized_tool_skipped.

voice/tests/test_voxtral_tts.py

  • New TestChunkedStreamEmitter::test_chunked_stream_pushes_via_emitter — mocks the httpx post, calls _run(emitter), asserts the emitter was initialized with mime_type="audio/pcm" + stream=False and received a single push(bytes).
  • New TestSynthesizeStreamEmitter::test_synthesize_stream_pushes_segments — patches _run_websocket to push 2 chunks, asserts segment open/close + 2× push + flush.
  • New TestSynthesizeStreamEmitter::test_synthesize_stream_falls_back_on_ws_error — forces WS failure on a local endpoint and asserts the HTTP fallback path emits via the same emitter.

Cross-checks (read-only, before the rewrite)

Three pre-flight checks against the upstream livekit/agents source confirmed the API surface before any code change:

  • livekit-plugins-openai/livekit/plugins/openai/llm.py and livekit-agents/livekit/agents/llm/_provider_format/openai.py:to_fnc_ctx — confirms isinstance(tool, llm.RawFunctionTool) / llm.FunctionTool discrimination + llm.utils.build_legacy_openai_schema(tool) for the regular path. Our code mirrors this exactly.
  • livekit-agents/livekit/agents/llm/chat_context.py — confirms ChatItem = ChatMessage | FunctionCall | FunctionCallOutput | AgentHandoff | AgentConfigUpdate discriminated by .type. Confirms .items is a property, not a method. Confirms ChatMessage no longer has a tool_calls field — tool calls live as separate FunctionCall items.
  • livekit-agents/livekit/agents/tts/tts.py — confirms _run(self, output_emitter: AudioEmitter) is the 1.5.6 abstract method on both ChunkedStream and SynthesizeStream. Confirms AudioEmitter.push(data: bytes), initialize(*, request_id, sample_rate, num_channels, mime_type, frame_size_ms=200, stream=False), start_segment(*, segment_id), end_segment(), flush(). Confirms parent post-_run validation if pushed_duration(idx=-1) <= 0.0: raise APIError(...).

Why now

R-10 Probe 2 on 2026-04-25 (PR #699) verified the deployment infrastructure is correct (5 services healthy, GPU within budget, TLS/WSS reachable, auth gate working, agent dispatches on participant join) but the agent's _llm_inference_task and _tts_inference_task crashed during reply generation with the two errors above. PR-I/J/K/L addressed four prior layers; this completes the alignment as one coherent change rather than a fifth salami fix.

Test plan

  • CI: pytest voice/tests/ — 158 tests pass locally (was 122 before; the 36 new tests cover function-tool dispatch, items walk, AudioEmitter push, and the full tool round-trip).
  • python -c "from custom_plugins.bedrock_mistral_llm import BedrockMistralLLM; from custom_plugins.voxtral_tts import VoxtralTTS" imports cleanly under the conftest stubs.
  • Image rebuild on EC2 (~10 s) + --force-recreate voice-agent.
  • R-10 Probe 2 re-run against the rebuilt agent — full E2E reply generation passes:
    • No 'FunctionTool' object has no attribute 'name' in _llm_inference_task.
    • No _run() takes 1 positional argument in _tts_inference_task.
    • No APIError("no audio frames were pushed") after Voxtral synth.
    • Probe receives non-silent, non-placeholder audio matching the LLM reply duration (not the previous 1.9 MB framework silence).
    • output_guard pipeline-log stage fires with is_safe.
  • Logged in apps/docs/content/docs/operations/voice-bringup-verification-<merge-date>.mdx (extends the 2026-04-25 doc).

Rollout / reversibility

Reversible via revert. EC2 rebuild + recreate (~10 s). Tested locally end-to-end against the conftest stub layer; the only runtime difference on EC2 is the real livekit-agents 1.5.6 — and we've cross-checked the production code against the upstream patterns exactly.

Out of scope

Per the #700 issue body:

  • turn_detection=turn_handling=TurnHandlingOptions(...) migration (v2.0 bump epic).
  • Production wiring (CSP, /api/voice/token anonymous mode, LIVEKIT_URL on Vercel — release-gated).
  • Custom Julia voice (replacing de_female with a fine-tuned timbre — product decision).
  • RT-1 (#673) 97% VRAM ceiling research.
  • R-1 / R-2 / R-12 docs work under #660.
  • Epic: #700
  • R-10 verification: #670 + PR #699 (voice-bringup-verification-2026-04-25.mdx)
  • Bring-up parent: #660 (R-0 Voice Deploy Repair)
  • Bring-up MEGA: #672
  • Prior fix PRs in the salami series: #692, #693, #694, #696
  • Original implementation epic: #337 (closed 2026-04-12)

On this page