2026-04-25 — FasterWhisperSTT _recognize_impl (livekit-agents 1.5 contract)
Description
The custom STT plugin overrode the public recognize() method, which is no longer abstract in livekit-agents 1.5+. The leaf abstract method moved to _recognize_impl (private). Result: the class could not be instantiated — voice-agent crashed on every job dispatch with TypeError: Can't instantiate abstract class FasterWhisperSTT without an implementation for abstract method '_recognize_impl'. Voice-agent has never been able to actually transcribe any audio on this stack — STT was broken before the agent ever saw a frame. PR-I rewires the override to satisfy the new contract; transcription logic is unchanged.
What changed
voice/agent/custom_plugins/faster_whisper_stt.py:
- Added _recognize_impl(buffer, *, language, conn_options) as a thin delegate that calls recognize(). This satisfies the abstract-method contract that livekit-agents 1.5+ moved to a private method, and it is required to make the class instantiable. It is loop-safe because our subclass also overrides recognize(): that body does the transcription and never calls _recognize_impl, which terminates the recursion.
- Kept recognize(*, buffer, language: str | None, **kwargs) as it was: same signature, and the same body apart from the one-line change below. Tests calling await stt.recognize(buffer=...) continue to work unchanged.
- The body of recognize() got one small change: language or self._opts.language → language if isinstance(language, str) and language else self._opts.language. Reason: the framework may pass its NOT_GIVEN sentinel instead of a string, and the original or check only falls back to the default if that sentinel is falsy. The isinstance guard does not depend on the sentinel's truthiness: anything that isn't a real non-empty string means "use default."
- Added optional model and provider properties so the framework's metrics emission carries useful labels instead of unknown / unknown.
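To illustrate why the isinstance guard is more robust than or, here is a minimal sketch using a hypothetical sentinel class whose truthiness can be toggled (it is not livekit's NOT_GIVEN, whose exact behavior is not reproduced here). DEFAULT_LANGUAGE is an assumed stand-in for self._opts.language.

```python
from typing import Any


class ToggleSentinel:
    """Hypothetical stand-in for a framework "not given" sentinel; not livekit's class."""

    def __init__(self, truthy: bool) -> None:
        self.truthy = truthy

    def __bool__(self) -> bool:
        return self.truthy


DEFAULT_LANGUAGE = "de"  # assumed stand-in for self._opts.language


def old_pick(language: Any) -> Any:
    # Original expression: falls back only when `language` is falsy.
    return language or DEFAULT_LANGUAGE


def new_pick(language: Any) -> str:
    # Patched expression: falls back unless `language` is a non-empty str.
    return language if isinstance(language, str) and language else DEFAULT_LANGUAGE


truthy_sentinel = ToggleSentinel(truthy=True)
falsy_sentinel = ToggleSentinel(truthy=False)

# A truthy sentinel leaks through the old expression but not through the new one.
print(old_pick(truthy_sentinel) is truthy_sentinel)  # True
print(new_pick(truthy_sentinel))  # de

# On ordinary inputs (and a falsy sentinel) both expressions agree.
print(all(old_pick(v) == new_pick(v) for v in (None, "", "en", falsy_sentinel)))  # True
```

Whichever way the real sentinel behaves, the guarded expression never lets a non-string value reach the transcriber.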
Imports: only from livekit.agents import stt, utils (unchanged). Deliberately did not add imports of APIConnectOptions, NOT_GIVEN, NotGivenOr, or is_given — see CI compatibility note below.
The framework's public recognize() (which wraps _recognize_impl with retry, metrics, and error eventing) is overridden in this subclass, so our recognize() runs the transcription body directly. The _recognize_impl delegate covers any other framework entry point that calls it and routes those calls back to our recognize().
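The override-plus-delegate shape can be sketched with a toy abstract base class. BaseSTT and FakeWhisperSTT are hypothetical names; the real livekit stt.STT carries retry, metrics, and event machinery, and its exact signatures (including the conn_options type) are not reproduced here.

```python
import abc
import asyncio
from typing import Any


class BaseSTT(abc.ABC):
    """Hypothetical stand-in for the framework base class, not livekit's stt.STT."""

    async def recognize(self, *, buffer: Any, language: Any = None, **kwargs: Any) -> str:
        # Public wrapper: in the real framework, retry and metrics would live here.
        return await self._recognize_impl(buffer, language=language, conn_options=None)

    @abc.abstractmethod
    async def _recognize_impl(self, buffer: Any, *, language: Any, conn_options: Any) -> str:
        ...


class FakeWhisperSTT(BaseSTT):
    """Mirrors the PR's shape: recognize() holds the body, _recognize_impl delegates."""

    async def recognize(self, *, buffer: Any, language: Any = None, **kwargs: Any) -> str:
        # The body runs directly and never calls _recognize_impl, so no recursion.
        lang = language if isinstance(language, str) and language else "de"
        return f"transcribed {len(buffer)} bytes as {lang}"

    async def _recognize_impl(self, buffer: Any, *, language: Any, conn_options: Any) -> str:
        # Thin delegate: satisfies the abstract contract, ignores conn_options,
        # and routes back to our recognize().
        return await self.recognize(buffer=buffer, language=language)


async def main() -> None:
    stt = FakeWhisperSTT()  # instantiable: no abstract methods remain
    print(await stt.recognize(buffer=b"\x00" * 4))  # transcribed 4 bytes as de
    print(await stt._recognize_impl(b"\x00" * 8, language="en", conn_options=None))  # transcribed 8 bytes as en


asyncio.run(main())
```

Both entry points end up in the same body, and because the subclass's recognize() replaces the base wrapper, the wrapper's own logic is not on that path.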
CI compatibility
voice/tests/conftest.py stubs livekit.agents as a types.ModuleType (with stt, utils, tts, llm set as attributes), NOT as a package. So:
- from livekit.agents import stt works (attribute access on the stub) ✓
- from livekit.agents.types import X fails with 'livekit.agents' is not a package ✗
- from livekit.agents.utils import is_given fails for the same reason ✗
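This follows from how Python's import system treats a module that has no __path__. A minimal sketch that rebuilds a stub of the shape described above (the real conftest contents are not reproduced; the attribute values are empty placeholder modules, and NOT_GIVEN stands in for X) and tries all three import forms, restoring sys.modules afterwards:

```python
import sys
import types
from unittest import mock

# A plain module (no __path__) with stt/utils/tts/llm attached as attributes.
agents_stub = types.ModuleType("livekit.agents")
for name in ("stt", "utils", "tts", "llm"):
    setattr(agents_stub, name, types.ModuleType(f"livekit.agents.{name}"))

livekit_stub = types.ModuleType("livekit")
livekit_stub.agents = agents_stub

results = {}
with mock.patch.dict(sys.modules, {"livekit": livekit_stub, "livekit.agents": agents_stub}):
    # Attribute access on the stub: works.
    from livekit.agents import stt  # noqa: F401
    results["from livekit.agents import stt"] = "ok"

    # Submodule imports need a package (a module with __path__): both fail.
    for statement in (
        "from livekit.agents.types import NOT_GIVEN",
        "from livekit.agents.utils import is_given",
    ):
        try:
            exec(statement, {})
            results[statement] = "ok"
        except ModuleNotFoundError as exc:
            results[statement] = str(exc)

for statement, outcome in results.items():
    print(f"{statement}: {outcome}")
```

Note that the utils import fails even though utils exists as an attribute: a dotted import looks up the submodule through the parent's __path__, not through attribute access.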
Type annotations on _recognize_impl use Any (already imported via from typing import Any). Under from __future__ import annotations, annotations are stored as strings and never evaluated at runtime, so even this isn't strictly necessary, but Any is the most explicit and lint-friendly choice. Runtime behavior is unaffected.
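A small sketch of the postponed-evaluation point: the annotation names below (AudioBuffer, NotGivenOr, APIConnectOptions) are deliberately never imported or defined, and the class still builds and runs. ExampleSTT is a hypothetical name. A linter would still flag the undefined names, which is why a real, imported name such as Any is the lint-friendly choice.

```python
# `from __future__ import annotations` must be the first statement in the module.
from __future__ import annotations

import asyncio


class ExampleSTT:
    # AudioBuffer, NotGivenOr and APIConnectOptions are NOT imported anywhere:
    # with postponed evaluation the annotations are stored as strings and are
    # never resolved when the class is created or the method is called.
    async def _recognize_impl(
        self,
        buffer: AudioBuffer,
        *,
        language: NotGivenOr[str],
        conn_options: APIConnectOptions,
    ) -> str:
        return "ok"


print(ExampleSTT._recognize_impl.__annotations__)
print(asyncio.run(ExampleSTT()._recognize_impl(b"", language="de", conn_options=None)))  # ok
```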
Why now (R-10 verification, 2026-04-25)
Discovered during the R-10 #670 E2E probe today. A Python participant joined room verify-2026-04-25 from inside the voice-agent container, published 3.8s of synthesized German audio, and held for 45 seconds. voice-agent dispatched a worker job in response, and the job crashed at main.py:133 (stt = FasterWhisperSTT()) with the TypeError above. No audio was processed and no response came back. All five services still showed (healthy) in docker compose ps, because that status reflects framework-level worker registration, not job execution.
The 2026-04-24 bring-up missed this because it never exercised the agent job path. Direct-endpoint probes (Voxtral synthesis, Llama Guard inference, Caddy /healthz) bypass the agent entirely.
Cross-check (read-only, no code copied)
Verified the signature against:
- Our installed livekit-agents 1.5.6 in the running container (/usr/local/local/lib/python3.12/dist-packages/livekit/agents/stt/stt.py:170).
- livekit-plugins-openai/stt.py on the livekit/agents main branch, as a reference for the abstract method's positional-buffer signature and conn_options parameter.
- voice/tests/conftest.py, to confirm what the test infra stubs vs. what's left non-stubbed (only stt/utils/tts/llm attribute access works; submodule imports don't).
The delegate-based approach was chosen specifically because it satisfies all three constraints simultaneously: framework runtime, test stub, and minimal diff.
Test plan
- CI: python -c "from custom_plugins.faster_whisper_stt import FasterWhisperSTT; assert not FasterWhisperSTT.__abstractmethods__" does not raise, which proves the class is no longer abstract. A bare import is not enough: Python raises the abstract-method TypeError only at instantiation, not when the class is defined.
- After image rebuild + redeploy on EC2: re-run the R-10 E2E probe (/tmp/probe2_e2e.py). voice-agent's worker job no longer crashes at line 133. Logs show Transcription: …ms for …s audio → '…' from the plugin.
- Direct verification: connect a participant, publish 3.8s of German audio, and wait; voice-agent emits a TTS response audio track. R-10 Probe 2 passes.
- Metrics emission shows model = /models/faster-whisper-large-v3 and provider = faster-whisper instead of unknown / unknown.
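An import-only check cannot catch this failure mode: Python defines an abstract subclass without complaint and raises the TypeError only when the class is instantiated. A minimal sketch, with hypothetical class names, of a check that inspects __abstractmethods__ without constructing the class (handy if constructing the real plugin is expensive):

```python
import abc


class Base(abc.ABC):
    @abc.abstractmethod
    def _recognize_impl(self) -> str: ...


class StillAbstract(Base):
    # Overrides a different method, like the pre-PR plugin overriding only recognize().
    def recognize(self) -> str:
        return "body"


class Concrete(StillAbstract):
    # Adds the delegate, like the PR.
    def _recognize_impl(self) -> str:
        return self.recognize()


def is_instantiable(cls: type) -> bool:
    """True when no abstract methods remain; checks the class without constructing it."""
    return not getattr(cls, "__abstractmethods__", frozenset())


# Defining StillAbstract above raised nothing; only the checks below expose it.
print(is_instantiable(StillAbstract), is_instantiable(Concrete))  # False True

try:
    StillAbstract()
except TypeError as exc:
    print(f"TypeError: {exc}")
```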
Rollout / reversibility
Reversible via revert. After merge, the voice-agent image must be rebuilt on EC2 (docker compose build voice-agent && docker compose up -d voice-agent) for the change to take effect at runtime. ~5 min downtime during the rebuild.
Out of scope
This PR fixes the bring-up gate only: the agent can now process audio end to end on the self-hosted stack. Subsequent integration work to make https://fragjulia.de/knotencheck reach this self-hosted agent (production env, CSP, token-endpoint anonymous mode for the Knotencheck route) is gated by the product release schedule and is not part of the bring-up.
2026-04-25 — main.py event handlers: sync wrapper + asyncio.create_task (livekit-agents 1.5+)
2026-04-25 — VOXTRAL_VOICE_ID fixed (julia_knotencheck → de_female)
voice/.env.example had VOXTRAL_VOICE_ID=julia_knotencheck, a placeholder for a custom voice that was never loaded into vllm-omni. The bundled voxtral-tts voices are per-language voices (de_female, fr_male, …), so requesting julia_knotencheck returns 400 BadRequestError on every synth call. The default is now de_female (German female) as an interim value until a custom Julia voice is added, if one ever is.