fragJulia
Changelog

2026-04-25 — FasterWhisperSTT _recognize_impl (livekit-agents 1.5 contract)

Description

The custom STT plugin overrode the public recognize() method, which is no longer abstract in livekit-agents 1.5+; the abstract leaf method is now the private _recognize_impl. As a result the class could not be instantiated, and voice-agent crashed on every job dispatch with TypeError: Can't instantiate abstract class FasterWhisperSTT without an implementation for abstract method '_recognize_impl'. Voice-agent has therefore never transcribed any audio on this stack: STT failed before the agent saw a single frame. PR-I rewires the override to satisfy the new contract; the transcription logic is unchanged apart from a small language-fallback guard.

What changed

voice/agent/custom_plugins/faster_whisper_stt.py:

  • Added _recognize_impl(buffer, *, language, conn_options) as a thin delegate that calls recognize(). This implements the abstract method that livekit-agents 1.5+ now requires, which makes the class instantiable. It cannot loop: our subclass overrides recognize() with a body that performs the transcription directly and never calls back into _recognize_impl.
  • Kept recognize(*, buffer, language=str|None, **kwargs) with the same signature and the same body, apart from the language fallback in the next item. Tests calling await stt.recognize(buffer=...) continue to work unchanged.
  • The body of recognize() got one small change: language or self._opts.language → language if isinstance(language, str) and language else self._opts.language. Reason: the framework may pass its NOT_GIVEN sentinel. If that object is truthy (the default for Python objects), the original or check would short-circuit and return the sentinel instead of the default. The isinstance guard treats anything that isn't a real non-empty string as "use default."
  • Added optional model and provider properties so the framework's metrics emission carries useful labels instead of unknown / unknown.
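The language fallback can be illustrated in isolation. The sketch below is a minimal illustration, not the plugin code: _NotGiven and NOT_GIVEN are hypothetical stand-ins for the framework sentinel, and the helper names resolve_language_old and resolve_language are invented for this example. The stand-in is a plain object and therefore truthy; whether the real sentinel is truthy is not verified here, but the isinstance guard gives the same result either way.

```python
class _NotGiven:
    """Hypothetical stand-in for a framework 'not given' sentinel."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


# A plain object defines no __bool__, so Python treats it as truthy.
NOT_GIVEN = _NotGiven()


def resolve_language_old(language, default):
    # Original expression: any truthy value wins, including a sentinel.
    return language or default


def resolve_language(language, default):
    # Guarded expression: only a real, non-empty string overrides the default.
    return language if isinstance(language, str) and language else default


for candidate in ("en", "", None, NOT_GIVEN):
    print(repr(candidate),
          "old:", repr(resolve_language_old(candidate, "de")),
          "new:", repr(resolve_language(candidate, "de")))
```

With the stand-in sentinel, the old expression hands the sentinel object on to the transcription call, while the guarded expression falls back to the configured default.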

Imports: only from livekit.agents import stt, utils (unchanged). Deliberately did not add imports of APIConnectOptions, NOT_GIVEN, NotGivenOr, or is_given — see CI compatibility note below.

The framework's public recognize() (which wraps _recognize_impl with retry + metrics + error eventing) is overridden in this subclass, so our recognize() runs the transcription body directly. The _recognize_impl delegate only runs when the framework calls it from some other entry point, and it routes back to our recognize().
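The call flow described above can be sketched with a stand-in base class. This is an illustration under stated assumptions, not the livekit-agents API: BaseSTT, OldPlugin, and FixedPlugin are hypothetical names, and the base class only mimics the shape of "public recognize() wraps an abstract _recognize_impl".

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BaseSTT(ABC):
    """Hypothetical stand-in for the framework base class (not the real API)."""

    @abstractmethod
    async def _recognize_impl(self, buffer: Any, *, language: Any = None,
                              conn_options: Any = None) -> str:
        ...

    async def recognize(self, buffer: Any, *, language: Any = None,
                        conn_options: Any = None) -> str:
        # In the real framework this wrapper is described as adding retry,
        # metrics, and error events; here it only forwards the call.
        return await self._recognize_impl(
            buffer, language=language, conn_options=conn_options)


class OldPlugin(BaseSTT):
    """Overrides only the public method, so the class stays abstract."""

    async def recognize(self, *, buffer: Any, language: Any = None,
                        **kwargs: Any) -> str:
        return f"transcribed {len(buffer)} bytes"


class FixedPlugin(OldPlugin):
    """Adds the delegate; recognize() never calls back, so no recursion."""

    async def _recognize_impl(self, buffer: Any, *, language: Any = None,
                              conn_options: Any = None) -> str:
        return await self.recognize(buffer=buffer, language=language)


try:
    OldPlugin()
except TypeError as exc:
    print("old plugin:", exc)

plugin = FixedPlugin()
print(asyncio.run(plugin.recognize(buffer=b"\x00" * 4)))
print(asyncio.run(plugin._recognize_impl(b"\x00" * 2, language="de")))
```

Both entry points end in the same body, which is why the delegate keeps the diff small: existing callers of recognize(buffer=...) are untouched, and a framework call into _recognize_impl reaches the same code.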

CI compatibility

voice/tests/conftest.py stubs livekit.agents as a types.ModuleType (with stt, utils, tts, llm set as attributes), NOT as a package. So:

  • from livekit.agents import stt works (attribute access on the stub) ✓
  • from livekit.agents.types import X fails with 'livekit.agents' is not a package ✗
  • from livekit.agents.utils import is_given fails for the same reason ✗

Type annotations on _recognize_impl use Any (already imported via from typing import Any). Because from __future__ import annotations defers annotation evaluation, even that import is not strictly needed at runtime, but Any is the most explicit and lint-friendly choice. Runtime behavior is unaffected.
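The stub behavior can be reproduced without livekit installed. The sketch below assumes nothing about the actual conftest.py beyond what is described above; the module name fakekit is hypothetical and chosen so the example never shadows a real livekit install.

```python
import importlib
import sys
import types

# Register plain modules (no __path__), the way a ModuleType-based stub would.
root = types.ModuleType("fakekit")
agents = types.ModuleType("fakekit.agents")
agents.stt = types.ModuleType("fakekit.agents.stt")  # attribute only
root.agents = agents
sys.modules["fakekit"] = root
sys.modules["fakekit.agents"] = agents

# Attribute-style import works: the import system finds fakekit.agents in
# sys.modules and then reads the 'stt' attribute from it.
from fakekit.agents import stt

# Submodule import fails: fakekit.agents has no __path__, so it is not a
# package and Python cannot search it for a 'types' submodule.
message = ""
try:
    importlib.import_module("fakekit.agents.types")
except ModuleNotFoundError as exc:
    message = str(exc)

print(stt.__name__)
print(message)
```

This is why the plugin restricts itself to from livekit.agents import stt, utils: any import that needs livekit.agents to behave as a package breaks under the stub.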

Why now (R-10 verification, 2026-04-25)

Discovered during the R-10 #670 E2E probe today. A Python participant joined room verify-2026-04-25 from inside the voice-agent container, published 3.8s of synthesized German audio, and held for 45 seconds. voice-agent dispatched a worker job in response — and the job crashed at main.py:133 stt = FasterWhisperSTT() with the TypeError above. No audio was processed, no response came back. All five services were (healthy) in docker compose ps because that status reflects framework-level worker registration, not job execution.

The 2026-04-24 bring-up missed this because it never exercised the agent job path. Direct-endpoint probes (Voxtral synthesis, Llama Guard inference, Caddy /healthz) bypass the agent entirely.

Cross-check (read-only, no code copied)

Verified the signature against:

  1. Our installed livekit-agents 1.5.6 in the running container (/usr/local/lib/python3.12/dist-packages/livekit/agents/stt/stt.py:170).
  2. livekit-plugins-openai/stt.py on livekit/agents main branch as a reference for the abstract method's positional-buffer signature and conn_options parameter.
  3. voice/tests/conftest.py to confirm what the test infra stubs vs. what's left non-stubbed (only stt/utils/tts/llm attribute access works; submodule imports don't).

The delegate-based approach was chosen specifically because it satisfies all three constraints simultaneously: framework runtime, test stub, and minimal diff.

Test plan

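  • CI: python -c "from custom_plugins.faster_whisper_stt import FasterWhisperSTT" does not raise, which proves the module still imports under the conftest stub. Python only enforces abstract methods at instantiation, so the import alone does not prove the class is instantiable; the E2E probe in the next item covers that.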
  • After image rebuild + redeploy on EC2: re-run R-10 E2E probe (/tmp/probe2_e2e.py). voice-agent's worker job no longer crashes at line 133. Logs show Transcription: …ms for …s audio → '…' from the plugin.
  • Direct verification: connect a participant, publish 3.8s of German audio, wait — voice-agent emits a TTS response audio track. R-10 Probe 2 passes.
  • Metrics emit shows model = /models/faster-whisper-large-v3 and provider = faster-whisper instead of unknown / unknown.
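A stricter static check than the import is possible with the standard library. The sketch below is a suggestion, not part of the PR's test plan: Base, Broken, and Fixed are hypothetical stand-ins, and assert_instantiable is an invented helper. Against the CI stub the base class may not be a real ABC, so a check like this is only meaningful where the real livekit-agents package is installed.

```python
import inspect
from abc import ABC, abstractmethod


class Base(ABC):
    """Hypothetical stand-in for a base class with one abstract method."""

    @abstractmethod
    def _recognize_impl(self):
        ...


class Broken(Base):
    # Overrides only the public method; the abstract one is still missing.
    def recognize(self):
        return "text"


class Fixed(Broken):
    # Implements the abstract method by delegating to recognize().
    def _recognize_impl(self):
        return self.recognize()


def assert_instantiable(cls) -> None:
    """Fail with the list of unimplemented abstract methods, if any."""
    missing = sorted(getattr(cls, "__abstractmethods__", ()))
    assert not inspect.isabstract(cls), f"{cls.__name__} is missing {missing}"


assert_instantiable(Fixed)  # passes silently

try:
    assert_instantiable(Broken)
except AssertionError as exc:
    print(exc)
```

Importing Broken succeeds here just as importing Fixed does, which is the reason an import-only check cannot distinguish the two.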

Rollout / reversibility

Reversible via revert. After merge, the voice-agent image must be rebuilt on EC2 (docker compose build voice-agent && docker compose up -d voice-agent) for the change to take effect at runtime. ~5 min downtime during the rebuild.

Out of scope

This PR fixes the bring-up gate only: the agent can now process audio E2E on the self-hosted stack. Subsequent integration work to make https://fragjulia.de/knotencheck reach this self-hosted agent (production env, CSP, token-endpoint anonymous mode for the Knotencheck route) is gated by the product release schedule and is outside the scope of the bring-up.
