dotfiles/flakes/stt_ime/README.md
RingOfStorms (Joshua Bell) 02f24bb524 stt test
2026-01-13 14:10:10 -06:00

stt_ime - Speech-to-Text Input Method for Fcitx5

Local, privacy-preserving speech-to-text that integrates as a native Fcitx5 input method.

Components

  • stt-stream: Rust CLI that captures audio, runs voice activity detection (VAD), and transcribes with Whisper
  • fcitx5-stt: C++ Fcitx5 addon that spawns stt-stream and commits text to apps

Modes

  • Manual: Press Ctrl+Space or Ctrl+R to start/stop recording
  • Oneshot: Automatically starts on speech, commits on silence, then resets
  • Continuous: Always listening, commits each utterance automatically

Press Ctrl+M while STT is active to cycle between modes.

Keys (when STT input method is active)

| Key | Action |
| --- | --- |
| Ctrl+Space / Ctrl+R | Toggle recording (manual mode) |
| Ctrl+M | Cycle mode (manual → oneshot → continuous) |
| Enter | Accept current preedit text |
| Escape | Cancel recording / clear preedit |
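
The Ctrl+M cycle can be pictured as a small wrap-around lookup. This is only an illustrative sketch, not the addon's code (the addon itself is C++). The names `MODES` and `next_mode` are hypothetical, and wrapping from continuous back to manual is an assumption based on the word "cycle".

```python
# Hypothetical sketch of the Ctrl+M mode cycle; not taken from fcitx5-stt.
MODES = ["manual", "oneshot", "continuous"]


def next_mode(current: str) -> str:
    """Return the mode after `current`, wrapping back to the first mode.

    The wrap-around (continuous -> manual) is an assumption.
    """
    index = MODES.index(current)  # raises ValueError for an unknown mode
    return MODES[(index + 1) % len(MODES)]
```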

Usage

NixOS Module

# In your host's flake.nix inputs:
stt_ime.url = "git+https://git.ros.one/josh/nixos-config?dir=flakes/stt_ime";

# In your NixOS config:
{
  imports = [ inputs.stt_ime.nixosModules.default ];

  ringofstorms.sttIme = {
    enable = true;
    model = "base.en";  # tiny, base, small, medium (add .en for English-only), or large-v3 (multilingual only)
    useGpu = false;     # set true for CUDA acceleration
  };
}

Standalone CLI

# Run with default settings (manual mode)
stt-stream

# Run in continuous mode
stt-stream --mode continuous

# Use a specific model
stt-stream --model small.en

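# Commands are read from stdin (manual mode).
# Note: each line below starts its own stt-stream process; to control one
# session, write the commands to the stdin of a single running process.
echo "start" | stt-stream    # begin recording
echo "stop" | stt-stream     # stop and transcribe
echo "cancel" | stt-stream   # cancel without transcribing
echo "shutdown" | stt-stream # exit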
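
To drive a single session from another program, the commands can be written to the stdin of one long-running process. The following is a minimal sketch. The command words come from the list above, but the one-command-per-line framing and the helper name `send_commands` are assumptions.

```python
import io
from typing import IO, Iterable

# Command words from the README; newline-terminated framing is an assumption.
COMMANDS = ("start", "stop", "cancel", "shutdown")


def send_commands(stdin: IO[str], commands: Iterable[str]) -> None:
    """Write each command on its own line and flush so it is seen promptly."""
    for command in commands:
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {command!r}")
        stdin.write(command + "\n")
        stdin.flush()


# Demonstration against an in-memory buffer instead of a real process.
buffer = io.StringIO()
send_commands(buffer, ["start", "stop", "shutdown"])
```

In practice `stdin` would be the pipe of a process started with something like `subprocess.Popen(["stt-stream"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)`, with the NDJSON events read from its stdout.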

Output Format (NDJSON)

{"type":"ready"}
{"type":"recording_started"}
{"type":"partial","text":"hello worl"}
{"type":"partial","text":"hello world"}
{"type":"final","text":"Hello world."}
{"type":"recording_stopped"}
{"type":"shutdown"}
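
A consumer of this stream could fold the events into committed text and a pending preedit. The sketch below is one way to do that, not how fcitx5-stt does it. It assumes each `partial` carries the full text so far (as the sample suggests) and that `final` replaces the preedit. The function name `consume_events` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def consume_events(lines: Iterable[str]) -> Tuple[List[str], str]:
    """Fold NDJSON events into (committed utterances, pending preedit)."""
    committed: List[str] = []
    preedit = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        kind = event.get("type")
        if kind == "partial":
            # Assumption: a partial holds the whole utterance so far.
            preedit = event.get("text", "")
        elif kind == "final":
            committed.append(event.get("text", ""))
            preedit = ""
        elif kind == "shutdown":
            break
        # ready / recording_started / recording_stopped carry no text.
    return committed, preedit


# The sample stream shown above.
SAMPLE = [
    '{"type":"ready"}',
    '{"type":"recording_started"}',
    '{"type":"partial","text":"hello worl"}',
    '{"type":"partial","text":"hello world"}',
    '{"type":"final","text":"Hello world."}',
    '{"type":"recording_stopped"}',
    '{"type":"shutdown"}',
]
```

Calling `consume_events(SAMPLE)` yields one committed utterance, `"Hello world."`, and an empty preedit.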

Models

Models are automatically downloaded from Hugging Face on first run and cached in ~/.cache/stt-stream/models/.

| Model | Size | Speed | Quality |
| --- | --- | --- | --- |
| tiny.en | ~75MB | Fastest | Basic |
| base.en | ~150MB | Fast | Good (default) |
| small.en | ~500MB | Medium | Better |
| medium.en | ~1.5GB | Slow | Great |
| large-v3 | ~3GB | Slowest | Best (multilingual) |

Environment Variables

  • STT_STREAM_MODEL_PATH: Path to a specific model file
  • STT_STREAM_MODEL: Model name (overridden by the --model CLI flag)
  • STT_STREAM_USE_GPU: Set to "1" for GPU acceleration
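
The precedence described above (CLI flag over STT_STREAM_MODEL, falling back to the default base.en) can be sketched as follows. This is an illustration only, since stt-stream itself is written in Rust. The helper names are hypothetical, and STT_STREAM_MODEL_PATH is left out because its precedence relative to the model name is not stated here.

```python
from typing import Mapping, Optional

DEFAULT_MODEL = "base.en"  # default per the Models table


def resolve_model(cli_model: Optional[str], env: Mapping[str, str]) -> str:
    """Pick the model name: CLI value first, then STT_STREAM_MODEL, then default."""
    if cli_model:
        return cli_model
    return env.get("STT_STREAM_MODEL") or DEFAULT_MODEL


def gpu_enabled(env: Mapping[str, str]) -> bool:
    """GPU acceleration is requested only when STT_STREAM_USE_GPU is exactly "1"."""
    return env.get("STT_STREAM_USE_GPU") == "1"
```

Passing `os.environ` as `env` would apply the same lookup to the real environment.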

Building

cd flakes/stt_ime
nix build .#stt-stream    # Rust CLI only
nix build .#fcitx5-stt    # Fcitx5 addon (includes stt-stream)
nix build                  # Default: fcitx5-stt

Integration with de_plasma

The addon is automatically added to Fcitx5 when ringofstorms.sttIme.enable = true. It appears as "Speech to Text" (STT) in the input method switcher alongside US and Mozc.