2.9 KiB
2.9 KiB
stt_ime - Speech-to-Text Input Method for Fcitx5
Local, privacy-preserving speech-to-text that integrates as a native Fcitx5 input method.
Components
- stt-stream: Rust CLI that captures audio, runs VAD, and transcribes with Whisper
- fcitx5-stt: C++ Fcitx5 addon that spawns stt-stream and commits text to apps
Modes
- Manual: Press
Ctrl+SpaceorCtrl+Rto start/stop recording - Oneshot: Automatically starts on speech, commits on silence, then resets
- Continuous: Always listening, commits each utterance automatically
Press Ctrl+M while STT is active to cycle between modes.
Keys (when STT input method is active)
| Key | Action |
|---|---|
Ctrl+Space / Ctrl+R |
Toggle recording (manual mode) |
Ctrl+M |
Cycle mode (manual → oneshot → continuous) |
Enter |
Accept current preedit text |
Escape |
Cancel recording / clear preedit |
Usage
NixOS Module
# In your host's flake.nix inputs:
stt_ime.url = "git+https://git.ros.one/josh/nixos-config?dir=flakes/stt_ime";
# In your NixOS config:
{
imports = [ inputs.stt_ime.nixosModules.default ];
ringofstorms.sttIme = {
enable = true;
model = "base.en"; # tiny, base, small, medium, large-v3 (add .en for English-only)
useGpu = false; # set true for CUDA acceleration
};
}
Standalone CLI
# Run with default settings (manual mode)
stt-stream
# Run in continuous mode
stt-stream --mode continuous
# Use a specific model
stt-stream --model small-en
# Commands via stdin (manual mode):
echo "start" | stt-stream # begin recording
echo "stop" | stt-stream # stop and transcribe
echo "cancel" | stt-stream # cancel without transcribing
echo "shutdown" | stt-stream # exit
Output Format (NDJSON)
{"type":"ready"}
{"type":"recording_started"}
{"type":"partial","text":"hello worl"}
{"type":"partial","text":"hello world"}
{"type":"final","text":"Hello world."}
{"type":"recording_stopped"}
{"type":"shutdown"}
Models
Models are automatically downloaded from Hugging Face on first run and cached in ~/.cache/stt-stream/models/.
| Model | Size | Speed | Quality |
|---|---|---|---|
| tiny.en | ~75MB | Fastest | Basic |
| base.en | ~150MB | Fast | Good (default) |
| small.en | ~500MB | Medium | Better |
| medium.en | ~1.5GB | Slow | Great |
| large-v3 | ~3GB | Slowest | Best (multilingual) |
Environment Variables
STT_STREAM_MODEL_PATH: Path to a specific model fileSTT_STREAM_MODEL: Model name (overridden by CLI)STT_STREAM_USE_GPU: Set to "1" for GPU acceleration
Building
cd flakes/stt_ime
nix build .#stt-stream # Rust CLI only
nix build .#fcitx5-stt # Fcitx5 addon (includes stt-stream)
nix build # Default: fcitx5-stt
Integration with de_plasma
The addon is automatically added to Fcitx5 when ringofstorms.sttIme.enable = true.
It appears as "Speech to Text" (STT) in the input method switcher alongside US and Mozc.