stt test
This commit is contained in:
parent
c408693861
commit
02f24bb524
15 changed files with 4184 additions and 15 deletions
108
flakes/stt_ime/README.md
Normal file
108
flakes/stt_ime/README.md
Normal file
|
|
@ -0,0 +1,108 @@
|
|||
# stt_ime - Speech-to-Text Input Method for Fcitx5
|
||||
|
||||
Local, privacy-preserving speech-to-text that integrates as a native Fcitx5 input method.
|
||||
|
||||
## Components
|
||||
|
||||
- **stt-stream**: Rust CLI that captures audio, runs VAD, and transcribes with Whisper
|
||||
- **fcitx5-stt**: C++ Fcitx5 addon that spawns stt-stream and commits text to apps
|
||||
|
||||
## Modes
|
||||
|
||||
- **Manual**: Press `Ctrl+Space` or `Ctrl+R` to start/stop recording
|
||||
- **Oneshot**: Automatically starts on speech, commits on silence, then resets
|
||||
- **Continuous**: Always listening, commits each utterance automatically
|
||||
|
||||
Press `Ctrl+M` while STT is active to cycle between modes.
|
||||
|
||||
## Keys (when STT input method is active)
|
||||
|
||||
| Key | Action |
|
||||
|-----|--------|
|
||||
| `Ctrl+Space` / `Ctrl+R` | Toggle recording (manual mode) |
|
||||
| `Ctrl+M` | Cycle mode (manual → oneshot → continuous) |
|
||||
| `Enter` | Accept current preedit text |
|
||||
| `Escape` | Cancel recording / clear preedit |
|
||||
|
||||
## Usage
|
||||
|
||||
### NixOS Module
|
||||
|
||||
```nix
|
||||
# In your host's flake.nix inputs:
|
||||
stt_ime.url = "git+https://git.ros.one/josh/nixos-config?dir=flakes/stt_ime";
|
||||
|
||||
# In your NixOS config:
|
||||
{
|
||||
imports = [ inputs.stt_ime.nixosModules.default ];
|
||||
|
||||
ringofstorms.sttIme = {
|
||||
enable = true;
|
||||
model = "base.en"; # tiny, base, small, medium, large-v3 (add .en for English-only)
|
||||
useGpu = false; # set true for CUDA acceleration
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
### Standalone CLI
|
||||
|
||||
```bash
|
||||
# Run with default settings (manual mode)
|
||||
stt-stream
|
||||
|
||||
# Run in continuous mode
|
||||
stt-stream --mode continuous
|
||||
|
||||
# Use a specific model
|
||||
stt-stream --model small-en
|
||||
|
||||
# Commands via stdin (manual mode):
|
||||
echo "start" | stt-stream # begin recording
|
||||
echo "stop" | stt-stream # stop and transcribe
|
||||
echo "cancel" | stt-stream # cancel without transcribing
|
||||
echo "shutdown" | stt-stream # exit
|
||||
```
|
||||
|
||||
### Output Format (NDJSON)
|
||||
|
||||
```json
|
||||
{"type":"ready"}
|
||||
{"type":"recording_started"}
|
||||
{"type":"partial","text":"hello worl"}
|
||||
{"type":"partial","text":"hello world"}
|
||||
{"type":"final","text":"Hello world."}
|
||||
{"type":"recording_stopped"}
|
||||
{"type":"shutdown"}
|
||||
```
|
||||
|
||||
## Models
|
||||
|
||||
Models are automatically downloaded from Hugging Face on first run and cached in `~/.cache/stt-stream/models/`.
|
||||
|
||||
| Model | Size | Speed | Quality |
|
||||
|-------|------|-------|---------|
|
||||
| tiny.en | ~75MB | Fastest | Basic |
|
||||
| base.en | ~150MB | Fast | Good (default) |
|
||||
| small.en | ~500MB | Medium | Better |
|
||||
| medium.en | ~1.5GB | Slow | Great |
|
||||
| large-v3 | ~3GB | Slowest | Best (multilingual) |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
- `STT_STREAM_MODEL_PATH`: Path to a specific model file
|
||||
- `STT_STREAM_MODEL`: Model name (overridden by CLI)
|
||||
- `STT_STREAM_USE_GPU`: Set to "1" for GPU acceleration
|
||||
|
||||
## Building
|
||||
|
||||
```bash
|
||||
cd flakes/stt_ime
|
||||
nix build .#stt-stream # Rust CLI only
|
||||
nix build .#fcitx5-stt # Fcitx5 addon (includes stt-stream)
|
||||
nix build # Default: fcitx5-stt
|
||||
```
|
||||
|
||||
## Integration with de_plasma
|
||||
|
||||
The addon is automatically added to Fcitx5 when `ringofstorms.sttIme.enable = true`.
|
||||
It appears as "Speech to Text" (STT) in the input method switcher alongside US and Mozc.
|
||||
Loading…
Add table
Add a link
Reference in a new issue