Real-time Speech Recognition with Vosk and Zig
This project implements a minimal real-time speech-to-text application using Vosk and Zig.
Audio Device Configuration
The application uses ALSA's default device, which is configured in alsa.conf. To use a different audio device:
- Find your audio devices: aplay -l or arecord -l
- Edit alsa.conf and update the pcm.!default section:
  pcm.!default {
      type hw
      card 3      # Change to your card number
      device 0    # Change to your device number
  }
- Rebuild and run the application
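As a rough sketch of the edit step, the helper below prints a pcm.!default stanza for a given card and device number. The function name print_pcm_default is hypothetical and not part of this repository; the numbers are assumed to come from the aplay -l or arecord -l output.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print a pcm.!default
# stanza for the given ALSA card and device numbers.
print_pcm_default() {
    card="$1"
    device="$2"
    printf 'pcm.!default {\n'
    printf '    type hw\n'
    printf '    card %s\n' "$card"
    printf '    device %s\n' "$device"
    printf '}\n'
}
```

Calling print_pcm_default 3 0 prints a stanza for card 3, device 0, which can then be pasted into alsa.conf by hand.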
Prerequisites
- Zig 0.15.1 (configured via mise)
- Nix development environment configured for ALSA and audio libraries
- patchelf (for fixing the ELF NEEDED entries in release builds): nix-env -iA nixpkgs.patchelf
Vosk Model Download
The application uses the Vosk small English model for speech recognition:
- Source: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- Size: ~50MB
- Language: English only
- Accuracy: Good for simple sentences and commands
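One possible way to fetch the model is sketched below. It assumes curl and unzip are available. The function names model_name and fetch_model are hypothetical, and the directory in which the application expects the model is not stated here, so the destination is left as a parameter.

```shell
#!/bin/sh
# URL taken from the model details listed above.
MODEL_URL="https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip"

# Derive the archive's base name (without .zip) from a URL.
model_name() {
    basename "$1" .zip
}

# Download the archive and unpack it into the given directory
# (default: current directory). Assumption: the location the
# application reads the model from is not documented here.
fetch_model() {
    dest="${1:-.}"
    archive="$dest/$(basename "$MODEL_URL")"
    curl -fL -o "$archive" "$MODEL_URL" && unzip -q -o "$archive" -d "$dest"
}
```

Running fetch_model with no argument downloads and unpacks the archive in the current directory; move the extracted directory to wherever the application looks for it.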
Installation Steps
- Enter nix development environment: nix develop
- Build application: zig build
- Run: zig build run
Release Builds and Portability
When building in release mode (-Doptimize=ReleaseSafe), Zig embeds the full path to libvosk.so in the ELF NEEDED entries, making the binary non-portable. The build system automatically fixes this by running fix_needed.sh, which uses patchelf to replace the full path with just the library name.
Automatic fix: Run zig build -Doptimize=ReleaseSafe; the NEEDED entries are fixed as part of the build.
Manual fix: If needed, you can run ./fix_needed.sh [binary_path] [library_name] manually.
The script uses patchelf (via nix-shell if not installed) to replace entries like:
- Before: NEEDED: [/home/user/.cache/zig/.../libvosk.so]
- After: NEEDED: [libvosk.so]
This makes the binary portable while using the existing RPATH ($ORIGIN/../lib) to find the library at runtime.
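The NEEDED rewrite described above could look roughly like the following. This is an illustrative sketch, not the contents of fix_needed.sh. The function names are hypothetical, and patchelf is assumed to be on PATH.

```shell
#!/bin/sh
# Sketch only: this is NOT the repository's fix_needed.sh.

# Reduce a NEEDED entry to its bare library file name.
needed_basename() {
    basename "$1"
}

# Replace every NEEDED entry that contains a path and whose file name
# matches the given library with just the library name.
# Usage: fix_needed_sketch <binary_path> <library_name>
fix_needed_sketch() {
    binary="$1"
    lib="$2"
    # Read all entries first so the binary is not modified mid-listing.
    entries="$(patchelf --print-needed "$binary")"
    printf '%s\n' "$entries" | while IFS= read -r entry; do
        case "$entry" in
            */*)
                if [ "$(needed_basename "$entry")" = "$lib" ]; then
                    patchelf --replace-needed "$entry" "$lib" "$binary"
                fi
                ;;
        esac
    done
}
```

After a rewrite like this, the result can be checked with patchelf --print-needed or readelf -d on the binary.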
Usage
The application will:
- Initialize audio capture from default microphone
- Load the Vosk speech recognition model
- Process audio in real-time
- Output recognized text to terminal
- Exit on Ctrl+C
Dependencies
- Vosk C API library
- ALSA for audio capture
Notes
Vosk tends to misrecognize "light" as "lake" or "like".