# Real-time Speech Recognition with Vosk and Zig

This project implements a minimal real-time speech-to-text application using Vosk and Zig.

## Audio Device Configuration

The application uses ALSA's default device, which is configured in `alsa.conf`. To use a different audio device:

1. Find your audio devices: `aplay -l` or `arecord -l`
2. Edit `alsa.conf` and update the `pcm.!default` section:
   ```
   pcm.!default {
       type hw
       card 3      # Change to your card number
       device 0    # Change to your device number
   }
   ```
3. Rebuild and run the application

### Prerequisites

- Zig 0.15.1 (configured via mise)
- Nix development environment configured for ALSA and the audio libraries
- patchelf (for fixing the NEEDED entries in release builds): `nix-env -iA nixpkgs.patchelf`

### Vosk Model Download

The application uses the Vosk small English model for speech recognition:

- **Source**: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- **Size**: ~50MB
- **Language**: English only
- **Accuracy**: Good for simple sentences and commands

### Installation Steps

1. Enter the Nix development environment: `nix develop`
2. Build the application: `zig build`
3. Run it: `zig build run`

### Release Builds and Portability

When building in release mode (`-Doptimize=ReleaseSafe`), Zig embeds the full path to `libvosk.so` in the ELF NEEDED entries, which makes the binary non-portable. The build system fixes this automatically by running `fix_needed.sh`, which uses `patchelf` to replace the full path with just the library name.

**Automatic fix**: Run `zig build -Doptimize=ReleaseSafe`. The NEEDED entries are fixed as part of the build.

**Manual fix**: If needed, run `./fix_needed.sh [binary_path] [library_name]` yourself.

The script uses `patchelf` (via nix-shell if it is not installed) to replace entries like:

- Before: `NEEDED: [/home/user/.cache/zig/.../libvosk.so]`
- After: `NEEDED: [libvosk.so]`

This makes the binary portable, and the existing RPATH (`$ORIGIN/../lib`) is still used to find the library at runtime.

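
The NEEDED rewrite described above can be sketched roughly as follows. This is not the project's `fix_needed.sh`, whose contents are not shown here: the function names `needed_basename` and `fix_needed` are hypothetical, and unlike the real script this sketch takes only a binary path and rewrites every NEEDED entry that contains a directory. It relies only on patchelf's `--print-needed` and `--replace-needed` options.

```sh
#!/bin/sh
# Illustrative sketch only; the project's fix_needed.sh may work differently.

# Print the bare file name of a NEEDED entry (strip any directory part).
needed_basename() {
    printf '%s\n' "${1##*/}"
}

# Rewrite every NEEDED entry that contains a '/' to its bare library name.
# Usage: fix_needed <binary_path>
fix_needed() {
    binary=$1
    # Read the full list first so the binary is not modified while being listed.
    entries=$(patchelf --print-needed "$binary") || return 1
    printf '%s\n' "$entries" | while IFS= read -r entry; do
        case $entry in
            */*)
                patchelf --replace-needed "$entry" "$(needed_basename "$entry")" "$binary"
                ;;
        esac
    done
}
```

For example, after `fix_needed zig-out/bin/myapp` (the binary path is hypothetical), `patchelf --print-needed zig-out/bin/myapp` should list `libvosk.so` without a directory prefix. Entries that are already bare names, such as `libc.so.6`, are left untouched.
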
## Usage

The application will:

- Initialize audio capture from the default microphone
- Load the Vosk speech recognition model
- Process audio in real time
- Output recognized text to the terminal
- Exit on Ctrl+C

## Dependencies

- Vosk C API library
- ALSA for audio capture

## Notes

Vosk tends to misrecognize "light" as "lake" or "like".