# Real-time Speech Recognition with Vosk and Zig

This project implements a minimal real-time speech-to-text application using Vosk and Zig.

## Audio Device Configuration

The application captures audio from ALSA's default device, which is configured in `alsa.conf`. To use a different audio device:

1. Find your audio devices: `arecord -l` lists capture devices (microphones), `aplay -l` lists playback devices.
2. Edit `alsa.conf` and update the `pcm.!default` section:
   ```
   pcm.!default {
       type hw
       card 3     # Change to your card number
       device 0   # Change to your device number
   }
   ```
3. Rebuild and run the application.

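
The card and device numbers for the `pcm.!default` section can be read off the `arecord -l` listing mechanically. The following is a minimal shell sketch, not part of the project: `list_capture_devices` and `print_default_pcm` are hypothetical helper names, and the `sed` pattern assumes the usual `card N: ..., device M: ...` line format that `arecord -l` prints.

```shell
# Hypothetical helpers, not part of this repository.

# Read `arecord -l` output on stdin and print "card device" pairs, one per line.
# Assumes lines of the form: "card 3: Name [Desc], device 0: Name [Desc]".
list_capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/\1 \2/p'
}

# Print a pcm.!default stanza for the given card ($1) and device ($2).
print_default_pcm() {
    printf 'pcm.!default {\n    type hw\n    card %s\n    device %s\n}\n' "$1" "$2"
}
```

For example, `arecord -l | list_capture_devices` prints one `card device` pair per line, and `print_default_pcm 3 0` prints a stanza that can be pasted into `alsa.conf`.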
### Prerequisites

- Zig 0.15.1 (configured via mise)
- Nix development environment configured for ALSA and audio libraries
- patchelf (for fixing the ELF NEEDED entries in release builds): `nix-env -iA nixpkgs.patchelf`

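
A quick way to confirm the tools are available before building is to check for them on `PATH`. This is a small sketch under assumptions: `missing_tools` is a hypothetical helper, and the tool names passed to it (`zig`, `patchelf`, `arecord`) are taken from this README rather than from the build scripts.

```shell
# Hypothetical helper, not part of this repository.
# Print each named tool that is not found on PATH; print nothing if all exist.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools zig patchelf arecord` inside `nix develop` should print nothing when everything is installed; `zig version` then shows whether the Zig version matches 0.15.1.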
### Vosk Model Download

The application uses the Vosk small English model for speech recognition:

- **Source**: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- **Size**: ~50 MB
- **Language**: English only
- **Accuracy**: Good for simple sentences and commands

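
Fetching and unpacking the model can be scripted roughly as follows. This is a sketch, not the project's own setup step: `model_name` and `fetch_model` are hypothetical helpers, the destination directory is a placeholder because this README does not say where the application looks for the model, and the archive is assumed to unpack into a directory named after the zip file.

```shell
# Hypothetical helpers, not part of this repository.
MODEL_URL="https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip"

# Derive the model directory name from the archive URL
# (assumes the zip unpacks into a directory with the same base name).
model_name() {
    basename "$1" .zip
}

# Download the archive into directory $1, unpack it there, and remove the zip.
fetch_model() {
    dest="$1"
    mkdir -p "$dest" &&
        curl -L -o "$dest/model.zip" "$MODEL_URL" &&
        unzip -q "$dest/model.zip" -d "$dest" &&
        rm "$dest/model.zip"
}
```

With these definitions, `fetch_model models` would leave the model under `models/$(model_name "$MODEL_URL")`; adjust the directory to wherever the application expects it.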
### Installation Steps

1. Enter the Nix development environment: `nix develop`
2. Build the application: `zig build`
3. Run it: `zig build run`

### Release Builds and Portability

When building in release mode (`-Doptimize=ReleaseSafe`), Zig embeds the full path to `libvosk.so` in the ELF NEEDED entries, which makes the binary non-portable. The build system fixes this automatically by running `fix_needed.sh`, which uses `patchelf` to replace the full path with just the library name.

**Automatic fix**: Run `zig build -Doptimize=ReleaseSafe`; the NEEDED entries are fixed as part of the build.

**Manual fix**: If needed, run `./fix_needed.sh [binary_path] [library_name]` yourself.

The script uses `patchelf` (via nix-shell if it is not installed) to replace entries like:

- Before: `NEEDED: [/home/user/.cache/zig/.../libvosk.so]`
- After: `NEEDED: [libvosk.so]`

This makes the binary portable, while the existing RPATH (`$ORIGIN/../lib`) is still used to find the library at runtime.

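
To check whether a binary still carries a full path in its NEEDED entries, the dynamic section can be inspected with `readelf -d`. The filter below is a minimal sketch: `absolute_needed` is a hypothetical helper (it is not `fix_needed.sh`), and it assumes the `(NEEDED) ... Shared library: [name]` line format that GNU `readelf` prints.

```shell
# Hypothetical helper, not part of this repository.
# Read `readelf -d` output on stdin and print every NEEDED entry whose
# library name contains a directory component (the non-portable ones).
absolute_needed() {
    sed -n 's|.*(NEEDED).*\[\(.*/.*\)\].*|\1|p'
}
```

For example, `readelf -d path/to/binary | absolute_needed` prints nothing once the build has fixed the entries. A single entry can also be rewritten by hand with `patchelf --replace-needed OLD NEW path/to/binary`; whether `fix_needed.sh` uses that exact flag is an assumption.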
## Usage

The application will:

- Initialize audio capture from the default microphone
- Load the Vosk speech recognition model
- Process audio in real time
- Output recognized text to the terminal
- Exit on Ctrl+C

## Dependencies

- Vosk C API library
- ALSA for audio capture

## Notes

Vosk tends to misrecognize "light" as "lake" or "like".