Live microphone speech to text using vosk
Real-time Speech Recognition with Vosk and Zig
This project implements a minimal real-time speech-to-text application using Vosk and Zig.
Audio Device Configuration
The application uses ALSA's default device, which is configured in alsa.conf. To use a different audio device:
- Find your audio devices with aplay -l (playback) or arecord -l (capture).
- Edit alsa.conf and update the pcm.!default section:

    pcm.!default {
        type hw
        card 3    # Change to your card number
        device 0  # Change to your device number
    }

- Rebuild and run the application.
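To make the card and device lookup concrete, here is a minimal Python sketch that picks the numbers out of an arecord -l listing and prints a matching pcm.!default block. The helper names (list_devices, default_pcm_block) and the sample listing are hypothetical and not part of this project. The regular expression assumes the English "card N: ..., device M: ..." line format that arecord normally prints.

```python
import re

# Matches lines such as:
#   card 3: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): ")


def list_devices(listing: str) -> list[tuple[int, int, str]]:
    """Return (card, device, full line) for every device line in the listing."""
    devices = []
    for line in listing.splitlines():
        match = CARD_LINE.match(line)
        if match:
            devices.append((int(match.group(1)), int(match.group(2)), line.strip()))
    return devices


def default_pcm_block(card: int, device: int) -> str:
    """Build the pcm.!default section for alsa.conf."""
    return (
        "pcm.!default {\n"
        "    type hw\n"
        f"    card {card}\n"
        f"    device {device}\n"
        "}\n"
    )


# Made-up sample listing for illustration; real output depends on your hardware.
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC3246 Analog [ALC3246 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 3: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

for card, device, description in list_devices(SAMPLE):
    print(card, device, description)
print(default_pcm_block(3, 0))
```

To use it on a real machine, replace SAMPLE with the captured output of arecord -l and paste the printed block into alsa.conf.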
Prerequisites
- Zig 0.15.1 (configured via mise)
- Nix development environment configured for ALSA and audio libraries
Vosk Model Download
The application uses the Vosk small English model for speech recognition:
- Source: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- Size: ~50MB
- Language: English only
- Accuracy: Good for simple sentences and commands
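If the model needs to be fetched by hand, the download could be scripted along these lines. This is a sketch under assumptions: fetch_model and model_dir_name are hypothetical helpers, the archive is assumed to unpack into a directory named after the zip file, and where the application expects to find the model is not stated here, so the destination directory is left as a parameter.

```python
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse

MODEL_URL = "https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip"


def model_dir_name(url: str) -> str:
    """Return the archive file name without its .zip suffix."""
    return Path(urlparse(url).path).stem


def fetch_model(url: str = MODEL_URL, dest: Path = Path(".")) -> Path:
    """Download the archive into dest (skipped if already present) and unpack it."""
    archive = dest / Path(urlparse(url).path).name
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Assumption: the zip contains a single top-level directory with this name.
    return dest / model_dir_name(url)
```

Calling fetch_model() downloads roughly 50MB into the current directory and returns the path of the unpacked model.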
Installation Steps
- Enter the Nix development environment:
    nix develop
- Build the application:
    zig build
- Run it:
    zig build run
Usage
The application will:
- Initialize audio capture from the default microphone
- Load the Vosk speech recognition model
- Process audio in real time
- Output recognized text to the terminal
- Exit on Ctrl+C
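The steps above can be sketched in Python for readers who want to see the flow in one place. This is not the project's Zig implementation. It assumes the third-party vosk Python package is installed, that arecord is on the PATH and reads from the default ALSA capture device, and that the model directory is named vosk-model-small-en-us-0.15; the 16 kHz mono sample rate and the 4000-byte chunk size are illustrative choices, not values taken from this project.

```python
import json
import subprocess

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def text_from_result(result_json: str) -> str:
    """Extract the recognized text from a Vosk result JSON string."""
    return json.loads(result_json).get("text", "")


def run(model_path: str = "vosk-model-small-en-us-0.15") -> None:
    # Imported here so the helper above works without the vosk package installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    recognizer = KaldiRecognizer(model, SAMPLE_RATE)

    # Capture raw 16-bit little-endian mono PCM from the default device.
    capture = subprocess.Popen(
        ["arecord", "-q", "-f", "S16_LE", "-r", str(SAMPLE_RATE), "-c", "1", "-t", "raw"],
        stdout=subprocess.PIPE,
    )
    try:
        while True:
            chunk = capture.stdout.read(4000)
            if not chunk:
                break
            # AcceptWaveform returns a truthy value when an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                text = text_from_result(recognizer.Result())
                if text:
                    print(text)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the loop
    finally:
        capture.terminate()
        capture.wait()
        # Flush whatever was recognized before the interrupt.
        print(text_from_result(recognizer.FinalResult()))
```

Calling run() starts the capture and prints one line per completed utterance until Ctrl+C is pressed.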
Dependencies
- Vosk C API library
- ALSA for audio capture
Notes
Vosk tends to misrecognize "light" as "lake" or "like".