# Real-time Speech Recognition with Vosk and Zig

This project implements a minimal real-time speech-to-text application using Vosk and Zig.

## Setup

### Prerequisites
- Zig 0.15.1 (configured via mise)
- Nix development environment with C compilation tools, ALSA, and audio libraries
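
The Zig prerequisite can be checked before building. The following is a minimal sketch, assuming `zig` is on `PATH` once the environment is set up. The helper name `zig_version_ok` is hypothetical and not part of this project.

```shell
#!/bin/sh
# Version required by the prerequisites list above.
EXPECTED_ZIG="0.15.1"

# Hypothetical helper: succeeds when the given version string matches.
# $1 is the version to check, e.g. the output of `zig version`.
zig_version_ok() {
    [ "$1" = "$EXPECTED_ZIG" ]
}

# Only query the toolchain when zig is actually on PATH.
if command -v zig >/dev/null 2>&1; then
    if zig_version_ok "$(zig version)"; then
        echo "zig $EXPECTED_ZIG found"
    else
        echo "expected zig $EXPECTED_ZIG, found $(zig version)" >&2
    fi
else
    echo "zig not found; set up the toolchain (for example with mise) first" >&2
fi
```

The check only reports a mismatch; it does not install anything, so it is safe to run at any point during setup.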

### Vosk Model Download
The application uses the Vosk small English model for speech recognition:
- **Source**: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- **Size**: ~50 MB
- **Language**: English only
- **Accuracy**: Good for simple sentences and commands

### Installation Steps
1. Enter the Nix development environment: `nix develop`
2. Download the Vosk model: `wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip`
3. Extract the model: `unzip vosk-model-small-en-us-0.15.zip`
4. Build the application: `zig build`
5. Run: `./zig-out/bin/stt`
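
Steps 2 to 5 can be collected into a small script so that repeated runs do not download the model again. This is a sketch rather than part of the project: the function names `fetch_model` and `build_and_run` are hypothetical, and it assumes the archive unpacks into a directory named after the zip file. Where the application looks for the model is not stated here, so the sketch extracts into the current directory, as the steps above do.

```shell
#!/bin/sh
# Run inside the Nix development environment (`nix develop`).
MODEL_URL="https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip"
MODEL_ZIP="${MODEL_URL##*/}"   # file name part of the URL
MODEL_DIR="${MODEL_ZIP%.zip}"  # assumed name of the extracted directory

# Hypothetical helper: download and extract the model unless it is already there.
fetch_model() {
    if [ -d "$MODEL_DIR" ]; then
        echo "model already present: $MODEL_DIR"
        return 0
    fi
    # Extract only if the download succeeded.
    wget "$MODEL_URL" && unzip "$MODEL_ZIP"
}

# Hypothetical helper: build, then start the application if the build succeeded.
build_and_run() {
    zig build && ./zig-out/bin/stt
}
```

From the project root, `fetch_model && build_and_run` then performs the remaining steps in order and stops at the first failure.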

## Usage
The application will:
- Initialize audio capture from the default microphone
- Load the Vosk speech recognition model
- Process audio in real time
- Output recognized text to the terminal
- Exit on Ctrl+C

## Dependencies
- Vosk C API library
- ALSA for audio capture
- Standard C libraries for audio processing