lobo/stt

Live microphone speech to text using vosk


Latest commit: 885283e486 by Emil Lerch, 2025-10-23 14:15:28 -07:00: add mutex locks around modifications to child process array (CI: Generic zig build / build (push) successful in 1m4s)
.gitea/workflows	add ci	2025-10-09 12:08:01 -07:00
src	add mutex locks around modifications to child process array	2025-10-23 14:15:28 -07:00
.gitignore	set vosk initializer sample rate to actual hardware/remove resample	2025-10-06 13:01:22 -07:00
.mise.toml	working sample without nix needed	2025-09-09 19:02:07 -07:00
.pre-commit-config.yaml	add structural aspects to repo	2025-09-09 14:08:57 -07:00
alsa-example.conf	restructure alsa.conf	2025-10-06 12:44:55 -07:00
build.zig	fix issues with script not being run every time	2025-10-09 09:29:18 -07:00
build.zig.zon	support linux arm64, riscv64, and x86_64	2025-09-09 20:14:49 -07:00
fix_needed.sh	fix release mode link path in final elf output	2025-10-09 08:53:19 -07:00
LICENSE	AI generated first pass	2025-09-09 10:15:16 -07:00
README.md	fix release mode link path in final elf output	2025-10-09 08:53:19 -07:00
stt.service	add example service file	2025-10-09 08:54:12 -07:00

README.md

Real-time Speech Recognition with Vosk and Zig

This project implements a minimal real-time speech-to-text application using Vosk and Zig.

Audio Device Configuration

The application uses ALSA's default device, which is configured in alsa.conf. To use a different audio device:

Find your audio devices: arecord -l lists capture devices (microphones); aplay -l lists playback devices

Edit alsa.conf and update the pcm.!default section:

pcm.!default {
    type hw
    card 3      # Change to your card number
    device 0    # Change to your device number
}

Rebuild and run the application
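
The device lookup in the first step can be scripted. The following is a minimal sketch, assuming the usual arecord -l line format ("card N: ..., device M: ..."). The function name alsa_default_stanza is hypothetical and not part of this repository. It reads arecord -l output on stdin and prints a pcm.!default stanza for the first capture device it finds.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): read `arecord -l` output on
# stdin and print a pcm.!default stanza for the first capture device listed.
alsa_default_stanza() {
    # Device lines are assumed to look like:
    #   card 3: Device [USB Audio Device], device 0: USB Audio [USB Audio]
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/\1 \2/p' |
        head -n 1 |
        while read -r card device; do
            printf 'pcm.!default {\n    type hw\n    card %s\n    device %s\n}\n' \
                "$card" "$device"
        done
}
```

Running arecord -l | alsa_default_stanza prints a stanza that can be pasted into alsa.conf. If the machine has several capture devices, the first one listed may not be the microphone you want, so check the card and device numbers before rebuilding.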

Prerequisites

Zig 0.15.1 (configured via mise)
Nix development environment configured for ALSA and audio libraries
patchelf (for fixing RPATH in release builds): nix-env -iA nixpkgs.patchelf

Vosk Model Download

The application uses the Vosk small English model for speech recognition:

Source: https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
Size: ~50MB
Language: English only
Accuracy: Good for simple sentences and commands
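
Fetching the model can be scripted along these lines. This is a sketch, not the project's own tooling: the function name fetch_vosk_model and its destination-directory argument are hypothetical. Where the application expects the unpacked model to live is not stated here, so check the source before relying on a location. The sketch assumes curl and unzip are installed.

```shell
#!/bin/sh
# URL taken from the "Source" line above.
MODEL_NAME="vosk-model-small-en-us-0.15"
MODEL_URL="https://alphacephei.com/vosk/models/${MODEL_NAME}.zip"

# Hypothetical helper: download the model archive and unpack it into the
# directory given as $1 (defaults to the current directory). The directory
# the application actually loads the model from is an assumption to verify.
fetch_vosk_model() {
    dest="${1:-.}"
    mkdir -p "$dest" &&
        curl -fL -o "$dest/$MODEL_NAME.zip" "$MODEL_URL" &&
        unzip -q -o "$dest/$MODEL_NAME.zip" -d "$dest"
}
```

Calling fetch_vosk_model with no argument downloads and unpacks the archive in the current directory. The function is only defined here, not invoked, so sourcing the snippet does not touch the network.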

Installation Steps

Enter nix development environment: nix develop
Build application: zig build
Run: zig build run

Release Builds and Portability

When building in release mode (-Doptimize=ReleaseSafe), Zig embeds the full path to libvosk.so in the ELF NEEDED entries, which makes the binary non-portable. The build system fixes this automatically by running fix_needed.sh, which uses patchelf to replace the full path with just the library name.

Automatic fix: Run zig build -Doptimize=ReleaseSafe; the NEEDED entries are fixed as part of the build.

Manual fix: If needed, you can run ./fix_needed.sh [binary_path] [library_name] manually.

The script uses patchelf (via nix-shell if not installed) to replace entries like:

Before: NEEDED: [/home/user/.cache/zig/.../libvosk.so]
After: NEEDED: [libvosk.so]

This makes the binary portable while using the existing RPATH ($ORIGIN/../lib) to find the library at runtime.
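
To see what the fix amounts to, the NEEDED entries can be inspected and rewritten by hand. The sketch below is not the contents of fix_needed.sh. It only illustrates the idea using patchelf's --print-needed and --replace-needed options. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read NEEDED entries (one per line, as printed by
# `patchelf --print-needed`) on stdin and print only those containing a
# directory component, i.e. the non-portable ones.
absolute_needed() {
    grep '/' || true
}

# Hypothetical helper: replace every path-qualified NEEDED entry in the
# binary given as $1 with its bare file name.
strip_needed_paths() {
    binary="$1"
    patchelf --print-needed "$binary" | absolute_needed |
        while IFS= read -r entry; do
            patchelf --replace-needed "$entry" "$(basename "$entry")" "$binary"
        done
}
```

For example, after strip_needed_paths zig-out/bin/stt, running readelf -d zig-out/bin/stt | grep NEEDED should list libvosk.so without a directory prefix. The path zig-out/bin/stt is an assumption based on Zig's default install prefix; substitute the actual binary path.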

Usage

The application will:

Initialize audio capture from the default microphone
Load the Vosk speech recognition model
Process audio in real-time
Output recognized text to terminal
Exit on Ctrl+C

Dependencies

Vosk C API library
ALSA for audio capture

Notes

Vosk tends to misrecognize "light" as "lake" or "like".