Simple Record Format - a fast, basic format for caching and simple config
Emil Lerch b37fb7fb1a
switch to f64 for numbers (matches json)
That change improves performance: mean run time drops by about 13% for compact SRF, 11% for long SRF, and 7% for JSON. Before:

Benchmark 1: /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/51f43613e6e43ed5/test-srf-compact.srf
  Time (mean ± σ):     113.8 ms ±   3.8 ms    [User: 61.2 ms, System: 52.5 ms]
  Range (min … max):   109.4 ms … 128.5 ms    26 runs

Benchmark 2: /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/c105a3d3b0472f2e/test-srf-long.srf
  Time (mean ± σ):     115.5 ms ±   4.0 ms    [User: 59.1 ms, System: 56.3 ms]
  Range (min … max):   110.8 ms … 127.1 ms    26 runs

Benchmark 3: /home/lobo/shared/srf/zig-out/bin/srf json <.zig-cache/o/cb2a4e8b89e72422/test-json.json
  Time (mean ± σ):     131.5 ms ±   3.1 ms    [User: 83.2 ms, System: 48.3 ms]
  Range (min … max):   127.7 ms … 138.0 ms    23 runs

Summary
  /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/51f43613e6e43ed5/test-srf-compact.srf ran
    1.01 ± 0.05 times faster than /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/c105a3d3b0472f2e/test-srf-long.srf
    1.16 ± 0.05 times faster than /home/lobo/shared/srf/zig-out/bin/srf json <.zig-cache/o/cb2a4e8b89e72422/test-json.json

After:

Benchmark 1: /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/51f43613e6e43ed5/test-srf-compact.srf
  Time (mean ± σ):      98.9 ms ±   2.8 ms    [User: 51.3 ms, System: 47.7 ms]
  Range (min … max):    95.4 ms … 106.6 ms    29 runs

Benchmark 2: /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/c105a3d3b0472f2e/test-srf-long.srf
  Time (mean ± σ):     103.1 ms ±   3.4 ms    [User: 53.8 ms, System: 49.4 ms]
  Range (min … max):    99.0 ms … 112.8 ms    28 runs

Benchmark 3: /home/lobo/shared/srf/zig-out/bin/srf json <.zig-cache/o/cb2a4e8b89e72422/test-json.json
  Time (mean ± σ):     122.6 ms ±   4.4 ms    [User: 75.1 ms, System: 47.6 ms]
  Range (min … max):   117.7 ms … 130.7 ms    22 runs

Summary
  /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/51f43613e6e43ed5/test-srf-compact.srf ran
    1.04 ± 0.04 times faster than /home/lobo/shared/srf/zig-out/bin/srf srf <.zig-cache/o/c105a3d3b0472f2e/test-srf-long.srf
    1.24 ± 0.06 times faster than /home/lobo/shared/srf/zig-out/bin/srf json <.zig-cache/o/cb2a4e8b89e72422/test-json.json
2026-01-23 15:27:24 -08:00
src switch to f64 for numbers (matches json) 2026-01-23 15:27:24 -08:00
.gitignore initial implementation with long format tests passing 2026-01-13 12:12:51 -08:00
.mise.toml zig init + config 2026-01-12 13:16:27 -08:00
.pre-commit-config.yaml zig init + config 2026-01-12 13:16:27 -08:00
build.zig add benchmarking (we are faster than json if using FBA) 2026-01-23 14:54:39 -08:00
build.zig.zon zig init + config 2026-01-12 13:16:27 -08:00
LICENSE add MIT license 2026-01-12 13:13:09 -08:00
README.md initial implementation with long format tests passing 2026-01-13 12:12:51 -08:00

SRF (Simple Record Format)

SRF is a minimal data format designed for L2 caches and simple structured storage; it is also suitable for simple configuration. It provides human-readable key-value records with basic type hints, while avoiding the parsing complexity and escaping requirements of JSON.

Features:

  • No escaping required - use length-prefixed strings for complex data
  • Single-pass parsing with minimal memory allocation
  • Basic type system (string, num, bool, null, binary) with explicit type hints
  • Compact format for machine generation, long format for human editing
  • Built-in corruption detection with optional EOF markers

When to use SRF:

  • L2 caches that need occasional human inspection
  • Simple configuration files with mixed data types
  • Data exchange where JSON escaping is problematic
  • Applications requiring fast, predictable parsing

When not to use SRF:

  • Complex nested data structures (use JSON/TOML instead)
  • Schema validation requirements
  • Arrays or object hierarchies (arrays can be managed in the data itself, however)

Long format:

#!srfv1 # mandatory comment with format and version. Parser instructions start with #!
#!requireeof # Set this if you want parsing to fail when #!eof not present on last line
#!long # Mandatory to use multiline records, compact format is optional #!compact
# A comment
# empty lines ignored

key::string value, with any data except a \n. an optional string length between the colons
this is a number:num: 5 
null value:null:
array::array's don't exist. Use json or toml or something
data with newlines must have a length:7:foo
bar
boolean value:bool:false

  # Empty line separates records, but comments don't count as empty
key::this is the second record
this is a number:num:42 
null value:null:
array::array's still don't exist
data with newlines must have a length::single line
#!eof # eof marker, useful to make sure your file wasn't cut in half. Only considered if requireeof set at top
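
The interaction between #!requireeof and #!eof in the example above can be sketched as follows. This is a minimal illustration in Python, not the project's implementation; the function name eof_ok is hypothetical, and the sketch assumes that directives appear only in the leading run of lines starting with # and that the marker must be the very last line.

```python
def eof_ok(text: str) -> bool:
    """Return False only when #!requireeof is set and the last line is not #!eof."""
    lines = text.splitlines()
    require_eof = False
    for line in lines:
        if not line.startswith("#"):
            break  # assumption: the header region ends at the first non-comment line
        if line.startswith("#!"):
            # Drop any trailing "# comment" and surrounding whitespace.
            keyword = line[2:].split("#", 1)[0].strip()
            if keyword == "requireeof":
                require_eof = True
    if not require_eof:
        return True  # the marker is only considered when requireeof is set
    return bool(lines) and lines[-1].startswith("#!eof")
```

A file that was cut in half loses its last line, so the check fails without needing to inspect the records themselves.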

Compact format:

#!srfv1 # mandatory comment with format and version. Parser instructions start with #!
key::string value must have a length between colons or end with a comma,this is a number:num:5 ,null value:null:,array::array's don't exist. Use json or toml or something,data with newlines must have a length:7:foo
bar,boolean value:bool:false
key::this is the second record

Second record problem...these

Implementation

Note: Long format may be tabled for later development. Initial implementation will focus on compact format only.

An unrecognized #!<keyword> should be an error: mistyping requireeof as requireof will probably be a common problem. A #! directive anywhere other than the beginning or end of the file is an error.
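
A strict directive check along these lines might look like the sketch below. It is written in Python for illustration only; the function name parse_directive and the exact set of accepted keywords are assumptions drawn from the examples in this README (the proposed hash directive is not included).

```python
from typing import Optional

# Assumption: the keywords that appear in the format examples.
KNOWN_DIRECTIVES = {"srfv1", "requireeof", "long", "compact", "eof"}


def parse_directive(line: str) -> Optional[str]:
    """Return the directive keyword, or None if the line is not a directive."""
    if not line.startswith("#!"):
        return None
    # Everything after a second '#' is a trailing comment.
    keyword = line[2:].split("#", 1)[0].strip()
    if keyword not in KNOWN_DIRECTIVES:
        raise ValueError(f"unrecognized directive: #!{keyword}")
    return keyword
```

Rejecting unknown keywords means a typo such as #!requireof fails loudly instead of silently disabling the EOF check.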

A newline separates records in compact format. An empty line is required between records in long format.

A comma separates fields in compact format; a newline separates fields in long format.
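
To make the separator rules concrete, here is a minimal single-pass reader for the compact format, sketched in Python rather than taken from the actual implementation. The names parse_compact and convert are hypothetical. The sketch assumes that only the first line is a directive, that unprefixed values run to the next comma or newline, and that length-prefixed values are returned as raw bytes.

```python
import base64


def convert(hint: bytes, raw: bytes):
    """Turn a raw value into a Python value according to its type hint."""
    if hint == b"":
        return raw.decode("utf-8")
    if hint.isdigit():
        return raw  # length-prefixed values are kept as bags of bytes
    if hint == b"num":
        return float(raw.decode("ascii"))
    if hint == b"bool":
        if raw not in (b"true", b"false"):
            raise ValueError(f"bad bool value {raw!r}")
        return raw == b"true"
    if hint == b"null":
        return None
    if hint == b"binary":
        return base64.b64decode(raw, validate=True)
    raise ValueError(f"unknown type hint {hint!r}")


def parse_compact(data: bytes) -> list:
    """Parse compact SRF bytes into a list of dicts, one per record."""
    header, _, body = data.partition(b"\n")
    if not header.startswith(b"#!srfv1"):
        raise ValueError("missing #!srfv1 header")
    records, record, pos = [], {}, 0
    while pos < len(body):
        key_end = body.index(b":", pos)  # raises ValueError if absent
        hint_end = body.index(b":", key_end + 1)
        key = body[pos:key_end].decode("utf-8")
        hint = body[key_end + 1:hint_end]
        if not key or "\n" in key:
            raise ValueError("invalid key")
        start = hint_end + 1
        if hint.isdigit():
            # Length-prefixed: take exactly that many bytes, separators included.
            end = start + int(hint)
            if end > len(body):
                raise ValueError("truncated length-prefixed value")
        else:
            # Unprefixed: the value runs to the next comma or newline.
            end = start
            while end < len(body) and body[end] not in b",\n":
                end += 1
        record[key] = convert(hint, body[start:end])  # later keys overwrite
        if end < len(body) and body[end] not in b",\n":
            raise ValueError("expected ',' or newline after value")
        if end >= len(body) or body[end] == ord("\n"):
            records.append(record)  # newline or end of input closes the record
            record = {}
        pos = end + 1
    if record:
        raise ValueError("trailing field separator")
    return records
```

Because a length prefix tells the reader exactly how many bytes to take, a value such as foo\nbar can contain the record separator without any escaping.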

Open question: should there be a #!hash directive carrying a hash of all data not starting with #!?

Implementation Concerns

Parser robustness:

  • Integer overflow: Length parsing could overflow on malformed input - need bounds checking
  • Memory exhaustion: Malicious length values could cause huge allocations before you realize the data isn't there
  • Partial reads: What happens if you read a length but the actual data is truncated?
  • Type coercion edge cases: How do you handle "5.0" for num type, or "TRUE" vs "true" for bool?
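
The first three concerns can be addressed by validating a length prefix before any data is read or allocated. The following Python sketch shows one way to do that; the name checked_length, the digit cap, and the default limit are all assumptions chosen for illustration. Python integers do not overflow, so the digit cap stands in for the bounds check a fixed-width implementation would need.

```python
MAX_LENGTH_DIGITS = 10  # assumption: enough for any 32-bit length


def checked_length(prefix: bytes, remaining: int, limit: int = 1 << 20) -> int:
    """Validate a length prefix against the bytes actually available."""
    if not prefix.isdigit():
        raise ValueError("length must be ASCII digits")
    if len(prefix) > MAX_LENGTH_DIGITS:
        raise ValueError("length prefix too long")  # would overflow a u32
    length = int(prefix)
    if length > limit:
        raise ValueError("length exceeds configured limit")
    if length > remaining:
        # Partial read: the declared data is not there, so fail before allocating.
        raise ValueError("declared length exceeds remaining input")
    return length
```

Comparing the declared length with the remaining input up front turns both the memory-exhaustion case and the truncated-data case into an ordinary parse error.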

Format specification:

  • Zero-length keys (for example, ::value) are invalid
  • Key collisions are allowed - second occurrence overwrites the first
  • Whitespace is significant and preserved in values
  • Length-prefixed strings are bags of bytes
  • Binary type uses base64 encoding for binary data
  • Trailing separators are invalid in both formats (e.g., key:val, or extra newlines beyond record separators)
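
On the writing side, these rules suggest a serializer that never escapes anything and simply switches to a length prefix when a value contains a separator. The Python sketch below illustrates the idea for compact records; encode_field and encode_record are hypothetical names, and both the number formatting and the rejection of colons in keys are assumptions rather than part of the format description.

```python
import base64


def encode_field(key: str, value) -> bytes:
    """Encode one key/value pair as a compact-format field."""
    if not key or ":" in key:
        raise ValueError("keys must be non-empty and contain no colon")
    k = key.encode("utf-8")
    if value is None:
        return k + b":null:"
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return k + b":bool:" + (b"true" if value else b"false")
    if isinstance(value, (int, float)):
        return k + b":num:" + repr(float(value)).encode("ascii")
    if isinstance(value, bytes):
        return k + b":binary:" + base64.b64encode(value)
    raw = value.encode("utf-8")
    if b"," in raw or b"\n" in raw:
        # Separators inside the value: use a byte-length prefix, no escaping.
        return k + b":" + str(len(raw)).encode("ascii") + b":" + raw
    return k + b"::" + raw


def encode_record(fields: dict) -> bytes:
    """Join fields with commas; no trailing separator is emitted."""
    return b",".join(encode_field(k, v) for k, v in fields.items())
```

Using join rather than appending a comma after every field keeps the output free of trailing separators, which the list above treats as invalid.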

Cache-specific issues:

  • Corruption detection: Beyond #!eof, detecting partial writes in the middle of a record is an outstanding issue
  • Version compatibility: Decision should be made by library consumer (ignore or delete/recreate)
  • Record limits: No limits on record size or field count - handled by library consumer
  • Extra fields: When consumer provides struct, should extra fields in file be ignored or error? (configuration option, default to error)
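
The extra-fields option in the last item could behave as in the sketch below, which binds a parsed record to a consumer-supplied type. It uses a Python dataclass as a stand-in for the consumer's struct; bind_record, ignore_extra, and CacheEntry are hypothetical names, and the default follows the "default to error" suggestion above.

```python
from dataclasses import dataclass, fields


def bind_record(record: dict, cls, ignore_extra: bool = False):
    """Build an instance of cls from a parsed record."""
    names = {f.name for f in fields(cls)}
    extra = sorted(set(record) - names)
    if extra and not ignore_extra:
        raise ValueError(f"unexpected fields in record: {extra}")
    # Missing fields surface as a TypeError from the dataclass constructor.
    return cls(**{k: v for k, v in record.items() if k in names})


@dataclass
class CacheEntry:  # hypothetical consumer type
    url: str
    status: float
```

With the strict default, a record written by a newer version of the consumer is rejected outright, which fits the delete-and-recreate handling mentioned under version compatibility.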

Stream parsing compatibility:

  • Format designed to support stream parsing
  • Hash directive (#!hash) question relates to streaming support

Error handling:

  • Clear error types needed for different parse failure modes
  • Distinguish between format errors, data errors, and I/O errors
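
One possible shape for such an error hierarchy is sketched below in Python. All class and function names are hypothetical; the point is only that callers can catch one base type or distinguish the three failure modes.

```python
class SrfError(Exception):
    """Base class for every SRF failure."""


class SrfFormatError(SrfError):
    """The file structure is wrong (bad header, bad separator, ...)."""


class SrfDataError(SrfError):
    """The structure is fine but a value does not match its type hint."""


class SrfIoError(SrfError):
    """The underlying read failed."""


def load(path: str) -> bytes:
    """Read a file and check its header, mapping failures to SRF errors."""
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError as exc:
        raise SrfIoError(str(exc)) from exc
    if not data.startswith(b"#!srfv1"):
        raise SrfFormatError("missing #!srfv1 header")
    return data


def parse_num(raw: str) -> float:
    """Parse a num value, reporting failures as data errors."""
    try:
        return float(raw)
    except ValueError as exc:
        raise SrfDataError(f"not a number: {raw!r}") from exc
```

A cache consumer might treat SrfIoError as retryable while treating format and data errors as a signal to discard and rebuild the file.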