Compare commits

..

No commits in common. "3da4860ad27ba99e119b0a1bea0217ab62077632" and "fcce8e48d4c8a654e58639b9b2c864ff148f77de" have entirely different histories.

3 changed files with 113 additions and 427 deletions

.gitignore

@@ -1,3 +1,2 @@
 .zig-cache/
 zig-out/
-docs/

README.md

@@ -4,12 +4,10 @@ SRF is a minimal data format designed for L2 caches and simple structured storag
 **Features:**
 - No escaping required - use length-prefixed strings for complex data
-- Single-pass streaming parser with minimal memory allocation
+- Single-pass parsing with minimal memory allocation
 - Basic type system (string, num, bool, null, binary) with explicit type hints
 - Compact format for machine generation, long format for human editing
 - Built-in corruption detection with optional EOF markers
-- Iterator-based API for zero-copy, low-allocation streaming
-- Comptime type coercion directly from the iterator (no intermediate collections)
 **When to use SRF:**
 - L2 caches that need occasional human inspection
@@ -22,70 +20,7 @@ SRF is a minimal data format designed for L2 caches and simple structured storag
 - Schema validation requirements
 - Arrays or object hierarchies (arrays can be managed in the data itself, however)
-## Parsing API
-SRF provides two parsing APIs. The **iterator API is preferred** for most use cases
-as it avoids collecting all records and fields into memory at once.
-### Iterator (preferred)
-The `iterator` function returns a `RecordIterator` that streams records lazily.
-Each call to `RecordIterator.next` yields a `FieldIterator` for the next record,
-and each call to `FieldIterator.next` yields individual `Field` values. No
-intermediate slices or ArrayLists are allocated -- fields are yielded one at a
-time directly from the parser state.
-For type coercion, `FieldIterator.to(T)` consumes the remaining fields in the
-current record and maps them into a Zig struct or tagged union at comptime,
-with zero additional allocations beyond what field parsing itself requires. This
-can further be minimized with the parsing option `.alloc_strings = false`.
-```zig
-const srf = @import("srf");
-const Data = struct {
-    name: []const u8,
-    age: u8,
-    active: bool = false,
-};
-var reader = std.Io.Reader.fixed(raw_data);
-var ri = try srf.iterator(&reader, allocator, .{});
-defer ri.deinit();
-while (try ri.next()) |fi| {
-    const record = try fi.to(Data);
-    // process record...
-}
-```
-### Batch parse
-The `parse` function collects all records into memory at once, returning a
-`Parsed` struct with a `records: []Record` slice. This is built on top of
-the iterator internally. It is convenient when you need random access to all
-records, but costs more memory since every field is collected into ArrayLists
-before being converted to owned slices.
-```zig
-const srf = @import("srf");
-var reader = std.Io.Reader.fixed(raw_data);
-const parsed = try srf.parse(&reader, allocator, .{});
-defer parsed.deinit();
-for (parsed.records) |record| {
-    const data = try record.to(Data);
-    // process data...
-}
-```
-## Data Formats
-### Long format
-Long format uses newlines to delimit fields and blank lines to separate records.
-It is human-friendly and suitable for hand-edited configuration files.
+Long format:
 ```
 #!srfv1 # mandatory comment with format and version. Parser instructions start with #!
@@ -111,11 +46,7 @@ data with newlines must have a length::single line
 #!eof # eof marker, useful to make sure your file wasn't cut in half. Only considered if requireeof set at top
 ```
-### Compact format
-Compact format uses commas to delimit fields and newlines to separate records.
-It is designed for machine generation where space efficiency matters.
+compact format:
 ```
 #!srfv1 # mandatory comment with format and version. Parser instructions start with #!
 key::string value must have a length between colons or end with a comma,this is a number:num:5 ,null value:null:,array::array's don't exist. Use json or toml or something,data with newlines must have a length:7:foo
@@ -123,38 +54,6 @@ bar,boolean value:bool:false
 key::this is the second record
 ```
-## Serialization
-SRF supports serializing Zig structs, unions, and enums back to SRF format.
-Use `Record.from` to create a record from a typed value, or `fmtFrom` to
-format a slice of values directly to a writer.
-```zig
-const srf = @import("srf");
-const all_data: []const Data = &.{
-    .{ .name = "alice", .age = 30, .active = true },
-    .{ .name = "bob", .age = 25 },
-};
-var buf: [4096]u8 = undefined;
-const formatted = try std.fmt.bufPrint(&buf, "{f}", .{
-    srf.fmtFrom(Data, allocator, all_data, .{ .long_format = true }),
-});
-```
-## Type System
-Fields follow the format `key:type_hint:value`:
-| Type                   | Hint                  | Example                 |
-|------------------------|-----------------------|-------------------------|
-| String                 | *(empty)* or `string` | `name::alice`           |
-| Number (internally f64)| `num`                 | `age:num:30`            |
-| Boolean                | `bool`                | `active:bool:true`      |
-| Null                   | `null`                | `missing:null:`         |
-| Binary                 | `binary`              | `data:binary:base64...` |
-| Length-prefixed string | *(byte count)*        | `bio:12:hello\nworld!`  |
 ## Implementation Concerns
 **Parser robustness:**
@@ -188,5 +87,5 @@ Fields follow the format `key:type_hint:value`:
 ## AI Use
-AI was used in this project for comments, parts of the README, benchmarking code,
-build.zig and unit test generation. All other code is human generated.
+AI was used in this project for comments, parts of the README, and unit test
+generation. All other code is human generated.
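As a cross-check of the compact grammar shown in the README hunks above (comma-delimited fields, `key:type_hint:value`, a numeric hint meaning the value is length-prefixed and may contain commas or newlines), here is a minimal sketch. It is not part of the repository and the function name is invented; Python is used here only because the README's grammar is language-independent, and directives, binary decoding, and error handling are deliberately omitted:

```python
def parse_compact_record(line: str):
    """Split one compact-format SRF record into (key, hint, value) triples.

    Fields are comma-separated; a digits-only type hint means the value is
    length-prefixed and is read verbatim (it may contain commas/newlines).
    """
    fields, i, n = [], 0, len(line)
    while i < n:
        key_end = line.index(":", i)             # key ends at the first colon
        hint_end = line.index(":", key_end + 1)  # hint sits between the colons
        key = line[i:key_end]
        hint = line[key_end + 1:hint_end]
        if hint.isdigit():                       # length-prefixed value
            size = int(hint)
            value = line[hint_end + 1:hint_end + 1 + size]
            i = hint_end + 1 + size + 1          # skip the value and its comma
        else:                                    # delimiter-terminated value
            end = line.find(",", hint_end + 1)
            end = n if end == -1 else end
            value = line[hint_end + 1:end]
            i = end + 1
        fields.append((key, hint, value))
    return fields

rec = parse_compact_record("name::alice,age:num:30,bio:7:foo\nbar,ok:bool:true")
# rec == [("name", "", "alice"), ("age", "num", "30"),
#         ("bio", "7", "foo\nbar"), ("ok", "bool", "true")]
```

Note how the `bio:7:` field swallows the embedded newline without any escaping, which is exactly the property the "No escaping required" feature bullet claims.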


@@ -67,43 +67,25 @@ const ValueWithMetaData = struct {
     error_parsing: bool = false,
     reader_advanced: bool = false,
 };
-/// A parsed SRF value. Each field in a record has a key and an optional `Value`.
 pub const Value = union(enum) {
-    /// A numeric value, parsed from the `num` type hint.
     number: f64,
-    /// Raw bytes decoded from base64, parsed from the `binary` type hint.
+    /// Bytes are converted to/from base64, string is not
     bytes: []const u8,
-    /// A string value, either delimiter-terminated or length-prefixed.
-    /// Not transformed during parsing (no escaping/unescaping), but will be
-    /// allocated if .alloc_strings = true is passed during parsing, or if
-    /// a multi-line string is found in the data
+    /// String is not touched in any way
    string: []const u8,
-    /// A boolean value, parsed from the `bool` type hint (`true` or `false`).
     boolean: bool,
-    /// parses a single srf value, without the key. The the whole field is:
-    ///
-    /// SRF Field: 'foo:3:bar'
-    ///
-    /// The value we expect to be sent to this function is:
-    ///
-    /// SRF Value: '3:bar'
-    ///
-    /// The value is allowed to have extra data...for instance, in compact format
-    /// the value above can be represented by:
-    ///
-    /// SRF Value: '3:bar,next_field::foobar'
-    ///
-    /// and the next field will be ignored
-    ///
-    /// This function may need to advance the reader in the case of multi-line
-    /// strings. It may also allocate data in the case of base64 (binary) values
-    /// as well as multi-line strings. Metadata is returned to assist in tracking
-    ///
-    /// This function is intended to be used by the SRF parser
+    // pub fn format(self: Value, writer: *std.Io.Writer) std.Io.Writer.Error!void {
+    //     switch (self) {
+    //         .number => try writer.print("num: {d}", .{self.number}),
+    //         .bytes => try writer.print("bytes: {x}", .{self.bytes}),
+    //         .string => try writer.print("string: {s}", .{self.string}),
+    //         .boolean => try writer.print("boolean: {}", .{self.boolean}),
+    //     }
+    // }
     pub fn parse(allocator: std.mem.Allocator, str: []const u8, state: *RecordIterator.State, delimiter: u8) ParseError!ValueWithMetaData {
         const type_val_sep_raw = std.mem.indexOfScalar(u8, str, ':');
         if (type_val_sep_raw == null) {
@@ -262,15 +244,22 @@ pub const Value = union(enum) {
     }
 };
-/// A single key-value pair within a record. The key is always a string.
-/// The value may be `null` (from the `null` type hint) or one of the
-/// `Value` variants. Yielded by `RecordIterator.FieldIterator.next`.
+// A field has a key and a value, but the value may be null
 pub const Field = struct {
     key: []const u8,
     value: ?Value,
 };
 fn coerce(name: []const u8, comptime T: type, val: ?Value) !T {
+    // Here's the deduplicated set of field types that coerce needs to handle:
+    // Direct from SRF values:
+    // Need parsing from string:
+    // - Date, ?Date -- Date.parse(string)
+    //
+    // Won't work with Record.to(T) generically:
+    // - []const OptionContract -- nested sub-records (OptionsChain has calls/puts arrays)
+    // - ?[]const Holding, ?[]const SectorWeight -- nested sub-records in EtfProfile
+    //
     const ti = @typeInfo(T);
     if (val == null and ti != .optional)
         return error.NullValueCannotBeAssignedToNonNullField;
@@ -318,31 +307,23 @@ fn coerce(name: []const u8, comptime T: type, val: ?Value) !T {
         return null;
 }
-/// A record is an ordered list of `Field` values, with no uniqueness constraints
-/// on keys. This allows flexible use cases such as encoding arrays by repeating
-/// the same key. In long form, this could look like:
-///
-/// ```txt
-/// arr:string:foo
-/// arr:string:bar
-/// ```
-///
-/// Records are returned by the batch `parse` function. For streaming, prefer
-/// `iterator` which yields fields one at a time via `RecordIterator.FieldIterator`
-/// without collecting them into a slice.
+// A record has a list of fields, with no assumptions regarding duplication,
+// etc. This is for parsing speed, but also for more flexibility in terms of
+// use cases. One can make a defacto array out of this structure by having
+// something like:
+//
+// arr:string:foo
+// arr:string:bar
+//
+// and when you coerce to zig struct have an array .arr that gets populated
+// with strings "foo" and "bar".
 pub const Record = struct {
     fields: []const Field,
-    /// Returns a `RecordFormatter` suitable for use with `std.fmt.bufPrint`
-    /// or any `std.Io.Writer`. Use `FormatOptions` to control compact vs
-    /// long output format.
     pub fn fmt(value: Record, options: FormatOptions) RecordFormatter {
         return .{ .value = value, .options = options };
     }
-    /// Looks up the first `Field` whose key matches `field_name`, or returns
-    /// `null` if no such field exists. Only the first occurrence is returned;
-    /// duplicate keys are not considered.
     pub fn firstFieldByName(self: Record, field_name: []const u8) ?Field {
         for (self.fields) |f|
             if (std.mem.eql(u8, f.key, field_name)) return f;
@@ -520,20 +501,7 @@ pub const Record = struct {
         return OwnedRecord(T).init(allocator, val);
     }
-    /// Coerce a `Record` to a Zig struct or tagged union. For each field in `T`,
-    /// the first matching `Field` by name is coerced to the target type. Fields
-    /// with default values in `T` that are not present in the data use their
-    /// defaults. Missing fields without defaults return an error. Note that
-    /// by this logic, multiple fields with the same name will have all but the
-    /// first value silently ignored.
-    ///
-    /// For tagged unions, the active variant is determined by a field named
-    /// `"active_tag"` (or the value of `T.srf_tag_field` if declared). The
-    /// remaining fields are coerced into the payload struct of that variant.
-    ///
-    /// For streaming data without collecting fields first, prefer
-    /// `RecordIterator.FieldIterator.to` which avoids the intermediate
-    /// `[]Field` allocation entirely.
+    /// Coerce Record to a type. Does not handle fields with arrays
     pub fn to(self: Record, comptime T: type) !T {
         const ti = @typeInfo(T);
@@ -579,39 +547,26 @@ pub const Record = struct {
         }
         return error.CoercionNotPossible;
     }
-    test to {
-        // Example: coerce a batch-parsed Record into a Zig struct.
-        const Data = struct {
-            city: []const u8,
-            pop: u8,
-        };
-        const data =
-            \\#!srfv1
-            \\city::springfield,pop:num:30
-        ;
-        const allocator = std.testing.allocator;
-        var reader = std.Io.Reader.fixed(data);
-        const parsed = try parse(&reader, allocator, .{});
-        defer parsed.deinit();
-        const result = try parsed.records[0].to(Data);
-        try std.testing.expectEqualStrings("springfield", result.city);
-        try std.testing.expectEqual(@as(u8, 30), result.pop);
-    }
 };
-/// A streaming record iterator for parsing SRF data. This is the preferred
-/// parsing API because it avoids collecting all records and fields into memory
-/// at once. Created by calling `iterator`.
+/// The Parsed struct is equivalent to Parsed(T) in std.json. Since most are
+/// familiar with std.json, it differs in the following ways:
 ///
-/// Each call to `next` yields a `FieldIterator` for one record. Fields within
-/// that record are consumed lazily via `FieldIterator.next` or coerced directly
-/// into a Zig type via `FieldIterator.to`. All allocations go through an
-/// internal arena; call `deinit` to release everything when done.
+/// * There is a records field instead of a value field. In json, one type of
+/// value is an array. SRF does not have an array data type, but the set of
+/// records is an array. json as a format is structred as a single object at
+/// the outermost
 ///
-/// If `RecordIterator.next` is called before the previous `FieldIterator` has
-/// been fully consumed, the remaining fields are automatically drained to keep
-/// the parser state consistent.
+/// * This is not generic. In SRF, it is a separate function to bind the list
+/// of records to a specific data type. This will add some (hopefully minimal)
+/// overhead, but also avoid conflating parsing from the coercion from general
+/// type to specifics, and avoids answering questions like "what if I have
+/// 15 values for the same key" until you're actually dealing with that problem
+/// (see std.json.ParseOptions duplicate_field_behavior and ignore_unknown_fields)
+///
+/// When implemented, there will include a pub fn bind(self: Parsed, comptime T: type, options, BindOptions) BindError![]T
+/// function. The options will include things related to duplicate handling and
+/// missing fields
 pub const RecordIterator = struct {
     arena: *std.heap.ArenaAllocator,
     /// optional expiry time for the data. Useful for caching
@@ -656,19 +611,9 @@ pub const RecordIterator = struct {
         }
     };
-    /// Advances to the next record in the stream, returning a `FieldIterator`
-    /// for accessing its fields. Returns `null` when all records have been
-    /// consumed.
-    ///
-    /// If the previous `FieldIterator` was not fully drained, its remaining
-    /// fields are consumed automatically to keep the reader positioned
-    /// correctly. It is safe (but unnecessary) to fully consume the
-    /// `FieldIterator` before calling `next` again.
-    ///
-    /// Note that all state is stored in a shared area accessible to both
-    /// the `RecordIterator` and the `FieldIterator`, so there is no need to
-    /// store the return value as a variable
     pub fn next(self: RecordIterator) !?FieldIterator {
+        // TODO: we need to capture the fieldIterator here and make sure it's run
+        // to the ground to keep our state intact
         const state = self.state;
         if (state.field_iterator) |f| {
             // We need to finish the fields on the previous record
@@ -721,24 +666,18 @@ pub const RecordIterator = struct {
         return state.field_iterator.?;
     }
-    /// Iterates over the fields within a single record. Yielded by
-    /// `RecordIterator.next`. Each call to `next` returns the next `Field`
-    /// in the record, or `null` when the record boundary is reached.
-    ///
-    /// For direct type coercion without manually iterating fields, use `to`.
     pub const FieldIterator = struct {
         state: *State,
         arena: *std.heap.ArenaAllocator,
-        /// Returns the next `Field` in the current record, or `null` when
-        /// the record boundary has been reached. After `null` is returned,
-        /// subsequent calls continue to return `null`.
         pub fn next(self: FieldIterator) !?Field {
             const state = self.state;
             const aa = self.arena.allocator();
             // Main parsing. We already have the first line of data, which could
             // be a record (compact format) or a key/value pair (long format)
+            // log.debug("", .{});
+            log.debug("current line:{?s}", .{state.current_line});
             if (state.current_line == null) return null;
             if (state.end_of_record_reached) return null;
             // non-blank line, but we could have an eof marker
@@ -832,21 +771,7 @@ pub const RecordIterator = struct {
             return field;
         }
-        /// Consumes remaining fields in this record and coerces them into a
-        /// Zig struct or tagged union `T`. This is the streaming equivalent of
-        /// `Record.to` -- it performs the same field-name matching and default
-        /// value logic, but reads directly from the parser without building an
-        /// intermediate `[]Field` slice.
-        ///
-        /// For structs, fields are matched by name. Only the first occurrence
-        /// of each field name is used; duplicates are ignored. Fields in `T`
-        /// that have default values and are not present in the data use those
-        /// defaults. Missing fields without defaults return an error.
-        ///
-        /// For tagged unions, the active tag field must appear first in the
-        /// stream (unlike `Record.to` which can do random access). The tag
-        /// field name defaults to `"active_tag"` or `T.srf_tag_field` if
-        /// declared.
+        /// Coerce Record to a type. Does not handle fields with arrays
         pub fn to(self: FieldIterator, comptime T: type) !T {
             const ti = @typeInfo(T);
@@ -922,47 +847,13 @@ pub const RecordIterator = struct {
         }
         return error.CoercionNotPossible;
     }
-        test to {
-            // Example: coerce fields directly into a Zig struct from the iterator,
-            // without collecting into an intermediate Record. This is the most
-            // allocation-efficient path for typed deserialization.
-            const Data = struct {
-                name: []const u8,
-                score: u8,
-                active: bool = true,
 };
-            const data =
-                \\#!srfv1
-                \\name::alice,score:num:99
-            ;
-            const allocator = std.testing.allocator;
-            var reader = std.Io.Reader.fixed(data);
-            var ri = try iterator(&reader, allocator, .{});
-            defer ri.deinit();
-            const result = try (try ri.next()).?.to(Data);
-            try std.testing.expectEqualStrings("alice", result.name);
-            try std.testing.expectEqual(@as(u8, 99), result.score);
-            // `active` was not in the data, so the default value is used
-            try std.testing.expect(result.active);
-        }
-    };
-    /// Releases all memory owned by this iterator. This frees the internal
-    /// arena (and all parsed data allocated from it), then frees the arena
-    /// struct itself. After calling `deinit`, any slices or string pointers
-    /// obtained from `FieldIterator.next` or `FieldIterator.to` are invalid.
     pub fn deinit(self: RecordIterator) void {
         const child_allocator = self.arena.child_allocator;
         self.arena.deinit();
         child_allocator.destroy(self.arena);
     }
-    /// Returns `true` if the data has not expired based on the `#!expires`
-    /// directive. If no expiry was specified, the data is always considered
-    /// fresh. Callers should check this after parsing to decide whether to
-    /// use or refresh cached data. Note that data will be returned by parse/
-    /// iterator regardless of freshness. This enables callers to use cached
-    /// data temporarily while refreshing it
     pub fn isFresh(self: RecordIterator) bool {
         if (self.expires) |exp|
             return std.time.timestamp() < exp;
@@ -970,25 +861,8 @@ pub const RecordIterator = struct {
         // no expiry: always fresh, never frozen
         return true;
     }
-    test isFresh {
-        // Example: check expiry on parsed data. Data without an #!expires
-        // directive is always considered fresh.
-        const data =
-            \\#!srfv1
-            \\key::value
-        ;
-        const allocator = std.testing.allocator;
-        var reader = std.Io.Reader.fixed(data);
-        var ri = try iterator(&reader, allocator, .{});
-        defer ri.deinit();
-        // No expiry set, so always fresh
-        try std.testing.expect(ri.isFresh());
-    }
 };
-/// Options controlling SRF parsing behavior. Passed to both `iterator` and
-/// `parse`.
 pub const ParseOptions = struct {
     diagnostics: ?*Diagnostics = null,
@@ -1029,12 +903,7 @@ const Directive = union(enum) {
         return null;
     }
 };
-/// Options controlling SRF output formatting. Used by `fmt`, `fmtFrom`,
-/// `Record.fmt`, and related formatters.
 pub const FormatOptions = struct {
-    /// When `true`, fields are separated by newlines and records by blank
-    /// lines (`#!long` format). When `false` (default), fields are
-    /// comma-separated and records are newline-separated (compact format).
     long_format: bool = false,
     /// Will emit the eof directive as well as requireeof
@@ -1049,19 +918,14 @@ pub const FormatOptions = struct {
     emit_directives: bool = true,
 };
-/// Returns a `Formatter` for writing pre-built `Record` values to a writer.
-/// Suitable for use with `std.fmt.bufPrint` or any `std.Io.Writer` via the
-/// `{f}` format specifier.
+/// Returns a formatter that formats the given value
 pub fn fmt(value: []const Record, options: FormatOptions) Formatter {
     return .{ .value = value, .options = options };
 }
-/// Returns a formatter for writing typed Zig values directly to SRF format.
-/// Each value is converted to a `Record` via `Record.from` and written to
-/// the output. Custom serialization is supported via the `srfFormat` method
-/// convention on struct/union fields.
-///
-/// The `allocator` is used only for fields that require custom formatting
-/// (via `srfFormat`). A `FixedBufferAllocator` is recommended for this purpose.
+/// Returns a formatter that formats the given value. This will take a concrete
+/// type, convert it to the SRF record format automatically (using srfFormat if
+/// found), and output to the writer. It is recommended to use a FixedBufferAllocator
+/// for the allocator, which is only used for custom srfFormat functions (I think - what about enum tag names?)
 pub fn fmtFrom(comptime T: type, allocator: std.mem.Allocator, value: []const T, options: FormatOptions) FromFormatter(T) {
     return .{ .value = value, .options = options, .allocator = allocator };
 }
@@ -1169,20 +1033,11 @@ pub const RecordFormatter = struct {
     }
 };
-/// The result of a batch `parse` call. Contains all records collected into a
-/// single slice. All data is owned by the internal arena; call `deinit` to
-/// release everything.
-///
-/// For streaming without collecting all records, prefer `iterator` which
-/// returns a `RecordIterator` instead.
 pub const Parsed = struct {
     records: []Record,
     arena: *std.heap.ArenaAllocator,
     expires: ?i64,
-    /// Releases all memory owned by this `Parsed` result, including all
-    /// record and field data. After calling `deinit`, any slices or string
-    /// pointers obtained from `records` are invalid.
     pub fn deinit(self: Parsed) void {
         const ca = self.arena.child_allocator;
         self.arena.deinit();
@@ -1190,15 +1045,7 @@ pub const Parsed = struct {
     }
 };
-/// Parses all records from the reader into memory, returning a `Parsed` struct
-/// with a `records` slice. This is a convenience wrapper around `iterator` that
-/// collects all fields and records into arena-allocated slices.
-///
-/// For most use cases, prefer `iterator` instead -- it streams records lazily
-/// and avoids the cost of collecting all fields into intermediate `ArrayList`s.
-///
-/// All returned data is owned by the `Parsed` arena. Call `Parsed.deinit` to
-/// free everything at once.
+/// parse function
 pub fn parse(reader: *std.Io.Reader, allocator: std.mem.Allocator, options: ParseOptions) ParseError!Parsed {
     var records = std.ArrayList(Record).empty;
     var it = try iterator(reader, allocator, options);
@@ -1226,21 +1073,7 @@ pub fn parse(reader: *std.Io.Reader, allocator: std.mem.Allocator, options: Pars
     };
 }
-/// Creates a streaming `RecordIterator` for the given reader. This is the
-/// preferred entry point for parsing SRF data, as it yields records and
-/// fields lazily without collecting them into slices.
-///
-/// The returned iterator owns an arena allocator that holds all parsed data
-/// (string values, keys, etc.). Call `RecordIterator.deinit` to free
-/// everything when done. Parsed field data remains valid until `deinit` is
-/// called.
-///
-/// The iterator handles SRF header directives (`#!srfv1`, `#!long`,
-/// `#!compact`, `#!requireeof`, `#!expires`) automatically during
-/// construction. Notably this means you can check isFresh() immediately.
-///
-/// Also note that as state is allocated and stored within the recorditerator,
-/// callers can assign the return value to a constant
+/// Gets an iterator to stream through the data
 pub fn iterator(reader: *std.Io.Reader, allocator: std.mem.Allocator, options: ParseOptions) ParseError!RecordIterator {
     // The arena and state are heap-allocated because RecordIterator is returned
@@ -1316,28 +1149,6 @@ inline fn parseError(allocator: std.mem.Allocator, message: []const u8, state: *
         return ParseError.ParseFailed;
     }
 }
-// Test-only types extracted to module level so that their pub methods
-// (required by std.meta.hasMethod) do not appear in generated documentation.
-const TestRecType = enum {
-    foo,
-    bar,
-};
-const TestCustomType = struct {
-    const Self = @This();
-    pub fn srfParse(val: []const u8) !Self {
-        if (std.mem.eql(u8, "hi", val)) return .{};
-        return error.ValueNotEqualHi;
-    }
-    pub fn srfFormat(self: Self, allocator: std.mem.Allocator, comptime field_name: []const u8) !Value {
-        _ = self;
-        _ = field_name;
-        return .{
-            .string = try allocator.dupe(u8, "hi"),
-        };
-    }
-};
 test "long format single record, no eof" {
     const data =
         \\#!srfv1 # mandatory comment with format and version. Parser instructions start with #!
@@ -1572,13 +1383,33 @@ test "format all the things" {
     try std.testing.expectEqual(expected_expires, parsed_expires.expires.?);
 }
 test "serialize/deserialize" {
+    const RecType = enum {
+        foo,
+        bar,
+    };
+    const Custom = struct {
+        const Self = @This();
+        pub fn srfParse(val: []const u8) !Self {
+            if (std.mem.eql(u8, "hi", val)) return .{};
+            return error.ValueNotEqualHi;
+        }
+        pub fn srfFormat(self: Self, allocator: std.mem.Allocator, comptime field_name: []const u8) !Value {
+            _ = self;
+            _ = field_name;
+            return .{
+                .string = try allocator.dupe(u8, "hi"),
+            };
+        }
+    };
     const Data = struct {
         foo: []const u8,
         bar: u8,
-        qux: ?TestRecType = .foo,
+        qux: ?RecType = .foo,
         b: bool = false,
         f: f32 = 4.2,
-        custom: ?TestCustomType = null,
+        custom: ?Custom = null,
     };
     const compact =
@@ -1597,11 +1428,11 @@ test "serialize/deserialize" {
     const rec1 = try parsed.records[0].to(Data);
     try std.testing.expectEqualStrings("bar", rec1.foo);
     try std.testing.expectEqual(@as(u8, 42), rec1.bar);
-    try std.testing.expectEqual(@as(TestRecType, .foo), rec1.qux);
+    try std.testing.expectEqual(@as(RecType, .foo), rec1.qux);
     const rec4 = try parsed.records[3].to(Data);
     try std.testing.expectEqualStrings("bar", rec4.foo);
     try std.testing.expectEqual(@as(u8, 42), rec4.bar);
-    try std.testing.expectEqual(@as(TestRecType, .bar), rec4.qux.?);
+    try std.testing.expectEqual(@as(RecType, .bar), rec4.qux.?);
     try std.testing.expectEqual(true, rec4.b);
     try std.testing.expectEqual(@as(f32, 6.9), rec4.f);
@@ -1612,13 +1443,13 @@ test "serialize/deserialize" {
     const rec1_it = try (try ri.next()).?.to(Data);
     try std.testing.expectEqualStrings("bar", rec1_it.foo);
     try std.testing.expectEqual(@as(u8, 42), rec1_it.bar);
-    try std.testing.expectEqual(@as(TestRecType, .foo), rec1_it.qux);
+    try std.testing.expectEqual(@as(RecType, .foo), rec1_it.qux);
     _ = try ri.next();
     _ = try ri.next();
     const rec4_it = try (try ri.next()).?.to(Data);
     try std.testing.expectEqualStrings("bar", rec4_it.foo);
     try std.testing.expectEqual(@as(u8, 42), rec4_it.bar);
-    try std.testing.expectEqual(@as(TestRecType, .bar), rec4_it.qux.?);
+    try std.testing.expectEqual(@as(RecType, .bar), rec4_it.qux.?);
     try std.testing.expectEqual(true, rec4_it.b);
     try std.testing.expectEqual(@as(f32, 6.9), rec4_it.f);
@ -1634,10 +1465,10 @@ test "serialize/deserialize" {
// const Data = struct { // const Data = struct {
// foo: []const u8, // foo: []const u8,
// bar: u8, // bar: u8,
// qux: ?TestRecType = .foo, // qux: ?RecType = .foo,
// b: bool = false, // b: bool = false,
// f: f32 = 4.2, // f: f32 = 4.2,
// custom: ?TestCustomType = null, // custom: ?Custom = null,
// }; // };
try std.testing.expectEqual(@as(usize, 6), record_4.fields.len); try std.testing.expectEqual(@as(usize, 6), record_4.fields.len);
@ -1784,10 +1615,11 @@ test "compact format length-prefixed string as last field" {
try std.testing.expectEqualStrings("desc", rec.fields[1].key); try std.testing.expectEqualStrings("desc", rec.fields[1].key);
try std.testing.expectEqualStrings("world", rec.fields[1].value.?.string); try std.testing.expectEqualStrings("world", rec.fields[1].value.?.string);
} }
test iterator { test "iterator" {
// Example: streaming through records and fields using the iterator API. // When a length-prefixed value is the last field on the line,
// This is the preferred parsing approach -- no intermediate slices are // rest_of_data.len == size exactly. The check on line 216 uses
// allocated for fields or records. // strict > instead of >=, falling through to the multi-line path
// where size - rest_of_data.len - 1 underflows.
const data = const data =
\\#!srfv1 \\#!srfv1
\\name::alice,desc:5:world \\name::alice,desc:5:world
@ -1797,65 +1629,21 @@ test iterator {
var ri = try iterator(&reader, allocator, .{}); var ri = try iterator(&reader, allocator, .{});
defer ri.deinit(); defer ri.deinit();
// Advance to the first (and only) record const nfi = try ri.next();
const fi = (try ri.next()).?; try std.testing.expect(nfi != null);
const fi = nfi.?;
// defer fi.deinit();
const field1 = try fi.next();
try std.testing.expect(field1 != null);
try std.testing.expectEqualStrings("name", field1.?.key);
try std.testing.expectEqualStrings("alice", field1.?.value.?.string);
const field2 = try fi.next();
try std.testing.expect(field2 != null);
try std.testing.expectEqualStrings("desc", field2.?.key);
try std.testing.expectEqualStrings("world", field2.?.value.?.string);
const field3 = try fi.next();
try std.testing.expect(field3 == null);
// Iterate fields within the record const next = try ri.next();
const field1 = (try fi.next()).?; try std.testing.expect(next == null);
try std.testing.expectEqualStrings("name", field1.key);
try std.testing.expectEqualStrings("alice", field1.value.?.string);
const field2 = (try fi.next()).?;
try std.testing.expectEqualStrings("desc", field2.key);
try std.testing.expectEqualStrings("world", field2.value.?.string);
// No more fields in this record
try std.testing.expect(try fi.next() == null);
// No more records
try std.testing.expect(try ri.next() == null);
}
test parse {
// Example: batch parsing collects all records and fields into slices.
// Prefer `iterator` for streaming; use `parse` when random access to
// all records is needed.
const data =
\\#!srfv1
\\#!long
\\name::alice
\\age:num:30
\\
\\name::bob
\\age:num:25
\\#!eof
;
const allocator = std.testing.allocator;
var reader = std.Io.Reader.fixed(data);
const parsed = try parse(&reader, allocator, .{});
defer parsed.deinit();
try std.testing.expectEqual(@as(usize, 2), parsed.records.len);
try std.testing.expectEqualStrings("alice", parsed.records[0].fields[0].value.?.string);
try std.testing.expectEqualStrings("bob", parsed.records[1].fields[0].value.?.string);
}
test fmtFrom {
// Example: serialize typed Zig values directly to SRF format.
const Data = struct {
name: []const u8,
age: u8,
};
const values: []const Data = &.{
.{ .name = "alice", .age = 30 },
.{ .name = "bob", .age = 25 },
};
var buf: [4096]u8 = undefined;
const result = try std.fmt.bufPrint(
&buf,
"{f}",
.{fmtFrom(Data, std.testing.allocator, values, .{})},
);
try std.testing.expectEqualStrings(
\\#!srfv1
\\name::alice,age:num:30
\\name::bob,age:num:25
\\
, result);
} }
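
The comment added in `test "iterator"` above describes an off-by-one in the length-prefixed-value check: when the value is the last field on a line, `rest_of_data.len == size` exactly, so a strict `>` test misses the single-line case and the multi-line path computes `size - rest_of_data.len - 1`, which underflows for unsigned sizes. A minimal sketch of that arithmetic, borrowing the `size` and `rest_of_data` names from the diff (the surrounding parser code is assumed, not shown):

```zig
const std = @import("std");

test "length check must be >=, not >" {
    // For `desc:5:world` at end of line, exactly 5 bytes remain.
    const rest_of_data: []const u8 = "world";
    const size: usize = 5;

    // Buggy condition: only treats it as single-line when there is
    // strictly more data than `size` -- false here (5 > 5).
    const single_line_buggy = rest_of_data.len > size;
    // Fixed condition: the value may end exactly at end-of-line -- true here.
    const single_line_fixed = rest_of_data.len >= size;

    try std.testing.expect(!single_line_buggy);
    try std.testing.expect(single_line_fixed);

    // With the buggy check, the multi-line path would then evaluate
    // `size - rest_of_data.len - 1`, i.e. 5 - 5 - 1, which underflows
    // at runtime for usize operands (left commented out deliberately):
    // const remaining = size - rest_of_data.len - 1;
}
```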