i wanted to learn zig beyond the tutorials. reading other people’s code only gets you so far — the syntax sticks when you write it, and the idioms stick when you debug them. so i set out to write a streaming json parser: a small, well-defined piece of work with enough edge cases to force me into the language’s harder corners.
the constraints
the parser reads bytes from a buffer and emits tokens. no allocator parameter on public functions — the caller provides an allocator up front and the parser uses it internally. no runtime checks in release mode for things the caller could validate at construction time. the whole thing fits in a single file, under 500 lines.
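a minimal sketch of what "allocator up front" and "no release-mode checks" could look like, assuming an init function and a copySlice helper. both names are hypothetical and not taken from the real parser:

```zig
const std = @import("std");

const Parser = struct {
    bytes: []const u8,
    pos: usize = 0,
    allocator: std.mem.Allocator,

    // The allocator is taken once, here; no other public function asks for one.
    pub fn init(allocator: std.mem.Allocator, bytes: []const u8) Parser {
        return .{ .bytes = bytes, .allocator = allocator };
    }

    // Hypothetical helper: copy bytes[start..end] into freshly allocated memory.
    // The caller frees the result with the same allocator it passed to init.
    pub fn copySlice(self: *const Parser, start: usize, end: usize) ![]u8 {
        // Checked in Debug and ReleaseSafe builds; assumed true in ReleaseFast.
        std.debug.assert(start <= end and end <= self.bytes.len);
        return self.allocator.dupe(u8, self.bytes[start..end]);
    }
};
```

the assert documents a precondition rather than handling a failure: a caller that passes a bad range has a bug, and only the safe build modes will stop on it.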
the core loop looks like this:
const std = @import("std");

// one variant per json token; only numbers and strings carry a payload.
pub const Token = union(enum) {
    null: void,
    true: void,
    false: void,
    number: f64,
    string: []const u8,
    array_open: void,
    array_close: void,
    object_open: void,
    object_close: void,
    colon: void,
    comma: void,
};

pub const Parser = struct {
    bytes: []const u8,
    pos: usize,
    allocator: std.mem.Allocator,

    // returns the next token, null at end of input,
    // or an error on a byte that cannot start a token.
    // the helpers (skipWhitespace, advance, read*) are not shown here.
    pub fn next(self: *Parser) !?Token {
        self.skipWhitespace();
        if (self.pos >= self.bytes.len) return null;
        // dispatch on the first byte of the token.
        return switch (self.bytes[self.pos]) {
            'n' => self.readNull(),
            't' => self.readTrue(),
            'f' => self.readFalse(),
            '"' => self.readString(),
            '0'...'9', '-' => self.readNumber(),
            '[' => self.advance(.array_open),
            ']' => self.advance(.array_close),
            '{' => self.advance(.object_open),
            '}' => self.advance(.object_close),
            ':' => self.advance(.colon),
            ',' => self.advance(.comma),
            else => error.UnexpectedCharacter,
        };
    }
};
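the loop above calls skipWhitespace and advance without showing them. here is one way they might look, as a sketch on a cut-down parser that only knows punctuation. MiniParser and Punct are hypothetical names, and the real helpers may differ:

```zig
const std = @import("std");

// Hypothetical cut-down token set: punctuation only.
const Punct = enum { array_open, array_close, object_open, object_close, colon, comma };

const MiniParser = struct {
    bytes: []const u8,
    pos: usize = 0,

    // Skip the four whitespace bytes JSON allows between tokens.
    fn skipWhitespace(self: *MiniParser) void {
        while (self.pos < self.bytes.len) : (self.pos += 1) {
            switch (self.bytes[self.pos]) {
                ' ', '\t', '\n', '\r' => {},
                else => return,
            }
        }
    }

    // Consume one byte and return the token it stands for.
    fn advance(self: *MiniParser, token: Punct) Punct {
        self.pos += 1;
        return token;
    }

    // Same shape as the core loop: null at end of input, error on a bad byte.
    fn next(self: *MiniParser) error{UnexpectedCharacter}!?Punct {
        self.skipWhitespace();
        if (self.pos >= self.bytes.len) return null;
        const token: Punct = switch (self.bytes[self.pos]) {
            '[' => self.advance(.array_open),
            ']' => self.advance(.array_close),
            '{' => self.advance(.object_open),
            '}' => self.advance(.object_close),
            ':' => self.advance(.colon),
            ',' => self.advance(.comma),
            else => return error.UnexpectedCharacter,
        };
        return token;
    }
};
```

advance only moves the cursor on a recognised byte, so after an error pos still points at the offending byte, which is handy for error messages.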
what surprised me
the comptime model is the feature i was most sceptical about and the one i now miss in every other language. i used it to generate lookup tables for escape sequences at compile time: the table is baked into the binary, so there is no runtime initialisation cost for something that would normally need a hand-written array or a lazy-init static.
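a sketch of such a table for the simple one-byte json escapes, assuming 0 can serve as the "not an escape" marker (no json escape maps to a nul byte). escape_table and unescape are illustrative names, and \u escapes would need separate handling:

```zig
const std = @import("std");

// Built once at compile time: maps the byte after a backslash to the
// byte it denotes, or 0 when it is not a simple one-byte escape.
const escape_table: [256]u8 = blk: {
    var table = [_]u8{0} ** 256;
    table['"'] = '"';
    table['\\'] = '\\';
    table['/'] = '/';
    table['b'] = 0x08; // backspace
    table['f'] = 0x0C; // form feed
    table['n'] = '\n';
    table['r'] = '\r';
    table['t'] = '\t';
    break :blk table;
};

// At runtime this is a single array index; nothing is initialised lazily.
fn unescape(c: u8) ?u8 {
    const mapped = escape_table[c];
    return if (mapped == 0) null else mapped;
}
```

the initialiser of a container-level const is evaluated at compile time, so the labelled block runs in the compiler and only the finished 256-byte array ends up in the binary.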
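the error handling model forced me to think about every failure path at the call site. next returns !?Token, an error union wrapping an optional, so each call can fail, finish cleanly with null, or yield a token. there are no exceptions and no try/catch blocks. zig does have try and catch keywords, but they operate on ordinary error values: try returns the error to the caller, catch handles it in place, and either way the propagation is visible in the source. the type system makes the fallible paths explicit, and the compiler rejects code that silently discards an error.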
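to make the three outcomes concrete, here is a sketch with a stand-in iterator that has the same !?T shape as next. Digits, sum and sumOrZero are made-up names for illustration:

```zig
const std = @import("std");

// Hypothetical stand-in for Parser.next: yields one digit per call,
// null at end of input, and an error on any non-digit byte.
const Digits = struct {
    bytes: []const u8,
    pos: usize = 0,

    fn next(self: *Digits) error{UnexpectedCharacter}!?u8 {
        if (self.pos >= self.bytes.len) return null;
        const c = self.bytes[self.pos];
        if (c < '0' or c > '9') return error.UnexpectedCharacter;
        self.pos += 1;
        return c - '0';
    }
};

// `try` hands any error to our caller; `|d|` unwraps the optional,
// and the loop ends when next returns null.
fn sum(input: []const u8) !u32 {
    var it = Digits{ .bytes = input };
    var total: u32 = 0;
    while (try it.next()) |d| {
        total += d;
    }
    return total;
}

// `catch` deals with the error in place instead of propagating it.
fn sumOrZero(input: []const u8) u32 {
    return sum(input) catch 0;
}
```

writing `_ = it.next();` in place of the try would not compile, which is the enforcement described above.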
the tradeoffs
reading json as a stream of tokens instead of a tree means the caller manages nesting state. that is fine for a cli tool that processes records one at a time, but it means this parser is not a drop-in replacement for a dom-style parser. the api surfaces the tradeoff honestly: you get control over memory and latency, but you write the traversal logic yourself.
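the nesting state a caller has to carry can be as small as a fixed stack of open containers. a sketch, assuming a simplified Kind enum in place of the real Token and an arbitrary depth limit of 64:

```zig
const std = @import("std");

// Hypothetical token kinds; only the ones that affect nesting matter here.
const Kind = enum { array_open, array_close, object_open, object_close, other };

// Returns the deepest nesting level seen, or an error if the
// open/close tokens do not pair up.
fn maxDepth(tokens: []const Kind) error{ Unbalanced, TooDeep }!usize {
    var stack: [64]Kind = undefined; // open containers, innermost last
    var depth: usize = 0;
    var deepest: usize = 0;
    for (tokens) |tok| {
        switch (tok) {
            .array_open, .object_open => {
                if (depth == stack.len) return error.TooDeep;
                stack[depth] = tok;
                depth += 1;
                if (depth > deepest) deepest = depth;
            },
            .array_close, .object_close => {
                if (depth == 0) return error.Unbalanced;
                depth -= 1;
                // The closer must match the most recent opener.
                const expected: Kind = if (stack[depth] == .array_open)
                    .array_close
                else
                    .object_close;
                if (tok != expected) return error.Unbalanced;
            },
            .other => {},
        }
    }
    if (depth != 0) return error.Unbalanced;
    return deepest;
}
```

a dom-style parser does this bookkeeping internally; with a token stream it lives in the caller, which is the cost side of the tradeoff.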
conclusion
zig delivered on the promise of a systems language that is simpler to read than c and more explicit than rust. the json parser is now a real tool i use in small cli programs. it is not fast, but it is correct, and the code still makes sense to me six months later.