v0.3.0 candidate: memory safety hardening + C ABI v4 #116
Open
samtalki wants to merge 14 commits into
`gencost_row` reads NCOST from the file as an f64 truncated to usize, so a huge or non-finite value saturates near usize::MAX. The width requirement was then computed as `start + want` (with `want = 2*ncost` for piecewise costs), which overflows: an add-overflow panic under debug overflow-checks, and in release a wraparound that makes the `require` length check pass and then panics on the reversed `row[start..start + want]` slice range. A crafted MATPOWER `mpc.gencost` row (e.g. NCOST = 1e20) therefore panics on every build profile. Through the C ABI / Python / Julia the panic is caught at the FFI boundary and degraded to a generic "panic while parsing", but the pure Rust API and the CLI take an uncaught panic — a denial of service on untrusted input. It is not a memory-safety issue: the release wraparound lands on a bounds-checked slice, so it panics rather than reading out of bounds. Size the requirement with saturating arithmetic so an implausible NCOST is rejected by the existing length check as a loud `ShortRow` error, the parser's normal malformed-input signal, on every profile and through every binding. Found by malformed-input fuzzing of the parser surface. https://claude.ai/code/session_013KSDeKD9C3YsGaR67RDKhr
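A minimal sketch of the saturating width check described above. Only `ShortRow`, `require`, `start`, `want`, and NCOST come from the commit message; the error fields, the function name `piecewise_coeffs`, and the signatures are assumptions made for illustration, not the parser's actual code.

```rust
/// Error type for this sketch. The variant name `ShortRow` comes from the
/// commit message; its fields are assumptions.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    ShortRow { need: usize, have: usize },
}

/// Hypothetical stand-in for the parser's existing length check.
fn require(row: &[f64], need: usize) -> Result<(), ParseError> {
    if row.len() < need {
        return Err(ParseError::ShortRow { need, have: row.len() });
    }
    Ok(())
}

/// Returns the `2 * ncost` coefficients of a piecewise cost row, where
/// `start` is the index of the first coefficient and `ncost_raw` is NCOST
/// exactly as read from the file.
pub fn piecewise_coeffs(row: &[f64], start: usize, ncost_raw: f64) -> Result<&[f64], ParseError> {
    // `f64 as usize` saturates: 1e20 and +inf become usize::MAX, while NaN
    // and negative values become 0.
    let ncost = ncost_raw as usize;
    // Saturating arithmetic pins the requirement at usize::MAX instead of
    // wrapping around, so the length check below rejects the row.
    let want = ncost.saturating_mul(2);
    let end = start.saturating_add(want);
    require(row, end)?;
    Ok(&row[start..end])
}
```

With plain `start + want`, an NCOST of `1e20` either trips the debug overflow check or wraps to a small value that passes `require`. With the saturating form the same input surfaces as `ShortRow` on every profile.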
copy_to_buf clipped error/warning messages at a raw byte count, which could split a multi-byte UTF-8 codepoint and hand consumers an invalid UTF-8 string. Back the truncation point up to a character boundary so a clipped message is always valid UTF-8, and pin the behavior with a test. https://claude.ai/code/session_01KxR1fuH4L8XHHZXtNYgrG8
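The boundary back-up can be sketched with `str::is_char_boundary` from the standard library. The function names, the NUL terminator, and the return value below are assumptions for illustration; the real `copy_to_buf` signature is not shown in this PR.

```rust
/// Returns the longest prefix of `msg` that fits in `max_bytes` and ends on
/// a UTF-8 character boundary.
pub fn truncate_at_char_boundary(msg: &str, max_bytes: usize) -> &str {
    if msg.len() <= max_bytes {
        return msg;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !msg.is_char_boundary(end) {
        end -= 1;
    }
    &msg[..end]
}

/// Hypothetical buffer copy: writes `msg` plus a trailing NUL into `buf`,
/// clipping on a character boundary. Returns the message bytes written.
pub fn copy_to_buf_sketch(msg: &str, buf: &mut [u8]) -> usize {
    if buf.is_empty() {
        return 0;
    }
    // Reserve one byte for the NUL terminator (an assumption of this sketch).
    let clipped = truncate_at_char_boundary(msg, buf.len() - 1);
    buf[..clipped.len()].copy_from_slice(clipped.as_bytes());
    buf[clipped.len()] = 0;
    clipped.len()
}
```

Clipping `"naïve"` to three bytes would cut the two-byte `ï` in half under a raw byte count; the sketch backs up one byte and keeps `"na"`.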
…to v0.2.1-candidate
…mat strings everywhere

PIO_ABI_VERSION 3 -> 4. One verb per job, one meaning per word, and no format names in symbols, so the surface evolves additively from here:

- pio_to_normalized -> pio_normalize (a value transform returns a handle; the to_ family re-encodes unchanged data, per the strtol/htons lineage)
- pio_to_matpower / pio_to_json / pio_from_json cut: matpower and the new validated powerio-json snapshot flow through pio_to_format/pio_parse_str as format strings (TargetFormat::PowerioJson; write_as is now fallible because JSON has no Inf/NaN and the snapshot must round-trip exactly)
- pio_export_arrow -> pio_to_arrow; the Arrow schema is the evolution valve
- pio_write_pypsa_csv_folder -> pio_write_dir(net, to, dir); pio_read_gridfm -> pio_read_dir(dir, from, scenario); pio_gridfm_scenario_ids -> pio_scenario_ids(dir, from, ...): directory formats are strings too
- pio_convert_str joins pio_convert_file (both now (input, from, to, ...))
- every array extractor takes a cap and returns the total count; NULL out is the count query, so a caller buffer can never silently overflow (pio_n_reference_buses folds into pio_ref_bus_indices)
- pio_parse_warnings -> pio_warnings: warnings attach to the handle from any constructor (pio_read_dir drops its warnbuf), and the return is the byte length needed, so callers can size exactly
- pio_reference_bus -> pio_ref_bus_index (i64): it returns a dense index while pio_branches from/to carry bus ids; the unit is now in the name
- pio_n_components -> pio_n_islands; pio_nodal_demand/pio_nodal_shunt -> pio_bus_demand/pio_bus_shunt: bus/node/branch vocabulary fixed in the header preamble (bus = connection point, node = conductor point at a bus, reserved for the multiconductor surface; branch = any two-terminal series element)

The header preamble now states the grammar, the conventions (errbuf per libpcap/curl, cap/count per snprintf, UTF-8 boundary truncation, handle immutability), and the freeze-and-evolve policy.
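The cap/count contract for array extractors can be sketched as follows. The `Net` struct, the function name `sketch_ref_bus_indices`, and the choice to return 0 for a NULL handle are assumptions for illustration; only the `(net, out, cap)` argument shape and the "NULL out is the count query" rule come from the commit message.

```rust
use std::ptr;

/// Hypothetical handle type; the real handle layout is not shown in the PR.
pub struct Net {
    pub ref_buses: Vec<i64>,
}

/// Writes at most `cap` indices to `out` and returns the total number
/// available, so a caller can detect a short read by comparing the two.
///
/// # Safety
/// `net` must be NULL or a valid pointer to a `Net`; `out` must be NULL or
/// point to at least `cap` writable `i64` slots.
pub unsafe extern "C" fn sketch_ref_bus_indices(
    net: *const Net,
    out: *mut i64,
    cap: usize,
) -> usize {
    // A NULL handle reports zero elements (an assumption of this sketch).
    let net = match unsafe { net.as_ref() } {
        Some(net) => net,
        None => return 0,
    };
    let total = net.ref_buses.len();
    if !out.is_null() {
        // Never write more than the caller said the buffer holds.
        let n = total.min(cap);
        unsafe { ptr::copy_nonoverlapping(net.ref_buses.as_ptr(), out, n) };
    }
    // Always the full count, as with snprintf's return value.
    total
}
```

A caller would typically call once with `(NULL, 0)` to learn the count, allocate, then call again; a buffer that turns out too small yields a truncated copy and a return value larger than `cap`, never an overflow.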
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
matpower, psse, and powerio-json through parse_str; the PowerWorld .pwb and .pwd binary decoders on raw bytes. The invariant is the parser trust model: Ok or a structured Err on any input, never a panic. Excluded from the workspace (needs nightly + cargo-fuzz); see fuzz/README.md. The gencost NCOST overflow was found by exactly this harness shape. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… model and panic notes The capi README gains the ABI v4 history row, the cap/count contract, the parser trust model (malformed input errors, never UB; memory scales with input and is uncapped), and the panic strategy note (guards need the default unwind; an abort build aborts cleanly). smoke.c now exercises the v4 surface: count queries, powerio-json snapshot round trip, convert_str, write_dir, pio_warnings sizing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ble write_as Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- dataset format dispatch moves from powerio-capi into powerio-matrix's io hub (read_dataset_dir / dataset_scenario_ids), next to where the gridfm reader lives; the single-variant DatasetFormat enum at the C boundary is gone and the C ABI is a thin wrapper, like every other format dispatch
- the three identical catch_unwind tails of pio_to_format / pio_convert_file / pio_convert_str fold into finish_conversion, mirroring finish_network
- write_as: the PowerioJson early return becomes a match arm, dropping the unreachable!() (the snapshot still skips the warning passes deliberately: warn_normalized_tap would be false for a format that preserves the labels)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
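Folding repeated `catch_unwind` tails into one helper might look like the sketch below. The helper name `guarded`, the status codes, and the `String` error sink are assumptions; the actual `finish_conversion` signature is not shown in this PR. The "panic while parsing" text is the generic message quoted in the gencost commit.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical status codes for this sketch.
pub const STATUS_OK: i32 = 0;
pub const STATUS_ERR: i32 = 1;
pub const STATUS_PANIC: i32 = 2;

/// Runs `body` under a panic guard and maps the outcome to a status code,
/// storing any message in `err`. One shared tail like this replaces a
/// copy of the same match in every entry point.
pub fn guarded<F>(err: &mut String, body: F) -> i32
where
    F: FnOnce() -> Result<(), String>,
{
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(Ok(())) => STATUS_OK,
        Ok(Err(msg)) => {
            *err = msg;
            STATUS_ERR
        }
        // The panic payload is dropped; callers see only a generic message.
        Err(_) => {
            *err = "panic while parsing".to_string();
            STATUS_PANIC
        }
    }
}
```

The guard only works when panics unwind, which is the default; under `panic = "abort"` the process terminates before `catch_unwind` can return.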
…C block comment early Found by compiling smoke.c against the regenerated header: the pio_read_dir doc's directory-glob example ended the comment mid-sentence and broke every build including powerio.h. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ecoder bench The validate job's Julia shim ccalled pio_nodal_demand/pio_nodal_shunt and the un-capped extractor signatures — the one consumer the docs sweep missed (it greps .jl now). parse.rs gains parse_pwd_activsg200: the one reader whose hot loop runs per byte, regression coverage for the total byte accessors. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- to_json refuses non-finite values, naming the field: serde_json would degrade Inf/NaN to null, which the snapshot's own reader then rejects; the documented write-side error now actually fires
- sniff_json learns the snapshot's top level buses key, so a .json snapshot parses without a format hint; powerio-cli gains the powerio-json arm (aliases powerio/json)
- pio_network_free / pio_string_free run under the panic guard the boundary contract documents
- capi README calls out the silent pio_convert_file argument reorder, the one v4 break invisible at link time
- new powerworld_aux fuzz target: the .aux tokenizer was the one hand-written parse_str reader no harness fed
- README examples compile again (.network + ?); languages.md drops a stale "(PR open)" label

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The strict to_json guard broke the bindings' materialization path: readers legitimately produce Inf limits (the pandapower fixture carries an infinite pmax) and Python/Julia build every Network view through the snapshot, so refusing the write refused the parse. Keep the write total, surface the degradation as a write_as fidelity warning naming the field, and pin the no-read-back consequence in the test (the validating reader still rejects the null). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
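The "keep the write total, warn on degradation" behavior can be illustrated without any JSON library. The function name `json_number`, the warning wording, and the field label used in the usage note are assumptions for illustration; the real writer's behavior is described only in prose above.

```rust
/// Renders one numeric field as a JSON value. JSON has no Inf/NaN, so a
/// non-finite value degrades to `null` and a warning names the field.
/// The write itself never fails.
pub fn json_number(field: &str, value: f64, warnings: &mut Vec<String>) -> String {
    if value.is_finite() {
        // Rust's Display for finite f64 yields plain decimal digits,
        // which are valid JSON numbers.
        value.to_string()
    } else {
        warnings.push(format!("{field}: non-finite value {value} written as null"));
        "null".to_string()
    }
}
```

Calling it with a hypothetical field label such as `gen[0].pmax` and `f64::INFINITY` returns `null` and records one warning, which matches the trade-off in the commit: the snapshot can always be written, but a validating reader will still reject the resulting `null`.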
This was referenced Jun 12, 2026
Merged
# Conflicts: # powerio-capi/include/powerio.h # powerio-capi/src/lib.rs
Looks good to me! Thanks for explaining what this is :)
Supersedes #112, which the head branch rename closed; its comments hold the lockstep Julia PR pointer (eigenergy/PowerIO.jl#25) and the benchmark verification.
Merges two malformed-input fixes, generalizes both bug classes, and revises the C ABI to v4. The header preamble (`powerio-capi/include/powerio.h`) is the normative statement of the v4 conventions.

## Memory safety

- gencost parser: a crafted MATPOWER `mpc.gencost` row (e.g. NCOST = `1e20`) overflowed the row-width arithmetic and panicked on every build profile. The arithmetic now saturates and the row is rejected as a parse error. (claude/keen-feynman-vv3049)
- `copy_to_buf`: error and warning messages were clipped at a raw byte count, which could split a multi-byte UTF-8 codepoint. Truncation now backs up to a character boundary, so a clipped message is always valid UTF-8. (claude/amazing-edison-4bjitk)
- `.pwd` reader: the byte-read helpers indexed the buffer directly and relied on per-call-site bounds checks. They now return `Option`, so an out-of-range offset from a corrupt file rejects the record instead of panicking; the record scan also retains decoded coordinates rather than re-reading them. The differential oracle tests (decoded coordinates checked against same-vintage `.aux` files across the save corpus) pass unchanged.
- Fuzz targets (`fuzz/`, workspace-excluded; nightly + cargo-fuzz): matpower, psse, and powerio-json via `parse_str`; the `.pwb`/`.pwd` decoders on raw bytes. Invariant: any input yields `Ok` or a structured `Err`, never a panic. All five targets pass seeded smoke runs.
- `.pwb` cursor reads are bounds-checked; the psse/egret/pandapower numeric casts never feed indexing; every entry point already catches panics at the boundary.

## C ABI v4 (`PIO_ABI_VERSION` 3 → 4)

v3 used three different verbs for serialization, named a handle-returning transform like the string serializers, and let extractors write past a miscounted buffer. All known consumers are this repo and PowerIO.jl, and `pio_abi_version` rejects mismatched libraries at load, so the break is cheap now. The conventions are designed so it is the last one.

| v3 | v4 |
| --- | --- |
| `pio_to_matpower`, `pio_to_json`, `pio_from_json` | `matpower` and the new `powerio-json` snapshot are format strings into `pio_to_format` / `pio_parse_str` |
| `pio_to_normalized` | `pio_normalize`: a value transform returning a handle; `to_` re-encodes unchanged data |
| `pio_export_arrow` | `pio_to_arrow` |
| `pio_write_pypsa_csv_folder` | `pio_write_dir(net, to, dir, ...)` |
| `pio_read_gridfm`, `pio_gridfm_scenario_ids` | `pio_read_dir(dir, from, scenario, ...)`, `pio_scenario_ids(dir, from, ...)` |
| `pio_parse_warnings` | `pio_warnings` |
| `pio_reference_bus` (isize), `pio_reference_buses`, `pio_n_reference_buses` | `pio_ref_bus_index` (i64), `pio_ref_bus_indices(net, out, cap)`: a dense index, not a bus id, and named so |
| `pio_n_components`, `pio_nodal_demand`, `pio_nodal_shunt` | `pio_n_islands`, `pio_bus_demand`, `pio_bus_shunt` |
| `pio_convert_file(path, to, from)` | `pio_convert_file(path, from, to)`; new `pio_convert_str(text, from, to)` |

No format names remain in the symbol table; adding a format leaves the ABI unchanged.

Conventions:

- Every array extractor takes a `cap`, so a miscounted buffer reads short instead of overflowing, and `(NULL, 0)` is a count query: the `snprintf` pattern. v3 wrote exactly `pio_n_*` elements on trust.
- `pio_warnings` returns the byte length of the joined text, so a buffer can be sized exactly; v3 returned a warning count, which cannot size a buffer. Warnings attach to the handle from any constructor; only functions returning no handle (`pio_to_format`, `pio_convert_*`, `pio_write_dir`) take a `warnbuf`.
- Errors are reported through `errbuf`/`errlen` (the libpcap/curl idiom): no library-allocated strings to free, no thread-local state.
- The bus/node/branch vocabulary is fixed in the header preamble, hence `bus_demand` and `n_islands`.
- New data flows through the Arrow schema and the `powerio-json` snapshot, whose schemas evolve without touching a C signature.

Supporting changes:

- `TargetFormat::PowerioJson` makes the snapshot an ordinary format, reachable from the CLI and the converters.
- `write_as`/`to_format` become fallible because the snapshot rejects non-finite values rather than writing `null` (foreign JSON targets are unchanged).
- `powerio::write_dir` and `powerio_matrix::read_dataset_dir`/`dataset_scenario_ids` are the directory-format dispatch points.
- `examples/smoke.c` exercises the full v4 surface and is compiled and run in CI.

## Verification

Workspace test suite, 26 capi unit tests, header parity, the compiled C smoke binary end to end, PowerIO.jl's 180 tests against this branch's library, the PowerModels/Exa oracle matrix, and the fuzz smoke runs. Benchmarks against a main baseline show no regressions; the two reworked readers measure flat (matpower) and 1.3% faster (`.pwd`). Full table: benchmark comment. Continuous tracking: #115.

## Numbering and pairing

No version number changes in this diff; the branch keeps its working name. Recommended release number: 0.3.0, since pre-1.0 convention puts breaking changes in the minor, and the ABI handshake remains the actual compatibility gate. Merge order: this PR, then eigenergy/PowerIO.jl#25, with binaries cut from the same commit (tandem CI inactive, #64). Follow-ups: #113 (dist surface adopts these conventions), #114 (PowerDiff field renames), #115 (benchmark tracking).
🤖 Generated with Claude Code