Follow-ups from #118
One perf-related item surfaced during the code review for #118 that was deferred because it touches row-class internals scoped out of that PR. Filing here so it isn't lost.
Eager split-then-discard in the multiline continuation loop
In Csv/CsvReader.Engine.cs::Enumerate (and EnumerateAsync), the multiline continuation loop calls options.Splitter.Split(line, options) to check for unterminated quotes. The final rawSplit is computed against the fully-joined line but discarded. The yielded row's lazy RawSplitLine cache will re-Split the same string when the consumer first reads a field. That's one redundant Split per multiline row.
Pre-refactor ReadImpl did roughly the same redundant work (it constructed a new ReadLine per loop iteration and let RawSplitLine cache-warm via the loop condition), so this is not a strict regression — but it's a known optimization point.
Fix sketch. Add an internal PrimeRawSplit(IList<MemoryText>) method (or expose the rawSplitLine field as internal) on the row classes; have the engine call it after factory.Create(...) to seed the cache. This touches row-class internals that were intentionally out of scope for #118.
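A minimal sketch of the priming idea, to make the fix concrete. Everything here except the names PrimeRawSplit and RawSplitLine is hypothetical: SketchRow stands in for the real row classes, plain string stands in for MemoryText, and the comma-only splitter stands in for options.Splitter.Split. It only demonstrates that seeding the lazy cache removes the second Split; it is not the actual engine code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a row class with a lazy split cache.
sealed class SketchRow
{
    private readonly string line;
    private readonly Func<string, IList<string>> split;
    private IList<string>? rawSplitLine; // null until first read or priming

    public SketchRow(string line, Func<string, IList<string>> split)
    {
        this.line = line;
        this.split = split;
    }

    // Proposed hook: the engine hands over the split it already computed.
    // Only seeds the cache if nothing has been cached yet.
    internal void PrimeRawSplit(IList<string> fields) => rawSplitLine ??= fields;

    // Splits on first access unless the cache was primed beforehand.
    public IList<string> RawSplitLine => rawSplitLine ??= split(line);
}

static class PrimeSketch
{
    // Returns how many times the splitter ran for one multiline row.
    public static int CountSplits(bool prime)
    {
        int splitCalls = 0;
        Func<string, IList<string>> split = s => { splitCalls++; return s.Split(','); };

        string joined = "a,\"b\nc\",d";          // fully-joined multiline row
        IList<string> rawSplit = split(joined);  // engine's final split in the continuation loop

        var row = new SketchRow(joined, split);
        if (prime) row.PrimeRawSplit(rawSplit);  // seed the cache instead of discarding

        _ = row.RawSplitLine[0];                 // consumer reads a field
        return splitCalls;
    }

    static void Main()
    {
        Console.WriteLine(PrimeSketch.CountSplits(prime: false)); // split runs twice
        Console.WriteLine(PrimeSketch.CountSplits(prime: true));  // split runs once
    }
}
```

Keeping the hook as a method rather than exposing the field lets the row class decide whether an already-populated cache should win, which seems the safer of the two options in the fix sketch.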
Resolved
The second deferred item — "only check the last field for unterminated quotes in multiline detection" — was applied in PR #121 commit 84e2c96 after Gemini independently flagged it during review. No follow-up needed for that part.
Related
MemorySliceLineSource.Concat pool-defeating issue (verbatim port of ConcatenateMemory).