Skip to content

Memory-based reader paths terminate enumeration on blank middle-of-stream lines #122

@stevehansen

Description

@stevehansen

Problem

CsvReader.ReadFromMemory(ReadOnlyMemory<char>) and CsvReader.ReadFromMemoryOptimized(ReadOnlyMemory<char>, ...) terminate enumeration on the first blank line they encounter in the middle of the source, dropping every record that comes after.

Repro:

var csv = "a,b,c\r\n1,2,3\r\n\r\n4,5,6\r\n";
foreach (var row in CsvReader.ReadFromMemoryOptimized(csv.AsMemory()))
    Console.WriteLine(row.Raw);
// Output: just "1,2,3"  -- the "4,5,6" record is silently dropped.

Compare with CsvReader.Read(TextReader) over the same data, which correctly skips the blank line (via the default CsvOptions.SkipRow predicate (row, idx) => row.Span.IsEmpty || row.Span[0] == '#') and yields both records.

Root cause

MemorySliceLineSource.TryReadLine (in Csv/CsvReader.Engine.cs) and MemoryReaderLineSource.TryReadLine both return false when the just-read line is empty. The engine treats false as end-of-stream, so a blank middle-of-stream line stops the loop.

// MemorySliceLineSource.TryReadLine
if (slice.IsEmpty)
{
    line = default;
    return false;  // <-- terminates enumeration, NOT EOF
}

The MemoryReaderLineSource case has the same shape:

line = csv.ReadLine(ref position);
return !line.IsEmpty;  // <-- returns false on blank line

The Read / ReadAsSpan / ReadAsync paths don't have this bug because TextReader.ReadLine() distinguishes blank line (returns "") from EOF (returns null); the engine sees the empty string as a real line, passes it to SkipRow, which skips it.

Pre-existing

This is not a regression introduced by #118 — it's a pre-existing behavior in ReadLineOptimized and the MemoryText.ReadLine(ref position) extension. The pre-refactor ReadFromMemoryOptimizedImpl did while ((line = ReadLineOptimized(...)).IsEmpty == false) (where the optimized reader returned Empty for both blank lines AND EOF). The pre-refactor ReadFromMemory did the same with csv.ReadLine(ref position). #118 preserved this verbatim per its scope ("unify the loops, preserve behavior") and did not attempt to fix it.

Gemini flagged this during the #118 review (#121).

Proposed fix

TryReadLine must distinguish "blank line in middle of stream" from "EOF". For both memory-based sources:

  1. Return true whenever a line was successfully sliced — even if it's zero-length.
  2. Return false only when position >= csv.Length BEFORE attempting to read.

For MemorySliceLineSource:

public bool TryReadLine(out MemoryText line, out string? lineString)
{
    lineString = null;
    if (position >= csv.Length)
    {
        line = default;
        return false;
    }

    var span = csv.Span.Slice(position);
    var newlineIndex = span.IndexOfAny('\n', '\r');

    if (newlineIndex == -1)
    {
        line = csv.Slice(position);
        position = csv.Length;
        return true;  // even if empty, this is the last line
    }

    line = csv.Slice(position, newlineIndex);
    position += newlineIndex;
    // ... consume the newline sequence ...
    return true;  // even for blank lines
}

For MemoryReaderLineSource: return true unconditionally after a successful csv.ReadLine(ref position) (the EOF check at the top of the method already gates entry).

After this change, the default CsvOptions.SkipRow predicate will correctly skip blank middle-of-stream lines, and the memory-based paths will match the TextReader-based paths' behavior.

Acceptance

  • A new test in Csv.Tests/EngineUnificationTests.cs parameterized over all 5 reader paths, asserting that:

    a,b,c
    1,2,3
                    <-- blank line
    4,5,6
    

    produces two records (1,2,3 and 4,5,6) on every path, not one.

  • A negative test asserting that if the user sets CsvOptions.SkipRow = (_, _) => false, the blank line is returned as an empty record (3 records total on the HeaderPresent path: header + 1,2,3 + empty + 4,5,6 minus the header).

Risk

Low. The fix changes the semantic of TryReadLine from "false on empty OR EOF" to "false only on EOF". The engine's existing SkipRow flow handles the empty-line case correctly downstream. The cross-path matrix test in EngineUnificationTests.cs would pin the new uniformity.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions