
fix(dataextractor): RFC 4180 quote fields containing delimiter, quote, or newline #3371

Open
ryanmelt-agent wants to merge 4 commits into OpenC3:main from ryanmelt-agent:fix/dataextractor-rfc4180-quoting

@ryanmelt-agent

Summary

Fixes #858.

DataExtractor previously built each output row as row.join(this.delimiter) with no quoting. A string telemetry value containing the active delimiter (typically ,), a double quote ("), CR, or LF would corrupt the CSV output. Array values were pre-wrapped as "[…]" by hand, which was correct only in CSV mode and produced a stray quote in tab-delimited mode.

This change introduces an RFC 4180 quoteField helper and applies it consistently at the header and row join points, so both CSV and TSV output stay parseable when values contain the delimiter, embedded quotes, or newlines.
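To make the failure mode concrete, here is a minimal sketch of what an unquoted join does to a value containing the delimiter. The row contents are hypothetical and chosen only for illustration; they are not taken from the issue or from the test suite.

```javascript
// Hypothetical row: the third cell is a string telemetry value containing a comma.
const row = ['INST', 'HEALTH_STATUS', 'WARN, HIGH TEMP']

// Unquoted join, which is how the description says rows were built before this change.
const line = row.join(',')

// A consumer that splits on the delimiter now sees four fields instead of three.
const fields = line.split(',')

console.log(line)          // INST,HEALTH_STATUS,WARN, HIGH TEMP
console.log(fields.length) // 4
```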

Changes

  • New quoteField(value) method that returns the value as-is unless it contains the current delimiter, ", \r, or \n, in which case it wraps in double quotes and doubles any embedded double quotes (per RFC 4180). null/undefined become empty strings.
  • Apply quoteField to each header before join.
  • Apply quoteField to each row cell before join.
  • Drop the manual '"[' + arr + ']"' wrapping for array values. With the helper in place, arrays produce [a,b,c] raw, and the join-time helper quotes correctly for whatever delimiter is active.
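The behavior listed above can be sketched roughly as follows. This is not the PR's code: the real helper is described as a method that reads the current delimiter from the component, whereas this standalone version takes the delimiter as a parameter (an assumption made so the snippet runs on its own). The sample row is also hypothetical.

```javascript
// Sketch only. Assumption: delimiter is passed explicitly instead of read from this.delimiter.
function quoteField(value, delimiter = ',') {
  // null/undefined become empty strings.
  if (value === null || value === undefined) return ''
  const s = String(value)
  const needsQuoting =
    s.includes(delimiter) || s.includes('"') || s.includes('\r') || s.includes('\n')
  if (!needsQuoting) return s
  // RFC 4180: wrap in double quotes and double any embedded double quote.
  return '"' + s.replaceAll('"', '""') + '"'
}

// Hypothetical row; the last cell mimics an array rendered as [a,b,c] without manual quotes.
const row = ['INST', 'a,b', 'say "hi"', null, '[1,2,3]']

const csvLine = row.map((cell) => quoteField(cell, ',')).join(',')
const tsvLine = row.map((cell) => quoteField(cell, '\t')).join('\t')

console.log(csvLine) // INST,"a,b","say ""hi""",,"[1,2,3]"
console.log(tsvLine) // commas pass through unquoted in tab-delimited mode
```

Because quoting is decided at join time against the active delimiter, the array cell is quoted in CSV mode and left bare in tab-delimited mode, which is the case the hand-written wrapping got wrong.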

Verification

  • pnpm lint --max-warnings 0 clean in openc3-cosmos-tool-dataextractor.
  • 17 Node-based assertions over the helper plus a minimal RFC 4180 parser round-trip, covering: comma in string, embedded ", embedded \n, array brackets, plain int/string passthrough, null/undefined → "", NaN/Infinity passthrough, and TSV mode (no quoting on ,, quoting on \t, array brackets pass through). All pass.
  • Cross-checked the existing Playwright assertions in playwright/tests/data-extractor.p.spec.ts: TARGET,PACKET,TEMP1,TEMP2, % TIME,TARGET,PACKET,Q1,Q2, INST HEALTH_STATUS TEMP1, and INST,LATEST,HTML all produce identical bytes (no commas/quotes in those header/value strings → passthrough).
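The verification script itself is not part of this conversation. As an illustration of what a minimal RFC 4180 parser for a round-trip check might look like, here is a sketch; parseCsv and the sample text are hypothetical and are not the assertions referenced above.

```javascript
// Minimal RFC 4180-style parser sketch (hypothetical; not the PR's verification code).
function parseCsv(text, delimiter = ',') {
  const rows = []
  let row = []
  let field = ''
  let inQuotes = false
  for (let i = 0; i < text.length; i++) {
    const ch = text[i]
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"' // doubled quote inside a quoted field is a literal quote
          i++
        } else {
          inQuotes = false // closing quote
        }
      } else {
        field += ch // delimiters and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true
    } else if (ch === delimiter) {
      row.push(field)
      field = ''
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++ // treat CRLF as one record break
      row.push(field)
      field = ''
      rows.push(row)
      row = []
    } else {
      field += ch
    }
  }
  // Flush a final record that has no trailing newline.
  if (field !== '' || row.length > 0) {
    row.push(field)
    rows.push(row)
  }
  return rows
}

// Hand-encoded sample: one header record and one data record with a comma,
// an embedded newline, and embedded quotes.
const encoded = 'TARGET,PACKET,NOTE\nINST,"a,b","line1\nline2 ""x"""\n'
const parsed = parseCsv(encoded)
console.log(JSON.stringify(parsed))
```

A round-trip check then amounts to encoding known cell values, parsing the result, and comparing the parsed cells against the originals.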

Test plan

  • Reviewer runs pnpm lint in openc3-cosmos-init/plugins/packages/openc3-cosmos-tool-dataextractor.
  • Reviewer runs the existing playwright/tests/data-extractor.p.spec.ts suite against this branch and confirms no regression.
  • (Optional manual) Add a telemetry string value containing a comma, run DataExtractor in CSV mode, and confirm the value round-trips through a CSV parser (e.g. Excel, csv.DictReader).

…, or newline

DataExtractor previously joined values with the active delimiter without
quoting, so a string telemetry value that contained a comma (or quote,
CR, LF) corrupted CSV output. Add a quoteField helper applied at the
header and row join points; drop the manual "[..]" pre-quoting on
arrays so the same helper handles both CSV and tab-delimited modes.

Fixes OpenC3#858

Co-Authored-By: Paperclip <noreply@paperclip.ing>

Numbers, booleans, etc. can never contain the delimiter, quote, CR, or
LF, so they don't need escaping. Return them untouched and let
Array.prototype.join stringify them; this avoids a String conversion
and four .includes() scans per numeric cell. Arrays are pre-stringified
at the call site, so they still take the normal string path.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@ryanmelt-agent
Author

Added a perf follow-up in 51bd7a1: quoteField now returns non-string values (numbers, booleans, NaN, Infinity) untouched and lets Array.prototype.join stringify them at the join, instead of calling String(value) and four .includes() scans per numeric cell. Arrays are pre-stringified at the call site as "[a,b,c]" so they still take the normal string path.

Re-ran the same Node verification expanded to 22 cases (including mixed numeric/boolean/NaN/Infinity rows, sparse rows, TSV passthrough). All pass; lint clean.

Split quoteField into a primitive fast-path (number/boolean/bigint return
untouched) and a defensive object path (any other non-string value is
coerced via String() before the RFC 4180 scan). Strings continue to skip
the redundant String() call. Makes the helper safe against future
regressions where a non-pre-stringified object reaches the join, while
keeping numeric cells free of scans.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@ryanmelt-agent
Author

Tiered the helper in f204b51 per follow-up review feedback:

  • Primitive fast-path: number, boolean, bigint short-circuit and return untouched — no String(), no scans. Array.prototype.join stringifies them at the join.
  • String path: unchanged. Scan for the delimiter, ", \r, or \n, then wrap in double quotes and double any embedded double quotes.
  • Defensive object/array path: anything else (object, array, custom toString) is coerced via String(value) and run through the same scan. Today the call site filters objects, so this is a safety net rather than a current-traffic correctness fix; it prevents future regressions where a bare array or object slips through.

23 Node assertions pass, including bare array -> quoted CSV string, object with custom toString containing comma -> quoted, plain object passthrough ("[object Object]"), BigInt, NaN/Infinity/-Infinity, and mixed-type rows. Lint clean.
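The three tiers described above could look roughly like this. It is a sketch under assumptions, not the code from f204b51: the function name quoteFieldTiered, the explicit delimiter parameter, and the sample cells are all hypothetical.

```javascript
// Sketch of a tiered helper. Assumption: delimiter passed explicitly for a standalone example.
function quoteFieldTiered(value, delimiter = ',') {
  if (value === null || value === undefined) return ''
  const t = typeof value
  // Primitive fast path: returned untouched; Array.prototype.join stringifies them later.
  if (t === 'number' || t === 'boolean' || t === 'bigint') return value
  // Strings skip the redundant String() call; anything else is coerced defensively.
  const s = t === 'string' ? value : String(value)
  if (s.includes(delimiter) || s.includes('"') || s.includes('\r') || s.includes('\n')) {
    return '"' + s.replaceAll('"', '""') + '"'
  }
  return s
}

// Hypothetical mixed-type row, including a bare array and an object with a custom toString.
const cells = [1.5, true, 10n, NaN, [1, 2], { toString: () => 'x,y' }, {}, 'plain']
const mixedLine = cells.map((cell) => quoteFieldTiered(cell)).join(',')

console.log(mixedLine) // 1.5,true,10,NaN,"1,2","x,y",[object Object],plain
```

Note that a bare array coerced through String() loses its brackets, so the defensive path keeps the output parseable but does not reproduce the [a,b,c] rendering that the call site provides.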

…781)

Addresses Sonar javascript:S7781. The literal-string overload is clearer
than the regex form and avoids constructing a RegExp.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@ryanmelt-agent
Author

Addressed Sonar javascript:S7781 ("Prefer String#replaceAll() over String#replace()") in f76489c:

-return '"' + s.replace(/"/g, '""') + '"'
+return '"' + s.replaceAll('"', '""') + '"'

Same semantics (escape every " by doubling it), no regex object allocation. Spot-checked the four embedded-quote cases that exercise the branch.
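A quick way to convince yourself that the two forms agree is to compare them over a few inputs. The sample strings below are hypothetical and are not the four cases mentioned above.

```javascript
// Hypothetical samples containing zero, one, or several double quotes.
const samples = ['no quotes', '"', 'a"b', '""', 'say "hi" then "bye"']

const results = samples.map((s) => ({
  viaRegex: s.replace(/"/g, '""'),      // global regex form
  viaLiteral: s.replaceAll('"', '""'),  // literal-string form
}))

for (const r of results) {
  console.log(r.viaRegex === r.viaLiteral, r.viaLiteral)
}

// For contrast: String#replace with a string pattern only replaces the first match,
// which is why the original code needed the /g flag.
const firstOnly = 'a"b"c'.replace('"', '""')
console.log(firstOnly) // a""b"c
```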

@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.81%. Comparing base (0b61176) to head (f76489c).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3371      +/-   ##
==========================================
+ Coverage   78.09%   78.81%   +0.71%     
==========================================
  Files         480      687     +207     
  Lines       35532    57265   +21733     
  Branches      728      728              
==========================================
+ Hits        27750    45133   +17383     
- Misses       7704    12054    +4350     
  Partials       78       78              
Flag Coverage Δ
python 80.03% <ø> (-15.97%) ⬇️
ruby-api 81.67% <ø> (-0.68%) ⬇️
ruby-backend 82.26% <ø> (+0.19%) ⬆️



Development

Successfully merging this pull request may close these issues.

DataExtractor Quote Strings Containing Commas
