Description
The CsvConverter currently handles only .csv files with comma delimiters. TSV (Tab-Separated Values) files are common in data science, bioinformatics, spreadsheet exports, and LLM pipelines, but they are not supported.
Expected behavior
MarkItDown should convert .tsv files to Markdown tables, just like it does for .csv files.
Proposed solution
Extend the existing CsvConverter to:
- Accept .tsv files and the text/tab-separated-values MIME type
- Auto-detect the delimiter using Python's built-in csv.Sniffer
- Fall back to tab for .tsv files and comma for .csv files
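A minimal sketch of the detection and fallback logic described above, not the actual change in the PR. The names `FALLBACK_DELIMITERS`, `detect_delimiter`, and `to_markdown_table` are hypothetical and do not come from MarkItDown. Restricting the sniffer to comma and tab, and the 4096-character sample size, are assumptions made for illustration.

```python
import csv
import io

# Hypothetical fallback table keyed by file extension (not MarkItDown's API).
FALLBACK_DELIMITERS = {".tsv": "\t", ".csv": ","}


def detect_delimiter(text: str, extension: str) -> str:
    """Guess the delimiter with csv.Sniffer, falling back on the extension."""
    sample = text[:4096]  # assumed sample size; sniffing the whole file is unnecessary
    try:
        # Only consider comma and tab as candidate delimiters.
        return csv.Sniffer().sniff(sample, delimiters=",\t").delimiter
    except csv.Error:
        # Sniffer could not decide (e.g. single-column data): use the extension.
        return FALLBACK_DELIMITERS.get(extension.lower(), ",")


def to_markdown_table(text: str, extension: str) -> str:
    """Render delimited text as a Markdown table (no escaping of '|' in cells)."""
    delimiter = detect_delimiter(text, extension)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        # Pad short rows and truncate long ones so every row matches the header.
        row = (row + [""] * len(header))[: len(header)]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table("name\tage\nalice\t30\nbob\t25\n", ".tsv"))
```

The extension-based fallback matters mostly for inputs where the sniffer has nothing to go on, such as a single-column file, where it raises `csv.Error`.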
PR
#2021