Enhanced file reader script to allow conversion to best datatypes pos… #72

Draft
kaushik-sb wants to merge 1 commit into blackrock:main from kaushik-sb:feature/enhancedFileReader

kaushik-sb commented Jun 11, 2026

[FEATURE] Enhance CSVFileReader to support dtype conversion with null values

Description

This PR enhances the CSV file reader so that datatype conversion is more robust when null values are present.
Previously, null values could lead to inconsistent or suboptimal datatype inference; for example, numeric columns with null entries were written out with a trailing .0. With this change, when convert_dtypes is enabled in the source configuration, the reader applies pandas' built-in convert_dtypes() to the dataframe, which selects nullable dtypes where possible.

Changes Made

  • Added conditional dtype conversion in CSVFileReader:

    • If convert_dtypes is set to true for a source in the config YAML, the reader now calls:

      result = result.convert_dtypes()
  • This allows:

    • Better handling of nullable integer and string types

    • Improved consistency when null values are present in input CSV files
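
The conditional conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name read_csv_source and the dict-shaped source_config are assumptions made for the example, and only pd.read_csv and DataFrame.convert_dtypes are real pandas calls.

```python
import io

import pandas as pd


def read_csv_source(buffer, source_config):
    """Hypothetical reader; the name and config shape are assumptions."""
    result = pd.read_csv(buffer)
    # Only convert when the source opts in, so existing sources are unchanged.
    if source_config.get("convert_dtypes", False):
        result = result.convert_dtypes()
    return result


# The qty column has one missing value.
csv_text = "id,qty,name\n1,10,a\n2,,b\n3,30,c\n"

plain = read_csv_source(io.StringIO(csv_text), {})
converted = read_csv_source(io.StringIO(csv_text), {"convert_dtypes": True})

# Without the flag, the null forces qty to float64.
print(plain["qty"].dtype)
# With the flag, qty becomes the nullable Int64 extension dtype.
print(converted["qty"].dtype)
```

Defaulting the flag to false in this sketch keeps the previous behaviour for sources that do not set it.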

Definition of Done

Before submitting this pull request, please ensure that the following criteria have been met:

  • All automated tests have passed successfully.
  • All manual tests have passed successfully.
  • Code has been reviewed by at least one other team member.
  • Code has been properly documented and commented as needed.
  • All new and existing code adheres to our project's coding standards.
  • All dependencies have been added or removed from the project's README or other documentation as needed.
  • Any relevant documentation or help files have been updated to reflect the changes made in this pull request.
  • Any necessary database migrations have been run.
  • Any relevant UI changes have been reviewed and approved by the UI/UX team.

Test Evidence

CSV file processed without the convert_dtypes flag in the file source (numeric columns that have null entries show the .0 issue):
test_without_convert.csv

CSV file processed with the convert_dtypes flag in the file source (data is formatted correctly even though the columns have null entries):
test_with_convert.csv
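
For readers without the attachments, the difference can be reproduced with a small round trip. The data below is made up for illustration and is not the content of the attached files.

```python
import io

import pandas as pd

# Made-up input: integer column qty with one null entry.
csv_text = "id,qty\n1,10\n2,\n3,30\n"
df = pd.read_csv(io.StringIO(csv_text))

# Default inference stores qty as float64, so values are written as 10.0 and 30.0.
without_convert = df.to_csv(index=False)

# After convert_dtypes(), qty is nullable Int64 and is written as 10 and 30.
with_convert = df.convert_dtypes().to_csv(index=False)

print(without_convert)
print(with_convert)
```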

Additional Notes

[Add any additional notes or context for the reviewer or future maintainers of this code.]

Thank you for submitting!

…sible when there are nulls in file

Signed-off-by: 23f3001827 <23f3001827@ds.study.iitm.ac.in>