Enhanced file reader script to allow conversion to best datatypes pos…#72
Draft
kaushik-sb wants to merge 1 commit into
…sible when there are nulls in file Signed-off-by: 23f3001827 <23f3001827@ds.study.iitm.ac.in>
[FEATURE] Enhance CSVFileReader to support dtype conversion with null values
Description
This PR enhances the CSV file reading functionality to handle datatype conversion more robustly when null values are present.
Previously, datasets containing null values could lead to inconsistent or suboptimal datatype inference. This change ensures that when convert_dtypes is enabled in the source configuration, the dataframe uses pandas' built-in convert_dtypes() for safer and more accurate type inference.
Changes Made
Added conditional dtype conversion in CSVFileReader: if convert_dtypes is set to true in the sources section of the config yml, the reader now calls pandas' convert_dtypes() on the dataframe. This allows:
Better handling of nullable integer and string types
Improved consistency when null values are present in input CSV files
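The conditional conversion described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function name read_csv_source and the shape of the source_cfg dict are hypothetical, and only the convert_dtypes key and the pandas DataFrame.convert_dtypes() call come from the description.

```python
import io

import pandas as pd


def read_csv_source(source_cfg: dict, buffer) -> pd.DataFrame:
    """Hypothetical stand-in for the reader; source_cfg mimics one source entry."""
    df = pd.read_csv(buffer)
    # Convert only when the flag is explicitly enabled in the source config.
    if source_cfg.get("convert_dtypes", False):
        df = df.convert_dtypes()
    return df


# An integer column with one missing value.
csv_text = "id,name\n1,a\n,b\n3,c\n"

plain = read_csv_source({}, io.StringIO(csv_text))
converted = read_csv_source({"convert_dtypes": True}, io.StringIO(csv_text))

print(plain["id"].dtype)      # default inference falls back to a float dtype
print(converted["id"].dtype)  # nullable integer extension dtype
```

Keeping the conversion behind a flag that defaults to off means existing sources keep their current dtypes unless they opt in.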
Definition of Done
Before submitting this pull request, please ensure that the following criteria have been met:
Test Evidence
CSV file processed without the convert_dtypes flag in the file source (integer columns that contain null entries are read as floats, so their values are written with a trailing .0)
test_without_convert.csv
CSV file processed with the convert_dtypes flag in the file source (numeric values keep their integer formatting even though the columns contain null entries)
test_with_convert.csv
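The difference between the two outputs can be reproduced with a small standalone sketch. The column names and values here are made up for illustration and are not taken from the attached test files.

```python
import io

import pandas as pd

# Illustrative input: the qty column has one missing entry.
csv_text = "id,qty\n1,10\n2,\n3,30\n"

without_convert = pd.read_csv(io.StringIO(csv_text))
with_convert = pd.read_csv(io.StringIO(csv_text)).convert_dtypes()

# Write both frames back out as CSV text and compare line by line.
lines_without = without_convert.to_csv(index=False).splitlines()
lines_with = with_convert.to_csv(index=False).splitlines()

print(lines_without)  # qty values carry a trailing .0
print(lines_with)     # qty values stay as integers; the null stays empty
```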