Fixed:
- using `na_pct_below`
- `from_df` now includes metadata
- `from_df` now generates correct `na_pct_below` (0.01) for full datasets #63
Changed:
- bumped minimum python version to 3.8
- Support for Python 3.11 #54
- Pydantic migrated to v2
- Allows use of Pandas v2
- Version in metadata
- adds `dfschema` and `pandas` version in metadata upon generation (later, will warn if Schema is initialized from json generated by a later version)
- Renamed `na_limit` to `na_pct_below` to make it unambiguous (with backward support) #64
- Added `optional=True` flag for columns. If true, does not raise exception if column is not present
- added `dfschema update {existing_schema} {output_schema}` command to upgrade schemas
- relaxed Pydantic requirement to `>=1.9`
- Pydantic bumped to `1.10`
- Bug Fix: Categorical constraints (`exact_set`, `oneof`, `include`) now can keep `int` and `float` values. That extends to legacy schemas as well.
Legacy Schema Aliases (support for legacy schemas):
- `min_value` now also supports `min` alias
- `max_value` now also supports `max` alias
- `oneof` now also supports `one_of` alias
- `version` is now correctly moved to `metadata` from root on migration
- If column schema has both `oneof` and `includes` and they are identical, will replace with `exact_set`
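
The alias rules above can be illustrated with a small, standalone sketch. It is not the library's migration code: the function names `migrate_column` and `migrate_schema`, the flat column layout (a `columns` list of dicts), and the key name `include` are assumptions made for illustration only.

```python
from copy import deepcopy

# Alias table taken from the list above: legacy key -> current key.
LEGACY_ALIASES = {"min": "min_value", "max": "max_value", "one_of": "oneof"}


def migrate_column(column: dict) -> dict:
    """Return a copy of a column spec with legacy keys renamed (hypothetical helper)."""
    migrated = {LEGACY_ALIASES.get(key, key): value for key, value in column.items()}

    # Identical `oneof` and `include` lists collapse into a single `exact_set`.
    oneof, include = migrated.get("oneof"), migrated.get("include")
    if oneof is not None and include is not None and set(oneof) == set(include):
        del migrated["oneof"], migrated["include"]
        migrated["exact_set"] = list(oneof)
    return migrated


def migrate_schema(schema: dict) -> dict:
    """Return a migrated copy of a legacy schema dict; the input is left untouched."""
    migrated = deepcopy(schema)

    # A root-level `version` moves under `metadata`.
    if "version" in migrated:
        migrated.setdefault("metadata", {})["version"] = migrated.pop("version")

    # Assumed layout: columns are stored as a list of dicts under `columns`.
    migrated["columns"] = [migrate_column(col) for col in migrated.get("columns", [])]
    return migrated
```

For example, a legacy column `{"name": "x", "min": 0, "max": 5}` would come out as `{"name": "x", "min_value": 0, "max_value": 5}` under these assumptions.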
Testing:
- conftest code improved to showcase bad json on Exception
- multiple v1 schemas were added for testing
- pre-commit setup was updated
- rename `DfSchema.validate_df` to `DfSchema.validate` (UNDONE: `validate` is reserved by Pydantic object)
- updated documentation
- `DfSchema.to_file`, `DfSchema.from_file` proper testing
- CLI command help texts
- added pre-commit install to the repo
- Some benchmarking
- renamed `dfs.validate_df` to `dfs.validate`
- fix column dtype generation/validation bug
- renamed `strict_column_set` to `additionalColumns`
- renamed `strict_column_order` to `exactColumnOrder`
- Metadata SubObject
- Summary Exception is now collected for specific DfSchema, not via Borg State
- Supports SubSets
- Support reading and writing schemas as yaml
- added `validate_sql` method (based on `pd.read_sql` for everything including dtype mapping)
- added cli support for schema generation or validation
- support for subsets in `from_df`
- support for `str_patterns` (string columns are matched against string prefix / regex patterns)
v1.1.0
- added support for "exact_set" (exact match of categorical values)
- better structure of tests and code
- added `summary` argument. If True, all tests will be run and errors will be summarized in `DataFrameSummaryError` exception.
- re-enabled schema generation
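
The behaviour described for the `summary` argument (run every check, then report all failures at once) follows a common error-collection pattern. The sketch below shows that pattern in isolation; `SummaryError` and `run_checks` are hypothetical names, not the library's `DataFrameSummaryError` or its validation API, and the choice of `ValueError` for a failed check is an assumption.

```python
from typing import Callable, Iterable, List


class SummaryError(Exception):
    """Hypothetical stand-in for a summary exception; keeps every collected error."""

    def __init__(self, errors: List[Exception]):
        self.errors = errors
        details = "; ".join(str(err) for err in errors)
        super().__init__(f"{len(errors)} check(s) failed: {details}")


def run_checks(checks: Iterable[Callable[[], None]], summary: bool = False) -> None:
    """Run checks in order; fail fast by default, or collect failures if summary=True."""
    errors: List[Exception] = []
    for check in checks:
        try:
            check()
        except ValueError as exc:  # assumption: a failed check raises ValueError
            if not summary:
                raise  # fail-fast mode: surface the first error immediately
            errors.append(exc)
    if errors:
        raise SummaryError(errors)
```

Because the error list is local to each `run_checks` call, nothing is shared between runs.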