Add support for parsing Gregorian dates in standard text formats #160
Merged
18 commits
c37a1fe Preliminary gregorian grammar, parser, and tests (rlskoeser)
3c58c2e Fully implement script to generate month names for Gregorian parser (rlskoeser)
01ebe5e Grammar with month names in multiple languages (rlskoeser)
fb29061 Import and use all month names (rlskoeser)
6d6259b Don't repeat month names / abbreviations (rlskoeser)
294e573 Add more test cases in multiple languages (rlskoeser)
f654e83 Test gregorian parser transformer; refine parsing logic (rlskoeser)
5b5d89f Connect parsing to gregorian converter class and test (rlskoeser)
6600f58 Add Gregorian to omnibus parser (rlskoeser)
2bd8c23 Document Gregorian parser & languages in change log (rlskoeser)
9ca8424 Add dev notes for codegen script; drop uvx from hatch run command (rlskoeser)
e4c468d Make Gregorian parser case-insensitive (rlskoeser)
a29a5a4 Test error handling in gregorian converter parse method (rlskoeser)
b9c2bf6 Catch more generic Lark exception per @coderabbitai (rlskoeser)
bb1d724 Ignore commas and periods across all grammars (rlskoeser)
e16f4d2 Use markdown formatting instead of rst for hatch run command (rlskoeser)
3efce6d Add new undate_common lark grammar to version control (rlskoeser)
c718db7 Last minor cleanup (rlskoeser)
@@ -0,0 +1,76 @@

    #!/usr/bin/env python
    """
    This script generates the gregorian_multilang.lark file
    with month names (full and abbreviated) based on the list of
    target languages.

    Run this script with hatch to regenerate the file::

        hatch run codegen:generate

    """

    from collections import defaultdict
    import pathlib

    from babel.dates import get_month_names

    # lark grammar path relative to this script
    GRAMMAR_DIR_PATH = (
        pathlib.Path(__file__).parent.parent / "src" / "undate" / "converters" / "grammars"
    )
    # file that is generated by this script, in that directory
    MONTH_GRAMMAR_FILE = GRAMMAR_DIR_PATH / "gregorian_multilang.lark"

    # include month names in the following languages
    languages = [
        "en",  # English
        "es",  # Spanish
        "fr",  # French
        "de",  # German
        "rw",  # Kinyarwanda
        "lg",  # Ganda
        "ti",  # Tigrinya
    ]

    # warning to include at top of generated file
    warning_text = """// WARNING: This file is auto-generated. DO NOT EDIT.
    // To regenerate: hatch run codegen:generate

    """


    def main():
        # create a dictionary of lists to hold the names for each month
        all_month_names = defaultdict(list)

        for lang in languages:
            for width in ["wide", "abbreviated"]:
                for month_num, month_name in get_month_names(width, locale=lang).items():
                    # some locales use a . on the shortened month; let's ignore that
                    month_name = month_name.strip(".").lower()
                    # In some cases different languages have the same abbreviations;
                    # in some cases, abbreviated and full are the same.
                    # Only add if not already present, to avoid redundancy
                    if month_name not in all_month_names[month_num]:
                        all_month_names[month_num].append(month_name)

        with MONTH_GRAMMAR_FILE.open("w") as outfile:
            outfile.write(warning_text)

            # for each numeric month, generate a rule with all variant names:
            #   month_1: /January|Jan/i
            for i, names in all_month_names.items():
                # combine all names in a case-insensitive OR regex
                # sort shortest variants last to avoid partial matches hitting first
                or_names = "|".join(sorted(names, key=len, reverse=True))
                outfile.write(f"month_{i}: /({or_names})/i\n")

        print(
            f"Successfully regenerated {MONTH_GRAMMAR_FILE.relative_to(pathlib.Path.cwd())}"
        )
        print("If the file has changed, make sure to commit the new version.")


    if __name__ == "__main__":
        main()
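The script's comment about sorting shortest variants last can be illustrated with a small sketch. It uses only Python's standard `re` module and a hard-coded `SAMPLE_NAMES` table, which is a hypothetical stand-in for what `get_month_names` returns; the helper names `collect_variants` and `month_pattern` are also invented for illustration. How Lark's lexer treats the generated pattern may differ in detail, so this only shows the general behavior of regex alternation.

```python
import re
from collections import defaultdict

# Hypothetical sample data standing in for babel's get_month_names() output;
# the real script looks these names up for every configured locale.
SAMPLE_NAMES = {
    "en": {"wide": {1: "January", 5: "May"}, "abbreviated": {1: "Jan", 5: "May"}},
    "de": {"wide": {1: "Januar", 5: "Mai"}, "abbreviated": {1: "Jan.", 5: "Mai"}},
}


def collect_variants(sample):
    """Gather lower-cased, de-duplicated name variants per month number."""
    variants = defaultdict(list)
    for widths in sample.values():
        for names in widths.values():
            for month_num, name in names.items():
                # drop the trailing period some locales put on abbreviations
                name = name.strip(".").lower()
                if name not in variants[month_num]:
                    variants[month_num].append(name)
    return variants


def month_pattern(names):
    """Join variants into an alternation, longest first."""
    return "|".join(sorted(names, key=len, reverse=True))


variants = collect_variants(SAMPLE_NAMES)

# Shortest-first ordering: the alternation stops at the first branch that matches.
naive = re.compile("|".join(sorted(variants[1], key=len)), re.IGNORECASE)
# Longest-first ordering, as in the script.
ordered = re.compile(month_pattern(variants[1]), re.IGNORECASE)

print(naive.match("January").group())    # only the leading "Jan" is consumed
print(ordered.match("January").group())  # the full name is consumed
```

With shortest-first ordering the leading `jan` branch wins and the rest of the word is left unparsed; sorting longest-first lets the full name match before any of its prefixes are tried.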
@@ -0,0 +1,3 @@

    from undate.converters.calendars.gregorian.converter import GregorianDateConverter

    __all__ = ["GregorianDateConverter"]
@@ -0,0 +1,10 @@

    from lark import Lark

    from undate.converters import GRAMMAR_FILE_PATH

    grammar_path = GRAMMAR_FILE_PATH / "gregorian.lark"

    # open based on filename to allow relative imports based on grammar file
    gregorian_parser = Lark.open(
        str(grammar_path), rel_to=__file__, start="gregorian_date", strict=True
    )
@@ -0,0 +1,42 @@

    from lark import Transformer, Tree

    from undate import Undate, Calendar


    class GregorianDateTransformer(Transformer):
        """Transform a Gregorian date parse tree and return an Undate."""

        # Currently parser should not result in intervals

        calendar = Calendar.GREGORIAN

        def gregorian_date(self, items):
            parts = {}
            for child in items:
                if child.data in ["year", "month", "day"]:
                    # in each case we expect one integer value;
                    # anonymous tokens convert to their value and cast as int
                    value = int(child.children[0])
                    parts[str(child.data)] = value

            # initialize and return an undate with year, month, day and
            # Gregorian calendar
            return Undate(**parts, calendar=self.calendar)

        def year(self, items):
            # combine multiple parts into a single string
            value = "".join([str(i) for i in items])
            return Tree(data="year", children=[value])

        def month(self, items):
            # month has a nested tree for the rule and the value
            # the name of the rule (month_1, month_2, etc) gives us the
            # number of the month needed for converting the date
            tree = items[0]
            month_n = tree.data.split("_")[-1]
            return Tree(data="month", children=[month_n])

        def day(self, items):
            # combine multiple parts into a single string
            value = "".join([str(i) for i in items])
            return Tree(data="day", children=[value])
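The way the transformer recovers a month number from the rule name can be traced without Lark installed. The sketch below is an assumption-laden illustration: `Node` is a hypothetical stand-in for `lark.Tree` (just a rule name in `data` plus `children`), and `month_number` and `collect_parts` are invented helpers that mirror the logic of the `month` and `gregorian_date` methods rather than the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for lark.Tree: a rule name plus its children."""

    data: str
    children: list = field(default_factory=list)


def month_number(month_items):
    # the matched rule is named month_1 ... month_12; its suffix is the number
    return month_items[0].data.split("_")[-1]


def collect_parts(items):
    # keep only year/month/day nodes and cast their single child to int
    parts = {}
    for child in items:
        if child.data in ("year", "month", "day"):
            parts[child.data] = int(child.children[0])
    return parts


# Simulate the nodes the transformer might see for a date like "5 March 1850"
month = Node("month", [month_number([Node("month_3", ["March"])])])
items = [Node("day", ["5"]), month, Node("year", ["1850"])]
parts = collect_parts(items)
print(parts)
```

In the transformer itself, the resulting dictionary is unpacked as keyword arguments to `Undate` together with `calendar=Calendar.GREGORIAN`, so a date missing its day or year simply yields fewer keys.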