Commit 3f0bd4d

Merge pull request #203 from lcnetdev/ignore_ptn
Add word boundaries around ignore patterns.
2 parents 28fc69f + 787ba61 commit 3f0bd4d

1 file changed

Lines changed: 22 additions & 17 deletions

scriptshifter/tables/data/_ignore_base.yml

@@ -9,32 +9,37 @@ roman_to_script:
     - "date of publication not identified"
     - "place of publication not identified"
     - "publisher not identified"
+    - "and one other"
+    - "et al."
+  ignore_ptn:
+    - "and ([a-z0-9]+ )?others"
+
+    # Incorrectly entered (but frequently found) Roman numerals.
     # NOTE There is ambiguity about ignoring these
     # words. Note that the single-character Roman
     # numerals are not included on purpose.
     # Ideally the source editors should use the
     # dedicated U+2160÷U+216F (uppercase Roman
     # numerals) and/or U+2170÷U+217F (lower case Roman
     # numerals) ranges to avoid this ambiguity.
-    - "and one other"
-    - "et al."
-  ignore_ptn:
-    - "and ([a-z0-9]+ )?others"
-    - "I{2,3}"
-    - "I(V|X)"
-    - "LI{,3}"
-    - "LI?(V|X)"
-    - "L(V|X{1,3})I{,3}"
-    - "LX{1,3}I?V"
-    - "LX{1,3}VI{,3}"
-    - "(V|X{1,3})I{,3}"
-    - "X{1,3}I{,3}"
-    - "X{1,3}I(V|X)"
-    - "X{1,3}VI{,3}"
-    - "[\u2021$][0-9a-z] *"
+    - "\\bI{2,3}\\b"
+    - "\\bI(V|X)\\b"
+    - "\\bLI{,3}\\b"
+    - "\\bLI?(V|X)\\b"
+    - "\\bL(V|X{1,3})I{,3}\\b"
+    - "\\bLX{1,3}I?V\\b"
+    - "\\bLX{1,3}VI{,3}\\b"
+    - "\\b(V|X{1,3})I{,3}\\b"
+    - "\\bX{1,3}I{,3}\\b"
+    - "\\bX{1,3}I(V|X)\\b"
+    - "\\bX{1,3}VI{,3}\\b"
+
+    # MARC sub-field markers.
+    - "\\b[\u2021$][0-9a-z]\\b"
 
 script_to_roman:
   ignore:
     - " "
   ignore_ptn:
-    - "[\u2021$][0-9a-z] *"
+    # MARC sub-field markers.
+    - "\\b[\u2021$][0-9a-z]\\b"
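
To illustrate what the added `\b` anchors change, here is a minimal Python sketch using only the standard `re` module. It is not scriptshifter's own matching code, which this diff does not show; the helper `find_all` and the sample strings are hypothetical. The Python string literals use the same escapes as the YAML double-quoted scalars above (`\\b` decodes to the regex token `\b`, and `\u2021` to the double dagger).

```python
import re

# Same escapes as the YAML double-quoted scalars in the diff:
# "\\b" decodes to the regex word boundary \b, "\u2021" to the character ‡.
OLD_ROMAN = "I{2,3}"
NEW_ROMAN = "\\bI{2,3}\\b"
NEW_MARC = "\\b[\u2021$][0-9a-z]\\b"


def find_all(ptn: str, text: str) -> list[str]:
    """Return every non-overlapping match of ptn in text (illustrative helper)."""
    return [m.group(0) for m in re.finditer(ptn, text)]


# Hypothetical sample: "II" also occurs inside an ordinary word.
sample = "HAWAII III"
old_hits = find_all(OLD_ROMAN, sample)  # unanchored pattern
new_hits = find_all(NEW_ROMAN, sample)  # pattern with word boundaries

# \b sits between a word character and a non-word character. Neither "$" nor
# "‡" is a word character, so the leading \b only holds when the marker
# directly follows a letter, digit, or underscore.
marc_after_word = find_all(NEW_MARC, "title$b subtitle")
marc_after_space = find_all(NEW_MARC, "title \u2021b subtitle")

print(old_hits, new_hits, marc_after_word, marc_after_space)
```

With these samples, the unanchored pattern also matches the `II` inside `HAWAII`, while the bounded one matches only the standalone `III`. Under plain `re` semantics, the MARC marker pattern matches when the marker directly follows a word character and not when it follows a space; whether that matters depends on how scriptshifter applies the patterns.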
