---
general:
  name: Common ignore list.
  description: List of Latin strings not converted.
  version: 1.0.0
  date: 2025-12-03

roman_to_script:
  ignore:
    - "At head of title"
    - "at head of title"
    - "Colophon"
    - "colophon"
    - "Cover title"
    - "date of publication not identified"
    - "place of publication not identified"
    - "publisher not identified"
    - "and one other"
    - "and others"
    - "et al."
  ignore_ptn:
    - "and ([a-z0-9]+ )?others"
    # Incorrectly entered (but frequently found) Roman numerals.
    # NOTE: Ignoring these words is ambiguous. Single-character
    # Roman numerals are intentionally excluded.
    # Ideally, source editors should use the dedicated
    # U+2160 to U+216F (uppercase Roman numerals) and/or
    # U+2170 to U+217F (lowercase Roman numerals) ranges to
    # avoid this ambiguity.
    - "I{2,3}\\b"
    - "I(V|X)\\b"
    - "LI{,3}\\b"
    - "LI?(V|X)\\b"
    - "L(V|X{1,3})I{,3}\\b"
    - "LX{1,3}I?V\\b"
    - "LX{1,3}VI{,3}\\b"
    - "VI{1,3}\\b"
    - "X{1,3}I{1,3}\\b"
    - "X{1,3}I(V|X)\\b"
    - "X{1,3}VI{,3}\\b"
    # MARC subfield markers.
    - "[\u2021\u01C2\\$][0-9a-z]\\b"