Version: 2024.5.15-torch_ios-shim (__version__); dist-info reports 2024.11.6
Type: Pure Python — iOS shim re-exporting re (not the real Matthew Barnett extension)
SPM target: Bundled in the Python framework
Total Python modules: 1 (regex/__init__.py, 61 lines)
A 61-line shim that re-exports stdlib re under the regex name so
libraries that list the Matthew Barnett regex package as a dependency
continue to import on iOS. The real regex package ships a heavy C
extension (_regex.so) with its own matching engine, which we haven't
cross-compiled for iOS arm64.
| Module | What it does |
|---|---|
| `regex.__init__` | The shim. Re-exports re's public surface via `from re import *`. Defines regex-specific flags (V0/V1/BESTMATCH/ENHANCEMATCH/REVERSE/POSIX/WORD/F) as 0 (no-op). Wraps findall/finditer/sub/search/match/fullmatch/split to accept and discard upstream-regex-only kwargs (overlapped, pos, endpos, partial, concurrent, timeout). 61 LOC total. |
That's it — no submodules, no C extension.
| Feature | Status on iOS |
|---|---|
| `import regex` | Works |
| `regex.match` / `regex.search` / `regex.findall` / `regex.sub` / `regex.compile` / `regex.split` / `regex.finditer` / `regex.fullmatch` | Work (delegated to `re`) |
| `re`'s flags: IGNORECASE, MULTILINE, DOTALL, VERBOSE, UNICODE, ASCII | Work |
| regex-extra flags: V0, V1, BESTMATCH, ENHANCEMATCH, REVERSE, POSIX, WORD, F | No-op (defined as 0) |
| Extra kwargs: overlapped, pos, endpos, partial, concurrent, timeout | Accepted and silently ignored |
| Unicode property classes `\p{L}`, `\p{N}`, `\p{Greek}` | Don't work: raise `re.PatternError: bad escape \p` |
| Variable-width lookbehind | Not supported: `re` accepts only fixed-width lookbehind and raises "look-behind requires fixed-width pattern" otherwise |
| Fuzzy matching `(?e)` | Not supported |
| Named character classes `[[:alpha:]]` | Not supported |
| Atomic groups `(?>...)` | Works only if the bundled `re` supports it (Python 3.11+) |
| Subroutine calls `(?&name)` | Not supported |
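
Since every call ends up in stdlib re, one way to check which rows of the table apply to a given Python build is to try compiling a representative pattern per feature. The sketch below is illustrative only: `PROBES` and `supports()` are hypothetical names that are not part of the shim, and it probes `re` directly on the assumption that the shim forwards patterns unchanged.

```python
import re

# One representative pattern per feature from the table. The labels and
# patterns are illustrative choices, not something the shim defines.
PROBES = {
    "unicode property class": r"\p{L}+",
    "variable-width lookbehind": r"(?<=a|bc)d",
    "atomic group": r"(?>a+)b",
    "subroutine call": r"(?<word>\w+) (?&word)",
}


def supports(pattern):
    """Return True if stdlib re can compile the pattern, False otherwise."""
    try:
        re.compile(pattern)
    except re.error:  # re.PatternError is the newer name for the same class
        return False
    return True


for label, pattern in PROBES.items():
    status = "ok" if supports(pattern) else "unsupported"
    print(f"{label}: {status}")
```

Running this once at startup and logging the result can make a later tokenizer failure easier to trace back to a missing feature.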
```python
import re as _re
from re import *

__version__ = "2024.5.15-torch_ios-shim"

# regex flags mapped to no-ops or re equivalents
V1 = V0 = 0            # version flag
BESTMATCH = 0          # best-match-greedy
ENHANCEMATCH = 0
REVERSE = 0
POSIX = 0
WORD = 0
DEFAULT_VERSION = V0
B = _re.ASCII          # WORD boundary "ascii" alias
F = BESTMATCH          # fuzzy
```

Then thin wrappers around `_re.findall` / `_re.finditer` / `_re.sub` /
`_re.search` / `_re.match` / `_re.fullmatch` / `_re.split` that accept
the upstream-regex kwargs and pass only flags (and count / maxsplit
where applicable) through to re.
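
The wrapper bodies are not reproduced here, so the following is only a minimal sketch of what such a wrapper could look like, assuming it names the upstream-only kwargs explicitly and drops them. `findall_overlapped` is a hypothetical helper, not something the shim provides. It shows one common way to recover overlapping matches with plain `re` by wrapping the pattern in a capturing lookahead.

```python
import re


def findall(pattern, string, flags=0, overlapped=False, pos=None,
            endpos=None, partial=False, concurrent=None, timeout=None):
    """Accept upstream-regex kwargs, but forward only flags to re.findall."""
    return re.findall(pattern, string, flags)


def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
        concurrent=None, timeout=None):
    """Same idea for sub: count and flags pass through, the rest is dropped."""
    return re.sub(pattern, repl, string, count=count, flags=flags)


def findall_overlapped(pattern, string, flags=0):
    """Emulate overlapped=True with a zero-width lookahead around the pattern.

    Hypothetical helper. It assumes the pattern has no backreferences or
    numbered groups of its own, since the added outer group shifts numbering.
    """
    return [m.group(1) for m in re.finditer(f"(?=({pattern}))", string, flags)]


print(findall(r"a.a", "abacada", overlapped=True))  # ['aba', 'ada']
print(findall_overlapped(r"a.a", "abacada"))        # ['aba', 'aca', 'ada']
```

The lookahead consumes no characters, so the scan advances one position at a time and reports a match at every start offset.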
```python
import regex

# Basic matching
m = regex.match(r"(\w+)@(\w+)", "user@example.com")
print(m.group(1), m.group(2))  # 'user', 'example'

# Substitution
print(regex.sub(r"\d+", "X", "abc123def456"))  # 'abcXdefX'

# Compilation + flags
pat = regex.compile(r"^foo", regex.IGNORECASE | regex.MULTILINE)

# overlapped= is silently ignored (kwarg accepted; behavior is non-overlapping)
matches = regex.findall(r"a.a", "abacada", overlapped=True)
# → ['aba', 'ada'] not ['aba', 'aca', 'ada'] like real regex would give
```

```python
# Unicode property class: UNSUPPORTED
regex.match(r"\p{L}+", "café")  # → re.PatternError: bad escape \p
```

Workaround: use Python's stdlib character classes:
```python
import re
import unicodedata

# \w matches Unicode letters (plus digits and underscore) by default in Python 3.x
re.match(r"\w+", "café").group()  # 'café'

# For specific scripts:
def is_cjk(s):
    return any('CJK' in unicodedata.name(c, '') for c in s)
```

If you're porting code that depends on \p{...}, detect the iOS shim
and branch:
```python
import re
import regex

IS_SHIM = "torch_ios" in regex.__version__
if IS_SHIM:
    # Use re-equivalent that doesn't need property classes
    pattern = re.compile(r"[a-zA-Zà-üÀ-Ü…]+")
else:
    pattern = regex.compile(r"\p{L}+")
```

Most regex usage in real codebases doesn't touch property classes. If
your dependency lists regex for compatibility (transformers tokenizers,
black, etc.), the shim usually covers the calls those libraries make.
HuggingFace tokenizers DO use \p{...} for some BPE pre-tokenizers;
the affected models will fail at tokenization time with the bad escape \p
error, in which case you need either a different model or a real regex
build for iOS.
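
For code that only needs "a run of letters", a commonly used stdlib idiom is the negated class `[^\W\d_]`, which avoids hand-listing accented ranges. The sketch below is an approximation, not an exact replacement for `\p{L}`: it follows Python's own notion of alphanumeric characters, so a few non-letter characters (for example superscript digits) also match. `letter_runs` is a hypothetical helper name.

```python
import re

# [^\W\d_] reads as "a word character that is neither a digit nor an
# underscore". For str patterns that leaves, roughly, Unicode letters.
LETTERS = re.compile(r"[^\W\d_]+")


def letter_runs(text):
    """Return maximal runs of letter-like characters found in text."""
    return LETTERS.findall(text)


print(letter_runs("café 123 naïve_ω"))  # ['café', 'naïve', 'ω']
```

Tokenizer patterns that rely on finer distinctions such as `\p{N}` or script names like `\p{Greek}` are not covered by this idiom.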
- C extension `_regex.so` not cross-compiled for iOS arm64; building it for iOS takes a fair bit of cross-compilation glue
- Pure Python shim means no platform-specific bugs; behaves identically across all iOS architectures
- `__version__` includes the `"torch_ios-shim"` substring so callers can sniff the shim status without try/except
- ~60 KB on disk (single small `.py` file): no native code, no embedded tables
- The dist-info reports `2024.11.6` (the upstream wheel version that would have shipped with the C extension) but the actual import surface is the shim
- Python stdlib `re` module: what every call ultimately delegates to
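
Because the dist-info version and `__version__` disagree, tooling that reads package metadata will see the upstream number while the imported module is the shim. A minimal sketch of checking both follows. The helper names are hypothetical, and the sample strings are the two versions quoted in this document.

```python
from importlib import metadata

SHIM_MARKER = "torch_ios-shim"  # substring the shim's __version__ is said to carry


def is_ios_shim(import_version):
    """True if a regex.__version__ string looks like the iOS shim."""
    return SHIM_MARKER in import_version


def dist_version_or_none(name):
    """Version recorded in dist-info, or None if no metadata is installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def describe(import_version, dist_version):
    """Summarise which implementation is actually importable."""
    kind = "iOS shim" if is_ios_shim(import_version) else "upstream regex"
    return f"{kind}: __version__={import_version}, dist-info={dist_version}"


# Values taken from this document. On device you would pass
# regex.__version__ and dist_version_or_none("regex") instead.
print(describe("2024.5.15-torch_ios-shim", "2024.11.6"))
```

Checking `__version__` rather than the metadata is the safer signal here, since only the former reflects the code that was actually imported.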