Commit 1855249

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit 1855249

4 files changed

Lines changed: 796 additions & 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter
 
 ## Formatters
 
-| Formatter | Description |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter | Description |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |
 
 ## Contributing
 
docs/PatternFormatter.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# PatternFormatter

The `PatternFormatter` transforms a string by applying a pattern made of character filters, literal characters, and case-transformation directives.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\L###\\U###');

echo $formatter->format('abCDEF');
// Outputs: "abcDEF"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string of filters, literals, and transformation directives

**Throws:** `InvalidFormatterException` when the pattern is an empty string

### `format`

- `format(string $input): string`

Formats the input string according to the pattern rules, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Description |
| ------- | --------------------------------------- |
| `#` | Any character |
| `0` | ASCII digits only (0-9) |
| `A` | ASCII uppercase letters only (A-Z) |
| `a` | ASCII lowercase letters only (a-z) |
| `C` | Letters from any script (upper/lower) only |
| `W` | Letters and numbers from any script only (no `_`) |
| `X` | Hexadecimal digits (0-9, A-F, a-f) only |
| `!` | Punctuation characters only |
| `@` | Symbol characters only |
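
To see which filters are ASCII-only and which accept any script, here is a standalone sketch that reuses the regular expressions from `matchesFilter()` in `src/PatternFormatter.php` for `#`, `0`, `A`, `a`, `X`, `C`, and `W`. `sketchMatches()` is a hypothetical helper written for illustration, not part of the library's API, and it leaves out `!` and `@`.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of the library) reusing the filter regexes from matchesFilter().
function sketchMatches(string $filter, string $char): bool
{
    if ($filter === '#') {
        return true; // `#` accepts any single character
    }

    // Throws UnhandledMatchError for filters this sketch does not cover (`!`, `@`)
    $regex = match ($filter) {
        '0' => '/^[0-9]$/',
        'A' => '/^[A-Z]$/',
        'a' => '/^[a-z]$/',
        'X' => '/^[0-9A-Fa-f]$/',
        'C' => '/^\p{L}$/u',
        'W' => '/^[\p{L}\p{N}]$/u',
    };

    return preg_match($regex, $char) === 1;
}

// Without the /u flag these single-character patterns compare bytes, so a multibyte character never matches them.
var_dump(sketchMatches('A', 'Ñ')); // bool(false)
var_dump(sketchMatches('C', 'Ñ')); // bool(true)
var_dump(sketchMatches('0', '٣')); // bool(false): ARABIC-INDIC DIGIT THREE
var_dump(sketchMatches('W', '٣')); // bool(true)
```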

### Transformation Patterns

| Pattern | Description |
| ------- | ---------------------------- |
| `\d` | Skip (delete) the next input character |
| `\l` | Lowercase the next matched character |
| `\L` | Lowercase until `\E` |
| `\u` | Uppercase the next matched character |
| `\U` | Uppercase until `\E` |
| `\i` | Invert the case of the next matched character |
| `\I` | Invert case until `\E` |
| `\E` | End the active transformation |

### Escape Sequences

| Pattern | Description | Example |
| ------- | --------------------- | --------------------- |
| `\#` | Literal `#` character | `\#000` on `123` gives `#123` |
| `\0` | Literal `0` character | `\00` on `5` gives `05` |
| `\A` | Literal `A` character | `\A000` on `123` gives `A123` |
| `\@` | Literal `@` character | `\@000` on `123` gives `@123` |

### Literal Characters

Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `X`, `!`, `@`, `\`) is treated as a literal: it appears in the output as-is and does not consume input. To output one of these pattern characters literally, escape it with `\`.
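
The syntax rules above amount to a small tokenizer. The standalone sketch below splits a pattern into filter, directive, and literal tokens following the documented syntax; `tokenizePattern()` is a hypothetical name, and this illustrates the rules rather than reproducing the library's `format()`, which applies each piece as it reads it. It assumes a trailing lone `\` is a literal backslash, which matches what `format()` does.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical tokenizer (not the library's code) following the documented pattern syntax.
 *
 * @return list<array{string, string}> pairs of [kind, character]
 */
function tokenizePattern(string $pattern): array
{
    $filters = ['#', '0', 'A', 'a', 'C', 'W', 'X', '!', '@'];
    $directives = ['d', 'l', 'L', 'u', 'U', 'i', 'I', 'E'];
    $chars = mb_str_split($pattern);
    $tokens = [];

    for ($i = 0, $n = count($chars); $i < $n; $i++) {
        $char = $chars[$i];

        if ($char === '\\' && $i + 1 < $n) {
            $next = $chars[++$i];
            // `\` + directive letter is a transformation; `\` + anything else is that character literally
            $tokens[] = in_array($next, $directives, true) ? ['directive', $next] : ['literal', $next];
            continue;
        }

        // Unescaped characters are filters if listed in the filter table, literals otherwise
        $tokens[] = in_array($char, $filters, true) ? ['filter', $char] : ['literal', $char];
    }

    return $tokens;
}

foreach (tokenizePattern('\U00\#-\EAA') as [$kind, $char]) {
    echo "$kind: $char\n";
}
```

For `\U00\#-\EAA` this prints one token per line: `directive: U`, `filter: 0`, `filter: 0`, `literal: #`, `literal: -`, `directive: E`, `filter: A`, `filter: A`.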

## Behavior

### Filtering Patterns

- **Skip non-matching characters**: Input characters that don't match the current filter are skipped and do not appear in the output
- **Keep matching characters as-is**: The first matching character passes through unchanged, unless a case transformation is active
- **Consume from input**: Each filter consumes input up to and including the first matching character; if nothing matches, the rest of the input is consumed and the filter outputs nothing
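
A simplified, standalone model of these three rules helps when reasoning about edge cases. `sketchFilter()` below is hypothetical and supports only the `0` and `C` filters plus literal characters (no escapes or case directives); it follows the same skip-then-consume loop as `consumeNextMatchingChar()` in `src/PatternFormatter.php`.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified model (not the library's code): only `0` and `C` filters plus literals.
function sketchFilter(string $pattern, string $input): string
{
    $in = mb_str_split($input);
    $pos = 0;
    $out = '';

    foreach (mb_str_split($pattern) as $p) {
        $regex = match ($p) {
            '0' => '/^[0-9]$/',
            'C' => '/^\p{L}$/u',
            default => null,
        };

        if ($regex === null) {
            $out .= $p; // literals are emitted without consuming input
            continue;
        }

        // Skip input characters until one matches the filter or the input runs out
        while ($pos < count($in) && preg_match($regex, $in[$pos]) !== 1) {
            $pos++;
        }

        if ($pos < count($in)) {
            $out .= $in[$pos++]; // consume the matching character
        }
    }

    return $out;
}

echo sketchFilter('C0', 'ABC123'), "\n";   // A1
echo sketchFilter('000-0000', '12'), "\n"; // 12-
```

The second call shows that literal characters are still emitted after the input runs out. Reading `format()`, the library appears to behave the same way, so `000-0000` applied to `12` would give `12-` there as well.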

### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until `\E` or until another case directive replaces them
- **Single-character transformations**: `\l`, `\u`, `\i` affect only the next character matched by a filter; `\d` skips the next input character
- **End of transformations**: `\E` clears any active transformation state
- **Unicode aware**: Case transformations use `mb_strtolower()` and `mb_strtoupper()`, so they work with non-ASCII characters such as `ñ`
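
The state rules can be modeled with a single variable holding the active directive, which is also how `format()` tracks `$transformation`. The sketch below is a hypothetical, standalone model covering only `#` placeholders and the case directives (`\l`, `\L`, `\u`, `\U`, `\i`, `\I`, `\E`); `\d`, the other filters, and literal escapes are not supported, and `applyCase()`/`sketchCase()` are illustrative names, not library API.

```php
<?php

declare(strict_types=1);

// Applies the active case directive (hypothetical model, not the library's code).
function applyCase(?string $state, string $char): string
{
    return match ($state) {
        'l', 'L' => mb_strtolower($char),
        'u', 'U' => mb_strtoupper($char),
        // Invert: a character that is already lowercase is uppercased, otherwise lowercased
        'i', 'I' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
        default => $char,
    };
}

// Supports `#` placeholders and case directives only; other characters are copied as literals.
function sketchCase(string $pattern, string $input): string
{
    $p = mb_str_split($pattern);
    $in = mb_str_split($input);
    $pos = 0;
    $state = null; // one slot for the active directive, as in format()
    $out = '';

    for ($i = 0, $n = count($p); $i < $n; $i++) {
        if ($p[$i] === '\\' && $i + 1 < $n && str_contains('lLuUiIE', $p[$i + 1])) {
            $directive = $p[++$i];
            $state = $directive === 'E' ? null : $directive; // `\E` clears the state
            continue;
        }

        if ($p[$i] !== '#') {
            $out .= $p[$i];
            continue;
        }

        if ($pos < count($in)) {
            $out .= applyCase($state, $in[$pos++]);

            // Single-character directives are used up by one character
            if (in_array($state, ['l', 'u', 'i'], true)) {
                $state = null;
            }
        }
    }

    return $out;
}

echo sketchCase('\l#\u#', 'Ab'), "\n";     // aB
echo sketchCase('\L##\E##', 'ABCD'), "\n"; // abCD
echo sketchCase('\U\l###', 'abc'), "\n";   // abc
```

Because there is only one state slot, a single-character directive inside a persistent one ends it: in this model `\U\l###` on `abc` yields `abc`, not `aBC`, and tracing `format()` suggests the library behaves the same way.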

## Examples

| Pattern | Input | Output | Description |
| ---------------- | ------------ | ---------------- | -------------------------- |
| `000-0000` | `1234567` | `123-4567` | Phone number formatting |
| `AAA-000` | `ABC123` | `ABC-123` | License plate format |
| `\U###` | `abc` | `ABC` | Uppercase until reset |
| `\L####` | `ABC1` | `abc1` | Lowercase until reset |
| `\l#\u#` | `Ab` | `aB` | Case transformation |
| `\I####` | `AbCd` | `aBcD` | Case inversion until reset |
| `CC00WW` | `AB123D` | `AB123D` | International postal code |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format |
| `000-00-0000` | `123456789` | `123-45-6789` | SSN format |
| `\L##\E##` | `ABCD` | `abCD` | Transformation reset |
| `##\d##` | `ABCDE` | `ABDE` | Deleting character |

## International Support

The `C` and `W` filters and all case transformations work with Unicode characters and international text (the `0`, `A`, `a`, and `X` filters are ASCII-only):

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```

## Edge Cases

| Pattern | Input | Output | Reason |
| ------- | ---------- | ------- | -------------------------------------------- |
| `###` | `ab` | `ab` | Filters left over after the input runs out output nothing |
| `####` | `abcdefgh` | `abcd` | Input longer than the pattern is truncated |
| `C0` | `ABC123` | `A1` | Non-matching characters are skipped |
| `AAA` | `123` | (empty) | No matching characters found |
| `\E###` | `abc🙂` | `abc` | `\E` with no active transformation is a no-op |

src/PatternFormatter.php

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function count;
use function implode;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;
use function str_ends_with;

final readonly class PatternFormatter implements Formatter
{
    private const array FILTERING_PATTERNS = [
        '#' => 'any',
        '0' => 'digit',
        'A' => 'upper_alpha',
        'a' => 'lower_alpha',
        'C' => 'alpha',
        'W' => 'alphanumeric',
        'X' => 'hex',
        '!' => 'punctuation',
        '@' => 'symbol',
    ];

    private const array TRANSFORMATION_PATTERNS = [
        'd' => 'delete',
        'l' => 'lower_single',
        'L' => 'lower_until_reset',
        'u' => 'upper_single',
        'U' => 'upper_until_reset',
        'i' => 'invert_single',
        'I' => 'invert_until_reset',
        'E' => 'reset_transformations',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $inputChars = mb_str_split($input);
        $inputIndex = 0;
        $result = [];
        $transformation = null;
        $patternIndex = 0;
        $patternLength = mb_strlen($this->pattern);

        while ($patternIndex < $patternLength) {
            $patternChar = mb_substr($this->pattern, $patternIndex, 1);

            // Handle escape sequences
            if ($patternChar === '\\') {
                $nextPatternChar = mb_substr($this->pattern, $patternIndex + 1, 1);

                if ($nextPatternChar !== '') {
                    // Handle escaped transformation patterns (\d, \l, \L, \u, \U, \i, \I, \E)
                    if (array_key_exists($nextPatternChar, self::TRANSFORMATION_PATTERNS)) {
                        match (self::TRANSFORMATION_PATTERNS[$nextPatternChar]) {
                            'delete' => $inputIndex++,
                            'lower_single' => $transformation = 'lower_single',
                            'upper_single' => $transformation = 'upper_single',
                            'invert_single' => $transformation = 'invert_single',
                            'lower_until_reset' => $transformation = 'lower',
                            'upper_until_reset' => $transformation = 'upper',
                            'invert_until_reset' => $transformation = 'invert',
                            'reset_transformations' => $transformation = null,
                        };
                        $patternIndex += 2;
                        continue;
                    }

                    // For backslash followed by any other character, output that character literally
                    $result[] = $nextPatternChar;
                    $patternIndex += 2;
                    continue;
                }
            }

            // Transformation letters (d, l, L, u, U, i, I, E) are directives only when escaped;
            // unescaped, they are handled as literal characters below

            // Handle filtering patterns
            if (array_key_exists($patternChar, self::FILTERING_PATTERNS)) {
                $inputIndex = $this->consumeNextMatchingChar($inputChars, $inputIndex, $patternChar, $result, $transformation);
                $patternIndex++;
                continue;
            }

            // Handle literal characters - they appear in output as-is and don't consume input
            $result[] = $patternChar;
            $patternIndex++;
        }

        return implode('', $result);
    }

    private function consumeNextMatchingChar(array $inputChars, int $inputIndex, string $filter, array &$result, string|null &$transformation): int
    {
        while ($inputIndex < count($inputChars)) {
            if ($this->matchesFilter($filter, $inputChars[$inputIndex])) {
                if ($transformation !== null) {
                    $tempTransformation = $transformation;
                    // Clear single-use transformations
                    if (str_ends_with($transformation, '_single')) {
                        $transformation = null;
                    }

                    $this->appendWithTransformation($result, $inputChars, $inputIndex, $tempTransformation);
                } else {
                    $result[] = $inputChars[$inputIndex];
                    $inputIndex++;
                }

                break;
            }

            $inputIndex++; // Skip non-matching character
        }

        return $inputIndex;
    }

    private function matchesFilter(string $filter, string $char): bool
    {
        if ($char === '') {
            return false;
        }

        // First check if this is a filtering pattern key
        if (array_key_exists($filter, self::FILTERING_PATTERNS)) {
            $filterType = self::FILTERING_PATTERNS[$filter];

            return match ($filterType) {
                'any' => true,
                'digit' => preg_match('/^[0-9]$/', $char) === 1,
                'upper_alpha' => preg_match('/^[A-Z]$/', $char) === 1,
                'lower_alpha' => preg_match('/^[a-z]$/', $char) === 1,
                'alpha' => preg_match('/^\p{L}$/u', $char) === 1,
                'alphanumeric' => preg_match('/^[\p{L}\p{N}]$/u', $char) === 1,
                'hex' => preg_match('/^[0-9A-Fa-f]$/', $char) === 1,
                // Unicode general categories: P (punctuation) and S (symbols) are disjoint
                'punctuation' => preg_match('/^\p{P}$/u', $char) === 1,
                'symbol' => preg_match('/^\p{S}$/u', $char) === 1,
                default => false,
            };
        }

        return $char === $filter;
    }

    private function appendWithTransformation(array &$result, array $inputChars, int &$inputIndex, string $transformation): void
    {
        if ($inputIndex >= count($inputChars)) {
            return;
        }

        $char = $inputChars[$inputIndex];
        $inputIndex++;

        $result[] = match ($transformation) {
            'lower', 'lower_single' => mb_strtolower($char),
            'upper', 'upper_single' => mb_strtoupper($char),
            'invert', 'invert_single' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };
    }
}
