Commit 92b9103

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit 92b9103

4 files changed

Lines changed: 800 additions & 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter

 ## Formatters

-| Formatter                              | Description                                     |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter                                    | Description                                      |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md)       | Range-based string masking with Unicode support  |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |

 ## Contributing

docs/PatternFormatter.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# PatternFormatter

The `PatternFormatter` enables advanced pattern-based string transformation using filtering patterns and transformation directives.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\L###\\U###');

echo $formatter->format('abCDEF');
// Outputs: "abcDEF"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining transformation rules

**Throws:** `InvalidFormatterException` when the pattern is an empty string

### `format`

- `format(string $input): string`

Formats the input string according to the pattern rules, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Description                              |
| ------- | ---------------------------------------- |
| `#`     | Any character                            |
| `0`     | Digits only (0-9)                        |
| `A`     | Uppercase ASCII letters only (A-Z)       |
| `a`     | Lowercase ASCII letters only (a-z)       |
| `C`     | Letters only (any case, Unicode-aware)   |
| `W`     | Letters and numbers only (Unicode-aware) |
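
As a rough sketch of how these filters can be checked, the helper below mirrors the regular expressions used in `src/PatternFormatter.php` from this commit. The function name `matchesFilterSketch` is hypothetical and not part of the library's API.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of the library): tests one character against one filter.
function matchesFilterSketch(string $filter, string $char): bool
{
    return match ($filter) {
        '#' => $char !== '',                                   // any character
        '0' => preg_match('/^[0-9]$/', $char) === 1,           // ASCII digits
        'A' => preg_match('/^[A-Z]$/', $char) === 1,           // ASCII uppercase only
        'a' => preg_match('/^[a-z]$/', $char) === 1,           // ASCII lowercase only
        'C' => preg_match('/^\p{L}$/u', $char) === 1,          // any Unicode letter
        'W' => preg_match('/^[\p{L}\p{N}]$/u', $char) === 1,   // Unicode letters and numbers
        default => false,
    };
}

var_dump(matchesFilterSketch('A', 'Z')); // bool(true)
var_dump(matchesFilterSketch('A', 'ñ')); // bool(false)
var_dump(matchesFilterSketch('C', 'ñ')); // bool(true)
```

The `A` and `a` filters use ASCII ranges, so accented letters only pass through `C`, `W`, or `#`.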

### Transformation Patterns

| Pattern | Description                     |
| ------- | ------------------------------- |
| `\d`    | Delete the next input character |
| `\l`    | Lowercase next character        |
| `\L`    | Lowercase until `\E`            |
| `\u`    | Uppercase next character        |
| `\U`    | Uppercase until `\E`            |
| `\i`    | Invert case of next character   |
| `\I`    | Invert case until `\E`          |
| `\E`    | End the transformation state    |

### Escape Sequences

| Pattern | Description           | Example           |
| ------- | --------------------- | ----------------- |
| `\#`    | Literal `#` character | Outputs `#` as-is |
| `\0`    | Literal `0` character | Outputs `0` as-is |
| `\A`    | Literal `A` character | Outputs `A` as-is |
| `\@`    | Literal `@` character | Outputs `@` as-is |

### Literal Characters

Any character not defined as a pattern character is treated as a literal and appears in the output as-is. The documented pattern characters are `A`, `a`, `0`, `#`, `C`, `W`, and `\`. In the source added by this commit, `X`, `!`, and `@` are also declared as filters (no matcher is implemented for them, so they never match), and the unescaped letters `d`, `l`, `L`, `u`, `U`, `i`, `I`, and `E` are interpreted as transformation directives.
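
To make the syntax rules concrete, here is a minimal sketch of a tokenizer that classifies each pattern element using only the documented rules from the tables above. The function `tokenizePatternSketch` and its `[kind, character]` output shape are assumptions made for illustration; the source in this commit walks the pattern inline rather than building tokens.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical tokenizer (not part of the library).
 *
 * @return list<array{string, string}> pairs of [kind, character]
 */
function tokenizePatternSketch(string $pattern): array
{
    // Split into UTF-8 characters; fall back to an empty list on invalid input.
    $chars = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $tokens = [];

    for ($i = 0, $n = count($chars); $i < $n; $i++) {
        $char = $chars[$i];

        if ($char === '\\' && $i + 1 < $n) {
            $next = $chars[++$i];
            // Backslash + directive letter is a transformation; anything else is a literal.
            $kind = str_contains('dlLuUiIE', $next) ? 'directive' : 'literal';
            $tokens[] = [$kind, $next];
            continue;
        }

        // Unescaped characters are either documented filters or literals.
        $tokens[] = [str_contains('#0AaCW', $char) ? 'filter' : 'literal', $char];
    }

    return $tokens;
}

foreach (tokenizePatternSketch('\U#\#-0') as [$kind, $char]) {
    echo $kind, ' ', $char, PHP_EOL;
}
// directive U
// filter #
// literal #
// literal -
// filter 0
```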

## Behavior

### Filtering Patterns

- **Remove non-matching characters**: Characters that don't match the filter are skipped
- **Keep matching characters as-is**: When characters match the filter, they pass through unchanged
- **Consume from input**: Each filter advances the input pointer past any skipped characters and past the character it matches
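
The three rules above can be sketched for the digit filter alone. The function `digitsOnlySketch` is a hypothetical, ASCII-only simplification that treats every pattern character other than `0` as a literal; it is not the library's implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch (not part of the library): only the `0` filter is supported.
function digitsOnlySketch(string $pattern, string $input): string
{
    $inputIndex = 0;
    $inputLength = strlen($input);
    $output = '';

    foreach (str_split($pattern) as $patternChar) {
        if ($patternChar !== '0') {
            $output .= $patternChar; // literal: emitted without touching the input
            continue;
        }

        // Skip input characters until one matches the filter.
        while ($inputIndex < $inputLength && preg_match('/^[0-9]$/', $input[$inputIndex]) !== 1) {
            $inputIndex++;
        }

        if ($inputIndex < $inputLength) {
            $output .= $input[$inputIndex]; // matching character passes through unchanged
            $inputIndex++;                  // and the input pointer advances
        }
    }

    return $output;
}

echo digitsOnlySketch('000-0000', '12a34-567'), PHP_EOL; // 123-4567
```

In this run the `a` and the `-` in the input are skipped, while the `-` in the pattern is copied to the output without consuming anything.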

### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until reset
- **Single-character transformations**: `\d`, `\l`, `\u`, `\i` affect only the next character
- **End of transformations**: `\E` clears any active transformation state
- **Unicode aware**: Transformations work with international characters
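
The difference between single-character and stateful transformations can be sketched as follows. The helper `applyCaseSketch` is hypothetical; its state names follow the strings used in `src/PatternFormatter.php`, but it is ASCII-only for brevity, whereas the source uses `mb_strtolower()` and `mb_strtoupper()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (not part of the library): applies the active case
 * transformation to one character. $state is null, 'lower', 'upper', 'invert',
 * or one of those names with a '_single' suffix.
 */
function applyCaseSketch(string $char, ?string &$state): string
{
    if ($state === null) {
        return $char;
    }

    $mode = $state;

    // Single-character transformations are cleared as soon as they are used.
    if (str_ends_with($state, '_single')) {
        $state = null;
    }

    return match ($mode) {
        'lower', 'lower_single' => strtolower($char),
        'upper', 'upper_single' => strtoupper($char),
        'invert', 'invert_single' => strtolower($char) === $char ? strtoupper($char) : strtolower($char),
        default => $char,
    };
}

$state = 'upper_single';                    // as if the pattern contained \u
echo applyCaseSketch('a', $state);          // A
echo applyCaseSketch('b', $state);          // b (the state was cleared)

$state = 'invert';                          // as if the pattern contained \I
echo applyCaseSketch('c', $state);          // C
echo applyCaseSketch('D', $state), PHP_EOL; // d (the state persists until \E)
```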

## Examples

| Pattern          | Input        | Output           | Description                |
| ---------------- | ------------ | ---------------- | -------------------------- |
| `000-0000`       | `1234567`    | `123-4567`       | Phone number formatting    |
| `AAA-000`        | `ABC123`     | `ABC-123`        | License plate format       |
| `\U###`          | `abc`        | `ABC`            | Uppercase until reset      |
| `\L####`         | `ABC1`       | `abc1`           | Lowercase until reset      |
| `\l#\u#`         | `Ab`         | `aB`             | Case transformation        |
| `\I####`         | `AbCd`       | `aBcD`           | Case inversion until reset |
| `CC00WW`         | `AB123D`     | `AB123D`         | International postal code  |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format            |
| `000-00-0000`    | `123456789`  | `123-45-6789`    | SSN format                 |
| `\L##\E##`       | `ABCD`       | `abCD`           | Transformation reset       |
| `##\d##`         | `ABCDE`      | `ABDE`           | Deleting character         |

## International Support

The formatter works with Unicode characters and international text:

```php
$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```

## Edge Cases

| Pattern | Input      | Output  | Reason                                       |
| ------- | ---------- | ------- | -------------------------------------------- |
| `###`   | `ab`       | `ab`    | Pattern longer than input uses all available |
| `####`  | `abcdefgh` | `abcd`  | Input longer than pattern truncates          |
| `C0`    | `ABC123`   | `A1`    | Non-matching characters are skipped          |
| `AAA`   | `123`      | (empty) | No matching characters found                 |
| `\E###` | `abc🙂`    | `abc`   | `\E` without an active state is a no-op      |

src/PatternFormatter.php

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function count;
use function implode;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;
use function str_ends_with;

final readonly class PatternFormatter implements Formatter
{
    private const array FILTERING_PATTERNS = [
        '#' => 'any',
        '0' => 'digit',
        'A' => 'upper_alpha',
        'a' => 'lower_alpha',
        'C' => 'alpha',
        'W' => 'alphanumeric',
        'X' => 'hex',
        '!' => 'punctuation',
        '@' => 'symbol',
    ];

    private const array TRANSFORMATION_PATTERNS = [
        'd' => 'delete',
        'l' => 'lower_single',
        'L' => 'lower_until_reset',
        'u' => 'upper_single',
        'U' => 'upper_until_reset',
        'i' => 'invert_single',
        'I' => 'invert_until_reset',
        'E' => 'reset_transformations',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $inputChars = mb_str_split($input);
        $inputIndex = 0;
        $result = [];
        $transformation = null;
        $patternIndex = 0;

        while ($patternIndex < mb_strlen($this->pattern)) {
            $patternChar = mb_substr($this->pattern, $patternIndex, 1);

            // Handle escape sequences
            if ($patternChar === '\\') {
                $nextPatternChar = mb_substr($this->pattern, $patternIndex + 1, 1);

                if ($nextPatternChar !== '') {
                    // Handle escaped transformation patterns
                    if (array_key_exists($nextPatternChar, self::TRANSFORMATION_PATTERNS)) {
                        match (self::TRANSFORMATION_PATTERNS[$nextPatternChar]) {
                            'delete' => $inputIndex++,
                            'lower_single' => $transformation = 'lower_single',
                            'upper_single' => $transformation = 'upper_single',
                            'invert_single' => $transformation = 'invert_single',
                            'lower_until_reset' => $transformation = 'lower',
                            'upper_until_reset' => $transformation = 'upper',
                            'invert_until_reset' => $transformation = 'invert',
                            'reset_transformations' => $transformation = null,
                        };
                        $patternIndex += 2;
                        continue;
                    }

                    // For backslash followed by any other character, output that character literally
                    $result[] = $nextPatternChar;
                    $patternIndex += 2;
                    continue;
                }
            }

            // Handle unescaped transformation letters (d, l, L, u, U, i, I, E), which are also treated as directives
            if (array_key_exists($patternChar, self::TRANSFORMATION_PATTERNS)) {
                match (self::TRANSFORMATION_PATTERNS[$patternChar]) {
                    'delete' => $inputIndex++,
                    'lower_single' => $transformation = 'lower_single',
                    'upper_single' => $transformation = 'upper_single',
                    'invert_single' => $transformation = 'invert_single',
                    'lower_until_reset' => $transformation = 'lower',
                    'upper_until_reset' => $transformation = 'upper',
                    'invert_until_reset' => $transformation = 'invert',
                    'reset_transformations' => $transformation = null,
                };
                $patternIndex++;
                continue;
            }

            // Handle filtering patterns
            if (array_key_exists($patternChar, self::FILTERING_PATTERNS)) {
                $inputIndex = $this->consumeNextMatchingChar(
                    $inputChars,
                    $inputIndex,
                    $patternChar,
                    $result,
                    $transformation,
                );
                $patternIndex++;
                continue;
            }

            // Handle literal characters - they appear in output as-is and don't consume input
            $result[] = $patternChar;
            $patternIndex++;
        }

        return implode('', $result);
    }

    /**
     * @param array<string> $inputChars
     * @param array<string> $result
     */
    private function consumeNextMatchingChar(
        array $inputChars,
        int $inputIndex,
        string $filter,
        array &$result,
        string|null &$transformation,
    ): int {
        while ($inputIndex < count($inputChars)) {
            if ($this->matchesFilter($filter, $inputChars[$inputIndex])) {
                if ($transformation !== null) {
                    $tempTransformation = $transformation;
                    // Clear single-use transformations
                    if (str_ends_with($transformation, '_single')) {
                        $transformation = null;
                    }

                    $this->appendWithTransformation($result, $inputChars, $inputIndex, $tempTransformation);
                } else {
                    $result[] = $inputChars[$inputIndex];
                    $inputIndex++;
                }

                break;
            }

            $inputIndex++; // Skip non-matching character
        }

        return $inputIndex;
    }

    private function matchesFilter(string $filter, string $char): bool
    {
        if ($char === '') {
            return false;
        }

        // First check if this is a filtering pattern key
        if (array_key_exists($filter, self::FILTERING_PATTERNS)) {
            $filterType = self::FILTERING_PATTERNS[$filter];

            return match ($filterType) {
                'any' => true,
                'digit' => preg_match('/^[0-9]$/', $char) === 1,
                'upper_alpha' => preg_match('/^[A-Z]$/', $char) === 1,
                'lower_alpha' => preg_match('/^[a-z]$/', $char) === 1,
                'alpha' => preg_match('/^\p{L}$/u', $char) === 1,
                'alphanumeric' => preg_match('/^[\p{L}\p{N}]$/u', $char) === 1,
                // 'hex', 'punctuation', and 'symbol' have no matcher here and fall through to the default
                default => false,
            };
        }

        return $char === $filter;
    }

    /**
     * @param array<string> $result
     * @param array<string> $inputChars
     */
    private function appendWithTransformation(
        array &$result,
        array $inputChars,
        int &$inputIndex,
        string $transformation,
    ): void {
        if ($inputIndex >= count($inputChars)) {
            return;
        }

        $char = $inputChars[$inputIndex];
        $inputIndex++;

        $result[] = match ($transformation) {
            'lower', 'lower_single' => mb_strtolower($char),
            'upper', 'upper_single' => mb_strtoupper($char),
            'invert', 'invert_single' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };
    }
}
