Commit 84c1561

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit 84c1561

4 files changed

Lines changed: 774 additions & 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter
 
 ## Formatters
 
-| Formatter | Description |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter | Description |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |
 
 ## Contributing
 

docs/PatternFormatter.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# PatternFormatter

The `PatternFormatter` enables advanced pattern-based string transformation using filtering patterns and transformation directives.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\L###\\U###');

echo $formatter->format('abCDEF');
// Outputs: "abcDEF"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining transformation rules

**Throws:** `InvalidFormatterException` when the pattern is an empty string

### `format`

- `format(string $input): string`

Formats the input string according to the pattern rules, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Description                                 |
| ------- | ------------------------------------------- |
| `#`     | Any character                               |
| `0`     | Digits only (0-9)                           |
| `A`     | Uppercase ASCII letters only (A-Z)          |
| `a`     | Lowercase ASCII letters only (a-z)          |
| `C`     | Letters only, any case, including non-ASCII |
| `W`     | Word characters (letters and numbers) only  |

### Transformation Patterns

| Pattern | Description                     |
| ------- | ------------------------------- |
| `\d`    | Delete the next input character |
| `\l`    | Lowercase next character        |
| `\L`    | Lowercase until `\E`            |
| `\u`    | Uppercase next character        |
| `\U`    | Uppercase until `\E`            |
| `\i`    | Invert case next character      |
| `\I`    | Invert case until `\E`          |
| `\E`    | End the transformation state    |

### Escape Sequences

| Pattern | Description           | Example               |
| ------- | --------------------- | --------------------- |
| `\#`    | Literal `#` character | Outputs `#` literally |
| `\0`    | Literal `0` character | Outputs `0` literally |
| `\A`    | Literal `A` character | Outputs `A` literally |
| `\@`    | Literal `@` character | Outputs `@` literally |

### Literal Characters

Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `\`) is treated as a literal and appears in the output as-is, without consuming input. The implementation also reserves `X`, `!`, and `@` as filter characters, but no matching rule is implemented for them: unescaped, they match nothing and discard the rest of the input. Escape them (`\X`, `\!`, `\@`) to output them literally.

## Behavior

### Filtering Patterns

- **Remove non-matching characters**: Input characters that don't match the current filter are skipped and discarded
- **Keep matching characters as-is**: When a character matches the filter, it passes through unchanged unless a case transformation is active
- **Consume from input**: Each filter advances the input pointer past any skipped characters and the matched one; literals in the pattern consume no input
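
The skip-and-consume behavior of a filter can be sketched in isolation. The following is a simplified, standalone illustration rather than the library's implementation: `filterDigits()` is a hypothetical helper that understands only the `0` (digit) placeholder and treats every other pattern character as a literal.

```php
<?php

/**
 * Hypothetical helper (not part of the library): `0` consumes the next digit
 * from the input; any other pattern character is copied to the output.
 */
function filterDigits(string $pattern, string $input): string
{
    $inputChars = mb_str_split($input);
    $inputIndex = 0;
    $result = '';

    foreach (mb_str_split($pattern) as $patternChar) {
        if ($patternChar !== '0') {
            // Literals are emitted without consuming any input.
            $result .= $patternChar;
            continue;
        }

        // Skip (and discard) input characters until a digit is found.
        while ($inputIndex < count($inputChars)) {
            $char = $inputChars[$inputIndex++];
            if (preg_match('/^[0-9]$/', $char) === 1) {
                $result .= $char;
                break;
            }
        }
    }

    return $result;
}

echo filterDigits('000-0000', '1234567890'), PHP_EOL; // 123-4567
echo filterDigits('0-0', 'a1b2c3'), PHP_EOL;          // 1-2
```

In the second call, `a` and `b` are skipped and lost, and `c3` is never reached because the pattern is exhausted first.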

### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until `\E` clears them or another transformation directive (including a single-character one) replaces them
- **Single-character transformations**: `\l`, `\u`, `\i` affect only the next character emitted by a filter; `\d` drops the next input character immediately
- **End of transformations**: `\E` clears any active transformation state
- **Unicode aware**: Case transformations work with international characters
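
The difference between stateful and single-character transformations comes down to when the state is cleared. The sketch below is a standalone illustration under that assumption; `applyCase()` is a hypothetical helper, not the library's API, and it handles only the lowercase and uppercase states.

```php
<?php

/**
 * Hypothetical helper (not the library's API): applies the active case
 * transformation to one character and clears single-use states.
 */
function applyCase(?string &$state, string $char): string
{
    $active = $state;

    // Single-use states are cleared after one character.
    if ($active !== null && str_ends_with($active, '_single')) {
        $state = null;
    }

    return match ($active) {
        'lower', 'lower_single' => mb_strtolower($char),
        'upper', 'upper_single' => mb_strtoupper($char),
        default => $char,
    };
}

$state = 'upper';        // as if the pattern contained \U
$out = applyCase($state, 'a') . applyCase($state, 'b');

$state = 'lower_single'; // as if the pattern then contained \l
$out .= applyCase($state, 'C') . applyCase($state, 'D');

echo $out, PHP_EOL; // ABcD
```

The final `D` is left untouched: the single-use state replaced the earlier uppercase state, and nothing restores it once it has been used.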

## Examples

| Pattern          | Input        | Output           | Description                |
| ---------------- | ------------ | ---------------- | -------------------------- |
| `000-0000`       | `1234567`    | `123-4567`       | Phone number formatting    |
| `AAA-000`        | `ABC123`     | `ABC-123`        | License plate format       |
| `\U###`          | `abc`        | `ABC`            | Uppercase until reset      |
| `\L####`         | `ABC1`       | `abc1`           | Lowercase until reset      |
| `\l#\u#`         | `Ab`         | `aB`             | Case transformation        |
| `\I####`         | `AbCd`       | `aBcD`           | Case inversion until reset |
| `CC00WW`         | `AB123D`     | `AB123D`         | International postal code  |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format            |
| `000-00-0000`    | `123456789`  | `123-45-6789`    | SSN format                 |
| `\L##\E##`       | `ABCD`       | `abCD`           | Transformation reset       |
| `##\d##`         | `ABCDE`      | `ABDE`           | Deleting character         |

## International Support

The formatter works with Unicode characters and international text:

```php
$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```
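
To see why character-oriented splitting and Unicode-aware matching matter here, compare byte-oriented and multibyte handling of the same input. This standalone sketch uses only built-in PHP functions; it assumes the source file is saved as UTF-8 and the `mbstring` extension is loaded.

```php
<?php

$input = 'ñá';

// Byte-oriented split: each of these two characters is two bytes in UTF-8.
echo count(str_split($input)), PHP_EOL;    // 4

// Character-oriented split, which keeps each letter intact.
echo count(mb_str_split($input)), PHP_EOL; // 2

// \p{L} with the /u modifier matches any Unicode letter; [a-z] does not.
var_dump(preg_match('/^\p{L}$/u', 'ñ') === 1); // bool(true)
var_dump(preg_match('/^[a-z]$/', 'ñ') === 1);  // bool(false)
```

The last two lines show why `C` accepts `ñ` in the example above while an ASCII-only class such as `[a-z]` would reject it.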

## Edge Cases

| Pattern | Input      | Output  | Reason                                           |
| ------- | ---------- | ------- | ------------------------------------------------ |
| `###`   | `ab`       | `ab`    | Pattern longer than input uses all available     |
| `####`  | `abcdefgh` | `abcd`  | Input longer than pattern is truncated           |
| `C0`    | `ABC123`   | `A1`    | Non-matching characters are skipped              |
| `AAA`   | `123`      | (empty) | No matching characters found                     |
| `\E###` | `abc🙂`    | `abc`   | `\E` with no active transformation has no effect |

src/PatternFormatter.php

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function assert;
use function count;
use function implode;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;
use function str_ends_with;

final readonly class PatternFormatter implements Formatter
{
    private const array FILTERING_PATTERNS = [
        '#' => 'any',
        '0' => 'digit',
        'A' => 'upper_alpha',
        'a' => 'lower_alpha',
        'C' => 'alpha',
        'W' => 'alphanumeric',
        'X' => 'hex',
        '!' => 'punctuation',
        '@' => 'symbol',
    ];

    private const array TRANSFORMATION_PATTERNS = [
        'd' => 'delete',
        'l' => 'lower_single',
        'L' => 'lower_until_reset',
        'u' => 'upper_single',
        'U' => 'upper_until_reset',
        'i' => 'invert_single',
        'I' => 'invert_until_reset',
        'E' => 'reset_transformations',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $inputChars = mb_str_split($input);
        $inputIndex = 0;
        $result = [];
        $transformation = null;
        $patternIndex = 0;

        while ($patternIndex < mb_strlen($this->pattern)) {
            $patternChar = mb_substr($this->pattern, $patternIndex, 1);

            // Handle escape sequences
            if ($patternChar === '\\') {
                $nextPatternChar = mb_substr($this->pattern, $patternIndex + 1, 1);

                if ($nextPatternChar !== '') {
                    // Handle escaped transformation patterns
                    if (array_key_exists($nextPatternChar, self::TRANSFORMATION_PATTERNS)) {
                        match (self::TRANSFORMATION_PATTERNS[$nextPatternChar]) {
                            'delete' => $inputIndex++,
                            'lower_single' => $transformation = 'lower_single',
                            'upper_single' => $transformation = 'upper_single',
                            'invert_single' => $transformation = 'invert_single',
                            'lower_until_reset' => $transformation = 'lower',
                            'upper_until_reset' => $transformation = 'upper',
                            'invert_until_reset' => $transformation = 'invert',
                            'reset_transformations' => $transformation = null,
                        };
                        $patternIndex += 2;
                        continue;
                    }

                    // For backslash followed by any other character, output that character literally
                    $result[] = $nextPatternChar;
                    $patternIndex += 2;
                    continue;
                }
            }

            // Handle filtering patterns
            if (array_key_exists($patternChar, self::FILTERING_PATTERNS)) {
                $inputIndex = $this->consumeNextMatchingChar(
                    $inputChars,
                    $inputIndex,
                    $patternChar,
                    $result,
                    $transformation,
                );
                $patternIndex++;
                continue;
            }

            // Handle literal characters - they appear in output as-is and don't consume input
            $result[] = $patternChar;
            $patternIndex++;
        }

        return implode('', $result);
    }

    /**
     * @param array<string> $inputChars
     * @param array<string> $result
     */
    private function consumeNextMatchingChar(
        array $inputChars,
        int $inputIndex,
        string $filter,
        array &$result,
        string|null &$transformation,
    ): int {
        while ($inputIndex < count($inputChars)) {
            if ($this->matchesFilter($filter, $inputChars[$inputIndex])) {
                if ($transformation !== null) {
                    $tempTransformation = $transformation;
                    // Clear single-use transformations
                    if (str_ends_with($transformation, '_single')) {
                        $transformation = null;
                    }

                    $this->appendWithTransformation($result, $inputChars, $inputIndex, $tempTransformation);
                } else {
                    $result[] = $inputChars[$inputIndex];
                    $inputIndex++;
                }

                break;
            }

            $inputIndex++; // Skip non-matching character
        }

        return $inputIndex;
    }

    private function matchesFilter(string $filter, string $char): bool
    {
        assert(isset(self::FILTERING_PATTERNS[$filter]));

        $filterType = self::FILTERING_PATTERNS[$filter];

        return match ($filterType) {
            'any' => true,
            'digit' => preg_match('/^[0-9]$/', $char) === 1,
            'upper_alpha' => preg_match('/^[A-Z]$/', $char) === 1,
            'lower_alpha' => preg_match('/^[a-z]$/', $char) === 1,
            'alpha' => preg_match('/^\p{L}$/u', $char) === 1,
            'alphanumeric' => preg_match('/^[\p{L}\p{N}]$/u', $char) === 1,

            default => false,
        };
    }

    /**
     * @param array<string> $result
     * @param array<string> $inputChars
     */
    private function appendWithTransformation(
        array &$result,
        array $inputChars,
        int &$inputIndex,
        string $transformation,
    ): void {
        $char = $inputChars[$inputIndex];
        $inputIndex++;

        $result[] = match ($transformation) {
            'lower', 'lower_single' => mb_strtolower($char),
            'upper', 'upper_single' => mb_strtoupper($char),
            'invert', 'invert_single' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };
    }
}
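
In this source, `FILTERING_PATTERNS` declares `X`, `!`, and `@`, but `matchesFilter()` has no arm for `hex`, `punctuation`, or `symbol`, so those filters fall through to `default => false` and never match. Purely as an illustration, the sketch below shows one way such matchers could be written. The helper name `matchesExtraFilter()` and the chosen character classes are assumptions made by analogy with the existing regex-based arms, not the author's implementation.

```php
<?php

/**
 * Hypothetical helper: the character classes are assumptions, chosen by
 * analogy with the regex-based matchers in matchesFilter().
 */
function matchesExtraFilter(string $filterType, string $char): bool
{
    return match ($filterType) {
        'hex' => preg_match('/^[0-9A-Fa-f]$/', $char) === 1,
        'punctuation' => preg_match('/^\p{P}$/u', $char) === 1,
        'symbol' => preg_match('/^\p{S}$/u', $char) === 1,
        default => false,
    };
}

var_dump(matchesExtraFilter('hex', 'f'));         // bool(true)
var_dump(matchesExtraFilter('hex', 'g'));         // bool(false)
var_dump(matchesExtraFilter('punctuation', '!')); // bool(true)
var_dump(matchesExtraFilter('symbol', '+'));      // bool(true)
```

Whether `hex` should be ASCII-only, and whether `punctuation` and `symbol` should follow the Unicode `P` and `S` categories, are design choices the commit does not settle.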
