Commit c0e244e

Merge pull request #60 from TysonAndre/decode-integer-test
Document differences from json_decode()
2 parents 104b90e + 54727fb commit c0e244e

4 files changed

Lines changed: 141 additions & 13 deletions

README.md

Lines changed: 77 additions & 9 deletions
@@ -46,14 +46,7 @@ extension=simdjson.so
 
 ## simdjson_php Usage
 ```php
-
-//Check if a JSON string is valid:
-$isValid = simdjson_is_valid($jsonString); //return bool
-
-//Parsing a JSON string. similar to the json_decode() function but without the fourth argument
-$parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|null. "null" string is not a standard json
-
-/*
+$jsonString = <<<'JSON'
 {
 "Image": {
 "Width": 800,
@@ -68,7 +61,19 @@ $parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|nul
 "IDs": [116, 943, 234, 38793, {"p": "30"}]
 }
 }
-*/
+JSON;
+
+//Check if a JSON string is valid:
+$isValid = simdjson_is_valid($jsonString); //return bool
+var_dump($isValid); // true
+
+//Parsing a JSON string. similar to the json_decode() function but without the fourth argument
+try {
+$parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|null. "null" string is not a standard json
+var_dump($parsedJSON); // PHP array
+} catch (RuntimeException $e) {
+echo "Failed to parse $jsonString: {$e->getMessage()}\n";
+}
 
 //note. "/" is a separator. Can be used as the "key" of the object and the "index" of the array
 //E.g. "Image/Thumbnail/Url" is ok.
@@ -97,5 +102,68 @@ var_dump($res) //int(5)
 
 ```
 
+## simdjson_php API
+
+```php
+<?php
+/**
+* Similar to json_decode()
+*
+* @returns array|stdClass|string|float|int|bool|null
+* @throws RuntimeException for invalid JSON (or document over 4GB, or out of range integer/float)
+*/
+function simdjson_decode(string $json, bool $assoc = false, int $depth = 512) {}
+
+/**
+* Returns true if json is valid.
+*
+* @returns ?bool (null if depth is invalid)
+*/
+function simdjson_is_valid(string $json, int $depth = 512) : ?bool {}
+
+/**
+* Parses $json and returns the number of keys in $json matching the JSON pointer $key
+*
+* @returns ?bool (null if depth is invalid)
+*/
+function simdjson_key_count(string $json, string $key, int $depth = 512) : ?int {}
+
+/**
+* Returns true if the JSON pointer $key could be found.
+*
+* @returns ?bool (null if depth is invalid, false if json is invalid or key is not found)
+*/
+function simdjson_key_exists(string $json, string $key, int $depth = 512) : ?bool {}
+
+/**
+* Returns the value at $key
+*
+* @returns array|stdClass|string|float|int|bool|null the value at $key
+* @throws RuntimeException for invalid JSON (or document over 4GB, or out of range integer/float)
+*/
+function simdjson_key_value(string $json, string $key, bool $assoc = unknown, int $depth = unknown) {}
+```
+
+## Edge cases
+
+There are some differences from `json_decode()` due to the implementation of the underlying simdjson library. This will throw a RuntimeException if simdjson rejects the JSON.
+
+1) `simdjson_decode()` how out of range 64-bit integers and floats are handled.
+
+See https://github.com/simdjson/simdjson/blob/master/doc/basics.md#standard-compliance
+
+> - The specification allows implementations to set limits on the range and precision of numbers accepted. We support 64-bit floating-point numbers as well as integer values.
+> - We parse integers and floating-point numbers as separate types which allows us to support all signed (two's complement) 64-bit integers, like a Java `long` or a C/C++ `long long` and all 64-bit unsigned integers. When we cannot represent exactly an integer as a signed or unsigned 64-bit value, we reject the JSON document.
+> - We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7975e308. Extreme values (less or equal to -1e308, greater or equal to 1e308) are rejected: we refuse to parse the input document. Numbers are parsed with a perfect accuracy (ULP 0): the nearest floating-point value is chosen, rounding to even when needed. If you serialized your floating-point numbers with 17 significant digits in a standard compliant manner, the simdjson library is guaranteed to recover the same numbers, exactly.
+
+2) The maximum string length that can be passed to `simdjson_decode()` is 4GiB (4294967295 bytes).
+`json_decode()` can decode longer strings.
+
+3) The handling of max depth is counted slightly differently for empty vs non-empty objects/arrays.
+In `json_decode`, an array with a scalar has the same depth as an array with no elements.
+In `simdjson_decode`, an array with a scalar is one level deeper than an array with no elements.
+For typical use cases, this shouldn't matter.
+(e.g. `simdjson_decode('[[]]', true, 2)` will succeed but `json_decode('[[]]', true, 2)` and `simdjson_decode('[[1]]', true, 2)` will fail.)
+
 ## Benchmarks
 See the [benchmark](./benchmark) folder for more benchmarks.
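
Reviewer's sketch (not part of the commit): the out-of-range number behaviour documented in the README's "Edge cases" section can be probed with a few lines of PHP. `describe_decode()` is a hypothetical helper introduced only for this illustration, and the `simdjson_decode()` column is printed only when the extension is loaded. According to the README text and the test added in this commit, `simdjson_decode()` is expected to throw a `RuntimeException` for `18446744073709551616`, whereas `json_decode()` does not.

```php
<?php
// Hypothetical helper for illustration; it is not part of simdjson_php.
// Returns the PHP type a decoder produced, or notes that it threw.
function describe_decode(callable $decoder, string $json): string {
    try {
        // gettype() reports floats as "double".
        return gettype($decoder($json));
    } catch (RuntimeException $e) {
        return 'threw ' . get_class($e);
    }
}

$inputs = [
    '18446744073709551615',  // largest unsigned 64-bit integer
    '18446744073709551616',  // one past the unsigned 64-bit range
    '-9223372036854775809',  // one below the signed 64-bit range
    '1e309',                 // beyond the binary64 range
];

foreach ($inputs as $json) {
    $line = sprintf('%-22s json_decode: %s', $json, describe_decode('json_decode', $json));
    // Only probe simdjson_decode() when the extension is actually available.
    if (function_exists('simdjson_decode')) {
        $line .= ', simdjson_decode: ' . describe_decode('simdjson_decode', $json);
    }
    echo $line, "\n";
}
```

Comparing types rather than values keeps the output independent of the `precision` and `serialize_precision` INI settings that the new test has to pin.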

package.xml

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@
 <file name="64bit_support.phpt" role="test"/>
 <file name="decode_args.phpt" role="test"/>
 <file name="decode_exception.phpt" role="test"/>
+<file name="decode_integer_overflow.phpt" role="test"/>
 <file name="decode_invalid_property.phpt" role="test"/>
 <file name="decode_max_depth.phpt" role="test"/>
 <file name="decode_result.phpt" role="test"/>

tests/decode_integer_overflow.phpt

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+--TEST--
+simdjson_decode throws for integer syntax out of signed/unsigned 64-bit range due to C simdjson library
+--SKIPIF--
+<?php if (PHP_INT_SIZE < 8) echo "skip 64-bit test only\n"; ?>
+--INI--
+; in php 8.0 var_dump started using serialize_precision instead of precision
+serialize_precision=20
+precision=20
+--FILE--
+<?php
+// https://github.com/simdjson/simdjson/blob/master/doc/basics.md#standard-compliance
+// > - The specification allows implementations to set limits on the range and precision of numbers accepted. We support 64-bit floating-point numbers as well as integer values.
+// > - We parse integers and floating-point numbers as separate types which allows us to support all signed (two's complement) 64-bit integers, like a Java `long` or a C/C++ `long long` and all 64-bit unsigned integers. When we cannot represent exactly an integer as a signed or unsigned 64-bit value, we reject the JSON document.
+// > - We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7975e308. Extreme values (less or equal to -1e308, greater or equal to 1e308) are rejected: we refuse to parse the input document. Numbers are parsed with a perfect accuracy (ULP 0): the nearest floating-point value is chosen, rounding to even when needed. If you serialized your floating-point numbers with 17 significant digits in a standard compliant manner, the simdjson library is guaranteed to recover the same numbers, exactly.
+function dump_result(string $x) {
+echo "Testing " . var_export($x, true) . "\n";
+try {
+var_dump(simdjson_decode($x));
+} catch (Exception $e) {
+printf("Caught %s: %s\n", get_class($e), $e->getMessage());
+}
+}
+dump_result('18446744073709551615');
+dump_result('18446744073709551615.0');
+dump_result('18446744073709551615E0');
+dump_result('18446744073709551616'); // simdjson_decode throws but json_decode doesn't.
+dump_result('18446744073709551616.0');
+dump_result('-9223372036854775808');
+dump_result('-9223372036854775809');
+dump_result('-9223372036854775809.0');
+dump_result('1e307');
+dump_result('1e309');
+?>
+--EXPECT--
+Testing '18446744073709551615'
+float(18446744073709551616)
+Testing '18446744073709551615.0'
+float(18446744073709551616)
+Testing '18446744073709551615E0'
+float(18446744073709551616)
+Testing '18446744073709551616'
+Caught RuntimeException: Problem while parsing a number
+Testing '18446744073709551616.0'
+float(18446744073709551616)
+Testing '-9223372036854775808'
+int(-9223372036854775808)
+Testing '-9223372036854775809'
+Caught RuntimeException: Problem while parsing a number
+Testing '-9223372036854775809.0'
+float(-9223372036854775808)
+Testing '1e307'
+float(9.9999999999999998603E+306)
+Testing '1e309'
+Caught RuntimeException: Problem while parsing a number

tests/depth.phpt

Lines changed: 9 additions & 4 deletions
@@ -17,16 +17,16 @@ try {
 }
 var_dump(simdjson_decode('[1]', true, 2));
 // XXX there's a difference between simdjson_decode and json_decode.
-// In json_decode, an array with no elements has the same depth as an array of scalars.
-// In simdjson_decode, an array with no elements is deeper than an array with no elements.
-// For typical use cases this shouldn't matter.
+// In json_decode, an array with a scalar has the same depth as an array with no elements.
+// In simdjson_decode, an array with a scalar is deeper than an array with no elements.
+// For typical use cases, this shouldn't matter.
 try {
 var_dump(simdjson_decode('[[1]]', true, 2));
 } catch (RuntimeException $e) {
 echo "Caught for [[1]]: {$e->getMessage()}\n";
 }
+var_dump(simdjson_decode('[[]]', true, 2));
 var_dump(simdjson_decode('[[1]]', true, 3));
-
 ?>
 --EXPECTF--
 Warning: simdjson_decode(): Depth must be greater than zero in %s on line 2
@@ -43,6 +43,11 @@ array(1) {
 int(1)
 }
 Caught for [[1]]: The JSON document was too deep (too many nested objects and arrays)
+array(1) {
+[0]=>
+array(0) {
+}
+}
 array(1) {
 [0]=>
 array(1) {
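
Reviewer's sketch (not part of the commit): the depth-counting difference described in the comment above can be explored side by side. `json_decode_accepts()` and `simdjson_decode_accepts()` are hypothetical helpers written for this illustration, and the simdjson column is printed only when the extension is loaded. Per the README text in this commit, `[[]]` at depth 2 is expected to be accepted by `simdjson_decode()` but rejected by `json_decode()`, while `[[1]]` at depth 2 is expected to be rejected by both.

```php
<?php
// Hypothetical helpers for illustration; neither is part of simdjson_php.

// json_decode() reports failure through json_last_error() rather than by throwing.
function json_decode_accepts(string $json, int $depth): bool {
    json_decode($json, true, $depth);
    return json_last_error() === JSON_ERROR_NONE;
}

// simdjson_decode() throws RuntimeException when it rejects a document,
// including when the document exceeds the requested depth.
function simdjson_decode_accepts(string $json, int $depth): bool {
    try {
        simdjson_decode($json, true, $depth);
        return true;
    } catch (RuntimeException $e) {
        return false;
    }
}

foreach (['[[]]', '[[1]]'] as $json) {
    foreach ([2, 3] as $depth) {
        $line = sprintf(
            '%-6s depth=%d json_decode=%s',
            $json,
            $depth,
            json_decode_accepts($json, $depth) ? 'ok' : 'rejected'
        );
        // Only probe simdjson_decode() when the extension is actually available.
        if (function_exists('simdjson_decode')) {
            $line .= ' simdjson_decode=' . (simdjson_decode_accepts($json, $depth) ? 'ok' : 'rejected');
        }
        echo $line, "\n";
    }
}
```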
