Bug Report
Steps to Reproduce
This repository is a minimal reproduction for an issue where loading an Arrow IPC stream containing multiple record batches into Perspective appears to replace some null values with type-specific defaults. The same logical data, encoded as a single record batch, preserves null values correctly.
This repro intentionally avoids the Perspective datagrid and any backend/networking code so the issue can be demonstrated with the smallest possible surface area.
Repository contents:
.
├── package.json
├── vite.config.js
└── repro.html
repro.html: primary browser repro using @perspective-dev/client and apache-arrow
package.json: JavaScript dependencies and scripts for the browser repro
vite.config.js: vite config
The repro constructs three one-row Arrow RecordBatch objects, each with the same schema. The input logical data is:
[
{ "Identifier": "A", "Value": null, "Date": null },
{ "Identifier": "B", "Value": 5, "Date": null },
{ "Identifier": "C", "Value": null, "Date": "2025-06-15" }
]
The repro compares two cases:
- Multi-batch Arrow IPC stream:
  - 3 record batches
  - 1 row per batch
- Single-batch Arrow IPC stream:
  - 1 record batch containing all 3 rows
Browser repro
Install dependencies and run the local dev server:
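Assuming `package.json` wires up the standard Vite scripts, that is:

```shell
npm install
npm run dev
```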
Open the local URL printed by Vite, usually:
http://127.0.0.1:5173/repro.html
Then open the browser developer console and compare the logged output for:
- Input Values
- Input Dates
- Multi-batch output
- Single-batch output
Expected Result
Both the multi-batch and single-batch Arrow IPC streams should preserve null values.
Expected output:
[
{ "Identifier": "A", "Value": null, "Date": null },
{ "Identifier": "B", "Value": 5, "Date": null },
{ "Identifier": "C", "Value": null, "Date": 1749945600000 }
]
I would expect multi-batch Arrow IPC streams to be supported equivalently to single-batch Arrow IPC streams, since both represent the same logical Arrow table.
Actual Result
The single-batch Arrow IPC stream preserves null values correctly.
The multi-batch Arrow IPC stream appears to replace some null values with type-specific defaults.
Observed:
Multi-batch output:
[
{ "Identifier": "A", "Value": 0, "Date": -62167305600000 },
{ "Identifier": "B", "Value": 5, "Date": -62167305600000 },
{ "Identifier": "C", "Value": null, "Date": 1749945600000 }
]
Single-batch output:
[
{ "Identifier": "A", "Value": null, "Date": null },
{ "Identifier": "B", "Value": 5, "Date": null },
{ "Identifier": "C", "Value": null, "Date": 1749945600000 }
]
The incorrect values are visible in the console output from view.to_json().
Environment
Browser repro:
- @perspective-dev/client: 4.4.1
- apache-arrow: 21.1.0
- vite: 5.4.11
- Browser: Chrome 145.0.7632.160
- OS: macOS 26.3.1
Additional Context
I originally encountered this while working on an application using the Perspective datagrid. In that application, Arrow IPC data is produced by a backend that may naturally produce multiple record batches, then loaded into a Perspective-backed UI. I intentionally did not include the datagrid, backend, HTTP streaming, or app-specific code in this repro. The problem does not appear to be caused by HTTP streaming or by splitting network chunks, because the minimal repro constructs the Arrow data entirely in memory and still reproduces the issue.
A workaround is to consolidate the Arrow data into a single batch before serialization. In PyArrow, calling something like table.combine_chunks() before writing the IPC stream avoids the issue. However, that requires an additional copy and is expensive for large datasets.
Question: is loading multi-batch Arrow IPC streams expected to be supported by client.table()? If so, this appears to be a null-handling bug across record batch boundaries.