
Null values replaced with type defaults when loading multi-batch Arrow IPC streams #3169

@tditmars


Bug Report

Steps to Reproduce

This repository is a minimal reproduction for an issue where loading an Arrow IPC stream containing multiple record batches into Perspective appears to replace some null values with type-specific defaults. The same logical data, encoded as a single record batch, preserves null values correctly.

This repro intentionally avoids the Perspective datagrid and any backend/networking code so the issue can be demonstrated with the smallest possible surface area.

Repository contents:

.
├── package.json
├── vite.config.js
└── repro.html

  • repro.html: primary browser repro using @perspective-dev/client and apache-arrow
  • package.json: JavaScript dependencies and scripts for the browser repro
  • vite.config.js: Vite configuration

The repro constructs three one-row Arrow RecordBatch objects, each with the same schema. The input logical data is:

[
  { "Identifier": "A", "Value": null, "Date": null },
  { "Identifier": "B", "Value": 5, "Date": null },
  { "Identifier": "C", "Value": null, "Date": "2025-06-15" }
]

The repro compares two cases:

  1. Multi-batch Arrow IPC stream:
    • 3 record batches
    • 1 row per batch

  2. Single-batch Arrow IPC stream:
    • 1 record batch
    • 3 rows

Browser repro

Install dependencies and run the local dev server:

npm install
npm run dev

Open the local URL printed by Vite, usually:

http://127.0.0.1:5173/repro.html

Then open the browser developer console and compare the logged output for:

Input Values
Input Dates
Multi-batch output
Single-batch output

Expected Result

Both the multi-batch and single-batch Arrow IPC streams should preserve null values.

Expected output:

[
  { "Identifier": "A", "Value": null, "Date": null },
  { "Identifier": "B", "Value": 5, "Date": null },
  { "Identifier": "C", "Value": null, "Date": 1749945600000 }
]
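The Date column in the output is reported as epoch milliseconds, so the input "2025-06-15" becomes 1749945600000. A minimal sketch for reading these values while debugging follows; `epochMsToIso` is a hypothetical helper written for this issue, not part of Perspective or apache-arrow:

```javascript
// Hypothetical helper: render an epoch-millisecond value as an ISO 8601
// UTC string. Nulls pass through unchanged so they stay visible.
function epochMsToIso(ms) {
  return ms === null ? null : new Date(ms).toISOString();
}

console.log(epochMsToIso(1749945600000)); // "2025-06-15T00:00:00.000Z"
console.log(epochMsToIso(null)); // null
```

Running the same helper on a value from the actual output makes it easier to tell a real date from a placeholder.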

I would expect multi-batch Arrow IPC streams to be supported equivalently to single-batch Arrow IPC streams, since both represent the same logical Arrow table.

Actual Result

The single-batch Arrow IPC stream preserves null values correctly.

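The multi-batch Arrow IPC stream appears to replace some null values with type-specific defaults: the null Value in row A becomes 0, and the null Date in rows A and B becomes -62167305600000. The null Value in row C, which comes from the last batch, is preserved.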

Observed:

Multi-batch output:
{'Identifier': 'A', 'Value': 0, 'Date': -62167305600000}
{'Identifier': 'B', 'Value': 5, 'Date': -62167305600000}
{'Identifier': 'C', 'Value': null, 'Date': 1749945600000}

Single-batch output:
{'Identifier': 'A', 'Value': null, 'Date': null}
{'Identifier': 'B', 'Value': 5, 'Date': null}
{'Identifier': 'C', 'Value': null, 'Date': 1749945600000}

The incorrect values are visible in the console output from view.to_json().
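To make the difference easier to spot than by reading two console dumps, the two results can be compared cell by cell. The sketch below hard-codes the rows from the observed output; `findLostNulls` is a hypothetical helper for this comparison and not a Perspective API. In the repro, the two arrays would come from view.to_json() for each table.

```javascript
// Rows copied from the observed console output in this issue.
const multiBatch = [
  { Identifier: "A", Value: 0, Date: -62167305600000 },
  { Identifier: "B", Value: 5, Date: -62167305600000 },
  { Identifier: "C", Value: null, Date: 1749945600000 },
];
const singleBatch = [
  { Identifier: "A", Value: null, Date: null },
  { Identifier: "B", Value: 5, Date: null },
  { Identifier: "C", Value: null, Date: 1749945600000 },
];

// Hypothetical helper: list every cell that is null in `expected`
// but holds a non-null value in `actual`. Rows are matched by index.
function findLostNulls(actual, expected) {
  const lost = [];
  expected.forEach((row, i) => {
    for (const [column, value] of Object.entries(row)) {
      if (value === null && actual[i][column] !== null) {
        lost.push({ row: i, column, got: actual[i][column] });
      }
    }
  });
  return lost;
}

const lostNulls = findLostNulls(multiBatch, singleBatch);
console.log(lostNulls);
```

For the data above this reports three cells: Value and Date in row A, and Date in row B. All of them belong to the first two record batches.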


Environment

Browser repro:

  • @perspective-dev/client: 4.4.1
  • apache-arrow: 21.1.0
  • vite: 5.4.11
  • Browser: Chrome 145.0.7632.160
  • OS: macOS 26.3.1

Additional Context

I originally encountered this while working on an application using the Perspective datagrid. In that application, Arrow IPC data is produced by a backend that may naturally produce multiple record batches, then loaded into a Perspective-backed UI. I intentionally did not include the datagrid, backend, HTTP streaming, or app-specific code in this repro. The problem does not appear to be caused by HTTP streaming or by splitting network chunks, because the minimal repro constructs the Arrow data entirely in memory and still reproduces the issue.

A workaround is to consolidate the Arrow data into a single batch before serialization. In PyArrow, calling something like table.combine_chunks() before writing the IPC stream avoids the issue. However, that requires an additional copy and is expensive for large datasets.

Question: is loading multi-batch Arrow IPC streams expected to be supported by client.table()? If so, this appears to be a null-handling bug across record batch boundaries.

Metadata


Labels: bug (Concrete, reproducible bugs)
