Add QDP backend detection and pure-PyTorch reference implementations by ryankert01 · Pull Request #1189 · apache/mahout

ryankert01 · 2026-03-15T12:15:21Z

Summary

This PR adds a pure-PyTorch reference backend to the QDP Python package so that it can be compared against our GPU kernel implementation.

Benchmark Results

All runs: 100 batches x 64 vectors (except 18-qubit: 50 batches x 64), median of 3 trials.
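For anyone who wants to reproduce the "median of 3 trials" setup, here is a minimal sketch of how such a measurement could be taken. It assumes throughput is counted in vectors per second, which the tables do not state explicitly; `measure_throughput` and `encode_batch` are hypothetical names for illustration, not functions from the benchmark script.

```python
import statistics
import time


def measure_throughput(encode_batch, num_batches=100, batch_size=64, trials=3):
    """Return the median vectors/second over `trials` timed runs.

    `encode_batch` is a hypothetical callable that encodes one batch of
    `batch_size` vectors and blocks until the work is finished.
    """
    rates = []
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(num_batches):
            encode_batch(batch_size)
        # Guard against a zero interval on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-12)
        rates.append(num_batches * batch_size / elapsed)
    # The median is less sensitive to a single slow warm-up trial than the mean.
    return statistics.median(rates)


if __name__ == "__main__":
    # Stand-in workload: build a throwaway list instead of encoding real data.
    rate = measure_throughput(lambda n: [0.0] * n, num_batches=100, batch_size=64)
    print(f"{rate:,.0f} vectors/s")
```

When timing GPU work, the callable needs to synchronize (for example with `torch.cuda.synchronize()`) before returning, otherwise only the kernel launch is timed.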

Amplitude Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 462,882 | 1,390,998 | 567,499 | 0.4x |
| 10 | end-to-end | 73,699 | 86,939 | 234,170 | 2.7x |
| 14 | encode-only | 118,620 | 789,862 | 151,713 | 0.2x |
| 14 | end-to-end | 5,514 | 6,356 | 25,603 | 4.0x |
| 16 | encode-only | 5,458 | 228,525 | 65,358 | 0.3x |
| 16 | end-to-end | 721 | 964 | 6,336 | 6.6x |
| 18 | encode-only | 1,313 | 58,761 | 15,876 | 0.3x |
| 18 | end-to-end | 194 | 237 | 1,529 | 6.5x |

Angle Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 14 | encode-only | 1,086 | 59,864 | 45,332 | 0.8x |
| 14 | end-to-end | 1,114 | 56,456 | 50,919 | 0.9x |
| 16 | encode-only | 262 | 10,254 | 8,032 | 0.8x |
| 16 | end-to-end | 252 | 10,093 | 11,496 | 1.1x |

IQP Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---|---|---|---|---|---|
| 10 | encode-only | 15,381 | 74,239 | 484,071 | 6.5x |
| 14 | encode-only | 1,258 | 25,597 | 55,304 | 2.2x |

Analysis

Amplitude encode-only: PyTorch GPU wins by roughly 2.5-5x (Mahout vs GPU of 0.2-0.4x) because its vectorized L2-norm + pad is very efficient, while Mahout's encode path still pays for per-batch GPU output allocation and the D2H sync overhead of norm validation.
Amplitude end-to-end: Mahout wins 2.7-6.6x, with the advantage growing at higher qubit counts. Rust data generation plus the integrated pipeline outperforms the Python path of generate_batch_data + torch.tensor + H2D transfer.
Angle: Near-parity in both modes (0.8-1.1x). The tensor-product encoding is compute-bound and both implementations are similarly efficient.
IQP encode-only: Mahout wins decisively (2.2-6.5x). Mahout's CUDA kernel for IQP (Walsh-Hadamard + phase computation) is significantly faster than PyTorch's Python-level loop over butterfly stages.
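To make the "loop over butterfly stages" point concrete, here is a plain-Python sketch of an unnormalized fast Walsh-Hadamard transform. It is an illustration of the general technique only, not the code in torch_ref.py or the CUDA kernel; `fwht` is a hypothetical name. A PyTorch version would vectorize the two inner loops, but the outer loop over the log2(N) stages would still run in Python, which is the overhead described above.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    out = list(values)
    h = 1
    # One pass of the outer loop per butterfly stage: log2(n) stages in total.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                # Butterfly: combine each pair into its sum and difference.
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    print(fwht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

Applying the transform twice returns the input scaled by N, which is a handy sanity check for any implementation.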

Known Limitations

Basis encode-only: Rust engine.encode expects per-sample basis indices; batch input format differs from PyTorch. Requires Rust API change to support.
IQP end-to-end: Rust pipeline uses 1 << num_qubits as sample_size regardless of encoding method, causing a mismatch for IQP (which expects n + n*(n-1)/2). Pre-existing Rust pipeline bug.
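The size mismatch in the IQP limitation is easy to see numerically. The sketch below just evaluates the two formulas quoted above; the function names are hypothetical and do not correspond to anything in the Rust pipeline.

```python
def pipeline_sample_size(num_qubits):
    """Sample size the pipeline is described as using: 1 << num_qubits."""
    return 1 << num_qubits


def iqp_sample_size(num_qubits):
    """Sample size IQP is described as expecting: n + n*(n-1)/2."""
    n = num_qubits
    return n + n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 14):
        print(n, pipeline_sample_size(n), iqp_sample_size(n))
    # 10 1024 55
    # 14 16384 105
```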

ryankert01 · 2026-03-29T10:49:31Z

cc @viiccwen @vvvdwbvvv for review

Copilot

Pull request overview

This PR introduces a pure-PyTorch reference implementation for QDP encoders plus backend detection/selection plumbing, enabling comparisons against the Rust+CUDA path and allowing parts of qumat_qdp to function when _qdp isn’t built.

Changes:

Added PyTorch reference implementations for amplitude/angle/basis/IQP encodings with a string-based dispatcher.
Introduced backend detection utilities and exposed backend info via qumat_qdp exports; added explicit backend selection to QdpBenchmark and QuantumDataLoader.
Added new tests for the PyTorch reference encoders and for behavior when _qdp is unavailable; added a benchmark script supporting “encode-only” vs “end-to-end” modes.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| `testing/qdp_python/test_torch_ref.py` | New unit tests for PyTorch reference encoders + optional cross-validation vs `_qdp`. |
| `testing/qdp_python/test_fallback.py` | Tests for backend detection and explicit PyTorch backend behavior when `_qdp` is missing. |
| `testing/conftest.py` | Adjusts skip logic so selected tests can run without `_qdp`. |
| `qdp/qdp-python/qumat_qdp/torch_ref.py` | Implements pure-PyTorch reference encoders and an `encode()` dispatcher. |
| `qdp/qdp-python/qumat_qdp/loader.py` | Adds explicit `.backend('rust' |
| `qdp/qdp-python/qumat_qdp/api.py` | Adds `.backend('rust' |
| `qdp/qdp-python/qumat_qdp/_backend.py` | Adds backend detection (`Backend` enum, `get_backend`, `force_backend`, `get_qdp`, `get_torch`). |
| `qdp/qdp-python/qumat_qdp/__init__.py` | Makes `qumat_qdp` importable without `_qdp`; exports `BACKEND`/`Backend` and safe `_qdp` symbols. |
| `qdp/qdp-python/benchmark/benchmark_pytorch_ref.py` | Adds benchmark script comparing PyTorch vs Mahout with `--mode` (`encode-only`/`end-to-end`). |

ryankert01 · 2026-03-29T14:30:04Z

update at df90a6e

ryankert01 · 2026-04-02T13:26:02Z

I have already implemented the amplitude one and successfully increased its speed to slightly faster than torch.compile.

- Implemented backend detection and selection logic in _backend.py, prioritizing Rust+CUDA, PyTorch, and fallback to None.
- Added pure-PyTorch reference implementations for quantum data encoding methods in torch_ref.py, including amplitude, angle, basis, and IQP encoding.
- Created comprehensive tests for fallback mechanisms and pure-PyTorch encodings in test_fallback.py and test_torch_ref.py, ensuring functionality without the Rust extension.
- Enhanced error handling and validation across encoding methods to ensure robustness.
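The detection order described in this commit message (Rust+CUDA first, then PyTorch, then None) could be sketched roughly as follows. This is an illustration under assumptions, not the code in _backend.py: the enum member names, the `detect_backend` function, and the choice of `_qdp` and `torch` as top-level module names to probe are all hypothetical.

```python
import importlib.util
from enum import Enum


class Backend(Enum):
    # Member names and values are assumptions for this sketch.
    RUST_CUDA = "rust_cuda"
    PYTORCH = "pytorch"
    NONE = "none"


def _module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def detect_backend(is_available=_module_available):
    """Pick a backend by priority: Rust+CUDA extension, then PyTorch, then none.

    `is_available` is injectable so the priority order can be tested
    without installing or uninstalling anything.
    """
    if is_available("_qdp"):
        return Backend.RUST_CUDA
    if is_available("torch"):
        return Backend.PYTORCH
    return Backend.NONE


if __name__ == "__main__":
    print(detect_backend())
```

Probing with `find_spec` avoids paying the import cost of a large package just to learn whether it exists.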

ryankert01 requested review from 400Ping and guan404ming as code owners March 15, 2026 12:15

ryankert01 requested a review from rich7420 March 29, 2026 10:09

ryankert01 force-pushed the add-pytorch-reference branch from 562e6c2 to be21049 Compare March 29, 2026 10:10

ryankert01 requested review from Copilot and removed request for 400Ping, guan404ming and rich7420 March 29, 2026 11:02

Copilot started reviewing on behalf of ryankert01 March 29, 2026 11:02

Copilot AI reviewed Mar 29, 2026

ryankert01 mentioned this pull request Mar 29, 2026

[Roadmap] Strengthen CUDA kernel implementation #1227

Open

400Ping requested review from 400Ping, guan404ming and rich7420 March 30, 2026 14:58

400Ping self-assigned this Mar 30, 2026

ryankert01 mentioned this pull request Mar 30, 2026

feat: implement IQP encoding benchmark and reference against Torch #1231

Closed

9 tasks

ryankert01 added 4 commits April 2, 2026 23:18

update

d268ee6

fix precommit

9ab714a

feat: enhance CUDA device handling and update sample vector documentation

d814d7a

ryankert01 force-pushed the add-pytorch-reference branch from df90a6e to d814d7a Compare April 2, 2026 15:18

ryankert01 wants to merge 4 commits into apache:main from ryankert01:add-pytorch-reference

3 participants
