Add QDP backend detection and pure-PyTorch reference implementations #1189

Open
ryankert01 wants to merge 4 commits into apache:main from ryankert01:add-pytorch-reference

Conversation

@ryankert01
Member

@ryankert01 ryankert01 commented Mar 15, 2026

Closes #1177

Summary

This PR adds a pure-PyTorch reference backend to the QDP Python package so that it can be compared against our GPU kernel implementation.

Benchmark Results

All runs: 100 batches x 64 vectors (except 18-qubit: 50 batches x 64), median of 3 trials.
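
The measurement protocol above (a fixed number of batches, median over several trials) can be sketched roughly as follows. This is a minimal illustration, not the PR's benchmark script: `median_throughput` and `encode_batch` are hypothetical names, and reporting vectors per second is an assumption about the unit.

```python
import statistics
import time


def median_throughput(encode_batch, n_batches=100, batch_size=64, trials=3):
    """Return the median vectors/second over `trials` timed runs.

    `encode_batch` is a hypothetical zero-argument callable that encodes
    one batch of `batch_size` vectors.
    """
    rates = []
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(n_batches):
            encode_batch()
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-12)
        rates.append(n_batches * batch_size / elapsed)
    return statistics.median(rates)


if __name__ == "__main__":
    # Stand-in workload; a real run would call the encoder under test.
    rate = median_throughput(lambda: sum(range(1000)))
    print(f"{rate:,.0f} vectors/sec")
```

When the workload runs on a GPU, the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock read, because CUDA kernels launch asynchronously and the host timer would otherwise stop early.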

Amplitude Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---:|---|---:|---:|---:|---:|
| 10 | encode-only | 462,882 | 1,390,998 | 567,499 | 0.4x |
| 10 | end-to-end | 73,699 | 86,939 | 234,170 | 2.7x |
| 14 | encode-only | 118,620 | 789,862 | 151,713 | 0.2x |
| 14 | end-to-end | 5,514 | 6,356 | 25,603 | 4.0x |
| 16 | encode-only | 5,458 | 228,525 | 65,358 | 0.3x |
| 16 | end-to-end | 721 | 964 | 6,336 | 6.6x |
| 18 | encode-only | 1,313 | 58,761 | 15,876 | 0.3x |
| 18 | end-to-end | 194 | 237 | 1,529 | 6.5x |

Angle Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---:|---|---:|---:|---:|---:|
| 14 | encode-only | 1,086 | 59,864 | 45,332 | 0.8x |
| 14 | end-to-end | 1,114 | 56,456 | 50,919 | 0.9x |
| 16 | encode-only | 262 | 10,254 | 8,032 | 0.8x |
| 16 | end-to-end | 252 | 10,093 | 11,496 | 1.1x |

IQP Encoding

| Qubits | Mode | PyTorch CPU | PyTorch GPU | Mahout | Mahout vs GPU |
|---:|---|---:|---:|---:|---:|
| 10 | encode-only | 15,381 | 74,239 | 484,071 | 6.5x |
| 14 | encode-only | 1,258 | 25,597 | 55,304 | 2.2x |

Analysis

  • Amplitude encode-only: PyTorch GPU wins by roughly 2.5-5x (Mahout reaches only 0.2-0.4x of its throughput) because its vectorized L2-norm + pad is very efficient, while Mahout's encode path still pays for per-batch GPU output allocation and a D2H sync for norm validation.
  • Amplitude end-to-end: Mahout wins 2.7-6.6x, with the advantage growing from 10 to 16 qubits and holding at 18 qubits (6.5x). Rust data generation plus the integrated pipeline outperforms the Python path of generate_batch_data + torch.tensor + H2D transfer.
  • Angle: Near-parity in both modes (0.8-1.1x). The tensor-product encoding is compute-bound and both implementations are similarly efficient.
  • IQP encode-only: Mahout wins decisively (2.2-6.5x). Mahout's CUDA kernel for IQP (Walsh-Hadamard + phase computation) is significantly faster than PyTorch's Python-level loop over butterfly stages.
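
Two of the building blocks named in the analysis, the L2-norm + pad step of amplitude encoding and the butterfly stages of a Walsh-Hadamard transform, can be sketched as below. This is a dependency-free illustration of the math only, written with plain Python lists; it is not the code in `torch_ref.py`, and the function names and signatures are hypothetical.

```python
import math


def amplitude_encode(x, num_qubits):
    """L2-normalize `x` and zero-pad it to length 2**num_qubits."""
    dim = 1 << num_qubits
    if len(x) > dim:
        raise ValueError(f"input length {len(x)} exceeds 2**{num_qubits}")
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in x] + [0.0] * (dim - len(x))


def walsh_hadamard(state):
    """Unnormalized fast Walsh-Hadamard transform, applied in place.

    The outer `while` loop runs once per butterfly stage, i.e.
    log2(len(state)) times.
    """
    n = len(state)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = state[i], state[i + h]
                state[i], state[i + h] = a + b, a - b
        h *= 2
    return state


if __name__ == "__main__":
    print(amplitude_encode([3.0, 4.0], 2))
    print(walsh_hadamard([1.0, 0.0, 0.0, 0.0]))
```

In a tensor-based version the two inner loops would typically collapse into a few vectorized operations per stage, but the outer per-stage loop would still run in Python, which is presumably the overhead the IQP bullet refers to.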

Known Limitations

  • Basis encode-only: Rust engine.encode expects per-sample basis indices, so its batch input format differs from PyTorch's. Supporting this mode requires a Rust API change.
  • IQP end-to-end: Rust pipeline uses 1 << num_qubits as sample_size regardless of encoding method, causing a mismatch for IQP (which expects n + n*(n-1)/2). This is a pre-existing Rust pipeline bug.
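
To make the size mismatch in the IQP limitation concrete, the two formulas quoted above can be evaluated side by side. The helper names here are hypothetical; only the formulas come from the description.

```python
def pipeline_sample_size(num_qubits):
    """Sample size the Rust pipeline uses for every encoding: 1 << n."""
    return 1 << num_qubits


def iqp_sample_size(num_qubits):
    """Sample size IQP expects: n single terms plus n*(n-1)/2 pair terms."""
    n = num_qubits
    return n + n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 14):
        print(n, pipeline_sample_size(n), iqp_sample_size(n))
    # 10 1024 55
    # 14 16384 105
```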

@ryankert01
Member Author

cc @viiccwen @vvvdwbvvv for review

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a pure-PyTorch reference implementation for QDP encoders plus backend detection/selection plumbing, enabling comparisons against the Rust+CUDA path and allowing parts of qumat_qdp to function when _qdp isn’t built.

Changes:

  • Added PyTorch reference implementations for amplitude/angle/basis/IQP encodings with a string-based dispatcher.
  • Introduced backend detection utilities and exposed backend info via qumat_qdp exports; added explicit backend selection to QdpBenchmark and QuantumDataLoader.
  • Added new tests for the PyTorch reference encoders and for behavior when _qdp is unavailable; added a benchmark script supporting “encode-only” vs “end-to-end” modes.
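
A string-based dispatcher of the kind mentioned in the first bullet might look roughly like this. It is a sketch under assumptions: the registry, the `register` decorator, and the `(data, num_qubits, method)` signature are hypothetical and not taken from `torch_ref.py`, and only a simple one-hot basis encoder is registered to keep the example short.

```python
_ENCODERS = {}


def register(name):
    """Decorator that records an encoder under a string key."""
    def decorator(fn):
        _ENCODERS[name] = fn
        return fn
    return decorator


@register("basis")
def _basis(data, num_qubits):
    """One-hot state vector for a single integer basis index."""
    dim = 1 << num_qubits
    index = int(data)
    if not 0 <= index < dim:
        raise ValueError(f"basis index {index} out of range for {num_qubits} qubits")
    state = [0.0] * dim
    state[index] = 1.0
    return state


def encode(data, num_qubits, method):
    """Look up `method` in the registry and apply it (hypothetical signature)."""
    try:
        fn = _ENCODERS[method]
    except KeyError:
        known = ", ".join(sorted(_ENCODERS))
        raise ValueError(f"unknown encoding method {method!r}; expected one of: {known}") from None
    return fn(data, num_qubits)


if __name__ == "__main__":
    print(encode(2, 2, "basis"))
```

Keeping the lookup in one place means an unknown method name fails with a single, descriptive error instead of surfacing deep inside an encoder.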

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

File Description
testing/qdp_python/test_torch_ref.py New unit tests for PyTorch reference encoders + optional cross-validation vs _qdp.
testing/qdp_python/test_fallback.py Tests for backend detection and explicit PyTorch backend behavior when _qdp is missing.
testing/conftest.py Adjusts skip logic so selected tests can run without _qdp.
qdp/qdp-python/qumat_qdp/torch_ref.py Implements pure-PyTorch reference encoders and an encode() dispatcher.
qdp/qdp-python/qumat_qdp/loader.py Adds explicit `.backend('rust'
qdp/qdp-python/qumat_qdp/api.py Adds `.backend('rust'
qdp/qdp-python/qumat_qdp/_backend.py Adds backend detection (Backend enum, get_backend, force_backend, get_qdp, get_torch).
qdp/qdp-python/qumat_qdp/__init__.py Makes qumat_qdp importable without _qdp; exports BACKEND/Backend and safe _qdp symbols.
qdp/qdp-python/benchmark/benchmark_pytorch_ref.py Adds benchmark script comparing PyTorch vs Mahout with --mode (encode-only/end-to-end).

@ryankert01
Member Author

update at df90a6e

@ryankert01
Member Author

I have already implemented the amplitude one and sped it up to slightly faster than torch.compile.

- Implemented backend detection and selection logic in _backend.py, prioritizing Rust+CUDA, then PyTorch, and falling back to None.
- Added pure-PyTorch reference implementations for quantum data encoding methods in torch_ref.py, including amplitude, angle, basis, and IQP encoding.
- Created comprehensive tests for fallback mechanisms and pure-PyTorch encodings in test_fallback.py and test_torch_ref.py, ensuring functionality without the Rust extension.
- Enhanced error handling and validation across encoding methods to ensure robustness.
@ryankert01 ryankert01 force-pushed the add-pytorch-reference branch from df90a6e to d814d7a on April 2, 2026 15:18
Development

Successfully merging this pull request may close these issues.

Add a PyTorch reference implementation

3 participants