Commit b7c59f6

Readme fix
1 parent 1f30625 commit b7c59f6

4 files changed

Lines changed: 14 additions & 7 deletions

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -56,5 +56,5 @@ This file helps AI agents discover and understand how to work with this reposito
 - Expanded `python/t81/__init__.py` so the higher-level `t81` package re-exports the compiled binding helpers (`t81lib`, `BigInt`, `Limb`, `gemm_ternary`, etc.) while staying import-safe when the extension is unavailable.
 - Added `scripts/ternary_quantization_benchmark.py` plus `BENCHMARKS.md` so contributors can reproduce a Fashion-MNIST FP32/PTQ/QAT benchmark and log accuracy/latency/storage for each mode; README now links the benchmark doc.
 - Rewrote `pyproject.toml` with valid TOML sections so editable installs (and `pip install -e '.[torch]'`) can parse the metadata cleanly before building the extension.
-- Restructured `README.md` into a onboarding-focused front door and added companion docs (`docs/use-cases.md`, `docs/hardware.md`, `docs/api-overview.md`, `docs/python-install.md`, `docs/torch.md`, `examples/README.md`) so heavy reference material lives outside the visitor-facing overview.
+- Restructured `README.md` into a onboarding-focused front door and added companion docs (`docs/use-cases.md`, `docs/hardware.md`, `docs/api-overview.md`, `docs/python-install.md`, `docs/torch.md`, `docs/gpu.md`, `examples/README.md`) so heavy reference material lives outside the visitor-facing overview.
 - Added optional CUDA/ROCm toggles plus a GPU dispatcher sketch (`include/t81/linalg/gemm_gpu.hpp`, `src/linalg/{gemm_cuda.cu,gemm_dispatch.cpp,gemm_rocm.cpp}`) so future teams can wire the new `where`/`clamp`/`lerp`/`addcmul` helpers into GPU kernels, introduced `t81::TensorMetadata` + Python helpers (`python/bindings.cpp`) that extract metadata from NumPy/Torch tensors, and expanded `tests/python/test_gpu_ops.py` to cover the metadata-backed bindings on both CPU and GPU paths.

README.md

Lines changed: 7 additions & 6 deletions
@@ -18,15 +18,16 @@ AI workflows.
 #include <t81/t81lib.hpp>

 int main() {
-t81::Int sum = t81::Int{1} + t81::Int{2};
-return sum == 3 ? 0 : 1;
+using t81::Int;
+Int sum = Int::from_int(1) + Int::from_int(2);
+return (sum == Int::from_int(3)) ? 0 : 1;
 }
 ```

 ```python
 import t81lib

-print(t81lib.Float.from_string("1.5") + t81lib.Float.from_string("1.5"))
+print(t81lib.BigInt(3) * t81lib.BigInt(7))
 ```

 ## Who is this for?
@@ -97,9 +98,9 @@ target_link_libraries(... t81::t81lib)

 `pip install .[torch]` unlocks the `t81lib`/`t81` namespace, NumPy quantization helpers, and the `t81.torch`/`t81.nn` layers that mix ternary weights with FP32/BF16 biases. Jump deeper via [docs/python-api.md](docs/python-api.md), [docs/python-cookbook.md](docs/python-cookbook.md), and [docs/torch.md](docs/torch.md).

-## GPU backends & tensor metadata
+## GPU backends

-Enable CUDA/ROCm through the optional `-DUSE_CUDA=ON` and `-DUSE_ROCM=ON` flags during CMake configuration so the Python bindings link against the new GPU kernels (`python/CMakeLists.txt`). Once enabled, `t81lib.where`, `t81lib.clamp`, `t81lib.lerp`, and `t81lib.addcmul` accept either NumPy buffers or PyTorch tensors and route the work directly to CUDA/HIP kernels via the lightweight [`t81::TensorMetadata`](include/t81/tensor_metadata.hpp) ABI. The metadata struct carries device/dtype/shape/stride info plus raw `data_ptr`, letting the dispatcher avoid host copies and keep outputs on-device. When torch is installed, `t81lib` automatically wraps GPU tensors; when only NumPy is available it falls back to CPU buffers. Consult [docs/torch.md](docs/torch.md) and `python/bindings.cpp` for the extraction helpers and lifetime semantics.
+Optional CUDA/ROCm backends can be enabled with `-DUSE_CUDA=ON` / `-DUSE_ROCM=ON` so the Python bindings link against the GPU kernels. `t81lib` exposes a compact `TensorMetadata` ABI that carries device, dtype, shape, and stride info, allowing `where`, `clamp`, `lerp`, and `addcmul` to work directly on NumPy arrays or Torch tensors. See [docs/gpu.md](docs/gpu.md) for build flags, device routing, and tensor metadata details.

 ## CLI helpers

@@ -130,7 +131,7 @@ See [docs/api-overview.md](docs/api-overview.md) for the full surface described

 ## Stability & compatibility

-- Supported toolchains: recent Clang/GCC/MSVC or `pip install`’s compatible CPython builds; CMake config defaults to host SIMD if available (AVX2/AVX-512, NEON) while falling back to portable kernels.
+- Supported toolchains: C++20-capable Clang/GCC/MSVC (or `pip install`’s compatible CPython builds) with CMake ≥ 3.22; the build auto-detects AVX2/AVX-512/NEON and falls back to portable kernels when those SIMD targets are unavailable.
 - We track the ABI/API surface via `include/t81/t81lib.hpp`; expect the core headers to evolve until we reach a stable v1 release and consult [CHANGELOG.md](CHANGELOG.md) for migration notes.

 ## Docs & resources
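
The GPU backends paragraph in the README diff names four elementwise helpers (`where`, `clamp`, `lerp`, `addcmul`) without showing what they compute. The sketch below is a pure-Python reference for those operations, assuming they follow the semantics of the PyTorch functions with the same names. The `ref_*` functions are hypothetical stand-ins written for illustration; they are not `t81lib` signatures, which this commit does not show.

```python
from typing import List, Sequence


def ref_where(cond: Sequence[bool], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Pick a[i] where cond[i] is true, otherwise b[i]."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def ref_clamp(xs: Sequence[float], lo: float, hi: float) -> List[float]:
    """Limit every element to the closed interval [lo, hi]."""
    return [min(max(x, lo), hi) for x in xs]


def ref_lerp(start: Sequence[float], end: Sequence[float], weight: float) -> List[float]:
    """Linear interpolation: start + weight * (end - start)."""
    return [s + weight * (e - s) for s, e in zip(start, end)]


def ref_addcmul(base: Sequence[float], t1: Sequence[float], t2: Sequence[float],
                value: float = 1.0) -> List[float]:
    """Fused multiply-add: base + value * t1 * t2, elementwise."""
    return [b + value * x * y for b, x, y in zip(base, t1, t2)]
```

A reference like this is mainly useful in tests: a GPU result can be copied back to the host and compared element by element against the plain-Python answer.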

docs/gpu.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# GPU backends & tensor metadata
+
+CUDA/ROCm kernels can be built when you configure with `-DUSE_CUDA=ON` or `-DUSE_ROCM=ON` (see `python/CMakeLists.txt`). The bindings expose `t81lib.where`, `t81lib.clamp`, `t81lib.lerp`, and `t81lib.addcmul`, which accept either NumPy buffers or PyTorch tensors and dispatch directly to the GPU kernels.
+
+Dispatch relies on `t81::TensorMetadata` (`include/t81/tensor_metadata.hpp`): a lightweight struct that carries device tags, dtype codes, shape, strides, and `data_ptr` so the dispatcher can call the right CUDA/HIP kernel without copies. When torch is available, `t81lib` automatically wraps tensors; without torch it gracefully falls back to CPU buffers. Review `python/bindings.cpp` for the extraction helpers and lifetime management.
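
To make the metadata idea concrete, here is a minimal Python sketch of a descriptor with the fields the new doc lists (device, dtype, shape, strides, `data_ptr`), filled in from a standard-library buffer. `TensorMetadataSketch`, `describe`, and `pick_backend` are hypothetical names introduced for this example; the real struct layout lives in `include/t81/tensor_metadata.hpp` and may differ.

```python
from array import array
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorMetadataSketch:
    """Hypothetical stand-in for t81::TensorMetadata (field names assumed)."""
    device: str                # e.g. "cpu", "cuda", "rocm"
    dtype: str                 # element format code, e.g. "f" for float32
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]   # in bytes, as the buffer protocol reports them
    data_ptr: int              # raw address; valid only while the buffer is alive


def describe(buf: array) -> TensorMetadataSketch:
    """Extract metadata from a host buffer without copying its contents."""
    address, _length = buf.buffer_info()
    with memoryview(buf) as view:
        return TensorMetadataSketch("cpu", view.format, view.shape, view.strides, address)


def pick_backend(meta: TensorMetadataSketch) -> str:
    """Route on the device tag alone, mirroring the dispatch idea described above."""
    return "gpu_kernel" if meta.device in ("cuda", "rocm") else "cpu_kernel"
```

Because the descriptor stores only an address and layout, whoever creates it must keep the underlying buffer alive and unresized for as long as a kernel may read through `data_ptr`.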

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ to understand the balanced ternary engine without digging through specs immediat
 from command-line workflows.
 - **Use cases & demos**[`docs/use-cases.md`](use-cases.md) and [`examples/README.md`](../examples/README.md) capture the canonical scripts, notebooks, and research stories.
 - **Hardware simulation**[`docs/hardware.md`](hardware.md) details `t81.hardware.TernaryEmulator`, fuzzy helpers, and the visualizer notebook.
+- **GPU backends**[`docs/gpu.md`](gpu.md) explains the CUDA/ROCm build flags and tensor metadata routing.
 - **API overview**[`docs/api-overview.md`](api-overview.md) summarizes the numeric containers and helpers exposed via `<t81/t81lib.hpp>`.
 - **Tests & benchmarks**[`tests/`](../tests/) documents the unit/property coverage while [`bench/`](../bench/) shows throughput patterns.

