
Commit ed64ff2

Update README.md

1 parent c0fb18a commit ed64ff2

1 file changed: README.md

Lines changed: 16 additions & 3 deletions
````diff
@@ -16,6 +16,7 @@ This is especially useful for workloads such as low-bit LLM inference, where dec
 - [Benchmark Results](#benchmark-results-)
 - [Updates](#updates-)
 - [Project Structure](#project-structure-%EF%B8%8F)
+- [Citation](#citation-)

 ## Demo 🎬
 Inference on CPU for a 1.58-bit LLM decoding step. Click the image to view the original high-quality video. `HF` denotes the Hugging Face baseline running `bfloat16` on PyTorch.
````
````diff
@@ -63,9 +64,8 @@ CLI args for integrations/hf/model_prep.py:
 ```

 > [!NOTE]
-> `k` is hardware-dependent, so run the `best_k` benchmark on the same machine
-> and device you plan to use for inference, then reuse the generated JSON. If
-> no `best_k_{device}.json` is found, `model_prep.py` falls back to `--k`.
+> `k` might be hardware-dependent, so run the `best_k` benchmark on the same machine
+> and device you plan to use for inference, then reuse the generated JSON.

 ### Run model inference 🤖
 Use `integrations/hf/model_infer.py` to run generation from a preprocessed
````
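The NOTE lines removed above documented a fallback: if no `best_k_{device}.json` is found, `model_prep.py` falls back to the `--k` CLI value. A minimal sketch of that lookup-with-fallback pattern (the file naming comes from the diff, but the JSON key, function name, and directory layout are assumptions, not the repo's actual code):

```python
import json
import os
import tempfile

def resolve_k(device: str, benchmark_dir: str, default_k: int) -> int:
    """Return the benchmarked best k for `device` if a best_k_{device}.json
    file exists in benchmark_dir; otherwise fall back to default_k (--k)."""
    path = os.path.join(benchmark_dir, f"best_k_{device}.json")
    if os.path.exists(path):
        with open(path) as f:
            # "best_k" is an assumed key; the real JSON schema may differ.
            return int(json.load(f)["best_k"])
    return default_k

# With an empty directory, no benchmark file is found, so --k wins.
empty = tempfile.mkdtemp()
print(resolve_k("cpu", empty, 4))  # 4
```

Running the `best_k` benchmark once per machine and reusing the generated JSON keeps preprocessing reproducible across runs on the same hardware.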
````diff
@@ -273,3 +273,16 @@ RSR-core/
 │   └── frontend/ # React dashboard
 └── tests/ # Unit and integration tests
 ```
+
+## Citation 📝
+
+If you use this repository in your research or project, please cite our work:
+
+```bibtex
+@inproceedings{dehghankarefficient,
+title={An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks},
+author={Dehghankar, Mohsen and Erfanian, Mahdi and Asudeh, Abolfazl},
+booktitle={Forty-second International Conference on Machine Learning},
+year={2025}
+}
+```
````
