
Commit 047df23

GGUF4 removed
1 parent 9c26a56 commit 047df23

1 file changed

Lines changed: 0 additions & 11 deletions

File tree

README.md

```diff
@@ -121,17 +121,6 @@ That command rewrites the tensors in place while preserving the standard GGUF me
 For a zero-disk workaround you can also dequantize on the fly (via `t81.dequantize_gguf_to_float` or a small loader patch) before instantiating `llama_cpp.Llama`; see the docs for an example monkey patch if you want to load `model-tq1.gguf` or `model-tq2.gguf` directly without producing an intermediate copy.
 
 
-## GGUF v4 compliance
-
-t81’s GGUF exports already mirror the llama.cpp conventions; the writer now aligns with llama.cpp’s block layout (32-row groups, per-group f16 scale, optional TQ2 refinement bytes) and includes v4’s mandatory `gguf_header` additions, which are worth calling out for everybody writing their own converter:
-
-- **Header bump** – write `version = 4` instead of 3 so llama.cpp accepts the file and no longer fails with “unsupported version”.
-- **Global alignment metadata** – after `tensor_count`/`kv_count` emit `alignment` (default 32, power-of-two) and `reserved` (0) before the metadata block, and compute tensor padding with `GGML_PAD(size, alignment)` so every tensor data block ends on that boundary.
-- **Tensor padding & metadata rules** – rely on the new alignment field instead of optional metadata keys, keep `general.alignment` as a `uint32_t` power-of-two, and let missing/invalid values fail fast instead of corrupting mmaped loads.
-- **Implementation note** – `struct gguf_header { char magic[4]; uint32_t version; uint64_t tensor_count, kv_count; uint32_t alignment; uint32_t reserved; };` plus explicit `fwrite(&alignment, sizeof(uint32_t), 1, f); fwrite(&reserved, sizeof(uint32_t), 1, f);` immediately after writing `kv_count` is enough to match the official layout.
-
-The eight extra header bytes are negligible even for huge models, but they unlock ARM64-friendly alignment, predictable metadata parsing, and wide compatibility with upcoming llama.cpp releases.
-
 ## Use cases
 
 - Ternary LLM weight quantization and GGUF exports for Hugging Face + `llama.cpp`.
```
