
Commit 047df23

GGUF4 removed
1 parent 9c26a56 commit 047df23

1 file changed

Lines changed: 0 additions & 11 deletions

File tree

README.md

```diff
@@ -121,17 +121,6 @@ That command rewrites the tensors in place while preserving the standard GGUF me
 For a zero-disk workaround you can also dequantize on the fly (via `t81.dequantize_gguf_to_float` or a small loader patch) before instantiating `llama_cpp.Llama`; see the docs for an example monkey patch if you want to load `model-tq1.gguf` or `model-tq2.gguf` directly without producing an intermediate copy.
 
 
-## GGUF v4 compliance
-
-t81’s GGUF exports already mirror the llama.cpp conventions; the writer now aligns with llama.cpp’s block layout (32-row groups, per-group f16 scale, optional TQ2 refinement bytes) and includes v4’s mandatory `gguf_header` additions, which are worth calling out for everybody writing their own converter:
-
-- **Header bump** – write `version = 4` instead of 3 so llama.cpp accepts the file and no longer fails with “unsupported version”.
-- **Global alignment metadata** – after `tensor_count`/`kv_count` emit `alignment` (default 32, power-of-two) and `reserved` (0) before the metadata block, and compute tensor padding with `GGML_PAD(size, alignment)` so every tensor data block ends on that boundary.
-- **Tensor padding & metadata rules** – rely on the new alignment field instead of optional metadata keys, keep `general.alignment` as a `uint32_t` power-of-two, and let missing/invalid values fail fast instead of corrupting mmaped loads.
-- **Implementation note** – `struct gguf_header { char magic[4]; uint32_t version; uint64_t tensor_count, kv_count; uint32_t alignment; uint32_t reserved; };` plus explicit `fwrite(&alignment, sizeof(uint32_t), 1, f); fwrite(&reserved, sizeof(uint32_t), 1, f);` immediately after writing `kv_count` is enough to match the official layout.
-
-The eight extra header bytes are negligible even for huge models, but they unlock ARM64-friendly alignment, predictable metadata parsing, and wide compatibility with upcoming llama.cpp releases.
-
 ## Use cases
 
 - Ternary LLM weight quantization and GGUF exports for Hugging Face + `llama.cpp`.
```
