- DGX Spark Local Inference Toolkit: llama.cpp, Vibe-Coding setup, and Whisper TUI.
- 3-bit Lloyd-Max KV cache compression for LLM inference on the NVIDIA DGX Spark GB10: 5.12x compression, 0.983 cosine similarity, pure numpy on ARM unified memory (see the quantization sketch after this list).
- Deploy Nemotron 3 Nano 30B on NVIDIA DGX Spark using TensorRT-LLM (Blackwell GB10, NVFP4 quantization, OpenAI-compatible API; a client sketch for the OpenAI-compatible endpoint also follows the list).
- Local-first developer dashboard for the NVIDIA DGX Spark.
- Deploy Nemotron 3 Nano 30B with a 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization).
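The Lloyd-Max item above refers to MSE-optimal scalar quantization of the KV cache. Below is a minimal, self-contained sketch of the idea in pure numpy; the tensor shape, codebook initialization, iteration count, and random test data are illustrative assumptions, not the repository's actual implementation, and the printed ratio ignores codebook and bit-packing overhead.

```python
# Sketch: 3-bit Lloyd-Max scalar quantization of a KV-cache tensor (pure numpy).
import numpy as np

def lloyd_max_quantize(x, bits=3, iters=30):
    """Fit a Lloyd-Max (MSE-optimal scalar) codebook to x and return codes + codebook."""
    levels = 2 ** bits
    flat = x.ravel().astype(np.float64)
    # Initialize reconstruction levels from evenly spaced quantiles of the data.
    codebook = np.quantile(flat, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Assignment step: nearest level (decision boundaries are midpoints).
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Centroid step: each level moves to the mean of the samples assigned to it.
        for k in range(levels):
            sel = flat[idx == k]
            if sel.size:
                codebook[k] = sel.mean()
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return idx.astype(np.uint8).reshape(x.shape), codebook.astype(np.float32)

def dequantize(idx, codebook):
    return codebook[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kv = rng.standard_normal((8, 128, 64)).astype(np.float16)  # toy KV block
    idx, cb = lloyd_max_quantize(kv, bits=3)
    recon = dequantize(idx, cb)
    a, b = kv.ravel().astype(np.float64), recon.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"cosine similarity: {cos:.4f}")
    print(f"raw ratio, fp16 -> 3-bit codes: {16 / 3:.2f}x (before codebook overhead)")
```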
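The TensorRT-LLM deployment above exposes an OpenAI-compatible API (llama.cpp's llama-server offers one as well), so a standard OpenAI client can talk to the local server. A minimal sketch with the openai Python package follows; the base URL, port, API key, and served model name are placeholder assumptions rather than values taken from those repositories.

```python
# Sketch: querying a locally served model through an OpenAI-compatible endpoint.
from openai import OpenAI

# Placeholder endpoint and credentials; local servers typically ignore the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="nemotron-3-nano-30b",  # hypothetical served model name
    messages=[{"role": "user", "content": "Summarize the GB10 unified memory model."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```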