/unsloth-buddy I have 500 customer support Q&As and want to fine-tune a summarization model. I only have a MacBook Air.
The self-evolving fine-tuning agent. It talks like a colleague, learns your setup's quirks over time, and orchestrates the full lifecycle: from data formatting and model selection to training, validation, and deployment.
Runs on NVIDIA GPUs via Unsloth, natively on Apple Silicon via mlx-tune, and on free cloud GPUs via colab-mcp. Part of the Gaslamp AI development platform — docs.
You: Fine-tune a small model on my customer support FAQ. I have a CSV file.
[Phase 0] Creating project: customer_faq_sft_2026_03_17/
Injecting memory from past sessions...
Applied: adapter_path convention, SFT recipe for Apple Silicon, M4 profile
[Phase 1] Requirements interview...
Method: SFT Model: Qwen2.5-0.5B Deploy: Ollama
[Phase 2] Data strategy...
Loading 1,200 rows from faq.csv → reformatting as chat messages
Saved to data/train.jsonl (validated: messages column ✓)
[Phase 3] Environment: Apple M4 24GB, mlx-tune 0.4.3, Python 3.12
Ready for training
[Phase 4] Training... 200 steps
Final loss: 1.42 → saved to outputs/adapters/
[Phase 5] Evaluation (base vs fine-tuned, greedy decoding):
Q: How do I reset my password?
[Base] I can help with that. Which password?
[Fine-tuned] Go to the login page → click "Forgot password" → check your email.
[Phase 6] Export → outputs/model-q4_k_m.gguf
Run: ollama create my-faq-bot -f Modelfile && ollama run my-faq-bot
[Phase 7] Reflecting on completed project...
✓ 4 lessons → ~/.gaslamp/lessons.md (model gotchas, install traps)
✓ 1 recipe → ~/.gaslamp/skills.md (SFT on Apple Silicon)
✓ 1 profile → ~/.gaslamp/user.md (M4 Max, mlx-tune, Python 3.12)
One conversation, eight phases, one deployable model — and a smarter agent next time.
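Phase 2's reformatting step is worth seeing concretely: each CSV row becomes one JSON line in the chat-messages schema. A minimal sketch, assuming `question`/`answer` column names (your CSV may use different ones, and the real data phase does more validation):

```python
import csv
import json

def row_to_chat(question: str, answer: str) -> dict:
    """Wrap one Q&A pair in the chat-messages schema SFT trainers expect."""
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

def csv_to_jsonl(csv_path: str, jsonl_path: str,
                 q_col: str = "question", a_col: str = "answer") -> int:
    """Convert a two-column FAQ CSV into train.jsonl; returns the row count."""
    n = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row_to_chat(row[q_col], row[a_col])) + "\n")
            n += 1
    return n
```

The transcript's "validated: messages column" check happens after this conversion, before training starts.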
This skill includes sub-skills and utility scripts — install the full repository, not a single file.
Claude Code (recommended)
```
/plugin marketplace add TYH-labs/unsloth-buddy
/plugin install unsloth-buddy@TYH-labs/unsloth-buddy
```
Then describe what you want to fine-tune. The skill activates automatically.
Gemini CLI
```
gemini extensions install https://github.com/TYH-labs/unsloth-buddy --consent
```
Any agent supporting the Agent Skills standard
```
git clone https://github.com/TYH-labs/unsloth-buddy.git .agents/skills/unsloth-buddy
```
Most tools assume you already know what to do. This one doesn't — and it learns from every project you run.
| Your concern | What actually happens |
|---|---|
| "I don't know where to start" | A 2-question interview locks in task, audience, and data — then recommends the right model, hardware, and method |
| "I don't have data, or it's in the wrong format" | A dedicated data phase acquires, generates, or reformats data to exactly match the trainer's required schema |
| "SFT? DPO? GRPO? Which one?" | Maps your goal to the right technique and explains why in plain language |
| "Which model? Will it fit in my GPU?" | Detects your hardware, maps to available model sizes, estimates cloud cost if needed |
| "Unsloth won't install on my machine" | Two-stage environment detection catches mismatches and prints the exact install command for your setup |
| "I trained it, but does it work?" | Runs the fine-tuned adapter alongside the base model so you can see the difference, not just a loss number |
| "How do I deploy it?" | You name the target (Ollama, vLLM, HF Hub) — it runs the conversion commands |
| "How do I reproduce this later — or hand it off?" | Every project gets a gaslamp.md roadbook: every kept decision with its rationale, plus 📖 learn blocks on the underlying ML concepts — enough for any agent or person to reproduce end-to-end |
| "I keep hitting the same problems on my setup" | A self-evolving feedback loop: Agent-driven memory synthesis after every run. Hardware quirks and hyperparameter workarounds accumulate over time. Frozen snapshot injection for zero-prompt cross-project recall. |
Eight phases, each scoped to an isolated dated project directory that never touches your repo root.
| Phase | What happens | Output files |
|---|---|---|
| 0. Init | Creates `{name}_{date}/`, injects long-term memory snapshot from past sessions | `gaslamp.md`, `.gaslamp_context/` |
| 1. Interview | 2-question interview — task + data; captures domain/audience; silently applies past lessons | `project_brief.md` |
| 2. Data | Acquires, validates, and formats to trainer schema | `data_strategy.md` |
| 3. Environment | Hardware scan → Python env check → blocks until ready | `detect_env_result.json` |
| 4. Training | Generates and runs `train.py`, streams output to log | `outputs/adapters/` |
| 5. Evaluation | Batch tests, interactive REPL, base vs fine-tuned comparison | `logs/eval.log` |
| 5.5. Demo | Generates a shareable static HTML page — base vs fine-tuned side-by-side | `demos/<name>/index.html` |
| 6. Export | GGUF, merged 16-bit, or Hub push | `outputs/` |
| 6.5. Local Deploy | Optional: quantize → bench → serve + Gaslamp Chat WebUI (requires llama.cpp) | `outputs/*.gguf` |
| 7. Reflect | Synthesizes lessons, gotchas, and recipes into `~/.gaslamp/` for future projects | `~/.gaslamp/` |
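Phase 0's directory setup can be sketched in a few lines. This is an illustrative stand-in for the real `init_project.py`, which also writes `gaslamp.md` and the memory snapshot:

```python
from datetime import date
from pathlib import Path

def init_project(name: str, root: str = ".") -> Path:
    """Create the isolated {name}_{YYYY_MM_DD}/ project directory skeleton."""
    proj = Path(root) / f"{name}_{date.today().strftime('%Y_%m_%d')}"
    for sub in ("data", "logs", "outputs/adapters", ".gaslamp_context"):
        (proj / sub).mkdir(parents=True, exist_ok=True)
    return proj
```

The dated name is what keeps every run isolated from your repo root and from previous runs.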
```
customer_faq_sft_2026_03_17/
├── train.py  eval.py
├── data/  outputs/adapters/
├── logs/
├── gaslamp.md            ← reproducibility roadbook
├── project_brief.md  data_strategy.md
├── memory.md  progress_log.md
└── .gaslamp_context/     ← read-only snapshot of long-term memory (local only)
```
| Hardware | Backend | What it can run |
|---|---|---|
| NVIDIA T4 (16 GB) | unsloth | 7B QLoRA, small-scale GRPO |
| NVIDIA A100 (80 GB) | unsloth | 70B QLoRA, 14B LoRA 16-bit |
| Apple M1 / M2 / M3 / M4 | mlx-tune / mlx-vlm / trl | SFT/DPO: 7B on 10 GB, 13B on 24 GB; Vision SFT via mlx-vlm; GRPO: 1–7B via TRL + PyTorch MPS |
| Google Colab (T4/L4/A100) | unsloth via colab-mcp | Free cloud GPU, opt-in |
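As a sanity check on the table above, a rough back-of-the-envelope QLoRA estimate is 4-bit weights (about 0.5 bytes per parameter) plus allowances for activations and optimizer state. This heuristic is an assumption for illustration, not the agent's actual sizing logic:

```python
def qlora_vram_gb(params_b: float, seq_len: int = 2048) -> float:
    """Rough QLoRA memory estimate in GB for a params_b-billion-parameter
    model: 4-bit weights (~0.5 bytes/param) + an activation allowance that
    scales with context length + fixed overhead for LoRA/optimizer state."""
    weights = params_b * 0.5
    activations = seq_len / 2048 * 1.5
    overhead = 2.0
    return weights + activations + overhead
```

By this estimate a 7B model needs around 7 GB, consistent with the T4 row, and a 70B model around 38 GB, well inside an A100's 80 GB.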
Unsloth is ~2× faster than standard HuggingFace training, uses up to 80% less VRAM, and produces exact gradients.
Supported training methods: SFT, DPO, GRPO, ORPO, KTO, SimPO, Vision SFT (Qwen2.5-VL, Llama 3.2 Vision, Gemma 3, Gemma 4)
Every local training run automatically opens a real-time dashboard at http://localhost:8080/:
- Task-aware panels — pass `task_type="sft"|"dpo"|"grpo"|"vision"` to unlock the right charts automatically
- SSE streaming — instant updates via `EventSource`, no polling lag
- EMA smoothed loss — clear trend line over noisy raw loss, plus running average
- Dynamic phase badge — idle → training → completed / error, with colour-coded task-type badge
- ETA, elapsed time & epoch — estimated time remaining and current epoch progress
- GPU memory breakdown — baseline (model load) vs LoRA training overhead vs total, shown as gauge bars; works on both NVIDIA (CUDA) and Apple Silicon (MPS via `driver_allocated_memory`/`recommended_max_memory`)
- GRPO panels — reward ± std-dev confidence band + KL divergence chart
- DPO panels — chosen vs rejected reward + KL divergence chart
- Gradient norm & tokens/sec — live stats row, fades in when data arrives
- Completed summary banner — final memory and runtime stats on training end
- Terminal UI (Plotext) — `scripts/terminal_dashboard.py` with `--once` for CLI snapshots; upgrades to 2×2 layout for DPO/GRPO
- Demo server — `python scripts/demo_server.py --task grpo --hardware mps|nvidia` serves rich mock data so you can preview every panel without a GPU

Works on both NVIDIA (via `GaslampDashboardCallback(task_type=...)`) and Apple Silicon (via `MlxGaslampDashboard(task_type=...)`).
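The EMA smoothing behind the loss panel is a standard exponential moving average. A minimal sketch (the dashboard's actual alpha and running-average details may differ):

```python
def ema(values: list[float], alpha: float = 0.1) -> list[float]:
    """Exponential moving average over a noisy loss series.
    Smaller alpha gives a smoother trend line."""
    out: list[float] = []
    prev = None
    for v in values:
        # Blend the newest raw value with the previous smoothed value.
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out
```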
After evaluation, the agent can generate a static HTML demo page that showcases base model vs fine-tuned outputs side-by-side — open it in any browser, no server needed. Great for sharing results with teammates, stakeholders, or in a portfolio.
The demo builder is part of the Gaslamp platform's presentation toolkit. We've simplified it for unsloth-buddy with two built-in themes and automatic domain-specific color customization:
| Theme | Best for | Look |
|---|---|---|
| crisp-light | Business, healthcare, education, general | Clean, minimal, light background |
| dark-signal | Code, math, security, DevOps | Bold, high-contrast, monospace output |
The accent color is auto-selected based on your model's domain (e.g. teal for healthcare, amber for education, electric cyan for code) — or you can pick your own.
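A hypothetical sketch of that domain-to-accent mapping (the keys and hex values here are illustrative assumptions, not the demo builder's real table):

```python
# Illustrative domain → accent mapping; values are assumptions.
ACCENTS = {
    "healthcare": "#14b8a6",  # teal
    "education": "#f59e0b",   # amber
    "code": "#22d3ee",        # electric cyan
}

def accent_for(domain: str, default: str = "#6366f1") -> str:
    """Pick the accent color for a domain, falling back to a default."""
    return ACCENTS.get(domain.lower(), default)
```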
Try the live example: demos/qwen2.5-0.5b-chip2-sft/index.html — download and open in any browser.
After GGUF export, if llama.cpp is detected on your system (checked in Phase 3), the agent offers a one-command local deploy:
```
python scripts/llamacpp.py deploy \
  --model outputs/model-f16.gguf --quant q4_k_m --bench --serve
```
This runs the full pipeline: quantize → benchmark → start an OpenAI-compatible server → open the Gaslamp Chat WebUI at http://localhost:8081/. Individual subcommands are also available:
```
python scripts/llamacpp.py install    # install llama.cpp (brew / cmake)
python scripts/llamacpp.py quantize --input model.gguf --types q4_k_m q8_0
python scripts/llamacpp.py bench --models model-q4_k_m.gguf
python scripts/llamacpp.py serve --model model-q4_k_m.gguf --port 8081
python scripts/llamacpp.py chat --model model-q4_k_m.gguf
```
Requires llama.cpp — installed automatically via `llamacpp.py install`.
Apple Silicon users who need larger models or CUDA-only features can offload training to a free Colab GPU:
- Install `colab-mcp` in Claude Code:
  ```
  uv python install 3.13
  claude mcp add colab-mcp -- uvx --from git+https://github.com/googlecolab/colab-mcp --python 3.13 colab-mcp
  ```
- Open a Colab notebook, connect to a T4/L4 GPU runtime
- The agent connects, installs Unsloth, starts training in a background thread, and polls metrics every 30s
- Download adapters from the Colab file browser when done
Local mlx-tune remains the default — Colab is opt-in for when you need more power.
unsloth-buddy works standalone or as part of a larger Gaslamp project — an agentic platform that orchestrates the full ML lifecycle from research to training to deployment. When called via Gaslamp, the project directory and state are shared across skills, and results pass automatically to the next phase.
Every project also gets a gaslamp.md roadbook — a reproducibility record that captures every kept decision with its rationale and 📖 learn blocks on the underlying ML concepts. Any agent or person can hand this file to a fresh session and reproduce the project end-to-end, or use it to understand why each choice was made.
gaslamp.dev/unsloth — gaslamp.dev
unsloth-buddy is an OpenClaw-compatible skill. Share the repo URL with OpenClaw, describe what you want to fine-tune — it reads AGENTS.md, understands the workflow, and runs everything automatically.
1. Share https://github.com/TYH-labs/unsloth-buddy with OpenClaw
2. OpenClaw reads AGENTS.md → understands the eight-phase fine-tuning lifecycle
3. Say: "Fine-tune a model on my customer support data"
4. Done — OpenClaw runs the interview, formats data, trains, evaluates, and exports
For Claude Code, Gemini CLI, Codex, or any ACP-compatible agent: provide AGENTS.md as context and the agent will automatically follow the same workflow.
After each fine-tuning run, unsloth-buddy synthesizes what it learned — the workarounds, the hardware-specific settings, the hyperparameters that worked — and carries that knowledge into future projects automatically.
The second time you fine-tune on Apple Silicon, it already knows your adapter path convention. The third time you work with Gemma models, it already sets padding_side correctly. You stop repeating the same debugging sessions. The agent improves for your setup, not for some statistical average user.
This works through a local ~/.gaslamp/ memory directory (never committed to your repo). Past lessons, model-specific quirks, and reusable scenario recipes accumulate there. Every new project injects a frozen snapshot at startup — silently, with no prompts — and applies anything relevant.
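Conceptually, the frozen-snapshot injection is just a copy of the global memory files into the project's read-only `.gaslamp_context/` directory. A minimal sketch, with file names taken from the examples above (the real implementation may differ):

```python
import shutil
from pathlib import Path

def inject_snapshot(memory_dir: str, project_dir: str) -> list[str]:
    """Copy global memory files into a per-project frozen snapshot so that
    later edits to ~/.gaslamp/ cannot change a project already in flight."""
    src = Path(memory_dir)
    dst = Path(project_dir) / ".gaslamp_context"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("lessons.md", "skills.md", "user.md"):
        if (src / name).exists():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied
```

Freezing at startup is what makes cross-project recall "zero-prompt": the running project reads only its own snapshot, never the live global directory.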
- 2026-04-14 — Self-evolving memory (Phase 7 + global inject): after each project, the agent synthesizes lessons, model-specific gotchas, and reusable scenario recipes into `~/.gaslamp/`. Every new project injects a frozen snapshot at startup and silently applies past knowledge. Implements the Frozen Snapshot pattern from agent memory research. See `ref/self_evolve_plan.md`.
- 2026-04-12 — Added llama.cpp local deploy (Phase 6.5): after GGUF export, if llama.cpp is installed, the agent offers a one-command pipeline — quantize → benchmark → serve + open the Gaslamp Chat WebUI (`templates/chat_ui.html`). `scripts/llamacpp.py` provides 7 subcommands (`install`, `quantize`, `bench`, `ppl`, `serve`, `chat`, `deploy`); auto-selects GPU offload on Apple Silicon (Metal) and NVIDIA. `scripts/detect_system.py` now detects llama.cpp binaries and prints an install hint if missing.
- 2026-04-10 — Added native Vision SFT for Apple Silicon: integrated mlx-vlm to support multimodal fine-tuning (e.g. Gemma 4 Vision, Qwen2.5-VL) on M-series chips. Added `scripts/unsloth_mlx_vision_example.py` training template and `mlx_eval_vision_template.py` for comparative vision evaluation. Demo Builder now supports wide-format VLM layouts (`vlm-crisp`, `vlm-dark`) and relative PNG asset packaging for offline-portable multimodal dashboards.
- 2026-04-09 — Demo Builder improvements: auto-resolves conceptual/movie keywords (e.g. "matrix" → nvidia, "star wars" → spacex) to the best-fit brand before calling the design search script; distinguishes shallow vs. deep DESIGN.md overrides — deep overrides (structural layout changes like all-black or hero/light-content split) skip the CSS injection point and write the demo file from scratch. Added `scripts/search_design.py` to skill resources for fetching brand design templates without `npx`.
- 2026-04-04 — Added Demo Builder (Phase 5.5): after evaluation, generates a static HTML demo page showing base vs fine-tuned outputs side-by-side. Two themes (crisp-light, dark-signal) with automatic domain-specific accent colors. No server needed — open the file in any browser. Part of the Gaslamp presentation toolkit, simplified for unsloth-buddy. Interview simplified from 5-point contract to 2-question format (task + data) that also captures user domain/audience for demo theming.
- 2026-03-22 — Added `gaslamp.md` reproducibility roadbook: every project now records all kept decisions with rationale and 📖 ML concept explanations (method, model, data, hyperparameters, eval, export), so any agent or person can reproduce the project end-to-end and understand why each choice was made. Template lives at `templates/gaslamp_template.md`; auto-generated by `init_project.py`.
- 2026-03-21 — Enhanced training dashboard: task-aware panels (SFT/DPO/GRPO/Vision), GPU memory breakdown (baseline vs LoRA vs total), GRPO reward ± std and KL divergence charts, DPO chosen/rejected reward and KL charts, epoch tracking, completed-training summary banner, terminal 2×2 layout for DPO/GRPO, and a new `scripts/demo_server.py` mock server for UI development without a GPU.
- 2026-03-19 — Added terminal training dashboard (`scripts/terminal_dashboard.py`): live `plotext` charts of loss and learning rate in the terminal, with `--once` mode for Claude Code one-shot progress checks.
- 2026-03-18 — Added Google Colab training support via colab-mcp: free T4/L4/A100 GPU access from Claude Code, background-thread training with live polling, and adapter download workflow.
See LICENSE.txt. Unsloth is MIT licensed, mlx-tune is MIT licensed.
