Add image generation track (#16) #17
Open
ethanphan3993 wants to merge 2 commits into
Introduces a separate /images surface for local diffusion / flow-matching
models. The text-LLM track was unfit to host this: different model formats
(safetensors vs GGUF), different runtimes, different benchmarks (GenEval /
imagen-arena ELO / Emu-Edit), and a fundamentally different cost model
(compute-bound time-per-image vs bandwidth-bound TPS).
What ships:
- Curated catalog of ~20 models in backend/data/image_aliases.yaml,
each with cited sources for benchmark scores: FLUX.1 dev/schnell/Kontext,
Stable Diffusion 3.5 Large/Medium/Turbo, SDXL Base/Turbo, Stable
Cascade, SD 1.5, HiDream-I1 dev/fast and HiDream-E1, AuraFlow v0.3,
OmniGen v1, Sana 1.6B, Lumina-Next-T2I, PixArt-Sigma, Kolors,
InstructPix2Pix.
- Image use cases (image_generation, image_editing) and image harnesses
(Draw Things, Mochi Diffusion, ComfyUI, InvokeAI, Forge, Diffusers)
in their own YAMLs. Editing-only filter requires supports_editing.
- backend/services/images/ — self-contained module with catalog loader,
compute-bound hardware-fit recommender, and 3-axis scoring matching
the text track's structure (0.55 use_case + 0.30 hardware + 0.15 harness).
The hardware model uses chip FP16 TFLOPS to scale empirical M3 Max /
M4 Max reference times to other Apple Silicon chips, plus a VRAM-fit
check at the picked quantization (FP16 / Q8 / Q4).
- backend/routers/images.py — GET /api/images/{use-cases,harnesses,catalog}
plus POST /api/images/recommend. Same 15s hardware-snapshot cache
pattern as the text /api/recommend.
- frontend/src/pages/Images.tsx — new /images route with use-case +
harness picker, ranked cards showing time-per-image, VRAM at chosen
quant, license, install hint per harness, and benchmark provenance
on expand. Nav entry next to Home.
- 14 new pytest cases in backend/tests/test_image_recommender.py
covering catalog load, score normalization (including FID inversion),
TFLOPS scaling (M1 should be ~5x slower than M3 Max for FLUX),
quantization picking, harness filtering (Mochi rejects FLUX/SD3),
use-case filtering (editing keeps only edit-capable models), and
install command resolution. All 39 backend tests pass; frontend
type-checks and builds cleanly.
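As a rough illustration of the 3-axis scoring and the FID inversion mentioned above, the sketch below combines three sub-scores with the stated weights and flips a lower-is-better metric during normalization. The function names, the 0..10 scale, and the min/max bounds are assumptions made for illustration; they are not taken from backend/services/images/.

```python
# Hypothetical sketch: names, scale, and bounds are illustrative assumptions.
WEIGHTS = {"use_case": 0.55, "hardware": 0.30, "harness": 0.15}  # weights from this PR


def normalize(value: float, lo: float, hi: float, lower_is_better: bool = False) -> float:
    """Map a raw benchmark value onto 0..10, clamping to [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)
    fraction = (clamped - lo) / (hi - lo)
    if lower_is_better:  # e.g. FID, where a smaller number is a better result
        fraction = 1.0 - fraction
    return 10.0 * fraction


def combined_score(use_case: float, hardware: float, harness: float) -> float:
    """Weighted sum of the three axes, each assumed to be on a 0..10 scale."""
    return (
        WEIGHTS["use_case"] * use_case
        + WEIGHTS["hardware"] * hardware
        + WEIGHTS["harness"] * harness
    )


if __name__ == "__main__":
    # Made-up inputs: an FID of 10 on an assumed 5..30 range, then a full score.
    fid_score = normalize(10.0, lo=5.0, hi=30.0, lower_is_better=True)
    print(fid_score, combined_score(fid_score, hardware=7.0, harness=10.0))
```

Because the weights sum to 1.0, the combined score stays on the same scale as its inputs, which keeps it comparable with the text track's score.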
What's deliberately out of scope for this PR (planned as follow-ups so
this lands in a reviewable size):
- Live data fetcher for imagen-leaderboard.dev. Catalog is currently
refreshed manually from cited sources; the leaderboard is incomplete
enough that a fetcher would mostly replicate hand-curation.
- Generalizing the text recommender's score_hardware to accept the
workload-specific cost model. Image track has its own score_hardware
today; consolidating can come once both have proven shapes.
- SQLite persistence for image catalog. ~20 entries don't justify a
schema migration; YAML is fine until we add a fetcher.
Per the issue, /images is a separate route rather than a use-case under
the text recommender — the mental models (memory-bandwidth vs compute,
GGUF vs safetensors, TPS vs seconds-per-image) are too different to share
controls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rdware-fit primitives

Three follow-ups to the initial image track in 14ffd06, addressing
functional gaps I'd left rather than time-budget cuts.

1. Install commands now actually work. The harness templates had
{url} / {hf_id} / {folder} placeholders that the recommender never
substituted, so users got copy-pasteable strings like
'drawthings://import?url={url}' that were useless. Each catalog entry
now carries a real Hugging Face repo ID (hf_id) and a comfyui_folder
hint (unet for FLUX, checkpoints for everything else). The recommender
substitutes these into harness templates so:
- Diffusers: from diffusers import DiffusionPipeline;
DiffusionPipeline.from_pretrained('black-forest-labs/FLUX.1-dev')
- ComfyUI: huggingface-cli download black-forest-labs/FLUX.1-dev
--local-dir ComfyUI/models/unet/flux-1-dev/
- InvokeAI: Model Manager → Add via URL →
https://huggingface.co/black-forest-labs/FLUX.1-dev
- Forge: huggingface-cli download ... --local-dir
stable-diffusion-webui-forge/models/Stable-diffusion/
- Mochi: CoreML conversion command targeting the HF repo
- Draw Things: 'search the in-app catalog' (Draw Things has its own
curated download flow, not HF-based)
Local-dir paths use canonical_id ('flux-1-dev') instead of display_name
('FLUX.1 [dev]') so they're shell-safe: no spaces or square brackets.
install_options also expose a download_url pointing at the model's HF
page when applicable. Two new tests pin this:
- test_install_commands_have_no_unsubstituted_placeholders (asserts no
{...} survives substitution for any (model, harness) pair)
- test_install_command_local_dirs_are_shell_safe (asserts no whitespace
or [ ] in --local-dir tokens)

2. /images now has a full-catalog browse view. Previously only the
top-10 ranked picks were visible; the catalog endpoint existed but
wasn't surfaced. /images now has a 'Ranked picks for your Mac' /
'Browse all 20 models' toggle. The catalog grid shows each model's
family, architecture, params, FP16/Q8/Q4 VRAM, capability (gen/edit),
compatible harnesses, license, and a Hugging Face link. Family +
capability filters built in. The text /browse stays focused on the
SQLite text catalog; the data shapes are different enough that mixing
them would have been more cost than benefit.

3. score_hardware shares its rubric across both tracks. Storage scoring
(cost of failed download) and combine weights (0.50 mem + 0.40 speed +
0.10 storage) are identical between text and image, and were
duplicated. Extracted to backend/services/hardware_fit.py with a
`bucket()` helper for the interpolated thresholds. Memory thresholds
remain track-specific because the physics differs: text gracefully
degrades when memory is tight while diffusion runtimes hang, so the
image track is stricter at ratio > 0.85. A comment at each call site
explains why.

Tests: 43 pass (was 39); frontend tsc + vite build clean (255 KB →
74 KB gzipped JS, +4 KB for the catalog grid).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
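To make the shared rubric concrete, here is a minimal sketch of what an interpolating `bucket()` helper and the shared combine step might look like. The signature of `bucket()`, the piecewise-linear interpolation, and every threshold value below are assumptions for illustration only; the commit text states just the combine weights and that the image track is stricter past a 0.85 memory ratio.

```python
# Hypothetical sketch of a shared hardware-fit rubric. The bucket() signature
# and all threshold values are assumptions, not the code in hardware_fit.py.
from bisect import bisect_right
from typing import Sequence, Tuple

COMBINE_WEIGHTS = (0.50, 0.40, 0.10)  # memory, speed, storage (from this PR)


def bucket(value: float, points: Sequence[Tuple[float, float]]) -> float:
    """Piecewise-linear interpolation over (threshold, score) pairs sorted by threshold."""
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return points[0][1]
    if value >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, value)  # first threshold strictly greater than value
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


# Illustrative curves keyed on required/available memory ratio (made-up values).
# The text curve degrades gradually; the image curve falls to zero past 0.85.
TEXT_MEMORY = [(0.0, 10.0), (0.85, 7.0), (1.0, 4.0), (1.2, 0.0)]
IMAGE_MEMORY = [(0.0, 10.0), (0.85, 7.0), (1.0, 0.0)]


def combine(memory: float, speed: float, storage: float) -> float:
    """Weighted sum shared by both tracks."""
    w_mem, w_speed, w_storage = COMBINE_WEIGHTS
    return w_mem * memory + w_speed * speed + w_storage * storage


if __name__ == "__main__":
    ratio = 0.925  # a model needing 92.5% of available memory
    print(bucket(ratio, TEXT_MEMORY), bucket(ratio, IMAGE_MEMORY))
```

Keeping the curves as data and the interpolation in one helper is one way to let each track supply its own memory thresholds while sharing the storage rubric and combine weights.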
Closes #16.
What this delivers
A working `/images` track for local diffusion / flow-matching models, parallel to the text-LLM track. Per the issue, this can't share controls with text: model formats (safetensors vs GGUF), runtimes (Draw Things/ComfyUI/Mochi vs Ollama/LM Studio/MLX), benchmarks (GenEval / imagen-arena ELO / Emu-Edit vs HumanEval/IFEval/MMLU), and the cost model (compute vs bandwidth) are all different.

- Curated catalog (`backend/data/image_aliases.yaml`). Each entry carries a real Hugging Face repo ID and ComfyUI subfolder hint.
- `image_generation` (text-to-image) and `image_editing` (img2img / inpainting / instruct-edit). Editing-only filter requires `supports_editing`.
- `time_per_image = default_steps × scaled_time_per_step + overhead`. Empirical M3 Max / M4 Max reference times in the catalog are scaled by chip FP16 TFLOPS. The VRAM-fit picker chooses the highest-quality quant (FP16 / Q8 / Q4) that fits.
- `0.55 × use_case + 0.30 × hardware + 0.15 × harness`. Storage rubric and combine weights are extracted to `backend/services/hardware_fit.py`; memory thresholds stay track-specific (image is stricter past 0.85 because diffusion runtimes hang rather than degrade).
- `GET /api/images/{use-cases,harnesses,catalog}` and `POST /api/images/recommend`. 15 s hardware-snapshot cache matching `/api/recommend`.
- `/images` route with two views: ranked picks for the user's Mac, and browse-all (family + capability filter). Install rows show real, copy-pasteable commands and a Hugging Face link.

Catalog
FLUX.1 dev/schnell/Kontext · SD 3.5 Large/Medium/Large-Turbo · SDXL Base/Turbo · Stable Cascade · SD 1.5 · HiDream-I1 dev/fast + HiDream-E1 · AuraFlow v0.3 · OmniGen v1 · Sana 1.6B · Lumina-Next-T2I · PixArt-Σ · Kolors · InstructPix2Pix.
Each entry cites its source (Black Forest Labs / Stability / NVIDIA model cards, GenEval paper, imagen-leaderboard.dev with snapshot date, technical reports). Refresh is currently a manual quarterly task — a fetcher isn't included because imagen-leaderboard.dev is fragmented enough that hand-curation is more reliable than scraping.
End-to-end smoke (Apple M5 Pro, 64 GB)
```
chip: Apple M5 Pro (10.0 TFLOPS)
#1  FLUX.1 [schnell]            fit=8.21  time=~12s  vram=24.0 GB
#2  Stable Diffusion 3.5 Large  fit=8.00  time=~40s  vram=16.0 GB
#3  FLUX.1 [dev]                fit=7.78  time=~66s  vram=24.0 GB
#4  SDXL 1.0 Base               fit=7.76  time=~18s  vram= 7.0 GB
#5  HiDream-I1 [fast]           fit=7.68  time=~28s  vram=32.0 GB
```
Schnell ranks above dev because 4-step distillation amortizes the per-step cost; SDXL holds top-5 on its low VRAM and fast step time despite a lower GenEval score. Editing use case correctly filters to Kontext / OmniGen / InstructPix2Pix / HiDream-E1. Mochi Diffusion harness correctly drops FLUX and SD3 (CoreML doesn't support them yet).
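The step-count effect can be sketched with the time model from this description, `time_per_image = default_steps × scaled_time_per_step + overhead`, together with the highest-quality-that-fits quant picker. This is a minimal sketch under stated assumptions: the inverse-linear TFLOPS scaling, the field names, and every number in the demo are illustrative and are not values from the catalog.

```python
# Hypothetical sketch of the compute-bound time model and quant picker.
# Field names, the linear TFLOPS scaling, and all numbers are assumptions.
from dataclasses import dataclass
from typing import Dict, Optional

QUANT_ORDER = ("FP16", "Q8", "Q4")  # highest quality first


@dataclass(frozen=True)
class ImageModel:
    name: str
    default_steps: int
    ref_time_per_step_s: float  # seconds per step measured on a reference chip
    ref_tflops: float           # FP16 TFLOPS of that reference chip
    overhead_s: float           # fixed per-image cost outside the step loop
    vram_gb: Dict[str, float]   # memory needed at each quantization


def time_per_image(model: ImageModel, chip_tflops: float) -> float:
    """default_steps x scaled_time_per_step + overhead, scaled inversely by TFLOPS."""
    if chip_tflops <= 0:
        raise ValueError("chip_tflops must be positive")
    scaled_step = model.ref_time_per_step_s * (model.ref_tflops / chip_tflops)
    return model.default_steps * scaled_step + model.overhead_s


def pick_quant(model: ImageModel, available_gb: float) -> Optional[str]:
    """Return the highest-quality quantization that fits, or None if none fits."""
    for quant in QUANT_ORDER:
        needed = model.vram_gb.get(quant)
        if needed is not None and needed <= available_gb:
            return quant
    return None


if __name__ == "__main__":
    vram = {"FP16": 24.0, "Q8": 12.0, "Q4": 6.0}  # made-up sizes
    few_step = ImageModel("few-step", 4, 2.0, 20.0, 3.0, vram)
    many_step = ImageModel("many-step", 28, 2.0, 20.0, 3.0, vram)
    for m in (few_step, many_step):
        print(m.name, time_per_image(m, chip_tflops=10.0), pick_quant(m, 16.0))
```

With identical per-step cost, the model that needs fewer steps finishes proportionally sooner, which is the effect the ranking relies on.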
Install commands resolve to real, copy-pasteable strings
```
ComfyUI:   huggingface-cli download black-forest-labs/FLUX.1-dev
           --local-dir ComfyUI/models/unet/flux-1-dev/
Diffusers: from diffusers import DiffusionPipeline;
           DiffusionPipeline.from_pretrained('black-forest-labs/FLUX.1-dev')
InvokeAI:  Model Manager → Add via URL →
           https://huggingface.co/black-forest-labs/FLUX.1-dev
```
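One way the substitution and the two guard checks could fit together is sketched below. The template strings are reconstructed from the rendered commands shown here, and the function names, dictionary keys, and the assumption that `--local-dir` is the last argument are all hypothetical rather than taken from the recommender.

```python
# Hypothetical sketch: templates are reconstructed from the rendered output,
# and the function names and data layout are assumptions.
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{[a-z_]+\}")
SAFE_PATH = re.compile(r"[A-Za-z0-9._/-]+")

TEMPLATES: Dict[str, str] = {
    "comfyui": (
        "huggingface-cli download {hf_id} "
        "--local-dir ComfyUI/models/{folder}/{canonical_id}/"
    ),
    "invokeai": "Model Manager → Add via URL → {url}",
}


def render_install(template: str, entry: Dict[str, str]) -> str:
    """Fill a harness template from a catalog entry; url is derived from hf_id."""
    fields = {**entry, "url": f"https://huggingface.co/{entry['hf_id']}"}
    return template.format(**fields)


def has_unsubstituted_placeholder(command: str) -> bool:
    """True if any {name} token survived substitution."""
    return PLACEHOLDER.search(command) is not None


def local_dir_is_shell_safe(command: str) -> bool:
    """True if the text after --local-dir is one token of safe characters.

    Assumes --local-dir is the final argument, as in the templates above.
    """
    _, sep, tail = command.partition("--local-dir ")
    if not sep:
        return True  # no --local-dir argument to check
    return SAFE_PATH.fullmatch(tail.strip()) is not None


if __name__ == "__main__":
    entry = {
        "hf_id": "black-forest-labs/FLUX.1-dev",
        "folder": "unet",
        "canonical_id": "flux-1-dev",
    }
    for harness, template in TEMPLATES.items():
        print(harness, "->", render_install(template, entry))
```

Checking the whole tail after `--local-dir`, rather than only the next whitespace-delimited token, is what lets the check catch a display name such as `FLUX.1 [dev]`, which would otherwise be split into two harmless-looking tokens.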
Local-dir paths use slugs (`flux-1-dev`) not display names (`FLUX.1 [dev]`) so the commands are shell-safe.

Test plan
🤖 Generated with Claude Code