Add image generation track (#16) by ethanphan3993 · Pull Request #17 · ethanphan3993/Model-Advisor

ethanphan3993 · 2026-05-18T13:11:51Z

Closes #16.

What this delivers

A working /images track for local diffusion / flow-matching models, parallel to the text-LLM track. Per the issue, this can't share controls with text because the model formats (safetensors vs GGUF), runtimes (Draw Things/ComfyUI/Mochi Diffusion vs Ollama/LM Studio/MLX), benchmarks (GenEval / imagen-arena ELO / Emu-Edit vs HumanEval/IFEval/MMLU), and cost model (compute-bound vs bandwidth-bound) are all different.

| Layer | What |
| --- | --- |
| Catalog | 20 hand-curated models with cited per-source benchmark scores (`backend/data/image_aliases.yaml`). Each entry carries a real Hugging Face repo ID and ComfyUI subfolder hint. |
| Use cases | `image_generation` (text-to-image) and `image_editing` (img2img / inpainting / instruct-edit). Editing-only filter requires `supports_editing`. |
| Harnesses | Draw Things, Mochi Diffusion, ComfyUI, InvokeAI, Stable Diffusion WebUI Forge, Diffusers; each with format/family requirements and a real install-command template. |
| Cost model | New compute-bound model: `time_per_image = default_steps × scaled_time_per_step + overhead`. Empirical M3 Max / M4 Max reference times in the catalog scaled by chip FP16 TFLOPS. VRAM-fit picker chooses the highest-quality quant (FP16 / Q8 / Q4) that fits. |
| Scoring | Same 3-axis weighting as text: `0.55 × use_case + 0.30 × hardware + 0.15 × harness`. Storage rubric and combine weights extracted to `backend/services/hardware_fit.py`; memory thresholds stay track-specific (image is stricter past 0.85 because diffusion runtimes hang rather than degrade). |
| API | `GET /api/images/{use-cases,harnesses,catalog}` and `POST /api/images/recommend`. 15 s hardware-snapshot cache matching `/api/recommend`. |
| Frontend | New `/images` route with two views: ranked picks for the user's Mac, and browse-all (family + capability filter). Install rows show real, copy-pasteable commands and a Hugging Face link. |
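
For reviewers who want the cost model in concrete terms, here is a minimal sketch of the formula and the VRAM-fit picker described in the table. It is not the code in `backend/services/images/`: the `ImageModel` fields, the `overhead_s` default, and the assumption that step time scales inversely and linearly with FP16 TFLOPS are all illustrative guesses.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Quantizations ordered from highest to lowest quality, as named in the PR.
QUANT_ORDER = ("FP16", "Q8", "Q4")


@dataclass
class ImageModel:
    """Hypothetical catalog entry; the real YAML schema may differ."""
    name: str
    default_steps: int
    ref_time_per_step_s: float      # measured on the reference chip
    vram_gb: Dict[str, float]       # quant name -> required memory in GB


def time_per_image(model: ImageModel, chip_tflops: float,
                   ref_tflops: float, overhead_s: float = 2.0) -> float:
    """time_per_image = default_steps * scaled_time_per_step + overhead."""
    if chip_tflops <= 0 or ref_tflops <= 0:
        raise ValueError("TFLOPS values must be positive")
    # Assumption: a chip with half the TFLOPS takes twice as long per step.
    scaled_time_per_step = model.ref_time_per_step_s * (ref_tflops / chip_tflops)
    return model.default_steps * scaled_time_per_step + overhead_s


def pick_quant(model: ImageModel, usable_gb: float) -> Optional[str]:
    """Return the highest-quality quantization that fits, or None if none fits."""
    for quant in QUANT_ORDER:
        needed = model.vram_gb.get(quant)
        if needed is not None and needed <= usable_gb:
            return quant
    return None
```

With made-up numbers (4 steps, 2.0 s per step on a 20 TFLOPS reference chip, run on a 10 TFLOPS chip), the sketch yields 4 × 4.0 + 2.0 = 18 s, and a 16 GB budget against FP16 = 24 GB / Q8 = 12 GB / Q4 = 7 GB picks Q8.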

Catalog

FLUX.1 dev/schnell/Kontext · SD 3.5 Large/Medium/Large-Turbo · SDXL Base/Turbo · Stable Cascade · SD 1.5 · HiDream-I1 dev/fast + HiDream-E1 · AuraFlow v0.3 · OmniGen v1 · Sana 1.6B · Lumina-Next-T2I · PixArt-Σ · Kolors · InstructPix2Pix.

Each entry cites its source (Black Forest Labs / Stability / NVIDIA model cards, GenEval paper, imagen-leaderboard.dev with snapshot date, technical reports). Refresh is currently a manual quarterly task — a fetcher isn't included because imagen-leaderboard.dev is fragmented enough that hand-curation is more reliable than scraping.

End-to-end smoke (Apple M5 Pro, 64 GB)

```
chip: Apple M5 Pro (10.0 TFLOPS)
#1 FLUX.1 [schnell] fit=8.21 time=~12s vram=24.0 GB
#2 Stable Diffusion 3.5 Large fit=8.00 time=~40s vram=16.0 GB
#3 FLUX.1 [dev] fit=7.78 time=~66s vram=24.0 GB
#4 SDXL 1.0 Base fit=7.76 time=~18s vram= 7.0 GB
#5 HiDream-I1 [fast] fit=7.68 time=~28s vram=32.0 GB
```

Schnell ranks above dev because its 4-step distillation needs far fewer denoising steps, cutting time per image from ~66 s to ~12 s at the same 24 GB of VRAM; SDXL holds a top-5 spot on its low VRAM and fast step time despite a lower GenEval score. The editing use case correctly filters to Kontext / OmniGen / InstructPix2Pix / HiDream-E1. The Mochi Diffusion harness correctly drops FLUX and SD3 (CoreML doesn't support them yet).
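
The two filters mentioned above can be sketched as follows. This is an illustration only: the `Model` / `Harness` classes, the family labels, and every slug other than `flux-1-dev` are hypothetical, and the real harness YAML may express format/family requirements differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Model:
    slug: str
    family: str                      # hypothetical label, e.g. "flux", "sd3"
    supports_editing: bool = False


@dataclass(frozen=True)
class Harness:
    slug: str
    # Families the harness can run; None means no family restriction.
    supported_families: Optional[FrozenSet[str]] = None


def eligible_models(models: List[Model], use_case: str,
                    harness: Harness) -> List[Model]:
    """Apply the editing filter, then the harness family filter."""
    kept = []
    for m in models:
        # Only the editing use case restricts on capability.
        if use_case == "image_editing" and not m.supports_editing:
            continue
        if (harness.supported_families is not None
                and m.family not in harness.supported_families):
            continue
        kept.append(m)
    return kept


# Illustrative data, not the real catalog.
CATALOG = [
    Model("flux-1-dev", "flux"),
    Model("flux-1-kontext", "flux", supports_editing=True),
    Model("sd-3-5-large", "sd3"),
    Model("sdxl-base", "sdxl"),
    Model("instruct-pix2pix", "sd15", supports_editing=True),
]
MOCHI = Harness("mochi", frozenset({"sd15", "sdxl"}))  # assumed: no FLUX / SD3
COMFYUI = Harness("comfyui")                            # assumed: unrestricted
```

Under these assumptions, `eligible_models(CATALOG, "image_editing", MOCHI)` keeps only the InstructPix2Pix entry, since the FLUX-family editing model is dropped by the harness filter.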

Install commands resolve to real, copy-pasteable strings

```
ComfyUI: huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ComfyUI/models/unet/flux-1-dev/
Diffusers: from diffusers import DiffusionPipeline; DiffusionPipeline.from_pretrained('black-forest-labs/FLUX.1-dev')
InvokeAI: Model Manager → Add via URL → https://huggingface.co/black-forest-labs/FLUX.1-dev
```

Local-dir paths use slugs (flux-1-dev) not display names (FLUX.1 [dev]) so the commands are shell-safe.
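
A rough sketch of how the template substitution and the two safety checks could look. The template strings are modeled on the output shown above, but the `TEMPLATES` dict, the function names, and the assumption that `--local-dir` is the last option in a command are mine, not necessarily what the recommender does.

```python
import re

# Hypothetical harness templates, modeled on the rendered commands in this PR.
TEMPLATES = {
    "comfyui": ("huggingface-cli download {hf_id} "
                "--local-dir ComfyUI/models/{folder}/{slug}/"),
    "diffusers": ("from diffusers import DiffusionPipeline; "
                  "DiffusionPipeline.from_pretrained('{hf_id}')"),
}

PLACEHOLDER = re.compile(r"\{[A-Za-z_]+\}")


def render_install(harness: str, *, hf_id: str, folder: str, slug: str) -> str:
    """Fill a harness template with catalog values."""
    return TEMPLATES[harness].format(hf_id=hf_id, folder=folder, slug=slug)


def has_unsubstituted_placeholder(command: str) -> bool:
    """True if a {name} placeholder survived substitution."""
    return PLACEHOLDER.search(command) is not None


def local_dir_is_shell_safe(command: str) -> bool:
    """Check the --local-dir value for whitespace or square brackets.

    Assumes --local-dir, when present, is the final option in the command.
    """
    if "--local-dir" not in command:
        return True
    path = command.split("--local-dir", 1)[1].strip()
    return bool(path) and not re.search(r"[\s\[\]]", path)


cmd = render_install("comfyui", hf_id="black-forest-labs/FLUX.1-dev",
                     folder="unet", slug="flux-1-dev")
print(cmd)
```

Passing the display name `FLUX.1 [dev]` as the slug would fail `local_dir_is_shell_safe`, which is the reason for using slugs in paths.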

Test plan

`pytest backend/tests/` — 43 passed (25 existing + 18 new image tests covering catalog load, score normalization including FID inversion, TFLOPS scaling, quantization picking, harness filters, install command substitution, shell-safe local-dirs, and ComfyUI subfolder routing)
`tsc --noEmit` — clean
`vite build` — clean (255 KB → 74 KB gzipped)
Live API smoke (image_generation / image_editing, drawthings/mochi/comfyui harness filters, install command substitution end-to-end)
Manual frontend pass: visit `/images`, switch between "Ranked picks" and "Browse all 20 models", expand a card to see provenance, copy a ComfyUI install command
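
The "FID inversion" case in the test plan refers to lower-is-better metrics being flipped during normalization. A minimal sketch of that idea, assuming a 0 to 10 score scale and caller-supplied metric ranges (both are assumptions; the actual ranges and scale live in the catalog and scoring code):

```python
def normalize_score(value: float, lo: float, hi: float,
                    lower_is_better: bool = False) -> float:
    """Map a raw benchmark value onto 0..10, clamping to [lo, hi].

    For lower-is-better metrics such as FID, the fraction is inverted so a
    smaller raw value produces a larger normalized score.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    fraction = (value - lo) / (hi - lo)
    fraction = min(1.0, max(0.0, fraction))
    if lower_is_better:
        fraction = 1.0 - fraction
    return 10.0 * fraction
```

For example, with an illustrative FID range of 0 to 40, a raw FID of 10 normalizes to 7.5, while a GenEval-style score of 0.5 on a 0 to 1 range normalizes to 5.0.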

🤖 Generated with Claude Code

Introduces a separate /images surface for local diffusion / flow-matching models. The text-LLM track was unfit to host this: different model formats (safetensors vs GGUF), different runtimes, different benchmarks (GenEval / imagen-arena ELO / Emu-Edit), and a fundamentally different cost model (compute-bound time-per-image vs bandwidth-bound TPS).

What ships:

- Curated catalog of ~20 models in backend/data/image_aliases.yaml, each with cited sources for benchmark scores: FLUX.1 dev/schnell/Kontext, Stable Diffusion 3.5 Large/Medium/Turbo, SDXL Base/Turbo, Stable Cascade, SD 1.5, HiDream-I1 dev/fast and HiDream-E1, AuraFlow v0.3, OmniGen v1, Sana 1.6B, Lumina-Next-T2I, PixArt-Sigma, Kolors, InstructPix2Pix.
- Image use cases (image_generation, image_editing) and image harnesses (Draw Things, Mochi Diffusion, ComfyUI, InvokeAI, Forge, Diffusers) in their own YAMLs. Editing-only filter requires supports_editing.
- backend/services/images/: self-contained module with catalog loader, compute-bound hardware-fit recommender, and 3-axis scoring matching the text track's structure (0.55 use_case + 0.30 hardware + 0.15 harness). The hardware model uses chip FP16 TFLOPS to scale empirical M3 Max / M4 Max reference times to other Apple Silicon chips, plus a VRAM-fit check at the picked quantization (FP16 / Q8 / Q4).
- backend/routers/images.py: GET /api/images/{use-cases,harnesses,catalog} plus POST /api/images/recommend. Same 15s hardware-snapshot cache pattern as the text /api/recommend.
- frontend/src/pages/Images.tsx: new /images route with use-case + harness picker, ranked cards showing time-per-image, VRAM at chosen quant, license, install hint per harness, and benchmark provenance on expand. Nav entry next to Home.
- 14 new pytest cases in backend/tests/test_image_recommender.py covering catalog load, score normalization (including FID inversion), TFLOPS scaling (M1 should be ~5x slower than M3 Max for FLUX), quantization picking, harness filtering (Mochi rejects FLUX/SD3), use-case filtering (editing keeps only edit-capable models), and install command resolution.

All 39 backend tests pass; frontend type-checks and builds cleanly.

What's deliberately out of scope for this PR (planned as follow-ups so this lands in a reviewable size):

- Live data fetcher for imagen-leaderboard.dev. Catalog is currently refreshed manually from cited sources; the leaderboard is incomplete enough that a fetcher would mostly replicate hand-curation.
- Generalizing the text recommender's score_hardware to accept the workload-specific cost model. Image track has its own score_hardware today; consolidating can come once both have proven shapes.
- SQLite persistence for image catalog. ~20 entries doesn't justify a schema migration; YAML is fine until we add a fetcher.

Per the issue, /images is a separate route rather than a use-case under the text recommender: the mental models (memory-bandwidth vs compute, GGUF vs safetensors, TPS vs seconds-per-image) are too different to share controls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rdware-fit primitives

Three follow-ups to the initial image track in 14ffd06, addressing functional gaps I'd left rather than time-budget cuts.

1. Install commands now actually work. The harness templates had {url} / {hf_id} / {folder} placeholders that the recommender never substituted, so users got copy-pasteable strings like 'drawthings://import?url={url}', which are useless. Each catalog entry now carries a real Hugging Face repo ID (hf_id) and a comfyui_folder hint (unet for FLUX, checkpoints for everything else). The recommender substitutes these into harness templates so:

- Diffusers: from diffusers import DiffusionPipeline; DiffusionPipeline.from_pretrained('black-forest-labs/FLUX.1-dev')
- ComfyUI: huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ComfyUI/models/unet/flux-1-dev/
- InvokeAI: Model Manager → Add via URL → https://huggingface.co/black-forest-labs/FLUX.1-dev
- Forge: huggingface-cli download ... --local-dir stable-diffusion-webui-forge/models/Stable-diffusion/
- Mochi: CoreML conversion command targeting the HF repo
- Drawthings: 'search the in-app catalog' (Drawthings has its own curated download flow, not HF-based)

Local-dir paths use canonical_id ('flux-1-dev') instead of display_name ('FLUX.1 [dev]') so they're shell-safe: no spaces or square brackets. install_options also expose a download_url pointing at the model's HF page when applicable.

Two new tests pin this:

- test_install_commands_have_no_unsubstituted_placeholders (asserts no {...} survives substitution for any (model, harness) pair)
- test_install_command_local_dirs_are_shell_safe (asserts no whitespace or [ ] in --local-dir tokens)

2. /images now has a full-catalog browse view. Previously only the top-10 ranked picks were visible; the catalog endpoint existed but wasn't surfaced. /images now has a 'Ranked picks for your Mac' / 'Browse all 20 models' toggle. The catalog grid shows each model's family, architecture, params, FP16/Q8/Q4 VRAM, capability (gen/edit), compatible harnesses, license, and a Hugging Face link. Family + capability filters built in. The text /browse stays focused on the SQLite text catalog; the data shapes are different enough that mixing them would have been more cost than benefit.

3. score_hardware shares its rubric across both tracks. Storage scoring (cost of failed download) and combine weights (0.50 mem + 0.40 speed + 0.10 storage) are identical between text and image, and were duplicated. Extracted to backend/services/hardware_fit.py with a `bucket()` helper for the interpolated thresholds. Memory thresholds remain track-specific because the physics differs: text gracefully degrades when memory is tight while diffusion runtimes hang, so the image track is stricter at ratio > 0.85. Comment in each call site explains why.

Tests: 43 pass (was 39); frontend tsc + vite build clean (255 KB → 74 KB gzipped JS, +4 KB for the catalog grid).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
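
To make the shared rubric easier to picture, here is one way the pieces could fit together. Only the weights (0.50 / 0.40 / 0.10 and 0.55 / 0.30 / 0.15) and the 0.85 memory ratio come from the commit message; the `bucket()` signature, the piecewise-linear interpretation of "interpolated thresholds", the 0 to 10 scale, and every threshold value are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

Points = List[Tuple[float, float]]


def bucket(x: float, points: Points) -> float:
    """Piecewise-linear interpolation over sorted (x, score) points.

    Values outside the covered range clamp to the first or last score.
    """
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical memory rubrics keyed on (required memory / available memory).
# Both agree up to 0.85; past that the image track falls off faster because
# diffusion runtimes hang rather than degrade.
TEXT_MEMORY_POINTS: Points = [(0.50, 10.0), (0.85, 7.0), (1.00, 4.0)]
IMAGE_MEMORY_POINTS: Points = [(0.50, 10.0), (0.85, 7.0), (1.00, 0.0)]


def combine_hardware(mem: float, speed: float, storage: float) -> float:
    """Shared combine weights: 0.50 mem + 0.40 speed + 0.10 storage."""
    return 0.50 * mem + 0.40 * speed + 0.10 * storage


def overall_fit(use_case: float, hardware: float, harness: float) -> float:
    """3-axis weighting: 0.55 use_case + 0.30 hardware + 0.15 harness."""
    return 0.55 * use_case + 0.30 * hardware + 0.15 * harness
```

With these made-up points, a memory ratio of 0.925 scores 5.5 on the text rubric but 3.5 on the image rubric, which is the kind of track-specific divergence the commit describes.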

Ethan Phan and others added 2 commits May 18, 2026 23:10

ethanphan3993 wants to merge 2 commits into main from feature/image-generation-track
