torch recompile limit + benchmark #21

@delray

I have something odd going on somewhere in my install, and I'm hoping you can point me in the right direction.

The gradio + gui scripts run (meaning the app starts up, the weights get loaded, and I see my live camera), but I don't get any processing. I also get a warning about no reference image, but I assume that doesn't matter.
EDIT: I didn't let the GUI run long enough. Eventually it hits the recompile-limit warning (the same one shown below) and then processes frames, just very, very slowly.

The minimal cv2_demo eventually runs, but it throws the same recompile-limit warning shown below, and FPS is maybe 0.5 if I had to guess.

Running the benchmark gives the following:

`python scripts/run_benchmark.py
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Initializing...
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 8816.92it/s]
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] function: 'torch_dynamo_resume_in___call___at_977' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:977)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] last reason: 35/7: block_id == 7 # cached_keys = self.single_block_keys[block_id] # D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:200 in sync_with_kv_cache (_dynamo\variables\lists.py:135 in getitem_const)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] function: 'torch_dynamo_resume_in_sparse_mlp_compute_at_246' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:246)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] last reason: 14/7: tensor 'input_hidden_states' size mismatch at index 2. expected 12288, actual 3072
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
Warming up...
Testing with dynamic area: 0%
Testing with dynamic area: 10%
Testing with dynamic area: 25%
Testing with dynamic area: 50%
Testing with dynamic area: 75%
Testing with dynamic area: 90%
Testing with dynamic area: 100%
Measuring end to end latency...

FluxRT Benchmark Report

Configuration

{
  "default_prompt": "Turn this into art.",
  "default_steps": 2,
  "default_seed": 52,
  "models_path": "FLUX.2-klein-4B",
  "int8_models_path": "FLUX.2-klein-4B-int8",
  "resolution": {
    "height": 320,
    "width": 576
  },
  "compile_models": true,
  "enable_spatial_cache": true,
  "enable_int8_quantization": false,
  "target_fps": null,
  "interpolation_exp": 2,
  "use_reference_image": false,
  "logging": false
}

Hardware Information

{
  "platform": "Windows-11-10.0.26200-SP0",
  "python": "3.12.13",
  "cpu": "",
  "cpu_cores_logical": 24,
  "gpu": [
    {
      "name": "NVIDIA GeForce RTX 4090",
      "vram_gb": 23.99,
      "cc": "8.9"
    }
  ]
}

Results

| Dynamic Area | Processing Time (s) | FPS |
| --- | --- | --- |
| 0% | 0.1688 | 23.81 |
| 10% | 0.2200 | 18.92 |
| 25% | 0.2347 | 17.16 |
| 50% | 0.2691 | 14.90 |
| 75% | 0.3093 | 12.97 |
| 90% | 0.3240 | 12.51 |
| 100% | 0.3391 | 11.80 |

End-to-end latency: 0.0010 s

Reserved GPU memory: 19.7871 GB`
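
In case it helps with triage, here is a rough sketch of how the recompile warnings above could be boiled down to one entry per compiled frame (function, limit, last reason). It only parses the log text as it appears in this issue. The helper name `summarize_recompile_warnings` and the regexes are my own assumptions, not part of FluxRT or PyTorch, and the `[frame/count]` prefix format is inferred from the lines above rather than from any documented guarantee.

```python
import re

# Matches the "[35/8]"-style prefix seen in the warnings above and captures the rest of the line.
# Assumption: the first "[digits/digits]" on a line is the frame id / recompile count.
FRAME_RE = re.compile(r"\[(\d+)/(\d+)\]\s+(.*)$")
LIMIT_RE = re.compile(r"hit config\.recompile_limit \((\d+)\)")
FUNC_RE = re.compile(r"^function: '([^']+)'")


def summarize_recompile_warnings(log_text):
    """Group recompile-limit warning lines by frame id (hypothetical helper)."""
    frames = {}
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a per-frame warning line
        frame_id, _count, message = match.groups()
        entry = frames.setdefault(
            frame_id, {"function": None, "limit": None, "last_reason": None}
        )
        limit = LIMIT_RE.search(message)
        func = FUNC_RE.match(message)
        if limit:
            entry["limit"] = int(limit.group(1))
        elif func:
            entry["function"] = func.group(1)
        elif message.startswith("last reason:"):
            entry["last_reason"] = message[len("last reason:"):].strip()
    return frames


# Three lines copied from the benchmark output above (raw string keeps the backslashes).
SAMPLE = r"""
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] function: 'torch_dynamo_resume_in_sparse_mlp_compute_at_246' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:246)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] last reason: 14/7: tensor 'input_hidden_states' size mismatch at index 2. expected 12288, actual 3072
"""

for frame_id, info in summarize_recompile_warnings(SAMPLE).items():
    print(frame_id, info["function"], info["limit"], info["last_reason"], sep=" | ")
```

Run over the full log, this would show two frames hitting the limit of 8: one guarded on `block_id` and one on the size of `input_hidden_states`. Both look like a guard that changes on every call, which would be consistent with the slow start I am seeing, though I have not confirmed that.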
