torch recompile limit + benchmark #21

Something odd is going on with my install, and I'm hoping you can point me in the right direction.

The gradio and GUI scripts run (they start up, the weights load, and I see my live camera), but I don't get any processing. I do get a warning about no reference image, but I assume that doesn't matter.
EDIT: I didn't let the GUI run long enough at first. It eventually hits the recompile-limit warning and then does process frames, just very, very slowly. The warning is the same as the one below.

The minimal cv2_demo eventually runs too, but it throws the same recompile-limit warning shown below, and FPS is maybe 0.5 if I had to guess.
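The warning text in the log below suggests rerunning with `TORCH_LOGS="recompiles"` to log every recompilation reason instead of only the last one. Here is a minimal sketch of how that could be done from Python, so the same command works in cmd and PowerShell. The helper names are hypothetical (they are not part of FluxRT); the script path is the one from my benchmark run.

```python
import os
import subprocess
import sys


def build_recompile_logging_env(base_env=None):
    """Return a copy of the environment with recompile logging switched on."""
    env = dict(os.environ if base_env is None else base_env)
    # TORCH_LOGS="recompiles" is the setting named in the Dynamo warning itself.
    env["TORCH_LOGS"] = "recompiles"
    return env


def run_with_recompile_logs(script="scripts/run_benchmark.py"):
    """Run the given script with the current interpreter and extra logging.

    Hypothetical helper: assumes it is called from the repository root.
    """
    result = subprocess.run(
        [sys.executable, script],
        env=build_recompile_logging_env(),
        check=False,  # return the exit code instead of raising on failure
    )
    return result.returncode


# Usage (from the repo root):
#   run_with_recompile_logs()
```

Run from the repo root, this should print each recompilation reason to the console, which would show whether every `block_id` value triggers its own recompile.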

Running the benchmark gives the following:

```
python scripts/run_benchmark.py
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Initializing...
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 8816.92it/s]
W0520 10:22:09.571000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [35/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.571000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [35/8]    function: 'torch_dynamo_resume_in___call___at_977' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:977)
W0520 10:22:09.571000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [35/8]    last reason: 35/7: block_id == 7  # cached_keys = self.single_block_keys[block_id]  # D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:200 in sync_with_kv_cache (_dynamo\variables\lists.py:135 in getitem_const)
W0520 10:22:09.571000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [35/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.571000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [35/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
W0520 10:22:09.970000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [14/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.970000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [14/8]    function: 'torch_dynamo_resume_in_sparse_mlp_compute_at_246' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:246)
W0520 10:22:09.970000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [14/8]    last reason: 14/7: tensor 'input_hidden_states' size mismatch at index 2. expected 12288, actual 3072
W0520 10:22:09.970000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [14/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.970000 23264 site-packages\torch\_dynamo\convert_frame.py:1743] [14/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
Warming up...
Testing with dynamic area: 0%
Testing with dynamic area: 10%
Testing with dynamic area: 25%
Testing with dynamic area: 50%
Testing with dynamic area: 75%
Testing with dynamic area: 90%
Testing with dynamic area: 100%
Measuring end to end latency...
```

# FluxRT Benchmark Report

## Configuration

```json
{
  "default_prompt": "Turn this into art.",
  "default_steps": 2,
  "default_seed": 52,
  "models_path": "FLUX.2-klein-4B",
  "int8_models_path": "FLUX.2-klein-4B-int8",
  "resolution": {
    "height": 320,
    "width": 576
  },
  "compile_models": true,
  "enable_spatial_cache": true,
  "enable_int8_quantization": false,
  "target_fps": null,
  "interpolation_exp": 2,
  "use_reference_image": false,
  "logging": false
}
```

## Hardware Information

```json
{
  "platform": "Windows-11-10.0.26200-SP0",
  "python": "3.12.13",
  "cpu": "",
  "cpu_cores_logical": 24,
  "gpu": [
    {
      "name": "NVIDIA GeForce RTX 4090",
      "vram_gb": 23.99,
      "cc": "8.9"
    }
  ]
}
```

## Results

| Dynamic Area | Processing Time (s) | FPS |
|-------------:|--------------------:|----:|
| 0% | 0.1688 | 23.81 |
| 10% | 0.2200 | 18.92 |
| 25% | 0.2347 | 17.16 |
| 50% | 0.2691 | 14.90 |
| 75% | 0.3093 | 12.97 |
| 90% | 0.3240 | 12.51 |
| 100% | 0.3391 | 11.80 |

**End-to-end latency:** 0.0010 s

**Reserved GPU memory:** 19.7871 GB
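
As a sanity check on the table, the reported FPS is not simply 1 / processing time. Here is a small sketch that multiplies the two columns to see how many output frames each pass seems to produce. My guess, which I have not confirmed in the FluxRT code, is that `interpolation_exp: 2` means 2**2 = 4 output frames per processed frame.

```python
# Rows copied from the Results table: (dynamic area, processing time in s, reported FPS).
rows = [
    ("0%", 0.1688, 23.81),
    ("10%", 0.2200, 18.92),
    ("25%", 0.2347, 17.16),
    ("50%", 0.2691, 14.90),
    ("75%", 0.3093, 12.97),
    ("90%", 0.3240, 12.51),
    ("100%", 0.3391, 11.80),
]

# Value taken from the Configuration block; its meaning here is an assumption.
INTERPOLATION_EXP = 2


def frames_per_pass(time_s, fps):
    """Output frames implied per processing pass (reported FPS x time per pass)."""
    return time_s * fps


for area, time_s, fps in rows:
    raw_rate = 1.0 / time_s  # passes per second without any interpolation
    ratio = frames_per_pass(time_s, fps)
    print(f"{area:>4}: {raw_rate:5.2f} passes/s, reported {fps:5.2f} FPS, ratio {ratio:.2f}")
```

The ratios all land between about 4.0 and 4.2, which fits the 2**2 guess. If that is right, the benchmark numbers look healthy, and the roughly 0.5 FPS I see in cv2_demo is far below what the benchmark reports.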
