Something odd is going on with my install, and I'm hoping you can point me in the right direction.
The gradio and GUI scripts run (they start up, the weights get loaded, and I see my live camera), but I don't get any processing. I do get a warning about no reference image, but I assume that doesn't matter.
EDIT: I didn't let the GUI run long enough at first. It eventually hits the recompile warning and then processes frames, just very, very slowly. The recompile warning is the same as the one below.
The minimal cv2_demo eventually runs too, but it throws the same recompile warning and FPS is maybe 0.5 if I had to guess.
Running the benchmark gives the following:
`python scripts/run_benchmark.py
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Initializing...
LivePortrait not installed, lip transfer unavailable
LivePortrait not installed, lip transfer unavailable
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 8816.92it/s]
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] function: 'torch_dynamo_resume_in___call___at_977' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:977)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] last reason: 35/7: block_id == 7 # cached_keys = self.single_block_keys[block_id] # D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:200 in sync_with_kv_cache (_dynamo\variables\lists.py:135 in getitem_const)
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.571000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [35/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] torch._dynamo hit config.recompile_limit (8)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] function: 'torch_dynamo_resume_in_sparse_mlp_compute_at_246' (D:\aiRepos\FluxRT\src\fluxrt\stream_processor\transformer_flux2.py:246)
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] last reason: 14/7: tensor 'input_hidden_states' size mismatch at index 2. expected 12288, actual 3072
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0520 10:22:09.970000 23264 site-packages\torch_dynamo\convert_frame.py:1743] [14/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
Warming up...
Testing with dynamic area: 0%
Testing with dynamic area: 10%
Testing with dynamic area: 25%
Testing with dynamic area: 50%
Testing with dynamic area: 75%
Testing with dynamic area: 90%
Testing with dynamic area: 100%
Measuring end to end latency...
FluxRT Benchmark Report
Configuration
{
"default_prompt": "Turn this into art.",
"default_steps": 2,
"default_seed": 52,
"models_path": "FLUX.2-klein-4B",
"int8_models_path": "FLUX.2-klein-4B-int8",
"resolution": {
"height": 320,
"width": 576
},
"compile_models": true,
"enable_spatial_cache": true,
"enable_int8_quantization": false,
"target_fps": null,
"interpolation_exp": 2,
"use_reference_image": false,
"logging": false
}
Hardware Information
{
"platform": "Windows-11-10.0.26200-SP0",
"python": "3.12.13",
"cpu": "",
"cpu_cores_logical": 24,
"gpu": [
{
"name": "NVIDIA GeForce RTX 4090",
"vram_gb": 23.99,
"cc": "8.9"
}
]
}
Results
| Dynamic Area | Processing Time (s) | FPS   |
| 0%           | 0.1688              | 23.81 |
| 10%          | 0.2200              | 18.92 |
| 25%          | 0.2347              | 17.16 |
| 50%          | 0.2691              | 14.90 |
| 75%          | 0.3093              | 12.97 |
| 90%          | 0.3240              | 12.51 |
| 100%         | 0.3391              | 11.80 |
End-to-end latency: 0.0010 s
Reserved GPU memory: 19.7871 GB`
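
In case it helps with reproducing this, here is a minimal sketch of how the benchmark could be rerun with the `TORCH_LOGS="recompiles"` setting that the warning itself suggests, so every recompilation reason gets logged instead of only the last one. The helper names `recompile_debug_env` and `run_with_recompile_logs` are hypothetical and not part of FluxRT; the script path is the one from the command above.

```python
import os
import subprocess
import sys


def recompile_debug_env(base_env=None):
    """Return a copy of the environment with TORCH_LOGS set to "recompiles".

    The value comes straight from the hint printed in the dynamo warning.
    The input mapping is copied, so the caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_LOGS"] = "recompiles"
    return env


def run_with_recompile_logs(script="scripts/run_benchmark.py"):
    """Run the given script in a child process with recompile logging enabled.

    Assumes it is called from the repository root, like the original command.
    Returns the child's exit code.
    """
    result = subprocess.run([sys.executable, script], env=recompile_debug_env())
    return result.returncode
```

Calling `run_with_recompile_logs()` from the repository root should be equivalent to setting `TORCH_LOGS` in the shell before running `python scripts/run_benchmark.py`; using a child process just keeps the variable out of the parent shell.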