WIP: TRT Flakiness test script by dkosowski87 · Pull Request #2174 · roboflow/inference

dkosowski87 · 2026-03-27T19:02:14Z

What does this PR do?

Related Issue(s):

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Other:

Testing

I have tested this change locally
I have added/updated tests for this change

Test details:

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

* Introduce a new script to assess the flakiness of model predictions across multiple iterations and images.
* Implement functions for running models, clearing cache, and normalizing outputs.
* Include detailed reporting on model stability and mismatched outputs, with options for saving logs of discrepancies.
* Integrate support for various image formats and model types.

This addition makes model evaluation more robust by identifying inconsistencies in predictions.
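For readers unfamiliar with the idea, the core of such a check can be sketched as follows. This is a minimal illustration, not the script added in this PR: `check_flakiness`, `fingerprint`, and the `predict` callable are hypothetical names, and the sketch assumes model outputs are already JSON-serializable.

```python
import json
from typing import Any, Callable, Dict, List


def fingerprint(output: Any) -> str:
    """Serialize a JSON-friendly output deterministically so it can be compared."""
    return json.dumps(output, sort_keys=True)


def check_flakiness(
    predict: Callable[[str], Any],  # hypothetical: maps an image path to a prediction
    image_paths: List[str],
    iterations: int = 3,
) -> Dict[str, List[int]]:
    """Return {image_path: [iteration indices whose output differs from iteration 0]}."""
    mismatches: Dict[str, List[int]] = {}
    for path in image_paths:
        # The first run on each image is treated as the baseline.
        baseline = fingerprint(predict(path))
        for iteration in range(1, iterations):
            if fingerprint(predict(path)) != baseline:
                mismatches.setdefault(path, []).append(iteration)
    return mismatches
```

An empty result means every iteration matched the baseline for every image. Comparing against the first iteration is one possible design choice; comparing all pairs would also catch a flaky baseline.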

* Replace print statements with LOGGER for improved logging consistency.
* Enhance output formatting for problem reporting and iteration details.
* Maintain detailed summary reporting while ensuring logs are structured and informative.
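The print-to-logger change might look roughly like this. It is a sketch using only the standard `logging` module; the logger name and the `report_problem` helper are assumptions made for illustration, and only the `LOGGER` name comes from the commit message.

```python
import logging

# The logger name is an assumption; the PR only states that a LOGGER is used.
LOGGER = logging.getLogger("flakiness_check")


def report_problem(image: str, iteration: int, detail: str) -> str:
    """Log one problem line in a consistent format and return the message."""
    message = f"[iteration {iteration}] {image}: {detail}"
    LOGGER.warning("%s", message)
    return message
```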

* Add support for namedtuples and Enums in the normalization process.
* Include handling for numpy boolean and generic types, as well as Path objects.
* Improve the robustness of the normalize_output function for diverse data types.
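A normalization step of this kind could be sketched as below. This is not the PR's `normalize_output`; it is an independent illustration of converting the listed types into plain, comparable Python values. The `Box` and `Color` types in the demo are hypothetical, and numpy is treated as optional so the sketch runs without it.

```python
from collections import namedtuple
from enum import Enum
from pathlib import Path
from typing import Any

try:
    import numpy as np
except ImportError:  # numpy handling is skipped when it is not installed
    np = None


def normalize_output(value: Any) -> Any:
    """Recursively convert a value into plain dicts, lists, and scalars."""
    if isinstance(value, Enum):  # checked first so IntEnum is not treated as int
        return normalize_output(value.value)
    if isinstance(value, Path):
        return value.as_posix()
    if np is not None:
        if isinstance(value, np.ndarray):
            return value.tolist()
        if isinstance(value, np.generic):  # covers np.bool_ and other numpy scalars
            return value.item()
    if isinstance(value, tuple) and hasattr(value, "_asdict"):  # namedtuple
        return {k: normalize_output(v) for k, v in value._asdict().items()}
    if isinstance(value, dict):
        return {str(k): normalize_output(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_output(v) for v in value]
    return value


# Hypothetical demo types, for illustration only.
Box = namedtuple("Box", ["x", "y"])


class Color(Enum):
    RED = "red"


normalized = normalize_output(
    {"box": Box(1, 2), "color": Color.RED, "path": Path("a/b.jpg"), "items": (1, 2)}
)
```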

* Introduce a new parameter, sample_different_images_limit, to control the number of different-image paths retained in the output.
* Update the compute_diff_summary function to utilize this new parameter for improved reporting of mismatched images.
* Enhance the main function to validate the sample limit input and integrate it into the flakiness check process.
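To illustrate how a sample limit can cap the reported paths while keeping the full count, here is a minimal sketch. The names `compute_diff_summary` and `sample_different_images_limit` come from the commit message, but the signature, the input shape, and the returned keys are assumptions.

```python
from typing import Any, Dict, List


def compute_diff_summary(
    mismatches: Dict[str, List[int]],  # assumed shape: image path -> mismatching iterations
    sample_different_images_limit: int = 5,
) -> Dict[str, Any]:
    """Summarize mismatched images, retaining at most `limit` example paths."""
    if sample_different_images_limit < 0:
        raise ValueError("sample_different_images_limit must be non-negative")
    different = sorted(mismatches)
    return {
        # The count always reflects every mismatched image.
        "different_images_count": len(different),
        # Only the sample is truncated, which keeps reports readable.
        "sample_different_images": different[:sample_different_images_limit],
    }
```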

* Introduce functions to fetch test images from Roboflow, including validation for required parameters.
* Update the flakiness check to utilize image paths instead of a directory, improving flexibility.
* Load environment variables from a .env file to streamline configuration.
* Remove deprecated images directory option to simplify the interface.
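Validation of required parameters could be sketched along these lines. The helper `require_settings` and the variable names used in the usage note are hypothetical; the sketch assumes the values have already been loaded into the environment (for example from a .env file) before it runs.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_settings(
    names: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the requested settings, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    names = list(names)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

For example, `require_settings(["ROBOFLOW_API_KEY"])` would raise before any network call is attempted if the key is unset; the variable name here is an assumption, not something this PR confirms.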

…ojects
* Update the flakiness check to iterate over workspaces and their associated projects, enhancing flexibility in model evaluation.
* Introduce a new model configuration JSON file format to specify models per project within each workspace.
* Add detailed logging for each workspace and project during the flakiness check process.
* Include a README file to document the usage and configuration of the flakiness check script.
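The actual JSON schema is not shown in this description, so the following is only one plausible shape: a mapping of workspace to project to a list of model identifiers. Every key and identifier in `CONFIG_TEXT` is a made-up placeholder, and `iter_models` is a hypothetical helper.

```python
import json
from typing import Dict, Iterator, List, Tuple

# Hypothetical configuration; the real file format may differ.
CONFIG_TEXT = """
{
  "workspace-a": {
    "project-1": ["project-1/1", "project-1/2"],
    "project-2": ["project-2/3"]
  }
}
"""

Config = Dict[str, Dict[str, List[str]]]


def iter_models(config: Config) -> Iterator[Tuple[str, str, str]]:
    """Yield (workspace, project, model_id) in the order given by the config."""
    for workspace, projects in config.items():
        for project, model_ids in projects.items():
            for model_id in model_ids:
                yield workspace, project, model_id


models = list(iter_models(json.loads(CONFIG_TEXT)))
```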

* Introduce a new test suite for the check_prediction_flakiness.py script, covering various scenarios including stable and flaky predictions.
* Implement tests to validate the behavior of the flakiness check under different conditions, ensuring robustness in model evaluation.
* Utilize mocking to simulate model behavior and image fetching, allowing for comprehensive testing without external dependencies.
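Mock-based tests for stable and flaky scenarios could look roughly like the sketch below. It uses `unittest.mock.Mock` from the standard library; `is_stable` and the model's `infer` method are hypothetical stand-ins, not the functions tested in this PR.

```python
from unittest.mock import Mock


def is_stable(model, image: str, iterations: int) -> bool:
    """Hypothetical function under test: True when every run returns the same output."""
    outputs = [model.infer(image) for _ in range(iterations)]
    return all(output == outputs[0] for output in outputs)


def test_stable_model():
    model = Mock()
    model.infer.return_value = {"class": "cat"}  # same prediction on every call
    assert is_stable(model, "a.jpg", iterations=3)
    assert model.infer.call_count == 3


def test_flaky_model():
    model = Mock()
    # side_effect returns one item per call, simulating a prediction that changes.
    model.infer.side_effect = [{"class": "cat"}, {"class": "dog"}, {"class": "cat"}]
    assert not is_stable(model, "a.jpg", iterations=3)
```

Because the model is mocked, neither test needs a GPU, model weights, or network access.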

…ing functionality
* Introduce new functions for handling Roboflow API interactions, including fetching image details and normalizing search results.
* Update the image fetching logic to prioritize download URLs from the new API structure, improving robustness in image retrieval.
* Add unit tests to validate the new functionality, ensuring correct behavior when processing search results and download URLs.
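The general pattern of normalizing a response and preferring one URL field over others can be sketched as follows. The response shape and every field name here (`results`, `download_url`, `original_url`, `thumbnail_url`) are placeholders invented for illustration; they are not taken from the Roboflow API or from this PR.

```python
from typing import Any, Dict, List, Optional

# Hypothetical field names, listed from most to least preferred.
URL_KEYS = ("download_url", "original_url", "thumbnail_url")


def normalize_search_results(payload: Any) -> List[Dict[str, Any]]:
    """Accept either a bare list or a {"results": [...]} wrapper and return dict records."""
    if isinstance(payload, dict):
        payload = payload.get("results") or []
    if not isinstance(payload, list):
        return []
    return [record for record in payload if isinstance(record, dict)]


def pick_download_url(record: Dict[str, Any]) -> Optional[str]:
    """Return the first non-empty URL according to URL_KEYS, or None."""
    for key in URL_KEYS:
        url = record.get(key)
        if url:
            return url
    return None
```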

* Introduce a new bash script, run_flakiness_gpu_docker.sh, to facilitate running the check_prediction_flakiness.py script within a Docker container configured for GPU inference.
* The script allows for customizable parameters, including the number of test images and iterations, and supports environment variable configuration for enhanced flexibility.
* Ensure proper directory mounting for results and logs, streamlining the process of evaluating model flakiness in a containerized environment.
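The PR implements this as a bash script; purely as an illustration of the pieces involved, the equivalent `docker run` invocation can be assembled in Python as below. The helper name, the `/results` mount point, and the image and script arguments supplied by the caller are all assumptions. Only standard Docker CLI flags (`--rm`, `--gpus all`, `-e`, `-v`) are used.

```python
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def build_docker_command(
    image: str,                      # hypothetical: name of a GPU-enabled image
    results_dir: str,                # host directory for results and logs
    script_args: Sequence[str],      # passed through unchanged to the container
    env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build (but do not run) a docker command for GPU inference."""
    command = ["docker", "run", "--rm", "--gpus", "all"]
    for name, value in sorted((env or {}).items()):
        command += ["-e", f"{name}={value}"]
    # Mount the host results directory at /results (an assumed container path).
    command += ["-v", f"{Path(results_dir).resolve()}:/results"]
    command.append(image)
    command += list(script_args)
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`; building the list separately keeps the command easy to inspect and test without Docker installed.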

dkosowski87 added 10 commits March 27, 2026 12:24

Add scenarios for flakiness

fde13b2

dkosowski87 closed this Mar 31, 2026

dkosowski87 wants to merge 10 commits into main from trt-flakiness-tests
