
WIP: TRT Flakiness test script #2174

Closed
dkosowski87 wants to merge 10 commits into main from trt-flakiness-tests

@dkosowski87
Contributor

What does this PR do?

Related Issue(s):

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

* Introduce a new script to assess the flakiness of model predictions across multiple iterations and images.
* Implement functions for running models, clearing cache, and normalizing outputs.
* Include detailed reporting on model stability and mismatched outputs, with options for saving logs of discrepancies.
* Integrate support for various image formats and model types.

This addition enhances the robustness of model evaluation by identifying inconsistencies in predictions.
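A minimal sketch of the core check described above. It assumes the model is wrapped as a plain callable `predict(image_path)`, and `check_flakiness` and `fingerprint` are hypothetical names rather than the script's actual API. Each image is predicted `iterations` times, and every later run is compared with the first run's output after serializing both to canonical JSON.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical helpers for illustration; not the script's actual API.


def fingerprint(output: Any) -> str:
    """Serialize an output to canonical JSON so equal content compares equal.

    sort_keys makes dict key order irrelevant; default=str is a crude
    fallback for objects json cannot encode natively.
    """
    return json.dumps(output, sort_keys=True, default=str)


def check_flakiness(
    predict: Callable[[str], Any],
    image_paths: Iterable[str],
    iterations: int,
) -> Dict[str, List[int]]:
    """Map each flaky image to the iteration indices whose output differed
    from the first (baseline) run; stable images are omitted."""
    if iterations < 2:
        raise ValueError("need at least 2 iterations to detect flakiness")
    mismatches: Dict[str, List[int]] = {}
    for path in image_paths:
        baseline = fingerprint(predict(path))
        # Re-run the same image and record every iteration that drifted.
        diffs = [
            i
            for i in range(1, iterations)
            if fingerprint(predict(path)) != baseline
        ]
        if diffs:
            mismatches[path] = diffs
    return mismatches
```

Comparing each run against a single baseline, rather than comparing all pairs, keeps the cost linear in the number of iterations and reports exactly which iterations drifted. Note that any difference in the serialized output, including a tiny change in a confidence score, counts as a mismatch.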
* Replace print statements with LOGGER for improved logging consistency.
* Enhance output formatting for problem reporting and iteration details.
* Maintain detailed summary reporting while ensuring logs are structured and informative.
* Add support for namedtuples and Enums in the normalization process.
* Include handling for numpy boolean and generic types, as well as Path objects.
* Improve the robustness of the normalize_output function for diverse data types.
* Introduce a new parameter, sample_different_images_limit, to control the number of different-image paths retained in the output.
* Update the compute_diff_summary function to utilize this new parameter for improved reporting of mismatched images.
* Enhance the main function to validate the sample limit input and integrate it into the flakiness check process.
* Introduce functions to fetch test images from Roboflow, including validation for required parameters.
* Update the flakiness check to utilize image paths instead of a directory, improving flexibility.
* Load environment variables from a .env file to streamline configuration.
* Remove deprecated images directory option to simplify the interface.
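The normalization and sample-limited reporting described in these commits might look roughly like the following sketch. It assumes NumPy is installed. `summarize_mismatches` and its summary keys are hypothetical because the PR does not show the signature of its own `compute_diff_summary`. Only the `sample_different_images_limit` parameter name comes from the description.

```python
import enum
from collections import namedtuple
from pathlib import Path
from typing import Any, Dict, List

import numpy as np


def normalize_output(obj: Any) -> Any:
    """Recursively convert a prediction into plain, comparable Python data."""
    # Enum first: IntEnum/StrEnum members are also int/str instances.
    if isinstance(obj, enum.Enum):
        return normalize_output(obj.value)
    # namedtuple before plain tuple so field names survive as dict keys.
    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):
        return {k: normalize_output(v) for k, v in obj._asdict().items()}
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.ndarray):
        return normalize_output(obj.tolist())
    if isinstance(obj, np.generic):
        # Remaining NumPy scalars (np.float32, np.int64, ...) -> Python scalars.
        return obj.item()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, dict):
        return {str(normalize_output(k)): normalize_output(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [normalize_output(v) for v in obj]
    return obj


def summarize_mismatches(
    mismatches: Dict[str, List[int]],
    total_images: int,
    sample_different_images_limit: int = 5,
) -> Dict[str, Any]:
    """Summarize {image_path: [differing iteration indices]}, keeping at most
    `sample_different_images_limit` example paths (hypothetical keys)."""
    if sample_different_images_limit < 0:
        raise ValueError("sample_different_images_limit must be >= 0")
    flaky = sorted(mismatches)
    return {
        "total_images": total_images,
        "flaky_images": len(flaky),
        "stable_images": total_images - len(flaky),
        "sample_different_images": flaky[:sample_different_images_limit],
    }


if __name__ == "__main__":
    Box = namedtuple("Box", ["x", "y", "label"])
    print(normalize_output([Box(np.float32(1.5), np.int64(2), Path("a.jpg"))]))
    # [{'x': 1.5, 'y': 2, 'label': 'a.jpg'}]
```

The order of the checks matters. `Enum` members can also be `int`/`str` instances, and namedtuples are `tuple` instances, so both are handled before the generic branches would flatten them.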
…ojects

* Update the flakiness check to iterate over workspaces and their associated projects, enhancing flexibility in model evaluation.
* Introduce a new model configuration JSON file format to specify models per project within each workspace.
* Add detailed logging for each workspace and project during the flakiness check process.
* Include a README file to document the usage and configuration of the flakiness check script.
* Introduce a new test suite for the check_prediction_flakiness.py script, covering various scenarios including stable and flaky predictions.
* Implement tests to validate the behavior of the flakiness check under different conditions, ensuring robustness in model evaluation.
* Utilize mocking to simulate model behavior and image fetching, allowing for comprehensive testing without external dependencies.
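The PR does not show the model configuration schema. The sketch below assumes a hypothetical layout that maps each workspace to its projects and each project to a list of model IDs, for example `{"my-workspace": {"my-project": ["my-project/3"]}}`. `load_model_config` and `iter_model_jobs` are illustrative names only.

```python
import json
from typing import Dict, Iterator, List, Tuple

# Hypothetical schema: {workspace: {project: [model_id, ...]}}
ModelConfig = Dict[str, Dict[str, List[str]]]


def load_model_config(text: str) -> ModelConfig:
    """Parse and validate the (assumed) workspace -> project -> models JSON."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("top level must map workspace -> projects")
    for workspace, projects in config.items():
        if not isinstance(projects, dict):
            raise ValueError(f"workspace {workspace!r} must map project -> models")
        for project, models in projects.items():
            if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
                raise ValueError(f"{workspace}/{project}: models must be a list of strings")
    return config


def iter_model_jobs(config: ModelConfig) -> Iterator[Tuple[str, str, str]]:
    """Flatten the nested config into (workspace, project, model_id) jobs."""
    for workspace, projects in config.items():
        for project, models in projects.items():
            for model_id in models:
                yield workspace, project, model_id
```

Flattening the nested config into tuples keeps the main loop flat, so per-workspace and per-project logging reduces to a prefix on each job. It also gives the tests a single place to substitute mocked models.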
…ing functionality

* Introduce new functions for handling Roboflow API interactions, including fetching image details and normalizing search results.
* Update the image fetching logic to prioritize download URLs from the new API structure, improving robustness in image retrieval.
* Add unit tests to validate the new functionality, ensuring correct behavior when processing search results and download URLs.
* Introduce a new bash script, run_flakiness_gpu_docker.sh, to facilitate running the check_prediction_flakiness.py script within a Docker container configured for GPU inference.
* The script allows for customizable parameters, including the number of test images and iterations, and supports environment variable configuration for enhanced flexibility.
* Ensure proper directory mounting for results and logs, streamlining the process of evaluating model flakiness in a containerized environment.
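The following sketch shows one way to normalize search results and prefer a download URL over other URL fields. The PR does not give the Roboflow response shape, so the `results` wrapper key and the `download_url`/`image_url`/`url` field names are hypothetical placeholders, not the actual API. All function names are illustrative.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical field names: the real Roboflow search response shape is not
# shown in the PR, so these are placeholders, in priority order.
URL_KEYS: Sequence[str] = ("download_url", "image_url", "url")


def normalize_search_results(payload: Any) -> List[Dict[str, Any]]:
    """Accept a bare list of records or a dict wrapping them under a
    (hypothetical) "results" key; drop anything that is not a dict."""
    if isinstance(payload, dict):
        records = payload.get("results") or []
    elif isinstance(payload, list):
        records = payload
    else:
        raise TypeError(f"unexpected search payload type: {type(payload).__name__}")
    return [r for r in records if isinstance(r, dict)]


def pick_image_url(record: Dict[str, Any], keys: Sequence[str] = URL_KEYS) -> Optional[str]:
    """Return the first non-empty string URL, preferring earlier keys."""
    for key in keys:
        value = record.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def collect_image_urls(payload: Any, limit: int) -> List[str]:
    """Collect up to `limit` usable image URLs from a search payload."""
    urls: List[str] = []
    for record in normalize_search_results(payload):
        if len(urls) >= limit:
            break
        url = pick_image_url(record)
        if url is not None:
            urls.append(url)
    return urls
```

Records with no usable URL are skipped rather than treated as errors, so the limit counts only images that can actually be downloaded.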