Closed
* Introduce a new script to assess the flakiness of model predictions across multiple iterations and images.
* Implement functions for running models, clearing cache, and normalizing outputs.
* Include detailed reporting on model stability and mismatched outputs, with options for saving logs of discrepancies.
* Integrate support for various image formats and model types.

This addition enhances the robustness of model evaluation by identifying inconsistencies in predictions.
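The core loop described above (run a model repeatedly per image, optionally clear caches between runs, normalize outputs, flag images whose outputs differ) can be sketched as follows. This is a minimal illustration, not the script's code: `check_flakiness`, its parameters, and the JSON-fingerprint comparison are hypothetical choices made for this example.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def check_flakiness(
    predict: Callable[[str], Any],
    image_paths: List[str],
    iterations: int,
    normalize: Callable[[Any], Any] = lambda x: x,
    clear_cache: Optional[Callable[[], None]] = None,
) -> Dict[str, int]:
    """Hypothetical helper: return {image_path: distinct_output_count} for unstable images."""
    if iterations < 2:
        raise ValueError("iterations must be >= 2 to detect flakiness")
    flaky: Dict[str, int] = {}
    for path in image_paths:
        fingerprints = set()
        for _ in range(iterations):
            if clear_cache is not None:
                clear_cache()  # optional hook so each run starts without cached state
            # Sorted-key JSON gives an order-independent, hashable fingerprint.
            output = normalize(predict(path))
            fingerprints.add(json.dumps(output, sort_keys=True, default=repr))
        if len(fingerprints) > 1:
            flaky[path] = len(fingerprints)
    return flaky
```

An image is reported only when at least two runs disagree; the count of distinct outputs gives a rough sense of how unstable it is.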
* Replace print statements with LOGGER for improved logging consistency.
* Enhance output formatting for problem reporting and iteration details.
* Maintain detailed summary reporting while ensuring logs are structured and informative.
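Replacing `print` with a module-level logger typically looks like the sketch below. The logger name and the key=value message layout are assumptions for illustration; the commit does not show the actual format.

```python
import logging

# Hypothetical logger name; the script's real LOGGER may be configured differently.
LOGGER = logging.getLogger("flakiness_check")


def log_problem(image_path: str, iteration: int, detail: str) -> None:
    """Emit one structured line per mismatch instead of a bare print()."""
    LOGGER.warning("mismatch image=%s iteration=%d detail=%s", image_path, iteration, detail)


def log_summary(total: int, flaky: int) -> None:
    """Log the end-of-run summary with derived stable count."""
    LOGGER.info("summary total=%d stable=%d flaky=%d", total, total - flaky, flaky)
```

Passing arguments to the logger (rather than pre-formatting strings) defers formatting until a handler actually emits the record; `logging.basicConfig(level=logging.INFO)` is enough to see the output on the console.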
* Add support for namedtuples and Enums in the normalization process.
* Include handling for numpy boolean and generic types, as well as Path objects.
* Improve the robustness of the normalize_output function for diverse data types.
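A recursive normalizer covering the types listed above might look like this. It is a sketch of the idea, not the script's `normalize_output`; in particular, mapping namedtuples to dicts and Enums to their `.value` are assumptions.

```python
import enum
from pathlib import Path
from typing import Any

import numpy as np


def normalize_output(value: Any) -> Any:
    """Recursively convert a prediction into plain, comparable Python data."""
    if isinstance(value, enum.Enum):
        return normalize_output(value.value)  # assumption: compare Enums by value
    if isinstance(value, tuple) and hasattr(value, "_asdict"):  # namedtuple
        return {k: normalize_output(v) for k, v in value._asdict().items()}
    if isinstance(value, dict):
        return {str(k): normalize_output(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_output(v) for v in value]
    if isinstance(value, np.ndarray):
        return normalize_output(value.tolist())
    if isinstance(value, np.bool_):
        return bool(value)
    if isinstance(value, np.generic):  # np.int64, np.float32, ...
        return value.item()
    if isinstance(value, Path):
        return str(value)
    return value
```

The namedtuple check must come before the generic tuple branch, otherwise field names would be lost.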
* Introduce a new parameter, sample_different_images_limit, to control the number of different-image paths retained in the output.
* Update the compute_diff_summary function to utilize this new parameter for improved reporting of mismatched images.
* Enhance the main function to validate the sample limit input and integrate it into the flakiness check process.
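The limit can be applied as a simple cap on the sampled paths after de-duplication. The signature of `compute_diff_summary` below, the summary keys, and the rule that 0 is allowed are assumptions for illustration only.

```python
from typing import Dict, List, Sequence


def validate_sample_limit(raw: str) -> int:
    """Parse the CLI value; reject negatives (assumption: 0 keeps no sample paths)."""
    limit = int(raw)
    if limit < 0:
        raise ValueError(f"sample_different_images_limit must be >= 0, got {limit}")
    return limit


def compute_diff_summary(
    mismatched_images: Sequence[str],
    total_images: int,
    sample_different_images_limit: int,
) -> Dict[str, object]:
    """Hypothetical signature: summarize mismatches, keeping at most N sample paths."""
    unique: List[str] = list(dict.fromkeys(mismatched_images))  # dedupe, keep order
    return {
        "total_images": total_images,
        "different_images": len(unique),  # full count, unaffected by the cap
        "sample_different_images": unique[:sample_different_images_limit],
    }
```

Keeping the full count separate from the capped sample means the summary stays accurate even when only a few paths are printed.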
* Introduce functions to fetch test images from Roboflow, including validation for required parameters.
* Update the flakiness check to use image paths instead of a directory, improving flexibility.
* Load environment variables from a .env file to streamline configuration.
* Remove the deprecated images-directory option to simplify the interface.
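The configuration side of this change (loading a .env file, then failing fast on missing required parameters) can be sketched with the standard library alone. The commit does not say which loader is used; the python-dotenv package offers `load_dotenv` for this purpose, and the minimal parser below is only a stand-in. Variable and parameter names such as `ROBOFLOW_API_KEY` are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, MutableMapping, Optional


def load_env_file(path: Path, environ: Optional[MutableMapping[str, str]] = None) -> None:
    """Minimal .env reader: KEY=VALUE lines, '#' comments; existing vars win."""
    target = os.environ if environ is None else environ
    if not path.is_file():
        return
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # setdefault: values already present in the environment take precedence.
        target.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_params(params: Mapping[str, Optional[str]]) -> None:
    """Fail fast, naming every missing parameter at once."""
    missing = [name for name, value in params.items() if not value]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(sorted(missing)))
```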
…ojects
* Update the flakiness check to iterate over workspaces and their associated projects, enhancing flexibility in model evaluation.
* Introduce a new model configuration JSON file format to specify models per project within each workspace.
* Add detailed logging for each workspace and project during the flakiness check process.
* Include a README file to document the usage and configuration of the flakiness check script.
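The commit does not show the JSON layout, so the sketch below assumes a nested `{workspace: {project: [model_id, ...]}}` mapping; both the layout and the model ID strings in the usage example are hypothetical. It validates the shape once, then flattens it into (workspace, project, model) jobs for the check to iterate over.

```python
import json
from typing import Dict, Iterator, List, Tuple

# Assumed layout: {"workspace": {"project": ["model_id", ...]}}
ModelConfig = Dict[str, Dict[str, List[str]]]


def load_model_config(text: str) -> ModelConfig:
    """Parse and validate the assumed workspace -> project -> models mapping."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("top level must map workspace names to projects")
    for workspace, projects in config.items():
        if not isinstance(projects, dict):
            raise ValueError(f"workspace {workspace!r} must map project names to model lists")
        for project, models in projects.items():
            if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
                raise ValueError(f"{workspace}/{project}: expected a list of model ids")
    return config


def iter_jobs(config: ModelConfig) -> Iterator[Tuple[str, str, str]]:
    """Yield (workspace, project, model_id) in file order."""
    for workspace, projects in config.items():
        for project, models in projects.items():
            for model_id in models:
                yield workspace, project, model_id
```

For example, `{"ws1": {"cats": ["cats/1", "cats/2"]}}` yields two jobs, both under workspace `ws1` and project `cats`, which also gives natural points for per-workspace and per-project log lines.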
* Introduce a new test suite for the check_prediction_flakiness.py script, covering various scenarios including stable and flaky predictions.
* Implement tests to validate the behavior of the flakiness check under different conditions, ensuring robustness in model evaluation.
* Utilize mocking to simulate model behavior and image fetching, allowing for comprehensive testing without external dependencies.
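Mocking lets a test script a model's outputs exactly, including a deliberately flaky sequence. The sketch below uses `unittest.mock.Mock` (`return_value` for a stable model, `side_effect` for a sequence of differing outputs) against a tiny hypothetical `is_stable` function; it is not taken from the PR's test suite.

```python
import unittest
from unittest import mock


def is_stable(predict, image_path: str, iterations: int) -> bool:
    """Hypothetical unit under test: True if every run returns the same output."""
    first = predict(image_path)
    return all(predict(image_path) == first for _ in range(iterations - 1))


class StabilityTests(unittest.TestCase):
    def test_stable_model(self):
        # Same output on every call.
        model = mock.Mock(return_value={"class": "cat", "conf": 0.9})
        self.assertTrue(is_stable(model, "img.jpg", iterations=5))
        self.assertEqual(model.call_count, 5)

    def test_flaky_model(self):
        # side_effect returns these values on successive calls.
        model = mock.Mock(side_effect=[{"class": "cat"}, {"class": "cat"}, {"class": "dog"}])
        self.assertFalse(is_stable(model, "img.jpg", iterations=3))
        self.assertEqual(model.call_count, 3)
```

An image fetcher can be replaced the same way (a `Mock` returning a fixed list of paths), so neither network access nor real model weights are needed; the tests run under `python -m unittest` or pytest.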
…ing functionality
* Introduce new functions for handling Roboflow API interactions, including fetching image details and normalizing search results.
* Update the image fetching logic to prioritize download URLs from the new API structure, improving robustness in image retrieval.
* Add unit tests to validate the new functionality, ensuring correct behavior when processing search results and download URLs.
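Prioritizing a download URL while tolerating older response shapes reduces to an ordered fallback. Every field name below (`download_url`, `image_url`, `urls.original`, `results`) is a hypothetical placeholder, not Roboflow's actual response schema; only the fallback pattern is the point.

```python
from typing import Any, Dict, List, Optional


def pick_image_url(item: Dict[str, Any]) -> Optional[str]:
    """Prefer an explicit download URL, then fall back to other fields (names hypothetical)."""
    for key in ("download_url", "image_url"):
        url = item.get(key)
        if isinstance(url, str) and url:
            return url
    legacy = item.get("urls")
    if isinstance(legacy, dict):
        url = legacy.get("original")
        if isinstance(url, str) and url:
            return url
    return None


def normalize_search_results(payload: Any) -> List[Dict[str, str]]:
    """Accept a bare list or a {"results": [...]} wrapper; drop items without a URL."""
    items = payload.get("results", []) if isinstance(payload, dict) else payload
    normalized = []
    for item in items or []:
        if not isinstance(item, dict):
            continue  # skip malformed entries instead of failing the whole batch
        url = pick_image_url(item)
        if url:
            normalized.append({"id": str(item.get("id", "")), "url": url})
    return normalized
```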
* Introduce a new bash script, run_flakiness_gpu_docker.sh, to facilitate running the check_prediction_flakiness.py script within a Docker container configured for GPU inference.
* The script allows for customizable parameters, including the number of test images and iterations, and supports environment variable configuration for enhanced flexibility.
* Ensure proper directory mounting for results and logs, streamlining the process of evaluating model flakiness in a containerized environment.
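The wrapper itself is a bash script; the Python sketch below only illustrates the kind of `docker run` invocation it assembles, with environment-variable overrides and mounted result/log directories. `--gpus all`, `-v host:container`, and `-e NAME` (forward the host's value) are standard Docker flags; the image name, container paths, variable names, and the script's `--num-images`/`--iterations` options are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def _positive_int(env: Mapping[str, str], name: str, default: int) -> str:
    """Read an integer override from the environment and reject values < 1."""
    value = int(env.get(name, default))
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return str(value)


def build_docker_command(
    image: str,
    results_dir: str,
    logs_dir: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble a `docker run` argv for a GPU container (names hypothetical)."""
    env = os.environ if env is None else env
    num_images = _positive_int(env, "NUM_TEST_IMAGES", 10)
    iterations = _positive_int(env, "ITERATIONS", 5)
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                  # requires the NVIDIA Container Toolkit on the host
        "-v", f"{results_dir}:/results",  # keep reports on the host
        "-v", f"{logs_dir}:/logs",        # keep mismatch logs on the host
        "-e", "ROBOFLOW_API_KEY",         # forward the host's value, if set
        image,
        "python", "check_prediction_flakiness.py",
        "--num-images", num_images,
        "--iterations", iterations,
    ]
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.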
What does this PR do?
Related Issue(s):
Type of Change
Testing
Test details:
Checklist
Additional Context