Fix missing file sizes#12455
Open
qqmyers wants to merge 3 commits into
What this PR does / why we need it: QDR has observed a production case in which ~2000 files had no file size in the database. This is believed to have occurred during S3 direct upload, when the code tries to retrieve the size from S3: if the S3 store is only 'eventually consistent', this call can fail, leaving a null file size in the database. (It is not clear whether this is still possible or was only an issue in past releases.)
Overall, this state is non-fatal, but it does affect the reported download size when downloading a dataset (i.e., the download may be larger than shown because files with no size do not add to the total).
This PR adds an /api/admin/datafiles/integrity/fixmissingfilesizes endpoint (somewhat analogous to the one for fixmissingoriginalfilesizes) that tries to retrieve the size and update the database, but only for stores indicating that Dataverse can access the files. This excludes, for example, a remote store where the URL may be blocked or may point to a remote trusted store landing page whose size is not that of the real file.
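For reviewers who want the intended behavior at a glance, here is a minimal Python sketch of the selection logic described above. It is not the code in this PR: `FileRecord`, `fix_missing_file_sizes`, `accessible_stores`, and `get_size` are hypothetical names used only for illustration, and the real implementation may differ in how it decides that a store is accessible.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class FileRecord:
    """Hypothetical stand-in for a datafile row."""
    file_id: int
    store_id: str
    filesize: Optional[int]  # None models a NULL filesize in the database


def fix_missing_file_sizes(
    files: Iterable[FileRecord],
    accessible_stores: Set[str],
    get_size: Callable[[FileRecord], Optional[int]],
) -> List[int]:
    """Fill in missing sizes where the store can be read; return fixed ids."""
    fixed = []
    for f in files:
        if f.filesize is not None:
            continue  # size already recorded, nothing to do
        if f.store_id not in accessible_stores:
            continue  # e.g. a remote store whose URL may not be the real file
        try:
            size = get_size(f)  # assumed lookup against the storage backend
        except OSError:
            continue  # lookup failed; leave the size missing
        if size is not None and size >= 0:
            f.filesize = size
            fixed.append(f.file_id)
    return fixed


# Example data: one fixable file, one already sized, one in a skipped store.
files = [
    FileRecord(1, "s3", None),
    FileRecord(2, "s3", 100),
    FileRecord(3, "remote", None),
]
fixed_ids = fix_missing_file_sizes(files, {"s3"}, lambda f: 42)
```

In this sketch, files in stores outside `accessible_stores` are left untouched, which mirrors the intent of not recording a landing-page size as the file size.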
It may be worth seeing whether there are instances for which
select count(*) from datafile where filesize is null;
is not zero, to decide whether this is a useful addition for the community (feel free to comment on this PR if you see this in your installation).
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this: It's not clear how to trigger this condition; the easiest approach is to delete some file sizes from a test database and confirm that the API call restores them.
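The check sequence suggested above (null out some sizes, run the fix, confirm the count returns to zero) can be rehearsed with a small Python sketch. It uses an in-memory SQLite table as a stand-in for the real database, and `stand_in_fix` is a hypothetical placeholder for the API call, so it only demonstrates the before/after count check, not the endpoint itself. The two-column table layout is an assumption made for the example.

```python
import sqlite3

# In-memory stand-in for a test database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datafile (id INTEGER PRIMARY KEY, filesize INTEGER)")
true_sizes = {1: 1024, 2: 2048, 3: 4096}
conn.executemany(
    "INSERT INTO datafile (id, filesize) VALUES (?, ?)", list(true_sizes.items())
)


def count_missing(connection):
    """Run the count query from the PR description."""
    query = "select count(*) from datafile where filesize is null"
    return connection.execute(query).fetchone()[0]


def stand_in_fix(connection, sizes):
    """Hypothetical placeholder for the API call: restore sizes from a lookup."""
    rows = connection.execute(
        "SELECT id FROM datafile WHERE filesize IS NULL"
    ).fetchall()
    for (file_id,) in rows:
        connection.execute(
            "UPDATE datafile SET filesize = ? WHERE id = ?",
            (sizes[file_id], file_id),
        )


# Step 1: delete some file sizes to reproduce the bad state.
conn.execute("UPDATE datafile SET filesize = NULL WHERE id IN (1, 3)")
missing_before = count_missing(conn)

# Step 2: run the fix (here, the stand-in) and re-check the count.
stand_in_fix(conn, true_sizes)
missing_after = count_missing(conn)
restored = dict(conn.execute("SELECT id, filesize FROM datafile").fetchall())
```

Against a real test installation, the same two count checks would bracket a call to the new endpoint instead of `stand_in_fix`.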
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: