Fix: Eliminate redundant full table scans in messages and events collection #392
Open
MoralCode wants to merge 1 commit into
…ection Signed-off-by: PredictiveManish <manish.tiwari.09@zohomail.in>
25b36d2 to e3c6796
The PR contents below are lightly modified (mostly to adjust issue and PR references) from github.com/augurlabs/augur/pull/3444, filed by @PredictiveManish. The PR itself has been rebased to account for changes to CollectOSS since the fork and to resolve merge conflicts.
Description
Moved the mapping queries outside the batch loops and passed the pre-built mappings as parameters to the processing functions, following the pattern established by Shlok in augurlabs/augur#3439.
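The shape of the change can be sketched as follows. This is a minimal, self-contained illustration, not the code in this PR: `ISSUE_ROWS`, `SCANS`, `build_issue_url_to_id_map()`, and the `collect_*` / `process_batch_*` helpers are hypothetical stand-ins, and the in-memory list only simulates a database table so the number of full scans can be counted.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for the issues table: (issue_url, issue_id) rows.
ISSUE_ROWS = [(f"https://api.example.invalid/issues/{i}", i) for i in range(1, 6)]

# Counts how many times the simulated table is read in full.
SCANS = {"count": 0}


def build_issue_url_to_id_map() -> Dict[str, int]:
    """Simulate a full table scan that returns a URL -> id mapping."""
    SCANS["count"] += 1
    return {url: issue_id for url, issue_id in ISSUE_ROWS}


def batched(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch_rebuilding(batch: List[dict]) -> List[Optional[int]]:
    """Old shape: every batch rebuilds the mapping (one full scan per batch)."""
    issue_url_to_id_map = build_issue_url_to_id_map()
    return [issue_url_to_id_map.get(m["issue_url"]) for m in batch]


def process_batch_with_map(
    batch: List[dict], issue_url_to_id_map: Dict[str, int]
) -> List[Optional[int]]:
    """New shape: the caller passes in a mapping it built once."""
    return [issue_url_to_id_map.get(m["issue_url"]) for m in batch]


def collect_rebuilding(messages: List[dict], batch_size: int) -> List[Optional[int]]:
    ids: List[Optional[int]] = []
    for batch in batched(messages, batch_size):
        ids.extend(process_batch_rebuilding(batch))
    return ids


def collect_with_prebuilt_map(messages: List[dict], batch_size: int) -> List[Optional[int]]:
    # Build the mapping once, before the batch loop starts.
    issue_url_to_id_map = build_issue_url_to_id_map()
    ids: List[Optional[int]] = []
    for batch in batched(messages, batch_size):
        ids.extend(process_batch_with_map(batch, issue_url_to_id_map))
    return ids
```

Both variants resolve the same ids; only the number of simulated scans differs. One trade-off of this pattern is that the mapping reflects the table as it was before the loop started, which is acceptable only if rows added during collection do not need to be resolved in the same run.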
Changes Made
collectoss/tasks/github/messages.py
- Build issue_url_to_id_map and pr_issue_url_to_id_map once in collect_github_messages() before any batch processing
- Update process_messages() to accept mappings as parameters instead of rebuilding them
- Update process_large_issue_and_pr_message_collection() to accept and pass mappings
collectoss/tasks/github/events.py
- Build issue_url_to_id_map and pr_url_to_id_map once in BulkGithubEventCollection.collect() before the batch loop
- Update _process_events(), _process_issue_events(), and _process_pr_events() to accept mappings as parameters
- Remove _get_map_from_*() calls from batch processing methods
Performance Improvement
Before: 1,000 messages -> 50 full scans of the issues and PRs tables
After: 1,000 messages -> 1 full scan of each table (50x reduction)
Before: 10,000 events -> 40 full scans total
After: 10,000 events -> 1 full scan of each table (40x reduction)
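As a rough check of the arithmetic for the messages case: if each batch triggers one full scan per table, the scan count is the number of batches. The sketch below assumes a batch size of 20, chosen only because it reproduces the 50-scan figure above; the actual batch size is not stated in this PR, and `full_scans_per_table()` is a hypothetical helper.

```python
import math


def full_scans_per_table(n_items: int, batch_size: int, prebuilt: bool) -> int:
    """Estimate full scans of one table while processing n_items in batches.

    prebuilt=False: the mapping is rebuilt inside every batch.
    prebuilt=True: the mapping is built once before the batch loop.
    """
    if prebuilt:
        return 1
    return math.ceil(n_items / batch_size)


# Assumed batch size of 20 (not taken from the PR).
before = full_scans_per_table(1_000, 20, prebuilt=False)
after = full_scans_per_table(1_000, 20, prebuilt=True)
reduction = before // after
```

The reduction factor equals the number of batches, so it grows with the size of the collection run rather than being a fixed 50x.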
This PR fixes #146
Notes for Reviewers
Signed commits