fix/Handle cross-year dedupes with deletes in student academic records#180

Open
alchenist wants to merge 1 commit into main from fix/dedupe_order_xyear_student_academic_record

@alchenist
Contributor

We introduced comparable logic for other resources in #123; I believe we need to handle this for student academic records as well. We are seeing cases in SC where deletes in later ODSes override undeleted records in previous ODSes.

@alchenist alchenist marked this pull request as ready for review March 12, 2026 15:41
@alchenist alchenist requested a review from sleblanc23 March 12, 2026 16:55
Contributor

@sleblanc23 sleblanc23 left a comment

This looks good to me, Alex! It got me wondering whether there are other places where we might want to apply this logic. I could see a previous year's data being accidentally pushed and then deleted. I searched for places where we order by api_year, which I think should tell us where this might be needed:

  • calendars
  • course_transcripts
  • education_organization_network_associations
  • staff_education_organization_assignment_associations
  • staff_education_organization_employment_associations

This could totally be a separate chunk of work, so not a blocker to merging this PR!
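
A rough sketch of the kind of search described above, assuming the model SQL is available as plain text. The function name, the regular expression, the `pull_timestamp` column, and the toy SQL snippets are all hypothetical; a text search like this can miss or over-match clauses, so its output is a list of candidates to review by hand.

```python
import re

# Rough pattern: an ORDER BY clause that mentions api_year before the next ")".
# This is a heuristic text match, not a SQL parser.
ORDER_BY_API_YEAR = re.compile(r"order\s+by\b[^)]*\bapi_year\b", re.IGNORECASE)


def models_ordering_by_api_year(sql_by_model):
    """Return the names of models whose SQL text appears to order by api_year."""
    return sorted(
        name for name, sql in sql_by_model.items()
        if ORDER_BY_API_YEAR.search(sql)
    )


# Toy inputs standing in for model files; the SQL here is invented.
example_models = {
    "calendars": "row_number() over (partition by k order by api_year desc, pull_timestamp desc)",
    "some_other_model": "row_number() over (partition by k order by pull_timestamp desc)",
}
matches = models_ordering_by_api_year(example_models)
```

In practice the dictionary would be built by reading the project's `.sql` files from disk.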

@rlittle08
Collaborator

I'm not sure whether we can make this assumption. Taking a look at an affected case:

[image: affected case]

Because studentAcademicRecords and courseTranscripts are uniquely longitudinal resources, we can't assume that the api_year 2023 record should be trusted over the 2024 record, or that this was a case of accidentally uploading previous-year data.

This case could actually be showing a correction to a prior-year transcript, in which case the current logic would be correct. That may be the minority of cases, and if so, this branch would bring us closer to correct (more true positives than false positives). We may need more analysis to determine whether both true positives and false positives exist in the wild.

@alchenist
Contributor Author

alchenist commented Apr 1, 2026

The key issue here for me is reproducibility. The current dedupe rules produce different results depending on whether a) deletes are present because we pulled incrementally from the ODS, or b) deletes are absent because we pulled a full refresh.
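
A minimal sketch of that divergence, under a simplified and assumed version of the dedupe rule (the newest row per key across all years wins, and a winning delete removes the key). The `Row` fields, the `sar-1` key, and the function name are hypothetical and are not taken from the actual models.

```python
from collections import namedtuple

# Hypothetical row shape: natural key, ODS year, pull order, delete flag.
Row = namedtuple("Row", "key api_year pull_ts is_deleted")


def dedupe_latest_wins(rows):
    """Assumed rule: the newest row per key across all years wins;
    if that row is a delete, the key is dropped entirely."""
    latest = {}
    for row in rows:
        best = latest.get(row.key)
        if best is None or (row.api_year, row.pull_ts) > (best.api_year, best.pull_ts):
            latest[row.key] = row
    # Map each surviving key to the api_year of its winning row.
    return {key: row.api_year for key, row in latest.items() if not row.is_deleted}


# a) Incremental pull: the 2024 ODS reports the record and then its delete.
incremental = [
    Row("sar-1", 2023, 1, False),
    Row("sar-1", 2024, 2, False),
    Row("sar-1", 2024, 3, True),
]
# b) Full refresh: the deleted 2024 record is simply absent.
full_refresh = [Row("sar-1", 2023, 1, False)]

incremental_result = dedupe_latest_wins(incremental)    # {}
full_refresh_result = dedupe_latest_wins(full_refresh)  # {"sar-1": 2023}
```

The same ODS state yields an empty result in one case and a surviving 2023 record in the other, which is the reproducibility gap.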

I see your point, @rlittle08, that we shouldn't assume that a past-year record should win the dedupe over a record from a more recent ODS (the existing logic makes the same assumption in the opposite direction, so I agree that further research is needed). The assumption that a delete from one ODS should mean a delete from previous ODSes seems too strong to me, though. Can we restrict dedupes to the ODS they appeared in to solve the reproducibility issue, and then deal with deduping cross-year student academic records separately? That is, might an intended deletion of a past-year ODS record be handled by a totally empty record in a more recent ODS, going by the higher api_year? Corrections to past-year SARs would then also be handled correctly (the record in the newer api_year wins).
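
One possible reading of this proposal, sketched under assumptions: deletes are resolved within the api_year they appeared in, and the highest surviving api_year then wins. The `Row` fields, the `sar-1` key, and the function name are hypothetical, and the real models may structure this differently.

```python
from collections import namedtuple

# Hypothetical row shape: natural key, ODS year, pull order, delete flag.
Row = namedtuple("Row", "key api_year pull_ts is_deleted")


def dedupe_deletes_scoped_to_ods(rows):
    """Sketch: a delete only removes the record within its own api_year;
    among the surviving years, the highest api_year wins."""
    # Step 1: keep the newest row per (key, api_year).
    latest_per_year = {}
    for row in rows:
        slot = (row.key, row.api_year)
        best = latest_per_year.get(slot)
        if best is None or row.pull_ts > best.pull_ts:
            latest_per_year[slot] = row

    # Step 2: skip years whose newest row is a delete, then pick the
    # highest remaining api_year for each key.
    winners = {}
    for (key, api_year), row in latest_per_year.items():
        if row.is_deleted:
            continue
        if key not in winners or api_year > winners[key]:
            winners[key] = api_year
    return winners


incremental = [
    Row("sar-1", 2023, 1, False),
    Row("sar-1", 2024, 2, False),
    Row("sar-1", 2024, 3, True),   # delete seen only in incremental pulls
]
full_refresh = [Row("sar-1", 2023, 1, False)]
correction = [
    Row("sar-1", 2023, 1, False),
    Row("sar-1", 2024, 2, False),  # live correction in the newer ODS
]

incremental_result = dedupe_deletes_scoped_to_ods(incremental)
full_refresh_result = dedupe_deletes_scoped_to_ods(full_refresh)
correction_result = dedupe_deletes_scoped_to_ods(correction)
```

Under these assumptions the incremental and full-refresh pulls agree (the 2023 record survives in both), while a live correction in the newer ODS still wins over the older record.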

3 participants