Commit d2e5d61

ejoranlienea, gnguyen87, and rlittle08 authored
fix k_course fanout (#150)
* fix k_course fanout

Because of the longitudinal nature of course transcripts, we get copies of the same records in each school year -- e.g. a student's 2019 course transcript will appear in 2019, 2020, 2021, etc. Ideally we would link the course metadata from the year the course was taken to the transcript record, however:

1. We do not have a reliable way to backfill course metadata for arbitrary years of history
2. This is not really how Ed-Fi works either: transcripts are directly linked to the contemporary year's course transcript.

Because dim_course is annualized, our key generation uses the year the transcript was _submitted_ to generate `k_course`. This means that each subsequent copy of the transcript record is unique when we include `k_course` in the grain.

This code changes the grain to the course's non-annualized observables, which has the effect of choosing the course metadata from the year the transcript was submitted (and preferring the most recent submission), which avoids fanning out the transcript record by the `k_course` corresponding to each unique year in which we received it.

* apply 2 dedupes to retrieve all most recent non-deleted records
* rename to correct CTEs
* update filter for is_deleted
* duplicated code

---------

Co-authored-by: gnguyen87 <gnguyen@macalester.edu>
Co-authored-by: rlittle08 <rlittle@edanalytics.org>
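The fanout described in the commit message can be reproduced with a small Python sketch. It is illustrative only: the `k_course` function below is a hypothetical stand-in for an annualized surrogate key (the real key generation is not shown in this commit), and the sample values (`ALG1`, `sar-1`, ed org `100`) are made up.

```python
import hashlib


def k_course(course_code, ed_org_id, api_year):
    """Hypothetical annualized key: the submission year is part of the hash input."""
    raw = f"{course_code}|{ed_org_id}|{api_year}"
    return hashlib.md5(raw.encode()).hexdigest()


# One 2019 transcript record, received again in each later school year.
rows = [
    {
        "api_year": year,
        "course_code": "ALG1",
        "course_ed_org_id": 100,
        "k_student_academic_record": "sar-1",
        "course_attempt_result": "Pass",
    }
    for year in (2019, 2020, 2021)
]
for r in rows:
    r["k_course"] = k_course(r["course_code"], r["course_ed_org_id"], r["api_year"])

old_grain = ("k_course", "k_student_academic_record", "course_attempt_result")
new_grain = (
    "course_code",
    "course_ed_org_id",
    "k_student_academic_record",
    "course_attempt_result",
)


def distinct_keys(rows, grain):
    """Return the set of distinct grain tuples present in rows."""
    return {tuple(r[col] for col in grain) for r in rows}


old_count = len(distinct_keys(rows, old_grain))  # one key per year received
new_count = len(distinct_keys(rows, new_grain))  # one key for the transcript
print(old_count, new_count)
```

Under these assumptions the old grain treats each yearly copy as a distinct record, while the non-annualized grain collapses them to one.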
1 parent 02796ab commit d2e5d61

1 file changed

Lines changed: 17 additions & 6 deletions

models/staging/edfi_3/stage/stg_ef3__course_transcripts.sql

@@ -19,17 +19,28 @@ keyed as (
         {{ extract_extension(model_name=this.name, flatten=True) }}
     from base_course_transcripts
 ),
-deduped as (
+first_deduped as (
     {{
         dbt_utils.deduplicate(
             relation='keyed',
             partition_by='k_course, k_student_academic_record, course_attempt_result',
             order_by='api_year desc, last_modified_timestamp desc, pull_timestamp desc'
         )
+    }}
+),
+no_deletes as (
+    select * from first_deduped
+    {% if not is_incremental() %}
+    where not is_deleted
+    {% endif %}
+),
+final_deduped as (
+    {{
+        dbt_utils.deduplicate(
+            relation='no_deletes',
+            partition_by='course_code, course_ed_org_id, k_student_academic_record, course_attempt_result',
+            order_by='api_year desc, last_modified_timestamp desc, pull_timestamp desc'
+        )
     }}
 )
-select * from deduped
-{% if not is_incremental() %}
-where not is_deleted
-{% endif %}
-
+select * from final_deduped
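One reading of why the change uses two dedupes with the delete filter between them can be sketched in Python. This is not the `dbt_utils.deduplicate` macro itself: the `deduplicate` helper below is a hypothetical stand-in that keeps the top-ranked row per partition, the sample rows are made up, and the sketch assumes a full (non-incremental) run so the `is_deleted` filter always applies.

```python
def deduplicate(rows, partition_by, order_by):
    """Keep one row per partition: the first after sorting descending on order_by."""
    ranked = sorted(rows, key=lambda r: tuple(r[c] for c in order_by), reverse=True)
    kept = {}
    for row in ranked:
        kept.setdefault(tuple(row[c] for c in partition_by), row)
    return list(kept.values())


ORDER = ("api_year", "last_modified_timestamp", "pull_timestamp")
YEAR_GRAIN = ("k_course", "k_student_academic_record", "course_attempt_result")
COURSE_GRAIN = (
    "course_code",
    "course_ed_org_id",
    "k_student_academic_record",
    "course_attempt_result",
)


def make_row(api_year, modified, is_deleted):
    """Build one hypothetical copy of the same transcript record."""
    return {
        "api_year": api_year,
        "k_course": f"ALG1-100-{api_year}",  # annualized key (made-up format)
        "course_code": "ALG1",
        "course_ed_org_id": 100,
        "k_student_academic_record": "sar-1",
        "course_attempt_result": "Pass",
        "last_modified_timestamp": modified,
        "pull_timestamp": modified,
        "is_deleted": is_deleted,
    }


# The 2020 copy was later deleted; the 2019 copy is still live.
keyed = [make_row(2019, 1, False), make_row(2020, 1, False), make_row(2020, 2, True)]

# Mirrors the three CTEs in the diff.
first_deduped = deduplicate(keyed, YEAR_GRAIN, ORDER)  # latest version per year
no_deletes = [r for r in first_deduped if not r["is_deleted"]]
final_deduped = deduplicate(no_deletes, COURSE_GRAIN, ORDER)  # latest live year

# For contrast: a single dedupe on the new grain, then the delete filter.
single_pass = [
    r for r in deduplicate(keyed, COURSE_GRAIN, ORDER) if not r["is_deleted"]
]

print([(r["api_year"], r["last_modified_timestamp"]) for r in final_deduped])
print(single_pass)
```

In this scenario the two-pass version keeps the live 2019 copy, whereas the single pass selects the deleted 2020 version and then filters it out, losing the transcript entirely. Filtering deletes before any dedupe would instead keep the stale 2020 version that the delete superseded.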
