feat(ENT-11384): add earliest_course_run_start to Algolia product index#8
sjasti-sonata-svg wants to merge 1 commit into
Pull request overview
Adds an earliest_course_run_start numeric attribute to the Algolia product index so search consumers (e.g., Enterprise Learner Portal) can filter for “new content” based on a course’s first-ever run start date.
Changes:
- Exposes `earliest_course_run_start` via the Algolia proxy models (`AlgoliaProxyCourse` + delegated field on `AlgoliaProxyProduct`).
- Adds the field to both English and Spanish Algolia product indexes and enables filtering on it via `attributesForFaceting`.
- Adds unit tests covering empty/no-start runs and “minimum start date” behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `course_discovery/apps/course_metadata/algolia_models.py` | Adds delegated field and computes earliest run start as epoch seconds on `AlgoliaProxyCourse`. |
| `course_discovery/apps/course_metadata/index.py` | Includes the new attribute in indexed fields and enables filtering in both EN/ES indexes. |
| `course_discovery/apps/course_metadata/tests/test_algolia_models.py` | Adds tests for `earliest_course_run_start` edge cases and correctness. |
    # Epoch seconds of the earliest run start across all runs. Used by the
    # "New content" filter in search surfaces (last-12-months window).
    starts = [r.start for r in self.course_runs.all() if r.start]
earliest_course_run_start currently takes the minimum start across all related course runs, including Unpublished runs. Elsewhere in this module (e.g., get_course_availability) unpublished runs are intentionally ignored, and including them here can skew the “first released” timestamp (e.g., a draft/unpublished backfilled run with an old start date would cause a course to never qualify as “new”). Consider restricting the aggregation to published, non-restricted runs so the value reflects what learners can actually see/index.
    - # Epoch seconds of the earliest run start across all runs. Used by the
    - # "New content" filter in search surfaces (last-12-months window).
    - starts = [r.start for r in self.course_runs.all() if r.start]
    + # Epoch seconds of the earliest learner-visible run start. Used by the
    + # "New content" filter in search surfaces (last-12-months window).
    + starts = [
    +     r.start for r in self.course_runs.all()
    +     if (
    +         r.start
    +         and r.status == CourseRunStatus.Published
    +         and not getattr(r, 'restricted', False)
    +     )
    + ]
Addressed in commit 39072b666. earliest_course_run_start now filters by r.status == CourseRunStatus.Published, so unpublished/draft runs can no longer skew the first-released timestamp. This matches the behavior of get_course_availability elsewhere in this module. Thanks for the catch!
    result = course.earliest_course_run_start
    assert isinstance(result, int)
    assert result == int(earliest.timestamp())
The new tests cover the empty/null-start cases and the basic “min start wins” behavior, but they don’t cover the case where an earlier run exists but is Unpublished (or otherwise not meant to be indexed/visible). If earliest_course_run_start is updated to ignore unpublished runs, adding an explicit regression test for that scenario would help prevent future changes from reintroducing incorrect “new content” behavior.
    def test_earliest_course_run_start_ignores_unpublished_runs(self):
        course = AlgoliaProxyCourseFactory(partner=self.__class__.edxPartner)
        unpublished_earlier = datetime.datetime.now(UTC) - datetime.timedelta(days=500)
        published_earliest = datetime.datetime.now(UTC) - datetime.timedelta(days=100)
        CourseRunFactory(course=course, start=unpublished_earlier, status=CourseRunStatus.Unpublished)
        CourseRunFactory(course=course, start=published_earliest, status=CourseRunStatus.Published)
        result = course.earliest_course_run_start
        assert isinstance(result, int)
        assert result == int(published_earliest.timestamp())
Addressed in commit 39072b666. Added test_earliest_course_run_start_ignores_unpublished_runs which creates a course with an old Unpublished run (1000 days ago) and a recent Published run (30 days ago), and asserts the property returns the recent one's timestamp. This guards against future changes reintroducing the bug.
Introduces a new numeric attribute on the course Algolia proxy that holds the earliest course-run start date as a Unix epoch integer, computed as min(run.start) across PUBLISHED runs on the course. Unpublished/draft runs are excluded to match get_course_availability elsewhere in the module; otherwise an old backfilled draft would incorrectly mark a course as not new.

The field is registered in attributesForFaceting as filterOnly() on both the English and Spanish product indexes so that numeric range queries of the form `earliest_course_run_start >= <epoch>` are accepted by Algolia.

Tests cover empty/null-start courses, the min-as-epoch-int happy path, and the regression case for unpublished runs being ignored.

This supports the "New content" filter on the enterprise Learner Portal, which narrows results to courses whose earliest run started within a rolling 12-month window.

ENT-11384
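The behavior described in this commit message can be sketched in isolation. The snippet below is a minimal illustration, not the code from the commit: `Run`, the string status values, and `earliest_published_start_epoch` are hypothetical stand-ins for the Django `CourseRun` model, `CourseRunStatus`, and the `earliest_course_run_start` property.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a course run (not the real CourseRun model)."""
    start: Optional[datetime]
    status: str  # placeholder values: 'published' or 'unpublished'


def earliest_published_start_epoch(runs: Iterable[Run]) -> Optional[int]:
    """Return the earliest published run start as epoch seconds, or None."""
    # Keep only runs that have a start date and are published.
    starts = [r.start for r in runs if r.start and r.status == 'published']
    if not starts:
        # No qualifying runs: return None so the caller can omit the value.
        return None
    return int(min(starts).timestamp())


# Example: an old unpublished run must not win over a recent published one.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
example_runs = [
    Run(start=now - timedelta(days=1000), status='unpublished'),
    Run(start=now - timedelta(days=30), status='published'),
    Run(start=None, status='published'),
]
example_result = earliest_published_start_epoch(example_runs)
```

Returning `None` rather than `0` for courses without qualifying runs keeps them from looking like very old courses in a numeric comparison.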
39072b6 to 927ed8a
Closing this PR — per clarification from @adusenbery, the correct home for this change is |
Summary
Adds a new numeric attribute `earliest_course_run_start` to the Algolia course index, enabling the upcoming "New content" filter on the enterprise Learner Portal.

Ticket
ENT-11384 — "We have new content being released, but there isn't a way for learners to find new content."
Acceptance Criteria from the ticket:
This PR satisfies A/C #2 by providing the data layer. Without this attribute in the Algolia index, no frontend filter can query "earliest start within 12 months."
Why a new field is needed
The ticket defines "new content" as:
The only existing start-date field in the index is `active_run_start`, which returns the advertised (next upcoming) run's start date. For a recurring course like CS50x (first run: 2012, next run: next month), `active_run_start` always points to next month, making the course permanently appear "new." That contradicts the ticket's intent. `earliest_course_run_start` computes `min(run.start)` across all runs on a course, which correctly identifies when the course was first released.

Changes: file by file
course_discovery/apps/course_metadata/algolia_models.py

Line added to the `result_fields` list (inside the `@delegate_attributes` decorator):

Why: The `@delegate_attributes` decorator on `AlgoliaProxyProduct` dynamically generates getter methods for every name in `result_fields`. Without this entry, the new property exists on `AlgoliaProxyCourse` but the Algolia serializer (which reads from the proxy product, not the course directly) would never see it. The field would silently be absent from indexed records.

New property on `AlgoliaProxyCourse`:

Why each line:
- `self.course_runs.all()`: iterates ALL runs, not just the advertised one. This is what makes it "earliest" rather than "current."
- `if r.start`: filters out runs with no start date (defensive; `min()` cannot compare `None` with datetimes).
- `if not starts: return None`: courses with zero runs or all-null starts get `None`, so Algolia skips indexing this field for them rather than erroring. This check is also what avoids calling `min()` on an empty sequence.
- `int(min(starts).timestamp())`: takes the minimum datetime and converts it to Unix epoch seconds as an integer. Integer format is required because Algolia numeric range filters (`>=`) only work on numeric attributes, and integers compare without floating-point precision issues.

course_discovery/apps/course_metadata/index.py

English product index, `result_fields` (around line 98):

    'earliest_course_run_start',
    )

Why: Tells the `algoliasearch_django` library to include this field when serializing each course record for upload to Algolia. Without it, the property is computed but never transmitted.

English product index, `attributesForFaceting` (around line 121):

    'filterOnly(earliest_course_run_start)',

Why: This is the critical line. Algolia rejects any filter query on an attribute not listed in `attributesForFaceting`. The frontend's query `earliest_course_run_start >= <epoch>` would return an error without this entry. `filterOnly()` is the correct modifier because we only ever ask "is the timestamp >= X"; we never need Algolia to return a list of all possible timestamps as facet values (which would be meaningless for a continuous numeric field).

Spanish product index: same two additions (around lines 154 and 180):

Why: The Algolia setup has both an English and a Spanish product index. Without parity, Spanish-locale enterprise customers would silently not have the filter available. Both indexes serve the same Learner Portal depending on locale.
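For context, the query side of this filter could look roughly like the sketch below. It is an illustration under stated assumptions: `new_content_filter` and the 365-day approximation of the 12-month window are hypothetical and not taken from this PR; only the attribute name and the `>=` comparison come from the description above.

```python
from datetime import datetime, timedelta, timezone


def new_content_filter(now: datetime, window_days: int = 365) -> str:
    """Build a numeric filter string for courses first run within the window.

    `window_days=365` is an assumed approximation of "12 months".
    """
    # Epoch seconds marking the start of the rolling window.
    cutoff = int((now - timedelta(days=window_days)).timestamp())
    return f"earliest_course_run_start >= {cutoff}"


# Example with a fixed reference time so the result is reproducible.
reference_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
example_filter = new_content_filter(reference_now)
```

Computing the cutoff at query time keeps the window rolling without requiring a reindex; the indexed value itself only changes when a course's runs change.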
course_discovery/apps/course_metadata/tests/test_algolia_models.py

3 new test cases added:
- `test_earliest_course_run_start_none_when_no_runs`: verifies that a course with zero runs returns `None` (not an error).
- `test_earliest_course_run_start_none_when_all_runs_lack_start`: verifies that runs with `start=None` are gracefully skipped.
- `test_earliest_course_run_start_returns_min_as_epoch_int`: creates 3 runs (500 days ago, yesterday, tomorrow) and verifies the property returns the 500-days-ago timestamp as an `int`, confirming the "earliest" semantic works correctly even when the advertised run would be different.

Why these specific cases: They cover the three edge cases that would cause production issues if not handled: empty courses, null starts, and the core "min not advertised" semantic. The third test is the most important because it directly validates the ticket's definition against the existing `active_run_start` (which would return yesterday's or tomorrow's run, not the 500-day-old one).

Test plan
- `pytest course_discovery/apps/course_metadata/tests/test_algolia_models.py -k earliest_course_run_start`: 3/3 passing
- 127ea98b0): no incompatibilities

Dependencies
This is PR 1 of 3 for ENT-11384:
- `edx/frontend-enterprise`: UI component (`NewContentRadioFacet` dropdown in the shared filter row)
- `edx/frontend-app-learner-portal-enterprise`: query-time filter injection (`useDefaultSearchFilters` wiring)

PRs 2 and 3 depend on this PR being merged and the Algolia index being reindexed before the filter can return results.
ENT-11384