feat(ENT-11384): add earliest_course_run_start to Algolia product index #8

Closed
sjasti-sonata-svg wants to merge 1 commit into master from ENT-11384-new-content-filter

@sjasti-sonata-svg

Summary

Adds a new numeric attribute earliest_course_run_start to the Algolia product index, enabling the upcoming "New content" filter on the enterprise Learner Portal.

Ticket

ENT-11384: "We have new content being released, but there isn't a way for learners to find new content."

Acceptance Criteria from the ticket:

  1. A new content filter, like the one in B2C, at the end of the row of filters in the Learner Portal
  2. When filtered 'yes', the results should show courses with their earliest start dates within the last 12 months

This PR satisfies A/C #2 by providing the data layer. Without this attribute in the Algolia index, no frontend filter can query "earliest start within 12 months."

Why a new field is needed

The ticket defines "new content" as:

"Courses' earliest start day that is within the rolling 12 months is considered new content."

The only existing start-date field in the index is active_run_start, which returns the advertised (next upcoming) run's start date. For a recurring course like CS50x (first run: 2012, next run: next month), active_run_start always points to next month — making the course permanently appear "new." That contradicts the ticket's intent.

earliest_course_run_start computes min(run.start) across all runs on a course, which correctly identifies when the course was first released.
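The difference between the two fields can be sketched with plain datetimes. This is only an illustration: the run dates are made up, "12 months" is approximated as 365 days, and `advertised_start` is a simplified stand-in for active_run_start rather than the real advertised-run logic.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical run start dates for a long-running course (illustrative only).
now = datetime(2026, 4, 16, tzinfo=timezone.utc)
run_starts = [
    datetime(2012, 10, 15, tzinfo=timezone.utc),  # first-ever run
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 5, 1, tzinfo=timezone.utc),    # next upcoming run
]

# Assumption: the rolling 12-month window is approximated as 365 days.
cutoff = now - timedelta(days=365)

# Simplified stand-in for active_run_start: the next run starting on or after `now`.
advertised_start = min(s for s in run_starts if s >= now)
# What earliest_course_run_start is meant to capture: the first run ever.
earliest_start = min(run_starts)

is_new_by_advertised = advertised_start >= cutoff
is_new_by_earliest = earliest_start >= cutoff
```

With these dates the advertised-run check flags the course as new, while the earliest-run check does not, which is the behavior the ticket asks for.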

Changes — file by file

course_discovery/apps/course_metadata/algolia_models.py

Line added to result_fields list (inside @delegate_attributes decorator):

'product_marketing_video_url', 'earliest_course_run_start', ]

Why: The @delegate_attributes decorator on AlgoliaProxyProduct dynamically generates getter methods for every name in result_fields. Without this entry, the new property exists on AlgoliaProxyCourse but the Algolia serializer (which reads from the proxy product, not the course directly) would never see it. The field would silently be absent from indexed records.
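To make the delegation point concrete, here is a minimal sketch of how a decorator of this kind could work. It is not the actual @delegate_attributes implementation: the decorator body, the `product` attribute, and both stand-in classes are hypothetical, and result_fields is read from a class attribute purely to keep the example short.

```python
def delegate_attributes(cls):
    """Hypothetical stand-in: add a read-only property for each name in cls.result_fields."""
    def make_getter(name):
        # Bind `name` per field so each property reads its own attribute.
        return property(lambda self: getattr(self.product, name, None))

    for field in cls.result_fields:
        setattr(cls, field, make_getter(field))
    return cls


class CourseStandIn:
    # Arbitrary example values standing in for a real course proxy.
    earliest_course_run_start = 1350259200
    product_marketing_video_url = None


@delegate_attributes
class ProductStandIn:
    result_fields = ['product_marketing_video_url', 'earliest_course_run_start']

    def __init__(self, product):
        self.product = product


proxy = ProductStandIn(CourseStandIn())
```

In this sketch, a name missing from result_fields gets no property on the proxy at all, which mirrors the "silently absent" failure mode described above.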

New property on AlgoliaProxyCourse:

@property
def earliest_course_run_start(self):
    starts = [r.start for r in self.course_runs.all() if r.start]
    if not starts:
        return None
    return int(min(starts).timestamp())

Why each line:

  • self.course_runs.all() — iterates ALL runs, not just the advertised one. This is what makes it "earliest" rather than "current."
  • if r.start — filters out runs with no start date (defensive; avoids passing None to min(), which cannot compare None with a datetime).
  • if not starts: return None — courses with zero runs or all-null starts get None, so Algolia skips indexing this field for them rather than erroring.
  • int(min(starts).timestamp()) — takes the minimum datetime, converts to Unix epoch seconds as an integer. Integer format is required because Algolia numeric range filters (>=) only work on numeric attributes, and integers compare without floating-point precision issues.
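The same logic can be exercised outside Django as a standalone function. This is a sketch under assumptions: `earliest_start_epoch` is a hypothetical helper name, and the inputs are timezone-aware datetimes so that `.timestamp()` does not depend on the local timezone.

```python
from datetime import datetime, timezone


def earliest_start_epoch(starts):
    """Return the earliest non-null start as epoch seconds (int), or None."""
    present = [s for s in starts if s]  # drop runs with no start date
    if not present:
        return None  # no runs, or every run lacks a start
    return int(min(present).timestamp())


first = datetime(2012, 10, 15, tzinfo=timezone.utc)
later = datetime(2026, 5, 1, tzinfo=timezone.utc)

no_runs = earliest_start_epoch([])
all_null = earliest_start_epoch([None, None])
mixed = earliest_start_epoch([later, None, first])
```

Both empty cases yield None, and the mixed case yields the integer epoch of the 2012 run regardless of input order.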

course_discovery/apps/course_metadata/index.py

English product index — result_fields (around line 98):

'earliest_course_run_start', )

Why: Tells the algoliasearch_django library to include this field when serializing each course record for upload to Algolia. Without it, the property is computed but never transmitted.

English product index — attributesForFaceting (around line 121):

'filterOnly(earliest_course_run_start)',

Why: This is the critical line. Algolia rejects any filter query on an attribute not listed in attributesForFaceting. The frontend's query earliest_course_run_start >= <epoch> would return an error without this entry. filterOnly() is the correct modifier because we only ever ask "is the timestamp >= X" — we never need Algolia to return a list of all possible timestamps as facet values (which would be meaningless for a continuous numeric field).
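For reference, the query-side filter string could be built along these lines. The helper name `new_content_filter` and the 365-day approximation of "12 months" are assumptions for illustration; the actual frontend wiring lives in the follow-up PRs and is not shown here.

```python
from datetime import datetime, timedelta, timezone


def new_content_filter(now, window_days=365):
    """Build a numeric filter string for courses first started within the window."""
    # Assumption: the rolling 12-month window is approximated as `window_days` days.
    cutoff = int((now - timedelta(days=window_days)).timestamp())
    return f"earliest_course_run_start >= {cutoff}"


filter_string = new_content_filter(datetime(2026, 4, 16, tzinfo=timezone.utc))
```

The resulting string has the form `earliest_course_run_start >= <epoch>`, with the cutoff expressed in epoch seconds to match the indexed integer.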

Spanish product index — same two additions (around lines 154 and 180):
Why: The Algolia setup has both an English and a Spanish product index. Without parity, Spanish-locale enterprise customers would silently not have the filter available. Both indexes serve the same Learner Portal depending on locale.

course_discovery/apps/course_metadata/tests/test_algolia_models.py

3 new test cases added:

  1. test_earliest_course_run_start_none_when_no_runs — verifies that a course with zero runs returns None (not an error).
  2. test_earliest_course_run_start_none_when_all_runs_lack_start — verifies that runs with start=None are gracefully skipped.
  3. test_earliest_course_run_start_returns_min_as_epoch_int — creates 3 runs (500 days ago, yesterday, tomorrow), verifies the property returns the 500-days-ago timestamp as an int, confirming the "earliest" semantic works correctly even when the advertised run would be different.

Why these specific cases: They cover the three edge cases that would cause production issues if not handled — empty courses, null starts, and the core "min not advertised" semantic. The third test is the most important because it directly validates the ticket's definition against the existing active_run_start (which would return yesterday's or tomorrow's run, not the 500-day-old one).

Test plan

  • pytest course_discovery/apps/course_metadata/tests/test_algolia_models.py -k earliest_course_run_start: 3/3 passing
  • Verified against Django 5 upgrade (merged in 127ea98b0) — no incompatibilities
  • After merge: run the Algolia reindex management command so existing course records pick up the new field

Dependencies

This is PR 1 of 3 for ENT-11384:

  1. This PR — backend data layer (Algolia index attribute)
  2. edx/frontend-enterprise — UI component (NewContentRadioFacet dropdown in the shared filter row)
  3. edx/frontend-app-learner-portal-enterprise — query-time filter injection (useDefaultSearchFilters wiring)

PRs 2 and 3 depend on this PR being merged and the Algolia index being reindexed before the filter can return results.

ENT-11384

Copilot AI review requested due to automatic review settings April 16, 2026 13:49

Copilot AI left a comment

Pull request overview

Adds an earliest_course_run_start numeric attribute to the Algolia product index so search consumers (e.g., Enterprise Learner Portal) can filter for “new content” based on a course’s first-ever run start date.

Changes:

  • Exposes earliest_course_run_start via the Algolia proxy models (AlgoliaProxyCourse + delegated field on AlgoliaProxyProduct).
  • Adds the field to both English and Spanish Algolia product indexes and enables filtering on it via attributesForFaceting.
  • Adds unit tests covering empty/no-start runs and “minimum start date” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| course_discovery/apps/course_metadata/algolia_models.py | Adds delegated field and computes earliest run start as epoch seconds on AlgoliaProxyCourse. |
| course_discovery/apps/course_metadata/index.py | Includes the new attribute in indexed fields and enables filtering in both EN/ES indexes. |
| course_discovery/apps/course_metadata/tests/test_algolia_models.py | Adds tests for earliest_course_run_start edge cases and correctness. |

Comment on lines +293 to +295
# Epoch seconds of the earliest run start across all runs. Used by the
# "New content" filter in search surfaces (last-12-months window).
starts = [r.start for r in self.course_runs.all() if r.start]

Copilot AI Apr 16, 2026

earliest_course_run_start currently takes the minimum start across all related course runs, including Unpublished runs. Elsewhere in this module (e.g., get_course_availability) unpublished runs are intentionally ignored, and including them here can skew the “first released” timestamp (e.g., a draft/unpublished backfilled run with an old start date would cause a course to never qualify as “new”). Consider restricting the aggregation to published, non-restricted runs so the value reflects what learners can actually see/index.

Suggested change
-    # Epoch seconds of the earliest run start across all runs. Used by the
-    # "New content" filter in search surfaces (last-12-months window).
-    starts = [r.start for r in self.course_runs.all() if r.start]
+    # Epoch seconds of the earliest learner-visible run start. Used by the
+    # "New content" filter in search surfaces (last-12-months window).
+    starts = [
+        r.start for r in self.course_runs.all()
+        if (
+            r.start
+            and r.status == CourseRunStatus.Published
+            and not getattr(r, 'restricted', False)
+        )
+    ]

Author

Addressed in commit 39072b666. earliest_course_run_start now filters by r.status == CourseRunStatus.Published, so unpublished/draft runs can no longer skew the first-released timestamp. This matches the behavior of get_course_availability elsewhere in this module. Thanks for the catch!
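A rough sketch of the published-only behavior, using stand-in types rather than the real models: `RunStatus`, `Run`, and `earliest_published_start` are hypothetical names, and the real property reads from self.course_runs and CourseRunStatus instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class RunStatus(Enum):  # hypothetical stand-in for CourseRunStatus
    PUBLISHED = "published"
    UNPUBLISHED = "unpublished"


@dataclass
class Run:  # hypothetical stand-in for a course run
    start: Optional[datetime]
    status: RunStatus


def earliest_published_start(runs):
    """Earliest start among published runs as epoch seconds, or None."""
    starts = [r.start for r in runs if r.start and r.status is RunStatus.PUBLISHED]
    return int(min(starts).timestamp()) if starts else None


old_draft = Run(datetime(2023, 1, 1, tzinfo=timezone.utc), RunStatus.UNPUBLISHED)
recent = Run(datetime(2026, 3, 1, tzinfo=timezone.utc), RunStatus.PUBLISHED)

result = earliest_published_start([old_draft, recent])
```

Here the old unpublished run is ignored, so the result is the epoch of the recent published run; a course with only unpublished runs would yield None.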

result = course.earliest_course_run_start
assert isinstance(result, int)
assert result == int(earliest.timestamp())

Copilot AI Apr 16, 2026

The new tests cover the empty/null-start cases and the basic “min start wins” behavior, but they don’t cover the case where an earlier run exists but is Unpublished (or otherwise not meant to be indexed/visible). If earliest_course_run_start is updated to ignore unpublished runs, adding an explicit regression test for that scenario would help prevent future changes from reintroducing incorrect “new content” behavior.

Suggested change
def test_earliest_course_run_start_ignores_unpublished_runs(self):
    course = AlgoliaProxyCourseFactory(partner=self.__class__.edxPartner)
    unpublished_earlier = datetime.datetime.now(UTC) - datetime.timedelta(days=500)
    published_earliest = datetime.datetime.now(UTC) - datetime.timedelta(days=100)
    CourseRunFactory(course=course, start=unpublished_earlier, status=CourseRunStatus.Unpublished)
    CourseRunFactory(course=course, start=published_earliest, status=CourseRunStatus.Published)
    result = course.earliest_course_run_start
    assert isinstance(result, int)
    assert result == int(published_earliest.timestamp())

Author

Addressed in commit 39072b666. Added test_earliest_course_run_start_ignores_unpublished_runs which creates a course with an old Unpublished run (1000 days ago) and a recent Published run (30 days ago), and asserts the property returns the recent one's timestamp. This guards against future changes reintroducing the bug.

Introduces a new numeric attribute on the course Algolia proxy that
holds the earliest course-run start date as a Unix epoch integer,
computed as min(run.start) across PUBLISHED runs on the course.
Unpublished/draft runs are excluded to match get_course_availability
elsewhere in the module — otherwise an old backfilled draft would
incorrectly mark a course as not new.

The field is registered in attributesForFaceting as filterOnly() on
both the English and Spanish product indexes so that numeric range
queries of the form `earliest_course_run_start >= <epoch>` are
accepted by Algolia.

Tests cover empty/null-start courses, the min-as-epoch-int happy
path, and the regression case for unpublished runs being ignored.

This supports the "New content" filter on the enterprise Learner
Portal, which narrows results to courses whose earliest run started
within a rolling 12-month window.

ENT-11384
@sjasti-sonata-svg force-pushed the ENT-11384-new-content-filter branch from 39072b6 to 927ed8a on April 17, 2026 05:24
@sjasti-sonata-svg
Author

Closing this PR — per clarification from @adusenbery, the correct home for this change is enterprise-catalog/apps/catalog/algolia_utils.py, which owns the B2B Algolia index the Learner Portal actually queries (not course-discovery's product index). Replacement PR in enterprise-catalog will follow.
