[PP-883] Implement Lexile DB Metadata Service #3154

Draft
dbernstein wants to merge 15 commits into main from feature/PP-883-implement-lexile-db

@dbernstein (Contributor) commented Mar 19, 2026

Description

Implements a MetaMetrics Lexile DB integration that augments existing Lexile scores with measures from the Lexile Titles Database API. A nightly Celery task processes ISBNs that lack Lexile data and adds scores from this source. Lexile DB scores are treated as high quality and override scores from other sources (e.g. Overdrive).

Motivation and Context

  • PP-883: Integrate Lexile DB to improve Lexile coverage and quality.
  • Lexile DB is the authoritative MetaMetrics source for Lexile measures.
  • Uses a nightly batch job instead of CoverageProvider/CoverageRecord.

  • Jira ticket: https://ebce-lyrasis.atlassian.net/browse/PP-883

Changes

New Files

  • src/palace/manager/integration/metadata/lexile/__init__.py – Package init and exports
  • src/palace/manager/integration/metadata/lexile/settings.py – LexileDBSettings (username, password, base_url, sample_identifier for self-test)
  • src/palace/manager/integration/metadata/lexile/api.py – LexileDBAPI client for GET {base_url}/api/fab/v3/book/?format=json&ISBN={isbn} with HTTP Basic Auth
  • src/palace/manager/integration/metadata/lexile/service.py – LexileDBService (MetadataService) with HasSelfTests for connection self-test
  • src/palace/manager/celery/tasks/lexile.py – run_lexile_db_update (orchestrator) and lexile_db_update_task (worker)
  • src/palace/manager/scripts/lexile_db.py – LexileDBUpdateScript for manual runs
  • bin/lexile_db_update – Script entry point
  • tests/manager/integration/metadata/lexile/test_api.py – API client tests
  • tests/manager/integration/metadata/lexile/test_service.py – Service and self-test tests
  • tests/manager/celery/tasks/test_lexile.py – Celery task tests
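
For illustration, the request that the api.py entry describes could be assembled roughly as follows. This is a minimal standard-library sketch, not the PR's LexileDBAPI code: the function name build_lexile_request is hypothetical, and the real client may use a different HTTP library. Only the URL pattern and the use of HTTP Basic Auth come from the description above.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_lexile_request(base_url: str, isbn: str, username: str, password: str) -> Request:
    """Build (but do not send) a GET request for one ISBN.

    Hypothetical helper: it mirrors the URL pattern from the PR description,
    GET {base_url}/api/fab/v3/book/?format=json&ISBN={isbn}
    """
    query = urlencode({"format": "json", "ISBN": isbn})
    url = f"{base_url.rstrip('/')}/api/fab/v3/book/?{query}"
    # HTTP Basic Auth: base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the returned object to urllib.request.urlopen would perform the call; the sketch stops short of that because valid credentials are not yet available.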

Modified Files

  • src/palace/manager/sqlalchemy/constants.py – Added LEXILE_DB = "Lexile DB"
  • src/palace/manager/sqlalchemy/model/datasource.py – Added Lexile DB to well_known_sources with offers_metadata_lookup=True, primary_identifier_type=ISBN
  • src/palace/manager/sqlalchemy/model/classification.py – Added (LEXILE_DB, Subject.LEXILE_SCORE): 0.95 so Lexile DB scores override others
  • src/palace/manager/service/integration_registry/metadata.py – Registered LexileDBService as "MetaMetrics Lexile DB Service"
  • src/palace/manager/service/celery/celery.py – Imported lexile tasks and added daily beat schedule (3:00 AM)
  • tests/manager/api/admin/controller/test_metadata_services.py – Assert Lexile DB protocol in metadata services and add Lexile DB fixture
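
As a rough picture of what the 0.95 weight in classification.py is meant to achieve, the sketch below picks the Lexile score from the highest-quality source. It is an assumption-laden illustration: only the Lexile DB value of 0.95 comes from this PR, while the Overdrive weight, the default weight, the "LEXILE_SCORE" key, and the function name are placeholders, and the real classifier may combine weights differently.

```python
from typing import Optional

# (data source, subject type) -> quality weight.
# Only the Lexile DB entry reflects the PR; the rest are hypothetical.
QUALITY = {
    ("Lexile DB", "LEXILE_SCORE"): 0.95,
    ("Overdrive", "LEXILE_SCORE"): 0.5,  # placeholder value
}
DEFAULT_QUALITY = 0.1  # placeholder for sources without an explicit weight


def best_lexile(candidates: list[tuple[str, int]]) -> Optional[int]:
    """Return the score whose source has the highest quality weight.

    candidates is a list of (data source name, Lexile score) pairs.
    """
    if not candidates:
        return None
    _source, score = max(
        candidates,
        key=lambda c: QUALITY.get((c[0], "LEXILE_SCORE"), DEFAULT_QUALITY),
    )
    return score
```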

Features

  • Integration configuration: Username, password, base URL, and optional sample ISBN for self-test
  • Nightly Celery task: Runs at 3:00 AM; checks for integration, acquires Redis lock, processes ISBNs in batches of 10
  • Lock strategy: Redis lock with Timestamp id as value; 30-minute TTL, renewed per batch
  • Manual run script: bin/lexile_db_update with --force to reprocess all ISBNs (including those with existing Lexile DB data)
  • Self-test: Fetches Lexile for a sample ISBN; configurable sample ISBN in the admin UI
  • Classification quality: Lexile DB scores use quality 0.95 and override lower-quality sources
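
The batching described above can be pictured with a small offset-based iterator. This is a sketch under the assumption that the ISBNs are already available as a list; the actual task pages through the database with offset and limit, and the name iter_batches is hypothetical.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 10  # batch size stated in the PR description


def iter_batches(
    isbns: Sequence[str], batch_size: int = BATCH_SIZE
) -> Iterator[tuple[int, Sequence[str]]]:
    """Yield (offset, batch) pairs, each batch holding at most batch_size ISBNs."""
    for offset in range(0, len(isbns), batch_size):
        yield offset, isbns[offset:offset + batch_size]
```

With 25 ISBNs this yields three batches at offsets 0, 10, and 20, the last one holding the remaining 5.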

How Has This Been Tested?

  • API tests (test_api.py): Lexile lookup, 10/13-digit ISBNs, hyphen stripping, not found, empty objects, null lexile, HTTP errors, empty ISBN, raise_on_error on 403
  • Service tests (test_service.py): Self-test success with/without data, custom sample ISBN, auth failure
  • Celery task tests (test_lexile.py): Orchestrator skip when not configured, worker queuing, classification creation, force mode replacement, Timestamp creation
  • Metadata services tests: Lexile DB protocol in protocol list
  • mypy: All new and modified code passes type checking
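
The ISBN cases listed for test_api.py (10/13-digit ISBNs, hyphen stripping, empty ISBN) suggest a normalization step along these lines. The helper below is a hypothetical sketch; whether the real client validates length or simply strips hyphens is not stated in this PR.

```python
from typing import Optional


def normalize_isbn(raw: str) -> Optional[str]:
    """Strip hyphens and spaces; return None unless the result looks like an ISBN-10 or ISBN-13.

    Only the shape is checked (length and allowed characters), not the check digit.
    """
    cleaned = raw.replace("-", "").replace(" ", "").strip().upper()
    if not cleaned.isascii():
        return None
    if len(cleaned) == 13 and cleaned.isdigit():
        return cleaned
    # An ISBN-10 may end in "X" as its check character.
    if len(cleaned) == 10 and cleaned[:9].isdigit() and (cleaned[9].isdigit() or cleaned[9] == "X"):
        return cleaned
    return None
```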

I've tested the CM Admin experience, which works. However, since we don't yet have valid credentials, we can't yet verify the service against the live API.

Checklist

  • Documentation updated as needed
  • Add feature label to the PR
  • Add DB migration label if migrations are included (none in this PR)
  • I have updated the documentation accordingly.
  • All new and existing tests passed.

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 95.47325% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.29%. Comparing base (c046b9e) to head (b3175ea).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/palace/manager/celery/tasks/lexile.py | 92.52% | 2 Missing and 6 partials ⚠️ |
| .../palace/manager/integration/metadata/lexile/api.py | 96.07% | 1 Missing and 1 partial ⚠️ |
| ...ace/manager/integration/metadata/lexile/service.py | 97.91% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3154    +/-   ##
========================================
  Coverage   93.28%   93.29%            
========================================
  Files         493      499     +6     
  Lines       45713    45956   +243     
  Branches     6264     6288    +24     
========================================
+ Hits        42645    42875   +230     
- Misses       1982     1987     +5     
- Partials     1086     1094     +8     

@dbernstein dbernstein added the feature New feature label Mar 19, 2026
@dbernstein dbernstein force-pushed the feature/PP-883-implement-lexile-db branch from 32f9a18 to f13d549 Compare March 19, 2026 21:01
@dbernstein dbernstein marked this pull request as ready for review March 19, 2026 23:23
@dbernstein dbernstein force-pushed the feature/PP-883-implement-lexile-db branch 2 times, most recently from 33516e7 to f8b1427 Compare March 20, 2026 17:42
:param session: Database session.
:param offset: Offset for pagination.
:param limit: Maximum number of identifiers to return.
:param force: If True, include all ISBNs (including those with Lexile from other

@dbernstein dbernstein Mar 20, 2026

The docstring for "force" here is out of line with the code; the docs need to be corrected.

@dbernstein dbernstein marked this pull request as draft March 20, 2026 21:24
@dbernstein

I'm pulling this one back into draft mode after my PR PTSD from the latest round on the patron blocking rules PR. I keep underestimating how much self-review work is required before submitting for code review.

@dbernstein dbernstein force-pushed the feature/PP-883-implement-lexile-db branch from f8b1427 to a73120a Compare March 23, 2026 17:11
Bug and logic fixes:
- Remove unreachable `if not data_source:` check after lookup with autocreate=True
- Prevent orphaned Timestamp when lock acquisition fails by creating the stamp
  only after acquiring the lock (use provisional uuid4 for offset 0)
- Use shared LEXILE_DB_LOCK_KEY and _lexile_db_lock in run_lexile_db_update
- Clarify force mode semantics and Overdrive-only exclusion in docstrings

Test coverage:
- Add test for timestamp_id=None when offset > 0
- Assert lexile_db_update_task.delay() is called in orchestrator test
- Add tests that Overdrive-only Lexile ISBNs are excluded in default and force modes

Dead and redundant code:
- Remove unused self._db from LexileDBService.__init__
- Remove LoggerMixin from LexileDBService (inherited via HasSelfTests)

Code quality:
- Add lexile_settings pytest fixture in test_api.py
- Replace N+1 classification loop with single DELETE in _process_identifier
- Add synchronize_session=False for delete with subquery
- Add comment documenting Celery replace idiom for task chaining

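
To illustrate the "single DELETE" item, here is a standard-library sqlite3 sketch that removes every Lexile classification for one identifier and data source in one statement, using a subquery on the subject type instead of looping over rows. The table and column names are simplified stand-ins rather than the project's schema, and the PR itself does this through SQLAlchemy, where synchronize_session=False tells the ORM not to try to reconcile the bulk delete with objects already loaded in the session.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory schema (hypothetical, simplified) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subjects (id INTEGER PRIMARY KEY, type TEXT, identifier TEXT);
        CREATE TABLE classifications (
            id INTEGER PRIMARY KEY,
            identifier_id INTEGER,
            data_source_id INTEGER,
            subject_id INTEGER REFERENCES subjects(id)
        );
        INSERT INTO subjects (id, type, identifier) VALUES
            (1, 'Lexile', '650'), (2, 'Lexile', '700'), (3, 'tag', 'Adventure');
        INSERT INTO classifications (identifier_id, data_source_id, subject_id) VALUES
            (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (2, 1, 1);
        """
    )
    return conn


def delete_lexile_classifications(
    conn: sqlite3.Connection, identifier_id: int, data_source_id: int
) -> int:
    """Delete all Lexile classifications for one identifier/source; return rows removed."""
    cursor = conn.execute(
        """
        DELETE FROM classifications
        WHERE identifier_id = ?
          AND data_source_id = ?
          AND subject_id IN (SELECT id FROM subjects WHERE type = 'Lexile')
        """,
        (identifier_id, data_source_id),
    )
    return cursor.rowcount
```

One statement replaces a query per classification, and rows for other identifiers, other data sources, or non-Lexile subjects are left alone.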
Replace the per-batch acquire/release lock pattern with a workflow-level
Redis lock (2-hour TTL) that persists across all batches in a paginated run.
A UUID is generated on the first batch and passed to every replacement task
via task.replace(), allowing each subsequent batch to extend the same lock.

The lock() context manager is configured with ignored_exceptions=(Ignore,)
so that Celery's Ignore exception (raised by task.replace()) does not trigger
a lock release mid-workflow.

Also adds lock_value parameter validation and three new tests covering the
workflow lock configuration, the first-batch skip-when-locked path, and the
missing-lock_value guard.
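
The workflow-level lock described in this commit message can be sketched without Redis or Celery. In the illustration below an in-memory class stands in for Redis, and run_batch returns the keyword arguments for the next batch instead of calling task.replace(); every name here (FakeLockStore, run_batch, LOCK_KEY) is hypothetical, and only the 2-hour TTL, the first-batch UUID, and the lock_value guard are taken from the message.

```python
import uuid
from typing import Optional

LOCK_KEY = "lexile-db-update"  # hypothetical key name
LOCK_TTL = 2 * 60 * 60  # 2-hour TTL, in seconds, as in the commit message
BATCH_SIZE = 10


class FakeLockStore:
    """In-memory stand-in for Redis: maps key -> (value, expiry time)."""

    def __init__(self) -> None:
        self._locks: dict[str, tuple[str, float]] = {}

    def acquire(self, key: str, value: str, ttl: float, now: float) -> bool:
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False  # another workflow holds an unexpired lock
        self._locks[key] = (value, now + ttl)
        return True

    def extend(self, key: str, value: str, ttl: float, now: float) -> bool:
        held = self._locks.get(key)
        if held is None or held[0] != value or held[1] <= now:
            return False  # lock expired or is owned by another workflow
        self._locks[key] = (value, now + ttl)
        return True

    def release(self, key: str, value: str) -> None:
        held = self._locks.get(key)
        if held is not None and held[0] == value:
            del self._locks[key]


def run_batch(
    store: FakeLockStore,
    isbns: list[str],
    processed: list[str],
    offset: int = 0,
    lock_value: Optional[str] = None,
    now: float = 0.0,
) -> Optional[dict]:
    """Process one batch; return kwargs for the next batch, or None when done or skipped."""
    if offset == 0:
        # First batch: generate the workflow's lock value and try to take the lock.
        lock_value = str(uuid.uuid4())
        if not store.acquire(LOCK_KEY, lock_value, LOCK_TTL, now):
            return None  # another run is in progress, so skip
    else:
        if lock_value is None:
            raise ValueError("lock_value is required when offset > 0")
        if not store.extend(LOCK_KEY, lock_value, LOCK_TTL, now):
            raise RuntimeError("workflow lock was lost")
    processed.extend(isbns[offset:offset + BATCH_SIZE])
    next_offset = offset + BATCH_SIZE
    if next_offset < len(isbns):
        # The real task would call task.replace() here; the lock stays held.
        return {"offset": next_offset, "lock_value": lock_value}
    store.release(LOCK_KEY, lock_value)  # final batch: release the workflow lock
    return None
```

A driver would call run_batch once with no lock_value and then keep calling it with the returned kwargs until it returns None. A second run started while the first is between batches gets None immediately, which corresponds to the skip-when-locked path mentioned above.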