Skip to content

Claude suggested improvements for repo #252

@farhan

Description

@farhan

I asked claude about the code improvements in this repository with following prompt

Prompt:

Use a "Fan-out" / "fan-in" (N researchers → synthesizer) approach to research the         
  question "how best should I optimize this codebase"?                                    
  ▎ Minimum 5 subagents, use Sonnet to do the research and individual contemplation; Opus to synthesize.  

Agents run by claude:

  1. Architecture & code structure — module organization, inheritance, coupling
  2. Test coverage & quality — gaps, brittleness, fixtures
  3. Performance & runtime — queries, caching, asset handling
  4. Dependency management & build tooling — packaging, uv/pyproject.toml, CI
  5. Code duplication & refactoring — DRY opportunities, dead code
  6. Developer experience & documentation — docs, types, linting, onboarding

and this is what Claude analyzed and replied:

Claude reply:

Executive Summary

The codebase is functional but carries significant technical debt inherited from its
edx-platform extraction. The three most consequential issues are:

  1. VideoBlock's redundant database calls on every render — direct learner-facing
    latency
  2. No coverage thresholds + critically under-tested handler code
    (video_handlers.py at 16.9% line coverage), making refactoring unsafe
  3. Legacy setup.py build system with unpinned dependencies and a dead six
    dependency that blocks modernization

Pervasive copy-paste duplication across XBlocks is a natural artifact of extraction
without a consolidation pass. Addressing these in order compounds: build system → CI
safety → safe refactoring → performance.


Tier 1 — Critical / Do First

1.1 Merge the pyproject.toml branch

  • Merge origin/pyproject.toml into main
  • Remove setup.py entirely
  • Fix [wheel] universal = 1 in setup.cfg — incorrect for a
    python_requires=">=3.12" package
  • Add lower-bound version constraints to core deps in requirements/base.in
    (currently all 28 deps are completely unpinned)
  • Replace python setup.py sdist bdist_wheel with python -m build in
    pypi-publish.yml:34

Why: The pyproject.toml branch already exists and is ready to land. The current
setup.py is 120+ lines of bespoke requirement-loading logic that exposes zero version
constraints to downstream consumers. This unblocks uv migration, proper dep bounds,
and all other CI fixes.


1.2 Establish a coverage floor

  • Add --cov-fail-under=60 to pytest config (ratchet up over time)
  • Fix CI to upload coverage for all Django environments (currently django52
    coverage is silently discarded)
  • Write tests for video/video_handlers.py16.9% line / 0% branch (AJAX
    dispatch layer handling student transcript/studio interactions)
  • Write tests for video/bumper_utils.py26.9% line / 0% branch
  • Write tests for legacy_utils/xml_utils.py34.4% line / 10.9% branch
    (shared by all 8 XBlocks)
  • Write tests for poll/poll.py46.9% line / 9.1% branch (only 2 tests;
    student_view, all handlers, and XML round-trip untested)
  • Write tests for discussion/discussion.py63.9% line / 25% branch
    (student_view, author_view, student_view_data all untested)

Why: Every refactoring item in Tiers 2 and 3 requires this safety net first.


1.3 Fix VideoBlock's redundant VAL database calls

  • Consolidate get_html() from 3 serial DB calls to 1: get_urls_for_profiles
    (video.py:323), get_video_info (video.py:349), get_course_video_image_url in
    _poster() (video.py:1164)
  • Fix get_context() dual VAL calls (video.py:803 + :829) — the second call
    subsumes the first; first result is unused
  • Add @request_cached to _poster() (video.py:1158)
  • Add request-level cache to get_available_transcript_languages()
    (video_transcripts_utils.py:79) — 10 videos in a vertical currently fires 10 separate
    DB calls

Why: Direct learner-facing latency. VideoBlock is the most-rendered XBlock in the
platform.


1.4 Remove the six dependency

  • Replace six.moves.map, six.moves.range, six.moves.zip in
    responsetypes.py:30 with stdlib equivalents
  • Replace six usage in inputtypes.py:54
  • Replace six.moves.xrange in safe_exec/safe_exec.py:51 with range
  • Replace six in test files: response_xml_factory.py:5, test_inputtypes.py:29,
    test_safe_exec.py:19
  • Remove six from requirements/base.in and requirements/test.in:6

Why: python_requires=">=3.12" makes every six usage a dead import. All
replacements are 1:1 stdlib equivalents.


Tier 2 — High Value

2.1 Consolidate cross-XBlock duplication into legacy_utils/

The extraction left utilities copy-pasted across modules with no consolidation pass.

  • Move stringify_children to legacy_utils/ — currently defined verbatim in
    problem/stringify.py, html/html.py:111, and poll/poll.py:52 (identical docstrings
    including the StackOverflow URL)
  • Centralize HTML() / Text (markupsafe wrappers) — currently in
    problem/markup.py, discussion/discussion.py:34-54, poll/poll.py:29-49 with
    copy-pasted docstrings
  • Create a single ATTR_KEY_* constants module —
    ATTR_KEY_DEPRECATED_ANONYMOUS_USER_ID and ATTR_KEY_USER_ID redefined in
    html/html.py, problem/capa_block.py:68-70, video/constants.py
  • Consolidate SerializationError — identical class defined in both
    annotatable/annotatable.py:27 and problem/capa_block.py:85
  • Consolidate StubUserService — duplicated across problem/tests/__init__.py,
    lti/tests/helpers.py, and html/tests/ into a single
    xblocks_contrib/tests/helpers.py
  • Extract pipeline asset-path resolution from discussion/discussion.py:137-144 and
    problem/capa_block.py:436-443video/video_utils.py:209-213 already has the
    correct helper; the others should import it

Why: ~200 lines of duplication eliminated; single source of truth prevents string
drift (the edx-platform.* attribute keys are platform contracts — a typo silently
breaks user attribute lookups).


2.2 Cache ProblemBlock.problem_types and VideoBlock.editable_metadata_fields

  • Add @cached_property to ProblemBlock.problem_types (capa_block.py:631-639) —
    calls etree.XML(self.data) (full DOM parse) on every access; called in
    index_dictionary() twice and in has_support()
  • Cache VideoBlock.editable_metadata_fields across its 3 accesses per studio
    request: studio_view line 256, get_context lines 781 and 789

Why: Eliminates redundant XML parsing and metadata computation on every Studio
editing request.


2.3 Fix WordCloudBlock JS inlining

  • Switch word_cloud.py from load_unicode to add_javascript_url for d3.min.js
    (139 KB), d3.layout.cloud.js (12.3 KB), and word_cloud.js (8.5 KB)
  • Same pattern applies to annotatable.py, lti.py, and poll.py which also use
    load_unicode for JS/CSS

Why: VideoBlock already does this correctly. 3 word clouds on one page = 480 KB of
non-cacheable JS inlined into the DOM on every render.


2.4 Fix CI environment naming + add Python 3.13

  • Align CI TOXENV values with tox envlist — CI passes django42/django52 but
    tox.ini declares py312-django{42,52} (naming mismatch means CI environments are not
    in the declared envlist)
  • Add Python 3.13 to CI matrix (python-tests.yml:17) and update classifiers in
    setup.py:155 / pyproject.toml
  • Add cache: 'pip' to actions/setup-python steps

Why: Python 3.13 shipped October 2024 and is 7+ months old with no CI coverage here.


2.5 Resolve LTI block deprecation status

  • Either document a concrete removal plan with target dates, or remove the
    deprecation notice at lti/lti.py:297

Why: The block is marked deprecated in favor of xblock-lti-consumer yet is fully
maintained (1,019 lines). Current ambiguity discourages contribution and signals
contradictory intent to maintainers and consumers.


Tier 3 — Continuous Improvement

3.1 Break up god files in capa/

  • Extract response type classes from responsetypes.py (3,866 lines,
    too-many-lines suppressed) into a responsetypes/ sub-package
  • Work toward reducing capa_block.py (2,737 lines) and inputtypes.py (1,822
    lines)

Prerequisite: Tier 1.2 (coverage floor) must be in place before touching these
files.


3.2 Adopt Ruff; begin adding type annotations

  • Add ruff to quality stack (replaces pycodestyle + isort, gains auto-fix)
  • Systematically clear 35 lint-amnesty markers (carried over from extraction;
    concentrated in lti.py x13, video.py x8)
  • Add type annotations to public method signatures starting from legacy_utils/
    currently only 14 return-type annotations exist across the entire xblocks_contrib/
    source
  • Add mypy or pyright (even --ignore-missing-imports mode) to the quality
    tox environment

3.3 Improve developer documentation

  • Create CONTRIBUTING.rst covering: XBlock extraction workflow, Waffle flag
    integration, running the full test matrix locally, JS test setup
  • Add a "Quick Start" section to README.rst covering the Python dev loop
    end-to-end (currently split between a 13-line docs/getting_started.rst and
    docs/testing.rst, neither linked from README)
  • Fix duplicate 0.16.0 entry in CHANGELOG.rst (appears on lines 17 and 24 with
    different fixes)
  • Fill or remove the four empty Sphinx stub files: docs/quickstarts/index.rst,
    concepts/index.rst, how-tos/index.rst, references/index.rst
  • Remove debug log artifact: lti_2_util.py:130-132 "#### COPY AND PASTE AUTHORIZATION HEADER ####"

3.4 Fix O(N) language scans

  • video.py:547-563 validate() — O(N×M) scan over transcripts × ALL_LANGUAGES;
    replace inner scan with dict lookup
  • video_transcripts_utils.py:194-235 get_endonym_or_label() — linear scan of
    ALL_LANGUAGES per call, called in a loop; convert to @lru_cache dict keyed by
    language code

3.5 Lazy-load heavy imports in responsetypes.py

  • Move module-level imports of numpy, html5lib, shapely.geometry, symmath,
    calc, pyparsing to function-level lazy imports
  • Evaluate the lxml.html.soupparser.fromstring as fromstring_bs import (comment in
    code: "uses Beautiful Soup!!! FIXME?")

Why: capa_block.pycapa_problem.pyresponsetypes.py — every worker process
importing ProblemBlock pays full cold-start cost for all these scientific/parsing
libraries.


3.6 Test quality fixes

  • Add pytest-xdist to requirements/test.in and configure addopts = -n auto for
    parallelism
  • Move pip install -e . from tox.ini commands to deps (currently defeats tox
    environment caching)
  • Fix non-deterministic test data in test_discussion.py_random_string() uses
    unseeded random.choice at collection time, making failures non-reproducible
  • Remove print debug statements from test_discussion.py:110 and :142
  • Fix str vs bytes assertion inconsistency in test_video.py:865 vs :875
  • Convert five near-identical HtmlBlockIndexingTestCase methods to
    @pytest.mark.parametrize
  • Convert for-loop in test_annotatable.py test_annotation_class_attr_with_invalid_highlight to @pytest.mark.parametrize
  • Remove tox from requirements/test.in:7 (circular — tox is installed inside
    tox environments)

Root Causes

Understanding why these issues accumulated helps prevent them from returning:

  1. Extraction without consolidation. Each block was extracted from edx-platform
    carrying its own copy of shared utilities. No consolidation pass followed. The fix is
    mechanical but requires deliberate scheduling.

  2. Inherited debt treated as permanent. The lint-amnesty markers, six,
    setup.py, and pylint suppressions were carried as temporary pragmatism. Without an
    explicit "pay extraction debt" phase, temporary became permanent. The unmerged
    pyproject.toml branch is the canonical example — the work is done but not landed.

  3. Missing safety nets enable entropy. No coverage thresholds, no type checking, no
    auto-formatter means each contribution can silently degrade quality. The codebase
    doesn't deteriorate through malice — it deteriorates because there's no automated
    ratchet preventing regression.


Suggested Sprint Sequencing

Sprint 1 (Foundation — parallelizable across 2-3 devs)
├── Merge pyproject.toml branch [1.1]
├── Add --cov-fail-under=60, fix CI env naming + coverage upload [1.2, 2.4]
└── Remove six [1.4]

Sprint 2 (Safety + Performance — depends on Sprint 1)
├── Write tests for video_handlers.py, xml_utils.py, poll.py [1.2 continued]
├── Consolidate VAL calls in VideoBlock [1.3]
└── Cache problem_types + editable_metadata_fields [2.2]

Sprint 3 (Consolidation — depends on Sprint 2 coverage)
├── Move shared utilities to legacy_utils/ [2.1]
├── Fix WordCloud JS inlining [2.3]
└── Resolve LTI deprecation [2.5]

Ongoing (paced by developer pain)
├── Adopt Ruff [3.2] — improves every subsequent PR immediately
├── Break up god files [3.1] — only after coverage is adequate
└── Documentation [3.3] — opportunistically

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    🆕 New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions