I asked Claude about code improvements in this repository with the following prompt:
Prompt:
Use a "Fan-out" / "fan-in" (N researchers → synthesizer) approach to research the
question "how best should I optimize this codebase"?
Minimum 5 subagents, use Sonnet to do the research and individual contemplation; Opus to synthesize.
Agents run by Claude:
1. Architecture & code structure — module organization, inheritance, coupling
2. Test coverage & quality — gaps, brittleness, fixtures
3. Performance & runtime — queries, caching, asset handling
4. Dependency management & build tooling — packaging, uv/pyproject.toml, CI
5. Code duplication & refactoring — DRY opportunities, dead code
6. Developer experience & documentation — docs, types, linting, onboarding
This is Claude's analysis and reply:
Claude reply:
Executive Summary
The codebase is functional but carries significant technical debt inherited from its
edx-platform extraction. The three most consequential issues are:
- VideoBlock's redundant database calls on every render (direct learner-facing latency)
- No coverage thresholds plus critically under-tested handler code (video_handlers.py at 16.9% line coverage), making refactoring unsafe
- Legacy setup.py build system with unpinned dependencies and a dead six dependency that blocks modernization
Pervasive copy-paste duplication across XBlocks is a natural artifact of extraction
without a consolidation pass. Addressing these in order compounds: build system → CI
safety → safe refactoring → performance.
Tier 1 — Critical / Do First
1.1 Merge the pyproject.toml branch
Why: The pyproject.toml branch already exists and is ready to land. The current
setup.py is 120+ lines of bespoke requirement-loading logic that exposes zero version
constraints to downstream consumers. This unblocks uv migration, proper dep bounds,
and all other CI fixes.
1.2 Establish a coverage floor
Why: Every refactoring item in Tiers 2 and 3 requires this safety net first.
1.3 Fix VideoBlock's redundant VAL database calls
Why: Direct learner-facing latency. VideoBlock is the most-rendered XBlock in the
platform.
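One way to picture the consolidation is a per-render memo that lets several helpers share a single lookup. The sketch below is only illustrative: CountingBackend, RenderContext, and render are hypothetical stand-ins, not the VAL API or VideoBlock code, and the returned fields are invented for the example.

```python
from typing import Dict


class CountingBackend:
    """Hypothetical stand-in for a VAL-like data source; counts its queries."""

    def __init__(self) -> None:
        self.calls = 0

    def get_video_info(self, video_id: str) -> dict:
        self.calls += 1
        return {
            "id": video_id,
            "urls": {"desktop_mp4": f"https://example.invalid/{video_id}.mp4"},
            "poster": f"https://example.invalid/{video_id}.png",
        }


class RenderContext:
    """Memoizes backend lookups for the lifetime of one render."""

    def __init__(self, backend: CountingBackend) -> None:
        self._backend = backend
        self._cache: Dict[str, dict] = {}

    def video_info(self, video_id: str) -> dict:
        # Only the first request for a given video reaches the backend.
        if video_id not in self._cache:
            self._cache[video_id] = self._backend.get_video_info(video_id)
        return self._cache[video_id]


def render(ctx: RenderContext, video_id: str) -> dict:
    # Three values that would otherwise each trigger their own lookup.
    return {
        "sources": ctx.video_info(video_id)["urls"],
        "video_id": ctx.video_info(video_id)["id"],
        "poster": ctx.video_info(video_id)["poster"],
    }


backend = CountingBackend()
result = render(RenderContext(backend), "abc123")
```

After the render, backend.calls is 1 even though three values were read. A real fix would need to confirm that the individual VAL calls can actually be served from one response.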
1.4 Remove the six dependency
Why: python_requires=">=3.12" makes every six usage a dead import. All
replacements are 1:1 stdlib equivalents.
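As a rough illustration of what "1:1 stdlib equivalents" means on Python 3, the six.moves names map directly onto built-ins. The function names below are hypothetical and are not taken from responsetypes.py or safe_exec.py.

```python
# Before (Python 2/3 compatibility shims):
#   from six.moves import map, range, zip
#   from six.moves import xrange
# After: Python 3 built-ins are used directly, so no import is needed.


def pairwise_sums(left, right):
    """Add two sequences element by element using built-in map and zip."""
    return list(map(lambda pair: pair[0] + pair[1], zip(left, right)))


def squares(count):
    """Built-in range is already lazy, which is what xrange provided."""
    return [n * n for n in range(count)]
```

Because the built-ins behave the same as the six.moves aliases on Python 3, the change is usually limited to deleting the import lines and renaming xrange to range.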
Tier 2 — High Value
2.1 Consolidate cross-XBlock duplication into legacy_utils/
The extraction left utilities copy-pasted across modules with no consolidation pass.
Why: ~200 lines of duplication eliminated; single source of truth prevents string
drift (the edx-platform.* attribute keys are platform contracts — a typo silently
breaks user attribute lookups).
2.2 Cache ProblemBlock.problem_types and VideoBlock.editable_metadata_fields
Why: Eliminates redundant XML parsing and metadata computation on every Studio
editing request.
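A minimal sketch of the caching idea, assuming the property only depends on self.data. ProblemSketch, its parse counter, and the set of known tags are hypothetical; the standard-library ElementTree parser is used here in place of lxml so the example runs without extra dependencies.

```python
import xml.etree.ElementTree as ET
from functools import cached_property


class ProblemSketch:
    """Hypothetical stand-in for a block whose property re-parses XML."""

    # Illustrative subset of response tags, not the real registry.
    KNOWN_TAGS = {"multiplechoiceresponse", "numericalresponse"}

    def __init__(self, data: str) -> None:
        self.data = data
        self.parse_count = 0

    @cached_property
    def problem_types(self) -> set:
        # The body runs once per instance; later reads hit the cached value.
        self.parse_count += 1
        root = ET.fromstring(self.data)
        return {el.tag for el in root.iter() if el.tag in self.KNOWN_TAGS}


block = ProblemSketch(
    "<problem><multiplechoiceresponse/><numericalresponse/></problem>"
)
first = block.problem_types
second = block.problem_types
```

One design caveat: cached_property never recomputes on its own. If data can change during the object's lifetime, the cached value has to be dropped explicitly (for example with del block.problem_types) whenever data is reassigned.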
2.3 Fix WordCloudBlock JS inlining
Why: VideoBlock already does this correctly. 3 word clouds on one page = 480 KB of
non-cacheable JS inlined into the DOM on every render.
2.4 Fix CI environment naming + add Python 3.13
Why: Python 3.13 shipped October 2024 and is 7+ months old with no CI coverage here.
2.5 Resolve LTI block deprecation status
Why: The block is marked deprecated in favor of xblock-lti-consumer yet is fully
maintained (1,019 lines). Current ambiguity discourages contribution and signals
contradictory intent to maintainers and consumers.
Tier 3 — Continuous Improvement
3.1 Break up god files in capa/
Prerequisite: Tier 1.2 (coverage floor) must be in place before touching these
files.
3.2 Adopt Ruff; begin adding type annotations
3.3 Improve developer documentation
3.4 Fix O(N) language scans
3.5 Lazy-load heavy imports in responsetypes.py
Why: capa_block.py → capa_problem.py → responsetypes.py — every worker process
importing ProblemBlock pays full cold-start cost for all these scientific/parsing
libraries.
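The function-level import pattern can be sketched as follows. The standard-library fractions module stands in for a heavy dependency such as numpy, and evaluate_fraction is a hypothetical name, not a function from responsetypes.py.

```python
def evaluate_fraction(text):
    """Parse a fraction string such as '3/4'."""
    # Imported inside the function so the import cost is paid on first use,
    # not when the enclosing module is imported.
    from fractions import Fraction

    return Fraction(text)
```

After the first call, Python finds the module in sys.modules, so repeated calls only pay for a dictionary lookup. The trade-off is that import errors surface at call time rather than at startup.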
3.6 Test quality fixes
Root Causes
Understanding why these issues accumulated helps prevent them from returning:
- Extraction without consolidation. Each block was extracted from edx-platform carrying its own copy of shared utilities. No consolidation pass followed. The fix is mechanical but requires deliberate scheduling.
- Inherited debt treated as permanent. The lint-amnesty markers, six, setup.py, and pylint suppressions were carried as temporary pragmatism. Without an explicit "pay extraction debt" phase, temporary became permanent. The unmerged pyproject.toml branch is the canonical example: the work is done but not landed.
- Missing safety nets enable entropy. No coverage thresholds, no type checking, no auto-formatter means each contribution can silently degrade quality. The codebase doesn't deteriorate through malice; it deteriorates because there's no automated ratchet preventing regression.
Suggested Sprint Sequencing
Sprint 1 (Foundation — parallelizable across 2-3 devs)
├── Merge pyproject.toml branch [1.1]
├── Add --cov-fail-under=60, fix CI env naming + coverage upload [1.2, 2.4]
└── Remove six [1.4]
Sprint 2 (Safety + Performance — depends on Sprint 1)
├── Write tests for video_handlers.py, xml_utils.py, poll.py [1.2 continued]
├── Consolidate VAL calls in VideoBlock [1.3]
└── Cache problem_types + editable_metadata_fields [2.2]
Sprint 3 (Consolidation — depends on Sprint 2 coverage)
├── Move shared utilities to legacy_utils/ [2.1]
├── Fix WordCloud JS inlining [2.3]
└── Resolve LTI deprecation [2.5]
Ongoing (paced by developer pain)
├── Adopt Ruff [3.2] — improves every subsequent PR immediately
├── Break up god files [3.1] — only after coverage is adequate
└── Documentation [3.3] — opportunistically
Tier 1 — Critical / Do First
1.1 Merge the pyproject.toml branch
- Merge origin/pyproject.toml into main
- Remove setup.py entirely
- Remove [wheel] universal = 1 in setup.cfg (incorrect for a python_requires=">=3.12" package)
- requirements/base.in: currently all 28 deps are completely unpinned
- Replace python setup.py sdist bdist_wheel with python -m build in pypi-publish.yml:34
Why: The pyproject.toml branch already exists and is ready to land. The current setup.py is 120+ lines of bespoke requirement-loading logic that exposes zero version constraints to downstream consumers. This unblocks uv migration, proper dep bounds, and all other CI fixes.
1.2 Establish a coverage floor
- Add --cov-fail-under=60 to pytest config (ratchet up over time)
- Fix the coverage upload (django52 coverage is silently discarded)
- Least-covered files:
  - video/video_handlers.py: 16.9% line / 0% branch (AJAX dispatch layer handling student transcript/studio interactions)
  - video/bumper_utils.py: 26.9% line / 0% branch
  - legacy_utils/xml_utils.py: 34.4% line / 10.9% branch (shared by all 8 XBlocks)
  - poll/poll.py: 46.9% line / 9.1% branch (only 2 tests; student_view, all handlers, and XML round-trip untested)
  - discussion/discussion.py: 63.9% line / 25% branch (student_view, author_view, student_view_data all untested)
Why: Every refactoring item in Tiers 2 and 3 requires this safety net first.
1.3 Fix VideoBlock's redundant VAL database calls
- Reduce get_html() from 3 serial DB calls to 1: get_urls_for_profiles (video.py:323), get_video_info (video.py:349), get_course_video_image_url in _poster() (video.py:1164)
- get_context() dual VAL calls (video.py:803 + :829): the second call subsumes the first; the first result is unused
- Add @request_cached to _poster() (video.py:1158)
- get_available_transcript_languages() (video_transcripts_utils.py:79): 10 videos in a vertical currently fire 10 separate DB calls
Why: Direct learner-facing latency. VideoBlock is the most-rendered XBlock in the platform.
1.4 Remove the six dependency
- Replace six.moves.map, six.moves.range, six.moves.zip in responsetypes.py:30 with stdlib equivalents
- Remove six usage in inputtypes.py:54
- Replace six.moves.xrange in safe_exec/safe_exec.py:51 with range
- Remove six in test files: response_xml_factory.py:5, test_inputtypes.py:29, test_safe_exec.py:19
- Remove six from requirements/base.in and requirements/test.in:6
Why: python_requires=">=3.12" makes every six usage a dead import. All replacements are 1:1 stdlib equivalents.
Tier 2 — High Value
2.1 Consolidate cross-XBlock duplication into legacy_utils/
The extraction left utilities copy-pasted across modules with no consolidation pass.
- Move stringify_children to legacy_utils/: currently defined verbatim in problem/stringify.py, html/html.py:111, and poll/poll.py:52 (identical docstrings including the StackOverflow URL)
- HTML()/Text (markupsafe wrappers): currently in problem/markup.py, discussion/discussion.py:34-54, poll/poll.py:29-49 with copy-pasted docstrings
- ATTR_KEY_* constants module: ATTR_KEY_DEPRECATED_ANONYMOUS_USER_ID and ATTR_KEY_USER_ID redefined in html/html.py, problem/capa_block.py:68-70, video/constants.py
- SerializationError: identical class defined in both annotatable/annotatable.py:27 and problem/capa_block.py:85
- Consolidate StubUserService (duplicated across problem/tests/__init__.py, lti/tests/helpers.py, and html/tests/) into a single xblocks_contrib/tests/helpers.py
- discussion/discussion.py:137-144 and problem/capa_block.py:436-443: video/video_utils.py:209-213 already has the correct helper; the others should import it
Why: ~200 lines of duplication eliminated; single source of truth prevents string drift (the edx-platform.* attribute keys are platform contracts, so a typo silently breaks user attribute lookups).
2.2 Cache ProblemBlock.problem_types and VideoBlock.editable_metadata_fields
- Add @cached_property to ProblemBlock.problem_types (capa_block.py:631-639): it calls etree.XML(self.data) (full DOM parse) on every access, and is called in index_dictionary() twice and in has_support()
- Cache VideoBlock.editable_metadata_fields across its 3 accesses per Studio request: studio_view line 256, get_context lines 781 and 789
Why: Eliminates redundant XML parsing and metadata computation on every Studio editing request.
2.3 Fix WordCloudBlock JS inlining
- Switch word_cloud.py from load_unicode to add_javascript_url for d3.min.js (139 KB), d3.layout.cloud.js (12.3 KB), and word_cloud.js (8.5 KB)
- annotatable.py, lti.py, and poll.py also use load_unicode for JS/CSS
Why: VideoBlock already does this correctly. 3 word clouds on one page = 480 KB of non-cacheable JS inlined into the DOM on every render.
2.4 Fix CI environment naming + add Python 3.13
- Align TOXENV values with the tox envlist: CI passes django42/django52 but tox.ini declares py312-django{42,52} (the naming mismatch means CI environments are not in the declared envlist)
- Add Python 3.13 to CI (python-tests.yml:17) and update classifiers in setup.py:155 / pyproject.toml
- Add cache: 'pip' to actions/setup-python steps
Why: Python 3.13 shipped October 2024 and is 7+ months old with no CI coverage here.
2.5 Resolve LTI block deprecation status
- Deprecation notice at lti/lti.py:297
Why: The block is marked deprecated in favor of xblock-lti-consumer yet is fully maintained (1,019 lines). Current ambiguity discourages contribution and signals contradictory intent to maintainers and consumers.
Tier 3 — Continuous Improvement
3.1 Break up god files in capa/
- Split responsetypes.py (3,866 lines, too-many-lines suppressed) into a responsetypes/ sub-package
- capa_block.py (2,737 lines) and inputtypes.py (1,822 lines)
3.2 Adopt Ruff; begin adding type annotations
- Add ruff to the quality stack (replaces pycodestyle + isort, gains auto-fix)
- lint-amnesty markers: carried over from extraction; concentrated in lti.py (x13) and video.py (x8)
- legacy_utils/: currently only 14 return-type annotations exist across the entire xblocks_contrib/ source
- Add mypy or pyright (even in --ignore-missing-imports mode) to the quality tox environment
3.3 Improve developer documentation
- CONTRIBUTING.rst covering: XBlock extraction workflow, Waffle flag integration, running the full test matrix locally, JS test setup
- README.rst covering the Python dev loop end-to-end (currently split between a 13-line docs/getting_started.rst and docs/testing.rst, neither linked from README)
- 0.16.0 entry in CHANGELOG.rst (appears on lines 17 and 24 with different fixes)
- docs/quickstarts/index.rst, concepts/index.rst, how-tos/index.rst, references/index.rst
- lti_2_util.py:130-132 "#### COPY AND PASTE AUTHORIZATION HEADER ####"
3.4 Fix O(N) language scans
- video.py:547-563 validate(): O(N×M) scan over transcripts × ALL_LANGUAGES; replace inner scan with dict lookup
- video_transcripts_utils.py:194-235 get_endonym_or_label(): linear scan of ALL_LANGUAGES per call, called in a loop; convert to @lru_cache dict keyed by language code
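The dict-lookup idea can be sketched as below. This assumes ALL_LANGUAGES is a sequence of (code, name) pairs; the three-entry sample and the helper names language_names, label_for, and unknown_codes are hypothetical and do not mirror the real validate() or get_endonym_or_label() signatures.

```python
from functools import lru_cache

# Hypothetical sample; the real ALL_LANGUAGES setting is much longer.
ALL_LANGUAGES = (("en", "English"), ("es", "Spanish"), ("fr", "French"))


@lru_cache(maxsize=1)
def language_names():
    """Build the code -> name mapping once and reuse it on later calls."""
    return dict(ALL_LANGUAGES)


def label_for(code, default=None):
    """Constant-time lookup instead of scanning ALL_LANGUAGES per call."""
    return language_names().get(code, default)


def unknown_codes(transcript_codes):
    """Return transcript language codes that are not in ALL_LANGUAGES."""
    names = language_names()
    return [code for code in transcript_codes if code not in names]
```

With the mapping built once, checking N transcripts costs N dictionary lookups rather than N scans of the full language list. The cached dict is shared, so callers should treat it as read-only.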
3.5 Lazy-load heavy imports in responsetypes.py
- Move numpy, html5lib, shapely.geometry, symmath, calc, pyparsing to function-level lazy imports
- lxml.html.soupparser.fromstring as fromstring_bs import (comment in code: "uses Beautiful Soup!!! FIXME?")
Why: capa_block.py → capa_problem.py → responsetypes.py: every worker process importing ProblemBlock pays full cold-start cost for all these scientific/parsing libraries.
3.6 Test quality fixes
- Add pytest-xdist to requirements/test.in and configure addopts = -n auto for parallelism
- Move pip install -e . from tox.ini commands to deps (currently defeats tox environment caching)
- test_discussion.py: _random_string() uses unseeded random.choice at collection time, making failures non-reproducible
- Remove print debug statements from test_discussion.py:110 and :142
- str vs bytes assertion inconsistency in test_video.py:865 vs :875
- Convert HtmlBlockIndexingTestCase methods to @pytest.mark.parametrize
- Convert the for-loop in test_annotatable.py test_annotation_class_attr_with_invalid_highlight to @pytest.mark.parametrize
- Remove tox from requirements/test.in:7 (circular: tox is installed inside tox environments)
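For the unseeded-random item, one possible direction is to pass an explicitly seeded generator into the helper. This is a sketch under assumptions: random_string and SEED are hypothetical, and the real _random_string() in test_discussion.py may take different arguments.

```python
import random
import string

SEED = 1234  # Hypothetical fixed seed; any constant works.


def random_string(rng, length=8):
    """Build a string from a caller-supplied, seeded generator."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


# Two generators created with the same seed yield the same sequence,
# so a failing test input can be regenerated exactly.
first = random_string(random.Random(SEED))
second = random_string(random.Random(SEED))
```

Using a dedicated random.Random instance rather than seeding the module-level generator keeps the test data independent of other code that draws random numbers.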