
## Data & Content

- [ ] **Standards parsing (quick win)** — Extract CCSS codes from lesson HTML (they're in `ny-list-focusstandards` CSS classes) and create StandardTagging records linking them to lessons/topics. 110 standards already imported for Gr 4-6. This is mechanical parsing, not AI — the standards are explicitly listed in the HTML.
- [ ] **Problem extraction (core data asset)** — AI reads each problem set/exit ticket/homework HTML, identifies individual problems, extracts text/answer/type, creates Problem records. Each Problem links to its Expression records for structured math rendering. This is where we hit the formatting chaos and start normalizing. Multi-tier validation: automated structural checks → Fiverr human review. Design with validation status fields (extracted, auto_validated, human_validated).
- [ ] **Web presentation templates** — Stop serving raw Aspose HTML. Build purpose-built templates for each content type (lesson plan, problem set, exit ticket, homework) using proper web CSS with MathML from Expression records. Depends on Problem extraction to be truly useful — otherwise we're just wrapping HTML blobs. Key insight: Aspose HTML is source data, not presentation layer.
- [ ] **Standards tagging (AI-assisted)** — Beyond the explicit CCSS codes in the HTML, use AI to match lesson content against standard descriptions for deeper tagging. Builds on the mechanical standards parsing above.
- [ ] **Concept dependency graph** — Build prerequisite edges between standards and lessons. E.g., "5.NF.1 (add fractions) requires 4.NF.3 (understand fraction equivalence)." This is the core differentiator. Depends on Problem extraction + Standards tagging.
- [ ] **Inline homework/exit ticket solutions** — Currently links out to Google Drive PDFs. Fetch, parse, and render answer keys inline on lesson pages. 148 homework + 135 exit ticket solution URLs available.
- [ ] **Parent newsletter recovery** — 34 parent newsletter URLs redirect to Google sign-in (broken). Find working URLs or recreate the content.
- [ ] **Grades 4, 6-8 EngageNY content** — Convert the remaining grades' DOCX files through the Aspose pipeline. Source ZIPs available in the concept-research project.
|
- [ ] **Tech launch** — Show HN + Product Hunt in the same week. Drafts in `docs/publicity-plan.md`.
- [ ] **Homeschool push** — Well-Trained Mind Forums, SEA Homeschoolers, Freedom Homeschooling. Time it for July-August (curriculum planning season).
|
## Research

- [ ] **remark-math** — Evaluate https://github.com/remarkjs/remark-math for math rendering in web content. It's a remark/rehype plugin ecosystem for rendering LaTeX math syntax in Markdown/HTML via KaTeX or MathJax. Could be useful if we want to author/store math as LaTeX and render client-side, or for teacher-authored content. Compare with our current approach (MathML from Plurimath rendered natively by browsers). Key questions: do we need client-side rendering, or is server-side MathML sufficient? Does remark-math play well with Hotwire/Turbo?
- [ ] **Plurimath.js for editing** — Plurimath has a JavaScript version that could enable WYSIWYG math editing in the browser. Teachers could create exercises with a dropdown to choose a math format (LaTeX, visual editor, paste from Word). We'd store it as MathML. Evaluate feasibility and integration with Stimulus controllers.

## Infrastructure
|
- [ ] **pgvector embeddings** — Embed lesson content and concept descriptions for semantic search. PostgreSQL + pgvector already planned in the architecture.
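
pgvector does this nearest-neighbour ranking in-database; a stdlib-only toy of the same idea, with made-up 4-d "embeddings" standing in for real model output (the titles, vectors, and query are all invented for illustration):

```ruby
# Cosine similarity: dot product over the product of vector magnitudes.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Fake lesson embeddings; in production these come from an embedding model
# and live in a pgvector column.
LESSONS = {
  "add fractions"        => [0.9, 0.1, 0.0, 0.2],
  "multiply decimals"    => [0.1, 0.8, 0.3, 0.0],
  "fraction equivalence" => [0.6, 0.3, 0.2, 0.1],
}

query = [0.85, 0.15, 0.05, 0.25] # pretend embedding of a search phrase

ranked = LESSONS.max_by { |_title, vec| cosine_similarity(query, vec) }
p ranked.first # => "add fractions"
```

In SQL this whole ranking collapses to an `ORDER BY embedding <=> $1 LIMIT k` query against the pgvector column.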
|
- [x] Broken link audit — 34 parent newsletter URLs identified and cleared
- [x] Publicity plan drafted — 6 phases with draft copy for each channel (`docs/publicity-plan.md`)
- [x] EMBARC extraction plan documented (`docs/embarc-extraction-plan.md`)
- [x] Domain model designed (12 tables): Grade → ContentModule → Topic → Lesson with LessonPlan, ProblemSet, ExitTicket, Homework, Assessment, Problem, Standard, Expression
- [x] HTML normalizer (Nokogiri-based) strips meaningless whitespace from Aspose output
- [x] HTML parser splits lessons into components by detecting Name/Date section boundaries
- [x] Round-trip validation: 149/149 lessons pass (normalize → split → reconstruct = original)
- [x] Browse UI: Grade → Module → Topic → Lesson with Tailwind
- [x] OMML → MathML pipeline: 13,660 expressions extracted, 13,659 converted (99.99%)
- [x] Plurimath fork (jcasimir/plurimath) with fixes for accent/overbar expressions — PRs submitted upstream
- [x] MathConverter wrapper handles project-specific edge cases (track changes, Wingdings symbols) with manifest tracking
- [x] Expression model stores OMML, MathML, text representation, and conversion status
- [x] PDF comparison tooling (pdftoppm + ImageMagick) for visual validation
- [x] Key finding: Aspose HTML is source data, not a presentation layer — web templates should be purpose-built
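
The round-trip guarantee above (normalize → split → reconstruct = original) can be sketched in a few lines. The boundary marker here is a stand-in `<hr/>` and the normalizer is a one-liner, whereas the real pipeline uses Nokogiri and detects Name/Date sections:

```ruby
# Toy round-trip check mirroring the validation idea: normalizing, splitting
# into components, and reconstructing must reproduce the normalized input.

def normalize(html)
  html.gsub(/>\s+</, "><").strip # drop whitespace between tags
end

def split_components(html)
  html.split("<hr/>") # stand-in boundary; real code detects Name/Date sections
end

def reconstruct(components)
  components.join("<hr/>")
end

lesson = "  <h1>Lesson 7</h1>\n  <hr/>\n  <p>Problem Set</p> "
normalized = normalize(lesson)
ok = reconstruct(split_components(normalized)) == normalized
puts ok # round-trip holds for this sample
```

Running the same equality check across all 149 lessons is what gives confidence that the splitter loses nothing.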