An index and toolkit for the Encyclopedia of Triangle Centers (ETC).
Our goal is to modernize the ETC by:
- Structuring Data: Parsing unstructured HTML into rigorous Python objects and data structures.
- Semantic Linking: Establishing explicit, machine-verifiable links between centers (e.g., X(1) to X(2)) and glossary terms.
- Enhanced Documentation: Generating high-quality reStructuredText (RST) documentation compatible with Sphinx, featuring MathJax for coordinates and equations.
- Integration: Preparing the data for integration with geometor-explorer and other tools in the GEOMETOR ecosystem.
The project employs a multi-stage pipeline to process the ETC:
- Ingestion: Splitting the massive source HTML files into manageable, individual center files.
- Parsing: Extracting structured data (trilinears, barycentrics, notes) from each center's HTML.
- Glossary Extraction: Automatically identifying and extracting terms from the ETC glossary.
- Transformation: Generating RST files with:
  * Canonical cross-references (e.g., :ref:`X(1) <X(1)>`).
  * Automatic glossary term linking (e.g., :term:`isogonal conjugate`).
  * Metadata for sorting and categorization.
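The linking part of the transformation stage can be illustrated with a minimal sketch. It is not the project's actual code: the function names `link_centers` and `link_terms` are hypothetical, and it assumes that centers appear in the text as `X(n)` and that glossary terms are matched case-sensitively as whole words.

```python
import re

# Matches center mentions such as "X(1)" or "X(12345)".
CENTER_RE = re.compile(r"X\((\d+)\)")


def link_centers(text: str) -> str:
    """Wrap each X(n) mention in a Sphinx :ref: role (hypothetical helper)."""
    return CENTER_RE.sub(lambda m: f":ref:`{m.group(0)} <{m.group(0)}>`", text)


def link_terms(text: str, terms: list[str]) -> str:
    """Wrap known glossary terms in a Sphinx :term: role (hypothetical helper)."""
    if not terms:
        return text
    # Try longer terms first so "isogonal conjugate" wins over "isogonal".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(lambda m: f":term:`{m.group(0)}`", text)


note = "X(2) is the isogonal conjugate of X(6)."
linked = link_terms(link_centers(note), ["isogonal conjugate"])
print(linked)
```

Applying the center pass before the term pass keeps the two substitutions independent in this sketch, since the generated :ref: markup contains no glossary terms.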
To run the pipeline:
    python3 -m geometor.etc --input /path/to/source --split-dir /path/to/split --output /path/to/docsrc/centers

Options:
- --limit N: Process only the first N centers (useful for testing).
- --skip-ingest: Skip the HTML splitting step if already done.
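For readers who want to see how such a command line could be modeled, here is a sketch using the standard library's `argparse`. Only the flag names come from the command above; the function name `build_parser`, the help strings, and the choice to make the three path arguments required are assumptions, not the project's actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="geometor.etc")
    # Whether these three are required is an assumption.
    parser.add_argument("--input", required=True, help="location of the source HTML")
    parser.add_argument("--split-dir", required=True, help="where per-center HTML files go")
    parser.add_argument("--output", required=True, help="where generated RST files go")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="process only the first N centers")
    parser.add_argument("--skip-ingest", action="store_true",
                        help="skip the HTML splitting step")
    return parser


args = build_parser().parse_args(
    ["--input", "/path/to/source",
     "--split-dir", "/path/to/split",
     "--output", "/path/to/docsrc/centers",
     "--limit", "10", "--skip-ingest"]
)
print(args.limit, args.skip_ingest)
```

Note that `argparse` exposes `--split-dir` and `--skip-ingest` as the attributes `split_dir` and `skip_ingest`.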
The pipeline generates:
- Individual RST files for each center (e.g., x-00001.rst).
- A glossary.rst file with extracted definitions.
- An index.rst using the collection directive to organize the centers.
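The file naming shown above can be sketched as follows. The helper name `center_filename` is hypothetical, and the five-digit zero padding is inferred from the single example x-00001.rst rather than confirmed by the project.

```python
def center_filename(number: int) -> str:
    """Return an RST filename for center X(number), zero-padded to five digits."""
    if number < 1:
        raise ValueError("center numbers start at 1")
    return f"x-{number:05d}.rst"


names = sorted(center_filename(n) for n in (10, 2, 1))
print(names)  # ['x-00001.rst', 'x-00002.rst', 'x-00010.rst']
```

One benefit of zero padding is that a plain lexicographic sort of the filenames matches the numeric order of the centers, as long as the numbers fit in the padded width.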
Dependencies:
- beautifulsoup4
- docutils
- sphinx
- photon-platform (for configuration)