TEI-encoded Middle High German literary texts with semantic annotations and two web interfaces, from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), University of Salzburg.
Live: Main Site | Playground
All data originates from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB) at the University of Salzburg — a research project with over 50 years of medievalist text and concept scholarship.
- TEI-encoded texts of Middle High German literature
- Authority files: persons, works, lexicon, concepts, genres, names, variants
- Pre-built indices for fast search
- Comprehensive test suite with Playwright integration
| Feature | Main Site (index.html) | Playground (playground/) |
|---|---|---|
| Purpose | Public search & reading | Advanced research & analysis |
| Data | Pre-built indices | Pre-built authority index + lazy-loaded TEI |
| Search | Single lemma with filters | Multiple search types (incl. multi-lemma) |
| Target Users | General public, students | Researchers, medievalists |
- CLAUDE.md - Primary developer guide and project overview
- docs/INDEX.MD - Comprehensive knowledge base gateway with links to specialized documentation
- Playground includes built-in help and search examples
- Authority data browsing with filtering and sorting
npm run serve
# Opens http://localhost:8080
Pre-built indices are included. To rebuild:
npm run build # Build CSS + all indices + manifest
npm run build:authority # Build authority index only
npm run build:corpus # Build corpus index only
npm run validate:indices # Validate generated indices
npm test # Run all tests
npm run test:ui # Interactive test UI
npm run test:headed # Run with visible browser
TEI files reference authority data via xml:id:
<author ref="#person_445">Meister Eckhart</author>
<w lemmaRef="lexicon.xml#lemma_38952" pos="N" ana="lexicon.xml#lemma_38952_sense_1">vriunt</w>
Example XPath queries:
//tei:persName[@type='preferred'] # All preferred person names
//tei:w[contains(@lemmaRef, 'lemma_38952')] # All tokens linked to lemma 'vriunt'
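The reference mechanism above can be tried out with Python's standard library. The following is a minimal sketch, not the project's own build code: the two miniature documents, the `<person>` wrapper element, and the helper names `resolve_ref` and `tokens_for_lemma` are assumptions made for illustration. It uses `xml.etree.ElementTree` rather than lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature documents; the real files are far larger and
# their exact element structure may differ.
PERSONS_XML = f"""<TEI xmlns="{TEI_NS}">
  <person xml:id="person_445">
    <persName type="preferred">Meister Eckhart</persName>
  </person>
</TEI>"""

TEXT_XML = f"""<TEI xmlns="{TEI_NS}">
  <author ref="#person_445">Meister Eckhart</author>
  <w xml:id="w_1" lemmaRef="lexicon.xml#lemma_38952" pos="N">vriunt</w>
  <w xml:id="w_2" lemmaRef="lexicon.xml#lemma_100" pos="V">ist</w>
</TEI>"""


def resolve_ref(ref, authority_root):
    """Return the element whose xml:id equals the fragment of `ref`, or None."""
    target = ref.split("#", 1)[-1]
    for el in authority_root.iter():
        if el.get(XML_ID) == target:
            return el
    return None


def tokens_for_lemma(text_root, lemma_id):
    """Return all <w> tokens whose lemmaRef fragment equals `lemma_id`."""
    return [
        w for w in text_root.iter(f"{{{TEI_NS}}}w")
        if w.get("lemmaRef", "").split("#", 1)[-1] == lemma_id
    ]


persons = ET.fromstring(PERSONS_XML)
text = ET.fromstring(TEXT_XML)

# Follow the author's ref into the authority data.
author = text.find(f"{{{TEI_NS}}}author")
person = resolve_ref(author.get("ref"), persons)
preferred = person.find(f"{{{TEI_NS}}}persName[@type='preferred']")
print(preferred.text)

# Collect every token linked to the lemma.
hits = tokens_for_lemma(text, "lemma_38952")
print([w.text for w in hits])
```

The token lookup compares the whole fragment after `#` instead of using a substring test, because `contains(@lemmaRef, 'lemma_38952')` would also match a longer identifier such as `lemma_389520`.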
| File | Content |
|---|---|
| persons.xml | Authors and historical persons |
| works.xml | Works and manuscript metadata |
| lexicon.xml | Lemmata with grammatical annotations |
| concepts.xml | Semantic concepts (taxonomy) |
| genres.xml | Literary genres (taxonomy) |
| names.xml | Proper names with semantic relations |
| variants.xml | Orthographic variants mapped to lemmas |
Custom RELAX NG schemas validate all TEI files in the repository:
| Schema | Validates | Files |
|---|---|---|
| mhdbdb.rnc | Corpus texts (tei/*.tei.xml) | 666 |
| mhdbdb-authority.rnc | Authority files (authority-files/*.xml) | 8 |
The schemas constrain standard TEI attributes to MHDBDB conventions (required @xml:id on <w>, @join on <pc>, allowed div/@type values, etc.). See schema/README.md for rationale and design decisions.
The repository includes pre-built compressed indices for fast loading:
| Index | Contains |
|---|---|
| authority-index.json.gz | All authority files merged |
| corpus-index.json.gz | Texts with lemma positions |
Features:
- Compressed JSON format for reduced download size
- IndexedDB caching with automatic expiration
- No XML parsing overhead in browser
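For scripting outside the browser, a gzip-compressed JSON index can be read with Python's standard library alone. The sketch below shows only the decompress-then-parse step; the sample payload and its keys are hypothetical, since the actual layout of `corpus-index.json.gz` is not described here.

```python
import gzip
import json


def load_index(raw: bytes) -> dict:
    """Decompress gzip bytes and parse the JSON payload inside."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Hypothetical index content for illustration; the real schema may differ.
sample = {"lemma_38952": [{"text": "example_text", "positions": [12, 87]}]}
raw = gzip.compress(json.dumps(sample, ensure_ascii=False).encode("utf-8"))

index = load_index(raw)
print(index["lemma_38952"][0]["positions"])

# For a file on disk, pass its bytes instead, for example:
# load_index(pathlib.Path("corpus-index.json.gz").read_bytes())
```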
- Frontend: Vanilla JavaScript (ES Modules), Tailwind CSS
- Compression: Pako (gzip compression)
- Storage: Dexie.js (IndexedDB wrapper)
- Testing: Playwright
- Build: Python + lxml for index generation
- Server: http-server (npm) or Python http.server
All search functions use centralized MHG character normalization:
- Long vowels: â→a, ê→e, î→i, ô→o, û→u
- Umlauts: ä→ae, ö→oe, ü→ue
- Parity between Python (build) and JavaScript (runtime)
- Comprehensive automated test coverage
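The mappings listed above can be sketched as a small Python function. This is an illustration rather than the project's actual normalizer: the name `normalize_mhg` and the initial NFC step are assumptions, and only the eight lowercase characters from the list are handled.

```python
import unicodedata

# Character mappings taken from the list above.
MHG_MAP = str.maketrans({
    "â": "a", "ê": "e", "î": "i", "ô": "o", "û": "u",
    "ä": "ae", "ö": "oe", "ü": "ue",
})


def normalize_mhg(text: str) -> str:
    """Fold MHG long vowels and umlauts to plain ASCII sequences."""
    # Assumption: compose to NFC first so that a base letter followed by a
    # combining diacritic is treated like the precomposed character.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(MHG_MAP)


print(normalize_mhg("zît"))
print(normalize_mhg("vröude"))
```

Applying the same function to both the indexed forms and the user's query is what makes a search for `zit` find `zît`; any JavaScript counterpart would need the identical table to keep build and runtime in step.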
- License: CC BY-NC-SA 4.0
- Contact: mhdbdb@plus.ac.at | https://mhdbdb.plus.ac.at
- Project: University of Salzburg, 50+ years of medievalist research
This project was supported by CLARIAH-AT.
CLARIAH-AT provides essential infrastructure and support for digital humanities research in Austria. We gratefully acknowledge this contribution.