MHDBDB TEI Repository

TEI-encoded Middle High German literary texts with semantic annotations and two web interfaces, from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), University of Salzburg.

Live: Main Site | Playground

Overview

All data originates from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB) at the University of Salzburg, a research project with over 50 years of scholarship on medieval texts and concepts.

Corpus Content

TEI-encoded texts of Middle High German literature
Authority files: persons, works, lexicon, concepts, genres, names, variants
Pre-built indices for fast search
Comprehensive test suite with Playwright integration

Two Web Interfaces

Feature	Main Site (index.html)	Playground (playground/)
Purpose	Public search & reading	Advanced research & analysis
Data	Pre-built indices	Pre-built authority index + lazy-loaded TEI
Search	Single lemma with filters	Multiple search types (incl. multi-lemma)
Target Users	General public, students	Researchers, medievalists

📚 Documentation

For Developers

CLAUDE.md - Primary developer guide and project overview
docs/INDEX.MD - Comprehensive knowledge base gateway with links to specialized documentation

For Users

Playground includes built-in help and search examples
Authority data browsing with filtering and sorting

Quick Start

Start Web Server

npm run serve
# Opens http://localhost:8080

Build Indices (Optional)

Pre-built indices are included. To rebuild:

npm run build              # Build CSS + all indices + manifest
npm run build:authority    # Build authority index only
npm run build:corpus       # Build corpus index only
npm run validate:indices   # Validate generated indices

Run Tests

npm test                   # Run all tests
npm run test:ui            # Interactive test UI
npm run test:headed        # Run with visible browser

Programmatic Access

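TEI files reference authority entries by their xml:id, either as a bare fragment (#person_445) or as a file-qualified pointer (lexicon.xml#lemma_38952):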

<author ref="#person_445">Meister Eckhart</author>
<w lemmaRef="lexicon.xml#lemma_38952" pos="N" ana="lexicon.xml#lemma_38952_sense_1">vriunt</w>
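A reference like the ones above can be resolved by splitting it at the `#` and looking up the matching xml:id in the target file. The following is a minimal sketch using only the Python standard library. The helper names (`split_ref`, `find_by_xml_id`) and the sample entry markup are hypothetical and are not taken from the repository; the real structure of lexicon.xml may differ.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:id attribute as ElementTree exposes it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_ref(ref, current_file=None):
    """Split 'lexicon.xml#lemma_38952' into (file, id).

    A bare fragment such as '#person_445' has no file part, so the
    caller-supplied current_file is returned instead.
    """
    file_part, _, fragment = ref.partition("#")
    return (file_part or current_file, fragment)


def find_by_xml_id(root, xml_id):
    """Return the first element whose xml:id equals xml_id, or None."""
    for element in root.iter():
        if element.get(XML_ID) == xml_id:
            return element
    return None


# Hypothetical stand-in for an authority file (assumed markup).
SAMPLE_LEXICON = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="lemma_38952"><form><orth>vriunt</orth></form></entry>
</TEI>"""

root = ET.fromstring(SAMPLE_LEXICON)
file_name, lemma_id = split_ref("lexicon.xml#lemma_38952")
entry = find_by_xml_id(root, lemma_id)
print(file_name, lemma_id, entry is not None)
```

For repeated lookups it would likely be cheaper to build a dictionary from xml:id to element once per file instead of scanning the tree each time.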

XPath Examples

//tei:persName[@type='preferred']                    # All preferred person names
//tei:w[contains(@lemmaRef, 'lemma_38952')]          # All tokens linked to lemma 'vriunt'
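The second query can be approximated in Python without extra dependencies. This sketch assumes the standard TEI namespace and uses `xml.etree.ElementTree`, whose XPath subset has no `contains()`, so the attribute test is done in Python. The sample document and the function name `tokens_for_lemma` are illustrative only, and `lemma_1` is a made-up identifier.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical two-token sample; real corpus files carry more markup.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><l>
    <w lemmaRef="lexicon.xml#lemma_38952" pos="N">vriunt</w>
    <w lemmaRef="lexicon.xml#lemma_1" pos="N">man</w>
  </l></body></text>
</TEI>"""


def tokens_for_lemma(root, lemma_id):
    """Return all <w> elements whose lemmaRef points at lemma_id."""
    suffix = "#" + lemma_id
    return [
        w
        for w in root.iterfind(".//tei:w[@lemmaRef]", NS)
        if w.get("lemmaRef", "").endswith(suffix)
    ]


hits = tokens_for_lemma(ET.fromstring(SAMPLE_TEI), "lemma_38952")
print([w.text for w in hits])
```

Matching on the `#lemma_38952` suffix is slightly stricter than `contains()`: a substring test would also match a longer identifier that merely begins with the same digits.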

Authority Files

File	Content
persons.xml	Authors and historical persons
works.xml	Works and manuscript metadata
lexicon.xml	Lemmata with grammatical annotations
concepts.xml	Semantic concepts (taxonomy)
genres.xml	Literary genres (taxonomy)
names.xml	Proper names with semantic relations
variants.xml	Orthographic variants mapped to lemmas

Schema

Custom RELAX NG schemas validate all TEI files in the repository:

Schema	Validates	Files
`mhdbdb.rnc`	Corpus texts (`tei/*.tei.xml`)	666
`mhdbdb-authority.rnc`	Authority files (`authority-files/*.xml`)	8

The schemas constrain standard TEI attributes to MHDBDB conventions (required @xml:id on <w>, @join on <pc>, allowed div/@type values, etc.). See schema/README.md for rationale and design decisions.

Architecture

Pre-Built Indices

The repository includes pre-built compressed indices for fast loading:

Index	Contains
authority-index.json.gz	All authority files merged
corpus-index.json.gz	Texts with lemma positions

Features:

Compressed JSON format for reduced download size
IndexedDB caching with automatic expiration
No XML parsing overhead in browser
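Outside the browser, a gzip-compressed JSON file can be read with the Python standard library alone. The sketch below round-trips a small payload through a temporary file to show the pattern. The payload shape is hypothetical; the actual layout of authority-index.json.gz and corpus-index.json.gz is not documented here and should be inspected before relying on any keys.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_index(data, path):
    """Serialize a dict as gzip-compressed UTF-8 JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False)


def load_index(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical payload, not the repository's real index schema.
sample = {"lemma_38952": {"lemma": "vriunt", "positions": [3, 17]}}

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "sample-index.json.gz"
    write_index(sample, index_path)
    restored = load_index(index_path)

print(sorted(restored))
```

Calling `load_index` on one of the shipped index files and printing its top-level keys is a quick way to explore the real structure.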

Technology Stack

Frontend: Vanilla JavaScript (ES Modules), Tailwind CSS
Compression: Pako (gzip)
Storage: Dexie.js (IndexedDB wrapper)
Testing: Playwright
Build: Python + lxml for index generation
Server: http-server (npm) or Python http.server

Middle High German Normalization

All search functions use centralized MHG character normalization:

Long vowels: â→a, ê→e, î→i, ô→o, û→u
Umlauts: ä→ae, ö→oe, ü→ue
Parity between Python (build) and JavaScript (runtime)
Comprehensive automated test coverage
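The mappings listed above can be expressed as a single translation table. This is a minimal sketch of the idea, not the repository's implementation: the function name `normalize_mhg` is hypothetical, the Unicode NFC step is an added assumption so that decomposed input (base letter plus combining mark) is handled, and only the lowercase characters named in the list are mapped.

```python
import unicodedata

# Character mappings taken from the list above.
MHG_CHAR_MAP = {
    "â": "a", "ê": "e", "î": "i", "ô": "o", "û": "u",  # long vowels
    "ä": "ae", "ö": "oe", "ü": "ue",                    # umlauts
}
_TABLE = str.maketrans(MHG_CHAR_MAP)


def normalize_mhg(text):
    """Apply the long-vowel and umlaut mappings to text.

    Assumption: input is composed to NFC first, so 'a' followed by a
    combining circumflex is treated the same as the single character.
    """
    return unicodedata.normalize("NFC", text).translate(_TABLE)


print(normalize_mhg("mâze"), normalize_mhg("künec"))
```

Keeping the mapping in one data structure makes it easier to keep a second implementation in another language in step with it, since both can be checked against the same table of input and output pairs.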

License & Contact

License: CC BY-NC-SA 4.0
Contact: mhdbdb@plus.ac.at | https://mhdbdb.plus.ac.at
Project: University of Salzburg, 50+ years of medievalist research

Acknowledgement

This project was supported by CLARIAH-AT.

CLARIAH-AT provides essential infrastructure and support for digital humanities research in Austria. We gratefully acknowledge this contribution.

