
Kardenwort

Kontext. Kern. Karte. (Context. Core. Card.)

License: MIT

Kardenwort is an intelligent command-line utility designed to accelerate language learning by deconstructing text and automatically creating context-rich flashcards for Anki. It serves as a powerful offline companion to your study materials, transforming any text—books, articles, or AI-generated content—into a structured vocabulary list ready for efficient learning.

This tool is not just a word collector; it's an intelligent pipeline powered by two NLP libraries, large dictionaries, semantic rules, and a user-trainable override system to achieve high-accuracy lemmatization and word deconstruction, especially for grammatically complex languages like German.

The Kardenwort Philosophy in Brief

The goal of Kardenwort is to reduce the complexity of language learning, particularly for synthetic languages like German where words are heavily inflected and compounded. It achieves this by automating the difficult task of deconstructing words to their base form (lemma).

Our core principles are:

  • Separating Reading from Study: Reduce cognitive load by splitting content consumption and vocabulary acquisition into two distinct, focused activities.
  • Medium Independence: Kardenwort is a companion to your learning material, not a replacement. Use it with physical books, PDFs, or any other media without losing the original context (diagrams, formatting, etc.).
  • Offline First & Privacy: The entire process runs locally. Your data is never sent to the cloud, ensuring privacy and reliability.
  • Simple is Not Easy: We do the complex work of linguistic analysis to provide you with a simple, clean, and actionable list of words, making your learning process easy.

Return to Top

Key Features

  • Intelligent Lemmatization: Uses spaCy to accurately find the base form of words.
  • Advanced German Deconstruction: Employs german-compound-splitter (GCS) to break down long German compound words into their components.
  • User-Trainable: Fine-tune the lemmatization for your specific texts using a simple lemma_override.tsv file. Corrections are saved forever and automatically reapplied.
  • Rich Context: Each word card includes the original sentence and surrounding context.
  • Dual Card Types: Generates both vocabulary cards (word type) and full sentence cards (sentence type) in a single run with mixed-triple mode.
  • Hierarchical Deck Creation: Automatically build nested Anki decks from Markdown headers (#, ##) in your source text.
  • Automatic Deck Descriptions: Populates Anki deck descriptions with the full source text and translations, providing valuable context directly within the deck browser.
  • Granular Deck Control: Generate sentence-level subdecks for highly organized study sets.
  • Fully Configurable Field Mapping: Decouple your Anki Note Type from the source code. Map any field (e.g., Quotation, WordSource) to internal linguistic data via config.ini.
  • Multi-Language Support: Currently supports English (en) and German (de).
  • Direct Anki Integration: Automatically imports generated cards into Anki via a runner script.
  • GoldenDict-ng Integration: Create vocabulary lists on-the-fly directly from your favorite dictionary application.
  • Auditory-Focused Cards: The template is designed to work with audio, helping you practice listening and pronunciation.
  • Configuration-Driven Intelligence: Extraction features (wordlists, sorting, indexing) are automatically enabled based on your Anki field mapping, reducing CLI complexity and ensuring consistent output.

Return to Top

Key Advantages and Differences from Alternatives

While many text-processing tools for language learners exist (e.g., LingQ, Readlang, LanguageCrush, Lute, LWT, FLTR, alexandria-reader, Lemmatize, LinguaCafe, VocabSieve, AnkiMorphs, FrequencyMan, Watch Foreign Language Movies with Anki (movies2anki), Vocab Tracker, Language Reactor, asbplayer, Yet Another Language Learning Media Player (yallmp), subs2srs, Dualsub, YouTube™ Dual Subtitles, Smart Book, ReadEra, Yomitan (Yomichan), Local Audio Server for Yomichan, GoldenDict-ng), Kardenwort offers a unique combination of capabilities:

  • Superior German Language Processing: No other tool provides this level of German vocabulary deconstruction. Kardenwort correctly parses compound nouns, finds verbs with separable prefixes, and handles capitalization properly—a common pain point in other systems.
  • Complete Freedom After Export: Unlike integrated readers where a flashcard is tied to the source text, our output is a fully autonomous TSV file. You have complete control to edit any field in Anki on any device, truly freeing your data.
  • Quality You Can Influence: While the initial analysis relies on spaCy, you can directly influence the results. By training the system through the lemma_override.tsv file, you can achieve perfect processing for your specific texts and domain.

Return to Top

Project Structure

20250913122858-kardenwort/
├── data/
│   ├── de/
│   │   ├── deu-mixed-typical-2011-1m-words.csv
│   │   ├── german.dic
│   │   └── lemma_override_de.tsv
│   └── en/
│       ├── en-news-2023-1m-words.csv
│       └── lemma_override_en.tsv
├── docs/
│   ├── assets/
│   │   └── ...
│   └── kardenwort-goldendict-config.txt
├── results/
│   ├── 20251115160000-morgen-faehrt-der-neue.triple.sentence.de.json
│   ├── 20251115160000-morgen-faehrt-der-neue.triple.sentence.de.tsv
│   └── 20251115160030-morgen-faehrt-der-neue.triple.word.de.tsv
├── source_texts/
│   ├── text1.txt
│   ├── text2.txt
│   └── text3.txt
├── src/
│   └── kardenwort/
│       └── core/
│           ├── kardenwort.py
│           └── kardenwort_runner.py
├── tests/
│   ├── cases/
│   └── source_texts/
│       ├── de/
│       └── en/
├── .gitignore
├── config.ini
├── config.ini.template
├── LICENSE
└── README.md

Return to Top

Installation and Setup

Follow these steps to get the entire Kardenwort ecosystem up and running.

Prerequisites:

  • Python 3.9: It is strongly recommended to use this specific version.

    Important for Windows Users: Versions of Python higher than 3.9 (e.g., 3.10+) may require a C++ compiler (like Visual Studio Build Tools) to install dependencies such as spaCy. To avoid these compilation issues, we recommend installing Python 3.9 directly from the Microsoft Store, which provides a hassle-free setup.

  • Anki Desktop: Must be installed and running.
  • AnkiConnect Add-on: Install the AnkiConnect add-on in Anki.

    ⚠️ Important Dependency for Deck Descriptions: the automatic deck-description feature (--anki-deck-content) requires a specific, modified version of AnkiConnect.

    Please download and install it from this repository: https://github.com/voothi/20251110002755-kardenwort-ankiconnect

    If you use the standard AnkiConnect add-on, all other features will work correctly, but deck descriptions will not be updated.

Setup Steps

  1. Clone the Repositories: Clone all three projects into a common parent directory. For example, create a folder named kardenwort-ecosystem and clone the repositories inside it.

    mkdir kardenwort-ecosystem
    cd kardenwort-ecosystem
    git clone https://github.com/kardenwort/20250913122858-kardenwort.git
    git clone https://github.com/kardenwort/20250913123240-kardenwort-anki-csv-importer.git
    git clone https://github.com/kardenwort/20250913123501-kardenwort-anki-templates.git

    Your final structure will be:

    kardenwort-ecosystem/
    ├── 20250913122858-kardenwort/
    ├── 20250913123240-kardenwort-anki-csv-importer/
    └── 20250913123501-kardenwort-anki-templates/
    
  2. Import the Anki Template: In the 20250913123501-kardenwort-anki-templates project, navigate to the decks-for-first-initialize-templates directory. Choose the latest version folder (e.g., v1.0.0), select one of the .apkg deck files inside, and import it into Anki Desktop. This will automatically add and configure the required note type.

  3. Set up a Shared Python Environment: We will create a single virtual environment one level above the project folders. This keeps the project directories clean and allows all scripts to use the same set of installed packages.

    # First, navigate into the main project directory
    cd 20250913122858-kardenwort
    
    # Create the virtual environment in the parent directory (../)
    python -m venv ../20250914043440-kardenwort-spacy-env
    
    # Activate it
    ../20250914043440-kardenwort-spacy-env/Scripts/Activate.ps1  # Windows (PowerShell)
    # source ../20250914043440-kardenwort-spacy-env/bin/activate # macOS/Linux
    
    # Now that the environment is active, install dependencies from the requirements file
    pip install -r requirements.txt
    
    # Download SpaCy language models
    python -m spacy download en_core_web_lg
    python -m spacy download de_core_news_lg
  4. Configure Kardenwort:

    • While still inside the 20250913122858-kardenwort directory, copy config.ini.template to config.ini.
    • Open config.ini and verify the paths under [environment]. The default relative paths are designed for this structure and should work without changes.
  5. Run a Test:

    • Add some German text to source_texts/text1.txt.
    • Ensure Anki is running with the modified AnkiConnect add-on.
    • From the root of the 20250913122858-kardenwort project, execute the runner script. Important: Your virtual environment must be active.
    # This creates vocabulary (word) cards from a single German text file
    python src/kardenwort/core/kardenwort_runner.py --type word --mode single --language de

    If successful, a new deck will appear in Anki. Your setup is complete.

Usage and Workflows

Command-Line Runner

The primary way to use the utility is via the kardenwort_runner.py script, which automates the entire process of text analysis and Anki import.

For a comprehensive and up-to-date list of command-line examples for various scenarios, please refer to the configuration file: docs/kardenwort-goldendict-config.txt

Examples:

# Create German vocabulary cards from text1.txt and text2.txt with compound splitting
python src/kardenwort/core/kardenwort_runner.py --type word --mode dual --language de --de-gcs

# Create English sentence cards from text1.txt and text2.txt
python src/kardenwort/core/kardenwort_runner.py --type sentence --mode dual --language en

# Process a single string of text directly, suspend new cards
python src/kardenwort/core/kardenwort_runner.py --type word --mode single --language de --text "Das ist ein Test." --suspend-cards

# NEW: Process a markdown file in a single pass, creating both sentence and word cards in a
# shared hierarchical deck, and add the source text to the parent deck's description.
python src/kardenwort/core/kardenwort_runner.py --mode mixed-triple --language de --anki-markdown-decks --anki-deck-content parent-source --suspend-cards

Using Pre-configured Windows CMD Scripts

For Windows users, we provide a collection of ready-to-use batch scripts (.cmd) that cover all common processing scenarios. You can find them in the scripts/run/cmd/ directory (e.g., kardenwort_run_de_ws_t3_s_anki_v3.cmd).

These scripts offer a convenient way to run the tool without typing out all the arguments. However, they come with a significant limitation.

⚠️ Important Limitation: Single-Line Processing

Please be aware that these .cmd scripts have a limitation when used for on-the-fly text processing: they can only handle a single line of input.

This restriction applies when text is passed directly via the --text argument or from standard input (stdin), which is a common method for integration with tools like GoldenDict.

To process multi-line text in GoldenDict, you must bypass these convenient .cmd scripts. The correct approach is to configure GoldenDict to call the kardenwort_runner.py script directly, utilizing the --multi-text flag. You can find the correct commands for this in the provided configuration file: docs/kardenwort-goldendict-config.txt.

GoldenDict-ng Integration

Create vocabulary lists or Anki cards instantly from any word or phrase you look up in GoldenDict. This is a powerful workflow for on-the-fly analysis.

You can configure multiple "program" dictionaries in GoldenDict to run Kardenwort with different settings. For example, for German, you could have three modes:

  • Simple (S): Fast analysis without compound splitting.
  • Medium (M): Analysis with compound splitting for common word types.
  • Large (L): Deepest analysis, splitting compounds for almost all word types.

For detailed instructions and ready-to-use command-line examples, see the configuration file: docs/kardenwort-goldendict-config.txt

GoldenDict-ng Main Window

Return to Top

Core Functionality: The Two Main Modes

The utility's primary goal is to extract material from text to create two types of cards, determined by the --type parameter:

  1. --type word (Vocabulary Cards):

    • Goal: To create cards for studying individual words.
    • Mechanism: The script analyzes the entire input text, extracts unique words based on the chosen deduplication scope, reduces them to their base form (lemma), and creates a separate row for each unique lemma.
    • Specialty: This mode includes advanced logic like German compound splitting (GCS) and handling of separable verbs.
  2. --type sentence (Sentence Cards):

    • Goal: To create cards with full sentences for studying phrases and grammar in context.
    • Mechanism: The script processes input files line-by-line. For each content line from the first file, one record is created. If parallel texts are provided, the corresponding lines are added to the same record.

The result of the script's execution is a TSV file and an optional companion JSON metadata file (for deck descriptions), ready for import into Anki.

Return to Top

Understanding Input Processing

How the utility receives and interprets input data is key to its effective use.

  • Ways to Provide Data: You can provide text via a command-line string (--text "..."), a file path (--text1-file ...), an environment variable, or piped through standard input.
  • File Format: Input files must be plain text (.txt) with UTF-8 encoding. For parallel texts, line-by-line correspondence is crucial.

The Hybrid Mechanism for Sentence Splitting

This is a critical feature. The utility automatically chooses how to split text into "processing units":

  1. Line-by-Line Mode: If the input text contains at least one newline character (\n), each line is treated as a separate, complete unit. This is ideal for subtitles or pre-formatted parallel texts.
  2. Sentence Tokenization Mode: If the input text is a single block without newlines, spaCy's sentence tokenizer is used to grammatically split it into sentences. This is perfect for prose from articles or books.

This mechanism directly determines what you will see as SentenceSource and context on your Anki card.
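
As an illustrative sketch (not the actual Kardenwort code), the dispatch rule can be pictured like this. The real tool uses spaCy's sentence tokenizer; here it is stubbed with a naive regex, and the function name `split_into_units` is hypothetical:

```python
import re

def split_into_units(text, sentencize=None):
    """Sketch of the hybrid splitting rule.

    If the text contains a newline, treat each non-empty line as one
    processing unit; otherwise fall back to a sentence tokenizer
    (spaCy in the real tool, a naive regex stand-in here)."""
    if "\n" in text:
        return [line for line in text.splitlines() if line.strip()]
    sentencize = sentencize or (lambda t: re.split(r"(?<=[.!?])\s+", t.strip()))
    return sentencize(text)
```

With this rule, `"Guten Morgen.\nWie geht's?"` yields two line units, while the single block `"Guten Morgen. Wie geht's?"` is split grammatically into two sentences.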

Multi-Text Input from a Single Source

Using the --multi-text flag, you can provide up to three parallel texts (source, translation 1, translation 2) from a single source like --text or standard input. Simply separate the texts with ---. This is especially useful for integration with tools like GoldenDict.

# Example with multi-text
echo "Source text. --- First translation. --- Second translation." | python ... --multi-text
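
The separator handling can be sketched as follows (a hypothetical helper, shown only to illustrate the assumed behavior; the real CLI does this internally):

```python
def parse_multi_text(raw, separator="---", max_parts=3):
    """Sketch: split one input into up to three parallel texts
    (source, translation 1, translation 2), padding missing parts
    with empty strings."""
    parts = [part.strip() for part in raw.split(separator)][:max_parts]
    return parts + [""] * (max_parts - len(parts))

src, t1, t2 = parse_multi_text(
    "Source text. --- First translation. --- Second translation."
)
```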

Return to Top

The Processing Pipeline in Detail

  1. Initialization: The script loads the spaCy model, GCS dictionary, user-defined lemma_override.tsv, and a word frequency index.
  2. Text Ingestion: Input text is read from a file, argument, environment variable, or stdin.
  3. Tokenization & Lemmatization: The text is broken into words (tokens). Each token undergoes a series of steps: GCS, separable verb handling, lemma correction, and application of user override rules.
  4. Collection & Sorting: Unique lemmas are collected based on the deduplication scope and sorted. Known words (from the frequency index) are listed first, followed by unknown words.
  5. TSV & JSON Generation: A structured TSV file is created. If --anki-deck-content is used, a companion .json file containing deck descriptions is also generated.
  6. Anki Import: The runner script passes the TSV and JSON files to the kardenwort-anki-csv-importer, which creates/updates decks and cards in Anki.
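
The known-first ordering of step 4 can be sketched roughly as below. This is assumed behavior, not the project's code; `frequency_rank` stands in for the 1M-word frequency index files shipped under data/:

```python
def sort_lemmas(lemmas, frequency_rank):
    """Sketch: lemmas present in the frequency index come first,
    ordered by rank (most frequent first); unknown lemmas follow,
    sorted alphabetically."""
    known = sorted((l for l in lemmas if l in frequency_rank),
                   key=frequency_rank.get)
    unknown = sorted(l for l in lemmas if l not in frequency_rank)
    return known + unknown
```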

Return to Top

The Anki Card Template

The generated TSV files are designed for our feature-rich Anki template, which organizes the information into a clean and interactive layout.

An example of a generated German vocabulary card using the template

Template Features:

  • Interactive Collapsible Sections: Keep cards uncluttered by hiding and revealing information groups.
  • Dynamic Fields: Fields only appear if they contain data. The 82-column TSV format includes special fields like SentenceSourceIndex for chronological sorting and Deck for dynamic, hierarchical deck assignment.
  • Integrated Audio: Supports both pre-recorded audio and text-to-speech.
  • Context Display: Shows the word in its original sentence, plus the preceding and succeeding sentences.
  • Full Word List: Displays all unique words (lemmas) found in the source sentence.

Return to Top

Command-Line Arguments Reference

Below is a detailed list of all available arguments for the core processing script (kardenwort.py) and its runner (kardenwort_runner.py).

Core Arguments

  • --type: The type of cards to create (word or sentence). Not needed for mixed-triple mode. Example: --type word
  • --lemmas-per-line: A special mode that outputs one line of sorted lemmas per input line. Mutually exclusive with --type.
  • --language: The source language of the text (de or en). Example: --language de
  • --mode: (Runner only) Processing mode (single, dual, triple, mixed-triple). mixed-triple runs the sentence and word modes sequentially into a shared deck. Example: --mode mixed-triple
  • --anki-csv-header: (Runner only) JSON list of Anki field names; overrides [anki_fields] from config.ini. Example: --anki-csv-header '["FieldA", "FieldB"]'
  • --anki-field-mapping: (Runner only) JSON object mapping Anki fields to data sources; overrides [anki_field_mapping.*] from config.ini. Example: --anki-field-mapping '{"FieldA": "lemma"}'

Input & Output

  • --text: Process a string directly. Mutually exclusive with --text1-file. Example: --text "This is a test."
  • --multi-text: Parse --text or stdin as up to three texts separated by ---.
  • --text1-file: Path to the primary source text file. Example: --text1-file "source.txt"
  • --text2-file: Path to the second text file (e.g., a translation). Example: --text2-file "target.txt"
  • --text3-file: Path to the third text file. Example: --text3-file "extra.txt"
  • --output-file: Path for the output .tsv file. If omitted, prints to standard output. Example: --output-file "out/my_deck.tsv"
  • --basename-add-timestamp: Prepend a YYYYMMDDHHMMSS- timestamp to the output filename.
  • --basename-add-first-words: Append a slug built from the first N words to the filename (default: 4). Example: --basename-add-first-words 3
  • --stdout-print-output-basename: Print the final output filename to standard output.

Anki Deck Control & Import Options

  • --anki-create-subdecks: Generate a parent deck with a subdeck for each mode (e.g., My-Text::My-Text.word.de).
  • --anki-markdown-decks: Parse Markdown headers in the source text to create a hierarchical deck structure.
  • --anki-sentence-subdecks: Create a final subdeck level for each sentence. Requires --anki-markdown-decks.
  • --anki-parent-deck: Manually specify a parent deck name for shared deck creation. Example: --anki-parent-deck "My-Book"
  • --anki-deck-content: Populate Anki deck descriptions. Choices: parent-source, parent-translations, subdeck-source, subdeck-translations. Example: --anki-deck-content parent-source
  • --strip-headers: Strip Markdown headers from text fields in the final output. Choices: all, source, translations; defaults to all if no value is given. Example: --strip-headers source
  • --suspend-cards: Suspend all newly imported or updated cards in Anki.

Card Content & Formatting

  • --sentence-context-size: Number of preceding and succeeding sentences (N) to include as context. Runner default is 4. Example: --sentence-context-size 2
  • --tts-destination-lang: The destination language for TTS field activation (e.g., ru, en). Example: --tts-destination-lang ru
  • --add-wordlist-col: (Auto-enabled) Include a list of unique words in SentenceSourceWordlist. Driven by the field mapping.
  • --wordlist-use-br: Use <br> tags in the wordlist. Can be set in config.ini under [output_format].
  • --add-header: Include the TSV header row. Can be set in config.ini under [output_format].
  • --add-source-word-col: (Auto-enabled) Add the inflected word to WordSourceInflectedForm. Driven by the field mapping.
  • --add-sentence-index-col: (Auto-enabled) Add a sorting index to SentenceSourceIndex. Driven by the field mapping.

NLP & Lemmatization Control

  • --lemma-override-file: Path to a TSV file with context-aware lemma overrides. Example: --lemma-override-file "data/overrides.tsv"
  • --lemma-index-file: Path to a word-frequency CSV file used for sorting. Example: --lemma-index-file "data/frequency.csv"
  • --deduplication-scope: Scope for lemma deduplication. global: unique lemmas across the entire text; sentence: unique per sentence; none: no deduplication. Example: --deduplication-scope sentence
  • --prefer-shortest-form: With global deduplication, prefer the shortest word form of a lemma instead of the first one encountered.
  • --force-proper-noun-capitalization: Force capitalization of proper-noun lemmas (PROPN).

German Compound Splitting (GCS) Options

  • --de-gcs: Enable German Compound Splitting.
  • --de-dictionary-file: Path to the dictionary file used by GCS for validation. Example: --de-dictionary-file "data/de/german.dic"
  • --de-gcs-preserve-compound-word: Include the original compound word in the card list along with its split parts.
  • --de-gcs-add-parts-to-wordlist: Also add the split components to the SentenceSourceWordlist field.
  • --de-gcs-split-mode: Splitting mode: only-nouns (safe), any (aggressive), or combined. Example: --de-gcs-split-mode combined
  • --de-gcs-pos-tags: Part-of-speech tags to apply splitting to (e.g., NOUN PROPN or !VERB). Example: --de-gcs-pos-tags "NOUN PROPN"
  • --de-fix-genitive: Attempt to correct German genitive noun lemmas (e.g., Hauses -> Haus).
  • --de-force-noun-capitalization: Force capitalization of all German noun lemmas (NOUN, PROPN).
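
One plausible reading of the --de-gcs-pos-tags whitelist/blacklist syntax is sketched below. These semantics are an assumption for illustration, not taken from the source, and the helper name is hypothetical:

```python
def pos_tag_allowed(tag, spec="NOUN PROPN"):
    """Hypothetical interpretation of --de-gcs-pos-tags: plain tags
    form a whitelist; a leading '!' turns the spec into a blacklist."""
    tokens = spec.split()
    blacklist = {t[1:] for t in tokens if t.startswith("!")}
    if blacklist:
        return tag not in blacklist
    return tag in tokens
```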

Runner-Specific & UX Options

  • --show-success-message: Display a user-friendly success message on standard output upon completion.
  • --play-sound-on-completion: Play a system beep upon successful completion of the entire process.

Standard Output (STDOUT) Options

These flags are for direct console output when --output-file is not used.

  • --stdout-format: Format for console output: list, context, tsv, or html. Example: --stdout-format html

Return to Top

Configuration

The behavior of the kardenwort_runner.py script is controlled by config.ini.

  1. Copy config.ini.template to config.ini.
  2. Open config.ini and edit the paths under the [environment] section to match your system's setup.
    • python_executable: Path to the Python executable inside your virtual environment.
    • kardenwort_workspace: Path to this project's root folder.
    • importer_workspace: Path to the kardenwort-anki-csv-importer project folder.

Relative paths are supported and are calculated from the location of the config.ini file, making the setup portable.
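
This resolution rule can be sketched as follows (an illustrative helper under the stated assumption, not the runner's actual code):

```python
from pathlib import Path

def resolve_config_path(value, config_file="config.ini"):
    """Sketch: interpret a relative path from config.ini relative to
    the directory that contains config.ini itself, keeping the setup
    portable; absolute paths pass through unchanged."""
    path = Path(value)
    if path.is_absolute():
        return path
    return (Path(config_file).resolve().parent / path).resolve()
```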

Configuration Priority and Overrides

Kardenwort follows a strict hierarchy for resolving settings:

  1. Command-Line Arguments: Any argument passed directly to kardenwort_runner.py or kardenwort.py takes the highest priority. This allows you to override global defaults for specific runs.
  2. config.ini Settings: If an argument is not provided via CLI, the script falls back to the values defined in your configuration file.
  3. Internal Defaults: If neither a CLI argument nor a config setting is present, the script uses safe, built-in defaults.
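
The three-level lookup amounts to the following sketch (illustrative only; the function name and argument shapes are assumptions):

```python
def resolve_setting(name, cli_args, config_values, defaults):
    """Sketch of the priority chain:
    CLI argument > config.ini value > built-in default."""
    if cli_args.get(name) is not None:   # explicit CLI flag wins
        return cli_args[name]
    if name in config_values:            # then the config file
        return config_values[name]
    return defaults[name]                # finally the safe default
```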

Note

Since version 2.0.0, output formatting options like --wordlist-use-br and --add-header should be primarily managed in the [output_format] section of config.ini for a cleaner CLI experience.

Return to Top

Flexible Anki Field Mapping

Kardenwort uses a configuration-driven system to map linguistic analysis results to your specific Anki Note Type. This allows you to use any Note Type without modifying the source code.

1. Defining your Note Type

In the [anki_fields] section of config.ini, list the fields of your Anki Note Type in the exact order they appear:

[anki_fields]
Quotation
WordSource
SentenceSource
SentenceSourceWordlist
...

Tip

You no longer need to number your fields (e.g., 1 = Quotation). A simple list is preferred; the system automatically calculates indices based on the line order.
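
Reading such an unnumbered, order-sensitive list is straightforward with Python's configparser, shown here as a sketch of the assumed mechanism (the real runner may parse it differently):

```python
import configparser

# Sketch: read a bare [anki_fields] list while preserving declaration
# order and original capitalization.
cfg = configparser.ConfigParser(allow_no_value=True)
cfg.optionxform = str  # keep field names case-sensitive
cfg.read_string("""
[anki_fields]
Quotation
WordSource
SentenceSource
""")
fields = list(cfg["anki_fields"])                    # order follows the file
index_of = {name: i for i, name in enumerate(fields)}
```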

2. Mapping Data Sources

Use the [anki_field_mapping.word] and [anki_field_mapping.sentence] sections to assign internal data to these fields.

[anki_field_mapping.word]
WordSource = lemma
Quotation = source_word
SentenceSource = source_sentence

Available Data Source Keys

The mode in which each key is available is shown in parentheses:

  • lemma (word): The base form of the word (lemmatized).
  • source_word (word): The original inflected word from the text.
  • source_sentence (both): The current sentence/unit being processed.
  • source_context_left (both): Preceding context sentence(s).
  • source_context_right (both): Succeeding context sentence(s).
  • target_sentence (both): Primary translation of the source sentence.
  • target_context_left (both): Preceding translation context.
  • target_context_right (both): Succeeding translation context.
  • tertiary_sentence (both): Tertiary translation (if available).
  • tertiary_context_left (both): Preceding tertiary translation context.
  • tertiary_context_right (both): Succeeding tertiary translation context.
  • cloze (both): The source sentence, intended for cloze deletion.
  • wordlist (both): A list of all unique lemmas found in the sentence.
  • sentence_index (both): The serial index of the sentence (e.g., 000001).
  • deck_name (both): The final computed Anki deck name.
  • tts_source_[lang] (both): TTS flag (e.g., tts_source_de); set to "1" on match.
  • tts_dest_[lang] (both): TTS flag (e.g., tts_dest_en); set to "1" on match.
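
Putting the two sections together, one TSV row is produced by looking up each Anki field's mapped data-source key. The sketch below illustrates this assumed mechanism with a hypothetical helper:

```python
def build_tsv_row(field_order, mapping, data):
    """Sketch: for each Anki field (in [anki_fields] order), emit the
    value of its mapped data-source key; unmapped fields stay empty."""
    cells = [str(data.get(mapping.get(field, ""), "")) for field in field_order]
    return "\t".join(cells)

row = build_tsv_row(
    ["Quotation", "WordSource", "SentenceSource"],
    {"WordSource": "lemma", "Quotation": "source_word"},
    {"lemma": "fahren", "source_word": "fährt"},
)
```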

Return to Top

Important Notes

  • TSV File Persistence: The generated TSV export files in the results/ directory are not automatically deleted or rotated. You can use them for your own analysis or manually re-import them into Anki at any time.
  • Data Privacy: This utility is designed for offline use. Your text data is processed locally and is not sent to any external servers by this program. However, be aware that if you use Anki's synchronization feature, your card data will be stored on Anki's servers.

Return to Top

Our Ecosystem

Kardenwort is part of a suite of integrated tools designed to work together seamlessly.

Development and Testing

If you need the latest updates, want to access intermediate versions, or wish to explore the development history and feature branches, please refer to our dedicated development repositories where active development takes place.

Every two weeks, the code is cleanly transferred from these development repos to the main public repositories. A new stable build is then created and tagged with a common version number across all related projects.

Running Tests and Coverage

The project uses pytest for all testing. The test suite is organized into three distinct tiers:

  • tests/01_smoke/: Extremely fast, high-level sanity checks to ensure the CLI boots and basic string extractions work without fatal errors.
  • tests/02_unit/: Granular tests targeting isolated functions, particularly core lexical logic (kardenwort.py) and command-line configurations (kardenwort_runner.py).
  • tests/03_integration/: End-to-end tests that process full parallel text files dynamically discovered from the tests/cases/* directory. These tests physically generate TSV outputs and perform deep verification of field order, frequency-based sorting, and content matches against reference files.

Tip

Fastest First Logic: The test directories are prefixed with numbers (01_, 02_, 03_) to ensure pytest executes the fastest tests first. This "Fail Fast" approach ensures you catch basic errors in seconds before waiting for the heavy integration analysis.

Commands: Ensure your virtual environment is active before running tests.

# Run ALL tests (smoke, unit, and integration) from the active virtual environment
python -m pytest tests/ -v

# Run only a specific suite (e.g., unit tests)
python -m pytest tests/02_unit/ -v

# Run tests and generate a code coverage report for the source code
python -m pytest tests/ -v --cov=src --cov-report=term-missing

Development Repositories

For those who want the latest features, bug fixes, or wish to explore the development history, we maintain a set of active development repositories. Code is periodically merged from these repos into the stable public ones listed above.

Return to Top

My Personal Motivation

This project was born from my own struggle and eventual success in learning German. With a background in IT and software development, I approached language learning as an engineering problem. This tool is the result of years of refinement, built to solve the real-world problems I faced. My goal is to make a powerful, simple, and reliable tool that can help others on their own language learning journeys. My native languages are Russian and Ukrainian, and I am passionate about creating tools that can help bridge cultural and linguistic divides.

Return to Top

Kardenwort Ecosystem

This project is part of the Kardenwort environment, designed to create a focused and efficient learning ecosystem.

Return to Top

License and Acknowledgements

This project was created by and is maintained by Denis Novikov (voothi).

It is licensed under the MIT License. See the LICENSE file for details.

This project relies on the following excellent open-source libraries:

  • spaCy - Industrial-Strength Natural Language Processing. (License)
  • german-compound-splitter - A library for splitting German compound words. (License)

Return to Top