feat: add entity pages (entities/) as a first-class wiki type #78
Open
KylinMountain wants to merge 15 commits into
…set_fm_line
Add explicit ordering assertion in test_update_prepends_source_keeps_type
verifying the deterministic json.dumps form ("summaries/b.md", "summaries/a.md").
Pass count=1 to re.sub in _set_fm_line to make first-occurrence intent explicit.
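A minimal sketch of the first-occurrence behaviour described above. The helper `set_fm_line` is a hypothetical stand-in for `_set_fm_line`, whose real signature is not shown in this PR text; only `re.sub(..., count=1)` and the default `json.dumps` list form are taken from the commit message.

```python
import json
import re


def set_fm_line(text: str, key: str, value: str) -> str:
    """Replace only the first `key: ...` line (hypothetical helper)."""
    pattern = rf"^{re.escape(key)}:.*$"
    # A callable replacement avoids backslash-escape surprises in `value`.
    # count=1 makes the "first occurrence only" intent explicit.
    return re.sub(pattern, lambda m: f"{key}: {value}", text,
                  count=1, flags=re.MULTILINE)


page = (
    "---\n"
    "type: person\n"
    'sources: ["summaries/a.md"]\n'
    "---\n"
    "Body\n"
    "sources: not frontmatter\n"
)

# The newest source is prepended; json.dumps gives a stable ", "-separated form.
sources = ["summaries/b.md", "summaries/a.md"]
updated = set_fm_line(page, "sources", json.dumps(sources))
```

Without `count=1`, the later `sources:` line in the body would also be rewritten, which is the case the explicit argument guards against.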
Also wires the entity track into _compile_concepts (Tasks 7 + 8 combined,
since the {entity_briefs} placeholder and the _CONCEPTS_PLAN_USER.format call
are co-dependent — splitting would leave an intermediate red state).
- add _ENTITY_TYPES, _filter_entity_items, _parse_entities_plan
- rewrite _CONCEPTS_PLAN_USER to request nested concepts+entities groups
- add _ENTITY_PAGE_USER / _ENTITY_UPDATE_USER prompts
- read entity briefs and pass both briefs to the plan prompt
- parse nested 'concepts' group with legacy flat-list/flat-dict fallbacks
- generate entities in their own asyncio.gather (4-arity tuples)
- strip ghost links + _write_entity each; handle entity related cross-links
- backlink summary<->entities; pass entity_names/entity_meta to _update_index
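The nested-plan parsing with legacy fallbacks could look roughly like the sketch below. The group shape (`create`/`update` lists of dicts with `name` and `type`), the function names, and the choice to map unknown types to `other` are all assumptions for illustration, not the code in this PR.

```python
import json

# Assumed to mirror _ENTITY_TYPES; the list comes from the PR summary.
ENTITY_TYPES = {"person", "organization", "place", "product",
                "work", "event", "other"}


def filter_entity_items(items):
    """Drop malformed items and normalise unknown types (assumed policy)."""
    kept = []
    for item in items:
        if not isinstance(item, dict) or not item.get("name"):
            continue
        kind = item.get("type", "other")
        kept.append({**item, "type": kind if kind in ENTITY_TYPES else "other"})
    return kept


def parse_plan(raw: str):
    """Return (concepts_group, entities_group) from a plan response."""
    data = json.loads(raw)
    if isinstance(data, list):
        # Legacy flat list: treat every item as a concept to create.
        return {"create": data}, {"create": []}
    if "concepts" not in data and "entities" not in data:
        # Legacy flat dict: the whole object is the concepts group.
        return data, {"create": []}
    concepts = data.get("concepts") or {}
    entities = {
        action: filter_entity_items(items)
        for action, items in (data.get("entities") or {}).items()
        if isinstance(items, list)
    }
    entities.setdefault("create", [])
    return concepts, entities
```

Keeping the fallbacks in one place means an older flat response still yields a usable concepts group with an empty entity track.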
Mirror the concept track: collect related-entity slugs into a separate local list used only for backlinks; pass only created/updated entity_names (+entity_meta) to _update_index. Defense-in-depth in _update_index: only _replace_section_entry when name is in entity_meta, otherwise only insert if the link is absent, so a related-only entity can never clobber a pre-existing correct (type + brief) index line with "(other)". Adds regression test test_related_entity_does_not_downgrade_index_label.
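The defense-in-depth rule above can be sketched as follows. The function name, the list-of-lines representation, and the entry format are hypothetical; the real `_update_index` and `_replace_section_entry` are not shown in this PR text.

```python
def update_entities_section(lines, names, entity_meta):
    """Apply the guard: replace only when metadata exists, else insert if absent."""
    out = list(lines)
    for name in names:
        link = f"[[entities/{name}]]"
        idx = next((i for i, line in enumerate(out) if link in line), None)
        if name in entity_meta:
            # Created/updated entity: safe to (re)write the full entry.
            meta = entity_meta[name]
            entry = f"- {link} ({meta['type']}): {meta['brief']}"
            if idx is None:
                out.append(entry)
            else:
                out[idx] = entry
        elif idx is None:
            # No metadata and no existing line: insert a placeholder label.
            out.append(f"- {link} (other)")
        # No metadata but the line exists: leave the existing entry alone.
    return out


existing = ["- [[entities/ada-lovelace]] (person): Mathematician"]
unchanged = update_entities_section(existing, ["ada-lovelace"], {})
```

Here `unchanged` equals `existing`: a name without metadata never overwrites a line that already carries a type and brief.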
- `openkb init` now creates wiki/entities/ alongside wiki/concepts/
- init seed index.md gains ## Entities between ## Concepts and ## Explorations, matching the _update_index template in compiler.py
- print_status subdirs list gains "entities" after "concepts"
- Tests updated: assert wiki/entities/ exists and index.md contains ## Entities; status test asserts "entities" appears in output
Summary
Adds a first-class entity page type (`wiki/entities/`) for specific named things (people, organizations, places, products, named works, events), maintained incrementally alongside `concepts/` and distinct from abstract concept pages. This closes the largest gap found in the Karpathy LLM Wiki alignment audit (entity pages were named as a first-class type but absent).

Design follows the entity track as a parallel to the concept track inside the shared `_compile_concepts`, so both short-doc and long-doc (PageIndex) ingest get entities from one place:

- `{concepts, entities}` JSON response: no extra LLM call, rides the prompt cache, salience-filtered (only entities central to a doc or recurring across sources, not every proper noun).
- `type` (`person`/`organization`/`place`/`product`/`work`/`event`/`other`), `sources`, `brief`, optional `aliases`.
- `entities/` registered in the wikilink whitelist (so `[[entities/X]]` survives ghost-link stripping); bidirectional summary↔entity backlinks; `## Entities` section in `index.md`; full cleanup on `openkb remove`; declared in `AGENTS.md` schema; query agent points "who/what is X" at `entities/`; `openkb init` scaffolds the dir and `openkb status` counts it.

Non-goals (deliberately out of scope): no LangExtract / external extractor, no TF-IDF salience prior (IDF is anti-correlated with what we want), no migration of existing concepts, no `relationships` frontmatter. Recorded as future work: char-level source grounding, and scale-time frequency/retrieval ranking.

Spec: `docs/superpowers/specs/2026-05-30-entity-pages-design.md` · Plan: `docs/superpowers/plans/2026-05-30-entity-pages.md` (both local, `docs/` is gitignored).

Test Plan

- `pytest -q` → 541 passed
- `_read_entity_briefs`, `_write_entity`, whitelist, summary↔entity backlinks, `index.md` Entities section, `remove_doc_from_entity_pages`, entity plan parsing + end-to-end concept/entity split, related-entity index-label regression, schema declaration, query strategy, init/status wiring
- `openkb add <doc>` → verify `wiki/entities/` populated, `index.md` has `## Entities`, wikilinks resolve
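As an illustration of the wikilink whitelist mentioned in the Summary, ghost-link stripping might work along these lines. This is a sketch under assumptions: the prefix list, the regex, and the function name are hypothetical, and the real implementation may also check that the target page exists.

```python
import re

# Matches [[target]] and [[target|label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

# Assumed whitelist of link prefixes; "entities/" is the new addition.
ALLOWED_PREFIXES = ("summaries/", "concepts/", "entities/")


def strip_ghost_links(text: str) -> str:
    """Keep whitelisted wikilinks; flatten all others to plain text."""
    def repl(match: re.Match) -> str:
        target = match.group(1).strip()
        if target.startswith(ALLOWED_PREFIXES):
            return match.group(0)  # keep the link untouched
        # Ghost link: fall back to the label, or the last path segment.
        return match.group(2) or target.split("/")[-1]

    return WIKILINK.sub(repl, text)


cleaned = strip_ghost_links(
    "See [[entities/ada-lovelace]] and [[people/ghost|Ghost]]."
)
```

With `entities/` absent from the prefix tuple, the first link would be flattened as well, which is the failure the whitelist registration prevents.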