Extract document references for automated discovery #6

@cameronhotchkies

Problem

Documents often reference other standards, frameworks, and related docs in bibliographies, "further reading" sections, and inline hyperlinks. We currently don't capture these, missing an opportunity for automated document discovery.

Proposal

Extract bibliography entries, citations, and hyperlinked documents as external_reference facts to enable a "retrieve next level" workflow: upload one document, discover 6 more to fetch.

Implementation

  1. Add external_reference fact type to identify citations, bibliography entries, and hyperlinks to external documents
  2. Update extraction prompt to classify external document references
  3. Build workflow to present extracted references as retrievable documents
  4. Add UI for "fetch referenced documents" action
  5. Consider pattern matching for common citation formats (ISO standards, NIST, RFCs, etc.) and URL patterns
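A rough sketch of what steps 1 and 5 could look like, assuming a simple regex pass over the document text. The `ExternalReference` shape, its field names, and the patterns are illustrative assumptions, not an existing schema; references without a numeric identifier (for example "NIST CSF") would not match these patterns and would still depend on the prompt-based classification in step 2.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalReference:
    """Hypothetical shape for an external_reference fact."""
    kind: str        # "iso", "nist_sp", "rfc", or "url"
    identifier: str  # normalized key used for de-duplication
    raw: str         # text as it appeared in the document


# Illustrative patterns only; real citation formats vary more than this.
PATTERNS = {
    "iso": re.compile(r"\bISO(?:/IEC)?\s+(\d{4,5}(?:-\d+)?)(?::\d{4})?"),
    "nist_sp": re.compile(r"\bNIST\s+(?:SP|Special Publication)\s+(\d{3}-\d+[A-Za-z]?)"),
    "rfc": re.compile(r"\bRFC\s?(\d{1,5})\b"),
    "url": re.compile(r"https?://[^\s<>\"')\]]+"),
}


def extract_references(text: str) -> list[ExternalReference]:
    """Return references in document order, de-duplicated by identifier."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Drop sentence punctuation that the URL pattern may swallow.
            raw = m.group(0).rstrip(".,;")
            identifier = raw if kind == "url" else f"{kind}:{m.group(1)}"
            found.append((m.start(), ExternalReference(kind, identifier, raw)))
    found.sort(key=lambda pair: pair[0])

    unique, seen = [], set()
    for _, ref in found:
        if ref.identifier not in seen:
            seen.add(ref.identifier)
            unique.append(ref)
    return unique


sample = (
    "See ISO/IEC 27001:2022, NIST SP 800-53, and RFC 2119. "
    "Policy: https://example.com/policy.pdf. Also ISO 27001."
)
for ref in extract_references(sample):
    print(ref.kind, ref.identifier)
```

Normalizing to an identifier means "ISO/IEC 27001:2022" and "ISO 27001" collapse into one entry, which keeps the list shown to the user short.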

Expected Outcome

A user uploads a SOC 2 report that references ISO 27001 and NIST CSF and links to 4 policy documents → the system presents a list of 6 documents to retrieve → the user clicks "fetch all" → the knowledge base grows automatically.
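The "fetch all" step in this scenario might be sketched as follows. `PendingDocument`, `fetch_all`, the `source` strings, and the stub fetcher are all hypothetical names for illustration; the sketch assumes that some references may not be retrievable automatically, so it records a status per document instead of failing the whole batch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PendingDocument:
    """Hypothetical entry in the list presented to the user."""
    title: str
    source: str              # URL or standard identifier
    status: str = "pending"  # "pending", "fetched", or "failed"


def fetch_all(pending: list[PendingDocument],
              fetcher: Callable[[str], bytes]) -> list[bytes]:
    """Try every pending document and record the outcome on each item."""
    fetched = []
    for doc in pending:
        try:
            fetched.append(fetcher(doc.source))
            doc.status = "fetched"
        except Exception:
            # One unreachable reference should not block the others.
            doc.status = "failed"
    return fetched


def stub_fetcher(source: str) -> bytes:
    """Stand-in for a real downloader; only handles URLs."""
    if not source.startswith("https://"):
        raise ValueError(f"no retrieval method for {source}")
    return b"document bytes for " + source.encode()


# The six documents from the scenario above (URLs are placeholders).
pending = [
    PendingDocument("ISO 27001", "iso:27001"),
    PendingDocument("NIST CSF", "https://example.com/nist-csf.pdf"),
] + [
    PendingDocument(f"Policy {n}", f"https://example.com/policy-{n}.pdf")
    for n in range(1, 5)
]

documents = fetch_all(pending, stub_fetcher)
for doc in pending:
    print(doc.title, doc.status)
```

Injecting the fetcher as a parameter keeps the workflow testable without network access and leaves room for different retrieval methods per reference type.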
