LinkML schemas modelling REDCap structures. This package is the canonical, vendor-neutral source of the REDCap primitives that downstream projects build on — what a valid data dictionary looks like, and what record data looks like in both its flat and structured shapes. You can find more information in the documentation.
It is an umbrella with two submodules:
- data_dictionary: the meta-schema that formalises a REDCap data dictionary (the 18-column CSV structure, field-naming rules, field/validation types, and matrix groups).
- record: a reusable envelope for REDCap record data, covering the structural fields REDCap adds to every export (record_id, redcap_event_name, redcap_repeat_instrument, redcap_repeat_instance, ...), the universal *_complete / yes-no / true-false / checkbox value spaces, the REDCap date/time string types, and the RepeatedElement wrapper, plus the generic flat ⇄ structured grouping step.
It contains no project-specific or RareLink-specific rules. Those conventions (variable naming, instrument naming, annotation profile) live in RareLink and are layered on top of these primitives. Contributions and collaboration are welcome.
pip install linkml-redcap

# --- the data dictionary meta-schema ---
from linkml_redcap.data_dictionary import schema_view as dd_view
sv = dd_view()
assert "Field" in sv.all_classes()
assert "FieldType" in sv.all_enums()
# --- the record-data envelope ---
from linkml_redcap.record import schema_view as record_view, group_flat_records
rv = record_view()
assert "FlatRecord" in rv.all_classes()
assert "StructuredRecord" in rv.all_classes()
assert "FormCompleteStatus" in rv.all_enums()
# group a flat REDCap export into per-record objects (generic structural step)
# flat_rows: the rows of a flat REDCap export, loaded beforehand
grouped = group_flat_records(flat_rows, drop_empty=True)

Each submodule exposes the same loader API: schema_path() -> Path (for LinkML
CLI tools) and schema_view() -> SchemaView (for introspection). Schemas load
via importlib.resources, so they work from any install location: regular
installs, zipped wheels, or air-gapped environments.
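To illustrate why importlib.resources makes a bundled schema independent of the install location, here is a minimal sketch. It does not use this package: the package name demo_schema_pkg, the file demo.yaml, and the helper load_schema_text are all hypothetical, and the throwaway package is built on disk only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package so the example is self-contained.
# "demo_schema_pkg" and "demo.yaml" are hypothetical names.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_schema_pkg"
(pkg_dir / "schema").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg_dir / "schema" / "demo.yaml").write_text(
    "id: https://example.org/demo\nname: demo\n", encoding="utf-8"
)
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()


def load_schema_text() -> str:
    """Read the bundled schema through the import system, not a hard-coded path."""
    resource = files("demo_schema_pkg") / "schema" / "demo.yaml"
    return resource.read_text(encoding="utf-8")


schema_text = load_schema_text()
```

The lookup goes through the package name rather than a filesystem location, which is the property the loader API above relies on.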
A LinkML meta-schema for the REDCap data dictionary CSV:
- DataDictionary: tree root, a complete DD
- Field: one CSV row, all 18 REDCap columns as typed/constrained slots
- Choice: structured representation of one permissible value
- Instrument: logical grouping of contiguous fields sharing a form_name
Enums cover REDCap's native value spaces (FieldType, TextValidationType
incl. the European comma-decimal variants, CustomAlignment, IdentifierStatus,
MatrixRanking).
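As a rough illustration of what a structured Choice buys you, the sketch below parses the raw choices string that REDCap stores for dropdown, radio, and checkbox fields ("code, label" pairs separated by "|"). The ParsedChoice dataclass and parse_choices helper are hypothetical names for this example, not the schema's Choice class or any function of this package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedChoice:
    """Hypothetical stand-in for one permissible value: raw code plus label."""
    code: str
    label: str


def parse_choices(raw: str) -> list[ParsedChoice]:
    """Split a REDCap choices string such as '1, Yes | 2, No' into pairs."""
    choices = []
    for part in raw.split("|"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled separator
        # Only the first comma separates code from label; labels may contain commas.
        code, sep, label = part.partition(",")
        if not sep:
            raise ValueError(f"choice without a comma separator: {part!r}")
        choices.append(ParsedChoice(code.strip(), label.strip()))
    return choices


parsed = parse_choices("1, Yes | 2, No | 99, Unknown, not asked")
```

Splitting on the first comma only keeps labels such as "Unknown, not asked" intact.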
This schema is vendor-neutral REDCap only. RareLink ontology conventions, namely variable-name prefixes (snomedct_…), the structured field_annotation grammar, and BioPortal/ontology curation, are not defined here; they live in rarelink.
The single source of truth for the REDCap record representation — the actual shape of REDCap record data, in the two serializations the data uses:
- FlatRecord (abstract): one flat-export row; a project specialises it via is_a: FlatRecord and adds one typed slot per field variable.
- StructuredRecord (abstract): the lossless record-grouped serialization, one object per record_id with its repeating-instrument instances nested. A project specialises it (is_a), marks itself tree_root, and adds its typed non-repeating instrument slots.
- RepeatedElement: the (redcap_repeat_instrument, redcap_repeat_instance) wrapper for one repeating-instrument instance. This is the canonical version of the class that rarelink and cieinr currently redeclare per repo; import this one.
- Structural slots: record_id, redcap_event_name, redcap_repeat_instrument, redcap_repeat_instance, redcap_data_access_group, redcap_survey_identifier, repeated_elements.
- Enums: FormCompleteStatus (0/1/2), YesNo, TrueFalse, CheckboxState, which are REDCap's universal value spaces.
- Types: redcap_date / redcap_datetime / redcap_time / redcap_integer / redcap_number / redcap_email, one typed primitive per REDCap value space, so a project can give every field a defined range (never a bare string where REDCap constrains the value, never Any).
- grouping.group_flat_records / ungroup_records: the lossless flat ⇄ structured conversion (the cardinality change linkml-map can't express), depending only on REDCap-native fields.
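To make the idea of a constrained value space concrete, here is a small sketch of two of them in plain Python. It assumes the usual REDCap export conventions (form status 0 = Incomplete, 1 = Unverified, 2 = Complete; dates exported as YYYY-MM-DD). The names CompleteStatusSketch and is_redcap_date are hypothetical, and the patterns the schema actually uses may differ.

```python
import re
from datetime import datetime
from enum import IntEnum


class CompleteStatusSketch(IntEnum):
    """Hypothetical mirror of the 0/1/2 value space of *_complete fields."""
    INCOMPLETE = 0
    UNVERIFIED = 1
    COMPLETE = 2


# Assumed export shape for date fields: zero-padded YYYY-MM-DD.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_redcap_date(value: str) -> bool:
    """Check the string shape first, then that it is a real calendar date."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


status = CompleteStatusSketch(int("2"))  # exports carry the status as a string
```

The shape check comes first because strptime on its own also accepts values that are not zero-padded.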
The structured envelope here is deliberately generic, abstract and convention-free — REDCap's own record-and-repeats model, serialized hierarchically. It defines no typed instrument classes, variable-naming rules, or ontology/Phenopacket mappings: that semantic layer is the consuming project's (the instrument classes that specialise these) and rarelink's (the conventions and the Phenopacket/FHIR engine). Nothing RareLink-specific is defined here.
See instructions/MAPPING_WORKFLOW.md for the
full flat → grouped → structured pipeline and how it feeds Phenopacket/FHIR export.
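The grouping step can be pictured with a minimal sketch. It is not the package's group_flat_records: the function name group_rows_sketch, the nested "fields" key, and the sample rows are assumptions made for illustration, and the sketch ignores events and data access groups.

```python
# Structural columns REDCap adds to every export (names from the slot list above).
STRUCTURAL = {
    "record_id", "redcap_event_name", "redcap_repeat_instrument",
    "redcap_repeat_instance", "redcap_data_access_group",
    "redcap_survey_identifier",
}


def group_rows_sketch(flat_rows, drop_empty=True):
    """Group flat export rows into one dict per record_id (illustrative only)."""
    grouped = {}
    for row in flat_rows:
        rid = row["record_id"]
        record = grouped.setdefault(rid, {"record_id": rid, "repeated_elements": []})
        # Project fields are everything that is not a structural column.
        fields = {
            key: value for key, value in row.items()
            if key not in STRUCTURAL and not (drop_empty and value in ("", None))
        }
        instrument = row.get("redcap_repeat_instrument") or ""
        if instrument:
            # A repeating-instrument row becomes one nested instance.
            record["repeated_elements"].append({
                "redcap_repeat_instrument": instrument,
                "redcap_repeat_instance": int(row["redcap_repeat_instance"]),
                "fields": fields,
            })
        else:
            # A non-repeating row contributes its fields to the record itself.
            record.update(fields)
    return list(grouped.values())


example_rows = [
    {"record_id": "1", "redcap_repeat_instrument": "", "redcap_repeat_instance": "",
     "sex": "f", "visit_date": ""},
    {"record_id": "1", "redcap_repeat_instrument": "visit", "redcap_repeat_instance": "1",
     "sex": "", "visit_date": "2024-01-05"},
    {"record_id": "1", "redcap_repeat_instrument": "visit", "redcap_repeat_instance": "2",
     "sex": "", "visit_date": "2024-03-01"},
    {"record_id": "2", "redcap_repeat_instrument": "", "redcap_repeat_instance": "",
     "sex": "m", "visit_date": ""},
]
records = group_rows_sketch(example_rows)
```

Four flat rows collapse into two record objects, which is the cardinality change referred to above. Dropping empty cells is convenient for reading but discards information, so a truly lossless round trip would need to keep them or recover the column set elsewhere.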
REDCap data dictionaries are unstructured CSVs governed by implicit rules, and
REDCap record exports are flat and wide. Making both machine-readable lets
downstream tools validate any DD or export, generate DDs from higher-level
specs, and transform flat exports into structured models with linkml-map
instead of hand-written, per-instrument Python — the same envelope serving
RareLink-CDM, the MII KDS-SE editions, CIEINR, and any other RareLink-based model.
This package is an import dependency, so its schema ids and public
class/slot/enum/type names are treated as a stable API: additive minor releases
only, deprecate-don't-remove, breaking changes ⇒ major bump. The policy is in
instructions/COMPATIBILITY.md and enforced by
tests/test_public_surface.py, which fails CI if any public name disappears.
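A check of this kind can be sketched as a set comparison between a frozen list of names and the names currently present. This is only an illustration of the idea, not the contents of tests/test_public_surface.py: the helper missing_public_names and the sample name sets are hypothetical.

```python
def missing_public_names(frozen, current):
    """Return the frozen names that are no longer present, grouped by kind."""
    missing = {}
    for kind, names in frozen.items():
        gone = set(names) - set(current.get(kind, ()))
        if gone:
            missing[kind] = sorted(gone)
    return missing


# Hypothetical frozen surface, recorded at release time.
frozen_surface = {
    "classes": {"FlatRecord", "StructuredRecord", "RepeatedElement"},
    "enums": {"FormCompleteStatus", "YesNo"},
}

# An additive change (a new class) leaves nothing missing.
additive = {
    "classes": {"FlatRecord", "StructuredRecord", "RepeatedElement", "NewClass"},
    "enums": {"FormCompleteStatus", "YesNo"},
}

# A removal is reported and would fail the check.
breaking = {
    "classes": {"FlatRecord", "StructuredRecord"},
    "enums": {"FormCompleteStatus", "YesNo"},
}

additive_result = missing_public_names(frozen_surface, additive)
breaking_result = missing_public_names(frozen_surface, breaking)
```

Comparing in one direction only is what allows additive minor releases: new names pass, while removed names fail.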
- RareLink — the REDCap-based rare disease interoperability framework that defines the rules built on these primitives
- rd-cdm — ontology code systems & versions
git clone https://github.com/linkml/linkml-redcap
cd linkml-redcap
poetry install --with dev
# Validate the schemas
poetry run linkml-lint src/linkml_redcap/data_dictionary/schema/redcap_data_dictionary.yaml
poetry run linkml-lint src/linkml_redcap/record/schema/redcap_record.yaml
# Run tests
poetry run pytest

Releases are tag-triggered: push a vX.Y.Z tag matching the version in
pyproject.toml and GitHub Actions builds and publishes to PyPI via OIDC
trusted publishing.
MIT