metaphacts ETL pipeline

The Extract-Transform-Load (ETL) pipeline provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.

The pipeline follows the principles described in Concepts and is based on an opinionated selection of components and tools:

  • Amazon Web Services (AWS) as cloud environment
  • a selection of AWS services such as S3, CloudFormation, StepFunctions, Lambda, EC2, etc. for various parts
  • RDF Mapping Language (RML) as declarative mapping language with Carml as mapping engine
  • Ontotext GraphDB as RDF database. Please note that a valid GraphDB license is required to run the pipeline end-to-end; the license is needed for the data ingestion part of the pipeline (i.e. to load the data into GraphDB).
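To illustrate the declarative mapping step, the following is a minimal RML sketch (not taken from this repository) of the kind of mapping Carml executes. The file name products.csv, the example.org namespace, and the id/name column names are hypothetical:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/> .

# Map each row of a hypothetical products.csv to an ex:Product resource.
ex:ProductMapping a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "products.csv" ;
    rml:referenceFormulation ql:CSV
  ] ;
  rr:subjectMap [
    rr:template "http://example.org/product/{id}" ;
    rr:class ex:Product
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:name ;
    rr:objectMap [ rml:reference "name" ]
  ] .
```

Each row yields one subject IRI built from the id column, typed as ex:Product, with the name column attached as a literal.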

Features

The ETL pipeline has the following features:

  • read source files from an S3 bucket
  • convert source files to RDF using RML mappings
  • supported formats are CSV, XML, JSON, and JSONL, also in compressed (gzipped) form
  • the RDF files are written to an S3 bucket, one RDF file per source file
  • the RDF files are ingested into a graph database using the GraphDB Preload tool
  • files added to the source bucket after the initial ingestion are processed as incremental updates

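As a sketch of how an incremental update could be triggered, the commands below compress a new source file and copy it into the source bucket with the AWS CLI. The bucket name and key prefix are hypothetical; use the source bucket and layout from your own deployment:

```shell
# Hypothetical bucket name; substitute the source bucket created
# for your deployment.
SOURCE_BUCKET=s3://example-etl-source-bucket

# Compress the new source file (keeping the original with -k) and
# upload it; the pipeline detects it and ingests it incrementally.
gzip -k data/products.csv
aws s3 cp data/products.csv.gz "$SOURCE_BUCKET/csv/"
```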
Setup and Operation

See ETL Pipeline Setup for how to set up and run the pipeline.

Architecture

See Architecture for a diagram and a detailed description of the ETL pipeline's architecture.

Copyright

All content in this repository is (c) 2023 by metaphacts.
