12 changes: 9 additions & 3 deletions pipeline/ingestion/README.md
@@ -30,13 +30,19 @@ The pipeline is configured using `IngestionPipelineOptions`. Key options include

## Example Usage

To run the pipeline locally using the Direct runner:
First, ensure all dependencies are installed locally. After cloning the `datacommons/import` repository, run the following command from the project's root directory:

```bash
mvn clean install
```

To run the pipeline locally using the Direct runner, cd to the `pipeline/ingestion` directory and run:

medium

Adding the instruction to `cd` into the `pipeline/ingestion` directory is a good clarification, but the `mvn` command that follows will likely fail when run from that location. The `-pl ingestion -am` flags are meant for running Maven from a parent directory of the `ingestion` module, not from inside the module itself. To fix this, either remove the `-pl ingestion -am` flags from the command, or instruct the user to run the command from the project root and adjust the path in the `-pl` flag (e.g., `-pl pipeline/ingestion`).
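
The distinction in this comment can be sketched as a small shell helper. This is an illustration only: `module_flags` is a hypothetical function, not part of the repository, and it assumes the module lives in a directory named `ingestion` containing its own `pom.xml`. It relies on the fact that `-pl` takes a module path relative to the directory Maven is launched from, and that `-am` also builds the modules the selected one depends on.

```bash
# Sketch: choose Maven module-selection flags based on the launch directory.
# module_flags is a hypothetical helper, not part of the repository.
module_flags() {
  local dir="${1:-.}"
  if [ -f "$dir/ingestion/pom.xml" ]; then
    # Launched from the parent of the module: select it with -pl and
    # build the modules it depends on with -am.
    printf '%s\n' "-pl ingestion -am"
  else
    # Launched from inside the module (or elsewhere): the relative path
    # "ingestion" does not resolve, so emit no selection flags.
    printf '\n'
  fi
}
```

Calling `module_flags "$PWD"` before composing the `mvn` command would yield `-pl ingestion -am` from the parent directory and an empty string from inside `pipeline/ingestion`. Whether running from the repository root with `-pl pipeline/ingestion` works depends on how the root `pom.xml` declares its modules, which this diff does not show.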


```bash
mvn -Pdirect-runner compile exec:java \
-pl ingestion -am \
-Dexec.mainClass=org.datacommons.ingestion.pipeline.GraphIngestionPipeline \
-Dexec.args="--project=YOUR_PROJECT_ID \
-Dexec.args="--projectId=YOUR_PROJECT_ID \
--spannerInstanceId=YOUR_INSTANCE_ID \
--spannerDatabaseId=YOUR_DATABASE_ID \
--importList='[{\"importName\": \"Schema\", \"graphPath\": \"gs://path/to/schema/mcf/\"}, {\"importName\": \"SampleImport\", \"graphPath\": \"gs://path/to/data.tfrecord\"}]' \
@@ -49,7 +55,7 @@ To run the pipeline using the Dataflow runner:
mvn -Pdataflow-runner compile exec:java \
-pl ingestion -am \
-Dexec.mainClass=org.datacommons.ingestion.pipeline.GraphIngestionPipeline \
-Dexec.args="--project=YOUR_PROJECT_ID \
-Dexec.args="--projectId=YOUR_PROJECT_ID \
--spannerInstanceId=YOUR_INSTANCE_ID \
--spannerDatabaseId=YOUR_DATABASE_ID \
--importList='[{\"importName\": \"Schema\", \"graphPath\": \"gs://path/to/schema/mcf/\"}, {\"importName\": \"SampleImport\", \"graphPath\": \"gs://path/to/data.tfrecord\"}]' \