Commit 95e25c0

author
Valentin Schneider-Lunitz
committed
docs(guides): improve Crypt4GH + proTES tutorial with a detailed use case
1 parent 040017d commit 95e25c0

1 file changed

Lines changed: 62 additions & 73 deletions

docs/guides/guide-admin/crypt4gh_to_protes.md

@@ -1,46 +1,44 @@
11
# Setting up Crypt4GH encryption/decryption in Funnel
22

3-
This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway.
3+
This guide explains how to configure and deploy an environment that enables collaborative research on sensitive genomic data. Data holders can securely provide encrypted data for analysis while researchers process it through TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) and [proTES](https://github.com/elixir-cloud-aai/proTES), where automatic decryption occurs within secure containers without granting researchers direct access to the sensitive data. This setup leverages [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) standards for scalable and secure task execution.
44

55
## Use Case
66

7-
Imagine you are a researcher who needs to analyse sensitive data in a cloud environment. You need to ensure:
7+
A data holder needs to provide sensitive genomic data for analysis to researchers in a cloud environment. The data must remain encrypted during storage and transfer, with decryption occurring only within a secure computational environment (container), without granting direct data access to the researcher.
88

9-
- **Your data is encrypted during transfer**: Your files are encrypted for transfer. Raw sensitive data remains located at your storage.
10-
- **Only authorized researcher can decrypt the data**: Data can only be decrypted with specific private keys. Data theft is useless without specific keys.
11-
- **Automatic decryption**: Your setup does automatic decryption given `.c4gh` files and the correct private key.
12-
- **Secure collaboration**: Data exchange between collaborators is not restricted, as long as the correct key is available.
9+
1. The data holder encrypts sensitive data using Crypt4GH and stores it in secure storage (e.g. S3 buckets).
10+
2. The researcher submits a GA4GH TES task to `proTES` for analysis of the encrypted data.
11+
3. The installed `proTES middleware` automatically detects the encrypted data and decrypts it using Crypt4GH keys that are managed by `proTES`.
12+
4. The researcher's task command is executed on the decrypted data.
13+
5. The analysis results are stored in dedicated storage accessible to the researcher.
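
The detection in step 3 can be pictured with a minimal Python sketch. It is not the actual `protes-middleware-crypt4gh` implementation; it only assumes that encrypted inputs are recognized by their `.c4gh` suffix. The function name `find_encrypted_inputs` and the example task (including its storage paths) are hypothetical.

```python
from typing import Any


def find_encrypted_inputs(task: dict[str, Any], suffix: str = ".c4gh") -> list[dict[str, Any]]:
    """Return the TES inputs whose url or path ends with the given suffix."""
    encrypted = []
    for item in task.get("inputs", []):
        url = item.get("url", "")
        path = item.get("path", "")
        if url.endswith(suffix) or path.endswith(suffix):
            encrypted.append(item)
    return encrypted


# Hypothetical task, shaped like the TES JSON documents used in this guide.
example_task = {
    "name": "Decrypt crypt4gh file",
    "inputs": [
        {
            "name": "encrypted_file",
            "url": "file:///tmp/example-storage/data.txt.c4gh",  # hypothetical path
            "path": "/inputs/data.txt.c4gh",
            "type": "FILE",
        },
        {
            "name": "researcher_sk",
            "url": "file:///tmp/funnel-storage/keys/researcher/researcher.sec",
            "path": "/inputs/keys/researcher/researcher.sec",
            "type": "FILE",
        },
    ],
}

# Only the first input carries the .c4gh suffix.
print([item["name"] for item in find_encrypted_inputs(example_task)])
```

A real middleware would go on to rewrite the task so that a decryption step runs before the researcher's command; that part is deliberately left out here.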
1314

14-
This tutorial presents a solution where:
15+
**Note:** All computational steps run in a secure containerized environment.
1516

16-
1. A data provider encrypts sensitive data using Crypt4GH before uploading them to storage.
17-
2. Encrypted data is sent to a `Task Execution Service (TES)` instance via `proTES` and a `proTES middleware` for processing.
18-
3. A researcher (recipient) can process these files in a secure containerized environment where automatic decryption happens using the `proTES middleware`.
17+
This approach enables collaborative research in which sensitive data is processed in cloud environments without giving the researcher direct access to the data. Instead, `Crypt4GH` and `proTES` together handle data encryption, decryption, and analysis.
18+
Additionally, the researcher can repeat the analysis with adjusted parameters at any time without further action by the data holder.
1919

20-
This approach allows collaborative research where sensitive data can be processed in cloud environments while maintaining strict access controls and encryption throughout the data lifecycle.
2120

2221
## Overview
2322

2423
[Crypt4GH](https://crypt4gh.readthedocs.io/) is a standard for encrypting sensitive genomic data. This setup demonstrates:
2524

26-
- Generating cryptographic key pairs for data exchange between parties (sender and recipient)
27-
- Encrypting files using the sender's private key and recipient's public key
25+
- Generating cryptographic key pairs for data exchange between parties (data holder and researcher)
26+
- Encrypting files using the data holder's private key and researcher's public key
2827
- Automatically decrypting `.c4gh` encrypted files during task execution using [protes-middleware-crypt4gh](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh)
2928
- Securely processing sensitive data in containerized environments
3029

31-
**Security Note:** Private keys should be stored in secure locations and used only for decryption. Consider using signed URLs for transferring private keys to the TES instance.
30+
**Security Note:** Private keys should be stored in secure locations and used only for encryption/decryption. Consider using signed URLs for transferring private keys to the TES instance.
3231

33-
**Goal of this tutorial:** You'll have a setup where you can submit encrypted files via task inputs, and they will be automatically decrypted and processed, ensuring that sensitive data remains protected.
32+
**Goal of this tutorial:** You'll have a setup that encrypts sensitive data and stores it in secure storage. Encrypted data is then detected automatically, decrypted, and processed, so that sensitive data remains protected.
3433

3534
## Setup
3635

3736
The complete setup consists of three main tasks:
3837

39-
1. **Key Generation**: Generate Crypt4GH key pairs for the sender and recipient parties (optional).
40-
2. **File Encryption**: Encrypt sensitive data using the generated keys.
41-
3. **File Decryption**: Decrypt and process encrypted files in a secure environment.
38+
1. **Key Generation**: Generate Crypt4GH key pairs for the data holder and researcher parties (optional).
39+
2. **File Encryption**: Encrypt sensitive data using the Crypt4GH keys.
40+
3. **File Decryption**: Automatically detect encrypted data, then decrypt and process it in a secure computing environment.
4241

43-
All keys are generated inside containers and exported to configured storage via TES outputs. The encrypted files (with `.c4gh` extension) are automatically decrypted during task execution using the proTES middleware.
4442

4543
## Prerequisites
4644

@@ -52,7 +50,7 @@ Before starting, ensure you have:
5250
- ProTES deployment VM
5351
- [Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) installed on all VMs
5452
- Network connectivity between all VMs
55-
- Sufficient storage space for encrypted/decrypted files
53+
- Sufficient storage space for encrypted/decrypted files and results.
5654

5755
## Installation and Configuration
5856

@@ -200,49 +198,49 @@ The following examples demonstrate the complete encryption/decryption workflow u
200198

201199
### Task 1: Generate Crypt4GH Key Pairs
202200

203-
This task generates cryptographic key pairs for both the sender and recipient parties. This step is independent of the following steps and may have happened a while ago. Your private keys may already be in a secure place. If you have crypt4gh keys, feel free to skip this step.
201+
This task generates cryptographic key pairs for both the data holder and researcher. This step is independent of the following steps and may have happened a while ago. Your private keys may already be in a secure place. If you have crypt4gh keys, feel free to skip this step.
204202

205203
Create a file named `task1_keygen.json`:
206204

207205
```json
208206
{
209207
"name": "Generate crypt4gh key pairs",
210-
"description": "Generate sender and recipient key pairs locally in container",
208+
"description": "Generate data holder and researcher key pairs locally in container",
211209
"inputs": [],
212210
"outputs": [
213211
{
214-
"name": "sender_sk",
215-
"description": "Sender secret key",
216-
"url": "file:///tmp/funnel-storage/keys/sender/sender.sec",
217-
"path": "/outputs/keys/sender/sender.sec",
212+
"name": "data_holder_sk",
213+
"description": "Data holder secret key",
214+
"url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
215+
"path": "/outputs/keys/data_holder/data_holder.sec",
218216
"type": "FILE"
219217
},
220218
{
221-
"name": "sender_pk",
222-
"description": "Sender public key",
223-
"url": "file:///tmp/funnel-storage/keys/sender/sender.pub",
224-
"path": "/outputs/keys/sender/sender.pub",
219+
"name": "data_holder_pk",
220+
"description": "Data holder public key",
221+
"url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.pub",
222+
"path": "/outputs/keys/data_holder/data_holder.pub",
225223
"type": "FILE"
226224
},
227225
{
228-
"name": "recipient_sk",
229-
"description": "Recipient secret key",
230-
"url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec",
231-
"path": "/outputs/keys/recipient/recipient.sec",
226+
"name": "researcher_sk",
227+
"description": "researcher secret key",
228+
"url": "file:///tmp/funnel-storage/keys/researcher/researcher.sec",
229+
"path": "/outputs/keys/researcher/researcher.sec",
232230
"type": "FILE"
233231
},
234232
{
235-
"name": "recipient_pk",
236-
"description": "Recipient public key",
237-
"url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub",
238-
"path": "/outputs/keys/recipient/recipient.pub",
233+
"name": "researcher_pk",
234+
"description": "researcher public key",
235+
"url": "file:///tmp/funnel-storage/keys/researcher/researcher.pub",
236+
"path": "/outputs/keys/researcher/researcher.pub",
239237
"type": "FILE"
240238
},
241239
{
242-
"name": "recipient_pk_copy",
243-
"description": "Copy of recipient public key",
244-
"url": "file:///tmp/funnel-storage/keys/sender/recipient.pub",
245-
"path": "/outputs/keys/sender/recipient.pub",
240+
"name": "researcher_pk_copy",
241+
"description": "Copy of researcher public key",
242+
"url": "file:///tmp/funnel-storage/keys/data_holder/researcher.pub",
243+
"path": "/outputs/keys/data_holder/researcher.pub",
246244
"type": "FILE"
247245
}
248246
],
@@ -252,7 +250,7 @@ Create a file named `task1_keygen.json`:
252250
"command": [
253251
"/bin/bash",
254252
"-c",
255-
"crypt4gh-keygen --sk /outputs/keys/sender/sender.sec --pk /outputs/keys/sender/sender.pub -f --nocrypt && crypt4gh-keygen --sk /outputs/keys/recipient/recipient.sec --pk /outputs/keys/recipient/recipient.pub -f --nocrypt && cp /outputs/keys/recipient/recipient.pub /outputs/keys/sender/recipient.pub"
253+
"crypt4gh-keygen --sk /outputs/keys/data_holder/data_holder.sec --pk /outputs/keys/data_holder/data_holder.pub -f --nocrypt && crypt4gh-keygen --sk /outputs/keys/researcher/researcher.sec --pk /outputs/keys/researcher/researcher.pub -f --nocrypt && cp /outputs/keys/researcher/researcher.pub /outputs/keys/data_holder/researcher.pub"
256254
],
257255
"workdir": "/tmp"
258256
}
@@ -267,32 +265,32 @@ Create a file named `task1_keygen.json`:
267265

268266
**Key Details:**
269267

270-
- Generates two key pairs: one for the sender and one for the recipient
268+
- Generates two key pairs: one for the data holder and one for the researcher
271269
- Keys are generated without encryption (`--nocrypt`) for demonstration purposes
272-
- The recipient's public key is copied to the sender's directory for use in encryption
270+
- The researcher's public key is copied to the data holder's directory for use in encryption
273271
- All keys are exported to local storage via TES outputs
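
The `outputs` entries in `task1_keygen.json` follow a regular naming pattern, so they can also be generated instead of written by hand. The sketch below assumes the storage prefix and container path used in this guide; the helper name `key_outputs` is hypothetical, and the `researcher_pk_copy` entry is omitted for brevity.

```python
import json

STORAGE_ROOT = "file:///tmp/funnel-storage/keys"  # storage prefix used in this guide
CONTAINER_ROOT = "/outputs/keys"                  # container path used in task1_keygen.json


def key_outputs(party: str) -> list[dict[str, str]]:
    """Build the secret-key and public-key TES output entries for one party."""
    entries = []
    for suffix, ext, label in (("sk", "sec", "secret"), ("pk", "pub", "public")):
        entries.append({
            "name": f"{party}_{suffix}",
            "description": f"{party} {label} key",
            "url": f"{STORAGE_ROOT}/{party}/{party}.{ext}",
            "path": f"{CONTAINER_ROOT}/{party}/{party}.{ext}",
            "type": "FILE",
        })
    return entries


# Two parties, two keys each: four output entries in total.
outputs = key_outputs("data_holder") + key_outputs("researcher")
print(json.dumps(outputs, indent=2))
```

Generating the entries this way keeps the `url` and `path` fields consistent when a party is renamed, which is the kind of change this commit makes by hand.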
274272

275273
### Task 2: Encrypt a File
276274

277-
This task downloads a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:
275+
This task retrieves a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:
278276

279277
```json
280278
{
281279
"name": "Encrypt file with crypt4gh",
282-
"description": "Download a file, record its size, and encrypt it locally using sender and recipient keys",
280+
"description": "Retrieve a file, record its size, and encrypt it using data holder and researcher keys",
283281
"inputs": [
284282
{
285-
"name": "sender_sk",
286-
"description": "Sender secret key",
287-
"url": "file:///tmp/funnel-storage/keys/sender/sender.sec",
288-
"path": "/inputs/keys/sender/sender.sec",
283+
"name": "data_holder_sk",
284+
"description": "Data holder secret key",
285+
"url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
286+
"path": "/inputs/keys/data_holder/data_holder.sec",
289287
"type": "FILE"
290288
},
291289
{
292-
"name": "recipient_pk",
293-
"description": "Recipient public key",
294-
"url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub",
295-
"path": "/inputs/keys/recipient/recipient.pub",
290+
"name": "researcher_pk",
291+
"description": "researcher public key",
292+
"url": "file:///tmp/funnel-storage/keys/researcher/researcher.pub",
293+
"path": "/inputs/keys/researcher/researcher.pub",
296294
"type": "FILE"
297295
}
298296
],
@@ -318,7 +316,7 @@ This task downloads a file, encrypts it using Crypt4GH, and stores both the encr
318316
"command": [
319317
"/bin/bash",
320318
"-c",
321-
"curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/sender/sender.sec --recipient_pk /inputs/keys/recipient/recipient.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
319+
"curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/data_holder/data_holder.sec --recipient_pk /inputs/keys/researcher/researcher.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
322320
],
323321
"workdir": "/tmp"
324322
}
@@ -333,15 +331,15 @@ This task downloads a file, encrypts it using Crypt4GH, and stores both the encr
333331

334332
**Key Details:**
335333

336-
- Takes the sender's private key and recipient's public key as inputs
334+
- Takes the data holder's private key and researcher's public key as inputs
337335
- Downloads a sample file from a URL
338336
- Records the original file size for verification
339337
- Encrypts the file using Crypt4GH, producing a `.c4gh` encrypted file
340338
- Stores both the encrypted file and size metadata
341339

342340
### Task 3: Decrypt and Process File
343341

344-
This task decrypts the encrypted file using the recipient's private key and processes it.
342+
This task decrypts the encrypted file using the researcher's private key and processes it.
345343

346344
**Note:** The different paths indicate isolated storage paths that do not necessarily see each other. For example, distinct S3 buckets.
347345

@@ -350,7 +348,7 @@ Create a file named `task3_decrypt_and_write_size.json`:
350348
```json
351349
{
352350
"name": "Decrypt crypt4gh file",
353-
"description": "Decrypt an encrypted file using recipient key locally",
351+
"description": "Decrypt an encrypted file using researcher key locally",
354352
"volumes": ["/outputs/test"],
355353
"inputs": [
356354
{
@@ -361,10 +359,10 @@ Create a file named `task3_decrypt_and_write_size.json`:
361359
"type": "FILE"
362360
},
363361
{
364-
"name": "recipient_sk",
365-
"description": "Recipient secret key",
366-
"url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec",
367-
"path": "/inputs/keys/recipient/recipient.sec",
362+
"name": "researcher_sk",
363+
"description": "researcher secret key",
364+
"url": "file:///tmp/funnel-storage/keys/researcher/researcher.sec",
365+
"path": "/inputs/keys/researcher/researcher.sec",
368366
"type": "FILE"
369367
}
370368
],
@@ -398,7 +396,7 @@ Create a file named `task3_decrypt_and_write_size.json`:
398396

399397
**Key Details:**
400398

401-
- Takes the encrypted `.c4gh` file and recipient's private key as inputs
399+
- Takes the encrypted `.c4gh` file and researcher's private key as inputs
402400
- The proTES middleware automatically decrypts the file during task execution
403401
- Computes an MD5 checksum of the decrypted data for verification
404402
- Stores the checksum in the output directory
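
To check the stored checksum independently, the same MD5 digest can be computed on a local copy of the data. This is a minimal sketch using Python's standard `hashlib`; the function name `md5_of_file` is hypothetical, and it assumes you have a local file to compare against.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 serves here only as an integrity check that decryption produced the expected bytes; it is not a security mechanism.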
@@ -432,15 +430,6 @@ Each task submission returns a task ID that you can use to monitor progress:
432430
curl http://localhost:8080/ga4gh/tes/v1/tasks/<task-id>
433431
```
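
Instead of calling `curl` repeatedly, the task state can be polled until it reaches a terminal state. The following sketch assumes the endpoint shown above and the task states defined by the GA4GH TES specification; the function names, polling interval, and poll limit are hypothetical choices.

```python
import json
import time
import urllib.request
from typing import Callable

# Terminal states from the GA4GH TES specification; newer TES versions
# may define additional ones, so treat this set as an assumption.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def fetch_task(base_url: str, task_id: str) -> dict:
    """Fetch a task document from the TES endpoint used in this guide."""
    with urllib.request.urlopen(f"{base_url}/ga4gh/tes/v1/tasks/{task_id}") as response:
        return json.load(response)


def wait_for_task(
    task_id: str,
    base_url: str = "http://localhost:8080",
    fetch: Callable[[str, str], dict] = fetch_task,
    interval: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll the task until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = fetch(base_url, task_id).get("state", "UNKNOWN")
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


# Usage (requires a running proTES instance):
#   final_state = wait_for_task("<task-id>")
```

The `fetch` parameter is injectable so that the polling logic can be exercised without a running server.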
434432

435-
## How It Works
436-
437-
1. **Task Submission**: Tasks are submitted to proTES via the GA4GH TES API
438-
2. **Task Distribution**: ProTES distributes tasks to available Funnel TES endpoints
439-
3. **Automatic Decryption**: The Crypt4GH middleware automatically detects `.c4gh` files and injects a decryption step
440-
4. **Container Execution**: Funnel worker nodes execute tasks in isolated containers
441-
5. **Result Storage**: All results are stored in the configured local storage directory (`/tmp/funnel-storage`)
442-
6. **Data Persistence**: Task and scheduler metadata is stored in the Funnel BoltDB database
443-
444433
## Troubleshooting
445434

446435
### Common Issues
