docs/guides/guide-admin/crypt4gh_to_protes.md
Lines changed: 62 additions & 73 deletions
@@ -1,46 +1,44 @@
1
1
# Setting up Crypt4GH encryption/decryption in Funnel
2
2
3
-
This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway.
3
+
This guide explains how to configure and deploy an environment that enables collaborative research on sensitive genomic data. Data holders can securely provide encrypted data for analysis while researchers process it through TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) and [proTES](https://github.com/elixir-cloud-aai/proTES), where automatic decryption occurs within secure containers without granting researchers direct access to the sensitive data. This setup leverages [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) standards for scalable and secure task execution.
4
4
5
5
## Use Case
6
6
7
-
Imagine you are a researcher who needs to analyse sensitive data in a cloud environment. You need to ensure:
7
+
A data holder needs to provide sensitive genomic data for analysis to researchers in a cloud environment. The data must remain encrypted during storage and transfer, with decryption occurring only within a secure computational environment (container), without granting direct data access to the researcher.
8
8
9
-
- **Your data is encrypted during transfer**: Your files are encrypted for transfer. Raw sensitive data remains located at your storage.
10
-
- **Only authorized researcher can decrypt the data**: Data can only be decrypted with specific private keys. Data theft is useless without specific keys.
11
-
- **Automatic decryption**: Your setup does automatic decryption given `.c4gh` files and the correct private key.
12
-
- **Secure collaboration**: Data exchange between collaborators is not restricted, as long as the correct key is available.
9
+
1. The data holder encrypts sensitive data using Crypt4GH and stores it in secure storage (e.g. an S3 bucket).
10
+
2. The researcher submits a GA4GH TES task to `proTES` for analysis of the encrypted data.
11
+
3. The installed `proTES middleware` automatically detects the encrypted data and decrypts it using Crypt4GH keys that are managed by `proTES`.
12
+
4. The researcher's task command is executed on the decrypted data.
13
+
5. The analysis results are stored in dedicated storage accessible to the researcher.
13
14
14
-
This tutorial presents a solution where:
15
+
**Note:** All computational steps run in a secure containerized environment.
15
16
16
-
1. A data provider encrypts sensitive data using Crypt4GH before uploading them to storage.
17
-
2. Encrypted data is sent to a `Task Execution Service (TES)` instance via `proTES` and a `proTES middleware` for processing.
18
-
3. A researcher (recipient) can process these files in a secure containerized environment where automatic decryption happens using the `proTES middleware`.
17
+
This approach enables collaborative research in which sensitive data is processed in cloud environments without giving the researcher access to the data itself. Instead, a combination of `Crypt4GH` and `proTES` handles data encryption, decryption, and analysis.
18
+
Additionally, the researcher can repeat the analysis with adjusted parameters at any time without further action by the data holder.
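
The workflow described in this section could translate into a task submission roughly like the following. This is a minimal sketch in Python that only assembles a task document using field names from the GA4GH TES schema (`inputs`, `outputs`, `executors`). The bucket URLs, container paths, image, and command are hypothetical placeholders, and where the decrypted file actually appears inside the container depends on how the middleware is configured.

```python
import json


def build_tes_task(encrypted_url: str, result_url: str) -> dict:
    """Assemble a TES task whose only input is a Crypt4GH-encrypted file.

    All paths, the image, and the command are illustrative assumptions.
    """
    return {
        "name": "Analyse encrypted data",
        "description": "Count lines of a file that was provided in encrypted form",
        "inputs": [
            # The researcher references the encrypted object only by URL.
            {"url": encrypted_url, "path": "/data/sample.txt.c4gh", "type": "FILE"},
        ],
        "outputs": [
            # Results go to storage that the researcher can access.
            {"url": result_url, "path": "/results/line_count.txt", "type": "FILE"},
        ],
        "executors": [
            {
                "image": "alpine:3.19",
                # Assumption: the decrypted file is available at /data/sample.txt.
                "command": ["sh", "-c", "wc -l /data/sample.txt > /results/line_count.txt"],
            }
        ],
    }


if __name__ == "__main__":
    task = build_tes_task(
        "s3://example-bucket/sample.txt.c4gh",
        "s3://example-results/line_count.txt",
    )
    print(json.dumps(task, indent=2))
```

Because the task refers to the encrypted object only by URL, the researcher can resubmit it with a different command without involving the data holder again.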
19
19
20
-
This approach allows collaborative research where sensitive data can be processed in cloud environments while maintaining strict access controls and encryption throughout the data lifecycle.
21
20
22
21
## Overview
23
22
24
23
[Crypt4GH](https://crypt4gh.readthedocs.io/) is a standard for encrypting sensitive genomic data. This setup demonstrates:
25
24
26
-
- Generating cryptographic key pairs for data exchange between parties (sender and recipient)
27
-
- Encrypting files using the sender's private key and recipient's public key
25
+
- Generating cryptographic key pairs for data exchange between parties (data holder and researcher)
26
+
- Encrypting files using the data holder's private key and researcher's public key
28
27
- Automatically decrypting `.c4gh` encrypted files during task execution using [protes-middleware-crypt4gh](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh)
29
28
- Securely processing sensitive data in containerized environments
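
To illustrate what "automatically decrypting `.c4gh` files" implies on the detection side, here is a minimal sketch that picks out task inputs by file extension. It is not the actual logic of protes-middleware-crypt4gh; the function name and the decision to inspect the container `path` field are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def find_encrypted_inputs(task: dict) -> list:
    """Return the container paths of task inputs that end in .c4gh.

    Hypothetical helper: a real middleware may inspect URLs, file
    headers, or other metadata instead of the path suffix.
    """
    return [
        inp["path"]
        for inp in task.get("inputs", [])
        if PurePosixPath(inp.get("path", "")).suffix == ".c4gh"
    ]


if __name__ == "__main__":
    example_task = {
        "inputs": [
            {"url": "s3://example-bucket/a.txt.c4gh", "path": "/data/a.txt.c4gh"},
            {"url": "s3://example-bucket/b.txt", "path": "/data/b.txt"},
        ]
    }
    print(find_encrypted_inputs(example_task))
```

Matching on the extension keeps the check cheap, but it also means that an encrypted file stored without the `.c4gh` suffix would pass through undetected in this sketch.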
30
29
31
-
**Security Note:** Private keys should be stored in secure locations and used only for decryption. Consider using signed URLs for transferring private keys to the TES instance.
30
+
**Security Note:** Private keys should be stored in secure locations and used only for encryption/decryption. Consider using signed URLs for transferring private keys to the TES instance.
32
31
33
-
**Goal of this tutorial:** You'll have a setup where you can submit encrypted files via task inputs, and they will be automatically decrypted and processed, ensuring that sensitive data remains protected.
32
+
**Goal of this tutorial:** You'll have a setup that encrypts sensitive data and stores it in secure storage. When encrypted data is detected, it is automatically decrypted and then processed, ensuring that sensitive data remains protected.
34
33
35
34
## Setup
36
35
37
36
The complete setup consists of three main tasks:
38
37
39
-
1. **Key Generation**: Generate Crypt4GH key pairs for the sender and recipient parties (optional).
40
-
2. **File Encryption**: Encrypt sensitive data using the generated keys.
41
-
3. **File Decryption**: Decrypt and process encrypted files in a secure environment.
38
+
1. **Key Generation**: Generate Crypt4GH key pairs for the data holder and the researcher (optional).
39
+
2. **File Encryption**: Encrypt sensitive data using the Crypt4GH keys.
40
+
3. **File Decryption**: Automatically detect encrypted data, then decrypt and process it in a secure computing environment.
42
41
43
-
All keys are generated inside containers and exported to configured storage via TES outputs. The encrypted files (with `.c4gh` extension) are automatically decrypted during task execution using the proTES middleware.
44
42
45
43
## Prerequisites
46
44
@@ -52,7 +50,7 @@ Before starting, ensure you have:
52
50
- ProTES deployment VM
53
51
- [Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) installed on all VMs
54
52
- Network connectivity between all VMs
55
-
- Sufficient storage space for encrypted/decrypted files
53
+
- Sufficient storage space for encrypted/decrypted files and results
56
54
57
55
## Installation and Configuration
58
56
@@ -200,49 +198,49 @@ The following examples demonstrate the complete encryption/decryption workflow u
200
198
201
199
### Task 1: Generate Crypt4GH Key Pairs
202
200
203
-
This task generates cryptographic key pairs for both the sender and recipient parties. This step is independent of the following steps and may have happened a while ago. Your private keys may already be in a secure place. If you have crypt4gh keys, feel free to skip this step.
201
+
This task generates cryptographic key pairs for both the data holder and the researcher. It is independent of the following steps and may already have been done some time ago, in which case your private keys may already be in a secure place. If you already have Crypt4GH keys, feel free to skip this step.
204
202
205
203
Create a file named `task1_keygen.json`:
206
204
207
205
```json
208
206
{
209
207
"name": "Generate crypt4gh key pairs",
210
-
"description": "Generate sender and recipient key pairs locally in container",
208
+
"description": "Generate data holder and researcher key pairs locally in container",
@@ -267,32 +265,32 @@ Create a file named `task1_keygen.json`:
267
265
268
266
**Key Details:**
269
267
270
-
- Generates two key pairs: one for the sender and one for the recipient
268
+
- Generates two key pairs: one for the data holder and one for the researcher
271
269
- Private keys are generated without passphrase encryption (`--nocrypt`) for demonstration purposes
272
-
- The recipient's public key is copied to the sender's directory for use in encryption
270
+
- The researcher's public key is copied to the data holder's directory for use in encryption
273
271
- All keys are exported to local storage via TES outputs
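
The key details listed here can be sketched as the shell commands such a task might run inside the container. This is a hedged illustration, not the content of `task1_keygen.json`: it assumes the `crypt4gh-keygen` CLI with its `--sk`, `--pk`, and `--nocrypt` options, and the directory layout and file names (`/keys/<party>/<party>.sec` and `.pub`) are hypothetical.

```python
import shlex


def keygen_command(party: str, key_dir: str = "/keys") -> str:
    """Build a crypt4gh-keygen call for one party (paths are assumptions)."""
    sk = f"{key_dir}/{party}/{party}.sec"
    pk = f"{key_dir}/{party}/{party}.pub"
    # --nocrypt skips passphrase protection; suitable for demos only.
    return shlex.join(["crypt4gh-keygen", "--sk", sk, "--pk", pk, "--nocrypt"])


def keygen_script(key_dir: str = "/keys") -> list:
    """Commands for both parties, plus copying the researcher's public key."""
    return [
        keygen_command("data_holder", key_dir),
        keygen_command("researcher", key_dir),
        # The data holder needs the researcher's public key for encryption.
        shlex.join(["cp", f"{key_dir}/researcher/researcher.pub", f"{key_dir}/data_holder/"]),
    ]


if __name__ == "__main__":
    for line in keygen_script():
        print(line)
```

For anything beyond a demonstration, dropping `--nocrypt` and protecting the private keys with a passphrase would be the more cautious choice.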
274
272
275
273
### Task 2: Encrypt a File
276
274
277
-
This task downloads a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:
275
+
This task retrieves a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:
278
276
279
277
```json
280
278
{
281
279
"name": "Encrypt file with crypt4gh",
282
-
"description": "Download a file, record its size, and encrypt it locally using sender and recipient keys",
280
+
"description": "Retrieve a file, record its size, and encrypt it using data holder and researcher keys",