| 1 | +# hss3dump - Dump HSDS Domains to local filesystem |
| 2 | + |
| 3 | +> **Important**: hss3dump is still in an early stage of development, so results |
| 4 | +> may vary. It has only been tested with small datasets. |
| 5 | + |
| 6 | +`hss3dump` is a command-line utility that lists and/or replicates one or more |
| 7 | +HSDS domains from an S3 bucket to your local filesystem. It lays out the |
| 8 | +replicated data so that the target directory can serve as the root directory of |
| 9 | +a local HSDS instance, from which h5 files can then be restored using h5pyd's |
| 10 | +`hsget`. |
| 11 | + |
| 12 | +Additionally, if your S3 bucket has **versioning enabled** and its lifecycle |
| 13 | +configuration **does not delete** old versions, the tool can also fetch |
| 14 | +earlier S3 object versions, which makes it possible to restore HSDS domains to |
| 15 | +a previous state. |
| 16 | + |
| 17 | +## Installation |
| 18 | + |
| 19 | +```sh |
| 20 | +go install github.com/methodpark/hss3dump@latest |
| 21 | +``` |
| 22 | + |
| 23 | +## Usage |
| 24 | + |
| 25 | +Run `hss3dump` with the `-h` flag to print its usage information: |
| 26 | + |
| 27 | +``` |
| 28 | +$ hss3dump -h |
| 29 | +usage: hss3dump [OPTIONS] BUCKET DOMAIN... |
| 30 | + |
| 31 | +Hss3dump downloads one or more HSDS domains from an S3 bucket, storing them on |
| 32 | +the local filesystem in such a way that the target directory can be used as the |
| 33 | +root directory for a local HSDS deployment. |
| 34 | + |
| 35 | +It can restore different states of the target domain based on the versions |
| 36 | +available in the S3 bucket. If an RFC3339 timestamp is supplied with the -b |
| 37 | +flag, hss3dump will download the most recent versions of a domain's files that |
| 38 | +are older or equal to the supplied time. |
| 39 | + |
| 40 | +Options: |
| 41 | + -b string |
| 42 | + Return the first version of the domain before the given RFC3339 timestamp. |
| 43 | + -h Print this command information. |
| 44 | + -l Output a list with all available file versions of each domain's files. |
| 45 | + -r string |
| 46 | + Choose the root directory of the local HSDS filesystem. (default ".") |
| 47 | +``` |
| 48 | + |
| 49 | +### Fetching Most Recent Data |
| 50 | + |
| 51 | +In order to fetch the most recent version of an HSDS domain called |
| 52 | +`home/user/domain.h5` from an S3 bucket called `hsds-bucket` and dump it to the |
| 53 | +current directory, run the following command: |
| 54 | + |
| 55 | +```sh |
| 56 | +$ hss3dump hsds-bucket home/user/domain.h5 |
| 57 | +``` |
| 58 | + |
| 59 | +### Supplying a Different Target Directory |
| 60 | + |
| 61 | +The directory to which files are written can be changed by specifying the |
| 62 | +root directory with `-r`. For example, to replicate the domain above to the |
| 63 | +directory `/var/db/hsds_data`, run the following command: |
| 64 | + |
| 65 | + |
| 66 | +```sh |
| 67 | +$ hss3dump -r /var/db/hsds_data hsds-bucket home/user/domain.h5 |
| 68 | +``` |
| 69 | + |
| 70 | +### Restoring Previous Domain Versions |
| 71 | + |
| 72 | +If we want to restore a previous version of a domain, we have to take a look at |
| 73 | +the available versions first. Hss3dump makes this easy with its `-l` flag. |
| 74 | +Assuming we have accidentally deleted data from a domain, the output could look |
| 75 | +something like this: |
| 76 | + |
| 77 | +```sh |
| 78 | +$ hss3dump -l hsds-bucket home/user/domain.h5 |
| 79 | +home/user/domain.h5: |
| 80 | + db/e32b60a5-6c27622f/d/693e-302825-f8c087/.dataset.json |
| 81 | + ock7uFraVWjrotdTtGwXFR1N0TasC+ln 489 Bytes 2022-10-05 16:06:57+0100 |
| 82 | + db/e32b60a5-6c27622f/d/693e-302825-f8c087/0 |
| 83 | + HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL 0 Bytes 2022-10-10 09:06:59+0100 |
| 84 | + U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH 1296 Bytes 2022-10-05 16:06:59+0100 |
| 85 | + db/e32b60a5-6c27622f/g/40c5-5e41ac-92006c/.group.json |
| 86 | + sQwXZJAcjr1M0do1BsaFmnN6FlDLRwzM 1056 Bytes 2022-10-10 09:07:00+0100 |
| 87 | + zkRK4cagD9alQWUeN3BKTi9T+SqQdjcO 193 Bytes 2022-10-05 16:07:00+0100 |
| 88 | +``` |
| 89 | + |
| 90 | +The output shows that the most recent version |
| 91 | +(`HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL`) of the file |
| 92 | +`db/e32b60a5-6c27622f/d/693e-302825-f8c087/0` is 0 bytes in size, while its |
| 93 | +previous version was 1296 bytes. This suggests that its content was deleted |
| 94 | +on October 10th. To restore the old data, we can tell hss3dump to download |
| 95 | +only versions that are no newer than a given point in time, here the start of |
| 96 | +October 10th. This is done by supplying a corresponding RFC3339 timestamp via |
| 97 | +the command's `-b` flag: |
| 98 | + |
| 99 | +```sh |
| 100 | +$ hss3dump -b "2022-10-10T00:00:00+01:00" hsds-bucket home/user/domain.h5 |
| 101 | +``` |
| 102 | + |
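RFC 3339 writes UTC offsets with a colon (`+01:00`) or as `Z`, so a value such as `+0100` is not a valid RFC 3339 offset. The following is a minimal sketch of what strict parsing looks like in Go. It assumes the `-b` value is parsed with the standard library's `time.RFC3339` layout, which this README does not confirm, and `parseCutoff` is a hypothetical helper rather than part of hss3dump.

```go
package main

import (
	"fmt"
	"time"
)

// parseCutoff is a hypothetical helper (not taken from hss3dump's source).
// It parses a -b style argument strictly as an RFC 3339 timestamp.
func parseCutoff(s string) (time.Time, error) {
	return time.Parse(time.RFC3339, s)
}

func main() {
	candidates := []string{
		"2022-10-10T00:00:00+01:00", // offset with colon: valid RFC 3339
		"2022-10-10T00:00:00Z",      // UTC designator: valid RFC 3339
		"2022-10-10T00:00:00+0100",  // offset without colon: not RFC 3339
	}
	for _, s := range candidates {
		if _, err := parseCutoff(s); err != nil {
			fmt.Printf("%s: rejected\n", s)
		} else {
			fmt.Printf("%s: accepted\n", s)
		}
	}
}
```

If a timestamp is rejected, checking the offset notation is a reasonable first step.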
| 103 | +For each object, hss3dump then downloads the most recent version that |
| 104 | +satisfies this condition. If no version of an object satisfies it, the oldest |
| 105 | +version present is downloaded instead. |
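The selection rule described above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not hss3dump's actual implementation: the `Version` type, `pickVersion`, and `mustParse` are hypothetical names, and "satisfies the condition" is read as "last modified at or before the cutoff", following the help text.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Version is a hypothetical, simplified view of one S3 object version.
type Version struct {
	ID           string
	Size         int64
	LastModified time.Time
}

// pickVersion returns the newest version that was last modified at or before
// cutoff. If every version is newer than cutoff, it falls back to the oldest
// version. The second result is false only when versions is empty.
func pickVersion(versions []Version, cutoff time.Time) (Version, bool) {
	if len(versions) == 0 {
		return Version{}, false
	}
	// Sort a copy from newest to oldest so the caller's slice stays untouched.
	sorted := append([]Version(nil), versions...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastModified.After(sorted[j].LastModified)
	})
	for _, v := range sorted {
		if !v.LastModified.After(cutoff) {
			return v, true
		}
	}
	return sorted[len(sorted)-1], true
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// The two versions of the chunk file from the listing shown earlier.
	versions := []Version{
		{ID: "HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL", Size: 0, LastModified: mustParse("2022-10-10T09:06:59+01:00")},
		{ID: "U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH", Size: 1296, LastModified: mustParse("2022-10-05T16:06:59+01:00")},
	}
	cutoff := mustParse("2022-10-10T00:00:00+01:00")
	if v, ok := pickVersion(versions, cutoff); ok {
		fmt.Println(v.ID, v.Size)
	}
}
```

With the cutoff at the start of October 10th, the sketch picks the 1296-byte version from October 5th. A cutoff earlier than October 5th would pick the same version through the oldest-version fallback.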