Commit f0eb9d6

Initial commit
0 parents  commit f0eb9d6

12 files changed

Lines changed: 1090 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
/hss3dump

LICENSE

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
Copyright 2022 UL Method Park GmbH

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
# hss3dump - Dump HSDS Domains to local filesystem

> **Important**: hss3dump is still in an early stage of development, so results
> may vary. It has only been tested with small datasets.

`hss3dump` is a command-line utility allowing you to list and/or replicate one
or more HSDS domains from an S3 bucket to your local filesystem. It will
replicate the data in such a way that it can be used as the root directory for a
local HSDS instance and, thus, allows restoring h5 files using h5pyd's
`hsget`.

Additionally, if your S3 bucket has **versioning enabled** and its lifecycle is
set up in such a way that it **does not delete** versions, the tool also
supports fetching different S3 object versions in order to allow restoring HSDS
domains from older versions.

## Installation

```sh
go install github.com/methodpark/hss3dump@latest
```

## Usage

Hss3dump offers the `-h` flag to get more information on its usage:

```
$ hss3dump -h
usage: hss3dump [OPTIONS] BUCKET DOMAIN...

Hss3dump downloads one or more HSDS domains from an S3 bucket, storing them on
the local filesystem in such a way that the target directory can be used as the
root directory for a local HSDS deployment.

It can restore different states of the target domain based on the versions
available in the S3 bucket. If an RFC3339 timestamp is supplied with the -b
flag, hss3dump will download the most recent versions of a domain's files that
are older or equal to the supplied time.

Options:
  -b string
        Return the first version of the domain before the given RFC3339 timestamp.
  -h    Print this command information.
  -l    Output a list with all available file versions of each domain's files.
  -r string
        Choose the root directory of the local HSDS filesystem. (default ".")
```
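
The command line shown in the help text can be handled with Go's standard `flag` package. The following is only a sketch: the `main.go` of this commit is not shown here, so the `options` struct, its field names, and the `parseArgs` helper are assumptions made for illustration. Only the flag names, the default of `-r`, and the `BUCKET DOMAIN...` argument order come from the help text above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// options mirrors the flags from the help text. The struct and its field
// names are hypothetical and not taken from hss3dump's source.
type options struct {
	before  string   // -b: RFC3339 timestamp
	list    bool     // -l: list versions instead of downloading
	root    string   // -r: root directory of the local HSDS filesystem
	bucket  string   // first positional argument
	domains []string // remaining positional arguments
}

// parseArgs parses the arguments that follow the program name.
// -h is not registered explicitly: the flag package returns flag.ErrHelp
// for it when no such flag is defined.
func parseArgs(args []string) (*options, error) {
	fs := flag.NewFlagSet("hss3dump", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors

	opts := &options{}
	fs.StringVar(&opts.before, "b", "", "Return the first version of the domain before the given RFC3339 timestamp.")
	fs.BoolVar(&opts.list, "l", false, "Output a list with all available file versions of each domain's files.")
	fs.StringVar(&opts.root, "r", ".", "Choose the root directory of the local HSDS filesystem.")

	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	// BUCKET and at least one DOMAIN are required.
	if fs.NArg() < 2 {
		return nil, errors.New("usage: hss3dump [OPTIONS] BUCKET DOMAIN...")
	}
	opts.bucket = fs.Arg(0)
	opts.domains = fs.Args()[1:]
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("bucket=%s domains=%v root=%s list=%t before=%q\n",
		opts.bucket, opts.domains, opts.root, opts.list, opts.before)
}
```

The `flag` package stops parsing at the first non-flag argument, which matches the usage line: options have to be given before `BUCKET`.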

### Fetching Most Recent Data

In order to fetch the most recent version of an HSDS domain called
`home/user/domain.h5` from an S3 bucket called `hsds-bucket` and dump it to the
current directory, run the following command:

```sh
$ hss3dump hsds-bucket home/user/domain.h5
```

### Supplying a Different Target Directory

The directory to which files will be written can be changed by specifying the
directory root with `-r`. To replicate the domain above to the directory
`/var/db/hsds_data`, the command looks like this:


```sh
$ hss3dump -r /var/db/hsds_data hsds-bucket home/user/domain.h5
```

### Restoring Previous Domain Versions

If we want to restore a previous version of a domain, we have to take a look at
the available versions first. Hss3dump makes this easy with its `-l` flag.
Assuming we have accidentally deleted data from a domain, the output could look
something like this:

```sh
$ hss3dump -l hsds-bucket home/user/domain.h5
home/user/domain.h5:
  db/e32b60a5-6c27622f/d/693e-302825-f8c087/.dataset.json
    ock7uFraVWjrotdTtGwXFR1N0TasC+ln 489 Bytes 2022-10-05 16:06:57+0100
  db/e32b60a5-6c27622f/d/693e-302825-f8c087/0
    HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL 0 Bytes 2022-10-10 09:06:59+0100
    U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH 1296 Bytes 2022-10-05 16:06:59+0100
  db/e32b60a5-6c27622f/g/40c5-5e41ac-92006c/.group.json
    sQwXZJAcjr1M0do1BsaFmnN6FlDLRwzM 1056 Bytes 2022-10-10 09:07:00+0100
    zkRK4cagD9alQWUeN3BKTi9T+SqQdjcO 193 Bytes 2022-10-05 16:07:00+0100
```

The output shows that the most recent version
(`HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL`) of the file
`db/e32b60a5-6c27622f/d/693e-302825-f8c087/0` is 0 bytes in size, while its
previous version was 1296 bytes in size. This means that its content was
deleted on October 10th. If we want to restore the old data, we can do so by
letting hss3dump know that it should download only versions older than October
10th. This can be done by supplying a corresponding RFC3339 timestamp via the
command's `-b` flag:

```sh
$ hss3dump -b "2022-10-10T00:00:00+01:00" hsds-bucket home/user/domain.h5
```

For each object, hss3dump will then download the most recent version that
satisfies this condition. If no version of an object satisfies the condition,
the oldest version present is chosen instead.
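
The selection rule described above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not hss3dump's actual code: `objectVersion`, `pickVersion`, and `mustParse` are hypothetical names, and the cutoff is treated as inclusive because the help text speaks of versions "older or equal to the supplied time". Note that Go's `time.RFC3339` layout expects the UTC offset to be written with a colon (`+01:00`).

```go
package main

import (
	"fmt"
	"time"
)

// objectVersion is a hypothetical, simplified stand-in for one S3 object
// version as printed by the -l listing.
type objectVersion struct {
	ID           string
	Size         int64
	LastModified time.Time
}

// pickVersion returns the most recent version that is not newer than cutoff.
// If every version is newer than cutoff, it falls back to the oldest version.
// The boolean result is false only when versions is empty.
func pickVersion(versions []objectVersion, cutoff time.Time) (objectVersion, bool) {
	if len(versions) == 0 {
		return objectVersion{}, false
	}
	oldest := versions[0]
	var best objectVersion
	found := false
	for _, v := range versions {
		if v.LastModified.Before(oldest.LastModified) {
			oldest = v
		}
		if v.LastModified.After(cutoff) {
			continue // newer than the cutoff, not a candidate
		}
		if !found || v.LastModified.After(best.LastModified) {
			best = v
			found = true
		}
	}
	if found {
		return best, true
	}
	return oldest, true
}

// mustParse parses an RFC3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// The two versions of the chunk object from the listing above.
	versions := []objectVersion{
		{ID: "HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL", Size: 0, LastModified: mustParse("2022-10-10T09:06:59+01:00")},
		{ID: "U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH", Size: 1296, LastModified: mustParse("2022-10-05T16:06:59+01:00")},
	}
	v, _ := pickVersion(versions, mustParse("2022-10-10T00:00:00+01:00"))
	fmt.Println(v.ID, v.Size)
}
```

With the cutoff from the example command, the sketch selects the 1296-byte version `U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH`, which is the state before the accidental deletion.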

fs_hsds_storer.go

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
// Copyright 2022 UL Method Park GmbH. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

type pathError struct {
	path string
}

func (err *pathError) Error() string {
	return fmt.Sprintf("filesystem: '%s' is not a valid filename", err.path)
}

// filesystemHSDSStorer is an implementation of the DomainStorer and
// ObjectStorer interfaces that uses the local filesystem as its underlying
// storage.
type filesystemHSDSStorer struct {
	// Root is the storer's root directory. All domains and domain objects
	// stored by the storer will reside in this directory.
	Root string
}

// sanitizePath maps name to a path below root. Anchoring name at "/" before
// joining resolves any ".." elements, so the result cannot escape root.
func sanitizePath(root, name string) (string, error) {
	name = filepath.Join("/", filepath.FromSlash(name))
	if name == "/" {
		return "", &pathError{path: name}
	}
	name = filepath.Join(root, name)
	return name, nil
}

// openForWriting creates the file name below root, truncating it if it
// already exists.
func openForWriting(root, name string) (io.WriteCloser, error) {
	name, err := sanitizePath(root, name)
	if err != nil {
		return nil, err
	}
	f, err := os.OpenFile(name, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
	if err != nil {
		return nil, err
	}
	return f, nil
}

// createParentDomains creates the directory for the domain name below root and
// writes a .domain.json file into every parent directory that lacks one.
func createParentDomains(root, name string, domain *hsdsDomain) error {
	name = filepath.Clean(name)
	if name == "." {
		return nil
	}

	dirName, err := sanitizePath(root, name)
	if err != nil {
		return err
	}
	// Directories need the execute bit to be traversable, hence 0755.
	err = os.MkdirAll(dirName, 0755)
	if err != nil {
		return err
	}

	// Anchor name at "/" (as sanitizePath does) and split its parent into
	// single path elements. filepath.SplitList is not suitable for this: it
	// splits PATH-style lists on os.PathListSeparator.
	parentDir := filepath.Dir(filepath.Join("/", filepath.FromSlash(name)))
	parentDirs := strings.Split(parentDir, string(filepath.Separator))
	// Directory domains do not have a root group.
	parent := *domain
	parent.Root = nil
	dn := root
	for _, subDir := range parentDirs {
		if subDir == "" {
			continue
		}
		dn = filepath.Join(dn, subDir)
		f, err := os.OpenFile(filepath.Join(dn, ".domain.json"), os.O_CREATE|os.O_WRONLY|os.O_EXCL, 0644)
		// We only create domain files for parent directories that do not
		// already have one.
		if errors.Is(err, os.ErrExist) {
			continue
		} else if err != nil {
			return err
		}

		enc := json.NewEncoder(f)
		err = enc.Encode(parent)
		if err != nil {
			f.Close()
			return err
		}
		err = f.Close()
		if err != nil {
			return err
		}
	}

	return nil
}

func (s *filesystemHSDSStorer) StoreDomain(ctx context.Context, name string, domain *hsdsDomain) error {
	err := createParentDomains(s.Root, name, domain)
	if err != nil {
		return err
	}

	name = filepath.Join(name, ".domain.json")
	f, err := openForWriting(s.Root, name)
	if err != nil {
		// Without this check, f would be nil and Encode would panic.
		return err
	}
	enc := json.NewEncoder(f)
	err = enc.Encode(domain)
	if err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func (s *filesystemHSDSStorer) StoreObject(ctx context.Context, name string, data []byte) error {
	dir, err := sanitizePath(s.Root, name)
	if err != nil {
		return err
	}
	dir, _ = filepath.Split(dir)
	err = os.MkdirAll(dir, 0755)
	if err != nil {
		return err
	}

	f, err := openForWriting(s.Root, name)
	if err != nil {
		return err
	}
	_, err = f.Write(data)
	if err != nil {
		f.Close()
		return err
	}
	return f.Close()
}
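
The effect of `sanitizePath` is easiest to see with a few inputs. The standalone sketch below copies the function from the diff above, with a plain `fmt.Errorf` error standing in for `pathError`, and runs it against a regular object name, a traversal attempt, and two names that collapse to the root. The root directory `/var/db/hsds_data` is taken from the README example. The sketch assumes a Unix-like system, since `filepath.Join` returns backslash-separated paths on Windows.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sanitizePath is copied from fs_hsds_storer.go, except that it returns a
// plain error instead of *pathError to keep the sketch self-contained.
func sanitizePath(root, name string) (string, error) {
	// Joining with "/" cleans the path, so leading ".." elements are dropped
	// before the name is placed below root.
	name = filepath.Join("/", filepath.FromSlash(name))
	if name == "/" {
		return "", fmt.Errorf("filesystem: %q is not a valid filename", name)
	}
	return filepath.Join(root, name), nil
}

func main() {
	names := []string{
		"home/user/domain.h5", // regular domain name
		"../../etc/passwd",    // traversal attempt, stays below root
		"..",                  // collapses to "/", rejected
		"",                    // empty name, rejected
	}
	for _, name := range names {
		p, err := sanitizePath("/var/db/hsds_data", name)
		fmt.Printf("%q -> %q, err=%v\n", name, p, err)
	}
}
```

Because the name is cleaned while anchored at `/`, a key such as `../../etc/passwd` ends up as `/var/db/hsds_data/etc/passwd` rather than escaping the root directory.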

go.mod

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
module github.com/methodpark/hss3dump

go 1.15

require (
	github.com/aws/aws-sdk-go-v2 v1.17.1
	github.com/aws/aws-sdk-go-v2/config v1.18.2
	github.com/aws/aws-sdk-go-v2/service/s3 v1.29.3
)

go.sum

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
github.com/aws/aws-sdk-go-v2 v1.17.1 h1:02c72fDJr87N8RAC2s3Qu0YuvMRZKNZJ9F+lAehCazk=
github.com/aws/aws-sdk-go-v2 v1.17.1/go.mod h1:JLnGeGONAyi2lWXI1p0PCIOIy333JMVK1U7Hf0aRFLw=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.9 h1:RKci2D7tMwpvGpDNZnGQw9wk6v7o/xSwFcUAuNPoB8k=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.9/go.mod h1:vCmV1q1VK8eoQJ5+aYE7PkK1K6v41qJ5pJdK3ggCDvg=
github.com/aws/aws-sdk-go-v2/config v1.18.2 h1:tRhTb3xMZsB0gW0sXWpqs9FeIP8iQp5SvnvwiPXzHwo=
github.com/aws/aws-sdk-go-v2/config v1.18.2/go.mod h1:9XVoZTdD8ICjrgI5ddb8j918q6lEZkFYpb7uohgvU6c=
github.com/aws/aws-sdk-go-v2/credentials v1.13.2 h1:F/v1w0XcFDZjL0bCdi9XWJenoPKjGbzljBhDKcryzEQ=
github.com/aws/aws-sdk-go-v2/credentials v1.13.2/go.mod h1:eAT5aj/WJ2UDIA0IVNFc2byQLeD89SDEi4cjzH/MKoQ=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.19 h1:E3PXZSI3F2bzyj6XxUXdTIfvp425HHhwKsFvmzBwHgs=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.19/go.mod h1:VihW95zQpeKQWVPGkwT+2+WJNQV8UXFfMTWdU6VErL8=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.25 h1:nBO/RFxeq/IS5G9Of+ZrgucRciie2qpLy++3UGZ+q2E=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.25/go.mod h1:Zb29PYkf42vVYQY6pvSyJCJcFHlPIiY+YKdPtwnvMkY=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.19 h1:oRHDrwCTVT8ZXi4sr9Ld+EXk7N/KGssOr2ygNeojEhw=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.19/go.mod h1:6Q0546uHDp421okhmmGfbxzq2hBqbXFNpi4k+Q1JnQA=
github.com/aws/aws-sdk-go-v2/internal/ini v1.3.26 h1:Mza+vlnZr+fPKFKRq/lKGVvM6B/8ZZmNdEopOwSQLms=
github.com/aws/aws-sdk-go-v2/internal/ini v1.3.26/go.mod h1:Y2OJ+P+MC1u1VKnavT+PshiEuGPyh/7DqxoDNij4/bg=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.0.16 h1:2EXB7dtGwRYIN3XQ9qwIW504DVbKIw3r89xQnonGdsQ=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.0.16/go.mod h1:XH+3h395e3WVdd6T2Z3mPxuI+x/HVtdqVOREkTiyubs=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.9.10 h1:dpiPHgmFstgkLG07KaYAewvuptq5kvo52xn7tVSrtrQ=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.9.10/go.mod h1:9cBNUHI2aW4ho0A5T87O294iPDuuUOSIEDjnd1Lq/z0=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.1.20 h1:KSvtm1+fPXE0swe9GPjc6msyrdTT0LB/BP8eLugL1FI=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.1.20/go.mod h1:Mp4XI/CkWGD79AQxZ5lIFlgvC0A+gl+4BmyG1F+SfNc=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.19 h1:GE25AWCdNUPh9AOJzI9KIJnja7IwUc1WyUqz/JTyJ/I=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.19/go.mod h1:02CP6iuYP+IVnBX5HULVdSAku/85eHB2Y9EsFhrkEwU=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.13.19 h1:piDBAaWkaxkkVV3xJJbTehXCZRXYs49kvpi/LG6LR2o=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.13.19/go.mod h1:BmQWRVkLTmyNzYPFAZgon53qKLWBNSvonugD1MrSWUs=
github.com/aws/aws-sdk-go-v2/service/s3 v1.29.3 h1:F6wgg8aHGNyhaAy2ONnWBThiPdLa386qNA0j33FIuSM=
github.com/aws/aws-sdk-go-v2/service/s3 v1.29.3/go.mod h1:/NHbqPRiwxSPVOB2Xr+StDEH+GWV/64WwnUjv4KYzV0=
github.com/aws/aws-sdk-go-v2/service/sso v1.11.25 h1:GFZitO48N/7EsFDt8fMa5iYdmWqkUDDB3Eje6z3kbG0=
github.com/aws/aws-sdk-go-v2/service/sso v1.11.25/go.mod h1:IARHuzTXmj1C0KS35vboR0FeJ89OkEy1M9mWbK2ifCI=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.13.8 h1:jcw6kKZrtNfBPJkaHrscDOZoe5gvi9wjudnxvozYFJo=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.13.8/go.mod h1:er2JHN+kBY6FcMfcBBKNGCT3CarImmdFzishsqBmSRI=
github.com/aws/aws-sdk-go-v2/service/sts v1.17.4 h1:YNncBj5dVYd05i4ZQ+YicOotSXo0ufc9P8kTioi13EM=
github.com/aws/aws-sdk-go-v2/service/sts v1.17.4/go.mod h1:bXcN3koeVYiJcdDU89n3kCYILob7Y34AeLopUbZgLT4=
github.com/aws/smithy-go v1.13.4 h1:/RN2z1txIJWeXeOkzX+Hk/4Uuvv7dWtCjbmVJcrskyk=
github.com/aws/smithy-go v1.13.4/go.mod h1:Tg+OJXh4MB2R/uN61Ko2f6hTZwB/ZYGOtib8J3gBHzA=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/google/go-cmp v0.5.8/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/jmespath/go-jmespath v0.4.0/go.mod h1:T8mJZnbsbmF+m6zOOFylbeCJqk5+pHWvzYPziyZiYoo=
github.com/jmespath/go-jmespath/internal/testify v1.5.1/go.mod h1:L3OGu8Wl2/fWfCI6z80xFu9LTZmf1ZRjMHUOPmWr69U=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
