Commit ede8625

Public Release
1 parent 602a25b commit ede8625

53 files changed

Lines changed: 3470 additions & 0 deletions

.dockerignore

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
README.md
tests
*.pyc
*.pyo
*.pyd
notebooks
outputs
docs
htmlcov
dist
conf
ci
.tox
.vscode
.mypy_cache
.ipynb_checkpoints
uis
__pycache__
.pytest_cache

.gcloudignore

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
README.md
tests
*.pyc
*.pyo
*.pyd
notebooks
outputs
docs
htmlcov
dist
conf
ci
.tox
.vscode
.mypy_cache
.ipynb_checkpoints
uis
__pycache__
.pytest_cache

.pre-commit-config.yaml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# To install the git pre-commit hook run:
#   pre-commit install
# To update the pre-commit hooks run:
#   pre-commit install-hooks
exclude: '^(\.tox|ci/templates|\.bumpversion\.cfg)(/|$)'

repos:
  - repo: https://github.com/ambv/black
    rev: 21.7b0
    hooks:
      - id: black
        language_version: python3.9
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-docstring-first
      - id: check-yaml
      - id: debug-statements
      - id: name-tests-test
  - repo: https://github.com/PyCQA/flake8
    rev: 3.9.2
    hooks:
      - id: flake8
        additional_dependencies: [flake8-typing-imports==1.10.0]
  - repo: https://github.com/asottile/reorder_python_imports
    rev: v2.5.0
    hooks:
      - id: reorder-python-imports
        args: [--py3-plus]
  - repo: https://github.com/asottile/add-trailing-comma
    rev: v2.1.0
    hooks:
      - id: add-trailing-comma
        args: [--py36-plus]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.902
    hooks:
      - id: mypy
        additional_dependencies: [types-all,types-attrs]
        exclude: ^setup.py|main.py|.*test.py
  - repo: meta
    hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes
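
A note on the two `exclude` patterns in this file: pre-commit treats them as Python regular expressions and applies them with `re.search` against each file path. The top-level pattern is anchored and grouped, whereas the mypy pattern is an unanchored alternation in which only `setup.py` is tied to the start of the path, so it may exclude more files than intended. The sketch below uses only the standard `re` module; the sample paths are made up for illustration and are not taken from the repository.

```python
import re

# Patterns copied from the .pre-commit-config.yaml in this commit.
TOP_LEVEL_EXCLUDE = re.compile(r"^(\.tox|ci/templates|\.bumpversion\.cfg)(/|$)")
MYPY_EXCLUDE = re.compile(r"^setup.py|main.py|.*test.py")


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    """Return True when the pattern is found anywhere in the path (re.search)."""
    return pattern.search(path) is not None


# Hypothetical paths, used only to show the matching behaviour.
sample_paths = [
    ".tox/py39/lib/site.py",
    "ci/templates/tox.ini",
    "ci/bootstrap.py",
    "setup.py",
    "providers/gcp/main.py",
    "src/pkg/domain.py",
]

# Map each path to (excluded by top-level pattern, excluded by mypy pattern).
results = {
    path: (is_excluded(TOP_LEVEL_EXCLUDE, path), is_excluded(MYPY_EXCLUDE, path))
    for path in sample_paths
}

for path, (top, mypy) in results.items():
    print(f"{path:24} top-level={top!s:5} mypy={mypy}")
```

Note that `src/pkg/domain.py` is caught by the mypy pattern because `domain.py` contains `main.py` as a substring. If that is not the intent, a grouped and anchored form along the lines of the top-level pattern would be one way to tighten it.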

Dockerfile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Use the official osgeo/gdal image.
FROM osgeo/gdal:ubuntu-small-latest

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

# Set provider dir path
ENV PROVIDER providers/gcp

ENV APP_HOME /app

RUN if [ -z "$MONITOR_TABLE" ]; then echo 'WARNING: Environment variable MONITOR_TABLE not specified. Task statuses wont be output.'; fi


WORKDIR $APP_HOME
# Copy local code to the container image.
COPY . ./satextractor
COPY $PROVIDER ./
# Install GDAL dependencies
RUN apt-get update
RUN apt-get install -y python3-pip
# Install production dependencies.
RUN pip install --no-cache-dir ./satextractor
RUN pip install --no-cache-dir -r requirements.txt

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
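
One observation on the `MONITOR_TABLE` check: a `RUN` instruction executes while the image is being built, where a variable is only visible if the Dockerfile declares it with `ARG` or `ENV`, which this one does not. If the intent is to warn when the deployed service lacks the variable, the check may fit better at service startup. A minimal Python sketch of such a runtime check follows; the function name and logger name are illustrative assumptions, not part of the committed code.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger("satextractor.startup")  # hypothetical logger name


def get_monitor_table(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return MONITOR_TABLE if set and non-empty, otherwise warn and return None."""
    env = os.environ if env is None else env
    table = env.get("MONITOR_TABLE") or None
    if table is None:
        logger.warning(
            "Environment variable MONITOR_TABLE not specified. "
            "Task statuses won't be output.",
        )
    return table
```

Calling `get_monitor_table()` once when the web service starts would surface the warning in the container logs at runtime rather than in the build output.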

LICENSE

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
BSD 2-Clause License

Copyright (c) 2021, Francisco Dorr. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
<div id="top"></div>
<!--
*** Thanks for checking out the Best-README-Template. If you have a suggestion
*** that would make this better, please fork the repo and create a pull request
*** or simply open an issue with the tag "enhancement".
*** Don't forget to give the project a star!
*** Thanks again! Now go create something AMAZING! :D
-->



<!-- PROJECT SHIELDS -->
<!--
*** I'm using markdown "reference style" links for readability.
*** Reference links are enclosed in brackets [ ] instead of parentheses ( ).
*** See the bottom of this document for the declaration of the reference variables
*** for contributors-url, forks-url, etc. This is an optional, concise syntax you may use.
*** https://www.markdownguide.org/basic-syntax/#reference-style-links
-->
<!-- [![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url] -->



<!-- PROJECT LOGO -->
<br />
<div align="center">
  <a href="https://github.com/othneildrew/Best-README-Template">
    <img src="images/satextractor.png" alt="Logo">
  </a>

  <h3 align="center">SatExtractor</h3>

  <p align="center">
    Build, deploy and extract public satellite constellations with a single command.
    <br />
    <a href="https://github.com/othneildrew/Best-README-Template">
    <img src="images/stac.gif" alt="Logo">
    </a>
</div>



<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#about-the-project">About The Project</a>
    </li>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#structure">Structure</a></li>
        <li><a href="#prerequisites">Prerequisites</a></li>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
    <li><a href="#usage">Usage</a></li>
    <li><a href="#contributing">Contributing</a></li>
    <li><a href="#license">License</a></li>
  </ol>
</details>



<!-- ABOUT THE PROJECT -->
## About The Project

- *tldr*: **SatExtractor** gets **all revisits in a date range** for a given **geojson region** from any public satellite constellation and stores them in a **cloud-friendly format**.


The large amount of image data makes it difficult to create datasets to train models quickly and reliably. Existing methods for extracting satellite images take a long time to process and have user quotas that restrict access.

Therefore, we created **SatExtractor**, an open source extraction tool that performs worldwide dataset extractions using serverless providers such as **Google Cloud Platform** or **AWS**, based on a common existing standard: **STAC**.

The tool scales horizontally as needed, extracting revisits and storing them in **zarr** format to be easily used by deep learning models.

It is fully configurable using [Hydra](https://hydra.cc/).

<p align="right">(<a href="#top">back to top</a>)</p>
87+
88+
<!-- GETTING STARTED -->
89+
## Getting Started
90+
91+
**SatExtractor** needs a cloud provider to work. Before you start using it, you'll need to create and configure a cloud provider account.
92+
93+
We provide the implementation to work with [Google Cloud](https://cloud.google.com/), but **SatExtractor** is implemented to be easily extensible to other providers.
94+
95+
### Structure
96+
97+
The package is structured in a modular and configurable approach. It is basically a pipeline containing 6 important steps (separated in modules).
98+
99+
- **Builder**: contains the logic to build the container that will run the extraction. <details>
100+
<summary>more info</summary>
101+
SatExtractor is based on a docker container. The Dockerfile in the root dir is used to build the core package and a reference in it to the specific provider extraction logic should be explicitly added (see the gcp example in directory providers/gcp).
102+
103+
This is done by setting <code> ENV PROVIDER </code> var to point the provider directory. In the default Dockerfile it is set to gcp: <code> ENV PROVIDER providers/gcp </code>.
104+
</details>
105+
106+
- **Stac**: converts a public constellation to the **STAC standard**. <details>
107+
<summary>more info</summary>
108+
If the original constellation is not already in STAC standard it should be converted. To do so, you have to implement the constellation specific STAC conversor. Sentinel 2 and Landsat 7/8 examples can be found in <code> src/satextractor/stac </code>. The function that is actually called to perform the conversion to the STAC standard is set in stac hydra config file ( <code> conf/stac/gcp.yaml </code>)
109+
</details>
110+
111+
- **Tiler**: Creates tiles of the given region to perform the extraction. <details>
112+
<summary>more info</summary>
113+
The Tiler split the region in UTM tiles using <a href=https://sentinelhub-py.readthedocs.io/en/latest/examples/large_area_utilities.html> SentinelHub splitter </a>. There will be one Extraction Task per Tile. The config about the tiler can be found in <code> conf/tiler/utm.yaml </code>. There, the size of the tiles can be specified. Take into account that these tiles are not the actual patches that are later stored in your cloud provider, this is just the unit from where the (smaller) patches will be extracted.
114+
</details>
115+
116+
- **Scheduler**: Decides how those tiles are going to be scheduled creating extractions tasks. <details>
117+
<summary>more info</summary>
118+
The Scheduler takes the resulting tiles from the Tiler and creates the actual patches (called also tiles) to be extracted.
119+
120+
For example, if the Tiler splitted the region in 10000x10000 tiles, now the scheduler can be set to extract from each of the tiles smaller patches of, say, 1000x1000. Also, the scheduler calculates the intersection between the patches and the constellation STAC assets. At the end, you'll have and object called <code> ExtractionTask </code> with the information to extract one revisit, one band and one tile splitted in multiple patches. This <code> ExtractionTask </code> will be send to the cloud provider to perform the actual extraction.
121+
122+
The config about the scheduler can be found in <code> conf/scheduler/utm.yaml </code>.
123+
</details>
124+
125+
- **Preparer**: Prepare the files in the cloud storage. <details>
126+
<summary>more info</summary>
127+
The Preparer creates the cloud file structure. It creates the needed zarr groups and arrays in order to later store the extracted patches.
128+
129+
The gcp preparer config can be found in <code> conf/preparer/gcp.yaml </code>.
130+
</details>
131+
132+
- **Deployer**: Deploy the extraction tasks created by the scheduler to perform the extraction. <details>
133+
<summary>more info</summary>
134+
The Deployer sends one message per ExtractionTask to the cloud provider to perform the actal extraction. It works by publishing messages to a PubSub queue where the extraction is subscribed to. When a new message (ExtractionTask) arrives it will be automatically run on the cloud autoscaling.
135+
The gcp deployer config can be found in <code> conf/deployer/gcp.yaml </code>.
136+
</details>
137+
138+
139+
All the steps are **optional** and the user decides which to run the **main config file**.
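
To make the Tiler/Scheduler relationship more concrete, here is a rough sketch of splitting one tile into patches, following the 10000x10000 to 1000x1000 example above. It is plain Python and does not use the package's own classes: `Patch` and `split_tile` are hypothetical names introduced for illustration, and sizes are treated as abstract units.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch bounds inside a tile: (min_x, min_y, max_x, max_y)."""

    min_x: int
    min_y: int
    max_x: int
    max_y: int


def split_tile(tile_size: int, patch_size: int) -> List[Patch]:
    """Split a square tile into a row-major grid of square patches.

    Assumes patch_size divides tile_size exactly, as in the README example.
    """
    if patch_size <= 0 or tile_size % patch_size != 0:
        raise ValueError("patch_size must be positive and divide tile_size")
    steps = range(0, tile_size, patch_size)
    return [
        Patch(x, y, x + patch_size, y + patch_size)
        for y in steps
        for x in steps
    ]


patches = split_tile(tile_size=10000, patch_size=1000)
print(len(patches))  # 100 patches per tile
```

In the real pipeline each such group of patches would additionally be paired with a revisit and a band to form one `ExtractionTask`; that pairing is not modelled here.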


### Prerequisites

To run **SatExtractor** we recommend using a virtual env, and a cloud provider user should already have been created.

### Installation


1. Clone the repo
   ```sh
   git clone https://github.com/FrontierDevelopmentLab/sat-extractor
   ```
2. Install python packages
   ```sh
   pip install .
   ```

<p align="right">(<a href="#top">back to top</a>)</p>



<!-- USAGE EXAMPLES -->
## Usage

Once a cloud provider user is set up and the package is installed, you'll need to grab the geojson region you want (you can get it from the super-cool tool [geojson.io](https://geojson.io)) and change the config files.

1. Save the region as `<your_region_name>.geojson` and store it in the `outputs` folder (you can change your output dir in the `config.yaml`)
2. Open the `config.yaml` and you'll see something like this:

<img src="images/config.png" alt="Logo">

The important thing here is to set the `dataset_name` to `<your_region_name>`, define the `start_date` and `end_date` for your revisits, your `constellations` and the tasks to be run (you will probably want to run the `build` only once and then comment it out).

**Important**: the `token.json` contains the credentials needed to access your cloud provider. In this example case it contains the gcp credentials. You'll need to provide it.

3. Open the `cloud/<provider>.yaml` and add your account info there, following the default provided file.
(optional): you can choose different configurations by changing the module configs: `builder`, `stac`, `tiler`, `scheduler`, `preparer`, etc. There you can change things like patch_size, chunk_size.

4. Run `python src/satextractor/cli.py` and enjoy!


<p align="right">(<a href="#top">back to top</a>)</p>


See the [open issues](https://github.com/FrontierDevelopmentLab/sat-extractor/issues) for a full list of proposed features (and known issues).

<p align="right">(<a href="#top">back to top</a>)</p>



<!-- CONTRIBUTING -->
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

<p align="right">(<a href="#top">back to top</a>)</p>



<!-- LICENSE -->
## License

Distributed under the BSD 2-Clause License. See `LICENSE` for more information.

<p align="right">(<a href="#top">back to top</a>)</p>

app.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
from ui import create_app

app = create_app()

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)

conf/builder/gcp.yaml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
_target_: satextractor.builder.gcp_builder.build_gcp
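
For readers unfamiliar with Hydra, `_target_` names a dotted import path to a callable or class, which `hydra.utils.instantiate` or `hydra.utils.call` imports and invokes. The sketch below shows the underlying idea with the standard library only; `locate` is a hypothetical helper written for this note, not Hydra's implementation, and how the package itself invokes the target is not shown in this part of the commit.

```python
import importlib
from typing import Any, Callable


def locate(dotted_path: str) -> Callable[..., Any]:
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library; with the package installed, the
# same call would take "satextractor.builder.gcp_builder.build_gcp".
join = locate("os.path.join")
print(join("outputs", "cordoba"))
```

Swapping the builder for another provider would then amount to pointing `_target_` at a different function, without touching the calling code.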

conf/cloud/gcp.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
user_id: lucas
project: fdl-europe-world-food
region: europe-west1
storage_root: lk-wfe-test-eu
storage_prefix: gs:/

conf/config.yaml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
defaults:
  - stac: gcp
  - tiler: utm
  - scheduler: utm
  - deployer: gcp
  - builder: gcp
  - cloud: gcp
  - preparer: gcp
  - _self_
tasks:
  - build
  - stac
  - tile
  - schedule
  - prepare
  - deploy
dataset_name: cordoba
output: ./outputs/${dataset_name}
hydra:
  run:
    dir: .
log_path: ${output}/main.log
credentials: ${output}/token.json
gpd_input: ${output}/${dataset_name}.geojson
item_collection: ${output}/item_collection.geojson
tiles: ${output}/tiles.pkl
extraction_tasks: ${output}/extraction_tasks.pkl
start_date: 2020-01-01
end_date: 2020-02-01
constellations:
  - sentinel-2
  - landsat-5
  - landsat-7
  - landsat-8
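
The `${...}` references in this config are interpolations that Hydra (through OmegaConf) resolves when a value is accessed, so every path follows from `dataset_name`. The sketch below imitates that resolution for a few of the top-level keys using only the standard library. It is a simplified stand-in, not OmegaConf's resolver: `resolve` is a hypothetical helper that handles flat `${key}` references only and does not guard against cycles.

```python
import re

# A subset of the top-level keys from conf/config.yaml in this commit.
config = {
    "dataset_name": "cordoba",
    "output": "./outputs/${dataset_name}",
    "credentials": "${output}/token.json",
    "gpd_input": "${output}/${dataset_name}.geojson",
    "tiles": "${output}/tiles.pkl",
}

_REF = re.compile(r"\$\{(\w+)\}")


def resolve(value: str, cfg: dict) -> str:
    """Recursively substitute ${key} references with top-level values."""
    return _REF.sub(lambda m: resolve(cfg[m.group(1)], cfg), value)


resolved = {key: resolve(value, config) for key, value in config.items()}
for key, value in resolved.items():
    print(f"{key}: {value}")
```

With `dataset_name: cordoba`, the region file is therefore expected at `./outputs/cordoba/cordoba.geojson` and the credentials at `./outputs/cordoba/token.json`.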
