Commit a1d191b

committed
update docs
1 parent b7de78d commit a1d191b

11 files changed

Lines changed: 130 additions & 19 deletions
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# AWS hygiene scripts

## Too many old alarms lying around?

Python:

```python
import time

import boto3

filterstring = 'MyProjectName'
client = boto3.client('cloudwatch')
# describe_alarms already filters to INSUFFICIENT_DATA, so no per-alarm state check is needed
alarms = client.describe_alarms(AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA')
while True:
    for eachalarm in alarms['MetricAlarms']:
        if filterstring in eachalarm['AlarmName']:
            client.delete_alarms(AlarmNames=[eachalarm['AlarmName']])
            time.sleep(1)  # avoid throttling
    if 'NextToken' not in alarms:
        break  # last page of results reached
    token = alarms['NextToken']
    print(token)  # progress indicator
    alarms = client.describe_alarms(AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA', NextToken=token)
```
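Equivalently, boto3's built-in paginator handles the `NextToken` bookkeeping for you. A sketch of the same cleanup (the helper names are ours, and `MyProjectName` is the same placeholder filter as above):

```python
import time


def matching_alarm_names(pages, filterstring):
    """Collect alarm names containing filterstring from describe_alarms result pages."""
    return [
        alarm['AlarmName']
        for page in pages
        for alarm in page.get('MetricAlarms', [])
        if filterstring in alarm['AlarmName']
    ]


def delete_stale_alarms(filterstring='MyProjectName'):
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client('cloudwatch')
    pages = client.get_paginator('describe_alarms').paginate(
        AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA')
    for name in matching_alarm_names(pages, filterstring):
        client.delete_alarms(AlarmNames=[name])
        time.sleep(1)  # avoid throttling
```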

## Too many old empty log groups lying around?

Bash (`in2csv` is part of csvkit):

```sh
aws logs describe-log-groups | in2csv -f json --key logGroups > logs.csv
```

R (requires `dplyr` and `readr`):

```r
library(dplyr)
library(readr)

read_csv(
  "logs.csv",
  col_types = cols_only(
    storedBytes = col_integer(),
    creationTime = col_double(),
    logGroupName = col_character()
  )
) %>%
  mutate(creationTime = as.POSIXct(creationTime / 1000, origin = "1970-01-01")) %>%
  filter(storedBytes == 0) %>%
  select(logGroupName) %>%
  write_tsv("logs_clear.txt", col_names = FALSE)
```

Bash (GNU `parallel`):

```sh
parallel aws logs delete-log-group --log-group-name {1} :::: logs_clear.txt
```
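If you'd rather stay in Python, the same filter-and-delete can be sketched with boto3 alone (the helper names are ours; `storedBytes == 0` mirrors the R filter above):

```python
def empty_log_group_names(pages):
    """Names of log groups with zero stored bytes, from describe_log_groups result pages."""
    return [
        group['logGroupName']
        for page in pages
        for group in page.get('logGroups', [])
        if group.get('storedBytes', 0) == 0
    ]


def delete_empty_log_groups():
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client('logs')
    pages = client.get_paginator('describe_log_groups').paginate()
    for name in empty_log_group_names(pages):
        client.delete_log_group(logGroupName=name)
```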

documentation/DF-documentation/SQS_QUEUE_information.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ Do NOT set your SQS_MESSAGE_VISIBILITY very short (e.g. seconds) as if it can ma
  ## Example SQS Queue

- [[images/Sample_SQS_Queue.png|alt="Sample_SQS_Queue"]]
+ ![Sample_SQS_Queue](images/Sample_SQS_Queue.png)

  This is an example of an SQS Queue.
  You can see that there is one active task with 64 jobs in it.

documentation/DF-documentation/_toc.yml

Lines changed: 2 additions & 0 deletions
@@ -15,5 +15,7 @@ parts:
    - file: step_4_monitor
  - caption:
    chapters:
+   - file: advanced_configuration
    - file: troubleshooting_runs
+   - file: AWS_hygiene_scripts
    - file: versions
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Advanced Configuration

We've tried very hard to make Distributed-FIJI light and adaptable, but keeping the configuration settings to a manageable number requires making some default assumptions.
Below is a non-comprehensive list of places where you can adapt the code to your own purposes.

## Changes you can make to Distributed-FIJI outside of the Docker container

* **Location of ECS configuration files:** By default these are placed into your bucket with a prefix of 'ecsconfigs/'.
Alternate locations can be designated in the run script.
* **Log configuration and location of exported logs:** Distributed-FIJI creates log groups with a default retention of 60 days (to avoid hitting the AWS limit of 250) and, after finishing the run, exports them into your bucket with a prefix of 'exportedlogs/LOG_GROUP_NAME/'.
These may be modified in the run script.
* **Advanced EC2 configuration:** Any additional configuration of your EC2 spot fleet (such as installing additional packages or running scripts on startup) can be done by modifying the userData parameter in the run script.
* **SQS queue detailed configuration:** Distributed-FIJI creates a queue where messages will be tried 10 times before being consigned to a DeadLetterQueue, and unprocessed messages will expire after 14 days (the AWS maximum).
These values can be modified in run.py.
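For reference, the two queue settings named above map onto SQS queue attributes roughly like this (a sketch of the idea, not the actual run.py code; the DLQ ARN and helper name are placeholders):

```python
import json


def queue_attributes(dead_letter_arn, max_receives=10):
    """SQS attributes: retry each message max_receives times, keep unprocessed messages 14 days."""
    return {
        'MessageRetentionPeriod': str(14 * 24 * 60 * 60),  # 14 days in seconds, the AWS maximum
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dead_letter_arn,
            'maxReceiveCount': str(max_receives),
        }),
    }

# Usage sketch (boto3): client.create_queue(QueueName=..., Attributes=queue_attributes(dlq_arn))
```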

## Changes that will require you to make your own Docker container

* **Fiji version:** We ship the most recent fiji-open-jdk-8, but if you want to use your own Dockerized version of a different Fiji build you can edit the Dockerfile to call that Fiji Docker image instead.
* **Alarm names or thresholds:** These can be modified in the run-worker script.
* **Frequency or types of information included in the per-instance logs:** These can be adjusted in the instance-monitor script.
* **Log stream names or logging level:** These can be modified in the fiji-worker.py script.
(Binary image files changed; previews not shown.)

documentation/DF-documentation/overview.md

Lines changed: 5 additions & 6 deletions
@@ -56,15 +56,14 @@ If SQS tells them there are no visible jobs then they shut themselves down.

  ## What does this look like?

- ![Example Instance Configuration](images/sample_DCP_config_1.png)
+ ![Example Instance Configuration](images/sample_DF_config_1.png)

- This is an example of one possible instance configuration using [Distributed-CellProfiler](http://github.com/cellprofiler/distributed-cellprofiler) as an example.
- This is one m4.16xlarge EC2 instance (64 CPUs, 250 GB of RAM) with a 165 GB EBS volume mounted on it. A spot fleet could contain many such instances.
+ This is an example of one possible instance configuration of Distributed-FIJI.
+ This is one m4.16xlarge EC2 instance (64 CPUs, 250 GB of RAM) with a 165 GB EBS volume mounted on it.
+ A spot fleet could contain many such instances.
  It has 16 tasks (individual Docker containers).
  Each Docker container uses 10 GB of hard disk space and is assigned 4 CPUs and 15 GB of RAM (which it does not share with other Docker containers).
- Each container shares its individual resources among 4 copies of CellProfiler.
- Each copy of CellProfiler runs a pipeline on one "job", which can be anything from a single image to an entire 384-well plate or timelapse movie.
- You can optionally stagger the start time of these 4 copies of CellProfiler, ensuring that the most memory- or disk-intensive steps aren't happening simultaneously, decreasing the likelihood of a crash.
+ Each copy of Fiji runs a pipeline on one "job", which can be anything from a single image to an entire 384-well plate or timelapse movie.

  Read more about this and other configurations in [Step 1: Configuration](step_1_configuration.md).

documentation/DF-documentation/step_1_configuration.md

Lines changed: 35 additions & 9 deletions
@@ -9,12 +9,11 @@ Once the config file is created, simply type `python run.py setup` to set up you
  * **APP_NAME:** This will be used to tie your clusters, tasks, services, logs, and alarms together.
  It need not be unique, but it should be descriptive enough that you can tell jobs apart if you're running multiple jobs.
- * **LOG_GROUP_NAME:** The name to give the log group that will monitor the progress of your jobs and allow you to check performance or look for problems after the fact.

  ***
  ### DOCKER REGISTRY INFORMATION

- * **DOCKERHUB_TAG:** This is the encapsulated version of BioFormats2Raw you will be running.
+ * **DOCKERHUB_TAG:** This is the encapsulated version of FIJI you will be running.

  ***

@@ -37,12 +36,16 @@ If your jobs complete quickly and/or you don't need the data immediately you can
  * **EBS_VOL_SIZE:** The size of the temporary hard drive associated with each EC2 instance in GB.
  The minimum allowed is 22.
  If you have multiple Dockers running per machine, each Docker will have access to (EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2 GB of space.
+ * **DOWNLOAD_FILES:** Whether or not to download the image files to the EBS volume before processing.
+ This completely bypasses mounting the source bucket with S3FS.
+ This typically requires a larger EBS volume (depending on the size of your image sets and how many sets are processed per group).
+ It avoids occasional issues with S3FS that can crop up on longer runs, as well as permissions issues with mounting a source bucket.

  ***

  ### DOCKER INSTANCE RUNNING ENVIRONMENT
- * **CPU_SHARES:** How many CPUs each Docker container may have (1024 units = 1 core).
  * **MEMORY:** How much memory each Docker container may have.
+ * **SCRIPT_DOWNLOAD_URL:** Where to download the FIJI script you will be running.

  ***
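As a worked instance of the `(EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2` GB formula above (the helper name is ours, not part of Distributed-FIJI):

```python
def space_per_docker_gb(ebs_vol_size, tasks_per_machine):
    """Hard drive space, in GB, available to each Docker container on an instance."""
    return ebs_vol_size / tasks_per_machine - 2

# e.g. a 165 GB volume shared by 16 Dockers leaves roughly 8.3 GB per Docker,
# while the minimum 22 GB volume with a single Docker leaves 20 GB
```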

@@ -58,11 +61,34 @@ See [Step 0: Prep](step_0_prep.md) for more information.
  ***

+ ### LOG GROUP INFORMATION
+
+ * **LOG_GROUP_NAME:** The name to give the log group that will monitor the progress of your jobs and allow you to check performance or look for problems after the fact.
+
  ### REDUNDANCY CHECKS

- * **CHECK_IF_DONE_BOOL:** Whether or not to check the output folder before proceeding.
- Case-insensitive.
- If an analysis fails partway through (due to some of the files being in the wrong place, an AWS outage, a machine crash, etc.), setting this to 'True' allows you to resubmit the whole analysis but only reprocess jobs that haven't already been done.
- This saves you from having to parse exactly which jobs succeeded versus failed, or from having to pay to rerun the entire analysis.
- If Distributed-FIJI determines the correct number of files are already in the output folder, it will designate that job as completed and move on to the next one.
- If you actually do want to overwrite files that were previously generated (such as when you have improved a pipeline and no longer want the output of the old version), set this to 'False' to process jobs whether or not there are already files in the output folder.
+ * **EXPECTED_NUMBER_FILES:** How many files need to be in the output folder in order to mark a job as completed.
+ * **MIN_FILE_SIZE_BYTES:** The minimum size, in bytes, an object must be to "count".
+ Useful when trying to distinguish jobs that exported smaller, corrupted files from those that exported full-size files.
+ * **NECESSARY_STRING:** This allows you to optionally set a string that must be included in your file to count towards the total in EXPECTED_NUMBER_FILES.
+ This can be helpful if your pipeline puts out a mixture of file types and you want to count only how many images were produced, for example.
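Taken together, the three settings above amount to a simple counting rule. A sketch of the idea (our illustration, not Distributed-FIJI's actual check):

```python
def job_is_done(output_files, expected_number_files, min_file_size_bytes=0, necessary_string=''):
    """output_files: (name, size_in_bytes) pairs already present in the job's output folder."""
    qualifying = [
        name for name, size in output_files
        if size >= min_file_size_bytes and necessary_string in name
    ]
    # The job counts as complete once enough qualifying files exist
    return len(qualifying) >= expected_number_files
```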

+ ### EXAMPLE CONFIGURATIONS
+
+ This is an example of one possible configuration. It's a fairly large machine that is able to process 16 jobs at the same time.
+
+ ![Sample DF Configuration 1](images/sample_DF_config_1.png)
+
+ The config settings for this example are:
+ TASKS_PER_MACHINE = 16 (number of Dockers)
+ EBS_VOL_SIZE = 165
+ MEMORY = 15000 (MB for each Docker)
+
+ ![Sample DF Configuration 2](images/sample_DF_config_2.png)
+
+ This is an example of another possible configuration. When we run Distributed-FIJI we tend to prefer running a larger number of smaller machines; this is a configuration we often use. We might use a spot fleet of 100 of these machines (CLUSTER_MACHINES = 100).
+
+ The config settings for this example are:
+ TASKS_PER_MACHINE = 1 (number of Dockers)
+ EBS_VOL_SIZE = 22
+ MEMORY = 15000 (MB for each Docker)
