Commit a1d191b

committed
update docs
1 parent b7de78d commit a1d191b

11 files changed

Lines changed: 130 additions & 19 deletions
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# AWS hygiene scripts

## Too many old alarms lying around?

Python:

```python
import time

import boto3

filterstring = 'MyProjectName'
client = boto3.client('cloudwatch')
# describe_alarms already filters to INSUFFICIENT_DATA, so no per-alarm state check is needed
alarms = client.describe_alarms(AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA')
while True:
    for eachalarm in alarms['MetricAlarms']:
        if filterstring in eachalarm['AlarmName']:
            client.delete_alarms(AlarmNames=[eachalarm['AlarmName']])
            time.sleep(1)  # avoid throttling
    if 'NextToken' not in alarms:
        break  # last page of results reached
    token = alarms['NextToken']
    print(token)  # progress indicator
    alarms = client.describe_alarms(AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA', NextToken=token)
```
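Equivalently, boto3's built-in paginator handles the `NextToken` bookkeeping for you. A sketch of the same cleanup (the helper names are ours, and `MyProjectName` is the same placeholder filter as above):

```python
import time


def matching_alarm_names(pages, filterstring):
    """Collect alarm names containing filterstring from describe_alarms result pages."""
    return [
        alarm['AlarmName']
        for page in pages
        for alarm in page.get('MetricAlarms', [])
        if filterstring in alarm['AlarmName']
    ]


def delete_stale_alarms(filterstring='MyProjectName'):
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client('cloudwatch')
    pages = client.get_paginator('describe_alarms').paginate(
        AlarmTypes=['MetricAlarm'], StateValue='INSUFFICIENT_DATA')
    for name in matching_alarm_names(pages, filterstring):
        client.delete_alarms(AlarmNames=[name])
        time.sleep(1)  # avoid throttling
```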

## Too many old empty log groups lying around?

Bash (`in2csv` is part of csvkit):

```sh
aws logs describe-log-groups | in2csv -f json --key logGroups > logs.csv
```

R (requires `dplyr` and `readr`):

```r
library(dplyr)
library(readr)

read_csv(
  "logs.csv",
  col_types = cols_only(
    storedBytes = col_integer(),
    creationTime = col_double(),
    logGroupName = col_character()
  )
) %>%
  mutate(creationTime = as.POSIXct(creationTime / 1000, origin = "1970-01-01")) %>%
  filter(storedBytes == 0) %>%
  select(logGroupName) %>%
  write_tsv("logs_clear.txt", col_names = FALSE)
```

Bash (GNU `parallel`):

```sh
parallel aws logs delete-log-group --log-group-name {1} :::: logs_clear.txt
```
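If you'd rather stay in Python, the same filter-and-delete can be sketched with boto3 alone (the helper names are ours; `storedBytes == 0` mirrors the R filter above):

```python
def empty_log_group_names(pages):
    """Names of log groups with zero stored bytes, from describe_log_groups result pages."""
    return [
        group['logGroupName']
        for page in pages
        for group in page.get('logGroups', [])
        if group.get('storedBytes', 0) == 0
    ]


def delete_empty_log_groups():
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client('logs')
    pages = client.get_paginator('describe_log_groups').paginate()
    for name in empty_log_group_names(pages):
        client.delete_log_group(logGroupName=name)
```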

documentation/DF-documentation/SQS_QUEUE_information.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ Do NOT set your SQS_MESSAGE_VISIBILITY very short (e.g. seconds) as if it can ma
  ## Example SQS Queue

- [[images/Sample_SQS_Queue.png|alt="Sample_SQS_Queue"]]
+ ![Sample_SQS_Queue](images/Sample_SQS_Queue.png)

  This is an example of an SQS Queue.
  You can see that there is one active task with 64 jobs in it.

documentation/DF-documentation/_toc.yml

Lines changed: 2 additions & 0 deletions
@@ -15,5 +15,7 @@ parts:
    - file: step_4_monitor
  - caption:
    chapters:
+   - file: advanced_configuration
    - file: troubleshooting_runs
+   - file: AWS_hygiene_scripts
    - file: versions
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Advanced Configuration

We've tried very hard to make Distributed-FIJI light and adaptable, but keeping the configuration settings to a manageable number requires making some default assumptions.
Below is a non-comprehensive list of places where you can adapt the code to your own purposes.

## Changes you can make to Distributed-FIJI outside of the Docker container

* **Location of ECS configuration files:** By default these are placed into your bucket with a prefix of 'ecsconfigs/'.
Alternate locations can be designated in the run script.
* **Log configuration and location of exported logs:** Distributed-FIJI creates log groups with a default retention of 60 days (to avoid hitting the AWS limit of 250) and, after finishing the run, exports them into your bucket with a prefix of 'exportedlogs/LOG_GROUP_NAME/'.
These may be modified in the run script.
* **Advanced EC2 configuration:** Any additional configuration of your EC2 spot fleet (such as installing additional packages or running scripts on startup) can be done by modifying the userData parameter in the run script.
* **SQS queue detailed configuration:** Distributed-FIJI creates a queue where messages will be tried 10 times before being consigned to a DeadLetterQueue, and unprocessed messages will expire after 14 days (the AWS maximum).
These values can be modified in run.py.
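For reference, the two queue settings named above map onto SQS queue attributes roughly like this (a sketch of the idea, not the actual run.py code; the DLQ ARN and helper name are placeholders):

```python
import json


def queue_attributes(dead_letter_arn, max_receives=10):
    """SQS attributes: retry each message max_receives times, keep unprocessed messages 14 days."""
    return {
        'MessageRetentionPeriod': str(14 * 24 * 60 * 60),  # 14 days in seconds, the AWS maximum
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dead_letter_arn,
            'maxReceiveCount': str(max_receives),
        }),
    }

# Usage sketch (boto3): client.create_queue(QueueName=..., Attributes=queue_attributes(dlq_arn))
```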

## Changes that will require you to make your own Docker container

* **Fiji version:** We ship the most recent fiji-open-jdk-8, but if you want to use your own Dockerized version of a different Fiji build you can edit the Dockerfile to call that Fiji Docker image instead.
* **Alarm names or thresholds:** These can be modified in the run-worker script.
* **Frequency or types of information included in the per-instance logs:** These can be adjusted in the instance-monitor script.
* **Log stream names or logging level:** These can be modified in the fiji-worker.py script.
(Binary image files changed; previews not shown.)

documentation/DF-documentation/overview.md

Lines changed: 5 additions & 6 deletions
@@ -56,15 +56,14 @@ If SQS tells them there are no visible jobs then they shut themselves down.

  ## What does this look like?

- ![Example Instance Configuration](images/sample_DCP_config_1.png)
+ ![Example Instance Configuration](images/sample_DF_config_1.png)

- This is an example of one possible instance configuration using [Distributed-CellProfiler](http://github.com/cellprofiler/distributed-cellprofiler) as an example.
- This is one m4.16xlarge EC2 instance (64 CPUs, 250 GB of RAM) with a 165 GB EBS volume mounted on it. A spot fleet could contain many such instances.
+ This is an example of one possible instance configuration of Distributed-FIJI.
+ This is one m4.16xlarge EC2 instance (64 CPUs, 250 GB of RAM) with a 165 GB EBS volume mounted on it.
+ A spot fleet could contain many such instances.
  It has 16 tasks (individual Docker containers).
  Each Docker container uses 10 GB of hard disk space and is assigned 4 CPUs and 15 GB of RAM (which it does not share with other Docker containers).
- Each container shares its individual resources among 4 copies of CellProfiler.
- Each copy of CellProfiler runs a pipeline on one "job", which can be anything from a single image to an entire 384-well plate or timelapse movie.
- You can optionally stagger the start time of these 4 copies of CellProfiler, ensuring that the most memory- or disk-intensive steps aren't happening simultaneously, decreasing the likelihood of a crash.
+ Each copy of Fiji runs a pipeline on one "job", which can be anything from a single image to an entire 384-well plate or timelapse movie.

  Read more about this and other configurations in [Step 1: Configuration](step_1_configuration.md).

documentation/DF-documentation/step_1_configuration.md

Lines changed: 35 additions & 9 deletions
@@ -9,12 +9,11 @@ Once the config file is created, simply type `python run.py setup` to set up you
  * **APP_NAME:** This will be used to tie your clusters, tasks, services, logs, and alarms together.
  It need not be unique, but it should be descriptive enough that you can tell jobs apart if you're running multiple jobs.
- * **LOG_GROUP_NAME:** The name to give the log group that will monitor the progress of your jobs and allow you to check performance or look for problems after the fact.

  ***
  ### DOCKER REGISTRY INFORMATION

- * **DOCKERHUB_TAG:** This is the encapsulated version of BioFormats2Raw you will be running.
+ * **DOCKERHUB_TAG:** This is the encapsulated version of FIJI you will be running.

  ***

@@ -37,12 +36,16 @@ If your jobs complete quickly and/or you don't need the data immediately you can
  * **EBS_VOL_SIZE:** The size of the temporary hard drive associated with each EC2 instance in GB.
  The minimum allowed is 22.
  If you have multiple Dockers running per machine, each Docker will have access to (EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2 GB of space.
+ * **DOWNLOAD_FILES:** Whether or not to download the image files to the EBS volume before processing.
+ This completely bypasses mounting the source bucket with S3FS.
+ This typically requires a larger EBS volume (depending on the size of your image sets and how many sets are processed per group).
+ It avoids occasional issues with S3FS that can crop up on longer runs, as well as permissions issues with mounting a source bucket.

  ***

  ### DOCKER INSTANCE RUNNING ENVIRONMENT
- * **CPU_SHARES:** How many CPUs each Docker container may have (1024 units = 1 core).
  * **MEMORY:** How much memory each Docker container may have.
+ * **SCRIPT_DOWNLOAD_URL:** Where to download the FIJI script you will be running.

  ***
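As a worked instance of the `(EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2` GB formula above (the helper name is ours, not part of Distributed-FIJI):

```python
def space_per_docker_gb(ebs_vol_size, tasks_per_machine):
    """Hard drive space, in GB, available to each Docker container on an instance."""
    return ebs_vol_size / tasks_per_machine - 2

# e.g. a 165 GB volume shared by 16 Dockers leaves roughly 8.3 GB per Docker,
# while the minimum 22 GB volume with a single Docker leaves 20 GB
```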

@@ -58,11 +61,34 @@ See [Step 0: Prep](step_0_prep.md) for more information.
  ***

+ ### LOG GROUP INFORMATION
+
+ * **LOG_GROUP_NAME:** The name to give the log group that will monitor the progress of your jobs and allow you to check performance or look for problems after the fact.
+
  ### REDUNDANCY CHECKS

- * **CHECK_IF_DONE_BOOL:** Whether or not to check the output folder before proceeding.
- Case-insensitive.
- If an analysis fails partway through (due to some of the files being in the wrong place, an AWS outage, a machine crash, etc.), setting this to 'True' allows you to resubmit the whole analysis but only reprocess jobs that haven't already been done.
- This saves you from having to parse exactly which jobs succeeded versus failed, or from having to pay to rerun the entire analysis.
- If Distributed-FIJI determines the correct number of files are already in the output folder, it will designate that job as completed and move on to the next one.
- If you actually do want to overwrite files that were previously generated (such as when you have improved a pipeline and no longer want the output of the old version), set this to 'False' to process jobs whether or not there are already files in the output folder.
+ * **EXPECTED_NUMBER_FILES:** How many files need to be in the output folder in order to mark a job as completed.
+ * **MIN_FILE_SIZE_BYTES:** The minimum size, in bytes, an object must be to "count".
+ Useful when trying to distinguish jobs that exported smaller, corrupted files from those that exported full-size files.
+ * **NECESSARY_STRING:** This allows you to optionally set a string that must be included in your file to count towards the total in EXPECTED_NUMBER_FILES.
+ This can be helpful if your pipeline puts out a mixture of file types and you want to count only how many images were produced, for example.
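Taken together, the three settings above amount to a simple counting rule. A sketch of the idea (our illustration, not Distributed-FIJI's actual check):

```python
def job_is_done(output_files, expected_number_files, min_file_size_bytes=0, necessary_string=''):
    """output_files: (name, size_in_bytes) pairs already present in the job's output folder."""
    qualifying = [
        name for name, size in output_files
        if size >= min_file_size_bytes and necessary_string in name
    ]
    # The job counts as complete once enough qualifying files exist
    return len(qualifying) >= expected_number_files
```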

+ ### EXAMPLE CONFIGURATIONS
+
+ This is an example of one possible configuration. It's a fairly large machine that is able to process 16 jobs at the same time.
+
+ ![Sample DF Configuration 1](images/sample_DF_config_1.png)
+
+ The config settings for this example are:
+ TASKS_PER_MACHINE = 16 (number of Dockers)
+ EBS_VOL_SIZE = 165
+ MEMORY = 15000 (MB for each Docker)
+
+ ![Sample DF Configuration 2](images/sample_DF_config_2.png)
+
+ This is an example of another possible configuration. When we run Distributed-FIJI we tend to prefer running a larger number of smaller machines; this is a configuration we often use. We might use a spot fleet of 100 of these machines (CLUSTER_MACHINES = 100).
+
+ The config settings for this example are:
+ TASKS_PER_MACHINE = 1 (number of Dockers)
+ EBS_VOL_SIZE = 22
+ MEMORY = 15000 (MB for each Docker)
