
Commit 65ebdb8

Merge branch 'develop_lenz' into multielement_multidos

# Conflicts:
#	examples/basic/ex01_train_network.py
#	examples/basic/ex05_run_predictions.py

2 parents: b4dcdc7 + 9425dd6

30 files changed: 575 additions & 600 deletions

.dockerignore

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,2 +1,2 @@
 *
-!install/*
+!pipeline/*
```

.github/workflows/cpu-tests.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -183,7 +183,7 @@ jobs:
 # be there before it has been installed.
 sed -i '/materials-learning-algorithms/d' ./env_after.yml

-# if comparison fails, `install/mala_cpu_[base]_environment.yml` needs to be aligned with
+# if comparison fails, `pipeline/mala_cpu_[base]_environment.yml` needs to be aligned with
 # `requirements.txt` and/or extra dependencies are missing in the Docker Conda environment

 if diff --brief env_before.yml env_after.yml
```
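The check this workflow performs can be sketched in Python. The file contents below are hypothetical; the real job compares Conda environment exports taken before and after installing MALA.

```python
# Sketch of the CI drift check (hypothetical file contents): the package's
# own entry is stripped from the post-install export, and the result must
# then match the committed environment file exactly.
committed = ["name: mala-cpu", "dependencies:", "  - numpy", "  - scipy"]
exported = [
    "name: mala-cpu",
    "dependencies:",
    "  - numpy",
    "  - materials-learning-algorithms",  # MALA itself, removed before diffing
    "  - scipy",
]

# Equivalent of: sed -i '/materials-learning-algorithms/d' ./env_after.yml
exported = [line for line in exported if "materials-learning-algorithms" not in line]

# Equivalent of: diff --brief env_before.yml env_after.yml
environments_match = exported == committed
print(environments_match)  # True
```

If the comparison fails, either the environment file or `requirements.txt` has drifted and must be realigned, as the workflow comment says.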

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -153,6 +153,7 @@ cython_debug/
 *.out
 *.npy
 *.pkl
+*.pk
 *.pth
 *.json
```

Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@ RUN apt-get --allow-releaseinfo-change update && apt-get upgrade -y && \

 # Choose 'cpu' or 'gpu'
 ARG DEVICE=cpu
-COPY install/mala_${DEVICE}_environment.yml .
+COPY pipeline/mala_${DEVICE}_environment.yml .
 RUN conda env create -f mala_${DEVICE}_environment.yml && rm -rf /opt/conda/pkgs/*

 # Install optional MALA dependencies into Conda environment with pip
```

docs/source/CONTRIBUTE.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -116,7 +116,7 @@ If you add additional dependencies, make sure to add them to `requirements.txt`
 if they are required or to `setup.py` under the appropriate `extras` tag if
 they are not.
 Further, in order for them to be available during the CI tests, make sure to
-add _required_ dependencies to the appropriate environment files in folder `install/` and _extra_ requirements directly in the `Dockerfile` for the `conda` environment build.
+add _required_ dependencies to the appropriate environment files in folder `pipeline/` and _extra_ requirements directly in the `Dockerfile` for the `conda` environment build.

 ## Pull Requests
 We actively welcome pull requests.
```

docs/source/advanced_usage/hyperparameters.rst

Lines changed: 29 additions & 1 deletion
```diff
@@ -96,6 +96,34 @@ are started with ``wait_time`` time interval in between (to avoid race
 conditions when accessing the same data base) and further only use the data
 base, not MPI, for communication.

+The batch job on your HPC cluster will get killed after the designated
+runtime, and unfinished trials will then remain in the Optuna database in
+state RUNNING.
+
+The current workflow for resuming the study, which makes use of MALA's own
+resume tooling
+(see ``examples/advanced/ex05_checkpoint_hyperparameter_optimization.py``), is
+this: before submitting the batch job again and letting the script do the
+resume work, a user needs to modify the database like so:
+
+.. code-block:: bash
+
+   python3 -c "import mala; mala.HyperOptOptuna.requeue_zombie_trials('hyperopt01', 'sqlite:///hyperopt.db')"
+
+which will set the RUNNING trials to state WAITING.
+When Optuna resumes, it will pick up and re-run those before carrying on
+with the resumed study.
+
+Common questions related to this feature:
+
+- "Does 'injecting' jobs like this disturb Optuna's operation in any way?":
+  No, the study object takes all of its information directly from the
+  database, which in this case now contains WAITING trials.
+- "Do those trials have to be run?": Technically not. One could simply ignore
+  them and re-run without them. The problem is that in this case, the study
+  will have missing data points from trials that were suggested for a
+  reason, so even if Optuna would resume fine, we still want to re-run them
+  from an optimization point of view.
+
 If you do distributed hyperparameter optimization, another useful option
 is

@@ -114,7 +142,7 @@ a physical validation metric such as

 .. code-block:: python

-   parameters.running.after_training_metric = "band_energy"
+   parameters.running.final_validation_metric = "band_energy"

 Advanced optimization algorithms
 ********************************
```
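To illustrate what requeuing zombie trials does to the study database, here is a self-contained sketch against an in-memory SQLite table. The table layout and data are hypothetical, much-simplified stand-ins for Optuna's storage schema; only the RUNNING-to-WAITING transition mirrors the real ``requeue_zombie_trials`` call.

```python
import sqlite3

# Hypothetical, much-simplified stand-in for Optuna's trials table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, state TEXT)")
con.executemany(
    "INSERT INTO trials (state) VALUES (?)",
    [("COMPLETE",), ("RUNNING",), ("RUNNING",)],  # two "zombie" trials
)

# The essence of requeue_zombie_trials: flip trials that were killed
# mid-run back to WAITING so the resumed study picks them up again.
con.execute("UPDATE trials SET state = 'WAITING' WHERE state = 'RUNNING'")

states = [s for (s,) in con.execute("SELECT state FROM trials ORDER BY trial_id")]
print(states)  # ['COMPLETE', 'WAITING', 'WAITING']
```

Completed trials are untouched, so the optimizer's history stays intact while only the interrupted work is rescheduled.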

docs/source/advanced_usage/trainingmodel.rst

Lines changed: 7 additions & 7 deletions
```diff
@@ -71,13 +71,13 @@ is directly outputted by MALA. By default, this validation loss gives the
 mean squared error between LDOS prediction and actual value. From a purely
 ML point of view, this is fine; however, the correctness of the LDOS itself
 does not hold much physical virtue. Thus, MALA implements physical validation
-metrics to be accessed before and after the training routine.
+metrics which can be evaluated for example after the training.

 Specifically, when setting

 .. code-block:: python

-   parameters.running.after_training_metric = "band_energy"
+   parameters.running.final_validation_metric = "band_energy"

 the error in the band energy between actual and predicted LDOS will be
 calculated and printed before and after network training (in meV/atom).

@@ -212,7 +212,7 @@ in the file ``advanced/ex03_tensor_board``. Simply select a logger prior to trai
 .. code-block:: python

    parameters.running.logger = "tensorboard"
-   parameters.running.logging_dir = "mala_vis"
+   parameters.running.logging_dir = "mala_logs"

 or

@@ -224,14 +224,14 @@ or
       entity="your_wandb_entity"
    )
    parameters.running.logger = "wandb"
-   parameters.running.logging_dir = "mala_vis"
+   parameters.running.logging_dir = "mala_logs"

 where ``logging_dir`` specifies some directory in which to save the
 MALA logging data. You can also select which metrics to record via

 .. code-block:: python

-   parameters.validation_metrics = ["ldos", "dos", "density", "total_energy"]
+   parameters.logging_metrics = ["ldos", "dos", "density", "total_energy"]

 Full list of available metrics:
 - "ldos": MSE of the LDOS.

@@ -249,14 +249,14 @@ To save time and resources you can specify the logging interval via

 .. code-block:: python

-   parameters.running.validate_every_n_epochs = 10
+   parameters.running.logging_metrics_interval = 10

 If you want to monitor the degree to which the model overfits to the training data,
 you can use the option

 .. code-block:: python

-   parameters.running.validate_on_training_data = True
+   parameters.running.log_metrics_on_train_set = True

 MALA will evaluate the validation metrics on the training set as well as the validation set.
```
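Collecting the renamed logging options from the hunks above in one place: this sketch uses a plain namespace as a stand-in for ``mala.Parameters`` so that it runs without MALA installed. The attribute names follow the renamed API in this commit; the values are illustrative only.

```python
from types import SimpleNamespace

# Stand-in for mala.Parameters(); the real object exposes the same
# attribute names after this commit's renames.
parameters = SimpleNamespace(running=SimpleNamespace())

parameters.running.logger = "tensorboard"             # or "wandb"
parameters.running.logging_dir = "mala_logs"          # where logging data is saved
parameters.running.logging_metrics = ["ldos", "band_energy"]
parameters.running.logging_metrics_interval = 5       # evaluate metrics every 5 epochs
parameters.running.log_metrics_on_train_set = True    # also check for overfitting
```

This is the post-rename vocabulary: ``validation_metrics`` became ``logging_metrics``, ``validate_every_n_epochs`` became ``logging_metrics_interval``, and ``validate_on_training_data`` became ``log_metrics_on_train_set``.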

docs/source/install/installing_mala.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -45,7 +45,7 @@ itself is subject to ongoing development as well.

    git clone https://github.com/mala-project/test-data ~/path/to/data/repo
    cd ~/path/to/data/repo
-   git checkout v1.8.1
+   git checkout 2.0.0

 * Export the path to that repo by ``export MALA_DATA_REPO=~/path/to/data/repo``
```

examples/advanced/ex01_checkpoint_training.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -57,10 +57,10 @@ def initial_setup():
         data_handler.output_dimension,
     ]

-    test_network = mala.Network(parameters)
-    test_trainer = mala.Trainer(parameters, test_network, data_handler)
+    network = mala.Network(parameters)
+    trainer = mala.Trainer(parameters, network, data_handler)

-    return parameters, test_network, data_handler, test_trainer
+    return parameters, network, data_handler, trainer


 if mala.Trainer.run_exists("ex01_checkpoint", path="./"):
```
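The surrounding example follows a resume-or-start pattern. A minimal sketch with a hypothetical stand-in for ``mala.Trainer.run_exists`` (the checkpoint file name used here is an assumption, not MALA's actual naming scheme):

```python
import os

def run_exists(name, path="./"):
    # Hypothetical stand-in: the real mala.Trainer.run_exists checks for
    # the checkpoint files a previous run with this name left behind.
    return os.path.exists(os.path.join(path, name + ".params.json"))

if run_exists("ex01_checkpoint"):
    print("checkpoint found, resuming training")
else:
    print("no checkpoint, running initial setup")
```

The branch on ``run_exists`` lets the same script be resubmitted repeatedly: it resumes when a checkpoint is present and falls back to ``initial_setup()`` otherwise.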

examples/advanced/ex03_tensor_board.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -27,8 +27,8 @@
 # files into.
 parameters.running.logger = "tensorboard"
 parameters.running.logging_dir = "mala_vis"
-parameters.running.validation_metrics = ["ldos", "band_energy"]
-parameters.running.validate_every_n_epochs = 5
+parameters.running.logging_metrics = ["ldos", "band_energy"]
+parameters.running.logging_metrics_interval = 5

 data_handler = mala.DataHandler(parameters)
 data_handler.add_snapshot(
```
