@@ -194,22 +194,64 @@ keyword, you can fine-tune the number of new snapshots being created.
194 194 By default, the same number of snapshots as had been provided will be created
195 195 (if possible).
196196
197- Using tensorboard
198- ******************
197+ Logging metrics during training
198+ *******************************
199+
200+ Training progress in MALA can be visualized via tensorboard or wandb, as also shown
201+ in the file ``advanced/ex03_tensor_board``. Simply select a logger prior to training as
202+
203+ .. code-block:: python
204+
205+       parameters.running.logger = "tensorboard"
206+       parameters.running.logging_dir = "mala_vis"
199 207
200- Training routines in MALA can be visualized via tensorboard, as also shown
201- in the file ``advanced/ex03_tensor_board``. Simply enable tensorboard
202- visualization prior to training via
208+ or
203 209
204 210 .. code-block:: python
205 211
206- # 0: No visualizatuon, 1: loss and learning rate, 2: like 1,
207- # but additionally weights and biases are saved
208- parameters.running.logging = 1
212+       import wandb
213+       wandb.init(
214+           project="mala_training",
215+           entity="your_wandb_entity"
216+       )
217+       parameters.running.logger = "wandb"
209 218       parameters.running.logging_dir = "mala_vis"
210219
211 220 where ``logging_dir`` specifies some directory in which to save the
212- MALA logging data. Afterwards, you can run the training without any
221+ MALA logging data. You can also select which metrics to record via
222+
223+ .. code-block:: python
224+
225+       parameters.validation_metrics = ["ldos", "dos", "density", "total_energy"]
226+
227+ Full list of available metrics:
228+ - "ldos": MSE of the LDOS.
229+ - "band_energy": Band energy.
230+ - "band_energy_actual_fe": Band energy computed with ground truth Fermi energy.
231+ - "total_energy": Total energy.
232+ - "total_energy_actual_fe": Total energy computed with ground truth Fermi energy.
233+ - "fermi_energy": Fermi energy.
234+ - "density": Electron density.
235+ - "density_relative": Electron density (Mean Absolute Percentage Error).
236+ - "dos": Density of states.
237+ - "dos_relative": Density of states (Mean Absolute Percentage Error).
238+
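The difference between a squared-error metric such as "ldos" and the relative (Mean Absolute Percentage Error) variants in the list above can be illustrated with a small sketch. This is not MALA's implementation: the function names and the toy numbers are made up for illustration, and MALA evaluates these quantities on actual LDOS, DOS, or density data.

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences (depends on the scale of the data)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_percentage_error(predicted, actual):
    """Average of |error| relative to the reference value, in percent."""
    return 100.0 * sum(
        abs((p - a) / a) for p, a in zip(predicted, actual)
    ) / len(actual)


# Toy reference and prediction values (hypothetical, no physical meaning).
actual = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.4]

mse = mean_squared_error(predicted, actual)
mape = mean_absolute_percentage_error(predicted, actual)
print(f"MSE: {mse:.4f}, MAPE: {mape:.1f}%")
```

Every prediction here is off by the same relative amount, so the percentage error is uniform, while the squared error is dominated by the largest value.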
239+ To save time and resources you can specify the logging interval via
240+
241+ .. code-block:: python
242+
243+       parameters.running.validate_every_n_epochs = 10
244+
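As a rough illustration of what such an interval means, the sketch below lists the epochs at which validation would run for an interval of 10. It assumes validation is triggered whenever a 0-based epoch index is divisible by the interval; how MALA counts epochs, and whether it always validates in the final epoch, is not stated here, so the helper ``validation_epochs`` is purely hypothetical.

```python
def validation_epochs(max_epochs, every_n):
    """Return the 0-based epoch indices at which validation would run."""
    return [epoch for epoch in range(max_epochs) if epoch % every_n == 0]


# Hypothetical training run of 50 epochs, validating every 10th epoch.
epochs = validation_epochs(max_epochs=50, every_n=10)
print(epochs)
```

Under this assumption, only 5 of the 50 epochs incur the cost of computing the validation metrics.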
245+ If you want to monitor the degree to which the model overfits to the training data,
246+ you can use the option
247+
248+ .. code-block:: python
249+
250+       parameters.running.validate_on_training_data = True
251+
252+ MALA will evaluate the validation metrics on the training set as well as the validation set.
253+
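Once metrics are available for both sets, overfitting typically shows up as a growing gap between the two curves. The sketch below makes that comparison explicit using made-up loss values and a hypothetical helper, ``overfitting_gap``; it does not read MALA's log files or use any MALA API.

```python
def overfitting_gap(training_loss, validation_loss):
    """Per-step difference between validation and training loss."""
    return [v - t for t, v in zip(training_loss, validation_loss)]


# Hypothetical loss values recorded at four validation steps.
training_loss = [0.50, 0.30, 0.20, 0.10]
validation_loss = [0.55, 0.35, 0.30, 0.32]

gap = overfitting_gap(training_loss, validation_loss)

# A gap that keeps growing over the later steps, while the training loss
# still falls, suggests the model is starting to overfit.
gap_widens = all(later > earlier for earlier, later in zip(gap[1:], gap[2:]))
print([round(g, 2) for g in gap], gap_widens)
```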
254+ Afterwards, you can run the training without any
213 255 other modifications. Once training is finished (or during training, in case
214 256 you want to use tensorboard to monitor progress), you can launch tensorboard
215 257 via
221 263 The full path for ``path_to_log_directory`` can be accessed via
222 264 ``trainer.full_logging_path``.
223 265
266+ If you're using wandb, you can monitor the training progress on the wandb website.
224 267
225 268 Training in parallel
226 269 ********************