
Commit f086908

Updated Main Pipeline and Reflected all updates in README
1 parent d5d0648 commit f086908

2 files changed: 115 additions & 11 deletions

README.md (57 additions & 9 deletions)

@@ -82,8 +82,39 @@ This is a [summary](./report/CSI5186_AI_Testing_Project_Report___Fernando__Kelvi
## Execution Guide

### Complete Pipeline (Recommended)
Run the complete pipeline from data preparation through experiments to final training:

```bash
# Run all models with all optimizers
python main.py

# Run specific model with all optimizers
python main.py --model dt

# Run specific model with specific optimizer
python main.py --model cnn --optimizer rs

# Force re-download and re-process datasets
python main.py --force
```

**Available arguments:**
- `--model`: Model to run - `["dt", "knn", "cnn"]`. If omitted, runs all models.
- `--optimizer`: Optimizer to use - `["rs", "ga-standard", "ga-memetic", "pso"]`. If omitted, runs all optimizers for each model.
- `--force`: Force re-download and re-processing of datasets (default: False).

**Pipeline stages:**
1. **Data Download**: Automatically downloads the CIFAR-10 dataset (if not present, or with `--force`)
2. **Data Processing**: Processes and prepares the datasets (if not present, or with `--force`)
3. **Hyperparameter Search**: Runs experiments for the specified model(s) and optimizer(s) combinations
4. **Final Training**: Trains models with the best-found hyperparameters on the full dataset and evaluates on the test set (the resulting `.cache/` layout is sketched below)
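
The paths hard-coded in `main.py` (shown later in this commit) and the folders referenced in this README imply a cache layout roughly like this; treat it as a sketch, since exact subfolder names may differ:

```
.cache/
├── base_datasets/cifar10/        # raw CIFAR-10 download
├── processed_datasets/cifar10/   # processed images
├── experiment/                   # hyperparameter-search results (e.g., dt-rs/)
└── final_training/               # final models and test-set results
```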

### Data Preparation (Manual)
If you prefer to run data preparation separately:

1. Run `data_download.py` to download the datasets needed.
    * Note: Data are stored in the `.cache/` folder, which is gitignored.
    * Note: If you rerun the script and the folder already exists with contents, pass the `--force` argument to overwrite it cleanly.
2. Run `data_process.py` to process the images in the datasets (both steps are sketched below).
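
A minimal sketch of the manual sequence, assuming both scripts live in `scripts/` (as the imports in `main.py` later in this commit suggest):

```bash
python scripts/data_download.py --force   # step 1; --force overwrites an existing download
python scripts/data_process.py            # step 2
```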
@@ -137,24 +168,41 @@ python hparam_search.py
* It passes parameters directly to model creation/training, so it is less flexible for advanced training configs (e.g., custom epochs, patience).
* It is intended for quick experiments, visualizations, and debugging with a single optimizer.

### Run a Full Experiment (Advanced)
If you want more control over the experiment parameters, you can run experiments directly using this script:
```bash
python scripts/run_experiment.py --model dt
```
**Available arguments:**
- `--model`: Model choices - `["dt", "knn", "cnn"]`. **Mandatory** argument.
- `--optimizer`: Optimizer to use - `["rs", "ga-standard", "ga-memetic", "pso"]`. If omitted, runs all optimizers.
- `--runs`: Number of independent runs - default 1.
- `--evaluations`: Number of fitness evaluations per run - default 50.
- `--seed`: Base seed for randomization - default 42.
- `--n-jobs`: Number of parallel workers - default 1 for a sequential run. Use `-1` for all CPUs (see the combined example below).
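
For instance, a heavier benchmarking run can combine these flags; this invocation is illustrative but uses only the arguments documented above:

```bash
# 3 independent PSO runs on the CNN, 100 fitness evaluations per run, all CPU cores
python scripts/run_experiment.py --model cnn --optimizer pso --runs 3 --evaluations 100 --seed 42 --n-jobs -1
```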

* It is designed for systematic, reproducible experiments across all kinds of optimizers.
* All optimizers are supported and selectable via the CLI.
* It saves results, convergence traces, and summaries to disk for later analysis (but not to TensorBoard).
* For CNN, it uses a TrainingConfig object for fine-grained control (learning rate, weight decay, optimizer, batch size, patience), and disables early stopping for CNN by default for a fair comparison; a rough sketch of such a config follows this list.
* Given its flexibility and robustness, it is intended for benchmarking, comparison, and research, especially when comparing optimizer performance.
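
The fields named in the CNN bullet suggest a config shaped roughly like the following. This is a hypothetical sketch; the project's actual `TrainingConfig` definition, field names, and defaults may differ:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Hypothetical fields inferred from the bullet above
    learning_rate: float = 1e-3
    weight_decay: float = 0.0
    optimizer: str = "adam"        # inner training optimizer, not the search strategy
    batch_size: int = 64
    patience: int = 10             # early-stopping patience
    early_stopping: bool = False   # disabled by default for CNN for a fair comparison
```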

### Run Final Training (Advanced)
After running experiments, you can run final training separately:
```bash
python scripts/run_final_training.py
```
**Available arguments:**
- `--seed`: Seed for final training runs - default 42.
- `--experiments`: Optional list of experiment names (e.g., `dt-rs`, `cnn-ga-standard`) to include. If omitted, trains all experiments found in `.cache/experiment/` (see the example below).
- `--max-parallel-cnn`: Maximum parallel CNN trainings - default 1.
- `--max-parallel-classic`: Maximum parallel DT/KNN trainings - default 1.

This script:
- Loads best hyperparameters from each experiment's best run
- Trains models on the full training set
- Evaluates on the held-out test set
- Saves trained models and results to `.cache/final_training/`
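
A hypothetical invocation limited to two experiments; it assumes `--experiments` accepts space-separated names:

```bash
python scripts/run_final_training.py --seed 42 --experiments dt-rs cnn-ga-standard
```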

### Analyze Results
Upon completion of a run of the `run_experiment.py` script, you will find result folders under the `.cache/experiment` folder. You can visualize plots for analysis based on this script:

main.py (58 additions & 2 deletions)

@@ -1,6 +1,62 @@
import argparse
from pathlib import Path
import sys

ROOT = Path(__file__).resolve().parent
SCRIPTS_DIR = ROOT / "scripts"
BASE_DATASETS_DIR = ROOT / ".cache" / "base_datasets"
PROCESSED_DATASETS_DIR = ROOT / ".cache" / "processed_datasets"
BASE_CIFAR10_DIR = BASE_DATASETS_DIR / "cifar10"
PROCESSED_CIFAR10_DIR = PROCESSED_DATASETS_DIR / "cifar10"
sys.path.append(str(ROOT))
sys.path.append(str(SCRIPTS_DIR))

# Import Scripts
from scripts.data_download import download_dataset
from scripts.data_process import main as preprocess
from scripts.run_experiment import main as run_experiment
from scripts.run_final_training import main as run_final_training


def main():
    parser = argparse.ArgumentParser(description="Main Pipeline: Run Experiments and Final Training")
    # store_true makes `--force` a value-less flag, as the README documents.
    # DEFAULT: do not re-download if the data already exists.
    parser.add_argument("--force", action="store_true", help="Force re-download and re-processing of datasets")
    parser.add_argument("--model", type=str, default=None, help="Optional model type to run experiments on. Otherwise runs all models.")
    parser.add_argument("--optimizer", type=str, default=None, help="Optional optimizer type to run experiments on. Otherwise runs all optimizers.")
    args = parser.parse_args()

    # Check the CIFAR-10 directories themselves rather than only their parents,
    # so a missing dataset is downloaded even if the cache folder already exists.
    if not BASE_CIFAR10_DIR.exists() or args.force:
        download_dataset(
            repo_id="uoft-cs/cifar10",
            destination=BASE_CIFAR10_DIR,
            force=args.force,
        )
    if not PROCESSED_CIFAR10_DIR.exists() or args.force:
        preprocess()

    models = [args.model] if args.model else ['dt', 'knn', 'cnn']

    # run_experiment() and run_final_training() parse sys.argv themselves, so
    # rebuild a clean argv for each call instead of leaking main.py's own flags.
    base_argv = [sys.argv[0]]

    for modelName in models:
        sys.argv = base_argv + ['--model', modelName]
        if args.optimizer:
            sys.argv += ['--optimizer', args.optimizer]

        run_experiment()

    # Restore a flag-free argv before final training.
    sys.argv = base_argv
    return run_final_training()


if __name__ == "__main__":
    sys.exit(main())
