updated readme

SamoraHunter · SamoraHunter · commit 097e4eb03ac4 · 2025-09-16T18:24:15.000+01:00
diff --git a/README.md b/README.md
@@ -1,19 +1,26 @@
 # ml_binary_classification_gridsearch_hyperOpt
 
+[![Documentation Status](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/docs.yml/badge.svg)](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/)
+
+[![CI/CD](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/test.yml/badge.svg)](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/test.yml)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
 This repository contains Python code for binary classification using grid search and hyperparameter optimization techniques.
 
 # Table of Contents
 
 - [ml_binary_classification_gridsearch_hyperOpt](#ml_binary_classification_gridsearch_hyperopt)
 - [Overview](#overview)
 - [Diagrams](#diagrams)
+- [Features](#features)
 - [Getting Started](#getting-started)
   - [Prerequisites](#prerequisites)
 - [Installation](#installation)
   - [Windows](#windows)
   - [Unix/Linux](#unixlinux)
 - [Usage](#usage)
 - [Examples](#examples)
+- [Project Structure](#project-structure)
 - [Contributing](#contributing)
 - [License](#license)
 - [Appendix](#appendix)
@@ -24,6 +31,23 @@ This repository contains Python code for binary classification using grid search
 
 Binary classification is a common machine learning task where the goal is to categorize data into one of two classes. This repository provides a framework for performing binary classification using various machine learning algorithms and optimizing their hyperparameters through grid search and hyperparameter optimization techniques.
 
+## Features
+
+The framework is a configurable toolkit for binary classification experiments. Its main features are:
+
+- **Diverse Model Support:** Includes a collection of standard classifiers (e.g., Logistic Regression, SVM, RandomForest, XGBoost, LightGBM, CatBoost) and specialized time-series models from the `aeon` library (e.g., HIVE-COTE v2, MUSE, OrdinalTDE).
+- **Advanced Hyperparameter Tuning:** Supports multiple search strategies:
+  - **Grid Search:** Exhaustively search a defined parameter grid.
+  - **Random Search:** Randomly sample from the parameter space.
+  - **Bayesian Optimization:** Use `scikit-optimize` to choose each new candidate based on the results of earlier evaluations.
+- **Configurable Data Pipeline:** A modular pipeline lets you configure each data processing step:
+  - **Feature Selection:** Toggle groups of features (e.g., demographics, blood tests, annotations).
+  - **Data Cleaning:** Handle missing values, constant columns, and correlated features.
+  - **Resampling:** Address class imbalance with oversampling (RandomOverSampler) or undersampling (RandomUnderSampler).
+  - **Scaling:** Apply standard scaling to numeric features.
+- **Automated Results Analysis:** Includes tools that aggregate results from multiple runs and generate summary plots, such as global parameter importance.
+- **Time-Series Capabilities:** Specialized pipeline mode for handling time-series data, including conversion to the required 3D format for `aeon` classifiers.
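
The difference between the grid search and random search strategies listed above can be illustrated with a minimal standard-library sketch. The `param_grid` values and both helper functions are hypothetical and are not part of `ml_grid`; the framework's own search code may work differently.

```python
import itertools
import random

# Hypothetical parameter grid for illustration only;
# this is not the repository's actual search space.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "penalty": ["l1", "l2"],
    "max_iter": [100, 500],
}


def grid_search_candidates(grid):
    """Return every combination in the grid (exhaustive search)."""
    keys = sorted(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def random_search_candidates(grid, n_iter, seed=0):
    """Return up to n_iter distinct combinations, sampled without replacement."""
    candidates = grid_search_candidates(grid)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_iter, len(candidates)))


grid_candidates = grid_search_candidates(param_grid)
random_candidates = random_search_candidates(param_grid, n_iter=4)

print(len(grid_candidates), "grid candidates")
print(len(random_candidates), "random candidates")
```

In this toy grid, grid search evaluates all 12 combinations, while random search evaluates only the 4 it samples. That is the usual trade-off between coverage and cost.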
+
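The resampling step mentioned in the feature list can be illustrated in plain Python. This is only a sketch of the idea behind random oversampling, not the `RandomOverSampler` implementation the framework uses; the function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Rows belonging to this class, used as the sampling pool.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): four negatives and one positive.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Random undersampling is the mirror image: instead of duplicating minority rows, it discards randomly chosen majority rows until the classes are balanced.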
 ## Diagrams
 
 Below are visual diagrams representing various components of the project. All `.mmd` source files are Mermaid diagrams, and the rendered versions are available in `.svg` or `.png` formats.
@@ -129,6 +153,36 @@ After installation, activate the virtual environment to run your code or noteboo
     *   On Unix/Linux/macOS: `source ml_grid_ts_env/bin/activate`
     *   On Windows: `.\ml_grid_ts_env\Scripts\activate`
 
+### Basic Example
+
+The main entry point for running experiments is typically a script or notebook that defines the parameter space and iterates through it. Here is a conceptual example of how to run a single pipeline iteration:
+
+```python
+from ml_grid.pipeline.data import pipe
+from ml_grid.util.param_space import parameter_space
+from ml_grid.util.global_params import global_parameters
+
+# Define global settings
+global_parameters.verbose = 2
+global_parameters.error_raise = False
+
+# Load the parameter space
+param_space_df = parameter_space().get_parameter_space()
+
+# Select a single parameter configuration to run
+local_param_dict = param_space_df.iloc[0].to_dict()
+
+# Instantiate and run the pipeline
+ml_grid_object = pipe(
+    file_name='path/to/your/data.csv',
+    drop_term_list=['id', 'unwanted_col'],
+    local_param_dict=local_param_dict,
+    base_project_dir='path/to/your/project/',
+    param_space_index=0
+)
+
+# The pipeline runs on initialization. Results are logged to files.
+```
 If you are using Jupyter, you can also select the kernel created during installation (e.g., `Python (ml_grid_env)`) directly from the Jupyter interface.
 
 ## Examples
@@ -137,6 +191,8 @@ See [ml_grid/tests/unit_test_synthetic.ipynb]
 
 ## Documentation
 
+The latest documentation is [hosted online](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/).
+
 This project uses Sphinx for documentation. The documentation includes usage guides and an auto-generated API reference.
 
 To build the documentation locally:
@@ -153,6 +209,25 @@ To build the documentation locally:
 
 3.  Open `docs/build/index.html` in your web browser to view the documentation.
 
+## Project Structure
+
+The repository is organized to separate concerns, making it easier to navigate and extend.
+
+```
+.
+├── assets/                 # Mermaid diagrams and other assets
+├── docs/                   # Sphinx documentation source and build files
+├── ml_grid/                # Main source code for the library
+│   ├── model_classes/      # Standard classifier wrappers
+│   ├── model_classes_time_series/ # Time-series classifier wrappers
+│   ├── pipeline/           # Core data processing and pipeline logic
+│   ├── results_processing/ # Tools for aggregating and plotting results
+│   └── util/               # Utility functions and global parameters
+├── tests/                  # Unit and integration tests
+├── install.sh              # Installation script for Unix/Linux
+└── install.bat             # Installation script for Windows
+```
+
 ## Contributing
 If you would like to contribute to this project, please follow these steps: