Commit 097e4eb

updated readme
1 parent 98b8403 commit 097e4eb

1 file changed

Lines changed: 75 additions & 0 deletions

README.md

@@ -1,19 +1,26 @@
# ml_binary_classification_gridsearch_hyperOpt

[![Documentation Status](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/docs.yml/badge.svg)](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/)

[![CI/CD](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/test.yml/badge.svg)](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/test.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This repository contains Python code for binary classification using grid search and hyperparameter optimization techniques.

# Table of Contents

- [ml_binary_classification_gridsearch_hyperOpt](#ml_binary_classification_gridsearch_hyperopt)
- [Overview](#overview)
- [Diagrams](#diagrams)
- [Features](#features)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Windows](#windows)
- [Unix/Linux](#unixlinux)
- [Usage](#usage)
- [Examples](#examples)
- [Project Structure](#project-structure)
- [Contributing](#contributing)
- [License](#license)
- [Appendix](#appendix)
@@ -24,6 +31,23 @@ This repository contains Python code for binary classification using grid search

Binary classification is a common machine learning task where the goal is to categorize data into one of two classes. This repository provides a framework for performing binary classification using various machine learning algorithms and optimizing their hyperparameters through grid search and hyperparameter optimization techniques.

## Features

This framework is designed to be a comprehensive toolkit for binary classification experiments, offering a wide range of configurable options:

- **Diverse Model Support:** Includes a collection of standard classifiers (e.g., Logistic Regression, SVM, RandomForest, XGBoost, LightGBM, CatBoost) and specialized time-series models from the `aeon` library (e.g., HIVE-COTE v2, MUSE, OrdinalTDE).
- **Advanced Hyperparameter Tuning:** Supports multiple search strategies:
  - **Grid Search:** Exhaustively search a defined parameter grid.
  - **Random Search:** Randomly sample from the parameter space.
  - **Bayesian Optimization:** Intelligently search the parameter space using `scikit-optimize`.
- **Configurable Data Pipeline:** A highly modular pipeline allows for fine-grained control over data processing steps:
  - **Feature Selection:** Toggle groups of features (e.g., demographics, blood tests, annotations).
  - **Data Cleaning:** Handle missing values, constant columns, and correlated features.
  - **Resampling:** Address class imbalance with oversampling (RandomOverSampler) or undersampling (RandomUnderSampler).
  - **Scaling:** Apply standard scaling to numeric features.
- **Automated Results Analysis:** Includes tools to automatically aggregate results from multiple runs and generate insightful plots, such as global parameter importance.
- **Time-Series Capabilities:** Specialized pipeline mode for handling time-series data, including conversion to the required 3D format for `aeon` classifiers.

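To make the difference between the grid and random search strategies listed above concrete, here is a minimal standard-library sketch. It is not this project's implementation: the `param_grid` contents and the helper names `grid_configs` and `random_configs` are hypothetical and chosen only for illustration. Bayesian optimization is not shown, since it depends on a surrogate model such as the one `scikit-optimize` provides.

```python
import itertools
import random

# Hypothetical search space; the names and values are illustrative only.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "penalty": ["l1", "l2"],
    "max_iter": [100, 500],
}


def grid_configs(grid):
    """Yield every combination of values in the grid (exhaustive grid search)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def random_configs(grid, n_iter, seed=0):
    """Draw n_iter configurations, sampling each parameter independently.

    Sampling is with replacement, so the same configuration can appear twice.
    """
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n_iter)]


all_configs = list(grid_configs(param_grid))
sampled_configs = random_configs(param_grid, n_iter=5)

print(len(all_configs))      # 4 * 2 * 2 = 16 configurations
print(len(sampled_configs))  # 5 configurations
```

The grid grows multiplicatively with every added parameter, which is why random or Bayesian search tends to be preferred once the space gets large.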
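The resampling option mentioned under Features can be illustrated with a small standard-library sketch of random oversampling. This is only a conceptual stand-in, not the `RandomOverSampler` implementation the pipeline uses: the function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match
    the size of the largest class. The original rows are kept unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 6 negatives and 2 positives (illustrative values only).
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 samples
```

Resampling is usually applied to the training split only, so that evaluation data keeps its original class balance.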
## Diagrams

Below are visual diagrams representing various components of the project. All `.mmd` source files are Mermaid diagrams, and the rendered versions are available in `.svg` or `.png` formats.
@@ -129,6 +153,36 @@ After installation, activate the virtual environment to run your code or noteboo
* On Unix/Linux/macOS: `source ml_grid_ts_env/bin/activate`
* On Windows: `.\ml_grid_ts_env\Scripts\activate`

### Basic Example

The main entry point for running experiments is typically a script or notebook that defines the parameter space and iterates through it. Here is a conceptual example of how to run a single pipeline iteration:

```python
from ml_grid.pipeline.data import pipe
from ml_grid.util.param_space import parameter_space
from ml_grid.util.global_params import global_parameters

# Define global settings
global_parameters.verbose = 2
global_parameters.error_raise = False

# Load the parameter space
param_space_df = parameter_space().get_parameter_space()

# Select a single parameter configuration to run
local_param_dict = param_space_df.iloc[0].to_dict()

# Instantiate and run the pipeline
ml_grid_object = pipe(
    file_name='path/to/your/data.csv',
    drop_term_list=['id', 'unwanted_col'],
    local_param_dict=local_param_dict,
    base_project_dir='path/to/your/project/',
    param_space_index=0
)

# The pipeline runs on initialization. Results are logged to files.
```
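
The example above runs one configuration. Iterating through the whole parameter space could be sketched as follows. The sketch rests on assumptions: `run_all` and `fake_run` are hypothetical helpers, the `"scale"` key is invented for the demonstration, and the `error_raise` flag only mirrors what `global_parameters.error_raise = False` is presumed to mean (log a failed configuration and continue rather than abort).

```python
def run_all(configs, run_one, error_raise=False):
    """Call run_one(index, config) for each config.

    Returns (results, failures), both keyed by configuration index.
    If error_raise is True, the first exception is re-raised instead.
    """
    results, failures = {}, {}
    for index, config in enumerate(configs):
        try:
            results[index] = run_one(index, config)
        except Exception as exc:
            if error_raise:
                raise
            failures[index] = repr(exc)
    return results, failures


# Stand-in for a real pipeline run. In this project the body would instead
# construct pipe(...) with local_param_dict=config and param_space_index=index.
def fake_run(index, config):
    if config.get("scale") is None:
        raise ValueError("missing 'scale' setting")
    return {"index": index, "scaled": config["scale"]}


configs = [{"scale": True}, {"scale": None}, {"scale": False}]
results, failures = run_all(configs, fake_run)

print(sorted(results))   # indices of configurations that completed
print(sorted(failures))  # indices of configurations that raised
```

Collecting failures per index keeps a long sweep running while still leaving a record of which configurations need attention.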

If you are using Jupyter, you can also select the kernel created during installation (e.g., `Python (ml_grid_env)`) directly from the Jupyter interface.

## Examples
@@ -137,6 +191,8 @@ See [ml_grid/tests/unit_test_synthetic.ipynb]

## Documentation

The latest documentation is hosted online and can be viewed [here](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/).

This project uses Sphinx for documentation. The documentation includes usage guides and an auto-generated API reference.

To build the documentation locally:
@@ -153,6 +209,25 @@ To build the documentation locally:

3. Open `docs/build/index.html` in your web browser to view the documentation.

## Project Structure

The repository is organized to separate concerns, making it easier to navigate and extend.

```
.
├── assets/                         # Mermaid diagrams and other assets
├── docs/                           # Sphinx documentation source and build files
├── ml_grid/                        # Main source code for the library
│   ├── model_classes/              # Standard classifier wrappers
│   ├── model_classes_time_series/  # Time-series classifier wrappers
│   ├── pipeline/                   # Core data processing and pipeline logic
│   ├── results_processing/         # Tools for aggregating and plotting results
│   └── util/                       # Utility functions and global parameters
├── tests/                          # Unit and integration tests
├── install.sh                      # Installation script for Unix/Linux
└── install.bat                     # Installation script for Windows
```

## Contributing
If you would like to contribute to this project, please follow these steps: