@@ -24,6 +31,23 @@ This repository contains Python code for binary classification using grid search
Binary classification is a common machine learning task in which each sample is assigned to one of two classes. This repository provides a framework for training binary classifiers with a range of machine learning algorithms and tuning their hyperparameters through grid search and other hyperparameter optimization techniques.

## Features

This framework is designed to be a comprehensive toolkit for binary classification experiments, offering a wide range of configurable options:

- **Diverse Model Support:** Includes a collection of standard classifiers (e.g., Logistic Regression, SVM, RandomForest, XGBoost, LightGBM, CatBoost) and specialized time-series models from the `aeon` library (e.g., HIVE-COTE v2, MUSE, OrdinalTDE).
- **Grid Search:** Exhaustively search a defined parameter grid.
- **Random Search:** Randomly sample from the parameter space.
- **Bayesian Optimization:** Intelligently search the parameter space using `scikit-optimize`.
- **Configurable Data Pipeline:** A highly modular pipeline allows for fine-grained control over data processing steps:
  - **Feature Selection:** Toggle groups of features (e.g., demographics, blood tests, annotations).
  - **Data Cleaning:** Handle missing values, constant columns, and correlated features.
  - **Resampling:** Address class imbalance with oversampling (RandomOverSampler) or undersampling (RandomUnderSampler).
  - **Scaling:** Apply standard scaling to numeric features.
- **Automated Results Analysis:** Includes tools to automatically aggregate results from multiple runs and generate insightful plots, such as global parameter importance.
- **Time-Series Capabilities:** Specialized pipeline mode for handling time-series data, including conversion to the required 3D format for `aeon` classifiers.
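
To illustrate the 3D conversion mentioned in the last item, the sketch below reshapes long-format records into a nested `(n_cases, n_channels, n_timepoints)` structure, which is the layout `aeon` classifiers expect for equal-length collections. The record layout, the function name `to_3d`, and the channel names are assumptions made for this example; it is not the conversion code used by `ml_grid`.

```python
from collections import defaultdict


def to_3d(records, channels):
    """Convert long-format rows into a (n_cases, n_channels, n_timepoints) nested list.

    records: iterable of (case_id, timepoint, {channel: value}) tuples (hypothetical layout).
    channels: ordered list of channel names to keep.
    """
    # Group the observations of each case by timepoint.
    by_case = defaultdict(dict)
    for case_id, timepoint, values in records:
        by_case[case_id][timepoint] = values

    case_ids = sorted(by_case)
    lengths = {len(by_case[c]) for c in case_ids}
    if len(lengths) > 1:
        raise ValueError("all cases must have the same number of timepoints")

    cube = []
    for case_id in case_ids:
        ordered = [by_case[case_id][t] for t in sorted(by_case[case_id])]
        # One row per channel, one column per timepoint.
        cube.append([[step[ch] for step in ordered] for ch in channels])
    return case_ids, cube


# Toy data: two cases, two channels, two timepoints (rows may arrive unordered).
records = [
    ("p1", 0, {"hr": 70, "temp": 36.5}),
    ("p1", 1, {"hr": 72, "temp": 36.6}),
    ("p2", 1, {"hr": 82, "temp": 37.2}),
    ("p2", 0, {"hr": 80, "temp": 37.0}),
]
case_ids, cube = to_3d(records, ["hr", "temp"])
shape = (len(cube), len(cube[0]), len(cube[0][0]))
```

Wrapping `cube` with `numpy.asarray` would give the array form of the same layout. Series of unequal length need padding or truncation first, which this sketch deliberately rejects rather than guesses at.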
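
The data pipeline steps listed under Features (dropping constant columns, oversampling the minority class, standard scaling) can be sketched in plain Python as follows. These functions are simplified stand-ins written for this example, not the `ml_grid` implementation and not the `RandomOverSampler` class itself; all function names and the toy data are hypothetical.

```python
import random
import statistics


def drop_constant_columns(rows):
    """Remove columns whose value is identical in every row."""
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def standard_scale(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a zero-variance column does not cause division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: column 1 is constant, and class 1 is the minority class.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
y = [0, 0, 0, 1]

X, kept = drop_constant_columns(X)       # keeps columns 0 and 2
X, y = random_oversample(X, y, seed=42)  # class 1 is duplicated up to 3 rows
X = standard_scale(X)
```

In a real experiment, resampling and the scaling statistics would normally be computed on the training split only, so that no information from the test split leaks into the model.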
## Diagrams
Below are visual diagrams representing various components of the project. All `.mmd` source files are Mermaid diagrams, and the rendered versions are available in `.svg` or `.png` formats.
@@ -129,6 +153,36 @@ After installation, activate the virtual environment to run your code or noteboo
* On Unix/Linux/macOS: `source ml_grid_ts_env/bin/activate`
* On Windows: `.\ml_grid_ts_env\Scripts\activate`

### Basic Example

The main entry point for running experiments is typically a script or notebook that defines the parameter space and iterates through it. Here is a conceptual example of how to run a single pipeline iteration:

```python
from ml_grid.pipeline.data import pipe
from ml_grid.util.param_space import parameter_space
from ml_grid.util.global_params import global_parameters

# ...

# The pipeline runs on initialization. Results are logged to files.
```
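
The idea of defining a parameter space and iterating through it can be sketched independently of `ml_grid`. The example below enumerates a grid exhaustively and, alternatively, samples it at random. The parameter names, the `run_iteration` placeholder, and its scoring rule are all hypothetical and exist only to make the sketch runnable; they are not part of the library's API.

```python
import itertools
import random


def grid_configs(space):
    """Yield every combination in the parameter grid (exhaustive grid search)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_configs(space, n_iter, seed=0):
    """Yield n_iter configurations, sampling each parameter independently (random search)."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield {k: rng.choice(v) for k, v in space.items()}


def run_iteration(config):
    """Placeholder for one pipeline run; the returned score is arbitrary, for illustration only."""
    return 1.0 if config["scale"] and config["resample"] == "oversample" else 0.5


# Hypothetical parameter space; the keys are not taken from ml_grid.
space = {
    "resample": [None, "oversample", "undersample"],
    "scale": [True, False],
    "drop_correlated": [True, False],
}

all_configs = list(grid_configs(space))                  # 3 * 2 * 2 combinations
sampled = list(random_configs(space, n_iter=4, seed=1))  # 4 random draws
best = max(all_configs, key=run_iteration)
```

Grid search cost grows multiplicatively with each added parameter, which is why random sampling with a fixed budget is a common alternative for larger spaces.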

If you are using Jupyter, you can also select the kernel created during installation (e.g., `Python (ml_grid_env)`) directly from the Jupyter interface.
## Examples
@@ -137,6 +191,8 @@ See [ml_grid/tests/unit_test_synthetic.ipynb]
## Documentation
The latest documentation is hosted online and can be viewed [here](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/).
This project uses Sphinx for documentation. The documentation includes usage guides and an auto-generated API reference.
To build the documentation locally:
@@ -153,6 +209,25 @@ To build the documentation locally:
3. Open `docs/build/index.html` in your web browser to view the documentation.

## Project Structure

The repository is organized to separate concerns, making it easier to navigate and extend.

```
.
├── assets/                 # Mermaid diagrams and other assets
├── docs/                   # Sphinx documentation source and build files
├── ml_grid/                # Main source code for the library
│   ├── model_classes/      # Standard classifier wrappers