graph TB
    Start([Grid Search CV Initialization]) --> Init[Initialize Parameters<br/>- Algorithm<br/>- Parameter Space<br/>- CV Strategy<br/>- Global Config]

    Init --> DataPrep[Data Preparation]

    DataPrep --> CheckDF{X_train is<br/>DataFrame?}
    CheckDF -->|No| ConvertDF[Convert to DataFrame]
    CheckDF -->|Yes| CheckSeries{y_train is<br/>Series?}
    ConvertDF --> CheckSeries

    CheckSeries -->|No| ConvertSeries[Convert to Series<br/>Align with X_train index]
    CheckSeries -->|Yes| SetCategory[Set y_train as category<br/>Name = 'outcome']
    ConvertSeries --> SetCategory

    SetCategory --> ModelCheck{Model Type<br/>Detection}

    ModelCheck -->|GPU Model| GPUConfig[Configure GPU<br/>n_jobs=1<br/>TF Memory Growth]
    ModelCheck -->|SVC| ScaleData[Apply StandardScaler]
    ModelCheck -->|KNN/SimBSig| AdjustKNN[Adjust n_neighbors<br/>for small datasets]
    ModelCheck -->|CatBoost| CheckSize{Dataset<br/>Size OK?}
    ModelCheck -->|Other| CVSetup

    CheckSize -->|Too Small| ReturnDefault[Return Default Score 0.5]
    CheckSize -->|OK| AdjustCatBoost[Adjust subsample/rsm<br/>parameters]

    GPUConfig --> CVSetup[CV Strategy Setup]
    ScaleData --> CVSetup
    AdjustKNN --> CVSetup
    AdjustCatBoost --> CVSetup

    CVSetup --> TestMode{Test Mode<br/>Enabled?}
    TestMode -->|Yes| FastCV[KFold n_splits=2]
    TestMode -->|No| ProductionCV[RepeatedKFold<br/>n_splits=2, n_repeats=2]

    FastCV --> ParamValidation
    ProductionCV --> ParamValidation

    ParamValidation[Parameter Validation] --> BayesCheck{Bayesian<br/>Search?}

    BayesCheck -->|Yes| WrapCategorical[Wrap lists in<br/>Categorical for skopt]
    BayesCheck -->|No| ValidateParams[Validate parameters<br/>against estimator]

    WrapCategorical --> ConfigNIter
    ValidateParams --> ConfigNIter

    ConfigNIter[Configure n_iter] --> LocalOverride{Local<br/>Override?}
    LocalOverride -->|Yes| UseLocal[Use local n_iter]
    LocalOverride -->|No| UseGlobal[Use global n_iter]

    UseLocal --> CapIter{Exceeds<br/>max_iter?}
    UseGlobal --> CapIter
    CapIter -->|Yes| CapValue[Cap to max value]
    CapIter -->|No| Search
    CapValue --> Search

    Search[HyperparameterSearch<br/>Instantiation] --> ResetIndices[Reset DataFrame indices<br/>to integer-based]

    ResetIndices --> IndexCheck{Index<br/>Aligned?}
    IndexCheck -->|No| RaiseError[Raise AssertionError]
    IndexCheck -->|Yes| RunSearch[search.run_search]

    RunSearch --> SearchError{Search<br/>Error?}
    SearchError -->|SVC Dual Coef| SVCDefault[Return default 0.5]
    SearchError -->|Other Error| LogRaise[Log error & re-raise]
    SearchError -->|Success| TestModeCheck2{Test Mode?}

    TestModeCheck2 -->|Yes| SkipCV[Skip final CV<br/>Return 0.5]
    TestModeCheck2 -->|No| CheckClasses{Classes >= 2?}

    CheckClasses -->|No| RaiseValueError[Raise ValueError<br/>AUC not defined]
    CheckClasses -->|Yes| H2OCheck{H2O or<br/>Keras Model?}

    H2OCheck -->|Yes| SingleThread[Set n_jobs=1<br/>for CV]
    H2OCheck -->|No| MultiThread[Use grid_n_jobs]

    SingleThread --> CheckCache{Can reuse<br/>cached CV<br/>results?}
    MultiThread --> CheckCache

    CheckCache -->|Yes & Not Forced| ExtractCache[Extract scores from<br/>cv_results_]
    CheckCache -->|No or Forced| FreshCV[Run fresh<br/>cross_validate]

    ExtractCache --> CacheError{Extraction<br/>Error?}
    CacheError -->|Yes| FreshCV
    CacheError -->|No| ProcessScores

    FreshCV --> CVType{Model<br/>Type?}
    CVType -->|Keras| KerasCV[Internal CV handling<br/>in fit method]
    CVType -->|Other| StandardCV[cross_validate with<br/>multiple metrics]

    KerasCV --> CVErrors{CV<br/>Errors?}
    StandardCV --> CVErrors

    CVErrors -->|XGBoost GPU Error| FallbackCPU[Fallback to CPU<br/>tree_method='hist']
    CVErrors -->|AdaBoost Poor| AdaBoostDefault[Use default scores]
    CVErrors -->|H2O RuntimeError| H2ODefault[Use default scores]
    CVErrors -->|Other Error| GenericDefault[Use default scores<br/>Log error]
    CVErrors -->|Success| ProcessScores[Process Scores]

    FallbackCPU --> Retry[Retry cross_validate]
    Retry --> RetryError{Retry<br/>Error?}
    RetryError -->|Yes| GenericDefault
    RetryError -->|No| ProcessScores

    ProcessScores --> TimeCheck{CV time ><br/>threshold?}
    TimeCheck -->|Yes| WarnSlow[Warn about slow CV]
    TimeCheck -->|No| LogTime[Log CV completion time]

    WarnSlow --> Predict
    LogTime --> Predict

    Predict[Predict on X_test] --> UpdateLog{Score logging<br/>enabled?}

    UpdateLog -->|Yes| SaveScores[Update score log with:<br/>- CV scores<br/>- predictions<br/>- best estimator<br/>- timing info]
    UpdateLog -->|No| WarnNoLog[Warn: no logging]

    SaveScores --> CalcAUC[Calculate final AUC<br/>on test set]
    WarnNoLog --> CalcAUC

    CalcAUC --> H2OCleanup{H2O<br/>Model?}
    H2OCleanup -->|Yes| LeaveRunning[Leave H2O cluster running<br/>for next model]
    H2OCleanup -->|No| End

    LeaveRunning --> End([Return AUC Score])

    SVCDefault --> End
    SkipCV --> H2OCleanup
    ReturnDefault --> End
    RaiseError --> End
    LogRaise --> End
    RaiseValueError --> End
    AdaBoostDefault --> CalcAUC
    H2ODefault --> CalcAUC
    GenericDefault --> CalcAUC

    style Start fill:#e1f5e1
    style End fill:#ffe1e1
    style SearchError fill:#fff3cd
    style CVErrors fill:#fff3cd
    style TestMode fill:#d1ecf1
    style TestModeCheck2 fill:#d1ecf1
    style H2OCheck fill:#f8d7da
    style BayesCheck fill:#d1ecf1
    style CheckCache fill:#d4edda