New Workflow: LDA then XGBoost #155
base: master
Changes from 3 commits
Commits: 15f0e9c · 0388991 · 8df4a91 · f6d48dc · ac82df1 · 493c343 · 2961a67 · 224d45b
```python
@@ -255,6 +255,50 @@ def print_summary(self, result):
        logger.opt(raw=True).info("\n")


class PyProphetMultiLearner(PyProphetRunner):
    """
    Implements the learning and scoring workflow for PyProphet with multiple
    classifiers run sequentially.
    """

    def run_algo(self, part=None):
        """
        Runs the learning and scoring algorithm for multiple classifiers.

        Returns:
            tuple: A tuple containing the result, scorer, and weights.
        """
        if self.glyco:
            raise click.ClickException(
                "Multi-classifier learning is not supported for glycopeptide workflows."
            )
        else:
            config_lda = self.config.copy()
            config_lda.runner.classifier = "LDA"

            # Remove columns that are not needed for LDA
            table_lda = self.table.drop(
                columns=["var_precursor_charge", "var_product_charge", "var_transition_count"],
                errors="ignore",
            )

            (result_lda, scorer_lda, weights_lda) = PyProphet(config_lda).learn_and_apply(table_lda)
```
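The `errors="ignore"` flag in the column drop means the same call works whether or not the optional charge/count columns are present. A minimal standalone sketch with a hypothetical feature table (the column names other than the three dropped ones are illustrative, not from PyProphet):

```python
import pandas as pd

# Hypothetical feature table: only one of the three droppable columns exists.
table = pd.DataFrame({
    "var_xcorr": [1.2, 0.8],
    "var_precursor_charge": [2, 3],
})

# errors="ignore" silently skips listed columns that are absent,
# so no KeyError is raised for the two missing columns.
table_lda = table.drop(
    columns=["var_precursor_charge", "var_product_charge", "var_transition_count"],
    errors="ignore",
)

print(list(table_lda.columns))  # ['var_xcorr']
```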
Contributor: Will this run the full learning and scoring on the ss num iters and xval num iters, and then do a second pass with XGBoost on the same data for ss num iters and xval num iters? I am wondering if this results in any overfitting?

Contributor (author): I don't think it overfits; however, it might be unnecessary to do that many iterations.
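The two-pass idea under discussion can be sketched outside PyProphet: a first linear-discriminant pass produces a score, which is then appended as an extra feature for a boosted second pass. This is only an illustration of the stacking shape, not PyProphet's semi-supervised workflow — labels are fully known here, `GradientBoostingClassifier` stands in for XGBoost, and the data is synthetic:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost

# Synthetic two-class data (two shifted Gaussian clouds in 4 dimensions).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.5, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

# Pass 1: LDA produces one discriminant score per row.
lda = LinearDiscriminantAnalysis().fit(X, y)
lda_score = lda.decision_function(X).reshape(-1, 1)

# Pass 2: the boosted model sees the original features plus the LDA score,
# analogous to feeding main_var_lda_score into the XGBoost run.
X_stacked = np.hstack([X, lda_score])
gbm = GradientBoostingClassifier(random_state=0).fit(X_stacked, y)
```

Because both passes see the same rows here, any accuracy measured on them is training accuracy, which is exactly the reuse the reviewer is asking about; held-out data would be needed to detect overfitting.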
```python
            self.table["main_var_lda_score"] = result_lda.scored_tables["d_score"]

            logger.info("LDA scores computed! Now running XGBoost using the LDA score as the main score")

            # Rename the column that was previously the main score
            found = False
            for col in self.table.columns:
                if col.startswith("main") and not found:
                    self.table = self.table.rename(columns={col: col[5:]})
                    found = True

            config_xgb = self.config.copy()
            config_xgb.runner.ss_main_score = "var_lda_score"  # use the LDA score as the main score for XGBoost
            config_xgb.runner.classifier = "XGBoost"
            config_xgb.runner.ss_use_dynamic_main_score = False  # since the LDA score is the main score, no need to dynamically select one
            self.config.runner.classifier = "XGBoost"  # needs to be XGBoost for saving the weights

            (result_xgb, scorer_xgb, weights_xgb) = PyProphet(config_xgb).learn_and_apply(self.table)
            return (result_xgb, scorer_xgb, weights_xgb)


class PyProphetLearner(PyProphetRunner):
    """
    Implements the learning and scoring workflow for PyProphet.
```
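The rename loop strips the 5-character `main_` prefix from the first column that carries it, so `main_var_lda_score` becomes `var_lda_score`. It can be checked in isolation on a hypothetical DataFrame (`var_xcorr` is an illustrative second column, not from the PR):

```python
import pandas as pd

df = pd.DataFrame({"main_var_lda_score": [0.9], "var_xcorr": [1.1]})

# Strip the "main_" prefix (len("main_") == 5) from the first matching
# column only, mirroring the loop in run_algo.
found = False
for col in df.columns:
    if col.startswith("main") and not found:
        df = df.rename(columns={col: col[5:]})
        found = True

print(list(df.columns))  # ['var_lda_score', 'var_xcorr']
```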
```
@@ -0,0 +1,14 @@
                feature_id  ms1_precursor_pep  ms2_peakgroup_pep  ms2_precursor_pep
0     -9078977811506172301             0.0063             0.0022             0.0025
1     -9009602369958523731             0.0063             0.0022             0.0325
2     -8990894093332793487             0.0063             0.0022             0.0025
3     -8915955323477460297             0.0063             0.0022             0.0071
4     -8858715981476206597             0.0063             0.0022             0.0025
..                     ...                ...                ...                ...
95    -2912234918591861719             0.0063             0.0022             0.0025
96    -2872329084347808160             0.0063             0.0022             0.0025
97    -2789098353857361973             1.0000             0.0022             0.0025
98    -2788620575140019858             0.0063             0.0022             0.0025
99    -2741276427609241638             0.0063             0.0022             0.0325

[100 rows x 4 columns]
```
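The snapshot above, with its `..` separator row and `[100 rows x 4 columns]` footer, is the standard truncated `repr` that pandas emits for long DataFrames. A minimal sketch of how that layout arises (synthetic data, only two columns for brevity):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "feature_id": rng.integers(-2**62, 2**62, size=100),
    "ms1_precursor_pep": np.full(100, 0.0063),
})

# With display.max_rows below the frame length, pandas shows head/tail rows
# separated by "..." and appends the shape footer seen in the snapshot.
with pd.option_context("display.max_rows", 10):
    text = repr(df)

print(text.splitlines()[-1])  # [100 rows x 2 columns]
```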