List of what could be done

Tagged with complexity and sorted by priority

EASY - Define type for data input in mydatapreprocessing, import and use
EASY - In predict multiple change plot name to predicted column
EASY - Test on some larger files (Use warning if big data and hard settings)
COMPLEX - Define some set of datasets (good for prediction) and evaluate results on commit with tag. save results into csv and ate KPI - Add data to default compare models
COMPLEX - Create class Model. Configure models in its own settings (be able to configure various models in various way). Use instances of models (multiple concrete settings) Add save and load model method
EASY - Ability to load and save config to json to folder (mostly for GUI)
COMPLEX - Add exogenous parameters where possible
COMPLEX - Dimensionality reduction - E.G. autoencoder and PCA to shrink input multivariate data to small vectors. Choose what to compress (not predicted column).
COMPLEX - Rewrite GUI with mypythontools.pyvueeel
COMPLEX - Feature selection - Choose columns not on correlation, but based on error lowerage with simple model (neural net (better) as well as regression (faster)) E.g. https://towardsdatascience.com/deep-dive-into-catboost-functionalities-for-model-interpretation-7cdef669aeed, https://scikit-learn.org/stable/modules/feature_selection.html, https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b
COMPLEX - Segmentation for input data subselection - Try TICC method or Matrix profile method or something new
MEDIUM - Ensamble learning - bagging - Best models average as one new model
COMPLEX - Analyze output (residuals - mean, std etc. in comparison with original data)
MEDIUM - Add residuals matrix from results_matrix and add plotly box plot for residuals of models in main.py (https://plot.ly/python/box-plots/), or box plot for error criterion (data lengths and for repetitions) on compare_models
MEDIUM - Add transformations results as next input in parallel (remove replacement as in diff transformation)
- Add datetime transformations - is workday, is holiday, day, night etc...
- Hilbert huang transformation (pyhht) - predict each part separately and then sum to have result
COMPLEX Incremental learning (In sklearn partial_fit). Solve how to cope with statsmodels.
MEDIUM - Be able to use validation mode in predict to get real error criteria for compare
MEDIUM - In sklearn model - auto select model (use pipeline?)
EASY - In optimization, compare models on more results to reduce chance.
MEDIUM - Fast boosted hyperparametrs optimization method option - (optimize each parameter separately - then again...)
COMPLEX - New models
- LigthGBM
- Prophet
- CatBoostClassifierm CatBoostRegressor
- HONU
- statsmodels.tsa.vector_ar.var_model import VAR
- Random walk with same std and mean and trend as very few last data
- Add some boost method (lgboost library..., own adaboost for multivariate data)
- ETS method
- Bayes classification - sklearn
COMPLEX - Improve / edit existing models
- - Add stochastic gradient method to autoreg (adam or adagrad)
MEDIUM - Add scientific documentation for own models into docstrings (equations and diagrams) for other models add links
MEDIUM - Try multiprocessing.manager
EASY - Try dataframe data input for statsmodels - date as index
EASY - Test whether plotly is slower than matplotlib and document it in plots docstrings.
EASY - Store tensorflow mlp and lstm separately and by default.
EASY - Remove unnecessary copies (only data_for_predictions_df and using .values) and check if input data stays the same after all models computations - the same in preprocessing - use new inplace param
EASY - Dask for big data (dask.np.aray and dask dataframe) - chunk datasource (optionally in settings)
EASY - Test Numba optimization (probably just own models - LNU autoreg model and conjugate grad)
EASY - Check if can some lists replace with sets (faster)
Rewrite logging settings into config method

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List of what could be done

FilesExpand file tree

TODO.md

Latest commit

History

TODO.md

File metadata and controls

List of what could be done