Intermediate Data Analysis Course in Python

This course introduces a few supervised machine learning techniques as a first extension of classical statistics and presents them using a simple but effective library: scikit-learn.

I chose this library mostly for didactic purposes (ease of installation, excellent documentation), but it is also worth considering for small or exploratory analyses before moving to more powerful (and complex) libraries such as TensorFlow, Keras, or PyTorch.

The other point I would like to underline is how the approach of machine learning differs from that of classical statistics: it builds models aimed at predicting or classifying unknown data from a known dataset. Iterative algorithms, regularization, and the concept of overfitting are often neglected in basic courses.
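As a rough illustration of overfitting, the sketch below fits two decision trees with scikit-learn and compares accuracy on the data used for training with accuracy on held-out data. The synthetic dataset, the noise level, and the depth limit are assumptions chosen for this example; they are not taken from the course notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (assumption: illustration only).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)

# Hold out part of the known data to play the role of "unknown" data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unconstrained tree can memorise the training set, label noise included.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A depth-limited tree is a simpler model that cannot memorise every sample.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

train_acc_deep = deep_tree.score(X_train, y_train)
test_acc_deep = deep_tree.score(X_test, y_test)
train_acc_shallow = shallow_tree.score(X_train, y_train)
test_acc_shallow = shallow_tree.score(X_test, y_test)

print(f"deep tree:    train={train_acc_deep:.3f}  test={test_acc_deep:.3f}")
print(f"shallow tree: train={train_acc_shallow:.3f}  test={test_acc_shallow:.3f}")
```

The gap between training and test accuracy for the unconstrained tree is the symptom to look for: the model describes the known dataset perfectly but generalises worse to data it has not seen.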

Finally, my aim is to present simple but effective algorithms that should be evaluated as efficient alternatives to more powerful and expensive ones.

Summary

Manipulating large amounts of data (Polars, DataFusion, Spark): this introductory module does not present machine learning algorithms but introduces three libraries that can effectively support the analysis of large datasets. (notebook)
Linear regression for large data, with L1 and L2 constrained convergence: the aim here is to introduce regularization as a way to avoid overfitting. (notebook)
Logistic regression: a very simple yet powerful classification algorithm, which can be seen as an ancestor of more complex structures such as neural networks. (notebook)
Decision trees for classification and regression: due to a limitation of the library, categorical features are transformed into continuous ones. (notebook)
K nearest neighbours: this classification algorithm allows for better decision boundaries than decision trees, at a higher computational cost. (notebook)
Singular value decomposition and principal component analysis: this module focuses on feature dimensionality reduction, which is often critical for many algorithms. (notebook)
Introduction to neural networks (perceptrons): an introduction to feed-forward perceptrons that covers the basic concepts behind these simple networks. (notebook)
Autoencoders: these algorithms, based on a symmetrical pair of perceptron networks (encoder and decoder), were originally used for feature reduction in early image analysis research but have also proved effective at combining physical models with measurements. (notebook)
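To give a flavour of the regularization idea mentioned for the linear regression module, here is a minimal sketch comparing ordinary least squares with L2 (Ridge) and L1 (Lasso) penalties in scikit-learn. The synthetic data and the `alpha` values are assumptions made for this example and are not taken from the course notebooks.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumption): 30 features, only the first 3 influence y.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 30
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# alpha sets the penalty strength; the values here are arbitrary choices.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty: shrinks all coefficients
    "lasso": Lasso(alpha=0.2),   # L1 penalty: can set coefficients exactly to zero
}

n_zero = {}
coef_norm = {}
for name, model in models.items():
    model.fit(X, y)
    n_zero[name] = int(np.sum(model.coef_ == 0.0))
    coef_norm[name] = float(np.linalg.norm(model.coef_))
    print(f"{name:5s}  zero coefficients: {n_zero[name]:2d}  "
          f"coefficient norm: {coef_norm[name]:.3f}")
```

The L2 penalty shrinks the coefficient vector without removing features, while the L1 penalty tends to discard irrelevant features altogether, which is one way to limit overfitting when there are many features compared with the number of samples.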

Folders and files

.gitignore
LICENSE
README.md
cancer.parquet
housing.pkl
intermediate_unit1_large_data.ipynb
intermediate_unit2_linear_regression.ipynb
intermediate_unit3_logistic_regression.ipynb
intermediate_unit4_decision_trees.ipynb
intermediate_unit5_nearest_neighbor.ipynb
intermediate_unit6_svd_pca.ipynb
intermediate_unit7_neural_net.ipynb
intermediate_unit8_autoencoders.ipynb
