noiseOnTheNet/intermediate_data_analysis

Intermediate Data Analysis Course in Python

This course introduces a few supervised machine learning techniques as a first step beyond classical statistics, and presents them using a simple but effective library: scikit-learn.

I chose this library mostly for didactic reasons (ease of installation, excellent documentation). It is best suited to small or exploratory analyses, before moving to more powerful (and more complex) libraries such as TensorFlow, Keras, or PyTorch.

The other point I would like to underline is how the approach of machine learning differs from that of classical statistics: it builds models aimed at predicting or classifying unknown data from a known dataset. Iterative algorithms, regularization, and the concept of overfitting are often neglected in basic courses.
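As a rough illustration of overfitting, the sketch below fits two scikit-learn decision trees to synthetic, noisy data and compares their accuracy on the training set with their accuracy on held-out data. The dataset, its parameters, and the variable names are assumptions made for this example; they are not taken from the course notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (an assumption for illustration only).
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=4,
    flip_y=0.2,        # a fraction of labels is reassigned at random
    random_state=0,
)

# Hold out a test set that the models never see while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# An unconstrained tree keeps splitting until it reproduces the training
# labels, noise included.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting the depth is one simple way to constrain the model.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train = deep.score(X_train, y_train)
deep_test = deep.score(X_test, y_test)
shallow_train = shallow.score(X_train, y_train)
shallow_test = shallow.score(X_test, y_test)

print(f"deep tree:    train={deep_train:.3f}  test={deep_test:.3f}")
print(f"shallow tree: train={shallow_train:.3f}  test={shallow_test:.3f}")
```

The gap between training and test accuracy of the unconstrained tree is the symptom to look for; how much the constrained tree narrows it depends on the data.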

Finally, my aim is to present simple but effective algorithms that should be evaluated as efficient alternatives to more powerful and expensive ones.

Summary

  1. Manipulating large amounts of data: Polars, DataFusion, Spark. This introductory module does not present machine learning algorithms; it introduces three libraries that can effectively support the analysis of large datasets. (notebook)
  2. Linear regression for large data, with L1- and L2-constrained convergence. The aim here is to introduce regularization as a way to avoid overfitting. (notebook)
  3. Logistic regression. This is a very simple yet powerful classification algorithm. It can be seen as an ancestor of more complex structures such as neural networks. (notebook)
  4. Decision trees for classification and regression. Because of a limitation of the library, categorical features are transformed into continuous ones. (notebook)
  5. K nearest neighbours. This classification algorithm allows better decision boundaries than decision trees, at a higher computational cost. (notebook)
  6. Singular value decomposition, principal component analysis. This module focuses on feature dimensionality reduction, which is often critical for many algorithms. (notebook)
  7. Introduction to neural networks: perceptrons. This is an introduction to feed-forward perceptrons, covering basic information about these simple networks; the aim is to introduce the basic concepts. (notebook)
  8. Autoencoders. These algorithms, based on a symmetrical pair of perceptron networks (encoder and decoder), were originally used for feature reduction in early image analysis research but have proved effective at joining physical models with measurements. (notebook)
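
To give a flavour of the regularization theme in module 2, here is a minimal sketch comparing an L1 penalty (Lasso) with an L2 penalty (Ridge) using scikit-learn. The synthetic dataset, the penalty strength `alpha=1.0`, and the variable names are assumptions chosen for illustration; the course notebook may proceed differently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: 20 features, only 3 of which carry signal
# (an assumption for illustration only).
X, y = make_regression(
    n_samples=200,
    n_features=20,
    n_informative=3,
    noise=5.0,
    random_state=0,
)

# L1 penalty: tends to set the coefficients of irrelevant features to exactly zero.
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 penalty: shrinks coefficients towards zero without cancelling them.
ridge = Ridge(alpha=1.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))

print(f"Lasso: {n_zero_lasso} of {lasso.coef_.size} coefficients are exactly zero")
print(f"Ridge: {n_zero_ridge} of {ridge.coef_.size} coefficients are exactly zero")
```

The L1 penalty therefore also acts as a rough form of feature selection, while the L2 penalty keeps every feature and only reduces its weight.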
