A comprehensive, hands-on curriculum for mastering Machine Learning with Python
From mathematical foundations to production deployment
Getting Started • Learning Path • Architecture • Technologies
- Why This Project?
- Project Goals
- System Architecture
- Learning Path
- Technology Stack
- Project Timeline
- Getting Started
- Project Structure
- Current Progress
Learning machine learning is challenging for several reasons:
| Challenge | Impact | Our Solution |
|---|---|---|
| Fragmented Resources | Learners jump between tutorials without cohesion | Unified, progressive curriculum |
| Theory-Practice Gap | Math concepts don't connect to code | Every concept has implementation |
| No Production Focus | Tutorials don't cover real-world deployment | End-to-end projects with deployment |
| Outdated Content | Many resources use deprecated libraries | Modern stack (PyTorch 2.0+, Python 3.12) |
| Missing Testing | No emphasis on code quality | TDD approach with pytest |
This study guide provides a structured, progressive path from Python basics to deploying production ML systems:
flowchart LR
subgraph Foundation["🔧 Foundation"]
A[NumPy]
B[Pandas]
C[Visualization]
end
subgraph Classical["📊 Classical ML"]
D[Supervised]
E[Unsupervised]
end
subgraph Deep["🧠 Deep Learning"]
F[Neural Nets]
G[CNNs]
H[RNNs]
end
subgraph Applied["🚀 Applied"]
I[NLP]
J[Computer Vision]
K[Projects]
end
Foundation --> Classical
Classical --> Deep
Deep --> Applied
style Foundation fill:#1a1a2e,stroke:#00d4ff,color:#fff
style Classical fill:#1a1a2e,stroke:#00ff88,color:#fff
style Deep fill:#1a1a2e,stroke:#ff6b6b,color:#fff
style Applied fill:#1a1a2e,stroke:#ffd93d,color:#fff
mindmap
root((ML Mastery))
Foundations
NumPy Arrays
Pandas DataFrames
Data Visualization
Statistics
Classical ML
Regression
Classification
Clustering
Dimensionality Reduction
Deep Learning
Neural Networks
CNNs
RNNs/LSTMs
Transformers
Applications
NLP
Computer Vision
Time Series
Deployment
| Metric | Target | Measurement |
|---|---|---|
| Notebooks Completed | 50+ | Interactive Jupyter notebooks |
| Unit Tests | 90%+ coverage | pytest with coverage reports |
| Projects Built | 5+ end-to-end | From data to deployment |
| Code Quality | 100% type-hinted | mypy + pylint passing |
flowchart TB
subgraph Input["📥 Input Layer"]
direction TB
NB["📓 Jupyter Notebooks"]
DATA["📁 Datasets"]
CFG["⚙️ Configs"]
end
subgraph Core["⚙️ Core Processing"]
direction TB
SRC["🐍 Source Code"]
UTILS["🔧 Utilities"]
MODELS["🤖 Models"]
VIZ["📊 Visualization"]
end
subgraph Quality["✅ Quality Assurance"]
direction TB
TESTS["🧪 Tests"]
LINT["📝 Linting"]
DOCS["📚 Documentation"]
end
subgraph Deploy["🚀 Deployment"]
direction TB
DOCKER["🐳 Docker"]
API["🌐 API"]
end
Input --> Core
Core --> Quality
Quality --> Deploy
style Input fill:#2d3436,stroke:#00cec9,color:#fff
style Core fill:#2d3436,stroke:#6c5ce7,color:#fff
style Quality fill:#2d3436,stroke:#00b894,color:#fff
style Deploy fill:#2d3436,stroke:#e17055,color:#fff
flowchart TD
subgraph Root["📁 python-ML-learn"]
direction TB
subgraph Learning["📚 Learning Modules"]
F01["01-fundamentals/"]
F02["02-supervised-learning/"]
F03["03-unsupervised-learning/"]
F04["04-deep-learning/"]
F05["05-nlp/"]
F06["06-computer-vision/"]
F07["07-projects/"]
end
subgraph Source["💻 Source Code"]
SRC_UTILS["src/utils/"]
SRC_MODELS["src/models/"]
SRC_DATA["src/data_processing/"]
SRC_VIZ["src/visualization/"]
end
subgraph Support["🔧 Support"]
TESTS["tests/"]
DOCS["docs/"]
DOCKER["docker/"]
MEMORY["memory-bank/"]
end
end
style Root fill:#1e272e,stroke:#fff,color:#fff
style Learning fill:#2d3436,stroke:#74b9ff,color:#fff
style Source fill:#2d3436,stroke:#a29bfe,color:#fff
style Support fill:#2d3436,stroke:#55efc4,color:#fff
flowchart LR
subgraph Data["📊 Data Pipeline"]
RAW["Raw Data"]
CLEAN["Cleaned Data"]
FEAT["Features"]
end
subgraph Model["🤖 Model Pipeline"]
TRAIN["Training"]
VAL["Validation"]
TEST["Testing"]
end
subgraph Output["📈 Output"]
PRED["Predictions"]
METRICS["Metrics"]
VIZ["Visualizations"]
end
RAW --> CLEAN
CLEAN --> FEAT
FEAT --> TRAIN
TRAIN --> VAL
VAL --> TEST
TEST --> PRED
TEST --> METRICS
METRICS --> VIZ
style Data fill:#2d3436,stroke:#00cec9,color:#fff
style Model fill:#2d3436,stroke:#6c5ce7,color:#fff
style Output fill:#2d3436,stroke:#fdcb6e,color:#fff
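The data flow in the diagram above (raw data, cleaning, features, training, validation, testing, metrics) can be sketched end to end with NumPy alone. This is a minimal illustration on synthetic data, not the repository's actual pipeline: the dataset, the 60/20/20 split, the least-squares model, and the names `featurize` and `rmse` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Raw data: 200 samples, 3 features, a noisy linear target (synthetic).
X_raw = rng.normal(size=(200, 3))
y = X_raw @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
# Simulate missing values in the first feature.
X_raw[rng.integers(0, 200, size=10), 0] = np.nan

# Cleaned data: drop rows that contain missing values.
keep = ~np.isnan(X_raw).any(axis=1)
X_clean, y_clean = X_raw[keep], y[keep]

# Split indices into train / validation / test (60% / 20% / 20%).
idx = rng.permutation(len(X_clean))
n_train, n_val = int(0.6 * len(idx)), int(0.2 * len(idx))
train, val, test = np.split(idx, [n_train, n_train + n_val])

# Features: standardize with statistics from the training split only,
# so no information leaks from validation or test data.
mean = X_clean[train].mean(axis=0)
std = X_clean[train].std(axis=0)


def featurize(X: np.ndarray) -> np.ndarray:
    """Standardize features and prepend a bias column of ones."""
    return np.column_stack([np.ones(len(X)), (X - mean) / std])


# Training: ordinary least squares on the training split.
weights, *_ = np.linalg.lstsq(featurize(X_clean[train]), y_clean[train], rcond=None)


def rmse(split: np.ndarray) -> float:
    """Root-mean-squared error of the fitted model on one split."""
    predictions = featurize(X_clean[split]) @ weights
    return float(np.sqrt(np.mean((predictions - y_clean[split]) ** 2)))


# Metrics: compare splits to spot overfitting before trusting the test score.
for name, split in [("train", train), ("validation", val), ("test", test)]:
    print(f"{name:>10} RMSE: {rmse(split):.3f}")
```

Computing the scaling statistics from the training split alone is the design choice worth noting: it keeps the validation and test metrics honest.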
flowchart TB
subgraph P1["Phase 1: Foundation"]
direction LR
P1A["Week 1-2"]
P1B["Infrastructure<br/>& Setup"]
P1A --> P1B
end
subgraph P2["Phase 2: Fundamentals"]
direction LR
P2A["Week 3-5"]
P2B["NumPy, Pandas<br/>Statistics, Viz"]
P2A --> P2B
end
subgraph P3["Phase 3: Supervised"]
direction LR
P3A["Week 6-8"]
P3B["Regression<br/>Classification"]
P3A --> P3B
end
subgraph P4["Phase 4: Unsupervised"]
direction LR
P4A["Week 9-10"]
P4B["Clustering<br/>PCA, t-SNE"]
P4A --> P4B
end
subgraph P5["Phase 5: Deep Learning"]
direction LR
P5A["Week 11-13"]
P5B["Neural Nets<br/>CNN, RNN"]
P5A --> P5B
end
subgraph P6["Phase 6-9: Advanced"]
direction LR
P6A["Week 14-26"]
P6B["NLP, CV<br/>Projects, MLOps"]
P6A --> P6B
end
P1 --> P2 --> P3 --> P4 --> P5 --> P6
style P1 fill:#1e3a5f,stroke:#3498db,color:#fff
style P2 fill:#1e3a5f,stroke:#2ecc71,color:#fff
style P3 fill:#1e3a5f,stroke:#9b59b6,color:#fff
style P4 fill:#1e3a5f,stroke:#e74c3c,color:#fff
style P5 fill:#1e3a5f,stroke:#f39c12,color:#fff
style P6 fill:#1e3a5f,stroke:#1abc9c,color:#fff
📘 Phase 1: Foundation (Weeks 1-2)
| Topic | Description | Deliverable |
|---|---|---|
| Project Structure | Modular src layout | Folder hierarchy |
| Development Environment | VS Code + extensions | .vscode/settings.json |
| Docker Setup | Reproducible environment | Dockerfile, docker-compose.yml |
| Testing Framework | pytest configuration | conftest.py, pytest.ini |
Status: ✅ Complete
📗 Phase 2: Core ML Fundamentals (Weeks 3-5)
| Topic | Key Concepts | Notebook |
|---|---|---|
| NumPy | Arrays, broadcasting, linear algebra | 01_numpy_fundamentals.ipynb |
| Pandas | DataFrames, cleaning, aggregation | 02_pandas_data_manipulation.ipynb |
| Visualization | matplotlib, seaborn, plotly | 03_data_visualization.ipynb |
| Statistics | Distributions, hypothesis testing | 04_statistics_for_ml.ipynb |
| Scikit-learn Intro | Pipelines, preprocessing, models | 05_sklearn_introduction.ipynb |
Status: ✅ Complete (5 notebooks, 114 tests)
📙 Phase 3: Supervised Learning (Weeks 6-8)
| Algorithm | Mathematical Foundation | Implementation |
|---|---|---|
| Linear Regression | Ordinary least squares | From scratch + sklearn |
| Logistic Regression | Sigmoid, cross-entropy | Binary & multiclass |
| Decision Trees | Gini impurity, entropy | Visualization included |
| Random Forests | Bagging, feature importance | Hyperparameter tuning |
| SVM | Kernel trick, margin maximization | Multiple kernels |
| Gradient Boosting | Sequential ensembles | XGBoost, LightGBM |
Status: ✅ Complete (5 notebooks, 15 tests)
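The decision tree row above names Gini impurity and entropy as its mathematical foundation. As a hedged sketch using the standard definitions (Gini = 1 − Σ pᵢ², entropy = −Σ pᵢ log₂ pᵢ), both can be computed from a list of class labels with the standard library alone. The function names here are illustrative and are not taken from the repository's `ml_core/supervised.py`.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def class_probabilities(labels: Sequence[Hashable]) -> list[float]:
    """Return the fraction of samples belonging to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Chance of mislabeling a random sample drawn from this node."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of the class distribution, in bits."""
    return sum(-p * log2(p) for p in class_probabilities(labels))


balanced = ["spam", "spam", "ham", "ham"]
pure = ["spam", "spam", "spam"]

print(gini_impurity(balanced))  # 0.5
print(entropy(balanced))        # 1.0
print(gini_impurity(pure))      # 0.0
print(entropy(pure))            # 0.0
```

A tree learner picks the split that most reduces one of these measures; both reach zero for a pure node and peak when classes are evenly mixed.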
📕 Phase 4: Unsupervised Learning (Weeks 9-10)
| Algorithm | Purpose | Implementation |
|---|---|---|
| K-Means | Centroid-based clustering | From scratch + sklearn |
| Hierarchical | Agglomerative clustering | Dendrograms |
| DBSCAN | Density-based clustering | Parameter tuning |
| PCA | Dimensionality reduction | From scratch + sklearn |
| t-SNE | Visualization | Perplexity tuning |
| Anomaly Detection | Outlier detection | Isolation Forest, LOF, One-Class SVM |
Status: ✅ Complete (3 notebooks, 37 tests)
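The table above lists PCA as implemented "from scratch + sklearn". One common from-scratch route, shown here as a sketch rather than the notebook's actual code, centers the data and takes a singular value decomposition instead of an eigendecomposition of the covariance matrix. The `pca` function and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (projected data, principal axes, explained variance ratio)."""
    # PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values are proportional to the variance along each axis.
    explained_variance = singular_values ** 2 / (len(X) - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio


rng = np.random.default_rng(seed=42)
# Synthetic 3-D data that varies mostly along a single direction.
latent = rng.normal(size=(300, 1))
X = latent @ np.array([[3.0, 2.0, 1.0]]) + rng.normal(scale=0.1, size=(300, 3))

X_2d, components, ratio = pca(X, n_components=2)
print(X_2d.shape)      # (300, 2)
print(ratio.round(3))  # first component should carry almost all the variance
```

The SVD route avoids forming the covariance matrix explicitly, which tends to be more numerically stable for tall data matrices.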
📕 Phase 5-9: Advanced Topics (Weeks 11-26)
| Phase | Topics | Hours |
|---|---|---|
| 5. Deep Learning | Neural nets, CNN, RNN, PyTorch | 90 |
| 6. NLP | Embeddings, BERT, Transformers | 70 |
| 7. Computer Vision | Object detection, segmentation | 70 |
| 8. Projects | End-to-end ML systems | 100+ |
| 9. MLOps | Deployment, monitoring, CI/CD | 40 |
flowchart TB
subgraph Languages["🐍 Languages & Runtime"]
PY["Python 3.12+"]
JUP["Jupyter"]
end
subgraph DataScience["📊 Data Science"]
NP["NumPy"]
PD["Pandas"]
SP["SciPy"]
end
subgraph Visualization["📈 Visualization"]
MPL["Matplotlib"]
SNS["Seaborn"]
PLT["Plotly"]
end
subgraph ML["🤖 Machine Learning"]
SK["scikit-learn"]
XG["XGBoost"]
LG["LightGBM"]
end
subgraph DL["🧠 Deep Learning"]
PT["PyTorch"]
TF["TensorFlow"]
HF["Transformers"]
end
subgraph DevOps["🔧 DevOps"]
DOC["Docker"]
GIT["Git"]
TEST["pytest"]
end
Languages --> DataScience
Languages --> Visualization
DataScience --> ML
ML --> DL
DL --> DevOps
style Languages fill:#2c3e50,stroke:#3498db,color:#fff
style DataScience fill:#2c3e50,stroke:#2ecc71,color:#fff
style Visualization fill:#2c3e50,stroke:#9b59b6,color:#fff
style ML fill:#2c3e50,stroke:#e74c3c,color:#fff
style DL fill:#2c3e50,stroke:#f39c12,color:#fff
style DevOps fill:#2c3e50,stroke:#1abc9c,color:#fff
| Technology | Version | Purpose | Why Chosen |
|---|---|---|---|
| Python | 3.12+ | Core language | Industry standard, rich ecosystem |
| NumPy | 2.4+ | Numerical computing | Often 10-100x faster than pure-Python loops thanks to vectorization |
| Pandas | 2.0+ | Data manipulation | Intuitive DataFrame API, SQL-like operations |
| scikit-learn | 1.3+ | Classical ML | Consistent API, comprehensive algorithms |
| PyTorch | 2.0+ | Deep learning | Dynamic graphs, Pythonic, research-friendly |
| TensorFlow | 2.13+ | Deep learning | Production-ready, TensorBoard, Keras API |
| Matplotlib | 3.7+ | Plotting | Highly customizable, publication quality |
| Seaborn | 0.12+ | Statistical viz | Beautiful defaults, statistical plots |
| Docker | Latest | Containerization | Reproducible environments |
| pytest | 7.4+ | Testing | Simple syntax, powerful fixtures |
flowchart LR
subgraph NumPy["NumPy Ecosystem"]
ARR["ndarray<br/>N-dimensional arrays"]
UFUNC["ufuncs<br/>Element-wise ops"]
LINALG["linalg<br/>Matrix operations"]
RAND["random<br/>Statistical sampling"]
end
subgraph Benefits["Why NumPy?"]
SPEED["⚡ 10-100x Faster"]
MEM["💾 Memory Efficient"]
BROAD["📡 Broadcasting"]
INTER["🔗 Interoperability"]
end
NumPy --> Benefits
style NumPy fill:#2c3e50,stroke:#013243,color:#fff
style Benefits fill:#2c3e50,stroke:#4dabf7,color:#fff
Definition: NumPy is the fundamental package for scientific computing in Python.
Motivation: Python lists are slow for numerical operations. NumPy provides:
- Contiguous memory allocation
- Vectorized operations (no Python loops)
- C-level execution speed
Mechanism:
import numpy as np

# Python list (slow)
result = [x ** 2 for x in range(1000000)]  # ~200 ms (machine-dependent)

# NumPy array (fast)
arr = np.arange(1000000)
result = arr ** 2  # ~2 ms (roughly 100x faster)

Impact: Enables processing of large datasets that would be impractical with pure Python.
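The timings quoted above depend on hardware and library versions, so it is worth measuring them locally. The sketch below uses the standard-library `timeit` module to compare the two approaches; the helper names `square_with_list` and `square_with_numpy` are made up for this example, and no particular speedup is guaranteed.

```python
import timeit

import numpy as np

N = 1_000_000


def square_with_list(n: int) -> list[int]:
    """Square 0..n-1 with a pure-Python list comprehension."""
    return [x ** 2 for x in range(n)]


def square_with_numpy(n: int) -> np.ndarray:
    """Square 0..n-1 with a single vectorized NumPy expression."""
    return np.arange(n, dtype=np.int64) ** 2


# Sanity check: both approaches produce the same values.
assert square_with_list(1000) == square_with_numpy(1000).tolist()

# Take the best of a few runs to reduce noise from other processes.
list_time = min(timeit.repeat(lambda: square_with_list(N), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: square_with_numpy(N), number=1, repeat=3))

print(f"list comprehension: {list_time * 1e3:.1f} ms")
print(f"NumPy vectorized:   {numpy_time * 1e3:.1f} ms")
print(f"speedup:            {list_time / numpy_time:.0f}x")
```

The explicit `dtype=np.int64` keeps the result free of overflow on platforms where the default integer type is 32-bit.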
gantt
title ML Study Guide - 26 Week Timeline
dateFormat YYYY-MM-DD
section Phase 1
Infrastructure Setup :done, p1, 2025-12-16, 2w
section Phase 2
NumPy Fundamentals :done, p2a, after p1, 3d
Pandas & Data :active, p2b, after p2a, 1w
Visualization : p2c, after p2b, 5d
Statistics : p2d, after p2c, 4d
Feature Engineering : p2e, after p2d, 5d
section Phase 3
Linear Regression : p3a, after p2e, 5d
Logistic Regression : p3b, after p3a, 5d
Decision Trees : p3c, after p3b, 6d
SVM & Boosting : p3d, after p3c, 1w
section Phase 4
Clustering : p4a, after p3d, 1w
Dimensionality Reduction: p4b, after p4a, 1w
section Phase 5
Neural Networks : p5a, after p4b, 2w
CNN & RNN : p5b, after p5a, 2w
section Phase 6-9
NLP : p6, after p5b, 3w
Computer Vision : p7, after p6, 3w
Projects : p8, after p7, 4w
MLOps : p9, after p8, 2w
| Milestone | Target | Status | Progress |
|---|---|---|---|
| M1: Infrastructure | Week 2 | ✅ Complete | ████████████ 100% |
| M2: Fundamentals | Week 5 | ✅ Complete | ████████████ 100% |
| M3: Supervised | Week 8 | ✅ Complete | ████████████ 100% |
| M4: Unsupervised | Week 10 | ✅ Complete | ████████████ 100% |
| M5: Deep Learning | Week 13 | ✅ Complete | ████████████ 100% |
| M6: NLP | Week 16 | ✅ Complete | ████████████ 100% |
| M7: Computer Vision | Week 19 | ✅ Complete | ████████████ 100% |
| M8: Projects | Week 24 | ✅ Complete | ████████████ 100% |
| M9: MLOps | Week 26 | ✅ Complete | ████████████ 100% |
| Requirement | Version | Check Command |
|---|---|---|
| Python | 3.12+ | python --version |
| pip | Latest | pip --version |
| Git | Latest | git --version |
| Docker (optional) | Latest | docker --version |
# 1. Clone the repository
git clone https://github.com/yourusername/python-ML-learn.git
cd python-ML-learn
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Start Jupyter Lab
jupyter lab

# Build and run with Docker Compose
cd docker
docker-compose up -d
# Access Jupyter Lab at http://localhost:8888

# Run tests to verify setup
python -m pytest tests/ -v
# Expected output: All tests passing

python-ML-learn/
├── 📓 01-fundamentals/ # NumPy, Pandas, Visualization (5 notebooks)
│ ├── 01_numpy_fundamentals.ipynb
│ ├── 02_pandas_data_manipulation.ipynb
│ ├── 03_data_visualization.ipynb
│ ├── 04_statistics_for_ml.ipynb
│ └── 05_sklearn_introduction.ipynb
├── 📓 02-supervised-learning/ # Regression, Classification (5 notebooks)
│ ├── 01_linear_regression.ipynb
│ ├── 02_logistic_regression.ipynb
│ ├── 03_decision_trees_random_forests.ipynb
│ ├── 04_svm.ipynb
│ └── 05_gradient_boosting.ipynb
├── 📓 03-unsupervised-learning/ # Clustering, PCA (3 notebooks)
│ ├── 01_clustering.ipynb
│ ├── 02_dimensionality_reduction.ipynb
│ └── 03_anomaly_detection.ipynb
├── 📓 04-deep-learning/ # Neural Networks, CNN, RNN (5 notebooks)
│ ├── 01_neural_network_fundamentals.ipynb
│ ├── 02_pytorch_introduction.ipynb
│ ├── 03_convolutional_neural_networks.ipynb
│ ├── 04_recurrent_neural_networks.ipynb
│ └── 05_training_techniques.ipynb
├── 📓 05-nlp/ # Text Processing, Transformers (5 notebooks)
│ ├── 01_text_preprocessing.ipynb
│ ├── 02_text_vectorization.ipynb
│ ├── 03_word_embeddings.ipynb
│ ├── 04_text_classification.ipynb
│ └── 05_transformers_introduction.ipynb
├── 📓 06-computer-vision/ # Object Detection, Segmentation (5 notebooks)
│ ├── 01_image_fundamentals.ipynb
│ ├── 02_cnn_architectures.ipynb
│ ├── 03_transfer_learning.ipynb
│ ├── 04_object_detection.ipynb
│ └── 05_image_segmentation.ipynb
├── 📓 07-projects/ # End-to-End Projects (5 notebooks)
│ ├── 01_house_price_prediction.ipynb
│ ├── 02_customer_churn_prediction.ipynb
│ ├── 03_image_classification_app.ipynb
│ ├── 04_sentiment_analysis_pipeline.ipynb
│ └── 05_recommendation_system.ipynb
├── 📓 08-mlops/ # MLOps & Production (5 notebooks)
│ ├── 01_model_serving_fastapi.ipynb
│ ├── 02_docker_containerization.ipynb
│ ├── 03_experiment_tracking.ipynb
│ ├── 04_cicd_pipelines.ipynb
│ └── 05_model_monitoring.ipynb
│
├── 💻 src/ # Source Code
│ ├── utils/ # Utility functions
│ │ ├── timer.py # Performance timing
│ │ ├── numpy_helpers.py # NumPy utilities
│ │ ├── pandas_helpers.py # Pandas utilities
│ │ ├── stats_helpers.py # Statistical functions
│ │ ├── sklearn_helpers.py # Scikit-learn utilities
│ │ └── visualization_helpers.py # Plotting utilities
│ ├── ml_core/ # ML helper modules
│ │ ├── supervised.py # Supervised learning helpers
│ │ ├── unsupervised.py # Unsupervised learning helpers
│ │ ├── deep_learning.py # Deep learning helpers
│ │ ├── nlp.py # NLP helpers
│ │ └── computer_vision.py # Computer vision helpers
│ ├── models/ # ML model implementations
│ ├── data_processing/ # Data pipelines
│ └── visualization/ # Plotting utilities
│
├── 🧪 tests/ # Test Suite
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
│
├── 📚 docs/ # Documentation
│ └── project-plan.md # Detailed project plan
│
├── 🗃️ memory-bank/ # Project Memory
│ ├── change-log.md # Version history
│ └── architecture-decisions/ # ADRs
│
├── 🐳 docker/ # Docker Configuration
│ ├── Dockerfile
│ └── docker-compose.yml
│
├── 📊 data/ # Datasets
│ ├── raw/ # Original data
│ └── processed/ # Cleaned data
│
├── ⚙️ configs/ # Configuration files
├── 📜 requirements.txt # Python dependencies
└── 📖 README.md # This file
All 9 phases of the Machine Learning curriculum have been completed!
pie title Project Completion by Phase
"Phase 1 - Infrastructure" : 100
"Phase 2 - Fundamentals" : 100
"Phase 3 - Supervised" : 100
"Phase 4 - Unsupervised" : 100
"Phase 5 - Deep Learning" : 100
"Phase 6 - NLP" : 100
"Phase 7 - Computer Vision" : 100
"Phase 8 - Projects" : 100
"Phase 9 - MLOps" : 100
| Module | Tests | Coverage | Status |
|---|---|---|---|
| utils/timer.py | 14 | 95% | ✅ |
| utils/numpy_helpers.py | 24 | 100% | ✅ |
| utils/pandas_helpers.py | 21 | 100% | ✅ |
| utils/stats_helpers.py | 33 | 100% | ✅ |
| utils/sklearn_helpers.py | 29 | 100% | ✅ |
| utils/visualization_helpers.py | 31 | 100% | ✅ |
| ml_core/supervised.py | 15 | 100% | ✅ |
| ml_core/unsupervised.py | 37 | 100% | ✅ |
| ml_core/deep_learning.py | 42 | 100% | ✅ |
| ml_core/nlp.py | 53 | 100% | ✅ |
| ml_core/computer_vision.py | 63 | 100% | ✅ |
Total Tests: 362 passing ✅
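To give a feel for what a tested utility in this style might look like, here is a hypothetical sketch of a timing helper together with pytest-style tests. It is not the contents of `utils/timer.py`; the `timer` context manager, the `TimerResult` class, and the test names are all assumptions made for illustration.

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TimerResult:
    """Holds the elapsed wall-clock time in seconds."""

    elapsed: float = 0.0


@contextmanager
def timer() -> Iterator[TimerResult]:
    """Time the enclosed block, recording the result even if it raises."""
    result = TimerResult()
    start = time.perf_counter()
    try:
        yield result
    finally:
        result.elapsed = time.perf_counter() - start


def test_timer_records_elapsed_time() -> None:
    with timer() as t:
        time.sleep(0.01)
    assert t.elapsed > 0.0


def test_timer_records_time_when_block_raises() -> None:
    captured = None
    try:
        with timer() as t:
            captured = t
            time.sleep(0.001)
            raise ValueError("boom")
    except ValueError:
        pass
    assert captured is not None and captured.elapsed > 0.0
```

Saved in a file whose name starts with `test_`, the two test functions would be collected automatically by `python -m pytest`.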
| Date | Version | Changes |
|---|---|---|
| 2025-12-22 | v2.0.0 | 🎉 Phase 9: MLOps & Production (model serving, Docker, CI/CD, monitoring) |
| 2025-12-22 | v1.12.0 | Phase 8: End-to-End Projects (5 comprehensive ML projects) |
| 2025-07-09 | v1.11.0 | Phase 7: Computer Vision (image fundamentals, CNN, detection, segmentation) |
| 2025-07-08 | v1.10.0 | Phase 6: NLP (text preprocessing, embeddings, transformers) |
| 2025-07-08 | v1.9.0 | Phase 5: Deep learning (PyTorch, CNN, RNN, training techniques) |
| 2025-07-08 | v1.8.0 | Phase 4: Unsupervised learning (clustering, PCA, anomaly detection) |
| 2025-07-08 | v1.7.0 | Phase 3: Supervised learning (regression, classification, SVM, boosting) |
| 2025-07-08 | v1.6.0 | Phase 2: Fundamentals complete (5 notebooks, helper modules) |
| 2025-07-08 | v1.0.0 | Initial project structure, Docker setup |
- 📐 Understand the Math: Don't skip mathematical intuition
- 💻 Code from Scratch: Implement algorithms before using libraries
- 📊 Visualize Everything: Use plots to understand behavior
- 📝 Read Comments: Code is heavily documented
- 🔁 Practice Daily: Consistency is key
- 🧪 Write Tests: Verify your implementations
flowchart LR
subgraph Daily["Daily (2-3 hours)"]
D1["📖 Theory<br/>30 min"]
D2["💻 Coding<br/>90 min"]
D3["📝 Review<br/>30 min"]
end
subgraph Weekly["Weekly"]
W1["📓 1-2 Notebooks"]
W2["🧪 Unit Tests"]
W3["📊 Mini Project"]
end
Daily --> Weekly
style Daily fill:#2c3e50,stroke:#3498db,color:#fff
style Weekly fill:#2c3e50,stroke:#2ecc71,color:#fff
| Resource | Link | Description |
|---|---|---|
| NumPy | numpy.org | Array computing |
| Pandas | pandas.pydata.org | Data analysis |
| scikit-learn | scikit-learn.org | Machine learning |
| PyTorch | pytorch.org | Deep learning |
| TensorFlow | tensorflow.org | Deep learning |
- 🎓 Kaggle Learn - Free micro-courses
- 📊 Papers With Code - Research implementations
- 🎥 3Blue1Brown - Visual math explanations
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Write tests for new code
- Submit a pull request
MIT License - Feel free to use for personal learning.
Made with ❤️ for Machine Learning Enthusiasts
⭐ Star this repo if you find it helpful!