The project explores recommendation modeling using:
- Algebraic SVD
- Surprise SVD
- Surprise NMF
- User similarity analysis in latent space
- Data poisoning attacks on recommender systems
- Data_CollabFilter.xlsx: input ratings dataset
- *.ipynb: Google Colab / Jupyter notebooks used for the analysis
- *.pdf: exported notebook/report files
- README.md: project overview and usage instructions
Performed algebraic SVD on the user-item matrix using 2 and 5 latent factors.
For each case:
- computed matrices P, Sigma, and Q
- reconstructed the matrix
- calculated RMSE
- identified the top 3 cells contributing most to RMSE
- compared the improvement from 2 to 5 factors
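A rank-k SVD reconstruction with RMSE and the worst-fitting cells can be sketched in NumPy as follows. It assumes the user-item matrix `R` is a dense array whose missing ratings have already been filled (for example with 0). The function name `truncated_svd_report` is hypothetical and is not taken from the notebooks.

```python
import numpy as np

def truncated_svd_report(R, k, n_worst=3):
    """Rank-k SVD of a dense user-item matrix R (missing entries assumed pre-filled)."""
    # Thin SVD: R = U @ diag(s) @ Vt, singular values sorted in descending order
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    P = U[:, :k]             # user factors (users x k)
    Sigma = np.diag(s[:k])   # top-k singular values (k x k)
    Q = Vt[:k, :].T          # item factors (items x k)
    R_hat = P @ Sigma @ Q.T  # rank-k reconstruction
    sq_err = (R - R_hat) ** 2
    rmse = float(np.sqrt(np.mean(sq_err)))
    # Cells with the largest squared error contribute most to the RMSE
    flat = np.argsort(sq_err.ravel())[::-1][:n_worst]
    worst = [tuple(int(i) for i in np.unravel_index(f, R.shape)) for f in flat]
    return P, Sigma, Q, R_hat, rmse, worst
```

Calling it with `k=2` and `k=5` on the same matrix gives the two cases. By the Eckart-Young theorem, the rank-5 reconstruction error can never exceed the rank-2 error.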
Used the Surprise package to train SVD recommender models with 2 and 5 latent factors.
For each case:
- extracted latent matrices P and Q
- generated the latent interaction matrix P × Qᵀ
- generated the full predicted rating matrix
- computed:
  - RMSE on known ratings
  - RMSE on all cells (after filling missing entries with 0)
- generated top 3 recommendations for each user
- compared recommendation differences between 2 and 5 factors
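After fitting, a Surprise `SVD` model exposes its learned parameters as `algo.pu`, `algo.qi`, `algo.bu` and `algo.bi`, and the global mean is available as `trainset.global_mean`. With the default `biased=True`, Surprise estimates a rating as μ + b_u + b_i + q_iᵀp_u, and `predict()` clips the result to the rating scale. The NumPy sketch below assembles the latent interaction matrix, the full prediction matrix and the top-3 unrated items from those arrays. It assumes that rows follow Surprise's inner ids (map them back with `trainset.to_raw_uid` / `trainset.to_raw_iid`) and that the rating scale is 1-5. Whether the notebook builds its full matrix in exactly this way is also an assumption.

```python
import numpy as np

def svd_prediction_matrix(mu, bu, bi, P, Q, rating_scale=(1, 5)):
    """Surprise-style biased SVD predictions for every (user, item) cell.

    P: users x k factors, Q: items x k factors, bu/bi: biases, mu: global mean.
    """
    interaction = P @ Q.T                                # latent interaction matrix P x Q^T
    full = mu + bu[:, None] + bi[None, :] + interaction  # mu + b_u + b_i + q_i . p_u
    low, high = rating_scale
    return interaction, np.clip(full, low, high)         # clip to the rating scale

def top_n_unrated(pred, known_mask, n=3):
    """Indices of the n highest-predicted items that each user has not rated yet."""
    scores = np.where(known_mask, -np.inf, pred)         # hide already-rated items
    return [[int(j) for j in np.argsort(-row, kind="stable")[:n]] for row in scores]
```

Running both functions once with the 2-factor arrays and once with the 5-factor arrays, then comparing the two top-3 lists user by user, shows where the recommendations diverge.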
Used the Surprise package’s NMF model with 2 and 5 latent factors.
For each case:
- extracted matrices W and H
- generated the latent interaction matrix W × Hᵀ
- generated the full predicted rating matrix
- computed:
  - RMSE on known ratings
  - RMSE on all cells
- generated top 3 recommendations for each user
- compared NMF against SVD
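Surprise's `NMF` stores its non-negative factors in `algo.pu` (W) and `algo.qi` (H). With the default `biased=False`, each estimate is simply the dot product of a user row and an item row, so W × Hᵀ is already the unclipped prediction matrix. The sketch below shows that step and the two RMSE variants. It assumes that "all cells" uses the same zero-filling as the SVD task and that missing ratings are marked with `NaN` in `R`. The helper names are hypothetical.

```python
import numpy as np

def nmf_prediction_matrix(W, H, rating_scale=(1, 5)):
    """Unbiased NMF: the prediction for (u, i) is the dot product W[u] . H[i]."""
    interaction = W @ H.T
    low, high = rating_scale
    return interaction, np.clip(interaction, low, high)

def rmse_known(R, known_mask, pred):
    """RMSE over observed ratings only."""
    diff = (R - pred)[known_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def rmse_all_cells(R, known_mask, pred):
    """RMSE over every cell, with missing ratings filled with 0."""
    filled = np.where(known_mask, R, 0.0)
    return float(np.sqrt(np.mean((filled - pred) ** 2)))
```

The all-cells variant penalizes every non-zero prediction on an unrated cell. It therefore measures how closely the model reproduces the zero-filled matrix rather than how well it predicts unseen ratings.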
Using the 2-factor Surprise SVD result:
- found three users whose top-3 recommendations overlap the most
- extracted their latent user vectors
- computed:
  - Euclidean distance
  - Cosine similarity
- analyzed which similarity measure better reflects their recommendation overlap
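The following sketch shows one way to pick the most-overlapping trio and compare their latent vectors. It assumes `top_n` maps each user's row index in the factor matrix `P` (for example `algo.pu`) to the set of that user's top-3 items. It also scores "overlap" as the sum of pairwise intersections, which is one possible definition among several. Both function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def most_overlapping_triple(top_n):
    """Pick the 3 users whose top-N sets share the most items (sum of pairwise overlaps)."""
    def score(trio):
        return sum(len(top_n[a] & top_n[b]) for a, b in combinations(trio, 2))
    return max(combinations(sorted(top_n), 3), key=score)

def latent_similarity(P, users):
    """Euclidean distance and cosine similarity for every pair of latent user vectors."""
    result = {}
    for a, b in combinations(users, 2):
        pa, pb = P[a], P[b]
        euclidean = float(np.linalg.norm(pa - pb))
        cosine = float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))
        result[(a, b)] = (euclidean, cosine)
    return result
```

The two measures can disagree. Vectors `[1, 0]` and `[2, 0]` have cosine similarity 1 but Euclidean distance 1, because cosine compares only direction while Euclidean distance also reflects differences in vector length.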
Simulated poisoning of the recommendation system by adding fake users designed to push:
item7 → item8, item9, item10
Two cases were tested:
- adding one fictitious user
- adding three fictitious users
Then:
- retrained the Surprise SVD model
- measured how top recommendations changed
- analyzed how effective the poisoning attack was
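The following sketch generates poisoning profiles and measures their effect. It assumes ratings are kept as `(user, item, rating)` rows that can be appended to the data before `Dataset.load_from_df` and retraining. It also assumes each fake user gives the maximum rating (5 on an assumed 1-5 scale) to item7 and all three target items. This profile design, the `fake_user` prefix and both helper names are hypothetical rather than the exact attack used in the notebook.

```python
def make_fake_users(n_fake, source_item, target_items, rating=5, prefix="fake_user"):
    """Hypothetical poisoning profiles: each fake user rates the source and every target item."""
    rows = []
    for k in range(1, n_fake + 1):
        uid = f"{prefix}_{k}"
        for item in (source_item, *target_items):
            rows.append((uid, item, rating))
    return rows

def target_exposure(top_n, target_items, prefix="fake_user"):
    """For each target item, count the real users whose top-N list contains it."""
    real = {u: recs for u, recs in top_n.items() if not str(u).startswith(prefix)}
    return {t: sum(t in recs for recs in real.values()) for t in target_items}
```

Comparing `target_exposure` on the top-3 lists from before and after retraining, for one and three fake users, gives a simple count of how far the attack moved item8, item9 and item10 into real users' recommendations.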
- Increasing latent factors improved algebraic SVD reconstruction quality.
- For Surprise SVD, increasing factors from 2 to 5 produced only a small improvement.
- For Surprise NMF, increasing factors from 2 to 5 produced a much larger improvement.
- NMF outperformed SVD in RMSE for this dataset.
- Recommendation overlap can be analyzed using latent user vectors.
- Recommender systems based on matrix factorization are vulnerable to data poisoning, although the attack in this assignment only partially achieved the intended effect.
This project was developed in Google Colab using Python.
Suggested packages:
pip install pandas numpy scikit-learn openpyxl
pip install "numpy<2" scikit-surprise

Note: scikit-surprise may require NumPy < 2 in Colab/runtime environments.
- Upload the dataset file Data_CollabFilter.xlsx
- Open the notebook in Google Colab
- Run the cells in order:
  - data loading and preprocessing
  - Task 1: Algebraic SVD
  - Task 2: Surprise SVD
  - Task 3: Surprise NMF
  - Task 4: Similarity analysis
  - Task 5: Poisoning experiment
This project demonstrates:
- matrix factorization for recommender systems
- reconstruction and prediction error analysis
- recommendation generation from latent factor models
- comparison of SVD and NMF methods
- latent-space user similarity analysis
- recommender system robustness and poisoning attacks
Mansurbek Satarov