
Data Mining

Project Structure

The repository is divided into three main sections:

Classification - Predicting income levels using various machine learning models while ensuring fairness and high predictive accuracy.
Clustering - Exploring clustering techniques on a dataset of 2500 news articles to categorize the articles based on their content.
Pattern Mining - Analyzing patterns in a dataset concerning income levels to uncover socio-economic characteristics influencing male and female working conditions.

Each section includes a detailed report as a PDF, source code in Python, and datasets used for the analyses.

Reports

1. Classification

Objective: Evaluate and compare different machine learning models in terms of accuracy and fairness.
Key Models Used: Decision trees, KNN, random forest, and ensemble methods.
Main Findings: Identification of the best model that balances fairness with predictive accuracy.
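
As a rough illustration of what comparing models on accuracy and fairness can look like, the sketch below scores two hypothetical sets of predictions. The fairness measure used here (demographic parity difference), the helper names, the model names, and the toy labels are all assumptions made for this example; the report may rely on a different metric.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy data, made up for illustration: 1 = high income, 0 = low income.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

# Hypothetical predictions from two models.
predictions = {
    "model_a": [1, 0, 0, 0, 1, 1, 0, 1],
    "model_b": [1, 0, 1, 0, 1, 0, 1, 1],
}

for name, y_pred in predictions.items():
    acc = accuracy(y_true, y_pred)
    gap = demographic_parity_difference(y_pred, groups)
    print(f"{name}: accuracy={acc:.3f}, parity gap={gap:.3f}")
```

A lower parity gap means the model assigns the positive class at more similar rates across groups, so a model with higher accuracy and a smaller gap would be preferred under this particular criterion.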

View the Classification Report

2. Clustering

Objective: Apply clustering methods to categorize news articles into distinct groups.
Techniques Used: KMeans, DBSCAN, and various dimensionality reduction methods.
Main Findings: Effective categorization of articles into coherent groups that reflect their content.
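
To give a feel for the KMeans part of this workflow, here is a minimal, dependency-free sketch that turns a few short texts into term-count vectors and groups them with Lloyd's algorithm. The four sample sentences, the function names, and the choice to seed centroids with the first k documents are assumptions for illustration only; the project code may use a library implementation, and this sketch omits DBSCAN and the dimensionality reduction step.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sample texts standing in for news articles.
docs = [
    "stock market shares rise",
    "team wins football match",
    "market shares fall stock",
    "football team loses match",
]

labels = kmeans(bag_of_words(docs), k=2)
print(labels)
```

Seeding with the first k documents keeps the example deterministic; on real data a randomized or k-means++ style initialization is the more common choice.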

View the Clustering Report

3. Pattern Mining

Objective: Identify patterns that distinguish between different socio-economic groups.
Approach: Data preprocessing and analysis using pattern mining techniques.
Main Findings: Insights into the socio-economic characteristics that differentiate male and female working conditions.
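
One simple way to look for patterns that separate two groups is to mine frequent itemsets in each group and compare them. The sketch below does this by brute-force enumeration of small itemsets, which is not the pruned Apriori algorithm and not necessarily the technique used in the report. The attribute values, the function name, and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    whose support is at least min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Made-up records; each set holds the attribute values of one person.
female = [
    {"part-time", "service"},
    {"part-time", "service"},
    {"full-time", "clerical"},
    {"part-time", "clerical"},
]
male = [
    {"full-time", "manual"},
    {"full-time", "manual"},
    {"full-time", "service"},
    {"part-time", "manual"},
]

female_patterns = frequent_itemsets(female, min_support=0.5)
male_patterns = frequent_itemsets(male, min_support=0.5)

# Patterns frequent in one group but not in the other.
only_female = sorted(set(female_patterns) - set(male_patterns))
only_male = sorted(set(male_patterns) - set(female_patterns))
print("female only:", only_female)
print("male only:", only_male)
```

Enumerating every combination is only practical for small itemset sizes; a larger dataset would call for a pruning strategy such as Apriori or FP-Growth.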

View the Pattern Mining Report