Data Mining

Project Structure

The repository is divided into three main sections:

  1. Classification - Predicting income levels using various machine learning models while ensuring fairness and high predictive accuracy.
  2. Clustering - Exploring clustering techniques on a dataset of 2500 news articles to categorize the articles based on their content.
  3. Pattern Mining - Analyzing patterns in a dataset concerning income levels to uncover socio-economic characteristics influencing male and female working conditions.

Each section includes a detailed PDF report, Python source code, and the datasets used for the analyses.

Reports

1. Classification

  • Objective: Evaluate and compare different machine learning models in terms of accuracy and fairness.
  • Key Models Used: Decision trees, KNN, random forest, and other ensemble methods.
  • Main Findings: Identification of the best model that balances fairness with predictive accuracy.
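
Comparing models on both accuracy and fairness can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the helper names, the toy labels, and the choice of demographic parity difference as the fairness measure are all assumptions, since this README does not say which fairness metric the report uses.

```python
# Hypothetical helpers: the repository's own evaluation code is not shown here.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    totals = {}
    positives = {}
    for p, g in zip(y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if p == 1 else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy data invented for illustration (1 = higher income predicted).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
predictions = {
    "model_a": [1, 0, 1, 0, 1, 0, 1, 1],
    "model_b": [0, 0, 1, 0, 1, 1, 1, 0],
}

# Report both numbers side by side so the trade-off is visible per model.
for name, y_pred in predictions.items():
    acc = accuracy(y_true, y_pred)
    gap = demographic_parity_difference(y_pred, groups)
    print(f"{name}: accuracy={acc:.3f}, parity gap={gap:.3f}")
```

A smaller parity gap means the model predicts the positive class at more similar rates across groups. In this toy data, model_a is both more accurate and closer to parity than model_b.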

View the Classification Report

2. Clustering

  • Objective: Apply clustering methods to categorize news articles into distinct groups.
  • Techniques Used: KMeans, DBSCAN, and various dimensionality reduction methods.
  • Main Findings: Effective categorization of articles into coherent groups that reflect their content.
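
The core of the k-means step can be sketched in plain Python. This is an illustrative sketch only: the four toy documents, the bag-of-words representation, and the fixed initial centroids are assumptions made so the example is small and reproducible. It omits the dimensionality reduction and DBSCAN steps and does not reproduce the repository's code.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Turn a document into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy documents invented for illustration.
docs = [
    "election vote parliament",
    "match goal team",
    "parliament vote minister",
    "team goal league",
]
vocabulary = sorted({w for d in docs for w in d.lower().split()})
points = [bag_of_words(d, vocabulary) for d in docs]

# Fixed starting centroids keep the run deterministic; real runs usually
# use random or k-means++ initialization.
labels, centroids = kmeans(points, initial_centroids=[points[0], points[1]])
print(labels)
```

With these inputs the two politics documents share one label and the two sports documents share the other.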

View the Clustering Report

3. Pattern Mining

  • Objective: Identify patterns that distinguish between different socio-economic groups.
  • Approach: Data preprocessing and analysis using pattern mining techniques.
  • Main Findings: Insights into the socio-economic characteristics that differentiate male and female working conditions.
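
This README does not name the pattern mining technique, so the sketch below assumes frequent itemset mining as one plausible example. The attribute values, the records, and the `frequent_itemsets` helper are hypothetical. The function enumerates candidates by brute force rather than using Apriori-style pruning, which is only practical for small inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = share of records containing every item in the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result


# Toy records invented for illustration; each record is a set of attribute=value items.
records = [
    {"sex=Female", "hours=part-time", "income<=50K"},
    {"sex=Female", "hours=full-time", "income<=50K"},
    {"sex=Male", "hours=full-time", "income>50K"},
    {"sex=Male", "hours=full-time", "income<=50K"},
]

patterns = frequent_itemsets(records, min_support=0.5)
for itemset, support in patterns.items():
    print(itemset, support)
```

Patterns that pair a group attribute with a working-condition attribute, such as `("hours=full-time", "sex=Male")` in this toy data, are the kind of co-occurrence a group comparison would then examine.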

View the Pattern Mining Report