The Aadhar Migration Prediction System is a machine learning-based tool that analyzes demographic update patterns from UIDAI Aadhar data and correlates them with current news trends to predict future migration cycles across Indian states.

    ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
    │   Data Loader   │     │  News Fetcher   │     │    Analyzer     │
    │  (data_loader)  │     │ (news_fetcher)  │     │   (analysis)    │
    └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
             │                       │                       │
             │ Aadhar CSV Data       │ Google News RSS       │
             │                       │                       │
             └───────────┬───────────┴───────────────────────┘
                         │
                         ▼
                  ┌──────────────┐
                  │   main.py    │
                  │(Orchestrator)│
                  └──────┬───────┘
                         │
                         ▼
                  ┌──────────────┐
                  │    Report    │
                  │ (.md output) │
                  └──────────────┘

- Reads all CSV files from the `Data/` folder
- Each CSV contains demographic update records for a state
- Extracts: date, state name, total updates count
- Combines all state data into a single DataFrame
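
The loading steps above can be sketched as follows. This is a minimal illustration, not the project's actual `data_loader.py`: the function name `load_state_csvs`, the column names `date` and `total_updates`, and the idea of taking the state name from the file name are all assumptions.

```python
from pathlib import Path

import pandas as pd


def load_state_csvs(data_dir):
    """Read every CSV in data_dir and stack the rows into one DataFrame."""
    frames = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        df = pd.read_csv(csv_path)
        # Assumption: the state name comes from the file name, e.g. "Delhi.csv".
        df["state"] = csv_path.stem
        # Assumption: the CSV has columns named "date" and "total_updates".
        df["date"] = pd.to_datetime(df["date"])
        frames.append(df[["date", "state", "total_updates"]])
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {data_dir}")
    # ignore_index gives the combined frame a clean 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling `load_state_csvs("Data")` would then return one row per record with the three extracted fields, ready for monthly aggregation.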
- For each detected state, fetches news from Google News RSS
- Search keywords: `jobs`, `migration`, `hiring`, `industrial growth`, `layoffs`
- Extracts: title, published date, region, query category
- Rate-limited to 1 request/second to respect server limits
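
A rough sketch of the fetch loop with the one-request-per-second limit is shown below. The helper names `build_feed_url` and `fetch_all`, the exact Google News RSS search URL, and the India locale parameters are assumptions for illustration; the real `news_fetcher.py` may differ. The `fetch` callable is injected so the rate limiting can be exercised without network access.

```python
import time
from urllib.parse import urlencode

KEYWORDS = ["jobs", "migration", "hiring", "industrial growth", "layoffs"]


def build_feed_url(state, keyword):
    """Build a Google News RSS search URL for one state/keyword pair."""
    # Assumption: public search endpoint with India locale parameters.
    params = {"q": f"{state} {keyword}", "hl": "en-IN", "gl": "IN", "ceid": "IN:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


def fetch_all(states, fetch, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each state/keyword pair, at most once per min_interval seconds."""
    results = []
    last_call = None
    for state in states:
        for keyword in KEYWORDS:
            if last_call is not None:
                # Wait out whatever remains of the interval since the last request.
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            results.append((state, keyword, fetch(build_feed_url(state, keyword))))
    return results
```

In a real run, `fetch` could be `feedparser.parse`, which accepts a feed URL and returns the parsed entries.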
- Aggregates Aadhar data by state and month
- Aggregates news counts by state, month, and category
- Merges both datasets on state + month
- Creates features: `{category}_news_count` for each news category
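
The aggregation and merge could look roughly like this in pandas. The function name `build_training_table`, the input column names (`date`, `published`, `category`), and the choice of a left join that fills missing news counts with zero are assumptions, not details confirmed by the project.

```python
import pandas as pd


def build_training_table(aadhar, news):
    """Merge monthly Aadhar totals with monthly per-category news counts."""
    # Assumption: aadhar has columns date, state, total_updates;
    # news has columns published, state, category (one row per article).
    aadhar = aadhar.assign(month=aadhar["date"].dt.to_period("M"))
    monthly_updates = aadhar.groupby(["state", "month"], as_index=False)["total_updates"].sum()

    news = news.assign(month=news["published"].dt.to_period("M"))
    # Count articles per state/month/category, then spread categories into columns.
    counts = news.groupby(["state", "month", "category"]).size().unstack("category", fill_value=0)
    counts.columns = [f"{c}_news_count" for c in counts.columns]
    counts = counts.reset_index()

    # Assumption: keep every Aadhar month, treating months without news as zero counts.
    merged = monthly_updates.merge(counts, on=["state", "month"], how="left")
    feature_cols = [c for c in merged.columns if c.endswith("_news_count")]
    merged[feature_cols] = merged[feature_cols].fillna(0).astype(int)
    return merged
```

The result has one row per state and month, with `total_updates` as the target and one `{category}_news_count` column per category.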
- Uses a custom NumPy-based Linear Regression implementation
- Target variable: `total_updates` (Aadhar demographic changes)
- Features: News counts per category
- Trains on 80% of data, tests on 20%
- Outputs: coefficients, RMSE, factor interpretations
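
A NumPy-only linear regression with an 80/20 split and RMSE can be sketched as below. The README does not say how the project's model is solved or how the split is drawn, so the least-squares solver (`np.linalg.lstsq`), the random shuffle, and the function names here are assumptions.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights[0], weights[1:]


def predict(X, intercept, coefs):
    """Apply the fitted linear model to a feature matrix."""
    return intercept + np.asarray(X, dtype=float) @ coefs


def train_test_rmse(X, y, train_fraction=0.8, seed=0):
    """Fit on a random train_fraction of the rows and report RMSE on the rest."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: rows are shuffled before splitting; a time-ordered split is another option.
    order = np.random.default_rng(seed).permutation(len(X))
    cut = int(len(X) * train_fraction)
    train, test = order[:cut], order[cut:]
    intercept, coefs = fit_linear_regression(X[train], y[train])
    residuals = y[test] - predict(X[test], intercept, coefs)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return intercept, coefs, rmse
```

Here `X` would hold the `{category}_news_count` columns and `y` the `total_updates` column of the merged table.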
- Aggregates current news counts per state
- Uses trained model to predict migration magnitude
- Converts predictions to probability percentages (relative share)
- Ranks states by migration probability
- Creates a detailed markdown report with:
  - Executive summary
  - Data overview
  - State-wise analysis
  - Model insights and reasoning
  - Predictions with interpretations
  - Recommendations
The core hypothesis is that news activity in a region serves as a leading indicator of migration patterns.
Positive economic news (hiring, industrial growth) signals opportunities that attract migrants, while negative news (layoffs) may trigger outward migration.
| News Category | Expected Effect on Migration |
|---|---|
| Hiring News | Positive - Job opportunities attract migrants |
| Industrial Growth | Positive - Economic development draws workers |
| Jobs News | Mixed - May indicate competition or opportunity |
| Layoffs News | Negative - Economic distress discourages migration |
| Migration News | Lagging - Reports on already-happening events |
The linear regression model learns weights for each news category:
- Positive coefficient = More news → More predicted migration
- Negative coefficient = More news → Less predicted migration
- Magnitude = Strength of the relationship
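
The reading rules above can be applied mechanically to a fitted model. The helper below is a hypothetical illustration (the name `interpret_coefficients` and the output wording are not taken from the project), and the coefficient values in the usage note are made up.

```python
def interpret_coefficients(coefs):
    """Turn {feature_name: coefficient} into readable lines, strongest effect first."""
    lines = []
    # Sort by absolute value so the strongest relationships come first.
    for name, value in sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True):
        if value > 0:
            direction = "more news -> more predicted migration"
        elif value < 0:
            direction = "more news -> less predicted migration"
        else:
            direction = "no linear effect"
        lines.append(f"{name}: {value:+.2f} ({direction})")
    return lines
```

For example, `interpret_coefficients({"hiring_news_count": 4.0, "layoffs_news_count": -9.5})` lists the layoffs feature first, because its magnitude is larger. Comparing magnitudes this way is only meaningful because all features are raw article counts on the same scale.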
State Probability = (State's Predicted Value / Total Predicted Value) × 100%
This gives a relative ranking showing which states are most likely to experience migration activity compared to others.
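
As a small worked example of the formula, the sketch below converts per-state predictions into percentage shares and ranks them. The function name and the clipping of negative predictions to zero are assumptions (a linear model can output negative values, which the formula alone does not handle), and the input numbers are illustrative only.

```python
def migration_probabilities(predictions):
    """Convert {state: predicted value} into {state: percentage share}, highest first."""
    # Assumption: negative predictions are clipped to zero so shares stay within 0-100.
    clipped = {state: max(value, 0.0) for state, value in predictions.items()}
    total = sum(clipped.values())
    if total == 0:
        return {state: 0.0 for state in clipped}
    shares = {state: 100.0 * value / total for state, value in clipped.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


# Illustrative values: 300 / (300 + 100 + 100) * 100 = 60
print(migration_probabilities({"State A": 300, "State B": 100, "State C": 100}))
# {'State A': 60.0, 'State B': 20.0, 'State C': 20.0}
```

Because the shares always sum to 100, a state's percentage can change when other states' predictions change, even if its own prediction does not.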

    UIDAI Hackathon/
    ├── Data/                    # Aadhar CSV files (one per state)
    │   ├── Andhra Pradesh.csv
    │   ├── Delhi.csv
    │   └── ...
    ├── main.py                  # Main orchestrator & report generator
    ├── data_loader.py           # Loads and parses Aadhar CSVs
    ├── news_fetcher.py          # Fetches news from Google RSS
    ├── analysis.py              # ML model and prediction logic
    ├── migration_report.md      # Generated output report
    └── README.md                # This documentation


    python main.py
    python main.py --test_run
    python main.py --data_dir "C:/path/to/csv/files"
    python main.py --output "custom_report.md"

- Live Data Variability: News is fetched in real-time; different runs may yield different results as news updates.
- Correlation ≠ Causation: The model finds statistical relationships, not causal links between news and migration.
- Data Quality: Predictions are only as good as the input data. Missing or incomplete Aadhar data affects accuracy.
- Linear Model: The simple linear regression may not capture complex non-linear relationships. More sophisticated models (Random Forest, XGBoost) could improve accuracy.
- News Relevance: Google News results may include tangentially related articles that add noise to predictions.

- Add data caching to ensure reproducible results
- Implement more advanced ML models (ensemble methods)
- Add sentiment analysis on news articles
- Include historical weather and economic indicators
- Create interactive visualization dashboard
- Python 3.x
- pandas
- numpy
- feedparser (for RSS parsing)
Install with:

    pip install pandas numpy feedparser