A SQL project analyzing the data engineer job market using real world job posting data. It demonstrates my ability to write production-quality analytical SQL, design efficient queries, and turn business questions into data-driven insights.
- Project scope: Built 3 analytical queries that answer key questions about the data engineer job market
- Data modeling: Used multi-table joins across fact and dimension tables to extract insights
- Analytics: Applied aggregations, filtering, and sorting to find top skills by demand, salary, and overall value
- Outcomes: Delivered actionable insights on SQL/Python dominance, cloud trends, and salary patterns
Job market analysts need to answer questions like:
- Most in-demand: Which skills are most in-demand for data engineers?
- Highest paid: Which skills command the highest salaries?
- Best trade-off: What is the optimal skill set balancing demand and compensation?
This project analyzes a data warehouse built using a star schema design. The warehouse structure consists of:
- Fact Table:
job_postings_fact- Central table containing job posting details (job titles, locations, salaries, dates, etc.) - Dimension Tables:
company_dim- Company information linked to job postingsskills_dim- Skills catalog with skill names and types
- Bridge Table:
skills_job_dim- Resolves the many-to-many relationship between job postings and skills
By querying across these interconnected tables, I extracted insights about skill demand, salary patterns, and optimal skill combinations for data engineering roles.
- Query Engine: DuckDB
- Language: SQL
- Data Model: Star schema with fact + dimension + bridge tables
- Development: VS Code for SQL editing + Terminal for DuckDB CLI
- Version Control: Git/GitHub for versioned SQL scripts
- Top Demanded Skills – Identifies the 10 most in-demand skills for remote data engineer positions
- Top Paying Skills – Analyzes the 25 highest-paying skills with salary and demand metrics
- Optimal Skills – Calculates an optimal score using natural log of demand combined with median salary to identify the most valuable skills to learn
- Core languages: SQL and Python each appear in ~29,000 job postings, making them the most demanded skills
- Cloud platforms: AWS and Azure are critical for modern data engineering roles-
- Infra & tooling: Kubernetes, Docker, and Terraform are associated with premium salaries
- Big data tools: Apache Spark shows strong demand with competitive compensation

