
Ivan Shamaev

CV · Website · DataTalks · Telegram · LinkedIn

15+ years in IT · Data Engineering · DWH · AI Agent Engineering · BI · Moscow


I design and build data platforms, warehouse pipelines, and AI-powered tooling that turn raw data into trusted decisions. My path started in enterprise BI consulting — Oracle Hyperion, QlikView, financial planning systems — then evolved through modern data stacks into DWH architecture and, most recently, into production AI agents that understand warehouse metadata and generate SQL from natural language.

Data Engineering

Batch pipelines, multi-layer DWH modeling, ETL/ELT orchestration with Airflow and dbt, query optimization, data quality, and reliable data products for finance, product, and operations.
AI Agent Engineering

RAG-based agents for DWH: metadata indexing, table relationship graphs, natural language to SQL, LLM integration with Claude and open-source models, vector DB pipelines.
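The retrieval step of such an agent (find the tables relevant to a question, then pass their metadata to the LLM) can be illustrated with a toy sketch. It assumes bag-of-words cosine similarity in place of the embedding model and vector DB a production agent would use; the table names, descriptions, and helper functions (`retrieve_tables`, `build_prompt`) are hypothetical, and the actual LLM call is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical table metadata; a real agent would load this from the DWH catalog.
TABLE_DOCS = {
    "dm.orders_daily": "daily orders count and revenue by region and date",
    "dds.customer": "customer attributes: registration date, city, segment",
    "dm.returns": "returned items, refund amount and reason by date",
}


def tokenize(text: str) -> Counter:
    """Lower-case bag of words; a simple stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tables(question: str, k: int = 2) -> list:
    """Rank tables by similarity between the question and 'name + description'."""
    q = tokenize(question)
    ranked = sorted(
        TABLE_DOCS,
        key=lambda name: cosine(q, tokenize(f"{name} {TABLE_DOCS[name]}")),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Put the retrieved metadata into the context an LLM would receive."""
    context = "\n".join(f"- {t}: {TABLE_DOCS[t]}" for t in retrieve_tables(question))
    return f"Tables:\n{context}\n\nWrite a SQL query for: {question}"
```

For example, `retrieve_tables("revenue by region last week", k=1)` picks `dm.orders_daily`, since it shares the most terms with the question. Swapping `tokenize`/`cosine` for real embeddings and a vector index keeps the same overall shape.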
Analytics Enablement

Bridging platform and business: data marts, BI tooling (Superset, DataLens, Qlik, Grafana), observability dashboards, C-level reporting, and cross-functional collaboration.

Current Focus

At Ozon Tech (Nov 2024 – present) as Senior DWH Developer:

  • Designing batch pipelines on Vertica and Trino; building DDS and data mart layers with attention to data quality and query performance
  • Building a RAG-based AI agent for DWH — metadata-aware, capable of SQL auto-generation from natural language
  • ETL monitoring dashboards in Grafana covering 100% of DWH domain failure points
  • Migrating workloads from Vertica to Trino; mentoring colleagues on AI tooling

sources → ingestion → orchestration → DWH / lakehouse → marts → BI / AI agents

Work Experience

Period · Company · Role
Nov 2024 – present · Ozon Tech · Senior DWH Developer
Dec 2023 – Nov 2024 · eapteka.ru · TeamLead BI & Acting DWH TeamLead
Aug 2020 – Nov 2023 · TheSoul Publishing (Cyprus) · Senior BI Developer
May 2017 – Jul 2020 · Luding · Leading Business Analyst – QlikView
Nov 2014 – May 2017 · Dixy · Oracle Hyperion & QlikView Consultant
Sep 2013 – Nov 2014 · АльфаСтрахование · Systems Analyst – Hyperion Planning
Oct 2011 – Aug 2013 · Glowbyte Consulting · Financial Solutions Consultant
Key achievements by role

Ozon Tech — Developed a RAG-based AI agent integrating DWH metadata and automatic SQL generation; built Grafana ETL monitoring with full domain coverage; optimized workloads and migrated them from Vertica to Trino.

еАптека — Led BigQuery → Yandex Cloud migration, rebuilt pipelines with ClickHouse + Airflow + dbt; C-level dashboards for sales and product analytics; hired and onboarded new team members.

TheSoul Publishing — Deployed Apache Superset as a Qlik replacement (significant licence cost reduction); built plan-vs-fact automation saving 90% of reporting time; 3-layer Facebook API ETL pipeline; custom Superset plugins in React/TypeScript.

Luding — Sales funnel analysis, multi-API PHP extractors, C# Windows Service for NPrinting; awarded Best Employee Q4 2019.


Toolbox

Data Engineering & Orchestration

Python SQL Airflow dbt PySpark Kafka

Databases & DWH

Vertica Trino ClickHouse Iceberg HDFS PostgreSQL MSSQL GreenPlum

AI & LLM

RAG AI Agents VectorDB Claude Prompt Engineering

BI & Visualization

Superset Grafana DataLens Qlik Metabase

Infrastructure

Docker GitLab CI/CD Yandex Cloud S3 Linux nginx

DWH Modeling

Data Vault 2.0 · Anchor Modeling · Kimball


Key Projects

AI Agent for DWH (RAG)
Ozon Tech, 2025
RAG system integrating DWH metadata, table relationships, and automatic SQL generation from natural language. Stack: Python, LLM (Claude), VectorDB, Trino.
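One way a table relationship graph can support SQL generation is to find the shortest chain of joins between the tables a question touches. The sketch below uses breadth-first search over a hypothetical graph of join conditions; the table names, keys, and functions (`join_path`, `join_clause`) are illustrative assumptions, not the agent's actual implementation.

```python
from collections import deque

# Hypothetical relationship graph: each edge maps a table pair to its join condition.
JOINS = {
    ("dm.orders", "dds.customer"): "dm.orders.customer_id = dds.customer.customer_id",
    ("dds.customer", "dds.city"): "dds.customer.city_id = dds.city.city_id",
    ("dm.orders", "dds.product"): "dm.orders.product_id = dds.product.product_id",
}


def build_adjacency(graph):
    """Turn the edge dictionary into an undirected adjacency list."""
    adj = {}
    for (a, b), cond in graph.items():
        adj.setdefault(a, []).append((b, cond))
        adj.setdefault(b, []).append((a, cond))
    return adj


def join_path(start, goal, graph=JOINS):
    """BFS: return [(table, join_condition), ...] on a shortest path, or None."""
    adj = build_adjacency(graph)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, cond)]))
    return None


def join_clause(start, goal):
    """Render the path as FROM/JOIN lines that an LLM or template can complete."""
    steps = join_path(start, goal)
    if steps is None:
        raise ValueError(f"no join path from {start} to {goal}")
    return "\n".join([f"FROM {start}"] + [f"JOIN {t} ON {c}" for t, c in steps])
```

For instance, `join_clause("dm.orders", "dds.city")` routes through `dds.customer`. BFS returns the path with the fewest joins, which is a reasonable default when edges carry no cost.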
ETL Monitoring Dashboard
Ozon Tech, 2025
Grafana dashboards providing 100% coverage of DWH domain failure points and pipeline status across all batch workflows.
BigQuery → Yandex Cloud Migration
еАптека, 2024
End-to-end cloud migration with pipeline rebuild on ClickHouse + Airflow + dbt. Included team training and DataLens performance fixes.
Plan-vs-Fact Automation
TheSoul Publishing, 2021
Reporting automation that reduced preparation time by 90%. Integrated with multiple data sources.
Apache Superset Implementation
TheSoul Publishing, 2021
Selected and deployed Superset as a Qlik replacement. Built custom React/TypeScript plugins, GitLab CI/CD automation, and team onboarding.
Facebook API ETL Pipeline
TheSoul Publishing, 2020
3-layer data lake architecture (raw → staging → marts) with daily updates for social media analytics.
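The raw → staging → marts layering can be sketched in a few functions: raw keeps API responses untouched, staging parses, types, and deduplicates them, and marts aggregate for BI. The payload fields (`date`, `page_id`, `reach`) and the deduplication rule below are hypothetical assumptions for illustration only, not the shape of the actual Facebook API data.

```python
import json
from collections import defaultdict
from datetime import date

# Raw layer: API responses stored as-is (hypothetical payload shape).
RAW = [
    '{"date": "2020-09-01", "page_id": "p1", "reach": "1200"}',
    '{"date": "2020-09-01", "page_id": "p2", "reach": "800"}',
    '{"date": "2020-09-01", "page_id": "p1", "reach": "1200"}',  # duplicate delivery
    '{"date": "2020-09-02", "page_id": "p1", "reach": "1500"}',
]


def to_staging(raw_rows):
    """Staging layer: parse JSON, cast types, drop exact duplicates."""
    seen = set()
    staged = []
    for line in raw_rows:
        rec = json.loads(line)
        row = (date.fromisoformat(rec["date"]), rec["page_id"], int(rec["reach"]))
        if row not in seen:
            seen.add(row)
            staged.append(row)
    return staged


def to_mart(staged):
    """Mart layer: daily totals ready for a BI dashboard."""
    totals = defaultdict(int)
    for day, _page_id, reach in staged:
        totals[day] += reach
    return dict(sorted(totals.items()))
```

Keeping raw data unmodified means staging and marts can be rebuilt on each daily run if parsing rules change.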

Education & Certifications

Bauman Moscow State Technical University (2006–2012)
Engineer — Automation of Technological Processes and Production

Certifications & Courses (2018–2025)

2025 (in progress)

  • ClickHouse for Analysts — Stepik
  • Python Generation: OOP — Stepik

2024

  • Apache Airflow for Analysts — Stepik
  • dbt Fundamentals — getdbt.com
  • A/B Testing — Stepik
  • Statistics Fundamentals — Stepik
  • PySpark — Stepik
  • Python for Professionals — Stepik
  • Data Engineer Professional Course — Stepik

2023

  • SQL for Data Analysis — Stepik
  • SQL Window Functions — Stepik
  • Python Programming & Advanced Python — Stepik
  • Data Science — Stepik

2018

  • qRUG Conference Speaker on Qlik — ATK Consulting

Writing and Community

  • Personal blog with practical articles on data engineering, DWH, and analytics: ivan-shamaev.ru
  • Co-author at datatalks.ru — data engineering and analytics community
  • Telegram channel @data_engineer_path — DWH, SQL, Airflow, open-source tooling, notes from production
  • Speaker at qRUG 2018 conference on Qlik

Pinned

  1. trino-iceberg-minio

    Test project for Trino + Iceberg + REST Catalog + MinIO S3

  2. clickhouse-docker-compose

    This GitHub repository offers a Docker Compose configuration for easy deployment and management of ClickHouse clusters.

  3. data-vault-2.0-Northwind

    Data Vault 2.0 example for PostgreSQL

    PLpgSQL

  4. python-algorithms-data-engineer

    Python Algorithms for Data Engineers. This repository contains a collection of Jupyter notebooks with solutions to various algorithmic problems. The notebooks are collected from…

    Jupyter Notebook