Data Engineer SQL Roadmap

From schema design to production-grade data pipelines

0/93

Beginner

0/26 exercises

Build your SQL foundations

🔍

B1Not Started

SELECT Fundamentals

Filter, sort, and limit rows from a single table using WHERE, IN, BETWEEN, LIKE, ORDER BY, and LIMIT.

WHEREINBETWEENLIKE+2

📊

B2Not Started

Aggregation & GROUP BY

Summarize data with aggregate functions and group results using GROUP BY and HAVING.

COUNTSUMAVGMIN+3

🔗

B3Not Started

Basic Joins

Combine rows from two tables with INNER JOIN and LEFT JOIN, and apply WHERE and GROUP BY on joined data.

INNER JOINLEFT JOINJOIN + WHEREJOIN + GROUP BY

❓

B4Not Started

NULL Handling & CASE

Handle missing data with IS NULL and COALESCE, and add conditional logic with CASE WHEN expressions.

IS NULLCOALESCECASE WHEN

Intermediate

0/31 exercises

Compose complex queries

🔀

I1Not Started

Multi-table Joins

Join three or more tables, use self-joins, and apply CROSS JOIN for combinations.

3+ table JOINSelf-joinCROSS JOIN

🧩

I2Not Started

Subqueries & CTEs

Break complex queries into named steps with CTEs and use scalar, row, and correlated subqueries.

SubqueryCTECorrelated subquery

🏗️

DE-I1Not Started

DDL & Schema Design

Create tables with constraints, define primary/foreign keys, and design normalized schemas.

CREATE TABLEPRIMARY KEYFOREIGN KEYNormalization

📥

DE-I2Not Started

DML & Data Loading

Insert, update, and delete data with INSERT INTO, UPDATE, DELETE, and COPY/bulk loading patterns.

INSERTUPDATEDELETECOPY+1

🎯

DE-I3Not Started

Data Types & Constraints

Choose appropriate data types, apply CHECK constraints, and handle type casting with CAST.

Data typesCHECKCASTNOT NULL+1

Advanced

0/36 exercises

Master analytical patterns

⭐

DE-A1Not Started

Star Schema Design

Design and query star schemas with fact tables (measures + FKs) joined to dimension tables (attributes).

Fact tablesDimension tablesStar schema queries

📆

DE-A2Not Started

SCD Type 2

Implement slowly changing dimensions with valid_from, valid_to, and is_current for historical tracking.

SCD Type 2valid_from/valid_tois_current

🔄

DE-A3Not Started

MERGE & Upsert

Perform atomic insert-or-update operations for idempotent pipeline loading using MERGE and ON CONFLICT.

MERGEUPSERTON CONFLICTIdempotent loads

📈

DE-A4Not Started

Incremental Loads

Implement watermark patterns (max updated_at) and partition overwrite (DELETE+INSERT) for incremental data loading.

Watermark patternPartition overwriteIncremental ETL

🔃

DE-A5Not Started

Recursive CTEs

Traverse hierarchies, generate series, and solve graph problems using recursive common table expressions.

Recursive CTEHierarchy traversalGraph queries

✅

DE-A6Not Started

Data Quality Checks

Write SQL-based data quality checks: NULL completeness, uniqueness, referential integrity, and freshness.

NULL checksUniquenessReferential integrityFreshness

⚡

DE-A7Not Started

Query Optimization

Analyze execution plans, optimize joins, and apply indexing strategies for large datasets.

EXPLAINJoin optimizationIndexingPartitioning

📦

DE-A8Not Started

Nested Data & JSON

Query nested data structures using DuckDB's STRUCT, LIST, and JSON functions for semi-structured data.

STRUCTLISTJSONUNNEST