Data Engineer SQL Roadmap

From schema design to production-grade data pipelines

0/93

Beginner

0/26 exercises

Build your SQL foundations

πŸ”
B1Not Started

SELECT Fundamentals

Filter, sort, and limit rows from a single table using WHERE, IN, BETWEEN, LIKE, ORDER BY, and LIMIT.

WHEREINBETWEENLIKE+2
πŸ“Š
B2Not Started

Aggregation & GROUP BY

Summarize data with aggregate functions and group results using GROUP BY and HAVING.

COUNTSUMAVGMIN+3
πŸ”—
B3Not Started

Basic Joins

Combine rows from two tables with INNER JOIN and LEFT JOIN, and apply WHERE and GROUP BY on joined data.

INNER JOINLEFT JOINJOIN + WHEREJOIN + GROUP BY
❓
B4Not Started

NULL Handling & CASE

Handle missing data with IS NULL and COALESCE, and add conditional logic with CASE WHEN expressions.

IS NULLCOALESCECASE WHEN

Intermediate

0/31 exercises

Compose complex queries

πŸ”€
I1Not Started

Multi-table Joins

Join three or more tables, use self-joins, and apply CROSS JOIN for combinations.

3+ table JOINSelf-joinCROSS JOIN
🧩
I2Not Started

Subqueries & CTEs

Break complex queries into named steps with CTEs and use scalar, row, and correlated subqueries.

SubqueryCTECorrelated subquery
πŸ—οΈ
DE-I1Not Started

DDL & Schema Design

Create tables with constraints, define primary/foreign keys, and design normalized schemas.

CREATE TABLEPRIMARY KEYFOREIGN KEYNormalization
πŸ“₯
DE-I2Not Started

DML & Data Loading

Insert, update, and delete data with INSERT INTO, UPDATE, DELETE, and COPY/bulk loading patterns.

INSERTUPDATEDELETECOPY+1
🎯
DE-I3Not Started

Data Types & Constraints

Choose appropriate data types, apply CHECK constraints, and handle type casting with CAST.

Data typesCHECKCASTNOT NULL+1

Advanced

0/36 exercises

Master analytical patterns

⭐
DE-A1Not Started

Star Schema Design

Design and query star schemas with fact tables (measures + FKs) joined to dimension tables (attributes).

Fact tablesDimension tablesStar schema queries
πŸ“†
DE-A2Not Started

SCD Type 2

Implement slowly changing dimensions with valid_from, valid_to, and is_current for historical tracking.

SCD Type 2valid_from/valid_tois_current
πŸ”„
DE-A3Not Started

MERGE & Upsert

Perform atomic insert-or-update operations for idempotent pipeline loading using MERGE and ON CONFLICT.

MERGEUPSERTON CONFLICTIdempotent loads
πŸ“ˆ
DE-A4Not Started

Incremental Loads

Implement watermark patterns (max updated_at) and partition overwrite (DELETE+INSERT) for incremental data loading.

Watermark patternPartition overwriteIncremental ETL
πŸ”ƒ
DE-A5Not Started

Recursive CTEs

Traverse hierarchies, generate series, and solve graph problems using recursive common table expressions.

Recursive CTEHierarchy traversalGraph queries
βœ…
DE-A6Not Started

Data Quality Checks

Write SQL-based data quality checks: NULL completeness, uniqueness, referential integrity, and freshness.

NULL checksUniquenessReferential integrityFreshness
⚑
DE-A7Not Started

Query Optimization

Analyze execution plans, optimize joins, and apply indexing strategies for large datasets.

EXPLAINJoin optimizationIndexingPartitioning
πŸ“¦
DE-A8Not Started

Nested Data & JSON

Query nested data structures using DuckDB's STRUCT, LIST, and JSON functions for semi-structured data.

STRUCTLISTJSONUNNEST