Advanced SQL Window Functions for Data Science
(Phase 1.5 – Week 3 Deep Dive | 1 Week | 6–8 hrs/day)
Goal: Master window functions, an essential SQL skill for data scientists and analysts.
A staple of hard SQL interview questions
Enables rankings, running totals, cohorts, funnels, and time-series analysis
Replaces complex self-joins and subqueries
Why Window Functions?
| Without Window | With Window |
|---|---|
| 3+ subqueries + CTEs | 1 clean query |
| Hard to read | Elegant & fast |
| Slow performance | Optimized by DB engine |
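To make the contrast concrete, here is a minimal sketch (assuming the Chinook Invoice table in SQLite): the latest invoice per customer, written first with a correlated subquery and then with a window function.
-- Without window functions: correlated subquery that re-scans Invoice per customer
SELECT i.CustomerId, i.InvoiceId, i.InvoiceDate, i.Total
FROM Invoice i
WHERE i.InvoiceDate = (
    SELECT MAX(i2.InvoiceDate)
    FROM Invoice i2
    WHERE i2.CustomerId = i.CustomerId
);

-- With a window function: one pass, one ranking column
SELECT CustomerId, InvoiceId, InvoiceDate, Total
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY InvoiceDate DESC) AS rn
    FROM Invoice
) ranked
WHERE rn = 1;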
Core Syntax
function() OVER (
[PARTITION BY col1, col2]
[ORDER BY col3]
[ROWS/RANGE frame_spec]
)
| Clause | Purpose |
|---|---|
| PARTITION BY | Groups rows (like GROUP BY, but without collapsing them) |
| ORDER BY | Defines the sequence within each partition |
| ROWS BETWEEN | Defines a sliding frame (e.g., the last 3 rows) |
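A minimal sketch combining all three clauses on Chinook's Invoice table (SQLite syntax; the alias names are illustrative):
SELECT
    CustomerId,
    InvoiceDate,
    Total,
    AVG(Total) OVER (
        PARTITION BY CustomerId                    -- restart the window per customer
        ORDER BY InvoiceDate                       -- walk invoices chronologically
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- frame: this row and the two before it
    ) AS moving_avg_3
FROM Invoice;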
1. Ranking Functions
| Function | Use | Gaps? | Same Rank for Ties? |
|---|---|---|---|
| ROW_NUMBER() | Unique sequential ID | No | No — ties get different numbers |
| RANK() | Competition ranking | Yes | Yes — ties share a rank, then a gap |
| DENSE_RANK() | Ranking with no gaps | No | Yes — ties share a rank, no gap |
| NTILE(n) | Quartiles, deciles | — | — |
Example: Rank Tracks by Length per Genre
SELECT
t.Name,
g.Name AS genre,
t.Milliseconds,
ROW_NUMBER() OVER (PARTITION BY g.Name ORDER BY t.Milliseconds DESC) AS row_num,
RANK() OVER (PARTITION BY g.Name ORDER BY t.Milliseconds DESC) AS rank_gap,
DENSE_RANK() OVER (PARTITION BY g.Name ORDER BY t.Milliseconds DESC) AS dense_rank
FROM Track t
JOIN Genre g ON t.GenreId = g.GenreId
WHERE g.Name = 'Rock'
ORDER BY t.Milliseconds DESC
LIMIT 10;
| Name | Milliseconds | row_num | rank_gap | dense_rank |
|---|---|---|---|---|
| Occupation | 5286953 | 1 | 1 | 1 |
| Dazed and Confused | 4193280 | 2 | 2 | 2 |
| Dazed and Confused | 4193280 | 3 | 2 | 2 |
2. Aggregate Window Functions
Running Total (Cumulative Sum)
-- Monthly revenue with running total
SELECT
strftime('%Y-%m', InvoiceDate) AS month,
SUM(Total) AS monthly_revenue,
SUM(SUM(Total)) OVER (ORDER BY strftime('%Y-%m', InvoiceDate)) AS running_total
FROM Invoice
GROUP BY month
ORDER BY month;
Moving Average (Last 3 Months)
SELECT
month,
monthly_revenue,
AVG(monthly_revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM (
SELECT strftime('%Y-%m', InvoiceDate) AS month, SUM(Total) AS monthly_revenue
FROM Invoice
GROUP BY month
) sub;
3. Lead / Lag – Time Series & Funnels
-- Next purchase date for each customer
SELECT
CustomerId,
InvoiceDate,
LEAD(InvoiceDate) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate) AS next_purchase,
JULIANDAY(LEAD(InvoiceDate) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate))
- JULIANDAY(InvoiceDate) AS days_to_next
FROM Invoice
ORDER BY CustomerId, InvoiceDate;
Use Case:
Customer retention, churn prediction, session analysis
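For example, a hedged sketch of a retention-style flag using LAG (the 60-day threshold is an arbitrary choice for illustration):
-- Flag purchases that arrive more than 60 days after the previous one
SELECT
    CustomerId,
    InvoiceDate,
    JULIANDAY(InvoiceDate)
      - JULIANDAY(LAG(InvoiceDate) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate)) AS days_since_prev,
    CASE
        WHEN JULIANDAY(InvoiceDate)
             - JULIANDAY(LAG(InvoiceDate) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate)) > 60
        THEN 1 ELSE 0
    END AS long_gap_flag
FROM Invoice
ORDER BY CustomerId, InvoiceDate;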
4. First / Last Value
-- First and last purchase per customer
SELECT DISTINCT
CustomerId,
FIRST_VALUE(InvoiceDate) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate) AS first_purchase,
LAST_VALUE(InvoiceDate) OVER (
PARTITION BY CustomerId
ORDER BY InvoiceDate
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_purchase
FROM Invoice;
Warning:
LAST_VALUE needs the full frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). With the default frame, the window ends at the current row, so you just get the current row's value back.
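A minimal sketch of the pitfall; the column aliases are illustrative. The first LAST_VALUE uses the default frame and therefore only sees rows up to the current one:
SELECT
    CustomerId,
    InvoiceDate,
    LAST_VALUE(InvoiceDate) OVER (
        PARTITION BY CustomerId ORDER BY InvoiceDate
    ) AS last_purchase_broken,      -- default frame ends at the current row
    LAST_VALUE(InvoiceDate) OVER (
        PARTITION BY CustomerId ORDER BY InvoiceDate
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_purchase_correct      -- full frame sees the whole partition
FROM Invoice;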
5. Frame Specification (Advanced)
| Frame | Meaning |
|---|---|
| UNBOUNDED PRECEDING | From the start of the partition |
| CURRENT ROW | Up to and including this row |
| UNBOUNDED FOLLOWING | To the end of the partition |
| n PRECEDING | The n rows before the current row |
-- 3-month trailing average
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
Real-World Business Problems (Solved with Window Functions)
1. Customer Lifetime Value (CLV) with Cohorts
WITH first_purchase AS (
SELECT
CustomerId,
MIN(InvoiceDate) AS cohort_date
FROM Invoice
GROUP BY CustomerId
),
monthly_spend AS (
SELECT
i.CustomerId,
strftime('%Y-%m', i.InvoiceDate) AS invoice_month,
strftime('%Y-%m', fp.cohort_date) AS cohort_month,
SUM(i.Total) AS spend
FROM Invoice i
JOIN first_purchase fp ON i.CustomerId = fp.CustomerId
GROUP BY i.CustomerId, invoice_month, cohort_month
)
SELECT
cohort_month,
invoice_month,
COUNT(DISTINCT CustomerId) AS customers,
SUM(spend) AS revenue,
ROUND(AVG(spend), 2) AS arpu
FROM monthly_spend
GROUP BY cohort_month, invoice_month
ORDER BY cohort_month, invoice_month;
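A common follow-up (see the assessment checklist) is a month index per cohort: month 0, 1, 2, … since first purchase. A self-contained sketch, assuming SQLite date functions:
-- Active customers per cohort and month index (month 0 = cohort month)
WITH first_purchase AS (
    SELECT CustomerId, MIN(InvoiceDate) AS cohort_date
    FROM Invoice
    GROUP BY CustomerId
)
SELECT
    strftime('%Y-%m', fp.cohort_date) AS cohort_month,
    (CAST(strftime('%Y', i.InvoiceDate) AS INTEGER) - CAST(strftime('%Y', fp.cohort_date) AS INTEGER)) * 12
      + (CAST(strftime('%m', i.InvoiceDate) AS INTEGER) - CAST(strftime('%m', fp.cohort_date) AS INTEGER)) AS month_index,
    COUNT(DISTINCT i.CustomerId) AS active_customers
FROM Invoice i
JOIN first_purchase fp ON i.CustomerId = fp.CustomerId
GROUP BY cohort_month, month_index
ORDER BY cohort_month, month_index;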
2. Sessionization (Web Analytics)
-- Assign session ID when gap > 30 min
WITH flagged AS (
SELECT
user_id,
event_time,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time,
CASE
WHEN JULIANDAY(event_time) - JULIANDAY(LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)) > 30.0 / 1440  -- 30 minutes, expressed in days
THEN 1 ELSE 0
END AS new_session
FROM events
),
sessioned AS (
SELECT *,
SUM(new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM flagged
)
SELECT user_id, session_id, MIN(event_time) AS session_start, MAX(event_time) AS session_end
FROM sessioned
GROUP BY user_id, session_id;
3. Top N per Group (Interview Classic)
-- Top 3 longest tracks per genre
WITH ranked AS (
SELECT
t.*,
g.Name AS genre,
ROW_NUMBER() OVER (PARTITION BY g.Name ORDER BY Milliseconds DESC) AS rn
FROM Track t
JOIN Genre g ON t.GenreId = g.GenreId
)
SELECT TrackId, Name, genre, Milliseconds
FROM ranked
WHERE rn <= 3
ORDER BY genre, rn;
Interview Questions (Solve These!)
| # | Question | Hint |
|---|---|---|
| 1 | Find 2nd highest salary per department | DENSE_RANK() |
| 2 | Running total of sales | SUM() OVER (ORDER BY) |
| 3 | Customers with 3+ consecutive months of purchase | LAG() + flag |
| 4 | % of total revenue per invoice | SUM() OVER() |
| 5 | Identify churned users (no activity in 30 days) | LEAD() |
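For instance, question 4 hinges on an empty OVER (), which spans the whole result set. A minimal sketch on Chinook:
-- Each invoice's share of total revenue
SELECT
    InvoiceId,
    Total,
    ROUND(100.0 * Total / SUM(Total) OVER (), 2) AS pct_of_total_revenue
FROM Invoice
ORDER BY pct_of_total_revenue DESC;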
Practice Plan (7 Days)
| Day | Task |
|---|---|
| 1 | ROW_NUMBER, RANK, DENSE_RANK → 20 queries |
| 2 | SUM, AVG over windows → running totals |
| 3 | LEAD/LAG → time gaps, retention |
| 4 | FIRST_VALUE/LAST_VALUE + frame |
| 5 | Top N per group → solve 10 variations |
| 6 | Cohort analysis on Chinook |
| 7 | Mock interview → explain 3 queries aloud |
Resources
| Resource | Link |
|---|---|
| Mode Analytics – Window Functions | mode.com/sql-tutorial/sql-window-functions |
| PostgreSQL Docs | postgresql.org/docs/current/tutorial-window.html |
| LeetCode: Window Problems | Search "window" in SQL |
| DB Fiddle | db-fiddle.com |
Final Project: Advanced Analytics Dashboard (SQL Only)
Repo: yourname/sql-window-mastery
-- 01_running_revenue.sql
-- 02_top3_per_genre.sql
-- 03_customer_retention_cohort.sql
-- 04_sessionization_example.sql
-- README.md
README.md Snippet
# SQL Window Functions Mastery
## Key Queries
- **Running Revenue**: Cumulative sales over time
- **Top 3 Tracks per Genre**: Using `ROW_NUMBER()`
- **Cohort Retention**: Monthly active users by signup cohort
- **Sessionization**: Group events into sessions
## How to Run
```bash
# SQLite
sqlite3 chinook.db < 01_running_revenue.sql
```
---
## Cheat Sheet (Copy-Paste)
```sql
-- RANKING
ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
RANK() OVER (PARTITION BY x ORDER BY y DESC)
DENSE_RANK() OVER (PARTITION BY x ORDER BY y DESC)
-- AGGREGATE
SUM(col) OVER (PARTITION BY cat ORDER BY date) AS running_total
AVG(col) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3
-- LEAD/LAG
LAG(col, 1) OVER (PARTITION BY user ORDER BY time) AS prev_val
LEAD(col) OVER (PARTITION BY user ORDER BY time) AS next_val
-- FRAME
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING
```
Assessment: Can You Write This?
| Query | Yes/No |
|---|---|
| Rank employees by sales, no gaps | ☐ |
| 3-month moving average | ☐ |
| Next purchase date per customer | ☐ |
| Top 2 invoices per day | ☐ |
| Cohort table (month 0,1,2…) | ☐ |
All Yes → You’re a Window Function Ninja!
Next: Phase 2 – Statistics & Math
Now you can extract and rank — time to explain with stats.
Start Now:
1. Open Chinook.db
2. Run:
SELECT
CustomerId,
InvoiceDate,
Total,
SUM(Total) OVER (PARTITION BY CustomerId ORDER BY InvoiceDate) AS running_spend
FROM Invoice
ORDER BY CustomerId, InvoiceDate
LIMIT 10;
Tag me when you push your window function repo!
You’re now ahead of the vast majority of SQL users.