Detailed SQL for Data Analysis
(Phase 1.5 | 4 Weeks | 4–6 hrs/day) Goal: Master SQL for data extraction, aggregation, and insight generation — the #1 skill for Data Analyst & Data Scientist roles.
Detailed SQL for Data Analysis
Detailed SQL for Data Analysis
Detailed SQL for Data Analysis
(Phase 1.5 | 4 Weeks | 4–6 hrs/day)
Goal: Master SQL for data extraction, aggregation, and insight generation — the #1 skill for Data Analyst & Data Scientist roles.
Used by: 95% of companies | Appears in: 80% of DS interviews | Salary boost: +$10–20K
Why SQL?
| Business Need | SQL Does |
|---|---|
| "Show me sales last quarter" | WHERE date BETWEEN ... |
| "Top 10 customers by revenue" | ORDER BY revenue DESC LIMIT 10 |
| "YoY growth %" | Window functions |
| "Funnel drop-off" | CTEs + COUNT(DISTINCT) |
Week-by-Week Roadmap
| Week | Focus | Hours |
|---|---|---|
| 1 | SQL Basics + SELECT | 25 |
| 2 | Filtering, Aggregation, Joins | 30 |
| 3 | Subqueries, CTEs, Window Functions | 30 |
| 4 | Real-World Project + Interview Prep | 25 |
Tools Setup (Day 1)
| Tool | Install |
|---|---|
| DB Browser for SQLite | sqlitebrowser.org |
| PostgreSQL (Optional) | postgresql.org |
| Online Sandbox | SQLFiddle, DB Fiddle |
Practice DB: Download Chinook Database (Music store with 11 tables)
Week 1: SQL Basics & SELECT
Core Syntax
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column
LIMIT 10;
Practice Queries (Chinook DB)
| # | Question | Query |
|---|---|---|
| 1 | List all albums | SELECT * FROM Album LIMIT 5; |
| 2 | Find tracks longer than 5 min | SELECT Name, Milliseconds FROM Track WHERE Milliseconds > 300000; |
| 3 | Artists from Brazil | SELECT * FROM Artist WHERE Country = 'Brazil'; |
Key Concepts
| Concept | Example |
|---|---|
DISTINCT |
SELECT DISTINCT Country FROM Customer; |
LIKE |
SELECT Name FROM Track WHERE Name LIKE '%love%'; |
IN |
WHERE GenreId IN (1, 3, 5) |
BETWEEN |
WHERE UnitPrice BETWEEN 0.99 AND 1.99 |
Daily Task: Write 10 queries/day → save in week1.sql
Week 2: Aggregation & Joins
Aggregation
SELECT
GenreId,
COUNT(*) AS track_count,
AVG(Milliseconds)/60000 AS avg_min
FROM Track
GROUP BY GenreId
HAVING COUNT(*) > 100
ORDER BY track_count DESC;
JOIN Types
| Type | Use |
|---|---|
INNER JOIN |
Matching rows |
LEFT JOIN |
All from left |
FULL OUTER |
All from both |
Example: Customer + Invoice
SELECT
c.FirstName || ' ' || c.LastName AS customer,
SUM(i.Total) AS total_spent
FROM Customer c
LEFT JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
ORDER BY total_spent DESC
LIMIT 5;
Project: Top 5 Customers by Country
Output: Country, Customer Name, Total Spent
Week 3: Subqueries, CTEs, Window Functions
1. Subquery
SELECT Name, Composer
FROM Track
WHERE GenreId = (
SELECT GenreId FROM Genre WHERE Name = 'Rock'
);
2. CTE (Common Table Expression)
WITH rock_tracks AS (
SELECT * FROM Track
WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock')
)
SELECT Composer, COUNT(*)
FROM rock_tracks
GROUP BY Composer
ORDER BY COUNT(*) DESC;
3. Window Functions (GAME CHANGER)
-- Rank tracks by length per genre
SELECT
Name,
GenreId,
Milliseconds,
RANK() OVER (PARTITION BY GenreId ORDER BY Milliseconds DESC) AS rank_longest
FROM Track;
Advanced Query:
"For each customer, show their total spend and % of country total"
WITH customer_spend AS (
SELECT
c.CustomerId,
c.Country,
SUM(i.Total) AS spend
FROM Customer c
JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
),
country_total AS (
SELECT Country, SUM(spend) AS country_spend
FROM customer_spend
GROUP BY Country
)
SELECT
cs.CustomerId,
cs.Country,
cs.spend,
ROUND(100.0 * cs.spend / ct.country_spend, 2) AS pct_of_country
FROM customer_spend cs
JOIN country_total ct ON cs.Country = ct.Country
ORDER BY pct_of_country DESC;
Week 4: Real-World Project + Interview Prep
Final Project: Music Store Business Report
Deliverables (GitHub Repo: yourname/sql-music-analysis)
sql-music-analysis/
├── queries/
│ ├── 01_top_customers.sql
│ ├── 02_genre_performance.sql
│ ├── 03_employee_sales.sql
│ └── report.md
├── data/
│ └── chinook.db
└── README.md
Key Queries to Write
| # | Business Question | SQL File |
|---|---|---|
| 1 | Top 10 customers by total spend | 01_top_customers.sql |
| 2 | Best-selling genre per country | 02_genre_performance.sql |
| 3 | Employee sales performance (with manager) | 03_employee_sales.sql |
| 4 | Tracks never sold | 04_unsold_tracks.sql |
| 5 | Monthly revenue trend | 05_revenue_trend.sql |
report.md Example
# Music Store SQL Analysis
## Key Insights
- **Rock** is the top genre (54% of sales)
- Top customer: **Helena Holý** ($49.62 from Czech Republic)
- **Jane Peacock** is the top-performing employee
- 2 tracks have **never been sold**
## How to Run
```sql
-- Open in DB Browser for SQLite
-- Run queries/01_top_customers.sql
---
## Interview-Ready SQL (LeetCode Style)
| Difficulty | Problem | Link |
|----------|-------|------|
| Easy | [175. Combine Two Tables](https://leetcode.com/problems/combine-two-tables/) | `LEFT JOIN` |
| Easy | [181. Employees Earning More Than Managers](https://leetcode.com/problems/employees-earning-more-than-their-managers/) | Self-join |
| Medium | [176. Second Highest Salary](https://leetcode.com/problems/second-highest-salary/) | `LIMIT OFFSET` |
| Medium | [177. Nth Highest Salary](https://leetcode.com/problems/nth-highest-salary/) | CTE + `DENSE_RANK()` |
| Hard | [185. Department Top Three Salaries](https://leetcode.com/problems/department-top-three-salaries/) | Window + Filter |
**Goal**: Solve **50 LeetCode SQL problems** (focus on **Easy 20 + Medium 30**)
---
## SQL Cheat Sheet (Copy-Paste)
```sql
-- SELECT
SELECT *, ROUND(col, 2)
FROM table
WHERE col IS NOT NULL
AND col LIKE '%pattern%'
AND col IN (1,2,3)
ORDER BY col DESC
LIMIT 10;
-- AGGREGATION
SELECT category,
COUNT(*),
SUM(sales),
AVG(price),
MAX(date)
FROM table
GROUP BY category
HAVING COUNT(*) > 5;
-- JOINS
SELECT *
FROM A
INNER JOIN B ON A.id = B.id
LEFT JOIN C ON B.id = C.id;
-- WINDOW
SELECT
col,
ROW_NUMBER() OVER (PARTITION BY group ORDER BY val) AS rn,
RANK() OVER (PARTITION BY group ORDER BY val DESC) AS rank
FROM table;
-- CTE
WITH cte AS (SELECT ...)
SELECT * FROM cte WHERE ...;
Practice Platforms
| Platform | Link |
|---|---|
| Mode Analytics | mode.com/sql-tutorial |
| StrataScratch | stratascratch.com |
| HackerRank SQL | hackerrank.com/domains/sql |
| LeetCode | leetcode.com/problemset/database/ |
| SQLZoo | sqlzoo.net |
Weekly Schedule
| Day | Task |
|---|---|
| Mon–Wed | Learn + write 15 queries |
| Thu | Solve 5 LeetCode (Medium) |
| Fri | Build project query |
| Sat | Review + explain aloud |
| Sun | Rest / blog post |
Assessment: Can You Write This?
| Query | Yes/No |
|---|---|
| Top 3 tracks per genre by length | ☐ |
| Customers who spent > average | ☐ |
| Monthly revenue with YoY % | ☐ |
| Rank employees by sales | ☐ |
All Yes → You’re SQL-ready for interviews!
Pro Tips
- Always
EXPLAINyour query in interviews“I used a CTE to avoid duplicating the subquery…”
- Write comments in production SQL
- Use aliases (
c.FirstName→c."FirstName") - Index matters → know when to suggest
INDEX ON Customer(Country)
Next: Phase 2 – Statistics & Math
Now that you can get the data… learn to understand it.
Free Resources Summary
| Resource | Link |
|---|---|
| Chinook DB | github.com/lerocha/chinook-database |
| Mode SQL Tutorial | mode.com/sql-tutorial |
| LeetCode SQL 50 | leetcode.com/study-plan/sql |
| SQL Cheat Sheet | Save this gist! |
Start Now:
1. Download Chinook.db
2. Open in DB Browser
3. Run:
SELECT Name, Milliseconds/60000.0 AS minutes
FROM Track
ORDER BY minutes DESC
LIMIT 1;
You just found the longest song!
Tag me when you push your SQL repo!
Let’s make you SQL-fluent in 4 weeks.
Detailed SQL for Data Analysis
(Phase 1.5 | 4 Weeks | 4–6 hrs/day) Goal: Master SQL for data extraction, aggregation, and insight generation — the #1 skill for Data Analyst & Data Scientist roles.
Detailed SQL for Data Analysis
Detailed SQL for Data Analysis
Detailed SQL for Data Analysis
(Phase 1.5 | 4 Weeks | 4–6 hrs/day)
Goal: Master SQL for data extraction, aggregation, and insight generation — the #1 skill for Data Analyst & Data Scientist roles.
Used by: 95% of companies | Appears in: 80% of DS interviews | Salary boost: +$10–20K
Why SQL?
| Business Need | SQL Does |
|---|---|
| "Show me sales last quarter" | WHERE date BETWEEN ... |
| "Top 10 customers by revenue" | ORDER BY revenue DESC LIMIT 10 |
| "YoY growth %" | Window functions |
| "Funnel drop-off" | CTEs + COUNT(DISTINCT) |
Week-by-Week Roadmap
| Week | Focus | Hours |
|---|---|---|
| 1 | SQL Basics + SELECT | 25 |
| 2 | Filtering, Aggregation, Joins | 30 |
| 3 | Subqueries, CTEs, Window Functions | 30 |
| 4 | Real-World Project + Interview Prep | 25 |
Tools Setup (Day 1)
| Tool | Install |
|---|---|
| DB Browser for SQLite | sqlitebrowser.org |
| PostgreSQL (Optional) | postgresql.org |
| Online Sandbox | SQLFiddle, DB Fiddle |
Practice DB: Download Chinook Database (Music store with 11 tables)
Week 1: SQL Basics & SELECT
Core Syntax
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column
LIMIT 10;
Practice Queries (Chinook DB)
| # | Question | Query |
|---|---|---|
| 1 | List all albums | SELECT * FROM Album LIMIT 5; |
| 2 | Find tracks longer than 5 min | SELECT Name, Milliseconds FROM Track WHERE Milliseconds > 300000; |
| 3 | Artists from Brazil | SELECT * FROM Artist WHERE Country = 'Brazil'; |
Key Concepts
| Concept | Example |
|---|---|
DISTINCT |
SELECT DISTINCT Country FROM Customer; |
LIKE |
SELECT Name FROM Track WHERE Name LIKE '%love%'; |
IN |
WHERE GenreId IN (1, 3, 5) |
BETWEEN |
WHERE UnitPrice BETWEEN 0.99 AND 1.99 |
Daily Task: Write 10 queries/day → save in week1.sql
Week 2: Aggregation & Joins
Aggregation
SELECT
GenreId,
COUNT(*) AS track_count,
AVG(Milliseconds)/60000 AS avg_min
FROM Track
GROUP BY GenreId
HAVING COUNT(*) > 100
ORDER BY track_count DESC;
JOIN Types
| Type | Use |
|---|---|
INNER JOIN |
Matching rows |
LEFT JOIN |
All from left |
FULL OUTER |
All from both |
Example: Customer + Invoice
SELECT
c.FirstName || ' ' || c.LastName AS customer,
SUM(i.Total) AS total_spent
FROM Customer c
LEFT JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
ORDER BY total_spent DESC
LIMIT 5;
Project: Top 5 Customers by Country
Output: Country, Customer Name, Total Spent
Week 3: Subqueries, CTEs, Window Functions
1. Subquery
SELECT Name, Composer
FROM Track
WHERE GenreId = (
SELECT GenreId FROM Genre WHERE Name = 'Rock'
);
2. CTE (Common Table Expression)
WITH rock_tracks AS (
SELECT * FROM Track
WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock')
)
SELECT Composer, COUNT(*)
FROM rock_tracks
GROUP BY Composer
ORDER BY COUNT(*) DESC;
3. Window Functions (GAME CHANGER)
-- Rank tracks by length per genre
SELECT
Name,
GenreId,
Milliseconds,
RANK() OVER (PARTITION BY GenreId ORDER BY Milliseconds DESC) AS rank_longest
FROM Track;
Advanced Query:
"For each customer, show their total spend and % of country total"
WITH customer_spend AS (
SELECT
c.CustomerId,
c.Country,
SUM(i.Total) AS spend
FROM Customer c
JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
),
country_total AS (
SELECT Country, SUM(spend) AS country_spend
FROM customer_spend
GROUP BY Country
)
SELECT
cs.CustomerId,
cs.Country,
cs.spend,
ROUND(100.0 * cs.spend / ct.country_spend, 2) AS pct_of_country
FROM customer_spend cs
JOIN country_total ct ON cs.Country = ct.Country
ORDER BY pct_of_country DESC;
Week 4: Real-World Project + Interview Prep
Final Project: Music Store Business Report
Deliverables (GitHub Repo: yourname/sql-music-analysis)
sql-music-analysis/
├── queries/
│ ├── 01_top_customers.sql
│ ├── 02_genre_performance.sql
│ ├── 03_employee_sales.sql
│ └── report.md
├── data/
│ └── chinook.db
└── README.md
Key Queries to Write
| # | Business Question | SQL File |
|---|---|---|
| 1 | Top 10 customers by total spend | 01_top_customers.sql |
| 2 | Best-selling genre per country | 02_genre_performance.sql |
| 3 | Employee sales performance (with manager) | 03_employee_sales.sql |
| 4 | Tracks never sold | 04_unsold_tracks.sql |
| 5 | Monthly revenue trend | 05_revenue_trend.sql |
report.md Example
# Music Store SQL Analysis
## Key Insights
- **Rock** is the top genre (54% of sales)
- Top customer: **Helena Holý** ($49.62 from Czech Republic)
- **Jane Peacock** is the top-performing employee
- 2 tracks have **never been sold**
## How to Run
```sql
-- Open in DB Browser for SQLite
-- Run queries/01_top_customers.sql
---
## Interview-Ready SQL (LeetCode Style)
| Difficulty | Problem | Link |
|----------|-------|------|
| Easy | [175. Combine Two Tables](https://leetcode.com/problems/combine-two-tables/) | `LEFT JOIN` |
| Easy | [181. Employees Earning More Than Managers](https://leetcode.com/problems/employees-earning-more-than-their-managers/) | Self-join |
| Medium | [176. Second Highest Salary](https://leetcode.com/problems/second-highest-salary/) | `LIMIT OFFSET` |
| Medium | [177. Nth Highest Salary](https://leetcode.com/problems/nth-highest-salary/) | CTE + `DENSE_RANK()` |
| Hard | [185. Department Top Three Salaries](https://leetcode.com/problems/department-top-three-salaries/) | Window + Filter |
**Goal**: Solve **50 LeetCode SQL problems** (focus on **Easy 20 + Medium 30**)
---
## SQL Cheat Sheet (Copy-Paste)
```sql
-- SELECT
SELECT *, ROUND(col, 2)
FROM table
WHERE col IS NOT NULL
AND col LIKE '%pattern%'
AND col IN (1,2,3)
ORDER BY col DESC
LIMIT 10;
-- AGGREGATION
SELECT category,
COUNT(*),
SUM(sales),
AVG(price),
MAX(date)
FROM table
GROUP BY category
HAVING COUNT(*) > 5;
-- JOINS
SELECT *
FROM A
INNER JOIN B ON A.id = B.id
LEFT JOIN C ON B.id = C.id;
-- WINDOW
SELECT
col,
ROW_NUMBER() OVER (PARTITION BY group ORDER BY val) AS rn,
RANK() OVER (PARTITION BY group ORDER BY val DESC) AS rank
FROM table;
-- CTE
WITH cte AS (SELECT ...)
SELECT * FROM cte WHERE ...;
Practice Platforms
| Platform | Link |
|---|---|
| Mode Analytics | mode.com/sql-tutorial |
| StrataScratch | stratascratch.com |
| HackerRank SQL | hackerrank.com/domains/sql |
| LeetCode | leetcode.com/problemset/database/ |
| SQLZoo | sqlzoo.net |
Weekly Schedule
| Day | Task |
|---|---|
| Mon–Wed | Learn + write 15 queries |
| Thu | Solve 5 LeetCode (Medium) |
| Fri | Build project query |
| Sat | Review + explain aloud |
| Sun | Rest / blog post |
Assessment: Can You Write This?
| Query | Yes/No |
|---|---|
| Top 3 tracks per genre by length | ☐ |
| Customers who spent > average | ☐ |
| Monthly revenue with YoY % | ☐ |
| Rank employees by sales | ☐ |
All Yes → You’re SQL-ready for interviews!
Pro Tips
- Always
EXPLAINyour query in interviews“I used a CTE to avoid duplicating the subquery…”
- Write comments in production SQL
- Use aliases (
c.FirstName→c."FirstName") - Index matters → know when to suggest
INDEX ON Customer(Country)
Next: Phase 2 – Statistics & Math
Now that you can get the data… learn to understand it.
Free Resources Summary
| Resource | Link |
|---|---|
| Chinook DB | github.com/lerocha/chinook-database |
| Mode SQL Tutorial | mode.com/sql-tutorial |
| LeetCode SQL 50 | leetcode.com/study-plan/sql |
| SQL Cheat Sheet | Save this gist! |
Start Now:
1. Download Chinook.db
2. Open in DB Browser
3. Run:
SELECT Name, Milliseconds/60000.0 AS minutes
FROM Track
ORDER BY minutes DESC
LIMIT 1;
You just found the longest song!
Tag me when you push your SQL repo!
Let’s make you SQL-fluent in 4 weeks.