Data Visualization
Goal: Tell Stories with Data
Tools: Matplotlib, Seaborn, Tableau Public
Phase 3: Data Visualization (Month 4)
Why?
- 80% of DS interviews ask: "Walk me through your plot"
- 1 chart > 1000 rows
- Strong data storytelling can add $10K+ to a salary offer
| Week | Focus | Hours |
|---|---|---|
| 1 | Python Plotting (Matplotlib/Seaborn) | 35 |
| 2 | EDA + Storytelling | 35 |
| 3 | Tableau Public Mastery | 35 |
| 4 | Capstone: Executive Dashboard | 30 |
Week 1: Python Plotting – Matplotlib & Seaborn
Core Libraries
pip install matplotlib seaborn plotly
Essential Plot Types
| Plot | Use | Code |
|---|---|---|
| Line | Trends | sns.lineplot(x=x, y=y) |
| Bar | Compare categories | sns.barplot(x=x, y=y) |
| Histogram | Distribution | sns.histplot(data) |
| Box | Outliers, quartiles | sns.boxplot(x=x, y=y) |
| Scatter | Correlation | sns.scatterplot(x=x, y=y) |
| Heatmap | Correlation matrix | sns.heatmap(corr) |
Pro Code Template
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load data
df = pd.read_csv("titanic.csv")
# Style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(data=df, x="Pclass", y="Survived", hue="Sex", ax=ax, errorbar=None)
# Labels
ax.set_title("Survival Rate by Class & Gender", fontsize=16, fontweight='bold')
ax.set_xlabel("Passenger Class", fontsize=12)
ax.set_ylabel("Survival Rate", fontsize=12)
ax.legend(title="Gender")
# Annotate
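for p in ax.patches:
    # Skip empty patches (some seaborn versions add zero-height ones for the legend)
    if p.get_height() > 0:
        ax.annotate(f'{p.get_height():.1%}',
                    (p.get_x() + p.get_width()/2, p.get_height()),
                    ha='center', va='bottom', fontsize=10)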
plt.tight_layout()
plt.savefig("survival_by_class_gender.png", dpi=300)
plt.show()
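As an alternative to looping over patches, Matplotlib (3.4 and later) offers `Axes.bar_label`, which places one label per bar in a container. The sketch below uses plain Matplotlib and illustrative rates typed in by hand, not values computed from the dataset.

```python
import matplotlib.pyplot as plt

# Illustrative survival rates, typed in by hand for this sketch
classes = ["1st", "2nd", "3rd"]
rates = [0.63, 0.47, 0.24]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(classes, rates)

# Format each bar's value as a percentage and attach it above the bar
labels = [f"{v:.1%}" for v in bars.datavalues]
ax.bar_label(bars, labels=labels, padding=3)

ax.set_ylim(0, 1)
ax.set_ylabel("Survival Rate")
fig.tight_layout()
```

With a seaborn bar plot, the same call can usually be applied to each entry of `ax.containers`.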
Resources:
- Python Graph Gallery – python-graph-gallery.com
- Seaborn Docs – seaborn.pydata.org
Week 2: EDA + Storytelling Framework
5-Second Rule: Can a busy executive understand the chart in 5 seconds?
Storytelling Framework (McKinsey Style)
graph TD
A[Context] --> B[Insight]
B --> C[Action]
| Step | Example |
|---|---|
| Context | "The Titanic carried about 2,224 passengers and crew" |
| Insight | "Women in 1st class: 97% survived" |
| Action | "Prioritize women & children in evacuation" |
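The Insight step should come from a number you can reproduce. A minimal sketch, assuming the Kaggle-style column names `Pclass`, `Sex`, and `Survived`; the eight rows below are hand-made stand-ins, not real passenger records.

```python
import pandas as pd

# Tiny hand-made sample in the Titanic column layout (not real records)
sample = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the survival rate of each group
rates = sample.groupby(["Pclass", "Sex"])["Survived"].mean()

# Pick the group with the highest rate and phrase it as an insight
best_group = rates.idxmax()
best_rate = rates.max()
print(f"Insight: class {best_group[0]} {best_group[1]} passengers "
      f"survived at {best_rate:.0%}")
```

Running the same `groupby` on the full dataset gives the figure to quote in the Insight row.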
EDA Checklist
df.describe()
df.isnull().sum()
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
sns.pairplot(df, hue="Survived")
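The non-plotting checks can be bundled into one helper so every notebook starts the same way. This is a sketch only: `eda_summary` is a hypothetical name and the toy DataFrame is made up.

```python
import pandas as pd

def eda_summary(df: pd.DataFrame) -> dict:
    """Collect the basic EDA checks in one place (hypothetical helper)."""
    # Correlation is only defined for numeric columns, so select them first
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "missing": df.isnull().sum().to_dict(),
        "describe": df.describe(),
        "corr": numeric.corr(),
    }

# Made-up rows, just enough to exercise the helper
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0, 58.0],
    "Fare": [7.25, 71.28, 8.05, 26.55],
    "Sex": ["male", "female", "male", "female"],
})
summary = eda_summary(toy)
print(summary["missing"])
```

Selecting numeric columns before calling `corr()` avoids errors when the DataFrame also holds text columns such as `Sex`.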
Project: Titanic Survival Story
3 plots + 1 insight per plot → eda_titanic.ipynb
Week 3: Tableau Public – Drag, Drop, Wow
Install: Tableau Public (Free)
Core Skills
| Skill | How |
|---|---|
| Connect | CSV, Google Sheets |
| Calculated Field | IF [Pclass] = 1 THEN "Rich" ELSE "Poor" END |
| Parameters | Dynamic filters |
| Dashboard | 3+ sheets + actions |
| Story | Sequence of insights |
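To sanity-check a calculated field, it can help to reproduce the same rule in pandas and compare counts with what Tableau shows. A minimal sketch, where the `Wealth` column name is an assumption:

```python
import numpy as np
import pandas as pd

passengers = pd.DataFrame({"Pclass": [1, 2, 3, 1]})

# Same rule as the Tableau calculated field: class 1 -> "Rich", otherwise "Poor"
passengers["Wealth"] = np.where(passengers["Pclass"] == 1, "Rich", "Poor")

print(passengers["Wealth"].value_counts())
```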
Build 3 Dashboards
| # | Dashboard | Dataset |
|---|---|---|
| 1 | Sales Performance | Sample Superstore |
| 2 | Customer Segmentation | RFM Analysis |
| 3 | Funnel Analysis | E-commerce funnel |
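For the Customer Segmentation dashboard, the RFM table (recency, frequency, monetary value) usually has to be built before connecting Tableau. A minimal sketch in pandas, assuming an order log with hypothetical columns `customer_id`, `order_date`, and `amount`; the rows are invented.

```python
import pandas as pd

# Hypothetical order log; column names and values are assumptions
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-25", "2023-03-10",
    ]),
    "amount": [50.0, 30.0, 200.0, 20.0, 25.0, 40.0],
})

# Measure recency against the day after the last order in the log
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

rfm = orders.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)
print(rfm)
# rfm.to_csv("rfm.csv")  # uncomment to export a file Tableau Public can connect to
```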
Publish: public.tableau.com → Share link
Week 4: Capstone – Executive Dashboard
Project: "Global Happiness Report 2023"
Dataset: World Happiness Report
Deliverables (GitHub: yourname/data-viz-capstone)
data-viz-capstone/
├── python/
│   ├── eda_happiness.ipynb
│   └── plots/
│       ├── happiness_vs_gdp.png
│       └── top10_happiest.png
├── tableau/
│   ├── Happiness_Dashboard.twb
│   └── Happiness_Dashboard.png
├── streamlit/
│   └── app.py
└── README.md
1. Python: Key Insights
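# Top 10 happiest countries
# (assumes df is already loaded with 'Country' and 'Happiness Score' columns, and that plots/ exists)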
top10 = df.nlargest(10, 'Happiness Score')
sns.barplot(data=top10, x='Happiness Score', y='Country', palette='viridis')
plt.title("Top 10 Happiest Countries (2023)")
plt.xlabel("Happiness Score")
plt.savefig("plots/top10_happiest.png", dpi=300, bbox_inches='tight')
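The second deliverable, `happiness_vs_gdp.png`, could be sketched along these lines. The data here is synthetic and the column names `GDP per capita` and `Happiness Score` are assumptions; the real report may label them differently.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the happiness data (invented values)
rng = np.random.default_rng(42)
gdp = rng.uniform(0.2, 2.0, 30)
happy = pd.DataFrame({
    "GDP per capita": gdp,
    "Happiness Score": 3.0 + 2.0 * gdp + rng.normal(0, 0.4, 30),
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(happy["GDP per capita"], happy["Happiness Score"], alpha=0.7)

# Least-squares straight line to make the relationship readable at a glance
slope, intercept = np.polyfit(happy["GDP per capita"], happy["Happiness Score"], 1)
xs = np.linspace(gdp.min(), gdp.max(), 50)
ax.plot(xs, slope * xs + intercept, color="black", linewidth=1)

ax.set_xlabel("GDP per capita")
ax.set_ylabel("Happiness Score")
ax.set_title("Happiness vs GDP")
fig.tight_layout()
# fig.savefig("plots/happiness_vs_gdp.png", dpi=300, bbox_inches="tight")
```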
2. Tableau: Interactive Dashboard
Sheets:
1. Map (Happiness by Country)
2. Scatter (GDP vs Happiness)
3. Bar (Top/Bottom 10)
4. Trend (Happiness over years)
Actions:
- Filter: Region
- Highlight: Click country
Publish: tableau.com/your-viz
3. Streamlit: Live App (Bonus)
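# streamlit/app.py
from pathlib import Path
import pandas as pd
import plotly.express as px
import streamlit as st
st.title("World Happiness Dashboard")
# Resolve the CSV relative to this file so the app works from any working directory
# (assumes the data lives in a data/ folder at the repo root)
DATA_PATH = Path(__file__).resolve().parent.parent / "data" / "happiness.csv"
df = pd.read_csv(DATA_PATH)
# Let the user pick a region, then keep only its rows
region = st.selectbox("Select Region", df['Region'].unique())
filtered = df[df['Region'] == region]
fig = px.scatter(filtered, x="GDP per capita", y="Happiness Score",
                 size="Population", color="Country", hover_name="Country",
                 title=f"Happiness vs GDP in {region}")
st.plotly_chart(fig)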
streamlit run streamlit/app.py
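One optional design choice is to keep the filtering logic in a plain function, so it can be tested without launching Streamlit. A sketch under that assumption; `filter_region` is a hypothetical helper and the scores are illustrative values, not taken from the report.

```python
import pandas as pd

def filter_region(df: pd.DataFrame, region: str) -> pd.DataFrame:
    """Return only the rows for one region (hypothetical helper)."""
    return df[df["Region"] == region].reset_index(drop=True)

# Illustrative rows; scores are not taken from the report
toy = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Costa Rica"],
    "Region": ["Western Europe", "Western Europe", "Latin America and Caribbean"],
    "Happiness Score": [7.8, 7.6, 6.6],
})

subset = filter_region(toy, "Western Europe")
print(subset)
```

Inside the app, the selectbox value would be passed as `region`, and the returned frame handed to the chart.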
README.md (Portfolio Gold)
# World Happiness Dashboard
**Live**: [streamlit.app/happiness](https://yourname-happiness.streamlit.app)
**Tableau**: [public.tableau.com](https://public.tableau.com/views/WorldHappiness2023/Dashboard)
**Python EDA**: [notebook](python/eda_happiness.ipynb)
## Key Insights
| Insight | Action |
|-------|--------|
| GDP explains 75% of happiness | Invest in economy |
| Social support > Freedom | Build community programs |
| Nordic countries dominate top 10 | Study their policies |
## Tech
- Python: Matplotlib, Seaborn, Plotly
- Tableau Public: Interactive dashboard
- Streamlit: Live web app
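A README claim of the form "X explains N% of Y" normally refers to R², the squared correlation from a simple linear fit, so compute it before quoting it. A minimal sketch with invented numbers:

```python
import numpy as np

# Invented values, only to show the calculation
gdp = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 1.8])
score = np.array([4.1, 4.9, 5.2, 6.0, 6.3, 7.4])

# Pearson correlation; for a one-variable linear fit, R^2 is its square
r = np.corrcoef(gdp, score)[0, 1]
r_squared = r ** 2

print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

Report the figure your own data produces, and keep in mind that a high R² shows association, not causation.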
Interview-Ready Plots
| Question | Your Plot |
|---|---|
| "Show correlation" | sns.heatmap(corr, annot=True) |
| "Outliers?" | sns.boxplot() |
| "Trend over time?" | sns.lineplot() |
| "Compare groups?" | sns.catplot() |
Assessment: Can You Build This?
| Task | Yes/No |
|---|---|
| Python: 5-plot EDA | ☐ |
| Tableau: Interactive dashboard | ☐ |
| Streamlit: Live filter | ☐ |
| 3 insights with actions | ☐ |
| Published + shared | ☐ |
All Yes → You’re visualization-ready!
Free Resources Summary
| Tool | Link |
|---|---|
| Python Graph Gallery | python-graph-gallery.com |
| Seaborn Examples | seaborn.pydata.org/examples |
| Tableau Public | public.tableau.com |
| Sample Superstore | tableau.com/sample-data |
| Streamlit Docs | docs.streamlit.io |
Pro Tips
- Never use default colors → sns.set_palette("colorblind")
- Annotate everything → %, n=, p<0.01
- Export high-res → dpi=300
- Tell a story → Context → Insight → Action
- Add to resume: "Built interactive Tableau dashboard with 10K+ views"
Next: Phase 4 – Machine Learning Core
You can show data → now predict it.
Start Now:
1. Download World Happiness Report
2. Open Jupyter:
import pandas as pd
import seaborn as sns
df = pd.read_csv("happiness.csv")
sns.scatterplot(data=df, x="GDP per capita", y="Happiness Score", hue="Region")
3. Save plot → Push to GitHub
Tag me when you publish your Tableau viz!
You now communicate like a senior analyst.