Detailed Phase 1: Python Foundations for Data Science
Goal: Master Python fundamentals + core data libraries (Pandas, NumPy) to clean, explore, and analyze real datasets like a pro.
Week-by-Week Breakdown
| Week | Focus | Hours |
|---|---|---|
| 1 | Python Basics | 25 |
| 2 | Control Flow + Functions | 25 |
| 3 | Data Structures Deep Dive | 25 |
| 4 | File I/O + Error Handling | 20 |
| 5 | NumPy Mastery | 25 |
| 6 | Pandas Core | 30 |
| 7 | Data Cleaning & EDA | 30 |
| 8 | Mini-Project + GitHub | 20 |
Week 1: Python Basics
Topics
| Topic | Details |
|---|---|
| Variables | int, float, str, bool |
| Basic Operations | + - * / // % ** |
| Type Conversion | int(), float(), str() |
| Strings | Indexing, slicing, .split(), .join(), f-strings |
| Print & Input | print(), input() |
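Here is a quick sketch of the type-conversion and string operations from the table. The sample values are made up for illustration.

```python
# Type conversion: text to numbers
price = float("19.99")        # str -> float
qty = int("3")                # str -> int
total = price * qty

# Indexing and slicing
word = "Python"
first = word[0]               # 'P'
last_three = word[-3:]        # 'hon'

# split() breaks a string into a list; join() glues a list back into a string
parts = "red,green,blue".split(",")   # ['red', 'green', 'blue']
joined = " | ".join(parts)            # 'red | green | blue'

# f-strings insert values inline; :.2f keeps two decimal places
summary = f"{qty} items cost ${total:.2f}"
print(summary)  # 3 items cost $59.97
```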
Practice (Daily)
```python
# Day 1
name = input("Enter name: ")
age = int(input("Age: "))
print(f"Hello {name}, you will be {age + 5} in 5 years!")
```
Resources
Mini-Task: Build a tip calculator
Input: bill, tip %, people → Output: each person pays $X.XX
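One possible shape for the tip calculator is sketched below. The function names split_bill and main are hypothetical. The arithmetic lives in its own function so it can be checked without typing input, and main() holds the input() calls from Day 1.

```python
def split_bill(bill: float, tip_percent: float, people: int) -> float:
    """Return what each person pays, rounded to cents."""
    if people <= 0:
        raise ValueError("people must be at least 1")
    total = bill * (1 + tip_percent / 100)   # bill plus tip
    return round(total / people, 2)

def main() -> None:
    # Interactive version: call main() yourself to run it
    bill = float(input("Bill amount: "))
    tip = float(input("Tip %: "))
    people = int(input("Number of people: "))
    print(f"Each person pays ${split_bill(bill, tip, people):.2f}")

print(f"Each person pays ${split_bill(120, 18, 4):.2f}")  # Each person pays $35.40
```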
Week 2: Control Flow & Functions
| Topic | Syntax |
|---|---|
| if/elif/else | if x > 0: ... |
| Loops | for i in range(10):, while x < 5: |
| List Comprehensions | [x**2 for x in range(5)] |
| Functions | def func_name(params): |
| *args, **kwargs | Optional later |
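The loop and function rows above can be tried side by side like this. The describe function is a hypothetical example used only to show what *args and **kwargs collect.

```python
# for loop over a range
squares = []
for i in range(5):
    squares.append(i ** 2)

# while loop that runs until the condition turns False
count = 0
while count < 3:
    count += 1

# The same squares as a list comprehension
squares_lc = [x ** 2 for x in range(5)]

# *args gathers extra positional arguments into a tuple,
# **kwargs gathers extra keyword arguments into a dict
def describe(*args, **kwargs):
    return f"args={args}, kwargs={kwargs}"

print(describe(1, 2, unit="cm"))  # args=(1, 2), kwargs={'unit': 'cm'}
```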
Practice
```python
def grade_score(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    # ...
```
Resources
- Automate the Boring Stuff – Ch 4–6
- Real Python – Functions
Project: FizzBuzz + Prime Checker
Write two functions:
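1. fizzbuzz(n) → prints 1 to n, printing "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both
2. is_prime(n) → returns True if n is prime, otherwise False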
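A minimal sketch of both functions follows. The helper fizzbuzz_label is a hypothetical addition: keeping the rule in a function that returns a string makes it easy to test. is_prime uses trial division, which only needs to check divisors up to the square root of n.

```python
def fizzbuzz_label(i: int) -> str:
    """What FizzBuzz prints for a single number."""
    if i % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if i % 3 == 0:
        return "Fizz"
    if i % 5 == 0:
        return "Buzz"
    return str(i)

def fizzbuzz(n: int) -> None:
    for i in range(1, n + 1):
        print(fizzbuzz_label(i))

def is_prime(n: int) -> bool:
    """Trial division: a composite n must have a divisor <= sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

fizzbuzz(15)
```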
Week 3: Data Structures Deep Dive
| Structure | Use Case |
|---|---|
| list | Ordered, mutable |
| tuple | Immutable; slightly faster and smaller than a list |
| dict | Key-value pairs (keep insertion order since Python 3.7) |
| set | Unique, unordered |
Key Methods
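```python
# List: ordered, mutable
lst = [1, 2, 3]
lst.append(4)          # add to the end → [1, 2, 3, 4]
lst.pop()              # remove and return the last item → 4
lst[1:3]               # slice positions 1 and 2 → [2, 3]
# Dict: key-value pairs
d = {"name": "Alex", "age": 25}
d.keys(), d.values(), d.items()   # views of keys, values, and (key, value) pairs
# Set: unique items
a = {1, 2, 3}; b = {3, 4, 5}
a & b                  # intersection → {3}
```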
Practice
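```python
# Count word frequency
text = "the cat and the dog and the bird"
words = text.split()
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1   # get() returns 0 for a word not seen yet
print(freq)  # {'the': 3, 'cat': 1, 'and': 2, 'dog': 1, 'bird': 1}
```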
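The standard library's collections.Counter does the same counting in one step. It also adds most_common() for ranking words.

```python
from collections import Counter

text = "the cat and the dog and the bird"
freq = Counter(text.split())      # counts each word
print(freq.most_common(2))        # [('the', 3), ('and', 2)]
```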
Project: To-Do List CLI App
Add, remove, and list tasks → save them to a .txt file
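A minimal sketch of the to-do app follows. It assumes one task per line in a file named todo.txt (a hypothetical name). Loading and saving are separate functions, so the file logic can be checked without the interactive loop. Call main() to run it.

```python
from pathlib import Path

TODO_FILE = Path("todo.txt")  # hypothetical file name; one task per line

def load_tasks(path: Path = TODO_FILE) -> list:
    """Return saved tasks, or an empty list if the file doesn't exist yet."""
    if not path.exists():
        return []
    return [line for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]

def save_tasks(tasks: list, path: Path = TODO_FILE) -> None:
    """Overwrite the file with the current tasks."""
    path.write_text("".join(f"{t}\n" for t in tasks), encoding="utf-8")

def show(tasks: list) -> None:
    for i, task in enumerate(tasks, start=1):
        print(f"{i}. {task}")

def main() -> None:
    tasks = load_tasks()
    while True:
        cmd = input("add / remove / list / quit: ").strip().lower()
        if cmd == "add":
            tasks.append(input("Task: ").strip())
        elif cmd == "remove":
            show(tasks)
            try:
                idx = int(input("Number to remove: ")) - 1
            except ValueError:
                idx = -1                      # treat non-numbers as invalid
            if 0 <= idx < len(tasks):
                tasks.pop(idx)
            else:
                print("Not a valid task number")
        elif cmd == "list":
            show(tasks)
        elif cmd == "quit":
            break
        save_tasks(tasks)  # persist after every command
```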
Week 4: File Handling + Error Handling
| Topic | Code |
|---|---|
| Read/Write | with open('file.txt', 'r') as f: |
| CSV | import csv |
| JSON | import json |
| Try/Except | try: ... except ValueError: |
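The JSON and try/except rows often show up together. The sketch below reads a config file and keeps going when the file is missing or malformed, and it converts user text without crashing. The function names load_config, save_config, and parse_age are hypothetical.

```python
import json
from typing import Optional

def load_config(path: str) -> dict:
    """Return parsed JSON from path, or {} if the file is missing or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"{path} not found, using an empty config")
        return {}
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
        return {}

def save_config(data: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)   # indent=2 keeps the file human-readable

def parse_age(raw: str) -> Optional[int]:
    """Convert user text to an int, returning None instead of raising."""
    try:
        return int(raw)
    except ValueError:
        return None
```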
Example: Read CSV
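```python
import csv

with open('data.csv', 'r', newline='') as f:   # newline='' is what the csv module docs recommend
    reader = csv.DictReader(f)                  # uses the header row as dict keys
    for row in reader:
        print(row['name'], row['age'])
```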
Resources
- Automate the Boring Stuff – Ch 8–9
- Real Python – File I/O
Mini-Project: Student Gradebook
Read grades.csv → calculate averages → write summary.txt
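A sketch of the gradebook follows. It assumes grades.csv has name and score columns, which is a guess, so adjust to your file. The averaging is a pure function that takes rows, so it can be tested without a file. The score is parsed before the name is used as a key, so a bad row never leaves an empty entry behind.

```python
import csv
from collections import defaultdict
from statistics import mean

def average_scores(rows) -> dict:
    """Average score per student from rows like {"name": ..., "score": ...}."""
    scores = defaultdict(list)
    for row in rows:
        try:
            name, score = row["name"], float(row["score"])
        except (KeyError, TypeError, ValueError):
            continue                      # skip rows with a missing or non-numeric score
        scores[name].append(score)
    return {name: mean(vals) for name, vals in scores.items()}

def write_summary(averages: dict, out_path: str = "summary.txt") -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        for name, avg in sorted(averages.items()):
            f.write(f"{name}: {avg:.1f}\n")
        if averages:
            # class average = mean of the per-student averages
            f.write(f"Class average: {mean(averages.values()):.1f}\n")

def main(in_path: str = "grades.csv") -> None:
    with open(in_path, newline="", encoding="utf-8") as f:
        write_summary(average_scores(csv.DictReader(f)))
```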
Week 5: NumPy – Numerical Python
| Concept | Code |
|---|---|
| Arrays | np.array([1,2,3]) |
| Shape | .shape, .reshape() |
| Math | np.mean(), np.std() |
| Indexing | Boolean, fancy |
| Broadcasting | arr + 5 |
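Each row of the table can be seen on one small array. Note that np.std defaults to the population standard deviation (ddof=0).

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.shape)                 # (6,)
grid = arr.reshape(2, 3)         # 2 rows x 3 columns
print(grid.shape)                # (2, 3)

print(np.mean(arr), np.std(arr)) # mean and population standard deviation

# Boolean indexing: keep elements where the condition is True
evens = arr[arr % 2 == 0]        # [2 4 6]

# Fancy indexing: pick elements by a list of positions
picked = arr[[0, 2, 4]]          # [1 3 5]

# Broadcasting: the scalar is stretched to the array's shape
shifted = arr + 5                # [ 6  7  8  9 10 11]
# A (2, 3) grid plus a (3,) row adds that row to every grid row
row_added = grid + np.array([10, 20, 30])
```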
Practice
```python
import numpy as np
arr = np.random.randn(1000)
print(f"Mean: {arr.mean():.2f}, Std: {arr.std():.2f}")
```
Resources
- NumPy Official Quickstart
- Kaggle: Python Course → NumPy
Task:
Generate 1000 random heights (normal dist: μ=170, σ=10) → find % > 180 cm
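One way to do the task uses NumPy's Generator API with a fixed seed; the seed value 42 is arbitrary. 180 cm is one σ above μ, so theory predicts about 15.9% (1 − Φ(1)). A sample of 1000 will land near that but not exactly on it.

```python
import numpy as np

rng = np.random.default_rng(seed=42)            # seeded so reruns give the same sample
heights = rng.normal(loc=170, scale=10, size=1000)

# (heights > 180) is a boolean array; its mean is the fraction of True values
pct_above_180 = (heights > 180).mean() * 100
print(f"{pct_above_180:.1f}% of heights are above 180 cm")
```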
Week 6: Pandas – Data Manipulation
| Core Object | Use |
|---|---|
| Series | 1D labeled array |
| DataFrame | 2D table |
Essential Methods
| Task | Code |
|---|---|
| Read CSV | pd.read_csv('file.csv') |
| View | .head(), .info(), .describe() |
| Select | df['col'], df.loc[], df.iloc[] |
| Filter | df[df['age'] > 30] |
| GroupBy | df.groupby('city').mean(numeric_only=True) (pandas 2.x raises on text columns without numeric_only) |
| Merge | pd.merge(df1, df2, on='id') |
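The selection, filter, merge, and groupby rows can be practiced on two tiny DataFrames before touching a real dataset. The data below is made up for illustration.

```python
import pandas as pd

# Toy data, made up for illustration
people = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Cai", "Dee"],
    "age": [25, 34, 41, 29],
    "city": ["Paris", "Lyon", "Paris", "Lyon"],
})
salaries = pd.DataFrame({"id": [1, 2, 3, 4], "salary": [40_000, 52_000, 61_000, 45_000]})

first_name = people.loc[0, "name"]      # label-based: row label 0, column "name"
second_age = people.iloc[1, 2]          # position-based: 2nd row, 3rd column
over_30 = people[people["age"] > 30]    # boolean filter

merged = pd.merge(people, salaries, on="id")     # join on the shared "id" column
avg_salary_by_city = merged.groupby("city")["salary"].mean()
print(avg_salary_by_city)
```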
Example
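```python
import pandas as pd
df = pd.read_csv("titanic.csv")
df = df.dropna(subset=['Age'])              # drop rows with no recorded Age
adults = df[df['Age'] > 18]                 # passengers older than 18
survival_rate = adults['Survived'].mean()   # Survived is 0/1, so the mean is the survival rate
print(f"Adult survival rate: {survival_rate:.1%}")
```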
Resources
- 10 Minutes to Pandas
- Kaggle: Pandas Course (Free)
Practice Dataset: Titanic
Week 7: Data Cleaning & EDA
| Task | Code |
|---|---|
| Missing Values | df.isnull().sum(), df.fillna(), df.dropna() |
| Duplicates | df.duplicated(), df.drop_duplicates() |
| Outliers | Z-score or IQR method |
| Type Fix | df['Age'] = df['Age'].astype(int) (only after missing values are filled or dropped; NaN cannot be cast to int) |
| New Columns | df['family_size'] = df['SibSp'] + df['Parch'] + 1 |
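The cleaning steps in the table can be chained in order on a toy frame with made-up values. Missing ages are filled with the median before the int cast, and duplicate rows are dropped. Outliers are flagged with the IQR rule (beyond 1.5 × IQR outside the quartiles). Only IQR is shown here; a Z-score filter would work similarly.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({
    "Age":  [22, None, 35, 35, 80, 28],
    "Fare": [7.25, 71.3, 8.05, 8.05, 512.0, 13.0],
})

print(df.isnull().sum())                           # missing values per column
df["Age"] = df["Age"].fillna(df["Age"].median())   # fill gaps with the median age
df["Age"] = df["Age"].astype(int)                  # safe now that no NaN remains
df = df.drop_duplicates()                          # drops the repeated (35, 8.05) row

# IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged
q1, q3 = df["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["Fare"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = df[~in_range]
print(outliers)
```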
EDA Checklist
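```python
import seaborn as sns
import matplotlib.pyplot as plt
df.describe()                          # summary stats for numeric columns
df['column'].value_counts()            # frequency of each value
df.corr(numeric_only=True)             # pandas 2.x needs numeric_only when text columns exist
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
```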
Project: Titanic Survival Analysis
Clean data → EDA → answer:
- Survival rate by gender?
- Did age affect survival?
- Fare vs survival?
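All three questions reduce to a groupby on Survived. The sketch below assumes the Kaggle column names Sex, Age, Fare, and Survived; the age bins are an arbitrary choice. It runs on a made-up four-row frame, not real Titanic data.

```python
import pandas as pd

def survival_report(df: pd.DataFrame) -> dict:
    # Survived is 0/1, so a group mean is that group's survival rate
    by_sex = df.groupby("Sex")["Survived"].mean()
    age_groups = pd.cut(df["Age"], bins=[0, 12, 18, 60, 120],
                        labels=["child", "teen", "adult", "senior"])
    by_age = df.groupby(age_groups, observed=True)["Survived"].mean()
    fare_by_outcome = df.groupby("Survived")["Fare"].median()
    return {"by_sex": by_sex, "by_age": by_age, "fare_by_outcome": fare_by_outcome}

# Toy rows, made up for illustration
toy = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Age": [8, 30, 45, 70],
    "Fare": [20.0, 7.9, 80.0, 30.0],
    "Survived": [1, 0, 1, 0],
})
report = survival_report(toy)
print(report["by_sex"])
```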
Week 8: Mini-Project + GitHub
Final Project: Titanic Data Explorer
Deliverables:
1. Jupyter Notebook: titanic_analysis.ipynb
2. Cleaned dataset: titanic_clean.csv
3. GitHub Repo: yourname/titanic-ds
4. README.md with:
- Problem statement
- Key findings (3 bullet points)
- Charts (embed or link)
- How to run
GitHub Setup
```bash
git init
git add .
git commit -m "Titanic EDA complete"
git branch -M main   # rename the branch to main; Git may default to master
git remote add origin https://github.com/yourname/titanic-ds.git
git push -u origin main
```
README Template
````markdown
# Titanic Survival Analysis
## Key Insights
- Women survived at 74% vs men at 19%
- 1st class: 63% survival
- Children (<12) had highest survival
## How to Run
```bash
pip install pandas matplotlib seaborn jupyter
jupyter notebook titanic_analysis.ipynb
```
## Visualizations
(embed or link your charts here)
````

---
## Tools to Install (Week 1)
Anaconda (recommended): https://www.anaconda.com/products/distribution
Or via pip:
```bash
pip install pandas numpy matplotlib seaborn jupyter
```
Daily Learning Template (60 mins)
| Time | Activity |
|---|---|
| 10 min | Review yesterday |
| 30 min | Watch/read new topic |
| 15 min | Code along |
| 5 min | Write notes (Notion/Obsidian) |
Assessment: Can You Do This?
| Task | Yes/No |
|---|---|
| Read CSV into DataFrame | ☐ |
| Filter passengers > 30 years | ☐ |
| Group by class and compute mean fare | ☐ |
| Plot survival rate by gender | ☐ |
| Save cleaned data to new CSV | ☐ |
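The sketch below works through the checklist on a four-row toy CSV. The values are made up, not real Titanic data, and the column names are assumed to follow the Kaggle file. Plotting needs matplotlib, so that step is left as a commented line. to_csv with no path returns the CSV text; pass a filename to write titanic_clean.csv.

```python
import io
import pandas as pd

# Toy CSV text standing in for titanic.csv (values made up)
csv_text = """Name,Sex,Age,Pclass,Fare,Survived
A,female,38,1,71.28,1
B,male,35,3,8.05,0
C,female,26,3,7.93,1
D,male,54,1,51.86,0
"""
df = pd.read_csv(io.StringIO(csv_text))                   # 1. read CSV
over_30 = df[df["Age"] > 30]                              # 2. passengers over 30
mean_fare_by_class = df.groupby("Pclass")["Fare"].mean()  # 3. mean fare per class
rate_by_sex = df.groupby("Sex")["Survived"].mean()        # 4. data behind the gender plot
# rate_by_sex.plot(kind="bar")  # draws the chart (requires matplotlib)
cleaned_csv = df.to_csv(index=False)                      # 5. CSV text; pass a path to save a file
```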
If all Yes → You passed Phase 1!
Next: Phase 2 – Statistics & Math
“Garbage in, garbage out.” Learn why models work.
Free Resources Cheat Sheet
| Resource | Link |
|---|---|
| Automate the Boring Stuff | automatetheboringstuff.com |
| Kaggle Python | kaggle.com/learn/python |
| Kaggle Pandas | kaggle.com/learn/pandas |
| Pandas 10min | pandas.pydata.org/docs/user_guide/10min.html |
| NumPy Quickstart | numpy.org/doc/stable/user/quickstart.html |
Pro Tip: Build a “Cheat Sheet”
Create python_cheat_sheet.md:
```markdown
# Python for DS
## Pandas
df.head() → first 5 rows
df['col'].mean()
df.groupby('cat').size()
```
Update daily.
Start Now:
1. Open a terminal
2. Run jupyter notebook
3. Create week1_day1.ipynb
4. In the first cell, write: print("I will be a Data Scientist")
Tag me on LinkedIn when you push your first repo!
Let’s make Phase 1 legendary.