Statistics & Math

Goal: Master Python fundamentals + core data libraries (Pandas, NumPy) to clean, explore, and analyze real datasets like a pro.

Detailed Phase 1: Python Foundations for Data Science

Week-by-Week Breakdown

Week	Focus	Hours
1	Python Basics	25
2	Control Flow + Functions	25
3	Data Structures Deep Dive	25
4	File I/O + Error Handling	20
5	NumPy Mastery	25
6	Pandas Core	30
7	Data Cleaning & EDA	30
8	Mini-Project + GitHub	20

Week 1: Python Basics

Topics

Topic	Details
Variables	`int`, `float`, `str`, `bool`
Basic Operations	`+ - * / // % **`
Type Conversion	`int()`, `float()`, `str()`
Strings	Indexing, slicing, `.split()`, `.join()`, f-strings
Print & Input	`print()`, `input()`

Practice (Daily)

# Day 1
name = input("Enter name: ")
age = int(input("Age: "))
print(f"Hello {name}, you will be {age + 5} in 5 years!")

Resources

Mini-Task: Build a tip calculator

Input: bill, tip %, people → Output: each person pays $X.XX
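
One way to approach the tip calculator is sketched below. It assumes the tip is entered as a percentage and the bill is split evenly; the function name `split_bill` and the sample values are hypothetical.

```python
def split_bill(bill, tip_percent, people):
    """Return the amount each person pays, tip included."""
    if people <= 0:
        raise ValueError("people must be at least 1")
    total = bill * (1 + tip_percent / 100)
    return total / people


# Fixed example values; swap in float(input(...)) / int(input(...)) for interactive use.
share = split_bill(bill=120.0, tip_percent=15, people=4)
print(f"Each person pays ${share:.2f}")
```

The `:.2f` format spec handles the $X.XX rounding at print time, so the function itself can return the unrounded value.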

Week 2: Control Flow & Functions

Topic	Syntax
`if/elif/else`	`if x > 0: ...`
Loops	`for i in range(10):`, `while x < 5:`
List Comprehensions	`[x**2 for x in range(5)]`
Functions	`def func_name(params):`
`*args`, `**kwargs`	Optional later

Practice

def grade_score(score):
    if score >= 90: return "A"
    elif score >= 80: return "B"
    # ...

Resources

Automate the Boring Stuff – Ch 4–6
Real Python – Functions

Project: FizzBuzz + Prime Checker

Write two functions:
1. fizzbuzz(n) → prints 1 to n, applying the FizzBuzz rules (multiples of 3 → Fizz, of 5 → Buzz, of both → FizzBuzz)
2. is_prime(n) → returns True/False
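
A possible shape for the two functions, assuming the standard FizzBuzz rules (multiples of 3 print "Fizz", multiples of 5 print "Buzz", multiples of both print "FizzBuzz") and simple trial division for the prime check. Treat it as a reference sketch to compare against after your own attempt.

```python
import math


def fizzbuzz(n):
    """Print 1..n, replacing multiples of 3, 5, and 15 with words."""
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)


def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


fizzbuzz(15)
primes_below_20 = [x for x in range(20) if is_prime(x)]
print(primes_below_20)
```

Checking the 15 case first matters: if the `% 3` branch came first, multiples of 15 would print "Fizz" and never reach "FizzBuzz".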

Week 3: Data Structures Deep Dive

Structure	Use Case
`list`	Ordered, mutable
`tuple`	Ordered, immutable; lighter than a list
`dict`	Key-value pairs
`set`	Unique, unordered

Key Methods

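# List
lst = [1, 2, 3]
lst.append(4)   # add to the end -> [1, 2, 3, 4]
lst.pop()       # remove and return the last item -> 4
lst[1:3]        # slice: items at index 1 and 2 -> [2, 3]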

# Dict
d = {"name": "Alex", "age": 25}
d.keys(), d.values(), d.items()

# Set
a = {1,2,3}; b = {3,4,5}; a & b  # intersection

Practice

# Count word frequency
text = "the cat and the dog and the bird"
words = text.split()
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
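
Once the manual loop makes sense, the same count can be done with `collections.Counter` from the standard library. This is an alternative to the loop above, not a replacement for practicing it.

```python
from collections import Counter

text = "the cat and the dog and the bird"
freq = Counter(text.split())

# most_common(k) returns the k highest counts as (word, count) pairs.
top_two = freq.most_common(2)
print(top_two)
```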

Project: To-Do List CLI App

Add, remove, list tasks → save to .txt

Week 4: File Handling + Error Handling

Topic	Code
Read/Write	`with open('file.txt', 'r') as f:`
CSV	`import csv`
JSON	`import json`
Try/Except	`try: ... except ValueError:`

Example: Read CSV

import csv
with open('data.csv', 'r', newline='') as f:  # newline='' is what the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        print(row['name'], row['age'])
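
The JSON and Try/Except rows in the table can be exercised together. A minimal sketch, using a made-up record, that round-trips a dict through JSON text and catches a parse failure:

```python
import json

record = {"name": "Alex", "age": 25}

# dumps/loads convert between Python objects and JSON text.
text = json.dumps(record)
restored = json.loads(text)
print(restored == record)

try:
    json.loads("{not valid json}")
except json.JSONDecodeError as err:  # JSONDecodeError is a subclass of ValueError
    print(f"Could not parse: {err}")
```

Because `json.JSONDecodeError` subclasses `ValueError`, an `except ValueError:` clause would catch it too.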

Resources

Automate the Boring Stuff – Ch 8–9
Real Python – File I/O

Mini-Project: Student Gradebook

Read grades.csv → calculate average → write summary.txt
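
A rough sketch of the gradebook flow. The column names `name` and `grade` and the sample rows are assumptions made for illustration; the script writes its own sample file into a temporary folder so it can run anywhere.

```python
import csv
import tempfile
from pathlib import Path

# Work in a temporary folder so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
grades_path = workdir / "grades.csv"
summary_path = workdir / "summary.txt"

# Sample input; the 'name' and 'grade' columns are assumptions.
grades_path.write_text("name,grade\nAna,90\nBen,75\nCy,n/a\n", encoding="utf-8")

scores = []
with open(grades_path, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        try:
            scores.append(float(row["grade"]))
        except ValueError:
            # Skip rows whose grade is not a number.
            print(f"Skipping {row['name']}: bad grade {row['grade']!r}")

# Guard against an empty list to avoid dividing by zero.
average = sum(scores) / len(scores) if scores else 0.0
summary_path.write_text(
    f"Students graded: {len(scores)}\nAverage: {average:.2f}\n", encoding="utf-8"
)
print(summary_path.read_text(encoding="utf-8"))
```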

Week 5: NumPy – Numerical Python

Concept	Code
Arrays	`np.array([1,2,3])`
Shape	`.shape`, `.reshape()`
Math	`np.mean()`, `np.std()`
Indexing	Boolean, fancy
Broadcasting	`arr + 5`
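
The indexing and broadcasting rows are easier to see on a small array. A short sketch with made-up values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Boolean indexing: keep elements where the condition holds.
big = arr[arr > 25]

# Fancy indexing: pick elements by a list of positions.
picked = arr[[0, 2, 4]]

# Broadcasting: the scalar 5 is applied to every element.
shifted = arr + 5

# Broadcasting a (3, 1) column against a (3,) row gives a (3, 3) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = col + row

print(big, picked, shifted, grid.shape)
```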

Practice

import numpy as np
arr = np.random.randn(1000)
print(f"Mean: {arr.mean():.2f}, Std: {arr.std():.2f}")

Resources

NumPy Official Quickstart
Kaggle: Python Course → NumPy

Task:

Generate 1000 random heights (normal dist: μ=170, σ=10) → find % > 180 cm
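
One way to set up the heights task, assuming NumPy's `default_rng` generator; the seed is arbitrary and only there to make runs repeatable. For a normal distribution, 180 cm is one standard deviation above the mean, so the result should land near 16%.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for repeatability
heights = rng.normal(loc=170, scale=10, size=1000)

# The mean of a boolean mask is the fraction of True values.
pct_over_180 = (heights > 180).mean() * 100
print(f"{pct_over_180:.1f}% of heights exceed 180 cm")
```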

Week 6: Pandas – Data Manipulation

Core Object	Use
`Series`	1D labeled array
`DataFrame`	2D table

Essential Methods

Task	Code
Read CSV	`pd.read_csv('file.csv')`
View	`.head()`, `.info()`, `.describe()`
Select	`df['col']`, `df.loc[]`, `df.iloc[]`
Filter	`df[df['age'] > 30]`
GroupBy	`df.groupby('city').mean(numeric_only=True)`
Merge	`pd.merge(df1, df2, on='id')`
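
The methods in the table can be tried without downloading anything. The sketch below uses a tiny made-up table (the cities, ages, and scores are invented) to show selection, grouping, and merging side by side.

```python
import pandas as pd

# Tiny made-up table, just to show the calls.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
    "age": [25, 41, 37, 29],
})

# .loc selects by label or boolean mask; .iloc selects by position.
over_30 = df.loc[df["age"] > 30, ["id", "age"]]
first_two = df.iloc[:2]

# Selecting the column before aggregating avoids averaging non-numeric columns.
mean_age = df.groupby("city")["age"].mean()

# Merge adds columns from a second table that shares the 'id' key.
scores = pd.DataFrame({"id": [1, 2, 3], "score": [88, 92, 79]})
merged = pd.merge(df, scores, on="id")  # inner join by default

print(over_30, first_two, mean_age, merged, sep="\n\n")
```

Since `pd.merge` defaults to an inner join, the row with `id` 4 drops out of `merged` because it has no match in `scores`.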

Example

import pandas as pd
df = pd.read_csv("titanic.csv")
df = df.dropna(subset=['Age'])             # drop rows with a missing age
adults = df[df['Age'] >= 18]               # passengers aged 18 and over
survival_rate = adults['Survived'].mean()  # mean of a 0/1 column is a rate

Resources

10 Minutes to Pandas
Kaggle: Pandas Course (Free)

Practice Dataset: Titanic

Week 7: Data Cleaning & EDA

Task	Code
Missing Values	`df.isnull().sum()`, `df.fillna()`, `df.dropna()`
Duplicates	`df.duplicated()`, `df.drop_duplicates()`
Outliers	Z-score or IQR method
Type Fix	`df['age'] = df['age'].astype(int)` (fill or drop NaN first, or the cast fails)
New Columns	`df['family_size'] = df['sibsp'] + df['parch'] + 1`
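
The two outlier rules named in the table can be sketched as follows. The fare values are made up, and the cut-offs (|z| > 2, 1.5 × IQR) are common conventions rather than fixed rules; with very small samples a z-score can never get large, so a cut-off of 3 would flag nothing here.

```python
import pandas as pd

# Made-up fares with one obvious extreme value.
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 35.5, 512.33])

# Z-score rule: distance from the mean in standard deviations.
z = (fares - fares.mean()) / fares.std()
z_outliers = fares[z.abs() > 2]

# IQR rule: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = fares.quantile(0.25), fares.quantile(0.75)
iqr = q3 - q1
iqr_outliers = fares[(fares < q1 - 1.5 * iqr) | (fares > q3 + 1.5 * iqr)]

print(z_outliers.tolist(), iqr_outliers.tolist())
```

The IQR rule is less sensitive to the extreme value itself, because quartiles barely move when one point is far out, whereas the mean and standard deviation both do.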

EDA Checklist

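import seaborn as sns

df.describe()
df['column'].value_counts()
df.corr(numeric_only=True)  # skip text columns; pandas 2.x raises on them otherwise
sns.heatmap(df.corr(numeric_only=True), annot=True)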

Project: Titanic Survival Analysis

Clean data → EDA → answer:
- Survival rate by gender?
- Did age affect survival?
- Fare vs survival?
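
Each of the three questions maps onto a `groupby`. The sketch below uses a handful of made-up rows with Kaggle-style column names (`Sex`, `Age`, `Fare`, `Survived` are assumed here) and arbitrary age bins; it is not real Titanic data, so only the pattern carries over.

```python
import pandas as pd

# Made-up rows using Kaggle-style column names; not real Titanic data.
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Age": [29, 40, 8, 22, 65, 35],
    "Fare": [71.3, 8.1, 21.0, 7.9, 26.6, 53.1],
    "Survived": [1, 0, 1, 1, 0, 1],
})

# Survival rate by gender: the mean of a 0/1 column is a rate.
by_sex = df.groupby("Sex")["Survived"].mean()

# Age effect: bucket ages, then compare rates per bucket.
df["AgeBand"] = pd.cut(df["Age"], bins=[0, 12, 18, 60, 100])
by_age = df.groupby("AgeBand", observed=True)["Survived"].mean()

# Fare vs survival: compare the typical fare in each outcome group.
fare_by_outcome = df.groupby("Survived")["Fare"].median()

print(by_sex, by_age, fare_by_outcome, sep="\n\n")
```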

Week 8: Mini-Project + GitHub

Final Project: Titanic Data Explorer

Deliverables:
1. Jupyter Notebook: titanic_analysis.ipynb
2. Cleaned dataset: titanic_clean.csv
3. GitHub Repo: yourname/titanic-ds
4. README.md with:
- Problem statement
- Key findings (3 bullet points)
- Charts (embed or link)
- How to run

GitHub Setup

git init
git add .
git commit -m "Titanic EDA complete"
git branch -M main   # make sure the branch is named main before pushing
git remote add origin https://github.com/yourname/titanic-ds.git
git push -u origin main

README Template

# Titanic Survival Analysis

## Key Insights
- Women survived at 74% vs men at 19%
- 1st class: 63% survival
- Children (<12) had highest survival

## How to Run
```bash
pip install pandas matplotlib seaborn jupyter
jupyter notebook titanic_analysis.ipynb
```

## Visualizations

Survival by Gender

---

## Tools to Install (Week 1)
```bash
# Anaconda (recommended): download from
# https://www.anaconda.com/products/distribution

# Or via pip
pip install pandas numpy matplotlib seaborn jupyter
```

Daily Learning Template (60 mins)

Time	Activity
10 min	Review yesterday
30 min	Watch/read new topic
15 min	Code along
5 min	Write notes (Notion/Obsidian)

Assessment: Can You Do This?

Task	Yes/No
Read CSV into DataFrame	☐
Filter passengers > 30 years	☐
Group by class and compute mean fare	☐
Plot survival rate by gender	☐
Save cleaned data to new CSV	☐

If all Yes → You passed Phase 1!

Next: Phase 2 – Statistics & Math

“Garbage in, garbage out.” Learn why models work.

Free Resources Cheat Sheet

Resource	Link
Automate the Boring Stuff	automatetheboringstuff.com
Kaggle Python	kaggle.com/learn/python
Kaggle Pandas	kaggle.com/learn/pandas
Pandas 10min	pandas.pydata.org/10min
NumPy Quickstart	numpy.org/quickstart

Pro Tip: Build a “Cheat Sheet”

Create python_cheat_sheet.md:

# Python for DS

## Pandas
df.head() → first 5 rows
df['col'].mean()
df.groupby('cat').size()

Update daily.

Start Now:
1. Open terminal
2. jupyter notebook
3. Create week1_day1.ipynb
4. Write: print("I will be a Data Scientist")

Tag me on LinkedIn when you push your first repo!
Let’s make Phase 1 legendary.

Last updated: Nov 09, 2025
