Detailed Phase 1: Python Foundations for Data Science

Goal: Master Python fundamentals + core data libraries (Pandas, NumPy) to clean, explore, and analyze real datasets like a pro.


Week-by-Week Breakdown

| Week | Focus | Hours |
|------|-------|-------|
| 1 | Python Basics | 25 |
| 2 | Control Flow + Functions | 25 |
| 3 | Data Structures Deep Dive | 25 |
| 4 | File I/O + Error Handling | 20 |
| 5 | NumPy Mastery | 25 |
| 6 | Pandas Core | 30 |
| 7 | Data Cleaning & EDA | 30 |
| 8 | Mini-Project + GitHub | 20 |

Week 1: Python Basics

Topics

| Topic | Details |
|-------|---------|
| Variables | `int`, `float`, `str`, `bool` |
| Basic Operations | `+ - * / // % **` |
| Type Conversion | `int()`, `float()`, `str()` |
| Strings | Indexing, slicing, `.split()`, `.join()`, f-strings |
| Print & Input | `print()`, `input()` |

Practice (Daily)

```python
# Day 1
name = input("Enter name: ")
age = int(input("Age: "))
print(f"Hello {name}, you will be {age + 5} in 5 years!")
```

Mini-Task: Build a tip calculator

Input: bill, tip %, people → Output: each person pays $X.XX
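A possible solution sketch (variable names are illustrative, not the only way to do it):

```python
# Tip calculator sketch: split the bill (plus tip) evenly between people
bill = float(input("Bill amount: "))
tip_pct = float(input("Tip %: "))
people = int(input("Number of people: "))

total = bill * (1 + tip_pct / 100)
print(f"Each person pays ${total / people:.2f}")
```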


Week 2: Control Flow & Functions

| Topic | Syntax |
|-------|--------|
| if/elif/else | `if x > 0: ...` |
| Loops | `for i in range(10):`, `while x < 5:` |
| List Comprehensions | `[x**2 for x in range(5)]` |
| Functions | `def func_name(params):` |
| `*args`, `**kwargs` | Optional, revisit later |

Practice

```python
def grade_score(score):
    if score >= 90: return "A"
    elif score >= 80: return "B"
    elif score >= 70: return "C"   # remaining bands assume standard cutoffs
    elif score >= 60: return "D"
    else: return "F"
```

Project: FizzBuzz + Prime Checker

Write two functions:
1. fizzbuzz(n) → prints 1 to n with rules
2. is_prime(n) → returns True/False
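One straightforward way to write both (a sketch, not the only valid approach):

```python
def fizzbuzz(n):
    """Print 1..n, replacing multiples of 3/5/15 with Fizz/Buzz/FizzBuzz."""
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

def is_prime(n):
    """Return True if n is prime (trial division up to sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True
```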


Week 3: Data Structures Deep Dive

| Structure | Use Case |
|-----------|----------|
| `list` | Ordered, mutable |
| `tuple` | Immutable, faster |
| `dict` | Key-value pairs |
| `set` | Unique, unordered |

Key Methods

```python
# List
lst = [1, 2, 3]
lst.append(4)   # add to end
lst.pop()       # remove and return last item
lst[1:3]        # slice -> [2, 3]

# Dict
d = {"name": "Alex", "age": 25}
d.keys(), d.values(), d.items()

# Set
a = {1, 2, 3}; b = {3, 4, 5}
a & b  # intersection -> {3}
```

Practice

```python
# Count word frequency
text = "the cat and the dog and the bird"
words = text.split()
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1   # default to 0 for unseen words
```
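The standard library can do the same tally in one line; `collections.Counter` is worth knowing once the manual loop makes sense:

```python
from collections import Counter

freq = Counter("the cat and the dog and the bird".split())
print(freq.most_common(2))  # [('the', 3), ('and', 2)]
```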

Project: To-Do List CLI App

Add, remove, list tasks → save to .txt
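A minimal sketch of the app (the file name and command set are assumptions you can change):

```python
# To-do CLI sketch: tasks persist in a plain text file, one per line
TASKS_FILE = "tasks.txt"

def load_tasks():
    try:
        with open(TASKS_FILE) as f:
            return [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []   # first run: no file yet

def save_tasks(tasks):
    with open(TASKS_FILE, "w") as f:
        f.write("\n".join(tasks))

tasks = load_tasks()
while True:
    cmd = input("add / remove / list / quit: ").strip().lower()
    if cmd == "add":
        tasks.append(input("Task: "))
    elif cmd == "remove":
        for i, t in enumerate(tasks):
            print(i, t)
        tasks.pop(int(input("Index to remove: ")))
    elif cmd == "list":
        print("\n".join(tasks) or "(empty)")
    elif cmd == "quit":
        save_tasks(tasks)
        break
```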


Week 4: File Handling + Error Handling

| Topic | Code |
|-------|------|
| Read/Write | `with open('file.txt', 'r') as f:` |
| CSV | `import csv` |
| JSON | `import json` |
| Try/Except | `try: ... except ValueError:` |

Example: Read CSV

```python
import csv

with open('data.csv', 'r') as f:
    reader = csv.DictReader(f)   # each row becomes a dict keyed by header
    for row in reader:
        print(row['name'], row['age'])
```

Mini-Project: Student Gradebook

Read grades.csv → calculate average → write summary.txt
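A sketch under one assumption: grades.csv has name and grade columns (adjust to your actual headers):

```python
# Gradebook sketch: read grades.csv, compute the average, write summary.txt
import csv

with open("grades.csv") as f:
    rows = list(csv.DictReader(f))

grades = [float(row["grade"]) for row in rows]   # assumed column name
average = sum(grades) / len(grades)

with open("summary.txt", "w") as f:
    f.write(f"Students: {len(grades)}\n")
    f.write(f"Average grade: {average:.2f}\n")
```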


Week 5: NumPy – Numerical Python

| Concept | Code |
|---------|------|
| Arrays | `np.array([1,2,3])` |
| Shape | `.shape`, `.reshape()` |
| Math | `np.mean()`, `np.std()` |
| Indexing | Boolean, fancy |
| Broadcasting | `arr + 5` |

Practice

```python
import numpy as np

arr = np.random.randn(1000)   # 1000 draws from a standard normal
print(f"Mean: {arr.mean():.2f}, Std: {arr.std():.2f}")
```

Task:

Generate 1000 random heights (normal dist: μ=170, σ=10) → find % > 180 cm
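One way to do it (seeded so results are reproducible):

```python
import numpy as np

rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=1000)  # mu=170, sigma=10
pct_tall = (heights > 180).mean() * 100             # mean of booleans = proportion
print(f"{pct_tall:.1f}% taller than 180 cm")        # expect ~16% (one sigma above the mean)
```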


Week 6: Pandas – Data Manipulation

| Core Object | Use |
|-------------|-----|
| Series | 1D labeled array |
| DataFrame | 2D table |

Essential Methods

| Task | Code |
|------|------|
| Read CSV | `pd.read_csv('file.csv')` |
| View | `.head()`, `.info()`, `.describe()` |
| Select | `df['col']`, `df.loc[]`, `df.iloc[]` |
| Filter | `df[df['age'] > 30]` |
| GroupBy | `df.groupby('city').mean(numeric_only=True)` |
| Merge | `pd.merge(df1, df2, on='id')` |

Example

```python
import pandas as pd

df = pd.read_csv("titanic.csv")
df = df.dropna(subset=['Age'])             # drop rows with missing age
adults = df[df['Age'] >= 18]               # adults = 18 and over
survival_rate = adults['Survived'].mean()  # mean of a 0/1 column = rate
```
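The methods table also lists GroupBy; continuing from the same DataFrame, a quick sketch of how it answers a question filtering alone can't:

```python
# Survival rate per passenger class (Pclass/Survived per the Kaggle Titanic schema)
by_class = df.groupby('Pclass')['Survived'].mean()
print(by_class)
```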

Practice Dataset: Titanic


Week 7: Data Cleaning & EDA

| Task | Code |
|------|------|
| Missing Values | `df.isnull().sum()`, `df.fillna()`, `df.dropna()` |
| Duplicates | `df.duplicated()`, `df.drop_duplicates()` |
| Outliers | Z-score or IQR method (see sketch below) |
| Type Fix | `df['age'] = df['age'].astype(int)` |
| New Columns | `df['family_size'] = df['sibsp'] + df['parch'] + 1` |
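The IQR method in the table deserves a concrete sketch. The usual convention flags values outside 1.5 × IQR beyond the quartiles (the 'fare' column assumes the lowercase schema used above):

```python
# Keep only rows whose fare lies within the 1.5 * IQR fences
q1, q3 = df['fare'].quantile([0.25, 0.75])
iqr = q3 - q1
df_no_outliers = df[df['fare'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```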

EDA Checklist

```python
import seaborn as sns  # needed for the heatmap below

df.describe()
df['column'].value_counts()
df.corr(numeric_only=True)                           # correlations on numeric columns
sns.heatmap(df.corr(numeric_only=True), annot=True)
```

Project: Titanic Survival Analysis

Clean data → EDA → answer:
- Survival rate by gender?
- Did age affect survival?
- Fare vs survival?
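A sketch of how those three answers might look in code (column names follow the lowercase schema used above; Kaggle's CSV capitalizes them, so adjust as needed):

```python
import pandas as pd

df = pd.read_csv("titanic.csv")

print(df.groupby('sex')['survived'].mean())    # survival rate by gender
print(df.groupby('survived')['age'].mean())    # mean age: survivors vs. not
print(df.groupby('survived')['fare'].mean())   # mean fare: survivors vs. not
```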


Week 8: Mini-Project + GitHub

Final Project: Titanic Data Explorer

Deliverables:
1. Jupyter Notebook: titanic_analysis.ipynb
2. Cleaned dataset: titanic_clean.csv
3. GitHub Repo: yourname/titanic-ds
4. README.md with:
   - Problem statement
   - Key findings (3 bullet points)
   - Charts (embed or link)
   - How to run

GitHub Setup

```bash
git init
git add .
git commit -m "Titanic EDA complete"
git remote add origin https://github.com/yourname/titanic-ds.git
git push -u origin main
```

README Template

# Titanic Survival Analysis

## Key Insights
- Women survived at 74% vs men at 19%
- 1st class: 63% survival
- Children (<12) had highest survival

## How to Run
```bash
pip install pandas matplotlib seaborn
jupyter notebook titanic_analysis.ipynb
```

## Visualizations

Survival by Gender (embed your chart here)

---

Tools to Install (Week 1)

```bash
# Anaconda (recommended), download from:
# https://www.anaconda.com/products/distribution

# Or install via pip:
pip install pandas numpy matplotlib seaborn jupyter
```

Daily Learning Template (60 mins)

| Time | Activity |
|------|----------|
| 10 min | Review yesterday |
| 30 min | Watch/read new topic |
| 15 min | Code along |
| 5 min | Write notes (Notion/Obsidian) |

Assessment: Can You Do This?

| Task | Yes/No |
|------|--------|
| Read CSV into DataFrame | |
| Filter passengers > 30 years | |
| Group by class and compute mean fare | |
| Plot survival rate by gender | |
| Save cleaned data to new CSV | |

If all Yes → You passed Phase 1!


Next: Phase 2 – Statistics & Math

“Garbage in, garbage out.” Learn why models work.


Free Resources Cheat Sheet

| Resource | Link |
|----------|------|
| Automate the Boring Stuff | automatetheboringstuff.com |
| Kaggle Python | kaggle.com/learn/python |
| Kaggle Pandas | kaggle.com/learn/pandas |
| 10 Minutes to pandas | pandas.pydata.org/docs/user_guide/10min.html |
| NumPy Quickstart | numpy.org/doc/stable/user/quickstart.html |

Pro Tip: Build a “Cheat Sheet”

Create python_cheat_sheet.md:

```markdown
# Python for DS

## Pandas
df.head() → first 5 rows
df['col'].mean()
df.groupby('cat').size()
```

Update daily.


Start Now:
1. Open a terminal
2. Run `jupyter notebook`
3. Create `week1_day1.ipynb`
4. Write: `print("I will be a Data Scientist")`

Tag me on LinkedIn when you push your first repo!
Let’s make Phase 1 legendary.

Last updated: Nov 09, 2025
