Unit I: Neural Networks – I
Comprehensive Understanding Notes + In-Depth Explanation + Best Python Code Examples (2025 Standards)
Goal: After reading this, you should be able to explain every concept to a friend and implement it from scratch in Python (NumPy/PyTorch).
1. Neuron, Nerve Structure and Synapse (Biological Motivation)
| Biological Neuron | Artificial Neuron (McCulloch-Pitts / Modern) |
|---|---|
| Dendrites → receive signals | Input vector x₁, x₂, …, xₙ |
| Cell body (soma) → integrates | Weighted sum + bias |
| Axon → transmits signal | Output after activation function |
| Synapse → connection with weight | Learnable weights w₁, w₂, …, wₙ and bias b |
Key point: Strength of synapse = synaptic weight (can be excitatory >0 or inhibitory <0).
2. Artificial Neuron Model (Mathematical Form)
Single artificial neuron output:
a = f ( ∑(i=1 to n) wᵢ xᵢ + b ) = f (w · x + b)
where f(.) = activation function.
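Worked example (numbers chosen purely for illustration): take w = (0.5, −0.3), x = (1, 2), b = 0.1 and a sigmoid activation. Then z = 0.5·1 + (−0.3)·2 + 0.1 = 0, so a = σ(0) = 0.5.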
3. Activation Functions (Most Important Ones in 2025)
| Function | Formula | Range | Use Case | Derivative (for backprop) |
|---|---|---|---|---|
| Step (Heaviside) | 0 if z≤0, 1 if z>0 | {0,1} | Original Perceptron | Not differentiable |
| Sigmoid (Logistic) | σ(z) = 1/(1+e⁻ᶻ) | (0,1) | Binary classification (old) | σ(1-σ) |
| Tanh | tanh(z) = (eᶻ - e⁻ᶻ)/(eᶻ + e⁻ᶻ) | (-1,1) | Hidden layers (zero-centered) | 1 - tanh²(z) |
| ReLU | max(0,z) | [0,∞) | Default today (2025) | 0 if z<0, 1 otherwise |
| Leaky ReLU | max(αz, z) (α=0.01) | (-∞,∞) | Fixes dying ReLU | α if z<0 else 1 |
| GELU | 0.5z(1 + erf(z/√2)) | ≈(-0.17, ∞) | Transformers (BERT, GPT) | Φ(z) + z·φ(z) (smooth) |
| Swish / SiLU | z ⋅ sigmoid(z) | ≈(-0.28, ∞) | Often beats ReLU | σ(z) + z·σ(z)(1 − σ(z)) |
Best modern choice (2025):
- Hidden layers → GELU or Swish (Transformers)
- CNNs → ReLU or Mish
- Simple feed-forward → ReLU is still perfectly fine
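As a quick sanity check on the GELU and Swish rows above, here is a minimal NumPy sketch; it assumes SciPy is available for erf, and the helper names are my own:
import numpy as np
from scipy.special import erf   # assumes SciPy is installed

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gelu(z):
    # exact GELU: z * Phi(z), written with erf as in the table above
    return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))

def swish(z):
    # Swish / SiLU: z * sigmoid(z)
    return z * sigmoid(z)

z = np.linspace(-4, 4, 9)
print("z    :", z)
print("ReLU :", np.maximum(0, z))
print("GELU :", np.round(gelu(z), 3))
print("Swish:", np.round(swish(z), 3))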
4. Neural Network Architectures
4.1 Single Layer Feed-Forward Network (Perceptron)
- Only input → output (no hidden layer)
- Can only solve linearly separable problems
4.2 Multi-Layer Feed-Forward Network (MLP)
- Input → One or more hidden layers → Output
- Universal approximator (Cybenko 1989)
- Fully connected (dense) layers
4.3 Recurrent Networks (RNN, LSTM, GRU)
- Have loops → memory of previous inputs
- Used for sequences (text, time-series, speech); a minimal recurrence sketch follows below
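Recurrent architectures are treated in detail later, but the core recurrence hₜ = tanh(xₜ·W_xh + hₜ₋₁·W_hh + b) is easy to sketch in NumPy; the sizes and data below are purely illustrative:
import numpy as np

# Minimal vanilla RNN cell (sketch): h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
# Sizes below (3 inputs, 4 hidden units) are illustrative, not from the notes.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

# Run over a short random sequence; the hidden state carries memory forward.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h = rnn_step(x_t, h)
print("final hidden state:", np.round(h, 3))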
5. Learning Techniques
| Type | Description | Example Algorithm |
|---|---|---|
| Supervised | Input + correct output given | Backpropagation |
| Unsupervised | Only input, find patterns | Autoencoders, Hebbian |
| Reinforcement | Reward signal | Not covered in this unit |
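The Hebbian rule from the table ("neurons that fire together wire together") is Δw = η·y·x with y = w·x and no target signal; a one-step sketch with made-up numbers:
import numpy as np

# Hebbian rule (unsupervised, no target): delta_w = eta * y * x, where y = w . x
# The numbers here are illustrative only.
x = np.array([1.0, 0.5])
w = np.array([0.2, -0.1])
eta = 0.1
y = w @ x                      # neuron output for this input
w = w + eta * y * x            # strengthen weights on co-active inputs
print("updated weights:", w)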
6. Perceptron Learning Rule & Convergence
Rosenblatt’s Perceptron (1962)
Update rule (when mistake):
w(new) = w(old) + η (y − ŷ) x
b(new) = b(old) + η (y − ŷ)
where η = learning rate, y = true label, ŷ = prediction (the rule works with either 0/1 or ±1 label encodings)
Convergence Theorem: If data is linearly separable, perceptron will converge in finite steps.
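Worked update step (hypothetical numbers): suppose η = 0.1, x = (1, 0), the true label is y = 1 but the current prediction is ŷ = 0. Then (y − ŷ) = 1, so w ← w + 0.1·(1, 0) (only the weight on the active input grows) and b ← b + 0.1. If the prediction had been correct, (y − ŷ) = 0 and nothing would change.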
7. Auto-Associative vs Hetero-Associative Memory
| Type | Input = Key | Output | Example Network |
|---|---|---|---|
| Auto-Associative | Pattern itself | Same pattern (denoising/completion) | Hopfield Network, Denoising Autoencoder |
| Hetero-Associative | Pattern A | Different pattern B | BAM, Sequence memory |
The Hopfield Network (auto-associative) is the classic example in this unit.
Best Code Examples (From Scratch with NumPy)
Example 1: Single Artificial Neuron from Scratch (NumPy)
import numpy as np

class Neuron:
    """A single artificial neuron: a = f(w . x + b)."""
    def __init__(self, n_inputs, activation='relu'):
        self.W = np.random.randn(n_inputs) * 0.01   # small random weights
        self.b = np.zeros(1)                        # bias
        self.activation = activation

    def forward(self, X):
        z = X @ self.W + self.b                     # weighted sum + bias
        return self.activate(z)

    def activate(self, z):
        if self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-z))
        elif self.activation == 'tanh':
            return np.tanh(z)
        elif self.activation == 'relu':
            return np.maximum(0, z)
        elif self.activation == 'leaky_relu':
            return np.where(z > 0, z, z * 0.01)
        elif self.activation == 'swish':
            return z / (1 + np.exp(-z))             # z * sigmoid(z)
        raise ValueError(f"Unknown activation: {self.activation}")
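A quick usage sketch (the input values and sizes are arbitrary):
# Usage sketch: 3 inputs, ReLU activation; the input values are arbitrary
np.random.seed(0)
neuron = Neuron(n_inputs=3, activation='relu')
x = np.array([0.5, -1.2, 3.0])
print("output:", neuron.forward(x))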
Example 2: Perceptron Learning on AND Gate (Convergence Demo)
import numpy as np

# AND dataset
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 0, 0, 1])   # labels 0 or 1 (AND truth table)

class Perceptron:
    def __init__(self, lr=0.1, epochs=1000):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        self.W = np.zeros(X.shape[1])
        self.b = 0.0
        self.errors = []                    # mistakes per epoch (to watch convergence)
        for epoch in range(self.epochs):
            error_count = 0
            for xi, target in zip(X, y):
                z = np.dot(xi, self.W) + self.b
                prediction = 1 if z >= 0 else 0
                update = self.lr * (target - prediction)   # zero when prediction is correct
                self.W += update * xi
                self.b += update
                error_count += int(update != 0.0)
            self.errors.append(error_count)
            if error_count == 0:            # a full pass with no mistakes => converged
                print(f"Converged at epoch {epoch}")
                break
        return self

    def predict(self, X):
        return np.where((X @ self.W + self.b) >= 0, 1, 0)

# Train
p = Perceptron(lr=0.1).fit(X, y)
print("Learned weights:", p.W, "bias:", p.b)
print("Predictions:", p.predict(X))
Example 3: Multilayer Perceptron from Scratch (Backpropagation)
import numpy as np

class MLP:
    def __init__(self, layers=[2, 4, 1], activation='sigmoid'):
        self.layers = layers
        self.activation = activation
        self.W = []
        self.B = []
        for i in range(len(layers)-1):
            self.W.append(np.random.randn(layers[i], layers[i+1]) * 0.5)
            self.B.append(np.zeros((1, layers[i+1])))

    def sigmoid(self, z): return 1/(1+np.exp(-z))
    def sigmoid_prime(self, a): return a*(1-a)              # expects a = sigmoid(z), not z
    def relu(self, z): return np.maximum(0, z)
    def relu_prime(self, a): return (a > 0).astype(float)   # also valid on a = relu(z)

    def forward(self, X):
        a = X
        self.activations = [a]    # cache layer outputs for backprop
        self.zs = []
        for w, b in zip(self.W, self.B):
            z = a @ w + b
            self.zs.append(z)
            a = self.sigmoid(z) if self.activation == 'sigmoid' else self.relu(z)
            self.activations.append(a)
        return a

    def backward(self, X, y, lr=0.1):
        m = X.shape[0]
        self.forward(X)           # forward pass to populate the caches
        prime = self.sigmoid_prime if self.activation == 'sigmoid' else self.relu_prime
        # output-layer error (squared-error loss)
        delta = (self.activations[-1] - y) * prime(self.activations[-1])
        grads_W, grads_B = [], []
        # walk backwards through the layers, computing every gradient with the current weights
        for l in range(1, len(self.layers)):
            grads_W.append(self.activations[-l-1].T @ delta / m)
            grads_B.append(np.sum(delta, axis=0, keepdims=True) / m)
            if l < len(self.layers) - 1:
                delta = (delta @ self.W[-l].T) * prime(self.activations[-l-1])
        # apply the updates only after all gradients are computed
        for l in range(1, len(self.layers)):
            self.W[-l] -= lr * grads_W[l-1]
            self.B[-l] -= lr * grads_B[l-1]
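A minimal XOR training sketch using this class (the hyperparameters are my own choice; depending on the random initialization it may need more epochs or a re-run to reach clean 0/1 outputs):
# Quick XOR demo (sketch)
X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

np.random.seed(0)
mlp = MLP(layers=[2, 4, 1], activation='sigmoid')
for epoch in range(20000):
    mlp.backward(X, y, lr=1.0)

print("XOR predictions:", mlp.forward(X).round(3).ravel())   # should approach 0, 1, 1, 0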
Example 4: Simple Auto-Associative Memory – Hopfield Network (Classic)
import numpy as np

class HopfieldNetwork:
    def __init__(self, n_neurons):
        self.n = n_neurons
        self.W = np.zeros((n_neurons, n_neurons))

    def train(self, patterns):
        # Hebbian storage: patterns is a list of ±1 vectors of length n
        for p in patterns:
            p = p.reshape(-1, 1)
            self.W += p @ p.T
        np.fill_diagonal(self.W, 0)      # no self-connections
        self.W /= len(patterns)

    def predict(self, pattern, steps=10):
        # asynchronous recall: neurons are updated one at a time in random order
        s = pattern.copy()
        for _ in range(steps):
            for i in np.random.permutation(self.n):
                s[i] = 1 if (self.W[i] @ s) > 0 else -1
        return s

# Example: store two patterns
p1 = np.array([1, 1, -1, -1])
p2 = np.array([-1, -1, 1, 1])
net = HopfieldNetwork(4)
net.train([p1, p2])
noisy = np.array([1, -1, -1, -1])    # p1 with one bit flipped
recovered = net.predict(noisy)
print("Noisy:    ", noisy)
print("Recovered:", recovered)       # should be p1
Key Takeaway Summary Table
| Concept | Key Idea | Can Solve | Classic Algorithm/Example |
|---|---|---|---|
| Single Neuron | Weighted sum + activation | Linear decision | McCulloch-Pitts |
| Perceptron | Learns linear separator | AND, OR, NOT | Rosenblatt 1962 |
| MLP + Backprop | Universal approximator with hidden layers | XOR, nonlinear data | Rumelhart 1986 |
| Recurrent Networks | Loops → memory | Sequences | Elman, LSTM, GRU |
| Auto-associative memory | Network recalls complete pattern from partial | Denoising, pattern completion | Hopfield Network |
These notes + code will give you 100% conceptual clarity and practical implementation ability for Unit I. Practice by implementing XOR with an MLP from scratch — it’s the classic test that a single perceptron fails and an MLP passes.
Happy learning! 🚀