Unit I: Neural Networks – I

Comprehensive Understanding Notes + In-Depth Explanation + Best Python Code Examples (2025 Standards)

Goal: After reading this, you should be able to explain every concept to a friend and implement it from scratch in Python (NumPy/PyTorch).

1. Neuron, Nerve Structure and Synapse (Biological Motivation)

| Biological Neuron | Artificial Neuron (McCulloch-Pitts / Modern) |
|---|---|
| Dendrites → receive signals | Input vector x₁, x₂, …, xₙ |
| Cell body (soma) → integrates | Weighted sum + bias |
| Axon → transmits signal | Output after activation function |
| Synapse → connection with weight | Learnable weights w₁, w₂, …, wₙ and bias b |

Key point: Strength of synapse = synaptic weight (can be excitatory >0 or inhibitory <0).

2. Artificial Neuron Model (Mathematical Form)

Single artificial neuron output:

a = f ( ∑(i=1 to n) wᵢ xᵢ + b ) = f (w · x + b)

where f(.) = activation function.
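
Quick worked example (numbers chosen purely for illustration): with w = (0.5, −0.3), x = (1, 2), b = 0.1 and sigmoid activation,
z = 0.5·1 + (−0.3)·2 + 0.1 = 0.0, so a = σ(0.0) = 0.5.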

3. Activation Functions (Most Important Ones in 2025)

| Function | Formula | Range | Use Case | Derivative (for backprop) |
|---|---|---|---|---|
| Step (Heaviside) | 0 if z ≤ 0, 1 if z > 0 | {0, 1} | Original Perceptron | Not differentiable at 0 (0 elsewhere) |
| Sigmoid (Logistic) | σ(z) = 1/(1 + e⁻ᶻ) | (0, 1) | Binary classification outputs (legacy choice for hidden layers) | σ(z)(1 − σ(z)) |
| Tanh | tanh(z) = (eᶻ − e⁻ᶻ)/(eᶻ + e⁻ᶻ) | (−1, 1) | Hidden layers (zero-centered) | 1 − tanh²(z) |
| ReLU | max(0, z) | [0, ∞) | Default today (2025) | 0 if z < 0, 1 otherwise |
| Leaky ReLU | max(αz, z), α ≈ 0.01 | (−∞, ∞) | Fixes dying ReLU | α if z < 0, else 1 |
| GELU (used in BERT, GPT) | 0.5z(1 + erf(z/√2)) = z·Φ(z) | ≈ (−0.17, ∞) | Transformers | Φ(z) + z·φ(z) (smooth) |
| Swish / SiLU | z·σ(z) | ≈ (−0.28, ∞) | Often beats ReLU | σ(z) + z·σ(z)(1 − σ(z)) |

(Here σ is the logistic sigmoid; Φ and φ are the standard normal CDF and PDF.)

Best modern choice (2025):
- Hidden layers → GELU or Swish (Transformers)
- CNNs → ReLU or Mish
- Simple feed-forward → ReLU is still perfectly fine
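
A minimal NumPy sketch of GELU and SiLU alongside ReLU, assuming SciPy is available for erf (a common tanh-based approximation of GELU also exists); the test values are arbitrary:

import numpy as np
from scipy.special import erf   # exact Gaussian error function

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gelu(z):
    # GELU(z) = 0.5 * z * (1 + erf(z / sqrt(2))) = z * Phi(z)
    return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))

def silu(z):
    # Swish / SiLU: z * sigmoid(z)
    return z * sigmoid(z)

z = np.linspace(-3.0, 3.0, 7)
for name, fn in [("relu", lambda v: np.maximum(0.0, v)), ("gelu", gelu), ("silu", silu)]:
    print(f"{name:5s}", np.round(fn(z), 3))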

4. Neural Network Architectures

4.1 Single Layer Feed-Forward Network (Perceptron)

  • Only input → output (no hidden layer)
  • Can only solve linearly separable problems

4.2 Multi-Layer Feed-Forward Network (MLP)

  • Input → One or more hidden layers → Output
  • Universal approximator (Cybenko 1989)
  • Fully connected (dense) layers

4.3 Recurrent Networks (RNN, LSTM, GRU)

  • Have loops → memory of previous inputs
  • Used for sequences (text, time-series, speech); a minimal recurrent step is sketched below
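
A minimal sketch of a vanilla (Elman-style) recurrent step in NumPy, with made-up sizes and random weights, just to make the "loop → memory" idea concrete:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
Wx = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights (the "loop")
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    # New hidden state mixes the current input with the previous state
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):   # a toy sequence of 5 time steps
    h = rnn_step(x_t, h)
print("final hidden state:", np.round(h, 3))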

5. Learning Techniques

| Type | Description | Example Algorithm |
|---|---|---|
| Supervised | Input + correct output given | Backpropagation |
| Unsupervised | Only input; find patterns | Autoencoders, Hebbian learning |
| Reinforcement | Reward signal only | Not covered in this unit |

6. Perceptron Learning Rule & Convergence

Rosenblatt’s Perceptron (1962)

Update rule (applied only when the prediction is wrong):
w(new) = w(old) + η (y − ŷ) x
b(new) = b(old) + η (y − ŷ)

where η = learning rate, y = true label, ŷ = prediction (the rule works with either {0, 1} or {−1, +1} labels)

Convergence Theorem: If the data is linearly separable, the perceptron converges in a finite number of updates.
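
One worked update with made-up numbers (0/1 labels, η = 0.1): suppose w = (0.2, −0.5), b = 0, and the sample x = (1, 1) has true label y = 1. Then z = 0.2·1 + (−0.5)·1 + 0 = −0.3 < 0, so ŷ = 0 (a mistake), and the update gives
w(new) = (0.2, −0.5) + 0.1·(1 − 0)·(1, 1) = (0.3, −0.4), b(new) = 0 + 0.1·(1 − 0) = 0.1.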

7. Auto-Associative vs Hetero-Associative Memory

| Type | Input (Key) | Output | Example Network |
|---|---|---|---|
| Auto-Associative | Pattern itself | Same pattern (denoising / completion) | Hopfield Network, Denoising Autoencoder |
| Hetero-Associative | Pattern A | Different pattern B | BAM (Bidirectional Associative Memory), sequence memory |

The Hopfield Network (auto-associative) is the classic example in this unit (implemented in Example 4); a minimal hetero-associative sketch is given below.
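
For contrast, a minimal hetero-associative sketch in the spirit of BAM, assuming bipolar (±1) patterns and the Hebbian outer-product rule; the key/value patterns here are made up for illustration:

import numpy as np

# Pairs to associate: key a_k (length 4) → value b_k (length 3), all bipolar ±1
A = np.array([[ 1,  1, -1, -1],
              [-1,  1,  1, -1]])
B = np.array([[ 1, -1,  1],
              [-1,  1,  1]])

W = sum(a[:, None] @ b[None, :] for a, b in zip(A, B))   # Hebbian outer-product weights (4 x 3)

def recall(a):
    # Forward direction: key → value (sign of the weighted sum)
    return np.where(a @ W >= 0, 1, -1)

for a, b in zip(A, B):
    print(a, "->", recall(a), "(target", b, ")")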

Best Code Examples (From Scratch + PyTorch)

Example 1: Single Artificial Neuron from Scratch (NumPy)

import numpy as np

class Neuron:
    def __init__(self, n_inputs, activation='relu'):
        # Small random weights and zero bias
        self.W = np.random.randn(n_inputs) * 0.01
        self.b = np.zeros(1)
        self.activation = activation

    def forward(self, X):
        # Weighted sum plus bias, then the chosen nonlinearity
        z = X @ self.W + self.b
        return self.activate(z)

    def activate(self, z):
        if self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-z))
        elif self.activation == 'tanh':
            return np.tanh(z)
        elif self.activation == 'relu':
            return np.maximum(0, z)
        elif self.activation == 'leaky_relu':
            return np.where(z > 0, z, z * 0.01)
        elif self.activation == 'swish':
            return z / (1 + np.exp(-z))   # z * sigmoid(z)
        else:
            raise ValueError(f"Unknown activation: {self.activation}")
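
Quick usage check with random inputs (sizes chosen arbitrarily):

X = np.random.randn(5, 3)                      # 5 samples, 3 features
neuron = Neuron(n_inputs=3, activation='swish')
print(neuron.forward(X).shape)                 # (5,): one activation value per sample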

Example 2: Perceptron Learning on AND Gate (Convergence Demo)

import numpy as np

# AND dataset
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 0, 0, 1])       # binary labels in {0, 1}

# Perceptron class
class Perceptron:
    def __init__(self, lr=0.1, epochs=1000):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        self.W = np.zeros(X.shape[1])
        self.b = 0
        self.errors = []

        for epoch in range(self.epochs):
            error_count = 0
            for xi, target in zip(X, y):
                z = np.dot(xi, self.W) + self.b
                prediction = 1 if z >= 0 else 0
                update = self.lr * (target - prediction)
                self.W += update * xi
                self.b += update
                error_count += int(update != 0.0)
            self.errors.append(error_count)
            if error_count == 0:
                print(f"Converged at epoch {epoch}")
                break
        return self

    def predict(self, X):
        return np.where((X @ self.W + self.b) >= 0, 1, 0)

# Train
p = Perceptron(lr=0.1).fit(X, y)
print("Learned weights:", p.W, "bias:", p.b)
print("Predictions:", p.predict(X))

Example 3: Multilayer Perceptron from Scratch (Backpropagation)

import numpy as np

class MLP:
    def __init__(self, layers=[2, 4, 1], activation='sigmoid'):
        self.layers = layers
        self.activation = activation
        self.W = []
        self.B = []
        for i in range(len(layers)-1):
            self.W.append(np.random.randn(layers[i], layers[i+1]) * 0.5)
            self.B.append(np.zeros((1, layers[i+1])))

    # Note: the *_prime helpers take the layer's activation output a (not the pre-activation z)
    def sigmoid(self, z): return 1/(1+np.exp(-z))
    def sigmoid_prime(self, a): return a*(1-a)
    def relu(self, z): return np.maximum(0, z)
    def relu_prime(self, a): return (a > 0).astype(float)

    def forward(self, X):
        a = X
        self.activations = [a]
        self.zs = []
        for w, b in zip(self.W, self.B):
            z = a @ w + b
            self.zs.append(z)
            if self.activation == 'sigmoid':
                a = self.sigmoid(z)
            else:
                a = self.relu(z)
            self.activations.append(a)
        return a

    def backward(self, X, y, lr=0.1):
        m = X.shape[0]
        # Forward pass again to populate the activation caches
        self.forward(X)
        # Output-layer error: gradient of mean squared error times the activation derivative
        if self.activation == 'sigmoid':
            delta = (self.activations[-1] - y) * self.sigmoid_prime(self.activations[-1])
        else:
            delta = (self.activations[-1] - y) * self.relu_prime(self.activations[-1])

        dW = self.activations[-2].T @ delta / m
        dB = np.sum(delta, axis=0, keepdims=True) / m

        self.W[-1] -= lr * dW
        self.B[-1] -= lr * dB

        # backprop through layers
        for l in range(2, len(self.layers)):
            if self.activation == 'sigmoid':
                delta = (delta @ self.W[-l+1].T) * self.sigmoid_prime(self.activations[-l])
            else:
                delta = (delta @ self.W[-l+1].T) * self.relu_prime(self.activations[-l])
            dW = self.activations[-l-1].T @ delta / m
            dB = np.sum(delta, axis=0, keepdims=True) / m
            self.W[-l] -= lr * dW
            self.B[-l] -= lr * dB
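
A short training loop for XOR using the class above; the learning rate, epoch count, and seed are just reasonable guesses, and results vary with the random initialization (re-seed or train longer if it gets stuck):

X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

np.random.seed(42)
mlp = MLP(layers=[2, 4, 1], activation='sigmoid')
for epoch in range(20000):
    mlp.backward(X, y, lr=0.5)   # full-batch gradient step

print(np.round(mlp.forward(X), 3))   # should approach [[0], [1], [1], [0]]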

Example 4: Simple Auto-Associative Memory – Hopfield Network (Classic)

import numpy as np

class HopfieldNetwork:
    def __init__(self, n_neurons):
        self.n = n_neurons
        self.W = np.zeros((n_neurons, n_neurons))

    def train(self, patterns):
        # Hebbian outer-product rule; patterns are bipolar (±1) vectors of length n
        for p in patterns:
            p = p.reshape(-1, 1)
            self.W += p @ p.T
        np.fill_diagonal(self.W, 0)   # no self-connections
        self.W /= len(patterns)

    def predict(self, pattern, steps=10):
        # Asynchronous recall: update one neuron at a time, in random order, for a few sweeps
        s = pattern.copy()
        for _ in range(steps):
            for i in np.random.permutation(self.n):
                s[i] = 1 if (self.W[i] @ s) > 0 else -1
        return s

# Example: store two patterns
p1 = np.array([1, 1, -1, -1])
p2 = np.array([-1, -1, 1, 1])
net = HopfieldNetwork(4)
net.train([p1, p2])

noisy = np.array([1, -1, -1, -1])
recovered = net.predict(noisy)
print("Noisy:", noisy)
print("Recovered:", recovered)  # should be p1

Key Takeaway Summary Table

| Concept | Key Idea | Can Solve | Classic Algorithm/Example |
|---|---|---|---|
| Single Neuron | Weighted sum + activation | Linear decision | McCulloch-Pitts |
| Perceptron | Learns a linear separator | AND, OR, NOT | Rosenblatt 1962 |
| MLP + Backprop | Universal approximator with hidden layers | XOR, nonlinear data | Rumelhart 1986 |
| Recurrent Networks | Loops → memory | Sequences | Elman, LSTM, GRU |
| Auto-associative memory | Recalls a complete pattern from a partial one | Denoising, pattern completion | Hopfield Network |

These notes + code should give you full conceptual clarity and practical implementation ability for Unit I. Practice by implementing XOR with an MLP from scratch (a training loop is sketched after Example 3); it's the classic test that a single perceptron fails and an MLP passes.

Happy learning! 🚀

Last updated: Nov 30, 2025
