Mathematics is foundational to understanding how neural networks work. Several key mathematical functions and concepts play a critical role in their design and functionality. Here are some of the most important ones:
- Linear Functions
- Weighting and Bias: In a simple neuron, the output is a weighted sum of the inputs plus a bias term. The formula is:
  z = Σ(w_i * x_i) + b
  where x_i are the inputs, w_i are the weights, and b is the bias. This is a linear transformation of the inputs.
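As a concrete illustration, here is a minimal NumPy sketch of the weighted-sum-plus-bias computation; the input, weight, and bias values are arbitrary and chosen only for this example.

```python
import numpy as np

# Example inputs, weights, and bias (illustrative values only)
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.8, 0.1, -0.4])   # weights w_i
b = 0.25                         # bias b

# z = Σ(w_i * x_i) + b — a linear transformation of the inputs
z = np.dot(w, x) + b
print(z)  # 0.4 - 0.12 - 1.2 + 0.25 ≈ -0.67
```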
- Activation Functions
Neural networks introduce non-linearity through activation functions, which help the model capture complex patterns in data. Some common activation functions include the following (a short code sketch appears after the list):
- Sigmoid: Maps input values to the range (0, 1). The formula is:
  σ(z) = 1 / (1 + e^-z)
- ReLU (Rectified Linear Unit): Outputs the input directly if it's positive; otherwise, it outputs zero:
  f(z) = max(0, z)
- Tanh: Similar to sigmoid but outputs values between -1 and 1. The formula is:
  tanh(z) = (e^z - e^-z) / (e^z + e^-z)
- Softmax: Converts a vector of values into probabilities that sum to 1. It's mainly used for multi-class classification problems:
  softmax(z_i) = e^(z_i) / Σ(e^(z_j))
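The sketch below implements the four activation functions listed above directly from their formulas; subtracting the maximum inside softmax is a common numerical-stability tweak, not part of the definition.

```python
import numpy as np

def sigmoid(z):
    # σ(z) = 1 / (1 + e^-z), maps values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # f(z) = max(0, z), zeroes out negative inputs
    return np.maximum(0.0, z)

def tanh(z):
    # tanh(z) = (e^z - e^-z) / (e^z + e^-z), maps values into (-1, 1)
    return np.tanh(z)

def softmax(z):
    # softmax(z_i) = e^(z_i) / Σ(e^(z_j)); subtracting max(z) avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")
```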
- Cost/Loss Functions
The cost function quantifies how far off the network's predictions are from the actual labels. Some commonly used loss functions are listed below (with a code sketch after the list):
- Mean Squared Error (MSE): Used for regression tasks:
  L(y, ŷ) = (1/n) Σ(y - ŷ)^2
- Cross-Entropy Loss: Used for classification tasks:
  L(y, ŷ) = -Σ y * log(ŷ)
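Here is a minimal NumPy sketch of both losses. It assumes y and y_hat are same-shaped arrays; the cross-entropy version expects one-hot labels and predicted probabilities, and the small eps guard against log(0) is an implementation convenience added here.

```python
import numpy as np

def mse(y, y_hat):
    # L(y, ŷ) = (1/n) Σ(y - ŷ)^2
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # L(y, ŷ) = -Σ y * log(ŷ); eps avoids taking log(0)
    return -np.sum(y * np.log(y_hat + eps))

# Regression example
print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))                 # 0.625

# Classification example: one-hot label vs. predicted probabilities
print(cross_entropy(np.array([0, 1, 0]), np.array([0.1, 0.7, 0.2])))   # ≈ 0.357
```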
- Gradient Descent and Backpropagation
Neural networks learn by minimizing the cost function using optimization algorithms like gradient descent. Backpropagation is used to compute the gradient of the cost function with respect to the weights, allowing the network to update its weights in the direction that reduces the error. Mathematically (see the sketch after this list):
- Gradient Descent: Updates weights w using the gradient of the loss function:
  w = w - η * ∂L/∂w
  where η is the learning rate.
- Backpropagation: This algorithm applies the chain rule of calculus to propagate the error backward through the network to update each layer's weights efficiently.
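The sketch below performs one gradient-descent step for a single linear neuron with a squared-error loss, with ∂L/∂w worked out by the chain rule; it is a one-layer stand-in for full backpropagation, and the values (including the learning rate of 0.1) are purely illustrative.

```python
import numpy as np

# One linear neuron trained on a single example (illustrative values)
x = np.array([1.0, 2.0])      # inputs
w = np.array([0.5, -0.3])     # weights
b = 0.0                       # bias
y = 1.0                       # target
eta = 0.1                     # learning rate η (arbitrary choice)

# Forward pass: z = Σ(w_i * x_i) + b, loss L = (y - z)^2
z = np.dot(w, x) + b
loss = (y - z) ** 2

# Backward pass (chain rule): ∂L/∂z = -2(y - z), ∂z/∂w = x, so ∂L/∂w = -2(y - z) * x
grad_w = -2.0 * (y - z) * x
grad_b = -2.0 * (y - z)

# Gradient-descent update: w = w - η * ∂L/∂w
w = w - eta * grad_w
b = b - eta * grad_b
print(loss, w, b)
```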
- Convolution (for CNNs)
In Convolutional Neural Networks (CNNs), convolution is used to detect features in images by applying filters (kernels) to input data. The convolution operation is defined as:
(f * g)(t) = ∫ f(τ) * g(t - τ) dτ
In practice, for discrete data like images, it’s often represented as a sum rather than an integral.
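As a small illustration of the discrete case, the sketch below convolves a 1D signal with a difference (edge-detecting) kernel; it is a 1D stand-in for the 2D image convolution used in CNNs, and the signal and kernel values are made up for the example. NumPy's np.convolve flips the kernel, matching the mathematical definition above.

```python
import numpy as np

# Discrete 1D convolution: (f * g)[t] = Σ_τ f[τ] * g[t - τ]
signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # a simple step-like signal
kernel = np.array([1.0, -1.0])                           # difference (edge-detecting) filter

# 'valid' keeps only positions where the kernel fully overlaps the signal
edges = np.convolve(signal, kernel, mode="valid")
print(edges)  # non-zero entries mark where the signal changes value
```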
- Matrix Operations
Neural networks heavily rely on matrix multiplication, as inputs, weights, and outputs are often represented as vectors and matrices. The forward pass of a neural network involves multiplying matrices of inputs with weight matrices and then adding biases.
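A minimal sketch of such a forward pass is shown below, assuming a batch of two examples, one hidden layer, and the ReLU and sigmoid functions written inline so the snippet is self-contained; all shapes and random values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 2 examples with 3 features each (illustrative shapes)
X = rng.normal(size=(2, 3))

# Layer 1: 3 inputs -> 4 hidden units; Layer 2: 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

# Forward pass: matrix multiplications plus biases, with non-linearities in between
H = np.maximum(0.0, X @ W1 + b1)          # hidden layer with ReLU
Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))  # output layer with sigmoid
print(Y.shape)  # (2, 1): one prediction per example in the batch
```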
- Regularization (to prevent overfitting)
Regularization techniques, such as L2 regularization or dropout, are applied to neural networks to prevent them from overfitting. L2 regularization modifies the cost function by adding a term that penalizes large weights:
L = L_original + λ Σ w^2
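The sketch below shows how the L2 penalty term is added to the loss, along with a dropout mask as it is commonly applied during training; the scaling by 1/(1 - p) (inverted dropout) is a standard convention assumed here rather than stated in the text, and lam and p are illustrative hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_regularized_loss(loss_original, weights, lam=0.01):
    # L = L_original + λ Σ w^2
    return loss_original + lam * sum(np.sum(w ** 2) for w in weights)

def dropout(activations, p=0.5):
    # Randomly zero out a fraction p of activations during training;
    # scaling by 1/(1 - p) keeps the expected activation unchanged (inverted dropout).
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 1))]
print(l2_regularized_loss(0.42, weights, lam=0.01))
print(dropout(np.ones((2, 4)), p=0.5))
```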