Artificial Neural Networks
Artificial neural networks are the computational substrate of deep learning. They are best understood as layered mathematical transformations that learn complex, nonlinear functions from data through gradient-based optimization.
The Neuron
A single neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:
    z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
    output = activation(z)

The weights and bias are learned during training. The activation function introduces nonlinearity; without it, stacking layers would still produce a linear model.
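As a rough illustration of the formula above, the sketch below computes one neuron's output in PyTorch and then checks the claim about stacked linear layers. The input, weight, and bias values are hypothetical, and ReLU is just one possible choice of activation.

```python
import torch

# Hypothetical values chosen for illustration only.
x = torch.tensor([1.0, 2.0, 3.0])   # inputs x1..x3
w = torch.tensor([0.5, -1.0, 1.0])  # weights w1..w3
b = torch.tensor(0.5)               # bias

z = torch.dot(w, x) + b   # weighted sum plus bias
output = torch.relu(z)    # ReLU used as an example activation

# Without an activation, two stacked linear maps collapse into one:
# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)
torch.manual_seed(0)
W1, b1 = torch.randn(4, 3), torch.randn(4)
W2, b2 = torch.randn(2, 4), torch.randn(2)
stacked = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(z.item(), output.item())
print(torch.allclose(stacked, collapsed, atol=1e-5))
```

The second check is why the activation matters: the two-layer computation and the single combined layer give the same result, so depth adds nothing until a nonlinearity sits between the layers.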
Network Architecture
    Input layer     Hidden layer 1     Hidden layer 2     Output layer
       [x₁]   →→→       [h₁]               [h₄]              [y₁]
       [x₂]   →→→       [h₂]      →→→      [h₅]      →→→     [y₂]
       [x₃]   →→→       [h₃]               [h₆]

- Input layer: Passes raw features; no computation
- Hidden layers: Learn intermediate representations
- Output layer: Produces predictions (logits, regression values)
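The layer roles listed above can be sketched as a small `nn.Sequential` model that records the tensor shape after each stage. The layer sizes follow the diagram (3 inputs, two hidden layers of 3 units, 2 outputs). The ReLU activations and the batch size of 5 are assumptions, since the diagram does not specify them.

```python
import torch
import torch.nn as nn

# Sizes follow the diagram: 3 inputs -> 3 hidden -> 3 hidden -> 2 outputs.
# The ReLU activations are an assumption; the diagram does not name one.
net = nn.Sequential(
    nn.Linear(3, 3),  # input -> hidden layer 1 (h1..h3)
    nn.ReLU(),
    nn.Linear(3, 3),  # hidden layer 1 -> hidden layer 2 (h4..h6)
    nn.ReLU(),
    nn.Linear(3, 2),  # hidden layer 2 -> output layer (y1, y2)
)

x = torch.randn(5, 3)  # the "input layer" is just the raw feature tensor
h = x
shapes = [tuple(h.shape)]
for layer in net:
    h = layer(h)  # apply one stage at a time to inspect intermediate shapes
    shapes.append(tuple(h.shape))

print(shapes)
```

Note that the input layer has no module of its own here, which matches the point that it only passes features through.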
Building a Neural Network in PyTorch
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset

    class MLP(nn.Module):
        def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.3):
            super().__init__()
            layers = []
            prev_dim = input_dim
            # Each hidden block: linear map, batch norm, ReLU, dropout
            for hidden_dim in hidden_dims:
                layers.extend([
                    nn.Linear(prev_dim, hidden_dim),
                    nn.BatchNorm1d(hidden_dim),
                    nn.ReLU(),
                    nn.Dropout(dropout)
                ])
                prev_dim = hidden_dim
            # Final layer outputs raw logits (no activation)
            layers.append(nn.Linear(prev_dim, output_dim))
            self.network = nn.Sequential(*layers)

        def forward(self, x):
            return self.network(x)

    model = MLP(input_dim=20, hidden_dims=[128, 64, 32], output_dim=3)

Training Loop
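The training loop in this section relies on `train_loader`, `X_test_tensor`, and `y_test_tensor`, which the text does not define. One way to obtain them, assuming synthetic stand-in data with 20 features and 3 classes to match the model above, might look like this. The sample count, split, and batch size are arbitrary choices, and `X_train_tensor` and `y_train_tensor` are hypothetical names.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (an assumption): 1000 samples, 20 features, 3 classes.
X = torch.randn(1000, 20)
y = torch.randint(0, 3, (1000,))  # int64 class indices, as CrossEntropyLoss expects

# Simple 80/20 split into training and test tensors.
X_train_tensor, y_train_tensor = X[:800], y[:800]
X_test_tensor, y_test_tensor = X[800:], y[800:]

train_loader = DataLoader(
    TensorDataset(X_train_tensor, y_train_tensor),
    batch_size=64,
    shuffle=True,
    drop_last=True,  # drop the incomplete final batch so every batch has 64 samples
)

X_batch, y_batch = next(iter(train_loader))
print(len(train_loader), X_batch.shape, y_batch.dtype)
```

Because these labels are random, accuracy on this data should stay near chance (about one in three). With a real dataset, only the two lines that create `X` and `y` would need to change.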
    # train_loader, X_test_tensor, and y_test_tensor are assumed to be defined elsewhere
    optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(50):
        model.train()
        for X_batch, y_batch in train_loader:
            logits = model(X_batch)
            loss = criterion(logits, y_batch)
            optimizer.zero_grad()  # Clear gradients from the previous step
            loss.backward()        # Compute gradients via backpropagation
            optimizer.step()       # Update weights

        model.eval()
        with torch.no_grad():
            test_logits = model(X_test_tensor)
            acc = (test_logits.argmax(dim=1) == y_test_tensor).float().mean()
            print(f"Epoch {epoch+1}: Accuracy = {acc:.4f}")

Backpropagation
Backpropagation uses the chain rule to compute how each weight contributes to the loss:
    ∂Loss/∂w₁ = ∂Loss/∂output × ∂output/∂z × ∂z/∂w₁

PyTorch’s autograd tracks operations in the forward pass and computes all gradients automatically when loss.backward() is called. You don’t implement backprop manually; you design the forward pass and let the framework handle the rest.
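To make the chain rule concrete, the sketch below compares autograd's gradient with the same three factors multiplied by hand. It assumes a single neuron with a sigmoid activation and a squared-error loss. These choices and the numeric values are illustrative and do not come from the model above.

```python
import torch

# Hypothetical single neuron with sigmoid activation and squared-error loss.
x = torch.tensor([1.0, 2.0])
w = torch.tensor([0.5, -0.3], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
target = torch.tensor(1.0)

z = torch.dot(w, x) + b
output = torch.sigmoid(z)
loss = (output - target) ** 2
loss.backward()  # autograd fills w.grad and b.grad

# The same gradient by hand, one chain-rule factor at a time.
with torch.no_grad():
    dloss_doutput = 2 * (output - target)  # derivative of the squared error
    doutput_dz = output * (1 - output)     # derivative of the sigmoid
    dz_dw = x                              # since z = w . x + b
    manual_grad = dloss_doutput * doutput_dz * dz_dw

print(w.grad, manual_grad)
```

The two printed tensors should agree up to floating-point error, which is all `loss.backward()` is doing, repeated for every parameter in the graph.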
Common Loss Functions
| Task | Loss Function | PyTorch |
|---|---|---|
| Binary classification | Binary cross-entropy | nn.BCEWithLogitsLoss() |
| Multiclass | Cross-entropy | nn.CrossEntropyLoss() |
| Regression | MSE | nn.MSELoss() |
| Regression (robust) | Huber | nn.HuberLoss() |
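The losses in the table expect different input shapes and target types, which is a common source of errors. The sketch below calls each one on small made-up tensors. The batch size, class count, and values are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch = 4

# Binary classification: one raw logit per sample, float targets in {0, 1}.
bce = nn.BCEWithLogitsLoss()(torch.randn(batch), torch.tensor([0.0, 1.0, 1.0, 0.0]))

# Multiclass: raw logits of shape (batch, num_classes), integer class indices.
ce = nn.CrossEntropyLoss()(torch.randn(batch, 3), torch.tensor([0, 2, 1, 2]))

# Regression: predictions and float targets of the same shape.
pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
true = torch.tensor([3.0, -0.5, 2.0, 7.0])
mse = nn.MSELoss()(pred, true)
huber = nn.HuberLoss(delta=1.0)(pred, true)

print(bce.item(), ce.item(), mse.item(), huber.item())
```

Both classification losses take raw logits rather than probabilities, so the model's final layer should not apply a sigmoid or softmax before the loss.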
Neural networks are the foundation for CNNs, RNNs, Transformers, and every other deep learning architecture. Mastering this core loop (forward pass, loss, backward pass, update) is a prerequisite for all advanced deep learning work.
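The four steps of the core loop can also be written without an optimizer object, which shows what the update step amounts to. This is a minimal sketch on a toy linear regression problem. The data, learning rate, and step count are assumptions, and plain gradient descent stands in for Adam.

```python
import torch

torch.manual_seed(0)

# Toy regression problem (an assumption for illustration): y = 3x + 1 plus noise.
X = torch.randn(100, 1)
y = 3 * X + 1 + 0.1 * torch.randn(100, 1)

w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1
losses = []

for step in range(100):
    pred = X @ w + b                 # 1. forward pass
    loss = ((pred - y) ** 2).mean()  # 2. loss
    loss.backward()                  # 3. backward pass
    with torch.no_grad():            # 4. update (plain gradient descent)
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                   # reset gradients before the next step
    b.grad.zero_()
    losses.append(loss.item())

print(w.item(), b.item(), losses[0], losses[-1])
```

The learned `w` and `b` should end up close to 3 and 1, and the loss should fall over the run.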