Artificial Neural Networks

Artificial neural networks are the computational substrate of deep learning. They are best understood as layered mathematical transformations that learn complex, nonlinear functions from data through gradient-based optimization.

The Neuron

A single neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
output = activation(z)
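Here is a minimal sketch of this computation for n = 3, assuming PyTorch. The input, weight, and bias values are made up. ReLU and sigmoid appear only as two common choices of activation:

```python
import torch

# Hypothetical inputs, weights and bias for a neuron with n = 3 inputs
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.2, -0.5, 0.1])
b = torch.tensor(0.4)

z = torch.dot(w, x) + b          # w1*x1 + w2*x2 + w3*x3 + b = 0.2 - 1.0 + 0.3 + 0.4
relu_out = torch.relu(z)         # ReLU clips negative z to 0
sigmoid_out = torch.sigmoid(z)   # sigmoid squashes z into (0, 1)

print(round(z.item(), 4))            # -0.1
print(relu_out.item())               # 0.0
print(round(sigmoid_out.item(), 4))  # 0.475
```

The same weighted sum produces different outputs depending on the activation. With ReLU, this neuron is inactive for this input.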

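The weights and bias are learned during training. The activation function introduces nonlinearity. Without it, each layer is a linear (affine) map, and a composition of such maps is itself a single linear map, so stacking layers would still produce a linear model.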
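A small check, assuming PyTorch, shows why the nonlinearity matters. Two stacked `nn.Linear` layers with nothing between them compute exactly the same function as one linear layer with weight W₂W₁ and bias W₂b₁ + b₂. The layer sizes (4, 8, 3) and the random seed are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Two hypothetical linear layers stacked with no activation in between
stack = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))
x = torch.randn(5, 4)

with torch.no_grad():
    W1, b1 = stack[0].weight, stack[0].bias   # shapes (8, 4) and (8,)
    W2, b2 = stack[1].weight, stack[1].bias   # shapes (3, 8) and (3,)
    W = W2 @ W1                               # collapsed weight, shape (3, 4)
    b = W2 @ b1 + b2                          # collapsed bias, shape (3,)
    same = torch.allclose(stack(x), x @ W.T + b, atol=1e-5)

print(same)   # True: the two-layer stack is one linear map in disguise
```

Inserting `nn.ReLU()` between the two layers breaks this equivalence. That is what lets depth add expressive power.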

Network Architecture

Input layer    Hidden layer 1    Hidden layer 2    Output layer
[x₁]    →→→   [h₁]              [h₄]              [y₁]
[x₂]    →→→   [h₂]     →→→     [h₅]     →→→      [y₂]
[x₃]    →→→   [h₃]              [h₆]

Input layer: Passes raw features — no computation
Hidden layers: Learn intermediate representations
Output layer: Produces predictions (raw class scores, called logits, for classification; continuous values for regression)
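The diagram can be expressed as a small PyTorch model. This sketch assumes the layers are fully connected (every unit feeds every unit in the next layer) and uses ReLU as the hidden activation. The diagram fixes neither detail. The input layer has no module of its own: it is simply the input tensor.

```python
import torch
import torch.nn as nn

# Layer sizes read off the diagram: 3 inputs, two hidden layers of 3 units, 2 outputs
net = nn.Sequential(
    nn.Linear(3, 3), nn.ReLU(),   # hidden layer 1: h1..h3
    nn.Linear(3, 3), nn.ReLU(),   # hidden layer 2: h4..h6
    nn.Linear(3, 2),              # output layer: y1, y2 (raw logits, no activation)
)

x = torch.randn(4, 3)             # hypothetical batch of 4 samples with 3 features
h = x
for layer in net:
    h = layer(h)
    print(type(layer).__name__, tuple(h.shape))
# Linear (4, 3)
# ReLU (4, 3)
# Linear (4, 3)
# ReLU (4, 3)
# Linear (4, 2)
```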

Building a Neural Network in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.3):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout)
            ])
            prev_dim = hidden_dim
        layers.append(nn.Linear(prev_dim, output_dim))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

model = MLP(input_dim=20, hidden_dims=[128, 64, 32], output_dim=3)
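As a quick sanity check, you can write out the stack that `MLP` builds for these arguments with `nn.Sequential`. This confirms the output shape and counts the trainable parameters. It is a sketch, not part of the original code, and the batch of 8 random samples is hypothetical:

```python
import torch
import torch.nn as nn

# The same layers MLP(input_dim=20, hidden_dims=[128, 64, 32], output_dim=3)
# assembles in its loop, written out explicitly so this block runs on its own
model = nn.Sequential(
    nn.Linear(20, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(32, 3),
)

x = torch.randn(8, 20)          # hypothetical batch: 8 samples, 20 features
logits = model(x)
print(logits.shape)             # torch.Size([8, 3])

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)                 # 13571
```

The count has two parts:

- Each `nn.Linear` contributes its weights plus biases (20 × 128 + 128 = 2,688 for the first layer).
- Each `nn.BatchNorm1d` contributes a learnable scale and shift per feature. Its running mean and variance are buffers, not parameters.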

Training Loop

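# Placeholder data (hypothetical): random features and labels shaped to match
# input_dim=20 and output_dim=3. Replace with your own dataset.
X_train_tensor = torch.randn(1000, 20)
y_train_tensor = torch.randint(0, 3, (1000,))
X_test_tensor = torch.randn(200, 20)
y_test_tensor = torch.randint(0, 3, (200,))
train_loader = DataLoader(TensorDataset(X_train_tensor, y_train_tensor),
                          batch_size=64, shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # expects raw logits (no softmax) and integer class indices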

for epoch in range(50):
    model.train()
    for X_batch, y_batch in train_loader:
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        optimizer.zero_grad()
        loss.backward()       # Compute gradients via backpropagation
        optimizer.step()      # Update weights

    model.eval()
    with torch.no_grad():
        test_logits = model(X_test_tensor)
        acc = (test_logits.argmax(dim=1) == y_test_tensor).float().mean()
    print(f"Epoch {epoch+1}: Accuracy = {acc:.4f}")
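The `model.train()` and `model.eval()` calls in the loop matter because `MLP` contains `nn.Dropout` and `nn.BatchNorm1d`, which behave differently in the two modes:

- In training mode, dropout zeroes a random subset of activations, and batch norm normalizes with the current batch's statistics.
- In evaluation mode, dropout is disabled, and batch norm uses its running estimates.

A small sketch with a hypothetical network makes the difference visible. The sizes and seed are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical network containing the two mode-dependent layer types
net = nn.Sequential(nn.Linear(4, 16), nn.BatchNorm1d(16), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(16, 2))
x = torch.randn(10, 4)

net.train()
with torch.no_grad():
    a, b = net(x), net(x)
print(torch.equal(a, b))   # False (almost surely): a new dropout mask on each call

net.eval()
with torch.no_grad():
    c, d = net(x), net(x)
print(torch.equal(c, d))   # True: no dropout, fixed running statistics
```

Forgetting `model.eval()` before evaluation therefore gives noisy, mode-dependent accuracy numbers.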

Backpropagation

Backpropagation uses the chain rule to compute how each weight contributes to the loss:

∂Loss/∂w₁ = ∂Loss/∂output × ∂output/∂z × ∂z/∂w₁
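Here is a worked version of this chain for a single sigmoid neuron with squared-error loss, in plain Python. The inputs, weights, target, and the choice of loss and activation are hypothetical. A central finite difference serves as an independent check on the product of the three factors:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(w1, w2, b, x1, x2, y):
    """Squared-error loss of a single sigmoid neuron with two inputs."""
    output = sigmoid(w1 * x1 + w2 * x2 + b)
    return (output - y) ** 2

# Hypothetical inputs, parameters and target
x1, x2, w1, w2, b, y = 1.0, 2.0, 0.5, -0.3, 0.1, 1.0

# Forward pass
z = w1 * x1 + w2 * x2 + b        # ≈ 0.0
output = sigmoid(z)              # ≈ 0.5

# The three chain-rule factors
dloss_doutput = 2.0 * (output - y)    # derivative of (output - y)^2, ≈ -1.0
doutput_dz = output * (1.0 - output)  # derivative of the sigmoid, ≈ 0.25
dz_dw1 = x1                           # z is linear in w1, so this is just x1
grad_w1 = dloss_doutput * doutput_dz * dz_dw1

# Independent check: central finite difference in w1
eps = 1e-6
numeric = (loss_fn(w1 + eps, w2, b, x1, x2, y)
           - loss_fn(w1 - eps, w2, b, x1, x2, y)) / (2 * eps)

print(round(grad_w1, 6), round(numeric, 6))   # -0.25 -0.25
```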

PyTorch’s autograd tracks operations in the forward pass and computes all gradients automatically when loss.backward() is called. You don’t implement backprop manually — you design the forward pass and let the framework handle the rest.
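Here is a minimal sketch of autograd doing this work. It assumes a single sigmoid neuron with squared-error loss and hypothetical values chosen so that z ≈ 0 and the output is ≈ 0.5. The chain rule then gives ∂Loss/∂w = 2(0.5 − 1) × 0.25 × x = −0.25x, which is what autograd should report. Setting `requires_grad=True` marks the leaf tensors whose gradients autograd accumulates in `.grad`:

```python
import torch

x = torch.tensor([1.0, 2.0])                       # inputs: plain data, no gradient needed
w = torch.tensor([0.5, -0.3], requires_grad=True)  # leaf tensors autograd will track
b = torch.tensor(0.1, requires_grad=True)
y = torch.tensor(1.0)                              # hypothetical target

output = torch.sigmoid(w @ x + b)   # forward pass: each op is recorded in a graph
loss = (output - y) ** 2
print(loss.grad_fn)                 # the last recorded op, e.g. <PowBackward0 ...>

loss.backward()                     # walk the graph backwards, applying the chain rule
print(w.grad)                       # ≈ tensor([-0.2500, -0.5000])
print(b.grad)                       # ≈ tensor(-0.2500)
```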

Common Loss Functions

Task	Loss function	PyTorch module
Binary classification	Binary cross-entropy	`nn.BCEWithLogitsLoss()`
Multiclass classification	Cross-entropy	`nn.CrossEntropyLoss()`
Regression	Mean squared error (MSE)	`nn.MSELoss()`
Regression (robust to outliers)	Huber	`nn.HuberLoss()`
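The four losses differ mainly in the inputs they expect. This short sketch uses hand-picked values whose results can be checked by hand, and illustrates three conventions:

- `nn.BCEWithLogitsLoss` and `nn.CrossEntropyLoss` take raw logits, applying the sigmoid or softmax internally.
- `nn.CrossEntropyLoss` takes integer class indices as targets.
- `nn.HuberLoss` grows only linearly once an error exceeds `delta` (1.0 by default). This makes it less sensitive to outliers than MSE.

```python
import math
import torch
import torch.nn as nn

# Binary classification: one raw logit and a float target in {0, 1}
bce = nn.BCEWithLogitsLoss()(torch.tensor([0.0]), torch.tensor([1.0]))
print(bce.item(), math.log(2))          # both ≈ 0.6931, since sigmoid(0) = 0.5

# Multiclass: logits of shape (batch, classes) and an integer class index
ce = nn.CrossEntropyLoss()(torch.tensor([[0.0, 0.0, 0.0]]), torch.tensor([2]))
print(ce.item(), math.log(3))           # both ≈ 1.0986: uniform softmax over 3 classes

# Regression: errors of 0.5 and 3.0, the second acting as an outlier
pred = torch.tensor([0.0, 0.0])
true = torch.tensor([0.5, 3.0])
mse = nn.MSELoss()(pred, true)               # (0.25 + 9.0) / 2
huber = nn.HuberLoss(delta=1.0)(pred, true)  # (0.5 * 0.5**2 + 1.0 * (3.0 - 0.5)) / 2
print(mse.item(), huber.item())              # 4.625 1.3125
```

The outlier dominates the MSE (9.0 of 9.25 in the sum) but contributes only 2.5 to the Huber total.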

Neural networks are the foundation for CNNs, RNNs, Transformers, and other deep learning architectures. Mastering this core loop (forward pass, loss computation, backward pass, weight update) is a prerequisite for all advanced deep learning work.