Artificial Neural Networks: Foundations of Deep Learning

Learn how artificial neural networks work — neurons, layers, activation functions, backpropagation, and how to build your first neural network with PyTorch.

Artificial Neural Networks

Artificial neural networks are the computational substrate of deep learning. They are best understood as layered mathematical transformations that learn complex, nonlinear functions from data through gradient-based optimization.


The Neuron

A single neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
output = activation(z)

The weights and bias are learned during training. The activation function introduces nonlinearity — without it, stacking layers would still produce a linear model.
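
The neuron formula above can be sketched in a few lines of plain Python. This is only an illustration: the input values, weights, and helper names (neuron, relu, sigmoid, identity) are made up for the example and are not part of any library. The second half checks the claim about nonlinearity by showing that two stacked neurons with no activation collapse into a single linear neuron.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only (not learned from data).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.1

relu_out = neuron(x, w, b, relu)        # z is negative here, so ReLU clips it to 0.0
sigmoid_out = neuron(x, w, b, sigmoid)  # a value strictly between 0 and 1

# Two stacked neurons WITHOUT a nonlinearity are still one linear function:
# w2 * (w1 * x + b1) + b2 == (w2 * w1) * x + (w2 * b1 + b2)
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def stacked(value):
    hidden = neuron([value], [w1], b1, identity)
    return neuron([hidden], [w2], b2, identity)


def collapsed(value):
    return neuron([value], [w2 * w1], w2 * b1 + b2, identity)


for value in [-1.0, 0.0, 2.5]:
    print(value, stacked(value), collapsed(value))  # the last two columns match
```

Replacing identity with relu in the hidden neuron breaks this equivalence, which is what lets deeper networks represent functions a single linear layer cannot.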


Network Architecture

Input layer      Hidden layer 1      Hidden layer 2      Output layer
  [x₁]    →→→       [h₁]                [h₄]                [y₁]
  [x₂]    →→→       [h₂]      →→→       [h₅]      →→→       [y₂]
  [x₃]    →→→       [h₃]                [h₆]

  • Input layer: Passes raw features — no computation
  • Hidden layers: Learn intermediate representations
  • Output layer: Produces predictions (logits, regression values)
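
To make the layer roles concrete, here is a minimal plain-Python sketch of a forward pass through a network with the same shape as the diagram (3 inputs, two hidden layers of 3 neurons, 2 outputs). The dense helper and all weight values are hypothetical and hand-picked; a trained network would learn them.

```python
def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


# Hypothetical hand-picked parameters, for illustration only.
W1 = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [-0.5, 0.1, 0.1]]
b1 = [0.0, 0.1, 0.0]
W2 = [[0.3, 0.3, 0.3], [-0.2, 0.5, 0.1], [0.1, 0.0, -0.4]]
b2 = [0.0, 0.0, 0.2]
W3 = [[1.0, -1.0, 0.5], [0.2, 0.2, 0.2]]
b3 = [0.0, -0.1]

x = [1.0, 2.0, 3.0]            # input layer: raw features, no computation
h1 = dense(x, W1, b1, relu)    # hidden layer 1
h2 = dense(h1, W2, b2, relu)   # hidden layer 2
y = dense(h2, W3, b3)          # output layer: raw scores (logits), no activation

print(h1, h2, y)
```

Each layer consumes the previous layer's outputs, so the hidden activations h1 and h2 are the intermediate representations the bullets refer to.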

Building a Neural Network in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.3):
        super().__init__()
        layers = []
        prev_dim = input_dim
        # One Linear -> BatchNorm -> ReLU -> Dropout block per hidden layer
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(dropout)
            ])
            prev_dim = hidden_dim
        # Final layer outputs raw logits (no activation)
        layers.append(nn.Linear(prev_dim, output_dim))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


# 20 input features, three hidden layers, 3 output classes
model = MLP(input_dim=20, hidden_dims=[128, 64, 32], output_dim=3)
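
As a sanity check on the architecture, the number of learnable parameters can be worked out by hand. The sketch below uses plain Python and a hypothetical helper, mlp_param_count, which assumes the layer layout shown above: each nn.Linear contributes a weight matrix and a bias vector, each nn.BatchNorm1d contributes a learnable scale and shift per feature, and ReLU and Dropout have no parameters.

```python
def mlp_param_count(input_dim, hidden_dims, output_dim):
    """Count learnable parameters for the Linear/BatchNorm/ReLU/Dropout stack."""
    total = 0
    prev_dim = input_dim
    for hidden_dim in hidden_dims:
        total += prev_dim * hidden_dim + hidden_dim  # Linear: weights + biases
        total += 2 * hidden_dim                      # BatchNorm1d: scale + shift
        prev_dim = hidden_dim
    total += prev_dim * output_dim + output_dim      # final Linear layer
    return total


print(mlp_param_count(20, [128, 64, 32], 3))  # 13571
```

With PyTorch installed, the same figure should come out of sum(p.numel() for p in model.parameters()), since batch-norm running statistics are stored as buffers rather than parameters.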

Training Loop

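# Assumes train_loader (a DataLoader of feature/label batches) and the held-out
# tensors X_test_tensor / y_test_tensor have already been created.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    model.train()  # Enable dropout and batch-norm batch statistics
    for X_batch, y_batch in train_loader:
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        optimizer.zero_grad()  # Clear gradients from the previous step
        loss.backward()        # Compute gradients via backpropagation
        optimizer.step()       # Update weights

    model.eval()  # Disable dropout; use batch-norm running statistics
    with torch.no_grad():
        test_logits = model(X_test_tensor)
        acc = (test_logits.argmax(dim=1) == y_test_tensor).float().mean().item()
    print(f"Epoch {epoch+1}: Accuracy = {acc:.4f}")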
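
The same forward, loss, backward, update cycle can be seen without any framework. The sketch below is a deliberately tiny stand-in, not the PyTorch loop above: it fits a single linear neuron to made-up, noise-free data from y = 2x + 1 using hand-derived gradients and plain stochastic gradient descent. The data, learning rate, and epoch count are assumptions chosen for the example.

```python
# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x + b               # forward pass
        loss = (pred - y) ** 2         # squared-error loss for this sample
        grad_pred = 2.0 * (pred - y)   # backward pass: dLoss/dpred
        grad_w = grad_pred * x         # chain rule: dLoss/dw
        grad_b = grad_pred             # chain rule: dLoss/db
        w -= lr * grad_w               # update step
        b -= lr * grad_b

print(f"w = {w:.4f}, b = {b:.4f}")  # approaches w = 2, b = 1
```

In the PyTorch version, loss.backward() replaces the three hand-written gradient lines and optimizer.step() replaces the two update lines.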

Backpropagation

Backpropagation uses the chain rule to compute how each weight contributes to the loss:

∂Loss/∂w₁ = ∂Loss/∂output × ∂output/∂z × ∂z/∂w₁

PyTorch’s autograd tracks operations in the forward pass and computes all gradients automatically when loss.backward() is called. You don’t implement backprop manually — you design the forward pass and let the framework handle the rest.
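
The chain-rule product above can be checked numerically. The following plain-Python sketch assumes a single sigmoid neuron with two inputs and a squared-error loss (both choices are illustrative, as are all the values), computes ∂Loss/∂w₁ as the product of the three local derivatives, and compares it with a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, b, x1, x2, target):
    """Return z, the neuron output, and the squared-error loss."""
    z = w1 * x1 + w2 * x2 + b
    output = sigmoid(z)
    loss = (output - target) ** 2
    return z, output, loss


# Illustrative values only.
w1, w2, b = 0.4, -0.3, 0.1
x1, x2, target = 1.5, 2.0, 1.0

z, output, loss = forward(w1, w2, b, x1, x2, target)

# Local derivatives, one per factor in the chain rule.
dloss_doutput = 2.0 * (output - target)   # derivative of (output - target)^2
doutput_dz = output * (1.0 - output)      # derivative of the sigmoid
dz_dw1 = x1                               # derivative of the weighted sum w.r.t. w1

grad_w1 = dloss_doutput * doutput_dz * dz_dw1

# Central finite difference as an independent check.
eps = 1e-6
_, _, loss_plus = forward(w1 + eps, w2, b, x1, x2, target)
_, _, loss_minus = forward(w1 - eps, w2, b, x1, x2, target)
numeric_grad_w1 = (loss_plus - loss_minus) / (2.0 * eps)

print(grad_w1, numeric_grad_w1)  # the two values agree closely
```

Autograd performs the analytic version of this calculation for every parameter at once, reusing the shared factors instead of recomputing them per weight.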


Common Loss Functions

Task                     Loss function           PyTorch
Binary classification    Binary cross-entropy    nn.BCEWithLogitsLoss()
Multiclass               Cross-entropy           nn.CrossEntropyLoss()
Regression               MSE                     nn.MSELoss()
Regression (robust)      Huber                   nn.HuberLoss()
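
For intuition about what each row computes, here is a plain-Python sketch of the four losses for a single example. The function names are hypothetical, batching and reduction are ignored, and the Huber threshold delta=1.0 is assumed to match the PyTorch default. Note that the two classification losses take raw logits, not probabilities.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]


def bce_with_logits(logit, target):
    """Binary cross-entropy for one example; target is 0.0 or 1.0."""
    # Stable form of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mse(pred, target):
    return (pred - target) ** 2


def huber(pred, target, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


print(cross_entropy([0.0, 0.0, 0.0], 1))  # log(3): uniform over three classes
print(bce_with_logits(0.0, 1.0))          # log(2): logit 0 means probability 0.5
print(mse(5.0, 0.0), huber(5.0, 0.0))     # 25.0 4.5
```

The last line shows why Huber is the robust choice: an outlier with error 5 costs 25 under MSE but only 4.5 under Huber, so it pulls far less on the gradient.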

Neural networks are the foundation for CNNs, RNNs, Transformers, and every other deep learning architecture. Mastering this core loop (forward pass, loss, backward pass, update) is a prerequisite for all advanced deep learning work.