Autoencoders

Autoencoders learn to compress data into a compact representation and then reconstruct the input from it, with no labels required. Because the bottleneck layer is much narrower than the input, the network must keep the features most useful for reconstruction and discard the rest. This makes autoencoders useful for dimensionality reduction and anomaly detection, and their probabilistic variant, the variational autoencoder (VAE), for generative modeling.

Architecture

Input          Encoder          Latent Space        Decoder          Output
[x ∈ ℝ⁷⁸⁴]  → [128 → 64 →]  → [z ∈ ℝ¹⁶]  → [← 64 ← 128]  → [x̂ ∈ ℝ⁷⁸⁴]
                                      ↑
                               Bottleneck (compressed representation)

Loss = reconstruction error: ||x - x̂||²

The network is trained to minimize reconstruction error — any information not captured in the latent vector z is lost.
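A minimal sketch of the shapes in the diagram, assuming random tensors as stand-ins for flattened 28x28 images (layer sizes follow the diagram; the layers are untrained). It also shows how the ||x - x̂||² loss relates to PyTorch's `nn.MSELoss`, which by default averages over every element rather than summing per sample.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Layer sizes from the diagram: 784 -> 128 -> 64 -> 16 -> 64 -> 128 -> 784
encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 16),                   # bottleneck: z has 16 dimensions
)
decoder = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),   # reconstruction in [0, 1], like normalized pixels
)

x = torch.rand(8, 784)   # random stand-ins for 8 flattened 28x28 images
z = encoder(x)           # shape [8, 16]
x_hat = decoder(z)       # shape [8, 784]

# ||x - x_hat||^2 per sample: sum of squared differences over the 784 pixels
sq_error = ((x - x_hat) ** 2).sum(dim=1)   # shape [8]

# nn.MSELoss (default reduction='mean') averages over all 8 * 784 elements,
# so it equals the per-sample squared error divided by 784, averaged over the batch
mse = nn.MSELoss()(x_hat, x)
print(z.shape, x_hat.shape)                        # torch.Size([8, 16]) torch.Size([8, 784])
print(torch.allclose(mse, sq_error.mean() / 784))  # True
```

Because the mean and the sum differ only by a constant factor, both have the same minimizer; the factor rescales gradients, which matters once the reconstruction term is combined with other loss terms.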

Basic Autoencoder in PyTorch

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dims=(512, 256), latent_dim=32):  # tuple avoids a mutable default
        super().__init__()

        # Encoder: compress to latent space
        encoder_layers = []
        prev_dim = input_dim
        for h_dim in hidden_dims:
            encoder_layers.extend([nn.Linear(prev_dim, h_dim), nn.ReLU()])
            prev_dim = h_dim
        encoder_layers.append(nn.Linear(prev_dim, latent_dim))
        self.encoder = nn.Sequential(*encoder_layers)

        # Decoder: reconstruct from latent space
        decoder_layers = []
        prev_dim = latent_dim
        for h_dim in reversed(hidden_dims):
            decoder_layers.extend([nn.Linear(prev_dim, h_dim), nn.ReLU()])
            prev_dim = h_dim
        decoder_layers.extend([nn.Linear(prev_dim, input_dim), nn.Sigmoid()])
        self.decoder = nn.Sequential(*decoder_layers)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z  # Return reconstruction and latent code

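# Train with reconstruction loss.
# Assumes train_loader yields (images, labels) batches with pixel values in [0, 1]
# (e.g. torchvision's MNIST with transforms.ToTensor()), matching the Sigmoid output.
model = Autoencoder(784, [512, 256], 32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

model.train()
for epoch in range(50):
    for x_batch, _ in train_loader:  # Labels ignored!
        x_flat = x_batch.flatten(start_dim=1)  # [B, 1, 28, 28] -> [B, 784]
        x_recon, _ = model(x_flat)             # latent code z not needed for the loss
        loss = criterion(x_recon, x_flat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()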
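The loop above assumes a `train_loader` already exists. To smoke-test the training logic without downloading a dataset, one option is to wrap synthetic tensors in a `TensorDataset`. The sketch below assumes 64-feature vectors generated from 4 hidden factors stand in for images, uses a smaller encoder/decoder than `Autoencoder` (sizes are illustrative), and then keeps only the encoder to produce 8-dimensional codes, which is how an autoencoder is used for dimensionality reduction.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for flattened images: 512 samples, 64 features in (0, 1),
# generated from 4 hidden factors so there is structure worth compressing
factors = torch.randn(512, 4)
X = torch.sigmoid(factors @ torch.randn(4, 64))
dummy_labels = torch.zeros(512)  # ignored by the loop, like real labels would be
train_loader = DataLoader(TensorDataset(X, dummy_labels), batch_size=64, shuffle=True)

# Small encoder/decoder pair with an 8-d bottleneck (illustrative sizes)
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64), nn.Sigmoid())
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
criterion = nn.MSELoss()

def full_loss():
    # Reconstruction loss over the whole synthetic dataset
    with torch.no_grad():
        return criterion(decoder(encoder(X)), X).item()

initial_loss = full_loss()
for epoch in range(50):
    for x_batch, _ in train_loader:
        loss = criterion(decoder(encoder(x_batch)), x_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
final_loss = full_loss()

# Dimensionality reduction: after training, keep only the encoder
with torch.no_grad():
    codes = encoder(X)

print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
print(codes.shape)  # torch.Size([512, 8])
```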

Variational Autoencoder (VAE)

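Regular autoencoders map each input to a single point in latent space. VAEs instead learn a probabilistic latent space: the encoder maps each input to a Gaussian distribution (a mean μ and a log-variance log σ²), and a KL-divergence term in the loss pulls these distributions toward a standard normal prior N(0, I). New data can then be generated by sampling z from N(0, I) and decoding it.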
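In practice, generation means drawing z from the standard normal prior and passing it through the decoder. A minimal sketch, assuming an untrained stand-in decoder with the same layer layout as `VAE.decoder` in the class that follows (sizes are illustrative); with a trained model you would call `model.decode(z)` instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, latent_dim = 784, 256, 16  # illustrative sizes

# Stand-in for a trained VAE decoder (same layout as VAE.decoder, untrained here)
decoder = nn.Sequential(
    nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
)

decoder.eval()
with torch.no_grad():
    z = torch.randn(10, latent_dim)     # 10 draws from the prior N(0, I)
    samples = decoder(z)                # 10 generated 784-d vectors in [0, 1]
    images = samples.view(10, 28, 28)   # reshape if the data were 28x28 images

print(images.shape)  # torch.Size([10, 28, 28])
```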

class VAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)     # Mean of distribution
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim) # Log-variance
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + eps * sigma (differentiable sampling)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
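The reparameterization trick moves all randomness into `eps`, so gradients can reach `mu` and `logvar`. The sketch below checks this with autograd on made-up encoder outputs; the toy objective `z.sum()` is an assumption chosen so the expected gradients are easy to state.

```python
import torch

torch.manual_seed(0)

# Made-up encoder outputs: batch of 4 inputs, 2-d latent space
mu = torch.randn(4, 2, requires_grad=True)
logvar = torch.randn(4, 2, requires_grad=True)

# Same computation as VAE.reparameterize: eps carries the randomness,
# so z is a deterministic, differentiable function of mu and logvar
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std

# Backpropagate a toy objective to the encoder outputs
z.sum().backward()
print(mu.grad)      # all ones, since dz/dmu = 1
print(logvar.grad)  # 0.5 * eps * std, since dz/dlogvar = eps * 0.5 * exp(0.5 * logvar)
```

For reference, `torch.distributions.Normal(mu, std).rsample()` performs the same reparameterized sampling, whereas `.sample()` does not propagate gradients.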

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
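    # Negative ELBO: reconstruction term + beta * KL(q(z|x) || N(0, I)).
    # BCE expects recon_x and x in [0, 1]; both terms are summed over the batch.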
    recon_loss = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    kl_div = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl_div
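The `kl_div` term in `vae_loss` is the closed-form KL divergence between the encoder's diagonal Gaussian N(μ, σ²) and the standard normal prior N(0, I). A quick way to check the formula, sketched here on random stand-in values for `mu` and `logvar`, is to compare it with `torch.distributions.kl_divergence`.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Made-up encoder outputs: batch of 4, latent dim 3
mu = torch.randn(4, 3)
logvar = torch.randn(4, 3)

# Closed form used in vae_loss, summed over batch and latent dimensions
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Same quantity via torch.distributions: per-element KL(q || p), then summed
q = Normal(mu, torch.exp(0.5 * logvar))                 # encoder distribution
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))   # standard normal prior
kl_lib = kl_divergence(q, p).sum()

print(torch.allclose(kl_closed, kl_lib, atol=1e-5))  # True
```

With `beta=1` the loss is the standard negative ELBO; values of `beta` above 1 (the beta-VAE setting) weight the KL term more heavily, pulling the posteriors closer to the prior at some cost in reconstruction quality.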

Anomaly Detection

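An autoencoder trained only on normal data learns to reconstruct normal patterns well, so inputs unlike anything it saw in training (anomalies) tend to have higher reconstruction error: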

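# Train only on normal samples
model.train()
# ... training loop on normal data only ...

# Score samples at inference: per-sample mean squared reconstruction error.
# X_val_normal (held-out normal data) and X_test are assumed to be flattened
# to shape [N, 784] with the same preprocessing used in training.
model.eval()
with torch.no_grad():
    val_recon, _ = model(X_val_normal)
    val_errors = ((X_val_normal - val_recon) ** 2).mean(dim=1)
    x_recon, _ = model(X_test)
    recon_errors = ((X_test - x_recon) ** 2).mean(dim=1)

# Set the threshold on normal data: about 5% of normal samples will exceed it
# (false positives), while anomalies should exceed it far more often
threshold = val_errors.quantile(0.95)
anomalies = recon_errors > threshold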
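An end-to-end sketch on synthetic data, assuming 10-dimensional "normal" points that lie on a 2-dimensional linear subspace and random off-subspace points as anomalies. A linear autoencoder with a 2-d bottleneck is enough here; the data, sizes, and step counts are illustrative choices, and the threshold is taken from held-out normal data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Normal" data: 10-d points on a 2-d linear subspace spanned by the columns of A
A = torch.randn(10, 2)
X_train = torch.randn(1000, 2) @ A.T        # normal samples for training
X_val_normal = torch.randn(200, 2) @ A.T    # held-out normal samples for the threshold
X_test_normal = torch.randn(100, 2) @ A.T   # unseen normal samples
X_test_anom = 1.5 * torch.randn(100, 10)    # off-subspace points act as anomalies

# Linear autoencoder with a 2-d bottleneck, trained on normal data only
model = nn.Sequential(nn.Linear(10, 2), nn.Linear(2, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(1000):
    loss = ((model(X_train) - X_train) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def recon_error(x):
    # Per-sample mean squared reconstruction error, used as the anomaly score
    with torch.no_grad():
        return ((x - model(x)) ** 2).mean(dim=1)

# Threshold at the 95th percentile of errors on held-out normal data
threshold = recon_error(X_val_normal).quantile(0.95)
normal_flagged = (recon_error(X_test_normal) > threshold).float().mean().item()
anom_flagged = (recon_error(X_test_anom) > threshold).float().mean().item()
print(f"flagged as anomalous: normal {normal_flagged:.0%}, anomalies {anom_flagged:.0%}")
```

On real data the gap between normal and anomalous errors is usually smaller, so the percentile acts as a tuning knob that trades false positives against missed anomalies.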

Applications

Use Case	Autoencoder Type
Dimensionality reduction	Standard AE
Anomaly / fraud detection	Standard AE
Image denoising	Denoising AE
Data generation	VAE (diffusion models are a popular alternative)
Feature learning	Standard AE
Semi-supervised learning	Standard AE + fine-tune

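Autoencoders are often a simple first approach to anomaly detection on tabular and image data: they need no labeled anomalies (only a training set of mostly normal samples), and the per-sample reconstruction error serves directly as an anomaly score. Per-feature errors can also show which parts of an input were poorly reconstructed.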