
🔒 Data Security 11 guides · updated 2026

Protecting data through its whole lifecycle — encryption, access control, masking, and the compliance frameworks (GDPR, SOC 2) that shape modern data platforms.

What “Data at Rest” Actually Means

Data at rest is any data not currently moving between systems — files on a disk, records in a database, objects in cloud storage, database backups, archived logs. It’s sitting somewhere, not being transmitted, waiting to be read.

The threat model is different from data in transit. You’re not worried about someone intercepting a network connection. You’re worried about someone gaining direct access to the underlying storage: a stolen laptop, a misconfigured cloud bucket, a rogue employee, physical server theft, or compromised backup media.

Encryption is the primary control. If data is encrypted at rest and the attacker doesn’t have the decryption key, the data is unreadable even if they have the raw storage. Every other control — access management, classification, auditing — layers on top of this foundation.


Encryption Standards That Matter

AES-256: The current standard for symmetric encryption of data at rest. Used in AWS S3 SSE, GCP Cloud Storage, Azure Blob Storage, and virtually every other enterprise storage service. When someone says “encrypted at rest,” they almost always mean AES-256; the mode varies by layer, with GCM common for object and file encryption and XTS standard for full-disk encryption (LUKS, BitLocker).

AES-GCM vs AES-CBC: GCM (Galois/Counter Mode) is preferred for new implementations. It provides both confidentiality and integrity verification (authenticated encryption), meaning tampering with the ciphertext can be detected. CBC mode provides confidentiality only.
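To make the integrity difference concrete, here is a minimal sketch using the third-party Python `cryptography` package (an assumption: it is not in the standard library and must be installed). It encrypts a record with AES-256-GCM, flips one bit of the stored ciphertext, and shows decryption refusing the tampered bytes. CBC has no equivalent check: a modified CBC ciphertext usually decrypts to corrupted plaintext, and the change is noticed only if it happens to break the padding or the application’s own validation.

```python
# Minimal sketch: AES-256-GCM tamper detection.
# Assumes the third-party `cryptography` package (pip install cryptography).
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 32-byte AES-256 key
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # 96-bit nonce; must never repeat under the same key
plaintext = b"account=12345;balance=100"
ciphertext = aesgcm.encrypt(nonce, plaintext, None)  # ciphertext with 16-byte tag appended

# Round trip succeeds with the untouched ciphertext.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext

# Flip one bit, as an attacker with raw storage access might.
tampered = bytearray(ciphertext)
tampered[0] ^= 0x01

try:
    aesgcm.decrypt(nonce, bytes(tampered), None)
    tamper_detected = False
except InvalidTag:
    tamper_detected = True  # the authentication tag no longer matches

print("tamper detected:", tamper_detected)
```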

ChaCha20-Poly1305: An alternative to AES used in some protocols (WireGuard, TLS 1.3 cipher suites) — faster on hardware without AES acceleration. Less common for data-at-rest encryption but increasingly relevant in mobile and embedded contexts.

For most organizations, the practical question isn’t which algorithm to use — it’s whether encryption is actually enabled and configured correctly.


The Encryption Landscape by Storage Type

Storage Type          | Default Encryption?  | What to Verify
──────────────────────|──────────────────────|─────────────────────────────
S3 (AWS)              | Yes (SSE-S3)         | SSE-KMS for key control
GCS (GCP)             | Yes (Google-managed) | CMEK for customer key control
Azure Blob            | Yes (Azure-managed)  | CMK for customer key control
RDS / Cloud SQL       | Yes at storage level | Also encrypt backups
Snowflake             | Yes (AES-256)        | Tri-Secret Secure if needed
PostgreSQL on EC2     | No by default        | EBS encryption, LUKS, or pgcrypto
Physical disks        | No by default        | BitLocker / LUKS full-disk
Backups               | Varies               | Always verify separately

Cloud storage services encrypt by default now, but “encrypted by default” and “encrypted with keys you control” are different things. Default encryption uses keys the cloud provider manages. Customer-managed keys (CMK/CMEK) are still stored in the provider’s KMS, but you control their access policy, rotation, and lifecycle. That control cuts both ways: disabling or deleting a customer-managed key makes everything encrypted under it unreadable, for attackers and for you.
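For S3, moving from provider-managed keys to a key you control comes down to the bucket’s default-encryption configuration. The sketch below only builds that configuration as a Python dict; the helper name `sse_kms_default_encryption` and the key ARN are hypothetical placeholders, and the dict shape follows the `ServerSideEncryptionConfiguration` parameter of boto3’s `put_bucket_encryption`.

```python
# Minimal sketch: build an S3 default-encryption configuration that uses a
# customer-managed KMS key (SSE-KMS) instead of S3-managed keys (SSE-S3).
import json


def sse_kms_default_encryption(kms_key_arn: str) -> dict:
    """Return a ServerSideEncryptionConfiguration dict for SSE-KMS."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",     # SSE-S3 would be "AES256"
                    "KMSMasterKeyID": kms_key_arn,  # key whose policy you control
                },
                "BucketKeyEnabled": True,  # S3 Bucket Key: fewer requests to KMS
            }
        ]
    }


# Placeholder ARN (hypothetical account and key ID).
config = sse_kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(json.dumps(config, indent=2))
```

Applying it would look like `boto3.client("s3").put_bucket_encryption(Bucket="my-sensitive-bucket", ServerSideEncryptionConfiguration=config)`; keeping the builder separate from the API call makes the intended setting easy to review and test before it touches a real bucket.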


Key Management: Where Most Failures Happen

Encryption is only as strong as the key management around it. Storing the encryption key next to the encrypted data is equivalent to leaving a house key under the doormat.

What good key management looks like:

Key Hierarchy
Root Key (rarely used, stored in HSM)
|
v
Key Encryption Key (KEK)
- Stored in KMS (AWS KMS / GCP KMS / Azure Key Vault)
- Rotated on schedule or on event
|
v
Data Encryption Key (DEK)
- Different key per dataset or per record
- Stored encrypted (wrapped by KEK)
- Decrypted in memory when data is accessed, never stored plaintext

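This hierarchy means a compromised DEK only exposes one dataset. Rotating the KEK requires re-wrapping only the DEKs, a few bytes each, rather than re-encrypting the data itself. Managed KMS services go further and keep earlier key versions available for decryption, so DEKs wrapped before an automatic rotation remain readable.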
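The DEK/KEK mechanics can be sketched locally, assuming the third-party `cryptography` package. Here AES-GCM stands in for the wrap operation that a real KMS `Encrypt`/`Decrypt` call (or AES Key Wrap) would perform, and every name (`wrap`, `unwrap`, `encrypt_dataset`, the dataset names) is hypothetical. The point is the rotation step: only the wrapped DEKs change, while the dataset ciphertexts stay byte-for-byte identical.

```python
# Minimal sketch of envelope encryption with KEK rotation by re-wrapping.
# Assumes the third-party `cryptography` package. In production the KEK lives
# in a KMS/HSM and wrap/unwrap are KMS API calls, not local code.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Encrypt (wrap) a DEK under a KEK; returns nonce || ciphertext."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, b"dek-wrap")


def unwrap(kek: bytes, wrapped: bytes) -> bytes:
    """Decrypt (unwrap) a DEK produced by wrap()."""
    return AESGCM(kek).decrypt(wrapped[:12], wrapped[12:], b"dek-wrap")


def encrypt_dataset(dek: bytes, data: bytes) -> bytes:
    nonce = os.urandom(12)
    return nonce + AESGCM(dek).encrypt(nonce, data, None)


def decrypt_dataset(dek: bytes, blob: bytes) -> bytes:
    return AESGCM(dek).decrypt(blob[:12], blob[12:], None)


# One DEK per dataset; only the wrapped form of each DEK is stored.
kek_v1 = AESGCM.generate_key(bit_length=256)
datasets = {"orders": b"order rows ...", "patients": b"patient rows ..."}
store = {}
for name, data in datasets.items():
    dek = AESGCM.generate_key(bit_length=256)
    store[name] = {"wrapped_dek": wrap(kek_v1, dek), "ciphertext": encrypt_dataset(dek, data)}

# Rotate the KEK: unwrap each DEK with the old KEK, re-wrap it with the new one.
# The (potentially large) dataset ciphertexts are never touched.
kek_v2 = AESGCM.generate_key(bit_length=256)
ciphertext_before = {name: entry["ciphertext"] for name, entry in store.items()}
for entry in store.values():
    entry["wrapped_dek"] = wrap(kek_v2, unwrap(kek_v1, entry["wrapped_dek"]))

# Data is still readable through the new KEK, and the ciphertext is unchanged.
for name, entry in store.items():
    assert entry["ciphertext"] == ciphertext_before[name]
    dek = unwrap(kek_v2, entry["wrapped_dek"])
    assert decrypt_dataset(dek, entry["ciphertext"]) == datasets[name]
print("rotated", len(store), "DEKs without re-encrypting data")
```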

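Key services by cloud: AWS KMS (with AWS CloudHSM for dedicated HSMs), Google Cloud KMS (with Cloud HSM), and Azure Key Vault (with Azure Key Vault Managed HSM).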

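Key rotation should be automated and scheduled. Compliance frameworks expect it: PCI DSS requires keys to be changed at the end of a defined cryptoperiod, and NIST SP 800-57 gives guidance on choosing that period. Annual rotation is a common minimum; monthly or quarterly rotation is often chosen for sensitive data. Most cloud KMS services support automatic rotation.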
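Where rotation is not fully delegated to a KMS, a scheduled check can flag keys that have outlived their policy. This is a minimal sketch: the key inventory, its field names, and the 365/90-day limits (annual for confidential, quarterly for restricted, following the guidance above) are hypothetical examples, not values from any specific tool.

```python
# Minimal sketch: flag keys whose last rotation exceeds a policy maximum age.
from datetime import date, timedelta

# Hypothetical policy: annual rotation for confidential keys, quarterly for restricted.
MAX_KEY_AGE_DAYS = {"confidential": 365, "restricted": 90}


def overdue_keys(keys: list, today: date) -> list:
    """Return IDs of keys rotated longer ago than their level allows."""
    overdue = []
    for key in keys:
        max_age = timedelta(days=MAX_KEY_AGE_DAYS[key["level"]])
        if today - key["last_rotated"] > max_age:
            overdue.append(key["key_id"])
    return overdue


# Hypothetical inventory, e.g. exported from a key-management spreadsheet or KMS metadata.
inventory = [
    {"key_id": "kek-finance", "level": "confidential", "last_rotated": date(2025, 1, 10)},
    {"key_id": "kek-phi", "level": "restricted", "last_rotated": date(2025, 11, 1)},
    {"key_id": "kek-hr", "level": "restricted", "last_rotated": date(2025, 12, 20)},
]
print(overdue_keys(inventory, today=date(2026, 2, 1)))  # ['kek-finance', 'kek-phi']
```

Passing `today` explicitly keeps the check deterministic and testable; a scheduled job would pass `date.today()`.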


Database-Level Encryption Controls

Full-disk or volume-level encryption protects against physical storage theft, but once the database is running, it reads plaintext. Column-level encryption adds a second layer that persists even when the database is operating normally.

Transparent Data Encryption (TDE): Available in SQL Server, Oracle, and MySQL Enterprise. Encrypts data files, log files, and backups at the database engine level. The database decrypts transparently on access — users and applications don’t need to change anything.

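Column-level encryption: Encrypts specific sensitive columns (SSN, credit card numbers, health records) within the database. The key is not stored in the database; whatever needs the plaintext must supply it at query time. With server-side functions such as pgcrypto, the key still passes through the database session, so the strongest separation comes from encrypting in the application, where the database only ever sees ciphertext. Either approach limits exposure if a database admin account is compromised.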

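-- PostgreSQL column encryption with pgcrypto
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE IF NOT EXISTS customers (
    id              integer PRIMARY KEY,
    email_encrypted bytea  -- pgp_sym_encrypt() returns bytea
);
-- Supply the key per session from a secrets manager or KMS, never from a table.
-- Note: statement logging (log_statement) can capture this value.
SET app.encrypt_key = 'replace-with-key-from-your-kms';
-- Storing encrypted
INSERT INTO customers (id, email_encrypted)
VALUES (1, pgp_sym_encrypt('user@example.com', current_setting('app.encrypt_key')));
-- Reading decrypted (the column is already bytea, so no cast is needed)
SELECT
    id,
    pgp_sym_decrypt(email_encrypted, current_setting('app.encrypt_key')) AS email
FROM customers;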
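The pgcrypto example passes the key into the database session. Encrypting in the application instead means the database only ever stores and returns ciphertext. This minimal sketch assumes the third-party `cryptography` package and uses Python’s built-in `sqlite3` as a stand-in database; `encrypt_field`, `decrypt_field`, and the associated-data string are hypothetical. Binding each ciphertext to its column and row ID as GCM associated data means a value copied into another row fails to decrypt.

```python
# Minimal sketch: application-side column encryption. The database never
# receives the key. Assumes the third-party `cryptography` package; sqlite3
# stands in for the real database.
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, a DEK unwrapped via KMS


def encrypt_field(value: str, row_id: int) -> bytes:
    nonce = os.urandom(12)
    aad = f"customers.email:{row_id}".encode()  # binds ciphertext to this row
    return nonce + AESGCM(key).encrypt(nonce, value.encode(), aad)


def decrypt_field(blob: bytes, row_id: int) -> str:
    aad = f"customers.email:{row_id}".encode()
    return AESGCM(key).decrypt(blob[:12], blob[12:], aad).decode()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email_encrypted BLOB)")
conn.execute(
    "INSERT INTO customers (id, email_encrypted) VALUES (?, ?)",
    (1, encrypt_field("user@example.com", 1)),
)

# What anyone querying the database directly sees: opaque bytes.
raw = conn.execute("SELECT email_encrypted FROM customers WHERE id = 1").fetchone()[0]
assert b"user@example.com" not in raw

# What the application sees after decrypting with its key.
print(decrypt_field(raw, 1))  # user@example.com
```

The trade-off is that the database can no longer filter, index, or join on the plaintext value; queries on encrypted columns have to fetch rows and decrypt in the application.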

Row-level security complements encryption by controlling which users can read which rows. In Snowflake, BigQuery, and PostgreSQL, you can define policies that filter data at query time based on the authenticated user’s role or attributes.
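The core idea of row-level security, a per-query predicate over the caller’s attributes with default deny, can be modelled in a few lines. This is a pure-Python illustration, not how any engine implements it; the `User` fields, roles, and `region_policy` rule are hypothetical.

```python
# Minimal pure-Python model of a row-level security policy: a predicate,
# evaluated at query time against the caller's attributes, decides which
# rows are visible. Real engines enforce this inside the database.
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str
    region: str


def region_policy(user: User, row: dict) -> bool:
    """Hypothetical policy: admins see all rows, analysts only their region."""
    if user.role == "admin":
        return True
    if user.role == "analyst":
        return row["region"] == user.region
    return False  # default deny for any other role


def query(table: list, user: User) -> list:
    return [row for row in table if region_policy(user, row)]


orders = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 45},
]
print([r["id"] for r in query(orders, User("ana", "analyst", "EU"))])  # [1, 3]
print([r["id"] for r in query(orders, User("root", "admin", "US"))])   # [1, 2, 3]
print([r["id"] for r in query(orders, User("tmp", "intern", "EU"))])   # []
```

In PostgreSQL the analyst rule would be expressed with `ALTER TABLE orders ENABLE ROW LEVEL SECURITY` and a policy such as `CREATE POLICY region_isolation ON orders USING (region = current_setting('app.region'))` (table, policy, and setting names hypothetical). With RLS enabled and no applicable policy, PostgreSQL returns no rows to roles other than the table owner, matching the default deny above.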


Cloud Storage: Practical Configuration

AWS S3 — enforcing SSE-KMS at the bucket level:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-sensitive-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}

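This bucket policy rejects any upload that doesn’t explicitly request KMS encryption, so objects can’t land under provider-managed keys by accident, even when uploaded by authorized users. Because the condition checks the request header, it also rejects uploads that omit the header and rely on the bucket’s default encryption; every client must send x-amz-server-side-encryption: aws:kms.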
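How the Deny statement in the bucket policy above treats individual uploads can be checked with a small model. It is a simplification that covers only this one statement (no other statements, ACLs, or SCPs), and `put_object_denied` is a hypothetical helper. It mirrors IAM’s rule for negated operators such as StringNotEquals: when the condition key is missing from the request, the condition evaluates to true.

```python
# Simplified model of the bucket policy's Deny statement for s3:PutObject.
# IAM rule for negated operators (StringNotEquals): a missing key -> condition true.
REQUIRED_SSE = "aws:kms"


def put_object_denied(request_headers: dict) -> bool:
    """Return True if the Deny statement would match this PutObject request."""
    sse = request_headers.get("x-amz-server-side-encryption")
    if sse is None:
        return True  # header absent: StringNotEquals matches, so Deny applies
    return sse != REQUIRED_SSE  # any other value, e.g. "AES256", is denied


print(put_object_denied({"x-amz-server-side-encryption": "aws:kms"}))  # False
print(put_object_denied({"x-amz-server-side-encryption": "AES256"}))   # True
print(put_object_denied({}))  # True, even if bucket default encryption is SSE-KMS
```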

Public bucket misconfiguration remains one of the most common causes of data exposure. CloudTrail (recording changes to bucket policies and ACLs), IAM Access Analyzer for S3, and AWS Security Hub should be monitoring for buckets with public access enabled.
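A simple preventive check is to confirm that all four S3 Block Public Access settings are on. This sketch only inspects a configuration dict; the setting names are the ones S3 uses in its `PublicAccessBlockConfiguration`, while the helper name and sample values are hypothetical.

```python
# Minimal sketch: list S3 Block Public Access settings that are off or missing.
PUBLIC_ACCESS_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list:
    """Return settings that are absent or False; empty means fully blocked."""
    return [name for name in PUBLIC_ACCESS_SETTINGS if not config.get(name, False)]


# Hypothetical bucket configuration with one protection disabled.
sample = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
    "RestrictPublicBuckets": True,
}
print(missing_public_access_blocks(sample))  # ['BlockPublicPolicy']
```

Against a live bucket, the input would come from `boto3.client("s3").get_public_access_block(Bucket=...)["PublicAccessBlockConfiguration"]`. That call raises an error when no configuration exists at all, which should itself be reported as a finding.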


Backup Encryption: The Overlooked Gap

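Backups are a common vector for data exposure. The production database may be fully encrypted while its snapshots, dumps, and exported copies are not, or are encrypted with keys stored right next to them. Backup encryption therefore has to be configured explicitly and verified separately, with backup keys held in the KMS like any other key.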
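One way to act on “always verify separately” for managed database backups is to scan snapshot metadata. This minimal sketch assumes input shaped like boto3’s `rds.describe_db_snapshots()` response (a `DBSnapshots` list whose entries carry `DBSnapshotIdentifier`, `Encrypted`, and, when encrypted, `KmsKeyId`); the helper name and sample snapshots are hypothetical.

```python
# Minimal sketch: flag unencrypted RDS snapshots in a describe_db_snapshots-style response.
def unencrypted_snapshots(response: dict) -> list:
    """Return identifiers of snapshots whose Encrypted flag is missing or False."""
    return [
        snap["DBSnapshotIdentifier"]
        for snap in response.get("DBSnapshots", [])
        if not snap.get("Encrypted", False)
    ]


# Hypothetical response with one encrypted and one unencrypted snapshot.
sample_response = {
    "DBSnapshots": [
        {
            "DBSnapshotIdentifier": "orders-2026-01-31",
            "Encrypted": True,
            "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
        },
        {"DBSnapshotIdentifier": "legacy-manual-copy", "Encrypted": False},
    ]
}
print(unencrypted_snapshots(sample_response))  # ['legacy-manual-copy']
```

For a real account, iterate over `rds.get_paginator("describe_db_snapshots").paginate()` and apply the function to each page, since large accounts return snapshots across several pages.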


Data Classification and Tiered Controls

Not all data requires the same level of protection. A practical classification scheme:

Data Classification
Level 1 — Public
- No encryption at rest required
- Standard access controls
- Example: published documentation, marketing content
Level 2 — Internal
- Encryption at rest recommended
- Role-based access controls
- Example: internal metrics, operational data
Level 3 — Confidential
- Encryption at rest required (AES-256)
- Customer-managed keys
- Access logging mandatory
- Example: employee records, financial data
Level 4 — Restricted
- Encryption at rest required
- Column-level encryption for most sensitive fields
- HSM-backed key storage
- Strict need-to-know access
- Example: PHI, PCI data, authentication credentials
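Expressed as code, the scheme becomes something a pipeline or catalog integration can check automatically. This sketch encodes only the technical controls listed above as hypothetical control names, and it assumes (the list does not say so explicitly) that each level inherits the controls of the levels below it. Level 2’s encryption at rest is treated as recommended, so it is not enforced.

```python
# Minimal sketch: the four-level classification as machine-checkable controls.
# Control names are hypothetical; each level is assumed to inherit lower levels.
LEVEL_CONTROLS = {
    1: set(),                                    # Public: standard access controls
    2: {"role_based_access"},                    # Internal: encryption only recommended
    3: {"encryption_at_rest", "customer_managed_key", "access_logging"},
    4: {"column_encryption", "hsm_backed_key"},  # Restricted
}


def required_controls(level: int) -> set:
    """Union of controls for this level and every level below it."""
    required = set()
    for lvl in range(1, level + 1):
        required |= LEVEL_CONTROLS[lvl]
    return required


def missing_controls(level: int, enabled: set) -> set:
    return required_controls(level) - enabled


# Hypothetical dataset: an HR table with some controls already in place.
hr_table = {"role_based_access", "encryption_at_rest", "access_logging"}
print(sorted(missing_controls(3, hr_table)))  # ['customer_managed_key']
print(sorted(missing_controls(4, hr_table)))  # ['column_encryption', 'customer_managed_key', 'hsm_backed_key']
```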

Many organizations classify data at the table or dataset level and enforce controls through policy. Data catalogs (Collibra, Alation, DataHub) can store classification metadata and connect to access control systems to enforce it automatically.


Audit Logging for Storage Access

Encryption protects against unauthorized access; audit logs detect it and enable investigation. Logging should capture who accessed which data or keys, when, from where, and whether the request succeeded.

All major cloud platforms include audit log services: AWS CloudTrail, Google Cloud Audit Logs, and Azure Monitor (activity and resource logs).

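These logs should be stored separately from the data they describe, protected against modification or deletion, and retained long enough to support an investigation.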
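Audit logs only help if someone reads them. A common first detection is KMS `Decrypt` calls from principals that are not expected to read the data. This sketch uses CloudTrail’s record fields (`eventSource`, `eventName`, `userIdentity.arn`, `eventTime`); the allowlist, ARNs, and sample events are hypothetical, not real log data.

```python
# Minimal sketch: find KMS Decrypt calls by principals outside an allowlist,
# using CloudTrail record field names. Allowlist and events are hypothetical.
EXPECTED_DECRYPTERS = {
    "arn:aws:sts::111122223333:assumed-role/etl-job/session",
}


def unexpected_decrypts(records: list) -> list:
    """Return (eventTime, principal ARN) for unexpected KMS Decrypt calls."""
    findings = []
    for rec in records:
        if rec.get("eventSource") != "kms.amazonaws.com" or rec.get("eventName") != "Decrypt":
            continue
        arn = rec.get("userIdentity", {}).get("arn", "<unknown>")
        if arn not in EXPECTED_DECRYPTERS:
            findings.append((rec.get("eventTime"), arn))
    return findings


records = [
    {"eventTime": "2026-01-15T02:00:00Z", "eventSource": "kms.amazonaws.com",
     "eventName": "Decrypt",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/etl-job/session"}},
    {"eventTime": "2026-01-15T03:12:00Z", "eventSource": "kms.amazonaws.com",
     "eventName": "Decrypt",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventTime": "2026-01-15T03:13:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
]
print(unexpected_decrypts(records))
```

CloudTrail log files wrap events in a top-level `Records` list, so `json.load(f)["Records"]` yields the input this function expects.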


What’s Changed in 2025–2026

Confidential computing has moved from research to mainstream cloud. AWS Nitro Enclaves, GCP Confidential VMs, and Azure Confidential Computing let workloads process sensitive data inside hardware-isolated environments, where data is decrypted only within protected memory and stays inaccessible to the host OS, hypervisor, and cloud operator. This is useful for highly sensitive financial and healthcare workloads.

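Post-quantum preparedness: NIST finalized its first post-quantum cryptography standards (FIPS 203, 204, and 205) in August 2024. AES-256 itself is not considered at practical risk; the exposure is data whose keys are wrapped or exchanged with RSA or elliptic-curve cryptography, which an adversary could store today and decrypt once a sufficiently capable quantum computer exists. Organizations managing long-retention sensitive data (healthcare records, government archives) are beginning migration planning now.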

AI training data scrutiny: Regulators and data protection authorities are increasingly examining whether personal data stored for AI model training has adequate controls. The same encryption and access controls expected for production data now apply to training datasets and feature stores.


Securing data at rest is not a one-time configuration. It’s a set of controls that need to be enforced through policy, verified through auditing, and maintained as the technology landscape changes. Start with the fundamentals — enable encryption, control the keys, classify your data, and audit access — and build from there.