

Protecting data through its whole lifecycle — encryption, access control, masking, and the compliance frameworks (GDPR, SOC 2) that shape modern data platforms.

Why Data in Transit Has a Different Threat Model

When data moves (over the internet, between microservices, across internal networks), the attack surface differs from that of stored data. Physical access to the storage hardware isn’t the concern; the risk comes from anyone positioned on the network path who can read, alter, or redirect traffic in flight.

The primary defense against these threats is correctly configured TLS. Everything else builds on that foundation.


TLS: The Core Protocol

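Transport Layer Security (TLS) is the protocol that secures the majority of data in transit on the internet. When you see “https://” in a URL, you’re using TLS. It provides confidentiality (traffic is encrypted), integrity (tampering is detected), and authentication (the server proves its identity with a certificate).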

TLS 1.3 Handshake (simplified)
Client                                    Server
  |                                            |
  |──── ClientHello ─────────────────────────> |
  |     (TLS versions, cipher suites,          |
  |      key share)                            |
  |                                            |
  | <──── ServerHello ──────────────────────── |
  |     (chosen cipher suite,                  |
  |      server key share)                     |
  |                                            |
  |   [Both sides derive handshake keys from   |
  |    the key shares; private keys and the    |
  |    shared secret never travel]             |
  |                                            |
  | <──── Certificate (encrypted) ──────────── |
  | <──── CertificateVerify (encrypted) ────── |
  | <──── Finished (encrypted) ─────────────── |
  |                                            |
  |──── Finished (encrypted) ────────────────> |
  |                                            |
  |====== Encrypted Application Data ==========|

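TLS 1.3 (RFC 8446, finalized in 2018 and now dominant) dropped the cipher suites known to be weak, keeping only a small set of AEAD suites, and cut the full handshake from two round trips to one. It also removed static RSA key exchange in favor of ephemeral Diffie-Hellman (ECDHE or DHE), which provides forward secrecy: if the server’s long-term private key leaks later, previously recorded sessions still cannot be decrypted.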
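To make “both sides derive the session key from the key shares” concrete, here is a toy finite-field Diffie-Hellman exchange in Python. It is a sketch for intuition only, under stated assumptions: the modulus 2**127 - 1, the generator 3, the helper names, and the single SHA-256 standing in for TLS 1.3’s HKDF key schedule are all chosen for readability. None of it is secure for real use; TLS 1.3 itself uses groups such as X25519 or P-256.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters (assumption, NOT secure): a 127-bit Mersenne prime and a small generator.
P = 2**127 - 1
G = 3


def make_key_share() -> Tuple[int, int]:
    """Return (private exponent, public key share). Only the share goes on the wire."""
    private = secrets.randbelow(P - 3) + 2  # fresh per connection, in [2, P-2]
    return private, pow(G, private, P)


def derive_session_key(own_private: int, peer_share: int) -> bytes:
    """Combine our private exponent with the peer's public share."""
    shared_secret = pow(peer_share, own_private, P)  # G^(a*b) mod P on both sides
    # Stand-in for the TLS 1.3 HKDF key schedule: one SHA-256 over the secret.
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


# ClientHello carries the client's share; ServerHello carries the server's.
client_priv, client_share = make_key_share()
server_priv, server_share = make_key_share()

client_key = derive_session_key(client_priv, server_share)
server_key = derive_session_key(server_priv, client_share)
print(client_key == server_key)  # True
```

Only client_share and server_share would cross the network. Because each side generates a fresh private exponent per connection and discards it afterwards, an attacker who records the traffic and later steals the server’s certificate key still cannot recompute the session key, which is the forward-secrecy property that static RSA key exchange lacked.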


TLS Version Requirements in 2025

The minimum acceptable version is TLS 1.2. TLS 1.0 and 1.1 are deprecated, disabled in all major browsers, and prohibited by PCI-DSS 4.0.

TLS 1.2 is still widely used and is secure when configured correctly. The danger is in the cipher suite selection — TLS 1.2 supports many weak cipher suites that should be disabled.

TLS 1.3 should be the default for new systems. It only supports modern cipher suites (no configuration required to avoid weak ones) and requires forward secrecy by design.
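The same version floor can be enforced in client code. A minimal sketch using Python’s standard ssl module, assuming Python 3.7+ and an OpenSSL build with TLS 1.3 support; make_client_context is a hypothetical helper name:

```python
import ssl


def make_client_context(require_tls13: bool = False) -> ssl.SSLContext:
    """Client TLS context that refuses TLS 1.0/1.1 (and optionally TLS 1.2)."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = (
        ssl.TLSVersion.TLSv1_3 if require_tls13 else ssl.TLSVersion.TLSv1_2
    )
    return ctx


ctx = make_client_context()
strict_ctx = make_client_context(require_tls13=True)
```

A context built this way makes the handshake fail with ssl.SSLError against a server that only offers TLS 1.0 or 1.1, instead of quietly connecting. It is passed to wrap_socket(sock, server_hostname=...) as usual.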

Checking your TLS configuration:

# Check what TLS versions a server supports
nmap --script ssl-enum-ciphers -p 443 yourdomain.com
# Test a specific server's TLS configuration
openssl s_client -connect yourdomain.com:443 -tls1_2
openssl s_client -connect yourdomain.com:443 -tls1_3
# Check certificate and chain
curl -vI https://yourdomain.com 2>&1 | grep -E "SSL|TLS|issuer|expire"

Tools like SSL Labs (ssllabs.com/ssltest) give a comprehensive graded report on any public-facing HTTPS endpoint.
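The same checks can be scripted, for example to run on a schedule against internal endpoints that SSL Labs cannot reach. A sketch using only Python’s standard library; inspect_endpoint and days_until_expiry are hypothetical helper names. Because verification is on, an untrusted or expired certificate makes the handshake fail rather than being reported:

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (format used by getpeercert())."""
    expires = ssl.cert_time_to_seconds(not_after)  # e.g. "Jan  5 09:34:43 2018 GMT"
    current = time.time() if now is None else now
    return (expires - current) / 86400


def inspect_endpoint(host: str, port: int = 443) -> dict:
    """Handshake with verification enabled and report what was negotiated."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "version": tls.version(),   # e.g. "TLSv1.3"
                "cipher": tls.cipher()[0],  # negotiated cipher suite name
                "issuer": dict(rdn[0] for rdn in cert["issuer"]),
                "days_left": round(days_until_expiry(cert["notAfter"]), 1),
            }
```

Usage: inspect_endpoint("yourdomain.com") returns the negotiated version, cipher suite, issuer fields, and days until expiry, so a monitoring job can alert when days_left drops below a chosen threshold.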


Cipher Suite Configuration for TLS 1.2

TLS 1.3 has a safe default set of cipher suites. TLS 1.2 requires explicit configuration to disable weak options. A good configuration for Nginx:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305';
ssl_prefer_server_ciphers on;
ssl_session_tickets off;
ssl_session_cache shared:SSL:10m;

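What this does:

ssl_protocols: Allows only TLS 1.2 and 1.3, so clients offering only TLS 1.0 or 1.1 fail the handshake.

ssl_ciphers: Restricts TLS 1.2 to suites with ECDHE key exchange (forward secrecy) and AEAD encryption (AES-GCM or ChaCha20-Poly1305). It does not affect TLS 1.3, whose suites OpenSSL configures separately.

ssl_prefer_server_ciphers: Makes the server’s cipher ordering take precedence over the client’s preference.

ssl_session_tickets off: Disables TLS session tickets. The ticket-encryption key is long-lived unless rotated, and anyone who obtains it can decrypt recorded sessions that used those tickets, undermining forward secrecy.

ssl_session_cache: Keeps session resumption available through a 10 MB cache shared across Nginx worker processes.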

Cipher suites to disable: anything with RC4, DES, 3DES, export ciphers, NULL cipher, or anonymous Diffie-Hellman (ADH/AECDH). These appear in vulnerability scanner reports as HIGH or CRITICAL.
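Before deploying a cipher string, it helps to see what the local OpenSSL actually expands it to. A sketch using Python’s ssl module; expand, weak_ciphers, and the marker list are illustrative assumptions, with the markers mirroring the families named above. get_ciphers() typically also lists the TLS 1.3 suites, which set_ciphers() does not control, the same split as in Nginx, where ssl_ciphers applies only to TLS 1.2 and below:

```python
import ssl
from typing import Iterable, List

# The TLS 1.2 cipher string from the Nginx example (OpenSSL syntax).
NGINX_CIPHERS = (
    "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:"
    "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305"
)

# Substrings marking the weak families listed above, plus MD5-based MACs.
WEAK_MARKERS = ("RC4", "DES", "EXP", "NULL", "ADH", "AECDH", "MD5")


def expand(cipher_string: str) -> List[str]:
    """Ask the local OpenSSL which suites a cipher string actually enables."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(cipher_string)  # raises ssl.SSLError if nothing matches
    return [c["name"] for c in ctx.get_ciphers()]


def weak_ciphers(names: Iterable[str]) -> List[str]:
    """Return the cipher names that contain a known-weak marker."""
    return [n for n in names if any(m in n.upper() for m in WEAK_MARKERS)]


enabled = expand(NGINX_CIPHERS)
print(enabled)
print(weak_ciphers(enabled))  # []
```

A substring check is deliberately blunt; it catches obvious mistakes in a config review, while a scanner gives the fuller picture of what a live server negotiates.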


Mutual TLS (mTLS) for Service-to-Service Communication

Standard TLS authenticates the server to the client. Mutual TLS (mTLS) requires both parties to present a certificate. This is the standard for securing service-to-service communication in microservices architectures.

mTLS Authentication Flow (A = client, B = server)
Service A                            Service B
  |                                          |
  | <──────── Presents its certificate ───── |
  | Verifies B's cert                        |
  | against trusted CA                       |
  |                                          |
  | Presents its certificate ──────────────> |
  |                                          | Verifies A's cert
  |                                          | against trusted CA
  |                                          |
  |======== Encrypted, mutually =============|
  |======== authenticated channel ===========|

Without mTLS, any service that can reach Service B’s port can make requests. With mTLS, only services with a certificate signed by the trusted CA can connect.

Service meshes like Istio and Linkerd can apply mTLS automatically to every connection between pods in the mesh, without application code changes. This is the “zero trust” network model in practice: no connection is trusted just because it originates on the internal network.
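Outside a mesh, the same requirement can be expressed directly in application code. A minimal sketch with Python’s ssl module; the helper names are hypothetical, and the certificate paths are optional parameters only so the configuration can be inspected without certificate files:

```python
import ssl
from typing import Optional


def mtls_server_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Service B: present its own cert and require a CA-signed client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # defaults to CERT_NONE (one-way TLS)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # internal CA that signs service certs
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def mtls_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Service A: verify B against the same CA and present its own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED + hostname check by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In a real deployment both sides would be given files such as /etc/mtls/ca.crt plus their own certificate and key (hypothetical paths). The decisive line is verify_mode = ssl.CERT_REQUIRED on the server: without it, the server context performs ordinary one-way TLS.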


Internal Network Traffic: The Overlooked Gap

External HTTPS is almost universal now. Internal service-to-service traffic is where gaps remain.

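Common gaps include database connections and message-broker traffic left unencrypted on the assumption that the internal network is safe. Both can be configured to require verified TLS: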

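# PostgreSQL: require TLS and verify the server certificate and hostname
psql "sslmode=verify-full sslrootcert=/path/to/ca.crt host=db.internal dbname=analytics user=app"
# Inside the session, confirm the connection is encrypted:
#   SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid();
# Kafka producer with TLS (client properties file)
bootstrap.servers=kafka.internal:9093
security.protocol=SSL
ssl.truststore.location=/var/kafka/truststore.jks
# Keystore: only needed when brokers require client certificates (mTLS)
ssl.keystore.location=/var/kafka/keystore.jks
ssl.keystore.password=<keystore-password>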

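In 2025, the standard expectation is that all connections, internal or external, use TLS. Network-level tools like Cilium (eBPF-based) can additionally enforce encryption of pod-to-pod traffic as a cluster policy, although Cilium’s transparent encryption uses WireGuard or IPsec rather than TLS.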
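To find the connections that were missed, client configurations can be audited in CI. A hypothetical sketch in Python: the helper names and the rules (anything short of TLS with verification is a finding) are assumptions for illustration, and the parsers handle only simple key=value forms like the examples above, not the full properties or libpq quoting syntax. The defaults it assumes are the documented ones: Kafka’s security.protocol defaults to PLAINTEXT and libpq’s sslmode defaults to prefer.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal properties reader: key=value lines, # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def kafka_findings(props: Dict[str, str]) -> List[str]:
    """Flag Kafka client settings that leave traffic unencrypted or unverified."""
    findings = []
    protocol = props.get("security.protocol", "PLAINTEXT").upper()  # Kafka default
    if protocol not in ("SSL", "SASL_SSL"):
        findings.append(f"security.protocol={protocol}: traffic is not encrypted")
    if props.get("ssl.endpoint.identification.algorithm", "https") == "":
        findings.append("broker hostname verification is disabled")
    return findings


def libpq_findings(conninfo: str) -> List[str]:
    """Check a simple space-separated libpq conninfo string (no quoted values)."""
    params = dict(part.split("=", 1) for part in conninfo.split())
    mode = params.get("sslmode", "prefer")  # libpq default
    if mode != "verify-full":
        return [f"sslmode={mode} is weaker than verify-full"]
    return []
```

Run against every client config in a repository, a non-empty result fails the build; the rules can be relaxed (for example, accepting verify-ca) where a team decides that is acceptable.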


VPNs: Where They Still Fit

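VPNs create an encrypted tunnel for all traffic between two endpoints. They’re appropriate for remote access, connecting employees’ devices to internal networks, and for site-to-site links between networks, such as an office or data center and a cloud environment.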

For cloud environments, VPNs (AWS Site-to-Site VPN, GCP Cloud VPN, Azure VPN Gateway) are commonly used to secure the connection between on-premises infrastructure and cloud networks, removing the need to expose internal systems to the public internet.

For modern microservices running entirely in cloud environments, mTLS and zero-trust network policies are generally a better fit than VPNs. VPNs provide a perimeter; zero trust provides per-connection authentication.


Secure File Transfer Protocols

When moving files between systems, the choice of protocol determines whether the transfer is encrypted:

Protocol Comparison for File Transfer
Protocol | Encrypted? | Notes
──────────|────────────|────────────────────────────────────
FTP | No | Plaintext credentials and data
FTPS | Yes | FTP over TLS — can be complex to firewall
SFTP | Yes | SSH-based, widely supported, preferred
SCP | Yes | SSH-based, simpler than SFTP, less flexible
HTTPS | Yes | Best for web-facing file transfers
rsync+SSH | Yes | Efficient for large directory syncs

SFTP (not to be confused with FTPS) is the standard recommendation for file transfers in data pipelines. It uses SSH for authentication and encryption, supports key-based auth, and is widely supported by data sources and destinations.


Email and Messaging Security

Email is a high-value target because it frequently carries sensitive data. Controls for securing email in transit:

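STARTTLS: Opportunistic encryption for SMTP. If both servers support it, the connection is encrypted. If one doesn’t, it falls back to plaintext, and an active attacker can strip the STARTTLS offer from the server’s response to force that fallback. Better than nothing, but not guaranteed.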

DANE (DNS-based Authentication of Named Entities): Uses DNSSEC to publish the expected certificate for a mail server, preventing downgrade attacks. More secure than STARTTLS alone.

DKIM (DomainKeys Identified Mail): Adds a cryptographic signature to outbound emails, verifiable by the recipient’s mail server. Protects against email spoofing and tampering in transit, but does not encrypt the message.

S/MIME and PGP: End-to-end encryption where the message is encrypted by the sender using the recipient’s public key. Only the recipient can decrypt it, even if the mail server is compromised.
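The difference between opportunistic and enforced STARTTLS comes down to what the sender does when the offer is missing. A sketch of the enforced behavior for a client submitting mail with Python’s smtplib; the function name, port 587, and timeout are assumptions, and this covers client-to-relay submission rather than the server-to-server hop that DANE protects:

```python
import smtplib
import ssl
from email.message import EmailMessage


def send_with_required_starttls(host: str, msg: EmailMessage, port: int = 587) -> None:
    """Send msg, refusing to continue if the server does not offer STARTTLS."""
    ctx = ssl.create_default_context()  # verifies the server certificate and hostname
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            # Opportunistic STARTTLS would fall back to plaintext here; we stop instead.
            raise RuntimeError(f"{host} did not offer STARTTLS; refusing to send in plaintext")
        smtp.starttls(context=ctx)
        smtp.ehlo()  # re-identify over the encrypted channel
        smtp.send_message(msg)
```

Failing closed means a stripped STARTTLS offer produces an error instead of a silent plaintext delivery.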


Certificate Management at Scale

Manual certificate management doesn’t scale. Expired certificates cause outages; mismatched certificates cause security incidents.

Let’s Encrypt with ACME protocol: Free, automated certificate issuance and renewal. cert-manager (Kubernetes) or certbot (standalone) automates this entirely. Let’s Encrypt certificates are currently valid for 90 days, and certbot by default renews them once 30 days or fewer remain.

Internal PKI: For internal services, run your own Certificate Authority using tools like Vault PKI, CFSSL, or step-ca. Issue short-lived certificates (24-hour validity) that renew automatically — this limits the impact of a compromised certificate.
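Both approaches share one rule: renew well before expiry, in proportion to the certificate’s lifetime. A sketch of that decision in Python, assuming a policy of renewing once two-thirds of the validity period has elapsed; for a 90-day certificate that is the 30-days-remaining point certbot uses by default (and what cert-manager does when renewBefore is unset), and for a 24-hour internal certificate it is 8 hours before expiry. The helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEW_FRACTION = 2 / 3  # assumption: renew after two-thirds of the lifetime


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Point in time at which the certificate should be renewed."""
    lifetime = not_after - not_before
    return not_before + lifetime * RENEW_FRACTION


def needs_renewal(not_before: datetime, not_after: datetime,
                  now: Optional[datetime] = None) -> bool:
    """True once the renewal point has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= renewal_time(not_before, not_after)


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(renewal_time(issued, issued + timedelta(days=90)))   # 2025-03-02 00:00:00+00:00
print(renewal_time(issued, issued + timedelta(hours=24)))  # 2025-01-01 16:00:00+00:00
```

Tying renewal to a fraction of the lifetime, rather than a fixed number of days, lets one rule cover both public 90-day certificates and short-lived internal ones.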

Certificate Transparency (CT): Major browsers require publicly trusted TLS certificates to be recorded in public CT logs. Monitor these logs for unexpected certificates issued for your domain (crt.sh offers free search, and CT monitoring services can send alerts).
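A periodic check against crt.sh can flag certificates issued by CAs you do not use. A sketch in Python; the query URL format and the issuer_name field reflect crt.sh’s JSON output as it is commonly used, but treat both as assumptions to verify against a live response, and the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def fetch_ct_entries(domain: str) -> list:
    """Fetch logged certificates for a domain (assumes crt.sh's output=json format)."""
    url = "https://crt.sh/?" + urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def unexpected_issuers(entries: Iterable[dict], allowed: Iterable[str]) -> List[dict]:
    """Entries whose issuer_name matches none of the CAs we actually use."""
    allowed_lc = [a.lower() for a in allowed]
    return [e for e in entries
            if not any(a in e.get("issuer_name", "").lower() for a in allowed_lc)]
```

Usage: unexpected_issuers(fetch_ct_entries("yourdomain.com"), ["Let's Encrypt"]) returns any logged certificate whose issuer is not on the allow-list, which is the signal worth alerting on.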


HSTS and Other HTTP Security Headers

For web applications, HSTS (HTTP Strict Transport Security) tells browsers to always use HTTPS for the domain, preventing protocol downgrade attacks:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

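This header tells browsers to use only HTTPS for this domain (and, with includeSubDomains, all its subdomains) for the next year (31536000 seconds). The preload directive signals consent to inclusion in the HSTS preload list built into browsers; once the domain is submitted at hstspreload.org and accepted, browsers use HTTPS even on the first visit.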
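Because removal from the preload list can take a long time to reach users’ browsers, it is worth checking the header before submitting. A sketch in Python, assuming the hstspreload.org header requirements (max-age of at least one year, includeSubDomains, and preload); the helper names are hypothetical, and the site also checks things this does not, such as redirecting HTTP to HTTPS:

```python
from typing import Dict, List, Optional


def parse_hsts(value: str) -> Dict[str, Optional[str]]:
    """Split a Strict-Transport-Security value into lower-cased directives."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def preload_problems(value: str) -> List[str]:
    """List reasons a header would not meet the preload submission requirements."""
    d = parse_hsts(value)
    problems = []
    max_age = d.get("max-age")
    if max_age is None or not max_age.isdigit() or int(max_age) < 31536000:
        problems.append("max-age must be at least 31536000 (one year)")
    if "includesubdomains" not in d:
        problems.append("includeSubDomains is required")
    if "preload" not in d:
        problems.append("preload directive is missing")
    return problems


print(preload_problems("max-age=31536000; includeSubDomains; preload"))  # []
```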

Other relevant headers:


Monitoring and Detection

Encrypted traffic can’t be trivially inspected, but you can still monitor for anomalies:

Securing data in transit is a combination of configuration (enforce TLS 1.2+, disable weak cipher suites, enable HSTS), infrastructure (mTLS for internal services, VPN for remote access), and operations (automated certificate management, continuous monitoring). The technical controls are well understood — the gap is usually in enforcement and consistency across all services, including the ones nobody’s paying attention to.