Google Cloud Platform (GCP) · 25 guides · updated 2026

Guides to BigQuery, Vertex AI, GKE, Dataflow, and the rest of Google's data- and AI-first cloud — written for engineers shipping real workloads.

Google Cloud Storage: Object Storage With Four Tiers and Fine-Grained Access Control

Google Cloud Storage (GCS) stores objects — files, backups, model artifacts, log archives, media, anything you can put in a byte stream — in flat namespaces called buckets. The simplicity is intentional: no directory hierarchy to manage, no RAID to configure, no capacity planning. You put objects in, you get them out, and GCS handles durability, replication, and geographic distribution.

What makes GCS worth understanding deeply is four things: the access control model, the four storage classes and their cost trade-offs, lifecycle policies, and signed URLs for temporary, delegated access.


Storage Architecture

┌────────────────────────────────────────────────────────────────┐
│ Google Cloud Storage                                           │
├────────────────────────────────────────────────────────────────┤
│ Bucket (globally unique name)                                  │
│ ├── Objects (files with metadata)                              │
│ │   ├── object data (bytes)                                    │
│ │   └── metadata (content-type, custom key-value pairs)        │
│ ├── Storage class (Standard / Nearline / Coldline / Archive)   │
│ ├── Location (region / dual-region / multi-region)             │
│ └── Access controls (IAM + optional ACLs)                      │
└────────────────────────────────────────────────────────────────┘

Objects are addressed by gs://bucket-name/object-name. The object name can include slashes, which the console renders as folders — but there are no actual directories. Every object is a flat key in the bucket namespace.
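Because the namespace is flat, the console computes its "folders" at list time. A list request with a prefix and a delimiter returns two things: the objects directly under the prefix, and the distinct sub-prefixes. The Python client's `list_blobs` accepts both as `prefix=` and `delimiter=`. The plain-Python sketch below reproduces that roll-up over a hypothetical list of object names, so the behavior can be explored without a bucket. `list_with_delimiter` is an illustrative helper, not part of any client library.

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    """Mimic a prefix/delimiter list request over flat object names.

    Returns (objects, prefixes): names that sit directly under `prefix`, and the
    distinct sub-prefixes that a console would draw as folders.
    """
    objects, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # outside the requested prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(name)  # no further delimiter: a direct child
        else:
            prefixes.add(prefix + rest[: cut + 1])  # roll up into a "folder"
    return objects, sorted(prefixes)


# Hypothetical object names in one bucket.
names = [
    "logs/2025/01/app.log",
    "logs/2025/02/app.log",
    "logs/readme.txt",
    "index.html",
]
print(list_with_delimiter(names, prefix="logs/"))
# (['logs/readme.txt'], ['logs/2025/'])
```

Two objects under `logs/2025/` collapse into a single prefix entry. No directory object exists for it.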


The Four Storage Classes

Choosing the right class is the primary cost lever in GCS. The trade-off is retrieval cost versus storage cost. The cheaper a class is for storing data, the more it charges per GB retrieved and the longer the minimum storage duration it bills for.

┌─────────────┬──────────────┬─────────────────┬──────────────────────────┐
│ Class       │ Storage cost │ Retrieval cost  │ Minimum storage duration │
├─────────────┼──────────────┼─────────────────┼──────────────────────────┤
│ Standard    │ Highest      │ Free            │ None                     │
│ Nearline    │ Lower        │ $0.01/GB        │ 30 days                  │
│ Coldline    │ Lower still  │ $0.02/GB        │ 90 days                  │
│ Archive     │ Lowest       │ $0.05/GB        │ 365 days                 │
└─────────────┴──────────────┴─────────────────┴──────────────────────────┘
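The minimum storage duration is a billing floor, not a lock. An object deleted or overwritten before the minimum is charged as if it had been stored for the full period. The sketch below encodes the durations and per-GB retrieval fees from the table. The helper names are illustrative, and the fees should be checked against Google's current pricing page before anyone relies on them.

```python
# Minimum storage durations (days) and retrieval fees (USD per GB) from the table.
MIN_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}
RETRIEVAL_PER_GB = {"STANDARD": 0.0, "NEARLINE": 0.01, "COLDLINE": 0.02, "ARCHIVE": 0.05}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged for an object removed after `days_stored` days."""
    # Removing it early is billed as if it had stayed for the full minimum.
    return max(days_stored, MIN_DAYS[storage_class])


def retrieval_fee(storage_class: str, gb_read: float) -> float:
    """Retrieval charge in USD for reading `gb_read` GB from the class."""
    return gb_read * RETRIEVAL_PER_GB[storage_class]


print(billed_days("COLDLINE", 10))  # 90: deleted early, billed for the minimum
print(billed_days("NEARLINE", 45))  # 45: past the minimum, billed as stored
print(round(retrieval_fee("ARCHIVE", 200), 2))  # 10.0
```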

Standard is for data accessed frequently — serving website assets, storing datasets that pipelines read daily, holding model artifacts that inference jobs load at startup.

Nearline suits data you access roughly once a month — monthly compliance exports, infrequently used database backups.

Coldline is for quarterly or less frequent access — disaster recovery archives, historical audit logs.

Archive is the lowest-cost tier and is designed for data you might need once a year or less — long-term regulatory retention, tape replacement. Retrieval takes milliseconds (not hours like some competing archive services), but the retrieval fee makes frequent access expensive.
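The rules of thumb above reduce to thresholds on how often data is read. Here is a minimal sketch, assuming the typical gap between reads is known. The cut-offs mirror the monthly, quarterly, and yearly guidance and the minimum durations in the table. A real decision should still compare total storage plus retrieval cost for the actual volumes.

```python
def suggest_storage_class(days_between_reads: float) -> str:
    """Pick a class from the typical gap, in days, between reads of the data."""
    if days_between_reads < 30:
        return "STANDARD"  # read more often than monthly
    if days_between_reads < 90:
        return "NEARLINE"  # roughly monthly
    if days_between_reads < 365:
        return "COLDLINE"  # quarterly or less often
    return "ARCHIVE"  # about once a year or less


for gap in (7, 30, 120, 400):
    print(gap, suggest_storage_class(gap))
# 7 STANDARD
# 30 NEARLINE
# 120 COLDLINE
# 400 ARCHIVE
```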


IAM vs ACLs: Access Control Models

GCS supports two access control mechanisms that can coexist but serve different purposes.

Uniform bucket-level access (recommended) uses only IAM. You grant roles to principals at the project, bucket, or — with Conditions — object prefix level. IAM controls are inherited down the hierarchy.

Organization IAM
└── Project IAM
    └── Bucket IAM   ← bucket-level permissions
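IAM allow policies are additive down the hierarchy. A role granted on the organization or project applies to every bucket below it. A bucket policy can add roles but cannot remove inherited ones; IAM deny policies, not modeled here, are the mechanism for that. The sketch below models each level as a hypothetical role-to-members mapping. It matches member strings literally, without expanding group membership.

```python
# Hypothetical allow policies at each level: role -> set of member strings.
org_policy = {"roles/storage.objectViewer": {"group:auditors@example.com"}}
project_policy = {"roles/storage.admin": {"user:ops@example.com"}}
bucket_policy = {
    "roles/storage.objectCreator": {"serviceAccount:ingest@example.iam.gserviceaccount.com"},
}


def effective_roles(member, *policies):
    """Union of the roles granted to `member` across every level's policy."""
    roles = set()
    for policy in policies:
        for role, members in policy.items():
            if member in members:
                roles.add(role)
    return roles


hierarchy = (org_policy, project_policy, bucket_policy)
print(effective_roles("group:auditors@example.com", *hierarchy))
# {'roles/storage.objectViewer'}
```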

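Common IAM roles for GCS:

roles/storage.objectViewer: read objects and their metadata, and list the objects in a bucket.
roles/storage.objectCreator: create objects, without permission to read, delete, or overwrite them.
roles/storage.objectAdmin: full control over objects (list, read, create, overwrite, delete), but not over bucket settings.
roles/storage.admin: full control over buckets and the objects in them.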

Legacy ACLs attach to individual objects and bucket-level defaults. They pre-date IAM and are still supported, but Google recommends uniform bucket-level access for new buckets because mixing IAM and ACLs creates confusing permission models.


Lifecycle Management

Lifecycle rules automate object transitions and deletions based on age, storage class, or version status. This is how you implement cost-optimized data retention automatically.

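Example policy: keep objects in Standard for 30 days, move them to Nearline, then delete them 60 days later. Lifecycle ages count from each object's creation time, so the Delete rule uses an age of 90.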

{
  "lifecycle": {
    "rule": [
      {
        "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
        "condition": { "age": 30 }
      },
      {
        "action": { "type": "Delete" },
        "condition": { "age": 90 }
      }
    ]
  }
}

Apply via CLI:

gcloud storage buckets update gs://my-bucket \
--lifecycle-file=lifecycle.json
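The same file can be generated from code, which helps when several buckets share one policy. Here is a minimal sketch, assuming the caller chooses the transition and delete ages. `tiered_lifecycle` is a hypothetical helper that writes `lifecycle.json` in the format shown above. The Python client can also set rules directly on a bucket object, for example with `Bucket.add_lifecycle_delete_rule` followed by `bucket.patch()`, which avoids the intermediate file.

```python
import json


def tiered_lifecycle(transition_class: str, transition_age: int, delete_age: int) -> dict:
    """Lifecycle config: move objects to a colder class, later delete them.

    Ages are days since each object's creation, as in the JSON above.
    """
    if delete_age <= transition_age:
        raise ValueError("delete_age must be later than transition_age")
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": transition_class},
                    "condition": {"age": transition_age},
                },
                {
                    "action": {"type": "Delete"},
                    "condition": {"age": delete_age},
                },
            ]
        }
    }


# Same policy as the example: Nearline at 30 days, delete at 90.
config = tiered_lifecycle("NEARLINE", 30, 90)
with open("lifecycle.json", "w") as f:
    json.dump(config, f, indent=2)
```

The resulting file is then applied with the same `gcloud storage buckets update` command.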

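Lifecycle rules also work with versioning. You can delete non-current versions a set number of days after they become non-current (the daysSinceNoncurrentTime condition). You can keep only the most recent N versions of each object (numNewerVersions). You can also restrict a rule to non-current versions with isLive: false.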
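A versioned bucket typically combines two of those conditions. The sketch below assumes caller-chosen values for `keep_versions` and `noncurrent_days`. It builds that pair of Delete rules and includes a small simulation of which versions would match (`versions_to_delete`, a hypothetical helper). Lifecycle actions run asynchronously, so in practice matched versions are removed some time after they qualify.

```python
import json


def versioned_cleanup_config(keep_versions: int, noncurrent_days: int) -> dict:
    """Two Delete rules for a versioned bucket; matching either one deletes a version."""
    return {
        "lifecycle": {
            "rule": [
                {
                    # Delete a version once `keep_versions` newer versions exist,
                    # so at most `keep_versions` versions (live included) remain.
                    "action": {"type": "Delete"},
                    "condition": {"numNewerVersions": keep_versions, "isLive": False},
                },
                {
                    # Delete a version `noncurrent_days` after it stopped being live.
                    "action": {"type": "Delete"},
                    "condition": {"daysSinceNoncurrentTime": noncurrent_days},
                },
            ]
        }
    }


def versions_to_delete(days_noncurrent, keep_versions, noncurrent_days):
    """Indexes of versions the two rules would match.

    `days_noncurrent` lists versions newest first: None for the live version,
    otherwise the days since that version became non-current. A version's index
    equals the number of newer versions it has.
    """
    doomed = []
    for newer_count, days in enumerate(days_noncurrent):
        if days is None:
            continue  # the live version matches neither rule
        if newer_count >= keep_versions or days >= noncurrent_days:
            doomed.append(newer_count)
    return doomed


print(json.dumps(versioned_cleanup_config(keep_versions=3, noncurrent_days=30), indent=2))
print(versions_to_delete([None, 2, 10, 40], keep_versions=3, noncurrent_days=30))
# [3]
```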


Signed URLs: Temporary Delegated Access

Signed URLs grant time-limited read or write access to a specific object without requiring the requester to have a GCP account or IAM role. The URL embeds a cryptographic signature created by a service account.

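Use cases:

Sharing a private file, such as a report or an invoice, with someone who has no Google account.
Letting a browser or mobile client upload directly to a bucket with a signed PUT URL, so large files never pass through your own backend.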

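from datetime import timedelta

from google.cloud import storage

# Signing needs credentials that hold a private key, such as a service-account
# key; plain end-user credentials cannot sign.
client = storage.Client()
bucket = client.bucket("my-bucket")
blob = bucket.blob("reports/q3-2025-summary.pdf")

# V4 signature, valid for 2 hours from now, usable only for GET (download).
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(hours=2),
    method="GET",
)
print(url)
# https://storage.googleapis.com/my-bucket/reports/q3-2025-summary.pdf?...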

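The URL is valid for 2 hours from the moment it is signed; V4 signed URLs can last at most 7 days. After expiry, Cloud Storage rejects any request made with the URL, and a new one has to be signed.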
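A V4 signed URL carries its own timing in the query string. `X-Goog-Date` is the signing time (UTC, formatted `YYYYMMDDTHHMMSSZ`), and `X-Goog-Expires` is the lifetime in seconds. A small helper like the sketch below can tell whether a stored link has lapsed before it is handed to a user; the function names are hypothetical. The helper only reads those two parameters. It does not check the signature, and the example URL's signature is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> datetime:
    """UTC instant at which a V4 signed URL stops being accepted."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Goog-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True once the URL's lifetime has elapsed (`now` must be timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    return now >= signed_url_expiry(url)


# Hypothetical URL signed at 12:00 UTC for 7200 s; the signature is a placeholder.
url = (
    "https://storage.googleapis.com/my-bucket/reports/q3-2025-summary.pdf"
    "?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Date=20250101T120000Z"
    "&X-Goog-Expires=7200&X-Goog-Signature=abc123"
)
print(signed_url_expiry(url))
# 2025-01-01 14:00:00+00:00
```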


Customer-Managed Encryption Keys (CMEK)

By default GCS encrypts all data at rest with Google-managed keys. CMEK lets you supply your own key from Cloud KMS, giving you control over the key lifecycle — including the ability to revoke access to all data protected by that key by disabling or destroying the key.

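# Create a Cloud KMS key ring and key
gcloud kms keyrings create gcs-keyring --location=us-central1
gcloud kms keys create gcs-key \
  --location=us-central1 \
  --keyring=gcs-keyring \
  --purpose=encryption
# Let the project's Cloud Storage service agent encrypt and decrypt with the key
gcloud storage service-agent --project=my-project \
  --authorize-cmek=projects/my-project/locations/us-central1/keyRings/gcs-keyring/cryptoKeys/gcs-key
# Create a bucket in the key's region (key and bucket locations must match)
gcloud storage buckets create gs://my-encrypted-bucket \
  --location=us-central1 \
  --default-kms-key=projects/my-project/locations/us-central1/keyRings/gcs-keyring/cryptoKeys/gcs-key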

CMEK is typically required by customers with compliance obligations (PCI-DSS, HIPAA, financial regulations) that mandate customer control over encryption keys.
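The key and the bucket must be in the same location, which is why the key ring above is created in `us-central1`. Scripts that assemble or validate key names before calling `gcloud` can use something like the minimal sketch below; the helper names are hypothetical. It handles only single-region buckets and does not model dual- or multi-region location rules.

```python
import re

# Shape of a Cloud KMS key resource name, as passed to --default-kms-key.
KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Assemble a key resource name from its parts."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def check_key_for_regional_bucket(key_name: str, bucket_location: str) -> dict:
    """Return the key's parts, or raise ValueError if it cannot serve the bucket.

    Only single-region buckets are handled: the key's location must equal the
    bucket's region (compared case-insensitively, e.g. US-CENTRAL1 vs us-central1).
    """
    match = KEY_NAME.match(key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key name: {key_name}")
    if match["location"].lower() != bucket_location.lower():
        raise ValueError(f"key is in {match['location']}, bucket is in {bucket_location}")
    return match.groupdict()


name = kms_key_name("my-project", "us-central1", "gcs-keyring", "gcs-key")
print(check_key_for_regional_bucket(name, "US-CENTRAL1"))
```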


Object Versioning

When versioning is enabled, overwriting or deleting an object does not destroy the previous version — it becomes a non-current version. You can list, restore, or permanently delete non-current versions.

# Enable versioning
gcloud storage buckets update gs://my-bucket --versioning
# List all versions of an object (including non-current)
gcloud storage ls -a gs://my-bucket/important-file.csv
# Restore a specific version by copying it back as current
gcloud storage cp \
gs://my-bucket/important-file.csv#1698765432000000 \
gs://my-bucket/important-file.csv
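Scripts that restore versions usually need to split a versioned URL like the one above back into bucket, object name, and generation. Here is a minimal sketch, assuming that a trailing `#` followed by digits is a generation number. Under that rule, an object name that itself contains `#` is left intact. `parse_gcs_url` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# gs://bucket/name with an optional trailing #generation (digits only).
_GS_URL = re.compile(r"^gs://(?P<bucket>[^/]+)/(?P<name>.+?)(?:#(?P<generation>\d+))?$")


class GcsUrl(NamedTuple):
    bucket: str
    name: str
    generation: Optional[int]


def parse_gcs_url(url: str) -> GcsUrl:
    """Split a gs:// URL into bucket, object name, and optional generation."""
    match = _GS_URL.match(url)
    if match is None:
        raise ValueError(f"not a gs://bucket/object URL: {url}")
    generation = match["generation"]
    return GcsUrl(match["bucket"], match["name"], int(generation) if generation else None)


print(parse_gcs_url("gs://my-bucket/important-file.csv#1698765432000000"))
# GcsUrl(bucket='my-bucket', name='important-file.csv', generation=1698765432000000)
```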

Versioning combined with lifecycle rules (delete non-current versions after 30 days) gives you a rolling backup window without unbounded storage growth.


GCS in Data Pipelines

GCS is the default staging area for nearly every GCP data service:

External data sources
        │
        ▼
GCS bucket (raw zone)
  ├──► Dataflow pipeline ──► BigQuery (analytics layer)
  ├──► Dataproc job ──► GCS (processed zone) ──► BigQuery
  └──► BigQuery batch load (free, direct from GCS)

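Best practices for pipeline use:

Keep buckets in the same location as the Dataflow jobs, Dataproc clusters, and BigQuery datasets that use them, to avoid cross-region transfer charges and latency.
Separate raw and processed data into distinct buckets or prefixes, as in the flow above, so each zone can carry its own IAM grants and lifecycle rules.
Attach lifecycle rules to staging and temporary prefixes so intermediate files are deleted automatically.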


Summary

GCS is intentionally simple at the API level — buckets, objects, metadata — but the operational depth is in access control, storage class selection, lifecycle automation, and integration patterns. The right storage class saves substantial money at scale: migrating cold data from Standard to Archive can cut storage costs by 90% for data that legitimately needs year-long retention. Lifecycle rules automate that migration without manual intervention. Signed URLs and CMEK address the two most common enterprise requirements: temporary external sharing and regulatory key control.