Microsoft Azure 26 guides · updated 2026

Practical guides to Azure compute, networking, storage, and data services — built for engineers running production workloads on Microsoft's cloud.

Azure Data Lake Storage Gen2: Hierarchical Namespace Object Storage for Analytics

Azure Data Lake Storage Gen2 (ADLS Gen2) is not a separate service — it is Azure Blob Storage with the hierarchical namespace (HNS) feature enabled. That single toggle changes how the storage layer represents directories: instead of simulating folders through key prefixes, it creates real directory objects. This sounds like a small implementation detail, but it has significant performance and security consequences for analytics workloads.

Renaming or deleting a directory in Blob Storage without HNS is O(n): there is no directory object, so every blob under the prefix must be copied to its new name and removed, or deleted individually. With HNS, a directory rename is O(1): a single atomic metadata update. At data engineering scale (millions of files, petabyte datasets), this difference can dominate job completion times.
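The difference can be illustrated with a toy in-memory model. This is only a sketch of the two bookkeeping strategies, not how Azure implements either one; `rename_flat`, `rename_hns`, and `Directory` are hypothetical names introduced for illustration.

```python
class Directory:
    """A real directory object, as a hierarchical namespace would keep."""

    def __init__(self):
        self.children = {}  # name -> Directory or file bytes


def rename_flat(blobs, src_prefix, dst_prefix):
    """Emulate a directory rename in a flat namespace.

    Every key under the prefix is rewritten, so the cost grows with the
    number of blobs. Returns the number of per-blob operations performed.
    """
    ops = 0
    for key in [k for k in blobs if k.startswith(src_prefix)]:
        blobs[dst_prefix + key[len(src_prefix):]] = blobs.pop(key)
        ops += 1
    return ops


def rename_hns(parent, src_name, dst_name):
    """Rename a directory by re-pointing one entry in its parent.

    The children are untouched, so the cost is constant.
    """
    parent.children[dst_name] = parent.children.pop(src_name)
    return 1


if __name__ == "__main__":
    flat = {f"staging/part-{i}.parquet": b"" for i in range(1000)}
    print("flat namespace ops:", rename_flat(flat, "staging/", "production/"))

    root = Directory()
    staging = Directory()
    staging.children = {f"part-{i}.parquet": b"" for i in range(1000)}
    root.children["staging"] = staging
    print("hierarchical ops:", rename_hns(root, "staging", "production"))
```

In the flat model the operation count equals the number of files; in the hierarchical model it stays at one no matter how many files the directory holds.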


Real-World Scenario

A telecom company runs a Databricks pipeline that processes 2 billion call detail records (CDRs) per day. The pipeline writes Parquet files partitioned by date and region. At the end of each day, it renames the staging directory to production. Without HNS, renaming a directory with 40 million files takes over an hour. With HNS enabled, the same rename is a single metadata operation that completes in milliseconds, which lets the entire job finish within its 4-hour SLA window.


Gen1 vs. Gen2 Differences

Azure Data Lake Storage Gen1 was a separate service (not built on Blob Storage) that Microsoft has retired. Gen2 replaced it:

Comparison: ADLS Gen1 vs Gen2
-------------------------------
Feature           | Gen1 (Retired)      | Gen2
------------------|---------------------|--------------------------------
Storage base      | Custom service      | Azure Blob Storage + HNS
Pricing model     | Separate SKU        | Blob Storage pricing (lower)
Protocol          | WebHDFS-compatible  | Blob REST, DFS REST (ABFS), NFS 3.0
POSIX ACLs        | Yes                 | Yes (improved implementation)
Global redundancy | No (region only)    | LRS, ZRS, GRS, RA-GRS
Blob features     | No                  | Yes (lifecycle, tiers, versioning)
Lifecycle mgmt    | No                  | Yes (move to Cool/Archive)
Integration       | HDInsight focused   | Synapse, Databricks, HDInsight, ADF

For teams that were on Gen1, the migration paths to Gen2 included Azure Data Factory copy activities and third-party tooling such as WANdisco Fusion.


Hierarchical Namespace and POSIX ACLs

With HNS enabled, the storage account presents a true directory tree. Permissions follow the POSIX model: access ACLs (who can read/write/execute this object) and default ACLs (inherited by new children):

ADLS Gen2 Directory Tree
--------------------------
/
├── raw/
│   ├── cdr/
│   │   ├── 2024/
│   │   │   ├── 06/
│   │   │   │   ├── 15/
│   │   │   │   │   ├── region=US/
│   │   │   │   │   └── region=EU/

ACL on /raw/cdr/:
  Owner: pipeline-service-principal   rwx
  Group: data-engineers               r-x
  Other:                              ---
  Default ACL (inherited by new directories):
    data-engineers                    r-x
    pipeline-sp                       rwx

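ACLs are set using the storage SDK, Azure CLI, or Azure Storage Explorer. Azure RBAC is evaluated before ACLs: if a managed identity holds a data-plane role such as Storage Blob Data Contributor or Storage Blob Data Reader at the relevant scope, the role grants access and the ACLs are not consulted. Use ACLs for identities that need narrower, directory-level permissions than a role would give them.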
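As a rough sketch, the ACL shown above can be written in the comma-separated short form that the storage tools accept, where each entry is `[default:]type:id:permissions`. The helper names `acl_entry` and `build_acl` and the two object IDs are hypothetical placeholders. The sketch only builds the string; applying it (for example through the `acl` argument of the Python SDK's `set_access_control`) is left out, and depending on how the ACL is applied a `mask` entry may also be needed.

```python
VALID_TYPES = {"user", "group", "mask", "other"}


def acl_entry(entry_type, permissions, principal_id="", default=False):
    """Format one ACL entry, e.g. 'default:group:<object-id>:r-x'.

    An empty principal_id refers to the owning user or owning group.
    """
    if entry_type not in VALID_TYPES:
        raise ValueError(f"unknown ACL entry type: {entry_type!r}")
    well_formed = len(permissions) == 3 and all(
        char in (flag, "-") for char, flag in zip(permissions, "rwx")
    )
    if not well_formed:
        raise ValueError(f"permissions must look like 'r-x': {permissions!r}")
    scope = "default:" if default else ""
    return f"{scope}{entry_type}:{principal_id}:{permissions}"


def build_acl(entries):
    """Join individual entries into the comma-separated short form."""
    return ",".join(entries)


# Hypothetical Microsoft Entra object IDs, for illustration only.
PIPELINE_SP = "00000000-0000-0000-0000-000000000001"
DATA_ENGINEERS = "00000000-0000-0000-0000-000000000002"

acl = build_acl([
    acl_entry("user", "rwx"),    # owning user (the pipeline principal)
    acl_entry("group", "r-x"),   # owning group (data-engineers)
    acl_entry("other", "---"),
    acl_entry("user", "rwx", PIPELINE_SP, default=True),
    acl_entry("group", "r-x", DATA_ENGINEERS, default=True),
])
print(acl)
```

The `default:` entries are the ones new child directories inherit; the unscoped entries apply to `/raw/cdr/` itself.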


Azure Blob File System (ABFS) Driver

Analytics engines like Databricks, Synapse Spark, and HDInsight access ADLS Gen2 via the ABFS driver, a Hadoop filesystem driver that talks to the storage account's Data Lake Storage (DFS) REST endpoint and is aware of the hierarchical namespace. The URI scheme is:

abfss://<container>@<storage_account>.dfs.core.windows.net/<path>
Example:
abfss://datalake@mycompanyadls.dfs.core.windows.net/raw/cdr/2024/06/15/

The .dfs.core.windows.net endpoint (as opposed to .blob.core.windows.net) routes through the HNS-optimised code path for directory operations. Using the blob endpoint against an HNS-enabled account still works, but it loses the directory operation performance benefits.
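A small helper can keep URIs consistent and catch the wrong-endpoint mistake early. This is a minimal sketch using only the standard library; `build_abfss_uri` and `parse_abfss_uri` are hypothetical names, and the check assumes the public-cloud `dfs.core.windows.net` suffix shown above.

```python
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


def build_abfss_uri(container, account, path=""):
    """Build abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}{DFS_SUFFIX}/{path.lstrip('/')}"


def parse_abfss_uri(uri):
    """Split an ABFS URI into (container, account, path).

    Raises ValueError for other schemes or for hosts that are not on the
    DFS endpoint (for example a .blob.core.windows.net host).
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    container, separator, host = parsed.netloc.partition("@")
    if not separator or not container:
        raise ValueError(f"missing container in: {uri!r}")
    if not host.endswith(DFS_SUFFIX):
        raise ValueError(f"expected a {DFS_SUFFIX} host, got {host!r}")
    account = host[: -len(DFS_SUFFIX)]
    return container, account, parsed.path.lstrip("/")


uri = build_abfss_uri("datalake", "mycompanyadls", "raw/cdr/2024/06/15/")
print(uri)
print(parse_abfss_uri(uri))
```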


Mounting in Databricks

# Databricks notebook: mount ADLS Gen2 with service principal
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<service_principal_client_id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="kv", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant_id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://datalake@mycompanyadls.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)

# Access data
df = spark.read.parquet("/mnt/datalake/raw/cdr/2024/06/15/")

For new Databricks workspaces, Unity Catalog external locations backed by a managed identity or service principal storage credential are preferred over mount points, because they provide per-user data access governance.


Lifecycle Management for Analytics Data

ADLS Gen2 inherits Blob Storage lifecycle management. Analytics workloads typically follow a hot-warm-cold pattern:

Lifecycle Policy for Data Lake
--------------------------------
/raw/        (landing zone, hot access for 30 days)
    After 30 days  -> move to Cool tier
    After 365 days -> move to Archive
/processed/  (query result cache, hot for 7 days)
    After 7 days   -> Cool
    After 90 days  -> delete
/archive/    (compliance hold, do not delete)
    Apply immutability policy (WORM)

Tiering works at the file level, so you can tier old date-partitioned directories without touching recent ones.
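The /raw/ and /processed/ rules above could be expressed as a lifecycle management policy document along the following lines. This is a sketch that only builds the JSON; the rule names, the `tier_rule` helper, and the `datalake` container prefix (taken from the earlier URI example) are assumptions, and the /archive/ immutability policy is configured separately, so it is not part of this document.

```python
import json


def tier_rule(name, prefix, cool_after=None, archive_after=None, delete_after=None):
    """Build one lifecycle rule that acts on days since last modification."""
    actions = {}
    if cool_after is not None:
        actions["tierToCool"] = {"daysAfterModificationGreaterThan": cool_after}
    if archive_after is not None:
        actions["tierToArchive"] = {"daysAfterModificationGreaterThan": archive_after}
    if delete_after is not None:
        actions["delete"] = {"daysAfterModificationGreaterThan": delete_after}
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "actions": {"baseBlob": actions},
            # prefixMatch starts with the container name, then the path.
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
        },
    }


policy = {
    "rules": [
        tier_rule("rawTiering", "datalake/raw/", cool_after=30, archive_after=365),
        tier_rule("processedExpiry", "datalake/processed/", cool_after=7, delete_after=90),
    ]
}

print(json.dumps(policy, indent=2))
```

Because the filter is a path prefix, each top-level zone gets its own rule, and the age condition is evaluated per file inside that zone.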


Architecture: Medallion Pattern on ADLS Gen2

[Source Systems]
      |
Azure Data Factory / Event Hubs
      |
[Bronze Layer]  /raw/<source>/<date>/
    Raw files as-received (Parquet, JSON, CSV)
    Immutable after landing
      |
[Databricks / Synapse Spark job]
      |
[Silver Layer]  /clean/<domain>/<date>/
    Validated, de-duplicated, schema-aligned
    Delta Lake format
      |
[Databricks job / SQL Pool]
      |
[Gold Layer]    /curated/<subject>/<date>/
    Aggregated, business-ready
    Served to Power BI, Synapse SQL Serverless

Each layer is a directory hierarchy in ADLS Gen2. ACLs restrict write access to the transformation service principal and read access to analysts per layer.
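One way to keep jobs from hard-coding these directories is a single path helper shared by all pipelines. The sketch below assumes the layer roots from the diagram and the year/month/day layout used in the earlier /raw/cdr/2024/06/15/ example; `LAYER_ROOTS` and `layer_path` are hypothetical names.

```python
from datetime import date

# Layer name -> top-level directory, as in the diagram above.
LAYER_ROOTS = {"bronze": "raw", "silver": "clean", "gold": "curated"}


def layer_path(layer, name, day):
    """Return the directory for one source, domain, or subject on one day."""
    if layer not in LAYER_ROOTS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"/{LAYER_ROOTS[layer]}/{name}/{day:%Y/%m/%d}/"


run_date = date(2024, 6, 15)
print(layer_path("bronze", "cdr", run_date))
print(layer_path("silver", "cdr", run_date))
print(layer_path("gold", "cdr", run_date))
```

Keeping the layout in one place also makes the per-layer ACL boundaries easy to reason about, since each layer maps to exactly one top-level directory.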


Key Interview Points


Best Practices