Azure Data Lake Storage Gen2: Hierarchical Namespace Object Storage for Analytics
Azure Data Lake Storage Gen2 (ADLS Gen2) is not a separate service — it is Azure Blob Storage with the hierarchical namespace (HNS) feature enabled. That single toggle changes how the storage layer represents directories: instead of simulating folders through key prefixes, it creates real directory objects. This sounds like a small implementation detail, but it has significant performance and security consequences for analytics workloads.
Without HNS, Blob Storage has no directory objects, so renaming or deleting a "directory" is O(n): every blob under the prefix has to be processed individually. With HNS, a directory rename is O(1), a single metadata update. At the scale of data engineering (millions of files, petabyte datasets), this difference matters enormously for job completion times.
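The difference can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Azure code: the flat namespace is modelled as a dictionary keyed by full blob path, the hierarchical one as nested dictionaries, and the function names are hypothetical.

```python
"""Toy model: count metadata writes for a directory rename (not Azure code)."""


def rename_flat(blobs: dict[str, bytes], src: str, dst: str) -> int:
    """Flat namespace: every key under the prefix has to be rewritten."""
    moved = 0
    for key in [k for k in blobs if k.startswith(src)]:
        blobs[dst + key[len(src):]] = blobs.pop(key)
        moved += 1
    return moved


def rename_hierarchical(root: dict, src: str, dst: str) -> int:
    """Hierarchical namespace: re-link one directory entry in its parent."""
    root[dst] = root.pop(src)
    return 1


files = [f"part-{i:05d}.parquet" for i in range(10_000)]
flat = {f"staging/{name}": b"" for name in files}
tree = {"staging": {name: b"" for name in files}}

flat_ops = rename_flat(flat, "staging/", "production/")
tree_ops = rename_hierarchical(tree, "staging", "production")
print(flat_ops, tree_ops)  # 10000 1
```

The flat rename touches one key per file, while the hierarchical rename touches a single parent entry no matter how many files sit underneath it.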
Real-World Scenario
A telecom company runs a Databricks pipeline that processes 2 billion call detail records (CDRs) per day. The pipeline writes Parquet files partitioned by date and region. At the end of each day, it renames the staging directory to production. Without HNS, renaming a directory with 40 million files takes over an hour. With ADLS Gen2 HNS, the same rename completes in milliseconds, which keeps the entire job inside its 4-hour SLA window.
Gen1 vs. Gen2 Differences
Azure Data Lake Storage Gen1 was a separate service (not built on Blob Storage) that Microsoft has retired. Gen2 replaced it:
    Comparison: ADLS Gen1 vs Gen2
    -------------------------------
    Feature           | Gen1 (Retired)      | Gen2
    ------------------|---------------------|--------------------------------
    Storage base      | Custom service      | Azure Blob Storage + HNS
    Pricing model     | Separate SKU        | Blob Storage pricing (lower)
    Protocol          | WebHDFS only        | Blob REST, ABFS, WebHDFS, NFS
    POSIX ACLs        | Yes                 | Yes (improved implementation)
    Global redundancy | No (region only)    | LRS, ZRS, GRS, RA-GRS
    Blob features     | No                  | Yes (lifecycle, tiers, versioning)
    Lifecycle mgmt    | No                  | Yes (move to Cool/Archive)
    Integration       | HDInsight focused   | Synapse, Databricks, HDInsight, ADF

If you are on Gen1, Microsoft has provided migration tooling (WANdisco Fusion, ADF copy activity with HNS awareness) to move data to Gen2.
Hierarchical Namespace and POSIX ACLs
With HNS enabled, the storage account presents a true directory tree. Permissions follow the POSIX model: access ACLs (who can read/write/execute this object) and default ACLs (inherited by new children):
    ADLS Gen2 Directory Tree
    --------------------------
    /
    ├── raw/
    │   ├── cdr/
    │   │   ├── 2024/
    │   │   │   ├── 06/
    │   │   │   │   ├── 15/
    │   │   │   │   │   ├── region=US/
    │   │   │   │   │   └── region=EU/

    ACL on /raw/cdr/:
      Owner: pipeline-service-principal  rwx
      Group: data-engineers              r-x
      Other:                             ---

    Default ACL (inherited by new directories):
      data-engineers  r-x
      pipeline-sp     rwx

ACLs are set using the storage SDK, Azure CLI, or Azure Storage Explorer. Managed identities assigned appropriate roles (Storage Blob Data Contributor, Storage Blob Data Reader) interact cleanly with ACL-protected directories. Keep in mind that Azure RBAC is evaluated first: if a role assignment already grants the requested operation, the ACL entries are not checked, so ACLs only narrow access for identities that do not hold such a role.
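The split between access ACLs and default ACLs can be sketched with a small in-memory model. This is a simplified illustration, not the Azure SDK: the `Directory` class is hypothetical, and it ignores owners, masks, umask, and files.

```python
"""Toy model of access vs. default ACLs (not the Azure SDK)."""
from dataclasses import dataclass, field


@dataclass
class Directory:
    access_acl: dict[str, str] = field(default_factory=dict)
    default_acl: dict[str, str] = field(default_factory=dict)
    children: dict[str, "Directory"] = field(default_factory=dict)

    def mkdir(self, name: str) -> "Directory":
        # A new child directory copies the parent's default ACL into both
        # its own access ACL and its own default ACL.
        child = Directory(dict(self.default_acl), dict(self.default_acl))
        self.children[name] = child
        return child


cdr = Directory(access_acl={"pipeline-sp": "rwx", "data-engineers": "r-x"})
old = cdr.mkdir("2024-06-14")  # created before any default ACL exists
cdr.default_acl = {"pipeline-sp": "rwx", "data-engineers": "r-x"}
new = cdr.mkdir("2024-06-15")  # created after the default ACL is set

print(old.access_acl)  # {}
print(new.access_acl)  # {'pipeline-sp': 'rwx', 'data-engineers': 'r-x'}
```

Only the directory created after the default ACL was set picks up the entries; the older one stays as it was until its ACL is changed explicitly.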
Azure Blob File System (ABFS) Driver
Analytics engines like Databricks, Synapse Spark, and HDInsight access ADLS Gen2 via the ABFS driver, a Hadoop file system client that calls the storage account's Data Lake Storage (DFS) REST endpoint and is HNS-aware. The URI scheme is:
    abfss://<container>@<storage_account>.dfs.core.windows.net/<path>

Example:

    abfss://datalake@mycompanyadls.dfs.core.windows.net/raw/cdr/2024/06/15/

The .dfs.core.windows.net endpoint (as opposed to .blob.core.windows.net) routes through the HNS-optimised code path for directory operations. Using the .blob.core.windows.net endpoint against an HNS-enabled account still works, but it loses the directory operation performance benefits.
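A small helper can keep the URI format and the endpoint choice in one place. The sketch below uses only the Python standard library; the function names `abfss_uri` and `parse_abfss` are hypothetical, and the parser deliberately rejects anything that is not on the .dfs.core.windows.net endpoint.

```python
from urllib.parse import urlsplit

DFS_SUFFIX = ".dfs.core.windows.net"


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI that targets the dfs endpoint."""
    return f"abfss://{container}@{account}{DFS_SUFFIX}/{path.lstrip('/')}"


def parse_abfss(uri: str) -> tuple[str, str, str]:
    """Split an abfss:// URI into (container, storage account, path)."""
    parts = urlsplit(uri)
    container, _, host = parts.netloc.partition("@")
    if parts.scheme != "abfss" or not container or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"not an abfss:// URI on the dfs endpoint: {uri}")
    return container, host[: -len(DFS_SUFFIX)], parts.path.lstrip("/")


uri = abfss_uri("datalake", "mycompanyadls", "raw/cdr/2024/06/15/")
print(uri)
print(parse_abfss(uri))
```

Rejecting the blob endpoint in `parse_abfss` is a design choice for this sketch: it surfaces a misconfigured path early instead of silently taking the slower route.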
Mounting in Databricks
    # Databricks notebook: mount ADLS Gen2 with a service principal
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<service_principal_client_id>",
        # Read the client secret from a secret scope instead of hard-coding it
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="kv", key="sp-secret"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant_id>/oauth2/token",
    }

    # Mount the container root so it is reachable under /mnt/datalake
    dbutils.fs.mount(
        source="abfss://datalake@mycompanyadls.dfs.core.windows.net/",
        mount_point="/mnt/datalake",
        extra_configs=configs,
    )

    # Access data through the mount point
    df = spark.read.parquet("/mnt/datalake/raw/cdr/2024/06/15/")

For new Databricks workspaces, Unity Catalog is preferred over mount points: it reaches the storage through a managed identity or service principal credential and provides per-user data access governance.
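Where a mount is not wanted, the same OAuth settings can be applied to the Spark session instead, with each key suffixed by the storage account's dfs host name. The helper below is a sketch under that assumption: `session_oauth_conf` is a hypothetical name, the placeholder values are not real credentials, and the `spark.conf.set` loop is left as a comment because `spark` only exists inside a notebook or Spark application.

```python
def session_oauth_conf(account: str, client_id: str, client_secret: str,
                       tenant_id: str) -> dict[str, str]:
    """Build account-scoped ABFS OAuth settings for one storage account."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


conf = session_oauth_conf(
    account="mycompanyadls",
    client_id="<service_principal_client_id>",
    client_secret="<value read from a secret scope>",
    tenant_id="<tenant_id>",
)

# In a Databricks notebook, where `spark` is predefined:
# for key, value in conf.items():
#     spark.conf.set(key, value)
# df = spark.read.parquet(
#     "abfss://datalake@mycompanyadls.dfs.core.windows.net/raw/cdr/2024/06/15/")
```

Scoping the keys to one account keeps credentials for different storage accounts from colliding in the same session.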
Lifecycle Management for Analytics Data
ADLS Gen2 inherits Blob Storage lifecycle management. Analytics workloads typically follow a hot-warm-cold pattern:
    Lifecycle Policy for Data Lake
    --------------------------------
    /raw/        (landing zone, hot access for 30 days)
      After 30 days  -> move to Cool tier
      After 365 days -> move to Archive

    /processed/  (query result cache, hot for 7 days)
      After 7 days   -> Cool
      After 90 days  -> delete

    /archive/    (compliance hold, do not delete)
      Apply immutability policy (WORM)

Tiering works at the file level, so you can tier old date-partitioned directories without touching recent ones.
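The /raw/ and /processed/ rules above could be expressed as a lifecycle management policy document along the following lines. This is a sketch: the helper `lifecycle_rule` and the rule names are hypothetical, the container name `datalake` is assumed from the earlier URI example, and the /archive/ immutability policy is configured separately rather than through lifecycle rules.

```python
import json


def lifecycle_rule(name, prefix, cool_after=None, archive_after=None,
                   delete_after=None):
    """Build one lifecycle rule; day counts run from last modification."""
    actions = {}
    if cool_after is not None:
        actions["tierToCool"] = {"daysAfterModificationGreaterThan": cool_after}
    if archive_after is not None:
        actions["tierToArchive"] = {"daysAfterModificationGreaterThan": archive_after}
    if delete_after is not None:
        actions["delete"] = {"daysAfterModificationGreaterThan": delete_after}
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            # prefixMatch starts with the container name (assumed: "datalake")
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {"baseBlob": actions},
        },
    }


policy = {
    "rules": [
        lifecycle_rule("raw-tiering", "datalake/raw/",
                       cool_after=30, archive_after=365),
        lifecycle_rule("processed-expiry", "datalake/processed/",
                       cool_after=7, delete_after=90),
    ]
}

print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied to the storage account, for example through the portal or the Azure CLI management-policy commands.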
Architecture: Medallion Pattern on ADLS Gen2
    [Source Systems]
          |
      Azure Data Factory / Event Hubs
          |
    [Bronze Layer]  /raw/<source>/<date>/
      Raw files as-received (Parquet, JSON, CSV)
      Immutable after landing
          |
      [Databricks / Synapse Spark job]
          |
    [Silver Layer]  /clean/<domain>/<date>/
      Validated, de-duplicated, schema-aligned
      Delta Lake format
          |
      [Databricks job / SQL Pool]
          |
    [Gold Layer]    /curated/<subject>/<date>/
      Aggregated, business-ready
      Served to Power BI, Synapse SQL Serverless

Each layer is a directory hierarchy in ADLS Gen2. ACLs restrict write access to the transformation service principal and read access to analysts per layer.
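Because each layer is just a path convention, a tiny helper can keep jobs from hard-coding directory names. The sketch below assumes the date segment is laid out as year/month/day, as in the directory tree shown earlier; `layer_path` and the `LAYERS` mapping are hypothetical names.

```python
from datetime import date

# Layer name -> top-level directory, following the diagram above
LAYERS = {"bronze": "raw", "silver": "clean", "gold": "curated"}


def layer_path(layer: str, name: str, day: date) -> str:
    """Return the directory for one source/domain/subject on one day."""
    return f"/{LAYERS[layer]}/{name}/{day:%Y/%m/%d}/"


run_day = date(2024, 6, 15)
print(layer_path("bronze", "cdr", run_day))  # /raw/cdr/2024/06/15/
print(layer_path("silver", "cdr", run_day))  # /clean/cdr/2024/06/15/
print(layer_path("gold", "usage", run_day))  # /curated/usage/2024/06/15/
```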
Key Interview Points
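- HNS is a one-way setting: Once HNS is enabled on a storage account, it cannot be disabled. An existing account without HNS can be upgraded in place, but the upgrade is irreversible and some Blob features are incompatible with it, so the alternative is to create a new account with HNS enabled and migrate data.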
- NFS 3.0 support: ADLS Gen2 with HNS supports NFS 3.0 mounts for Linux clients (NFS 4.1 is offered by Azure Files, not Blob Storage), making it usable as a file system for on-premises or VM workloads alongside analytics use.
- Not compatible with all Blob features: Some Blob Storage features do not work with HNS enabled (e.g., certain blob index tag query patterns, anonymous public access). Check compatibility before enabling HNS.
- ACL inheritance: Default ACLs are applied to new objects created under a directory, not to existing objects. Retrospective ACL changes must use az storage fs access set-recursive.
- Delta Lake and ADLS Gen2: Delta Lake commits its transaction log files with atomic rename operations, which depend on real directory semantics. HNS provides these natively, which is why ADLS Gen2 is the usual storage choice for Delta Lake on Azure. Without HNS, a rename is a copy followed by a delete, so the storage layer itself no longer guarantees rename-based atomicity.
Best Practices
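- Enable HNS at storage account creation where possible. An existing account can be upgraded in place later, but the upgrade is one-way and needs a compatibility check first; otherwise plan for a data migration.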
- Use the .dfs.core.windows.net endpoint for analytics tools to get the full benefit of HNS directory operation performance.
- Apply POSIX ACLs at the directory level with default ACLs so new partitions inherit correct permissions automatically.
- Separate storage accounts per layer (bronze, silver, gold) rather than containers in one account — this gives independent access control, lifecycle policies, and billing visibility.
- Integrate ADLS Gen2 with Microsoft Purview for data lineage and cataloguing; Purview can scan ADLS Gen2 and automatically classify sensitive columns.