Azure Managed HSM and Encryption Keys

A regulator asks a single question during an audit: who, other than your own named administrators, can reach the cryptographic keys that protect your customer data? If the honest answer involves a shared pool of hardware that other tenants also use, or an access model that a subscription owner can quietly override, the conversation gets longer and more expensive. Azure Managed HSM exists for the moment when that answer has to be airtight. It is a single-tenant, FIPS 140-3 Level 3 validated key store with its own access-control plane, and it was built for the workloads where the standard vault, good as it is, cannot satisfy the mandate on paper.

The exposure that drives most teams toward this service is not a dramatic breach. It is the slow realization that a compliance framework, a contractual clause, or an internal control standard names a requirement the team cannot meet with what they already run. Sometimes that requirement is single-tenancy: a demand that the hardware protecting your keys belongs to you alone. Sometimes it is an isolation of administrative authority so complete that even the person who owns the Azure subscription cannot grant themselves access to the keys. Sometimes it is the ability to walk in with a key generated on your own on-premises hardware and import it without it ever existing in plaintext outside a validated module. These are the signals that the standard Key Vault, which serves the vast majority of Azure workloads perfectly well, has reached its boundary.

This article sets out one organizing idea that should govern every decision about whether to adopt the service. Call it the compliance-drives-managed-hsm rule: Managed HSM is justified when a single-tenant, higher-assurance, fully-controlled key store is a compliance or control requirement, and not before. Reaching for it because it sounds more secure, when standard Key Vault already meets the need, buys cost and operational friction without buying risk reduction that anyone can point to. Refusing it when a mandate genuinely requires single-tenancy or a customer-controlled security domain leaves a gap that an auditor will find. The whole craft of using this service well lies in reading that boundary correctly, and the rest of this guide is about reading it.

There is a second reason the boundary matters so much here. Managed HSM is not a drop-in upgrade you toggle on. It carries a distinct provisioning ceremony, a separate access-control model that does not inherit from Azure role assignments, and a recovery artifact, the security domain, whose loss is unrecoverable. A team that adopts it for the wrong reason inherits all of that operational weight for a benefit it did not need. A team that adopts it for the right reason has to learn the model properly or risk locking itself out of its own keys or, worse, leaving the door open in a way the marketing brochure never warned about. Both failure modes are common, and both are avoidable once the model is clear.

Azure Managed HSM, then, is best understood not as the most secure key store but as the most controlled one. The difference is the whole subject. What follows walks through what the service is and the mental model to hold, how the underlying hardware and the security domain actually work, the precise line that separates it from standard Key Vault and its premium HSM-backed keys, how to provision and activate it without painting yourself into a corner, how bring-your-own-key import works, the local role model that replaces Azure RBAC inside the cluster, the misconfigurations that turn a high-assurance store into an exposure, how the customer-managed-key consumers such as Storage and SQL plug into it, and how to verify and audit the whole posture. The decision table near the middle is the artifact to bookmark, because it compresses the entire choice into five signals and one verdict.

Figure: Azure Managed HSM single-tenant key store architecture

What Azure Managed HSM actually is

Azure Managed HSM is a fully managed, highly available, single-tenant cloud service that stores cryptographic keys inside hardware security modules validated to FIPS 140-3 Level 3. The phrase to dwell on is single-tenant. When you create an instance, Azure dedicates a cluster of hardware partitions to your organization and nobody else. That cluster is cryptographically bound to a customer-specific security domain, a recovery artifact generated during activation that ties the partitions to keys only you hold. The practical consequence is that the protection boundary is no longer a logical one inside shared infrastructure. It is a physical and cryptographic boundary around hardware that is yours for the lifetime of the instance.

Hold a clean mental model: think of the service as a private rack of tamper-resistant cryptographic appliances that Azure racks, powers, patches, and clusters for you, while you keep exclusive control of what goes into them and who may use it. You never touch the firmware or the physical units. Microsoft handles availability, clustering across multiple partitions, and the maintenance burden that on-premises HSM owners normally carry. What Microsoft deliberately does not handle is access. The design goal, stated plainly in the service documentation, is that Microsoft and its agents are precluded from reaching the key material. That single sentence is the reason the service exists and the reason its operational model is stricter than anything in the standard vault.

A Managed HSM instance holds only HSM-backed keys. It does not store secrets, connection strings, or certificates the way a general-purpose vault does. This narrowing is intentional. The service is a key fortress, not a junk drawer. RSA, EC, and symmetric keys live there, and every cryptographic operation that uses them (signing, wrapping, unwrapping, encryption, decryption) happens inside the hardware, so the private material never leaves the validated module in the clear. If your need is to store a database password or an API token, the standard vault remains the right home and this service is the wrong tool. The narrowness is a feature, because it keeps the highest-assurance store focused on the one job that justifies its cost and ceremony.

What problem does Managed HSM solve that Key Vault does not?

It solves single-tenancy and administrative isolation. Standard Key Vault protects keys in multi-tenant hardware shared across customers, and a subscription owner can grant themselves vault access through Azure roles. Managed HSM gives you dedicated hardware and a local access model that even a subscription owner cannot override, which is exactly what a strict compliance mandate tends to require.

The deeper way to frame the service is in terms of control sovereignty. In standard Key Vault, you control your keys logically, and the controls are excellent, but the hardware is shared and the management plane that governs access lives in Azure Resource Manager, where high-privilege Azure roles can reach. In Managed HSM, control moves down a layer. The hardware is yours, the security domain that anchors trust is generated and held by you, and the access decisions are made by a role system that lives inside the cluster rather than in Resource Manager. Auditors and security architects reach for the service precisely when control sovereignty, not raw cryptographic strength, is the thing being demanded. Both products use validated hardware. Only one gives you a key store that no Azure platform administrator can quietly walk into.

It is worth separating Managed HSM from a sibling that is easy to confuse with it. Azure Cloud HSM is a newer, bare-metal HSM-as-a-service aimed at lift-and-shift of applications that talk PKCS#11, JCE, or OpenSSL directly to a dedicated cluster, where the customer owns the full administrative stack. Managed HSM, by contrast, speaks the Key Vault data-plane API, so applications and Azure services that already integrate with Key Vault keys can target it with minimal change. For the workloads this article addresses, customer-managed encryption keys for Azure services and key operations through the familiar Key Vault interface, Managed HSM is the service in scope. Cloud HSM answers a different question and carries a different operational model.

How the control works underneath

To use Managed HSM safely you need a working picture of three things: the partitioned hardware, the security domain that anchors trust, and the way cryptographic operations stay inside the boundary. None of this requires firmware-level knowledge, but skipping it is how teams lose access to their own keys.

A Managed HSM instance is not one appliance. It is a cluster of hardware partitions that Azure provisions and keeps synchronized so the instance stays available even when an individual partition is taken out for maintenance. You see one logical endpoint, a URL of the form your-hsm-name.managedhsm.azure.net, and the service hides the clustering. The reason the cluster matters to you is availability and durability: keys and their usage policies replicate across the partitions, so a single hardware fault does not cost you the key. The reason it does not matter operationally is that you never address individual partitions; you address the instance, and the service routes the operation to a healthy member.

The security domain is the artifact that makes the instance yours and the artifact most teams underestimate. During activation, the service generates a security domain: an encrypted blob that cryptographically ties the HSM partitions to a set of keys that only you possess. You download it, and it is protected by a quorum of RSA key pairs you supply, typically three of a possible five or some similar M-of-N arrangement you choose. The security domain is what would let your keys be recovered into a new cluster in a regional disaster, and it is what guarantees that even Microsoft cannot reconstruct your instance without it. The hard truth that follows is blunt: lose the security domain and the private keys that protect it, and the keys inside the instance become permanently unrecoverable. There is no support ticket that recovers them, by design. The same property that keeps Microsoft out keeps a careless owner out too.

Why is the security domain so consequential?

Because it is the single point of cryptographic trust for the instance and it is unrecoverable if lost. The security domain ties the HSM partitions to keys only you hold, which is what guarantees Microsoft cannot access your material. That same guarantee means losing it, or losing so many of the RSA private keys that protect it that a quorum can no longer be assembled, permanently destroys access to every key in the instance.
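
The quorum arithmetic behind that warning is simple enough to sketch. The helper below is hypothetical (the name lost_keys_tolerated is not part of any Azure tooling), and the bounds it checks, three to ten wrapping keys with a quorum of at least two, are an assumption to confirm against the current Azure documentation before a real ceremony.

```shell
# Sketch only: quorum arithmetic for an M-of-N security-domain scheme.
# Assumed bounds (verify against current docs): 3 to 10 wrapping keys,
# quorum of at least 2 and at most the number of keys.

# lost_keys_tolerated N M
# Prints how many custodian keys can be lost while a quorum of M can
# still be assembled from the N keys originally supplied.
lost_keys_tolerated() (
  n=$1; m=$2
  if [ "$n" -lt 3 ] || [ "$n" -gt 10 ]; then
    echo "error: expected between 3 and 10 wrapping keys, got $n" >&2
    exit 1
  fi
  if [ "$m" -lt 2 ] || [ "$m" -gt "$n" ]; then
    echo "error: quorum must be between 2 and $n, got $m" >&2
    exit 1
  fi
  # Every key lost beyond N - M makes the quorum unreachable.
  echo $((n - m))
)
```

With a three-of-five scheme, lost_keys_tolerated 5 3 prints 2: two custodian keys can go missing, and the third loss makes the security domain undecryptable.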

The third piece is operation locality. Every private-key operation runs inside the hardware boundary. When an application asks Managed HSM to sign a payload or unwrap a data-encryption key, the request crosses the boundary, the hardware performs the operation with the private key that never leaves, and only the result comes back. The private exponent of an RSA key, or the bytes of a symmetric key, are never exposed to the host, the network, or the caller. This is the property that lets a compliance officer assert that key material is protected by a validated module at all times, including in use, not merely at rest. It is also why throughput and latency have hardware ceilings that a software-backed key does not have, a trade you accept in exchange for the assurance.

Two more facts round out the model. First, Managed HSM applies soft delete with a configurable retention window and supports purge protection, so that a deleted key or a deleted instance can be recovered within the window and, where purge protection is on, cannot be permanently destroyed before the window elapses even by an administrator. This protects against both accident and a malicious insider trying to erase keys to cause an outage. Second, the service exposes the Key Vault data-plane API surface for keys, which is why an application written against Key Vault keys can be pointed at a Managed HSM endpoint with little more than a URL change and the right role assignment. The familiar interface sits on top of unfamiliar guarantees.
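
To make the "URL change" concrete, here is a minimal sketch of how a deployment script might derive the data-plane base URL for either kind of store. The helper name key_store_url is hypothetical, and the hostname suffixes are the ones used in the Azure public cloud; sovereign clouds use different suffixes.

```shell
# Sketch only: derive the data-plane base URL for a key store.
# key_store_url KIND NAME, where KIND is "vault" or "mhsm".
# Hostname suffixes assume the Azure public cloud.
key_store_url() (
  kind=$1; name=$2
  case "$kind" in
    vault) echo "https://${name}.vault.azure.net" ;;
    mhsm)  echo "https://${name}.managedhsm.azure.net" ;;
    *)     echo "error: unknown store kind '$kind'" >&2; exit 1 ;;
  esac
)
```

A key identifier then takes the same shape on both services, for example "$(key_store_url mhsm fin-prod-mhsm)/keys/tde-root-key", which is why client code written against Key Vault keys usually needs only the new base URL and a local role assignment.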

Managed HSM versus standard Key Vault: drawing the line

This is the decision that the whole service hinges on, and it is the one teams get wrong in both directions. To draw the line cleanly you have to compare three things that people tend to blur into one: standard Key Vault, the premium tier of Key Vault with its HSM-backed keys, and Managed HSM. They form a ladder, and most workloads belong on the first or second rung.

Standard Key Vault, on its Standard tier, protects keys with software inside a multi-tenant service. Its Premium tier adds HSM-backed keys, where the key material is generated and used inside validated hardware shared across customers. For years the headline difference was the FIPS level: software-protected keys validated at FIPS 140-2 Level 1, the Premium tier's HSM-backed keys at Level 2, and Managed HSM at Level 3. That neat distinction has blurred, because Microsoft updated the HSM fleet firmware so that both Key Vault Premium and Managed HSM now run on FIPS 140-3 Level 3 validated firmware. The headline level alone, in other words, is no longer the clean separator it once was. Verify the exact current validation status against the Microsoft compliance documentation before you cite a specific level in an audit, because the fleet status changes as firmware and certificates are revalidated.

If the FIPS level is converging, what actually separates the rungs? Tenancy and control. The premium vault gives you HSM-backed keys in hardware shared with other tenants, governed by Azure roles in Resource Manager. Managed HSM gives you a cluster dedicated to you alone, a security domain you generate and hold, and a local role system that no Azure platform administrator can override. The decisive question is therefore not “do I need an HSM” but “do I need the HSM to be mine, isolated from other tenants and from Azure’s own high-privilege administrators.” When the answer is yes because a mandate says so, Managed HSM is the rung. When the answer is no, the premium vault delivers validated hardware protection at a fraction of the cost and none of the ceremony.

Which signal should decide between the vault and Managed HSM?

Single-tenancy as a written requirement. If a compliance framework, contract, or control standard demands a dedicated, single-tenant HSM, a customer-controlled security domain, or administrative isolation from the cloud provider, Managed HSM is required. Absent such a written requirement, standard Key Vault, with the premium tier where HSM-backed keys are needed, is the correct and cheaper choice.

The InsightCrunch HSM decision table compresses the whole comparison into five signals and a verdict. Read each row as a dimension on which the three options genuinely differ, and read the final column as the signal that, on its own, pushes the decision.

Dimension	Standard Key Vault (Standard tier)	Key Vault Premium (HSM-backed keys)	Managed HSM	Deciding signal
Tenancy	Multi-tenant, software-protected	Multi-tenant, shared HSM hardware	Single-tenant, dedicated cluster	A written single-tenancy requirement forces Managed HSM
FIPS validation	FIPS 140-2 Level 1 (software)	FIPS 140-3 Level 3 firmware (verify current status)	FIPS 140-3 Level 3 firmware (verify current status)	Level 3 needed but multi-tenant acceptable points to Premium
Administrative control	Azure RBAC; subscription owner can reach	Azure RBAC; subscription owner can reach	Local RBAC; subscription owner cannot override	A mandate to isolate keys from cloud-provider and platform admins forces Managed HSM
Cost and operational weight	Lowest cost, no ceremony	Low cost, no security-domain ceremony	Hourly per-instance cost, security-domain ceremony, local roles	Cost-sensitive workloads without a mandate stay on the vault
Stored object types	Keys, secrets, certificates	Keys, secrets, certificates	HSM-backed keys only	A need to also store secrets and certificates keeps the vault in play

The verdict that falls out of the table is the namable claim restated as a procedure. Start at standard Key Vault. If you need validated HSM hardware for the keys, move to the premium tier. Only if a written requirement demands single-tenancy, a customer-controlled security domain, or isolation of key access from Azure’s own administrators do you move to Managed HSM. The cost and the ceremony are the price of that last rung, and you pay it only when an auditor would otherwise write a finding.
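
The procedure above is small enough to write down as code. This is only an illustration of the ladder, not a compliance tool: the function name choose_rung and its two yes/no inputs are hypothetical, and whether a written requirement exists is a judgment the function cannot make for you.

```shell
# Sketch only: the three-rung ladder as a decision function.
# choose_rung NEEDS_HSM WRITTEN_ISOLATION_REQUIREMENT
# Both arguments are "yes" or "no".
choose_rung() (
  needs_hsm=$1; written_requirement=$2
  if [ "$written_requirement" = "yes" ]; then
    # Single-tenancy, customer-held security domain, or admin isolation
    # is demanded in writing: only the top rung satisfies it.
    echo "Managed HSM"
  elif [ "$needs_hsm" = "yes" ]; then
    echo "Key Vault Premium"
  else
    echo "Key Vault Standard"
  fi
)
```

Note that the written requirement is checked first: once a mandate names single-tenant hardware, the question of whether HSM-backed keys alone would have been enough no longer matters.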

A worked example makes the boundary concrete. A SaaS company encrypting customer data at rest with customer-managed keys, subject to SOC 2 and serving commercial customers, almost always sits comfortably on Key Vault Premium: HSM-backed keys satisfy the control objectives, and no clause demands single-tenant hardware. Change one fact, a contract with a financial-services customer that requires keys in a dedicated, single-tenant HSM with administrative isolation, and the same workload now belongs on Managed HSM. Nothing about the cryptography changed. The control requirement changed, and the control requirement is what the rule keys on.

Provisioning and activating an instance correctly

Provisioning Managed HSM is a two-stage process, and the order matters more than in almost any other Azure service. Stage one creates the instance. Stage two activates it by generating and downloading the security domain. Between those stages the instance exists but is unusable, and the activation step is the one that, done carelessly, sets a team up to lose its keys later. Treat the ceremony as a deliberate operational event, not a line in a deployment script that nobody watched run.

Creation itself is ordinary Azure work. You pick a resource group, a region, and a name, and you designate the initial administrators by their Entra object IDs. Those administrators are the only identities that will be able to perform the activation. A minimal creation with the Azure CLI looks like this, and it provisions the cluster without yet making it usable.

az keyvault create \
  --hsm-name "fin-prod-mhsm" \
  --resource-group "rg-keys-prod" \
  --location "eastus2" \
  --administrators "<entra-object-id-admin-1>" "<entra-object-id-admin-2>" \
  --retention-days 90

The retention-days value sets the soft-delete window, and ninety days is a defensible default for a production instance because it leaves room to recover from an accidental deletion across a long incident. The creation command above does not set purge protection. Note that purge protection, once enabled, cannot be turned off, so decide deliberately. For any instance that protects customer-managed keys on which consuming services depend, enable purge protection, because a purged key can leave dependent data permanently inaccessible.
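
One way to make both settings explicit is to generate the creation command from a small wrapper that refuses out-of-range values. The sketch below only prints the command for review; it does not call Azure. The wrapper name print_create_command is hypothetical, the 7 to 90 day range reflects the CLI's documented bounds as I understand them, and the printed arguments are not shell-quoted, so the output is meant to be read and pasted, not passed to eval.

```shell
# Sketch only: print an 'az keyvault create' command with purge
# protection switched on, after validating the retention window.
# print_create_command NAME RESOURCE_GROUP LOCATION DAYS ADMIN_OID...
print_create_command() (
  name=$1; rg=$2; location=$3; days=$4
  shift 4
  if [ "$days" -lt 7 ] || [ "$days" -gt 90 ]; then
    echo "error: retention days must be between 7 and 90, got $days" >&2
    exit 1
  fi
  if [ "$#" -lt 1 ]; then
    echo "error: at least one administrator object ID is required" >&2
    exit 1
  fi
  printf 'az keyvault create --hsm-name %s --resource-group %s --location %s' \
    "$name" "$rg" "$location"
  printf ' --retention-days %s --enable-purge-protection true --administrators' "$days"
  # Remaining arguments are the Entra object IDs of the administrators.
  for oid in "$@"; do printf ' %s' "$oid"; done
  printf '\n'
)
```

Because purge protection cannot be reversed, printing the command first gives a reviewer the chance to confirm the decision before anything is created.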

What is the security-domain ceremony and why treat it as a formal event?

It is the activation step where the instance generates its security domain and encrypts it to a quorum of RSA keys you provide. You choose how many keys, and how many are required to recover, in an M-of-N scheme. Treat it formally because the downloaded security domain and its private keys are unrecoverable if lost, and they are the only thing that can restore your keys.
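
The key pairs themselves are ordinary RSA pairs wrapped in self-signed certificates. As a rough sketch, the helper below prints one openssl command per custodian so the plan can be reviewed before anyone runs it. The helper name and the custodian subject names are hypothetical, and the -nodes flag leaves each private key unencrypted on disk, which is acceptable only for a rehearsal; a production ceremony would generate each key on the custodian's own offline machine or protect it with a passphrase.

```shell
# Sketch only: print the openssl commands that would create N
# self-signed RSA certificates (public halves) and private keys.
# File names follow the cert1.pem pattern used in this article.
print_keygen_plan() (
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    printf 'openssl req -newkey rsa:2048 -nodes -keyout cert%s.key -x509 -days 365 -subj /CN=custodian-%s -out cert%s.pem\n' \
      "$i" "$i" "$i"
    i=$((i + 1))
  done
)
```

Running print_keygen_plan 3 lists three commands; each custodian would run only their own line and keep the resulting .key file, while the .pem files are collected for the download step.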

Activation is where care pays off. You generate a set of RSA key pairs, decide the quorum (for instance, any three of five custodians must be present to recover), and run the download. The service encrypts the security domain to those public keys and hands you the blob. The private keys must be distributed to separate custodians and stored where no single person and no single failure can gather a quorum, and the security domain blob itself must be backed up to durable, access-controlled storage. The command below downloads the security domain after you have generated the key pairs; for brevity it uses three wrapping keys with a quorum of two.

az keyvault security-domain download \
  --hsm-name "fin-prod-mhsm" \
  --sd-wrapping-keys "cert1.pem" "cert2.pem" "cert3.pem" \
  --sd-quorum 2 \
  --security-domain-file "fin-prod-mhsm-sd.json"

Once the security domain is downloaded, the instance is activated and the data plane opens for role assignment and key operations. The discipline that separates a safe deployment from a fragile one is the handling of these artifacts after the command returns. Store the security domain file and each private key in separate, audited locations. Document who holds which custodian key. Test, in a non-production instance, that a quorum can actually be assembled and used to recover into a fresh cluster, because the only thing worse than needing the recovery path is discovering during a disaster that the custodian keys were never genuinely retrievable. The recovery rehearsal is the step almost everyone skips and the step that turns the security domain from a liability into the insurance it is meant to be.
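
A full rehearsal means recovering into a non-production instance, but a cheaper check can run far more often: confirm that enough custodian key files are still where the runbook says they are. The sketch below is that cheaper check and nothing more. The function name and file paths are hypothetical, and the presence of a non-empty file proves neither that the key is intact nor that its custodian can actually be reached.

```shell
# Sketch only: check whether at least M of the listed custodian
# private-key files exist and are non-empty.
# quorum_reachable M FILE...
quorum_reachable() (
  need=$1
  shift
  found=0
  for f in "$@"; do
    # -s is true only for files that exist and have a size above zero.
    if [ -s "$f" ]; then
      found=$((found + 1))
    fi
  done
  echo "found $found of $# custodian keys, need $need"
  [ "$found" -ge "$need" ]
)
```

A scheduled job could call quorum_reachable 2 /path/a.key /path/b.key /path/c.key against each custodian's escrow location and raise an alert on a non-zero exit, long before a disaster makes the gap visible.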

A second provisioning concern is network exposure. By default the instance has a public endpoint reachable, subject to access control, from the internet. For a high-assurance store this is rarely acceptable. Lock the data plane down with a private endpoint so traffic flows over your virtual network, and where a private endpoint is not yet in place, restrict the public network access with the firewall to known address ranges. The principle here is the same one that governs the standard vault, covered in depth in the Key Vault security best practices guide, and it applies with more force to a store whose whole reason for existing is isolation. A single-tenant HSM reachable from the open internet has thrown away part of what you paid for.
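
The network posture can be reduced to two values read from the resource and one classification. The sketch below covers only the classification. It assumes the two inputs come from the instance's properties.publicNetworkAccess and properties.networkAcls.defaultAction fields (for example via az keyvault show with a --query expression); those property paths are an assumption to verify against the output of your CLI version, and the function name is hypothetical.

```shell
# Sketch only: classify network exposure from two resource properties.
# classify_network_exposure PUBLIC_NETWORK_ACCESS DEFAULT_ACTION
#   PUBLIC_NETWORK_ACCESS: "Enabled" or "Disabled"
#   DEFAULT_ACTION:        "Allow" or "Deny" (firewall default)
classify_network_exposure() (
  public_access=$1; default_action=$2
  if [ "$public_access" = "Disabled" ]; then
    echo "private-only"
  elif [ "$default_action" = "Deny" ]; then
    echo "public-with-allowlist"
  else
    echo "public-open"
  fi
)
```

Anything other than private-only deserves a documented reason, and public-open on a production instance is the state the paragraph above warns against.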

Bringing your own key and importing securely

Many organizations adopt Managed HSM precisely because they must generate keys on their own hardware and bring them into the cloud without the key ever existing in plaintext outside a validated module. This is bring-your-own-key, and the secure import flow is one of the service’s defining capabilities. Understanding it removes a common fear, that importing a key means exposing it, which is exactly what the protocol is designed to prevent.

The import flow works by wrapping. You ask the target Managed HSM to produce a key exchange key, an RSA key generated inside the destination hardware whose public half you export. On your source HSM, you wrap your key with that public key so the material is encrypted to the destination. You transfer the wrapped blob, and the destination unwraps it inside its own hardware boundary. At no point does the plaintext key cross a network, sit on a disk, or appear in the memory of a general-purpose machine. The key leaves one validated module wrapped and enters another validated module wrapped, and only the hardware at each end ever holds it in the clear. That property is what lets an organization assert continuous hardware protection across the migration, which is frequently the exact wording in the requirement that sent them to the service in the first place.

How does BYOK avoid ever exposing the key in plaintext?

By wrapping the key to a key exchange key generated inside the destination HSM. Your source HSM encrypts the key to the destination’s public key, you transfer only the wrapped blob, and the destination unwraps it inside its own hardware boundary. The plaintext key never exists outside a validated module, on disk, on the network, or in host memory, at any step.

The mechanics in Azure use the import package format that the Key Vault BYOK tooling produces. The high-level sequence is to download the key exchange key from the destination, run your HSM vendor’s BYOK tool to produce a wrapped import package targeting that exchange key, and then import the package. The command to bring the wrapped package into Managed HSM is shown below; the wrapped material is what the file contains, never the bare key.

az keyvault key import \
  --hsm-name "fin-prod-mhsm" \
  --name "tde-root-key" \
  --byok-file "tde-root-key-wrapped.byok"
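
For context, the whole sequence around that import can be laid out as a reviewable plan. The sketch below prints the commands without running them. The key exchange key name byok-kek and the file names are hypothetical, the RSA-HSM key type with the import operation follows the pattern in the Azure BYOK documentation as I understand it, and the wrapping step is left as a comment because it depends entirely on the source HSM vendor's tool.

```shell
# Sketch only: print the BYOK sequence for review.
# print_byok_plan HSM_NAME KEK_NAME TARGET_KEY_NAME BYOK_FILE
print_byok_plan() (
  hsm=$1; kek=$2; target=$3; byok=$4
  # 1. Create the key exchange key inside the destination HSM,
  #    permitted for the import operation only.
  printf 'az keyvault key create --hsm-name %s --name %s --kty RSA-HSM --size 4096 --ops import\n' \
    "$hsm" "$kek"
  # 2. Export only its public half.
  printf 'az keyvault key download --hsm-name %s --name %s --file %s.publickey.pem\n' \
    "$hsm" "$kek" "$kek"
  # 3. Vendor-specific wrapping happens on the source HSM.
  printf '# run the BYOK tool from the source HSM vendor against %s.publickey.pem to produce %s\n' \
    "$kek" "$byok"
  # 4. Import the wrapped package.
  printf 'az keyvault key import --hsm-name %s --name %s --byok-file %s\n' \
    "$hsm" "$target" "$byok"
)
```

Calling print_byok_plan fin-prod-mhsm byok-kek tde-root-key tde-root-key-wrapped.byok yields a four-line runbook whose last line matches the import command shown above.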

Two governance habits make BYOK sound rather than merely functional. First, decide explicitly whether a key should be exportable from Managed HSM or not. For most compliance scenarios the answer is non-exportable, because the point of importing was to confine the key, and a key marked exportable can leave the boundary under a release policy. Keep imported compliance keys non-exportable unless a specific, documented workflow requires export. Second, record provenance: which source HSM generated the key, when it was imported, and who authorized it. Auditors who care about single-tenant hardware also care about the chain of custody for an imported key, and a clean provenance record turns a hard question into a short one. The hands-on flow, including generating the exchange key and producing the import package, is the kind of procedure worth rehearsing in a sandbox; you can run the hands-on Azure labs and command library on VaultBook to compare a Managed HSM with the standard vault and walk a key import end to end before doing it against production.
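
A provenance record does not need special tooling; one structured line per import is enough to answer the auditor's question. The sketch below is one possible shape, not a standard: the field names, the function name, and the example values are all hypothetical, and the function does no JSON escaping, so it assumes the inputs contain no quotes or backslashes.

```shell
# Sketch only: emit one JSON line describing an imported key.
# provenance_record KEY_NAME SOURCE_HSM AUTHORIZED_BY IMPORTED_AT
# Inputs are assumed to be free of quotes and backslashes.
provenance_record() (
  key=$1; source_hsm=$2; authorized_by=$3; imported_at=$4
  printf '{"key":"%s","source_hsm":"%s","authorized_by":"%s","imported_at":"%s"}\n' \
    "$key" "$source_hsm" "$authorized_by" "$imported_at"
)
```

Appending the output to an access-controlled, append-only log at import time, with a UTC timestamp such as the one produced by date -u +%Y-%m-%dT%H:%M:%SZ, keeps the chain of custody in one place.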

Least privilege through local RBAC, not Azure RBAC

The access model is where Managed HSM departs most sharply from the standard vault, and it is the departure that delivers the administrative isolation a mandate asks for. Inside the instance, access is governed by a local role-based access control system that lives in the HSM cluster itself, not in Azure Resource Manager. Azure RBAC still controls management-plane operations on the resource, such as deleting the instance or reading its properties, but it does not grant access to keys or to key operations. That separation is the entire point: a person with Owner on the subscription cannot, through Azure roles, sign with a key or even read the list of keys. To touch a key, an identity must hold a local role inside the instance, and only an instance administrator can grant that.

This is the property that survives the question an auditor most wants answered. In standard Key Vault, a sufficiently privileged Azure administrator can assign themselves a data-plane role and reach the keys, which means the cloud platform’s high-privilege roles are part of the key’s trust boundary. In Managed HSM, the local model breaks that link. The HSM administrators you designated at creation, and only the identities they explicitly grant local roles to, can operate on keys. A management-group or subscription administrator who is not also a local HSM role holder is locked out of the cryptography no matter how much Azure authority they accumulate. That is administrative isolation made concrete, and it is usually the specific sentence a control framework is reaching for.

How is local RBAC different from Azure RBAC in practice?

Azure RBAC governs the resource (creating, deleting, reading properties) through Resource Manager, where subscription owners hold sway. Local RBAC governs keys and key operations inside the cluster, assigned only by HSM administrators. The two do not overlap on key access, so an Azure subscription owner who lacks a local role cannot read, use, or list keys.

The local model defines built-in roles scoped to the whole instance or to an individual key. The most consequential is the Managed HSM Administrator, which can manage role assignments and the instance but, by design, should not be the role that applications use. Below it sit roles such as Crypto Officer, which can create and manage keys, Crypto User, which can use keys for cryptographic operations such as sign and unwrap but not delete them, and more restricted roles for specific operations. The least-privilege discipline is to keep the administrator population tiny and human, give applications only the narrow operation role they need, and scope to the individual key wherever a workload uses a single key. A storage account that needs to unwrap a data-encryption key should hold a key-scoped role permitting unwrap on that one key, not an instance-wide role permitting every operation on every key.

A concrete assignment shows the grain. Suppose a managed identity for a storage account needs only to wrap and unwrap a specific customer-managed key. You assign it a local role scoped to that key, granting the wrap and unwrap operations and nothing else.

az keyvault role assignment create \
  --hsm-name "fin-prod-mhsm" \
  --role "Managed HSM Crypto Service Encryption User" \
  --assignee "<storage-account-managed-identity-object-id>" \
  --scope "/keys/tde-root-key"

The scope path, ending in the specific key, is the difference between least privilege and a broad grant. The role name encodes the intent: this is the role designed for service encryption consumers, permitting exactly the wrap and unwrap a customer-managed-key integration performs, and nothing that would let the identity exfiltrate or destroy the key. Building the model out of these narrow, key-scoped grants is what makes a later access review short and a later breach contained. The principle is identical to the least-privilege thinking in the broader Azure Key Vault complete guide, applied to a model where the roles live in the hardware rather than in the cloud control plane.

Two operational cautions complete the picture. First, because the administrators are designated at creation and managed locally afterward, the loss of every administrator identity is a serious event: with no local administrator, no new role assignments can be made, and recovery may require the security-domain path. Keep at least two human administrators on separate accounts and review them as people change roles. Second, do not use a Managed HSM Administrator identity as an application principal. The administrator can rewrite the access model; an application that is compromised while holding an administrator role can grant an attacker anything. Applications get operation roles, humans get administration roles, and the two populations stay separate.

The misconfigurations that create real exposure

A high-assurance store does not protect you if it is configured to leak the very assurance it provides. The failures here are not exotic; they are the predictable result of treating Managed HSM like a more expensive vault rather than the distinct system it is. Each of the following is a pattern engineers report, and each has a deciding factor that tells you whether it is yours.

The first and most damaging is mishandling the security domain. A team downloads it during activation, the file lands in a shared location or a single person’s laptop, the custodian keys are never actually distributed, and months later nobody can locate a quorum. The exposure cuts both ways: a security domain that is too accessible weakens the isolation guarantee, because anyone who gathers it and a quorum of keys could recover your instance elsewhere, while a security domain that is lost destroys your keys outright. The deciding factor is whether you can, today, assemble a quorum and recover into a fresh instance in a rehearsal. If you cannot, you have the exposure regardless of how the file looked when it was downloaded.

What is the most common Managed HSM mistake that creates exposure?

Adopting it by default for “more security” when the standard vault meets the need, then under-operating it. The instance ends up with a carelessly stored security domain, public network access left open, and over-broad local roles, so it costs more and is less safe than a properly hardened standard vault would have been. The deciding factor is whether a written requirement ever justified the move.

The second pattern is the default-to-Managed-HSM reflex. A team picks the service because it sounds like the strongest option, without a written requirement demanding single-tenancy. They inherit the cost, the security-domain ceremony, and the local-role model, and because none of it was driven by a real control objective, the operational discipline is thin. The result is an instance that is more expensive than the premium vault, no more compliant for the workload at hand, and frequently less safe because the team never built the rigor the service demands. The deciding factor is simple: point to the clause that required single-tenancy or administrative isolation. If you cannot, you reached for the wrong rung.

The third pattern is the mirror image: using the standard vault where a mandate actually requires Managed HSM. A workload subject to a single-tenancy requirement runs on Key Vault Premium because HSM-backed keys felt like enough, and an audit later finds that shared, multi-tenant hardware does not satisfy the written control. This is the more dangerous error because it surfaces as a finding rather than a bill. The deciding factor is the precise wording of the requirement: if it names single-tenant hardware, a customer-controlled security domain, or isolation from the provider’s administrators, the premium vault does not meet it and the move to Managed HSM is mandatory.

The fourth pattern is leaving network access open. An instance provisioned with public network access enabled, and never locked to a private endpoint or a firewall allowlist, exposes its data-plane endpoint to the internet. Access control still gates operations, but the attack surface is far larger than it needs to be, and an exposed endpoint undercuts the isolation story the service was bought to tell. The deciding factor is whether the instance accepts connections from outside your virtual network; if it does, and the workload does not require it, that is exposure you can close today.

The fifth pattern is over-broad local roles. Applications granted instance-wide Crypto User or, worse, an administrator role, can operate on keys far beyond their need, and a single compromised application principal then reaches the whole key store. The deciding factor is whether any non-human identity holds a role broader than the specific operations and the specific key it actually uses. Narrowing those grants is the highest-value hardening step after the security domain and the network.

Customer-managed keys: how Storage and SQL consume the HSM

The reason most teams provision Managed HSM is not to call its API directly from application code. It is to back customer-managed encryption keys for other Azure services, so that the keys protecting a storage account, a SQL database, a disk, or a backup live in single-tenant hardware under the team’s exclusive control. Understanding how a consuming service reaches into the HSM, and what the integration actually grants, is what turns the service from an abstract fortress into a working part of an encryption design.

The pattern is consistent across consumers. The service holds its own data-encryption keys, and those are wrapped by a key-encryption key that lives in Managed HSM. The service never sees the key-encryption key in the clear; it asks the HSM to unwrap the data-encryption key when it needs it, the unwrap happens inside the hardware boundary, and the service caches the result for as long as policy allows. This envelope arrangement is why a single root key in the HSM can protect a large volume of encrypted data without the HSM having to be in the path of every read and write. The HSM protects the key that protects the keys, and the consuming service does the bulk cryptography itself.
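The envelope flow described above can be sketched as a toy model. This is a minimal illustration, not real cryptography: the `ToyHsm` class, its hash-derived mask, and the `enabled` flag are all hypothetical stand-ins, and a real instance uses validated key-wrapping algorithms inside hardware. The sketch only shows the shape of the arrangement: the key-encryption key never leaves the HSM object, the service stores only the wrapped data-encryption key, and disabling the key stops every later unwrap.

```python
import hashlib
import secrets


class ToyHsm:
    """Stand-in for the HSM: holds the key-encryption key and never returns it."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)  # never leaves this object
        self.enabled = True                  # flipping this models the kill switch

    def _mask(self, nonce, length):
        # Toy mask derived from the KEK. NOT a real key-wrap algorithm.
        return hashlib.sha256(self._kek + nonce).digest()[:length]

    def wrap(self, dek):
        nonce = secrets.token_bytes(16)
        mask = self._mask(nonce, len(dek))
        return nonce + bytes(a ^ b for a, b in zip(dek, mask))

    def unwrap(self, wrapped):
        if not self.enabled:
            raise PermissionError("key disabled: unwrap refused")
        nonce, masked = wrapped[:16], wrapped[16:]
        mask = self._mask(nonce, len(masked))
        return bytes(a ^ b for a, b in zip(masked, mask))


hsm = ToyHsm()
dek = secrets.token_bytes(32)        # the consuming service's data-encryption key
wrapped_dek = hsm.wrap(dek)          # only this wrapped form is stored with the data

recovered = hsm.unwrap(wrapped_dek)  # the unwrap happens inside the HSM object
hsm.enabled = False                  # disable the key: later unwraps are refused
```

The bulk data never passes through `ToyHsm`; only the 32-byte data-encryption key does, which is why one root key can sit behind a large volume of encrypted data.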

How does a storage account use a key in Managed HSM?

The storage account is configured for customer-managed keys and pointed at a key in the HSM. Its managed identity is granted a key-scoped local role permitting wrap and unwrap on that key. Storage then wraps its account encryption key with the HSM key and asks the HSM to unwrap it when needed, so the root of trust sits in single-tenant hardware you control.

For Azure Storage, the integration is the customer-managed-keys configuration on the account. You assign the storage account a managed identity, grant that identity a key-scoped local role on the HSM permitting wrap and unwrap, and configure the account to use the HSM key as its encryption key. From that point the account’s encryption key is protected by your single-tenant hardware, and revoking the identity’s role, or disabling the key, renders the account’s data inaccessible until access is restored, which is precisely the kill switch a control framework wants you to hold. The broader set of storage protections that surround this (network lockdown, access-control hardening, and the rest of the at-rest and in-transit posture) is laid out in the Azure storage security and encryption guide. The HSM-backed key is the apex of that design rather than a replacement for it.

For Azure SQL, the same envelope idea appears as transparent data encryption with a customer-managed key. The database’s transparent-data-encryption protector is a key in Managed HSM, the SQL service’s identity holds the narrow wrap and unwrap role on that key, and the database encryption key is wrapped by the HSM key. The consequence is that control of the database’s encryption ultimately rests in hardware you alone administer: disable or revoke the protector and the database becomes inaccessible, which is the property auditors test when they ask whether you can cryptographically sever access to a data store on demand. The authentication, network, and encryption posture that frames this for SQL is covered in the Azure SQL security guide, where the customer-managed protector is the strongest available rung for the encryption-at-rest control.

A caution applies to every consumer integration: availability now depends on the HSM. If the consuming service cannot reach the instance, cannot resolve a private endpoint, or finds the key disabled or the identity’s role revoked, the data it protects becomes inaccessible until the path is restored. This is the intended behavior (the kill switch is a feature), but it means the HSM is now in the availability story of every service that depends on it. Soft delete and purge protection guard against accidental destruction of the key, the cluster’s redundancy guards against hardware faults, and a tested security-domain recovery guards against the worst case. Design the dependency deliberately, monitor the path, and never disable a key in production without confirming what depends on it.

Verifying the posture

A high-assurance store is only as good as your ability to prove it is configured the way you claim. Verification for Managed HSM falls into three checks that together answer an auditor’s questions: who can reach the keys, how the instance is exposed, and whether the recovery path actually works.

The first check is the local role assignments. List them and confirm that the administrator population is small and human, that every application identity holds only an operation role scoped to the specific key it uses, and that no non-human identity holds an administrator role. The command to enumerate assignments is straightforward, and the output is the evidence you hand an auditor who asks who can use a key.

az keyvault role assignment list \
  --hsm-name "fin-prod-mhsm" \
  --scope "/keys/tde-root-key"

How do I prove who can actually use a key in the instance?

List the local role assignments at the instance scope and at each key scope. Every assignment is an identity plus a role plus a scope, and because Azure RBAC does not grant key access here, the local assignments are the complete picture. If an identity is absent from the local assignments, it cannot use the key, regardless of its Azure roles.
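Once the assignments are listed, the review itself can be mechanical. The sketch below assumes the listing was saved as JSON and that each record carries `principalName`, `roleName`, and `scope` fields; those field names, the sample identities, and the `HUMANS` set are assumptions for illustration and should be checked against your real output. It flags any non-human identity that holds an administrator role or a role not scoped to a single key.

```python
import json

# Sample shaped like JSON output from the listing command. Field names are
# assumptions; HUMANS is a hypothetical list you would maintain yourself.
ASSIGNMENTS = json.loads("""
[
  {"principalName": "alice@contoso.example",
   "roleName": "Managed HSM Administrator", "scope": "/"},
  {"principalName": "sql-prod-identity",
   "roleName": "Managed HSM Crypto Service Encryption User",
   "scope": "/keys/tde-root-key"},
  {"principalName": "batch-app",
   "roleName": "Managed HSM Crypto User", "scope": "/"}
]
""")
HUMANS = {"alice@contoso.example"}


def review(assignments, humans):
    """Return (principal, problem) pairs for non-human identities that are too broad."""
    problems = []
    for a in assignments:
        who = a["principalName"]
        if who in humans:
            continue  # human administrators are reviewed separately
        if "Administrator" in a["roleName"]:
            problems.append((who, "non-human identity holds an administrator role"))
        if not a["scope"].startswith("/keys/"):
            problems.append((who, "role is not scoped to a single key"))
    return problems


for who, problem in review(ASSIGNMENTS, HUMANS):
    print(f"{who}: {problem}")
```

With the sample data, only `batch-app` is reported, because its role applies instance-wide rather than to one key.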

The second check is exposure. Confirm that public network access is disabled or tightly restricted and that a private endpoint carries the data-plane traffic. Read the instance’s network configuration and verify that the default action denies access and that only your intended virtual networks or private endpoints can reach it. Pair this with a verification that the consuming services resolve the instance over the private path, because a private endpoint that applications do not actually use protects nothing.

The third check is the one teams most often defer and most need: a recovery rehearsal. In a non-production setting, assemble a quorum of the custodian keys and exercise the security-domain recovery into a fresh instance. The rehearsal proves three things at once: that the security domain blob is retrievable, that the custodian keys are where you recorded them, and that the quorum you chose can actually be gathered by real people in a real process. A recovery path that has never been walked is a hope, not a control, and the difference shows up only when it is too late to fix.

Beyond these three, treat the soft-delete and purge-protection settings as part of the verifiable posture. Confirm purge protection is enabled on any instance that backs customer-managed keys, because it is the setting that prevents an administrator, or an attacker who has gained administrative access, from permanently destroying a key and causing irreversible data loss. The verification is a one-line read of the instance properties, and it belongs in the same evidence pack as the role assignments and the network configuration.
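The exposure and purge-protection checks lend themselves to a small script run against a saved copy of the instance properties. The sketch below assumes a JSON document with a `properties` object containing `publicNetworkAccess`, `networkAcls.defaultAction`, `privateEndpointConnections`, and `enablePurgeProtection`; these names are assumptions modeled on the usual resource shape and should be confirmed against what your tooling actually returns.

```python
def posture_gaps(instance):
    """List settings that deserve a closer look before they enter the evidence pack."""
    props = instance.get("properties") or {}
    acls = props.get("networkAcls") or {}
    gaps = []
    if props.get("publicNetworkAccess") != "Disabled":
        gaps.append("public network access is not disabled")
    if acls.get("defaultAction") != "Deny":
        gaps.append("network default action is not Deny")
    if not props.get("privateEndpointConnections"):
        gaps.append("no private endpoint connection found")
    if props.get("enablePurgeProtection") is not True:
        gaps.append("purge protection is not enabled")
    return gaps


# Hypothetical sample: firewall denies by default, but public access is still on
# and no private endpoint has been attached.
sample = {
    "properties": {
        "publicNetworkAccess": "Enabled",
        "networkAcls": {"defaultAction": "Deny"},
        "privateEndpointConnections": [],
        "enablePurgeProtection": True,
    }
}
gaps = posture_gaps(sample)
print(gaps)
```

A flagged item is a prompt for review rather than an automatic failure: public access that is enabled but tightly restricted by an allowlist may be a deliberate, documented choice.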

Making the posture auditable and repeatable

A control that is configured by hand once and never codified drifts. The discipline that keeps Managed HSM defensible over time is the same that keeps any security control defensible: express the configuration as code, capture the access changes in logs, and review the result on a schedule. The single-tenant store deserves more of this rigor than an ordinary resource, not less, because the stakes of a quiet misconfiguration are higher.

Express the instance and its network configuration as infrastructure code so that the firewall rules, the private endpoint, the purge-protection setting, and the retention window are reviewable in source control and reproducible in a new region. The one part that cannot and should not be automated end to end is the security-domain ceremony, because the whole point is that the custodian keys are held by people, not by a pipeline. Codify everything around the ceremony and treat the ceremony itself as a documented manual runbook with named custodians and a signed record. The seam between automation and human custody is deliberate; do not paper over it.

How do I keep the access model from drifting over time?

Enable diagnostic logging of all data-plane and role-assignment events, route it to a workspace and immutable storage, and run a scheduled access review of the local role assignments. Every key operation and every role change is then recorded, and the periodic review catches grants that outlived their need before an auditor or an attacker finds them.

Logging is the backbone of auditability. Enable diagnostic settings on the instance so that key operations and role-assignment changes flow to a Log Analytics workspace and to immutable storage for retention. The records answer the questions that matter after an incident: which identity used which key, when, and from where, and who changed an access assignment. Route the logs somewhere an instance administrator cannot quietly alter them, because a log that the same person who holds the keys can edit is not evidence. Alert on the events that should be rare: a new administrator assignment, a key set to exportable, a change to network access, a purge attempt. These are the moments where a posture silently weakens, and an alert at the moment of change is worth more than a finding months later.

Access reviews close the loop. On a schedule that matches your risk tolerance, enumerate the local role assignments and confirm each one still has a reason to exist. Identities that belonged to a decommissioned workload, administrators who changed teams, operation roles granted for a one-time migration and never revoked: all of these accumulate quietly, and all of them widen the trust boundary. The review is short when the model was built from narrow, key-scoped grants and long when it was built from broad ones, which is one more reason to build it narrow from the start. Documenting the review, the evidence pulled, and the changes made turns an abstract control into something an auditor can read.

The cost trade-off and when the vault is the right answer

Managed HSM is billed differently from the standard vault, and the difference is part of the decision rather than a footnote. Where standard Key Vault charges per operation and per HSM-backed key, Managed HSM bills primarily by the hour for the provisioned instance, regardless of how busy it is. A single-tenant cluster reserved for you carries a steady cost that a shared, per-operation service does not. For a workload with a genuine single-tenancy mandate, that cost is simply the price of the requirement and easy to justify. For a workload without one, it is money spent on assurance nobody asked for.

This is why the rule points back to the vault as the default. The premium tier delivers HSM-backed keys validated to the same FIPS level for a fraction of the steady cost and with none of the security-domain ceremony or local-role administration. A team that does not have a written single-tenancy or administrative-isolation requirement gets no compliance benefit from the more expensive store and takes on operational weight that, mishandled, can make them less safe. The cheapest correct answer is almost always the standard vault, stepping up to the premium tier when the keys need validated hardware, and only crossing to Managed HSM when a clause forces the move.

There is a false economy to avoid in the other direction. A team facing a real single-tenancy mandate sometimes tries to satisfy it with the premium vault to save the hourly cost, reasoning that Level 3 hardware is Level 3 hardware. The reasoning fails because the requirement was never only about the FIPS level; it was about tenancy and administrative isolation, and the shared vault provides neither. The saving evaporates the moment an audit produces a finding, and the remediation, migrating keys and re-pointing every consumer to a new store, costs far more than the hourly bill would have. When the mandate is real, pay for the rung the mandate requires the first time.

Key lifecycle inside the instance: types, versions, rotation, and release

The keys an instance holds have a lifecycle, and managing it well is part of operating the service rather than an afterthought. Managed HSM stores HSM-backed keys only, and within that it supports the asymmetric and symmetric types that real workloads use: RSA keys for wrapping and signing, elliptic-curve keys for signing, and symmetric AES keys for wrapping and direct encryption. Choosing the type is a function of the consumer. A customer-managed-key integration for a storage account or a SQL database typically wants an RSA key acting as a key-encryption key, because the service envelope-encrypts its own data-encryption key with it. A signing workload wants an EC or RSA key sized to its policy. Match the type to the consumer’s documented requirement rather than defaulting to one shape for everything.

Versions matter because rotation in this model means creating a new version of a key rather than a new key. When you rotate, the instance generates a fresh version inside the hardware, the key identifier without a version suffix continues to resolve to the current version, and consumers that reference the versionless identifier pick up the new version on their own schedule. The old version remains so that material wrapped under it can still be unwrapped, which is what prevents a rotation from instantly breaking access to already-encrypted data. The discipline is to rotate on a defined cadence, to confirm that each consuming service has actually moved to the new version, and to retire old versions only when nothing depends on them.

How does rotating a customer-managed key work without breaking encrypted data?

Rotation creates a new version of the existing key inside the hardware. Consumers that reference the versionless key identifier adopt the new version, while the previous version is retained so data wrapped under it can still be unwrapped. Nothing breaks at the moment of rotation; access breaks only if an old version is destroyed while data still depends on it.
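The version behavior can be modeled in a few lines. This is a bookkeeping sketch only, with a hypothetical `VersionedKey` class and made-up version labels; it does no cryptography. It shows the two rules that matter: the versionless reference always resolves to the newest version, and an old version is safe to retire only when no consumer's data-encryption key is still wrapped under it.

```python
class VersionedKey:
    """Tracks key versions and which version each consumer last wrapped under."""

    def __init__(self, name):
        self.name = name
        self.versions = ["v1"]       # hypothetical labels; real versions are opaque IDs
        self.wrapped_under = {}      # consumer -> version protecting its DEK

    @property
    def current(self):
        # What the versionless identifier resolves to.
        return self.versions[-1]

    def rotate(self):
        self.versions.append(f"v{len(self.versions) + 1}")
        return self.current

    def wrap_for(self, consumer):
        # A consumer using the versionless identifier wraps under the current version.
        self.wrapped_under[consumer] = self.current

    def can_retire(self, version):
        in_use = version in self.wrapped_under.values()
        return version != self.current and not in_use


key = VersionedKey("tde-root-key")
key.wrap_for("storage")
key.wrap_for("sql")

key.rotate()                 # v2 now exists; v1 is retained
key.wrap_for("storage")      # storage has re-wrapped under v2, sql has not yet
before = key.can_retire("v1")

key.wrap_for("sql")          # sql catches up on its own schedule
after = key.can_retire("v1")
print(before, after)
```

The check returns `False` while `sql` still depends on `v1` and `True` once both consumers have moved, which mirrors the discipline of confirming each consumer before retiring an old version.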

Exportability and release policy are the lifecycle settings that most directly touch the assurance story. A key can be marked exportable or non-exportable at creation, and for most compliance scenarios the right setting is non-exportable, because the reason the key lives in single-tenant hardware is to keep it there. Where a workload genuinely needs to release a key to an attested environment, such as a confidential-computing enclave, the service supports a secure key release policy that releases the key only to an environment whose attestation satisfies the policy. This is a sharp-edged capability: a release policy widens the boundary by design, so write it narrowly, tie it to a specific attestation, and treat any exportable key as a deliberate, documented exception rather than a convenience. The default posture for a compliance key is non-exportable, full stop, and any departure should appear in the audit record with a reason.

Two more lifecycle habits keep an instance healthy. First, keep keys named and tagged so that an access review can map every key to the workload it protects; an orphaned key whose purpose nobody remembers is a small risk that compounds. Second, decide deletion deliberately, because soft delete and purge protection mean a deleted key lingers recoverably for the retention window and, with purge protection on, cannot be force-destroyed before the window elapses. That is the behavior you want for a key protecting live data, and it is the reason deletion should follow confirmation that nothing depends on the key rather than precede it.

Disaster recovery and the security domain in practice

The single most important operational difference between Managed HSM and the standard vault is that you, not Microsoft, hold the artifact that can recover the instance. That is the source of the isolation guarantee and the source of the largest operational risk, so it deserves a deliberate disaster-recovery design rather than an assumption that the platform will sort it out.

Within a region, the cluster’s redundancy across partitions handles ordinary hardware faults transparently. Keys and their policies replicate across the partitions, so the loss of a single partition does not cost you a key and does not require any action on your part. This is the availability that lets a consuming service depend on the instance without treating every key operation as a single point of failure. It does not, however, address the loss of the region or the loss of the instance as a whole, and that is where the security domain becomes the recovery mechanism.

Can I recover keys if a region is lost?

Yes, if you hold the security domain and a quorum of the custodian keys. You provision a new instance in another region and use the security-domain recovery flow to restore your keys into it. Without the security domain and a quorum, there is no recovery path, by design, which is exactly why the artifact must be backed up durably and rehearsed.

Cross-region recovery uses the security domain to restore your keys into a freshly provisioned instance elsewhere. The flow is to create a new instance in the target region, then run the security-domain recovery using your downloaded security domain and a quorum of the custodian RSA keys. Because the security domain cryptographically ties the keys to material only you hold, the recovery reconstitutes your keys in the new cluster without Microsoft ever being able to do so. The dependency chain is therefore unforgiving: the recovery depends entirely on the security domain blob being retrievable and a quorum of custodian keys being assemblable. If either is missing, the keys are gone, and so is access to everything they protect.

This is why the recovery rehearsal is not optional rigor but the core of the disaster-recovery plan. Schedule a rehearsal in which the named custodians actually retrieve their keys, a quorum is assembled, and a recovery into a non-production instance is performed end to end. The rehearsal surfaces the failures that a tabletop never will: a custodian key stored on a device nobody can access, a quorum threshold set higher than the number of reachable custodians, a security domain backup that was never actually written to durable storage. A plan that has survived a real rehearsal is a control. A plan written in a document and never executed is a guess that you will discover is wrong at the worst possible moment.

Design the custodian arrangement for the failure modes you actually face. Spread the custodian keys across people and locations so that no single incident, a lost laptop, a departed employee, a destroyed site, can take out a quorum, while keeping the quorum reachable enough that a real recovery is feasible under time pressure. The tension between making a quorum hard to gather maliciously and easy to gather legitimately is the central design choice, and it should be made with the disaster scenario in mind rather than defaulted to whatever the activation wizard suggested.
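One way to pressure-test a custodian arrangement before a rehearsal is to ask whether a quorum survives the loss of any one person or any one location. The sketch below is a simple illustration under that single-incident assumption; the custodian names and site labels are hypothetical, and each tuple stands for one custodian key.

```python
from collections import Counter


def quorum_survives(custodian_keys, quorum):
    """True if losing any single person or any single site still leaves a quorum.

    custodian_keys: list of (person, site) tuples, one per custodian key.
    """
    if not custodian_keys:
        return False
    keys_per_person = Counter(person for person, _ in custodian_keys)
    keys_per_site = Counter(site for _, site in custodian_keys)
    # Worst single incident: the person or site holding the most keys is lost.
    worst_loss = max(max(keys_per_person.values()), max(keys_per_site.values()))
    return len(custodian_keys) - worst_loss >= quorum


concentrated = [("ana", "hq"), ("ben", "hq"), ("chen", "dr-site")]
spread = [("ana", "hq"), ("ben", "branch"), ("chen", "dr-site")]

print(quorum_survives(concentrated, quorum=2))  # losing "hq" leaves one key
print(quorum_survives(spread, quorum=2))        # any single loss leaves two keys
```

The first arrangement fails because two of three keys sit at one site; the second passes with the same quorum. The check says nothing about how easy the quorum is to gather maliciously, which remains a judgment call.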

Migrating keys from the standard vault to Managed HSM

Teams rarely start at Managed HSM. More often a requirement changes, a new contract lands, a workload moves into a regulated scope, and a key that has been living in the standard vault now has to live in single-tenant hardware. The migration is mechanical once the model is clear, but it touches every consumer of the key, so it is a coordination exercise as much as a cryptographic one.

The first decision is whether the key can be moved at all or whether a new key must be created in the destination. A software-backed key from the standard tier cannot simply be promoted into validated single-tenant hardware while preserving its exact bytes if it was never HSM-protected, so in many cases the clean path is to create a new HSM-backed key in Managed HSM and re-encrypt or re-wrap under it. Where the source key already lives in an HSM, on-premises or in the premium vault, and the requirement is to preserve the same key material, the BYOK import flow is the mechanism, wrapping the key to the destination’s exchange key so it crosses into the new instance without ever being exposed.

Should I re-key or import when moving to Managed HSM?

Re-key when the source was software-protected or when a fresh key is acceptable, because you cannot retroactively make a software key HSM-protected. Import when the requirement is to preserve the exact key material and the source is already an HSM, using the BYOK wrap-and-unwrap flow so the key never leaves a validated module in the clear. The requirement’s wording decides which path is mandatory.

Whichever path you take, the consumer cutover is the part that demands care. Every service that uses the key (a storage account, a SQL database, a disk encryption set, a backup vault) references it by a specific key identifier, and re-pointing them is a sequenced operation. Stand up the key in Managed HSM, grant each consumer’s identity its narrow key-scoped local role, re-point one consumer at a time, and verify access after each move before proceeding. The data already encrypted under the old key must remain accessible throughout, which means the old key cannot be retired until every consumer has moved and every piece of data has been re-wrapped under the new key where re-wrapping is required. Rushing the retirement of the old key is the classic way a migration turns into an outage.

Plan the rollback before you start. For each consumer, know how to re-point it back to the old key, and keep the old key alive and accessible until the migration is fully verified and a defined soak period has passed. The single-tenant store changes the availability dependency, so a migration that looks complete can still surface a consumer that quietly cached the old key reference; the soak period and the preserved old key are what let you recover gracefully rather than scramble. Document the sequence, the verification at each step, and the rollback, and treat the whole thing as a change with a named owner rather than a quick re-pointing of a few configuration values.
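The sequencing and rollback described above can be sketched as a small driver loop. Everything here is hypothetical scaffolding: `repoint` and `verify` stand in for whatever per-service steps you actually use, and the consumer names are made up. The point is the shape: move one consumer, verify it, and on the first failure re-point that consumer back to the still-live old key and stop.

```python
def cut_over(consumers, repoint, verify, old_key, new_key):
    """Re-point consumers one at a time; roll back and stop at the first failed check.

    Returns (moved, failed): the consumers now on new_key, and the consumer that
    failed verification (or None if every move verified).
    """
    moved = []
    for consumer in consumers:
        repoint(consumer, new_key)
        if not verify(consumer):
            repoint(consumer, old_key)  # works only because the old key is still alive
            return moved, consumer
        moved.append(consumer)
    return moved, None


# Hypothetical stand-ins for real re-pointing and verification steps.
pointing = {"storage-acct": "old-key", "sql-db": "old-key", "disk-set": "old-key"}


def repoint(consumer, key):
    pointing[consumer] = key


def verify(consumer):
    return consumer != "sql-db"  # pretend the SQL database fails its access check


moved, failed = cut_over(list(pointing), repoint, verify, "old-key", "new-key")
print(moved, failed, pointing)
```

In this run the storage account moves, the SQL database is rolled back, and the disk set is never touched, so the remaining work is visible and nothing is left half-migrated.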

Reading the requirement: turning a clause into a decision

The hardest part of using Managed HSM well is not technical. It is reading a compliance clause or a contractual term correctly and translating it into the right rung on the ladder. Both expensive mistakes, adopting the service without a real requirement and refusing it when a requirement is real, come from misreading the clause, so the skill of reading it precisely is worth as much as the technical knowledge.

Start by separating what the clause names from what it implies. A requirement that says keys must be protected by a FIPS 140 Level 3 validated HSM is satisfied by the premium vault’s HSM-backed keys, because those run on validated Level 3 hardware; it does not, on its own wording, demand single-tenancy. A requirement that says keys must reside in a dedicated or single-tenant HSM, or that names a customer-controlled security domain, or that requires the cloud provider to be cryptographically precluded from accessing the keys, is a different statement, and only Managed HSM satisfies it. The words tenancy, dedicated, sole control, and provider exclusion are the signals that move the decision; the word HSM alone does not.

How do I tell whether a clause actually requires Managed HSM?

Look for words that name tenancy or provider exclusion, not just the word HSM. A clause requiring a FIPS Level 3 HSM is met by the premium vault. A clause requiring a dedicated or single-tenant HSM, a customer-controlled security domain, or that the provider be precluded from key access requires Managed HSM. When the wording is ambiguous, get it clarified in writing before you build.

When the wording is ambiguous, the right move is to get it clarified rather than to guess in either direction. An ambiguous clause resolved by adopting Managed HSM defensively wastes money if the real intent was satisfied by the vault, while an ambiguous clause resolved by staying on the vault risks a finding if the intent was single-tenancy. A short written clarification from the requirement’s owner (the compliance team, the customer’s security organization, or the regulator’s guidance) costs an email and resolves a decision that otherwise costs either an unnecessary instance or a remediation project. Record the clarification with the decision, because the next person to read the architecture will ask the same question and deserve the same answer.
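As a first-pass triage aid, the signal words can be turned into a small lookup. This is a deliberately naive sketch: the phrase lists are assumptions drawn from the wording discussed above, simple substring matching will misread negations and unusual phrasing, and it is no substitute for reading the clause or asking its owner.

```python
# Hypothetical signal phrases; extend or replace them for your own frameworks.
TENANCY_SIGNALS = (
    "single-tenant",
    "dedicated",
    "sole control",
    "customer-controlled security domain",
    "precluded",
)
HSM_ONLY_SIGNALS = ("fips 140", "level 3")


def triage(clause):
    """Map a clause to a starting rung; anything unclear goes back for clarification."""
    text = clause.lower()
    if any(signal in text for signal in TENANCY_SIGNALS):
        return "managed-hsm"
    if any(signal in text for signal in HSM_ONLY_SIGNALS):
        return "premium-vault"
    return "clarify-in-writing"


print(triage("Keys must be protected by a FIPS 140 Level 3 validated HSM."))
print(triage("Keys must reside in a dedicated HSM."))
print(triage("Keys must be stored securely in an HSM."))
```

The third clause names an HSM but neither tenancy nor a validation level, so the sketch routes it to written clarification rather than guessing in either direction.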

The counter-reading worth naming explicitly is the equation of security with the most expensive option. Managed HSM is not more secure than a well-run premium vault for a workload that does not need single-tenancy; it is more controlled, and control without a requirement is overhead, not safety. A premium vault, properly hardened with network restrictions, tight access control, soft delete, and purge protection, protects keys in validated hardware and serves the great majority of regulated workloads. The teams that run into trouble are the ones that treated the choice as a security gradient (more money equals more safety) rather than a requirement match. The rule corrects the gradient thinking by anchoring the decision to a written requirement, and reading the requirement precisely is what makes the rule usable.

The local role catalog and how to assign responsibilities

Building the access model well is easier when you can see the roles laid out against the responsibilities they are meant to carry. The local model is deliberately granular, and the temptation to grant a broad role because it is simpler is exactly the temptation that widens the trust boundary. A short reference of the roles that matter most, and the responsibility each is designed for, keeps the assignments honest.

The roles divide into administration, key management, and key use, and the least-privilege discipline is to keep those populations separate. Administration roles configure the instance and the access model and belong to a tiny set of named humans. Key-management roles create and manage keys and belong to the operators who run the key lifecycle. Key-use roles perform cryptographic operations and belong, narrowly and key-scoped, to the application identities that actually need them. The reference below maps the common roles to the responsibility they carry and the population that should hold them, and it is the artifact to consult when an assignment request lands and you have to decide what to grant.

Role	What it can do	Who should hold it	Scope
Managed HSM Administrator	Manage role assignments and the instance configuration	A small set of named humans only	Instance
Crypto Officer	Create, import, rotate, and manage keys	Key-lifecycle operators	Instance or per key
Crypto User	Use keys for sign, verify, wrap, unwrap, encrypt, decrypt	Workloads that perform crypto operations	Per key wherever possible
Crypto Service Encryption User	Wrap and unwrap for service customer-managed-key integrations	Managed identities of consuming services	The specific key
Crypto Auditor	Read key properties and metadata without using keys	Audit and compliance reviewers	Instance

The grain to take from the table is that no single role should span administration and use. The administrator can rewrite the access model, which is precisely why an application must never hold it: an application compromised while holding an administrator role hands an attacker the ability to grant themselves anything. Conversely, a human operator who only needs to confirm key metadata for an audit should hold the auditor role, which reads without using, rather than a crypto role that can perform operations. Each row exists so that you can grant the narrowest role that still lets the holder do their job, and the discipline of always reaching for the narrowest matching row is what makes a later access review fast and a later compromise contained.

Which role should a consuming service’s identity receive?

The Crypto Service Encryption User role, scoped to the specific key. Service customer-managed-key integrations such as Storage and SQL only need to wrap and unwrap their data-encryption key with the key-encryption key, and this role grants exactly that and nothing more. Scoping it to the single key the service uses, rather than the whole instance, keeps a compromise of that identity from reaching every other key.

A practical habit reinforces the catalog: write down, for every identity that holds a role, the reason it holds it and the workload it serves. The local model does not infer intent, so the documentation is what turns a list of assignments into something an auditor or a future operator can reason about. When the reason for an assignment can no longer be stated, that is the signal to revoke it. The catalog tells you what to grant; the documented reason tells you when to take it back.
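The documented-reason habit can be checked mechanically by comparing the live assignments against a register of reasons. The sketch below assumes a register you keep yourself, keyed by identity, role, and scope; the register format and the sample entries are hypothetical.

```python
def undocumented(assignments, register):
    """Return assignments with no stated reason in the register.

    assignments: iterable of (principal, role, scope) tuples.
    register: dict mapping the same tuple to a reason string.
    """
    return [a for a in assignments if not register.get(a, "").strip()]


# Hypothetical live assignments and reason register.
assignments = [
    ("sql-prod-identity", "Managed HSM Crypto Service Encryption User", "/keys/tde-root-key"),
    ("migration-runner", "Managed HSM Crypto User", "/keys/tde-root-key"),
]
register = {
    assignments[0]: "TDE protector for the finance production SQL server",
    assignments[1]: "",  # reason was never recorded, or no longer applies
}

to_revoke = undocumented(assignments, register)
print(to_revoke)
```

Here the migration identity surfaces as a revocation candidate, because an assignment whose reason can no longer be stated is the signal to take it back.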

Operating over time: monitoring, alerting, and incident response

A key store is a living dependency, not a configure-once resource, and the difference between an instance that stays defensible and one that quietly drifts is the operational practice that surrounds it. Three habits carry most of the weight: watching the events that signal a weakening posture, alerting on the ones that should never happen silently, and having a rehearsed response for the day a key or an identity is compromised.

Watching begins with the diagnostic logs. Every key operation and every role-assignment change flows to the workspace and immutable storage you configured, and the value of those records is realized only when someone, or something, reads them. Build a small set of monitoring views that answer the recurring questions: which identities are using which keys and how often, whether any operation is failing in a pattern that suggests a misconfiguration or an attack, and whether the volume of operations matches the workload you expect. An unexplained spike in unwrap operations, or unwrap failures from an identity that should not be calling the instance, is the kind of signal that a monitoring view surfaces and a raw log buries.

What events should always raise an alert?

A new administrator role assignment, a key being set to exportable, a change to network access, a purge attempt on a key or the instance, and a failed access pattern that suggests probing. These are the moments a posture silently weakens or an attack begins, so an alert at the moment of change is worth far more than a finding discovered months later in a log nobody read.

Alerting narrows the watching to the events that demand a human. A new administrator assignment, a key flipped to exportable, a change to the network configuration, and a purge attempt are all events that should be rare and consequential, so each deserves an alert that reaches a person who can judge whether it was intended. The goal is not to alert on everything, which trains people to ignore alerts, but to alert on the small set of changes that move the trust boundary. Tuning that set so the alerts stay rare and meaningful is itself a periodic task, because a workload’s normal pattern shifts over time and an alert that fires constantly stops being read.
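One way to keep the alert set small and explicit is to write it down as a table rather than scatter it across queries. This sketch assumes hypothetical event-type labels and an arbitrary probing threshold; neither comes from the service's log schema.

```python
from typing import Optional

# Hypothetical event-type labels mapped to the reason each one pages a human.
ALERT_REASONS = {
    "admin_role_assigned": "new administrator role assignment",
    "key_set_exportable": "key flipped to exportable",
    "network_access_changed": "change to the network configuration",
    "purge_attempted": "purge attempt on a key or the instance",
}

# Assumed threshold: failed calls from one identity within one review window.
PROBING_THRESHOLD = 5


def alert_reason(event_type: str, failed_calls_in_window: int = 0) -> Optional[str]:
    """Return why an event should alert, or None if it should only be logged."""
    if event_type in ALERT_REASONS:
        return ALERT_REASONS[event_type]
    if event_type == "access_denied" and failed_calls_in_window >= PROBING_THRESHOLD:
        return "failed access pattern that suggests probing"
    return None  # routine operations stay in the log and out of the pager
```

Everything not named in the table stays silent by default, which is the property that keeps the alerts rare. Tuning then means editing one table and one threshold.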

Incident response is the habit teams hope never to use and most regret not rehearsing. When a key is suspected compromised, the response is to rotate it to a new version, confirm the consumers have moved to the new version, and then retire the compromised version once nothing depends on it, all while preserving the ability to decrypt data already protected under the old version until it has been re-wrapped. When an identity is suspected compromised, the response is to revoke its local role immediately, which severs its access to the keys regardless of any Azure authority it holds, and then to investigate what it touched using the operation logs. The single-tenant model makes both responses cleaner than they would be on shared infrastructure, because the local roles give you a precise revocation lever and the dedicated instance gives you a complete operation record. The lever and the record are only useful, though, if the team has walked the response before the incident, so fold a key-compromise and an identity-compromise drill into the same rehearsal cadence as the disaster-recovery exercise.

A final operational note ties the practice back to the rule. The reason all of this rigor is justified is that a written requirement put the keys in single-tenant hardware in the first place. The same requirement that justified the cost justifies the operational investment, because a single-tenant store operated casually delivers neither the assurance the mandate wanted nor the safety a simpler vault would have provided. Operate the instance as seriously as the requirement that created it, and it repays the effort with exactly the controlled, auditable, isolated posture it was chosen to provide.

Managed service versus running your own hardware

For teams that arrive at Managed HSM from an on-premises background, the natural comparison is not to the standard vault at all but to the dedicated appliances they already operate in their own data center. That comparison clarifies what the managed offering buys and what it deliberately keeps out of your hands, and it sharpens the trade-off that sits behind the hourly bill.

Running your own hardware in a data center gives you total physical custody and total operational burden in equal measure. You own the racking, the power, the cooling, the firmware patching, the clustering for availability, the spare-parts logistics, and the staffing to do all of it around the clock. Many organizations carry that burden because a regulation once required physical custody, and many of them no longer need to, because the managed offering provides single-tenancy and a customer-controlled security domain without the data-center operations. The managed service takes the undifferentiated heavy lifting (the patching, the availability engineering, the hardware maintenance) and leaves you exactly the part that the requirement actually cared about: exclusive control of the cryptographic material and isolation from anyone who is not your named administrator.

Does Managed HSM replace an on-premises HSM?

For most cloud workloads, yes, when the requirement is single-tenancy and provider isolation rather than literal physical custody on your own premises. Managed HSM delivers a dedicated, single-tenant cluster, a customer-controlled security domain, and administrative isolation, while removing the data-center burden of running appliances yourself. Where a regulation specifically mandates physical possession on your own site, the managed service does not satisfy that narrow wording.

The trade-off you accept in exchange is the one the rest of this guide has circled: you do not touch the physical units, you do not control the data center, and you depend on the platform for availability and for the maintenance you would otherwise perform yourself. For the requirement most organizations actually face, single-tenancy and provider isolation rather than literal on-site possession, that trade is overwhelmingly favorable, because the operational savings dwarf the hourly cost of the instance and the assurance the requirement wanted is fully delivered. For the narrower requirement of physical custody on your own premises, the managed service is the wrong answer by definition, and the on-premises appliance or a dedicated offering with full physical control is what the wording demands. Reading which requirement you actually have, custody or isolation, is the same reading-the-requirement skill applied one level up, and it is what decides whether the managed model fits at all.

There is a migration angle worth naming here too. An organization moving off self-run hardware does not have to abandon the keys it generated there. The BYOK import flow carries those keys into the managed instance wrapped, preserving the exact material where the requirement demands continuity, so the move from owning appliances to consuming the managed service need not be a re-key event unless you choose to make one. That continuity is often what makes the transition acceptable to a compliance team that has lived with physical custody for years: the keys are the same keys, the control is still exclusively theirs, and only the operational burden has changed hands.

Verdict

Azure Managed HSM is the most controlled key store in Azure, not simply the most secure one, and that distinction is the whole decision. It earns its place when a written requirement demands single-tenant hardware, a customer-controlled security domain, or isolation of key access from the cloud provider’s own administrators, and it earns nothing but cost and ceremony when adopted for a vaguer sense that more is better. The compliance-drives-managed-hsm rule is the discipline that keeps teams on the right rung: start at standard Key Vault, step up to the premium tier when keys need validated hardware, and cross to Managed HSM only when a clause forces it. Master the security domain, build the local-role model narrow and key-scoped, lock the network down, rehearse the recovery, and log everything, and the service delivers exactly the isolation it promises. Skip that discipline and a single-tenant fortress becomes an expensive way to be less safe than the vault you already knew how to run.

Frequently asked questions

What is Azure Managed HSM?

Azure Managed HSM is a fully managed, highly available, single-tenant cloud service that stores cryptographic keys inside hardware security modules validated to FIPS 140-3 Level 3. When you create an instance, Azure dedicates a cluster of hardware partitions to your organization alone and binds it to a customer-specific security domain that only you hold. The service stores HSM-backed keys, not secrets or certificates, and performs every private-key operation inside the hardware so the key material never leaves the validated module in the clear. Microsoft manages availability, clustering, and patching, while you keep exclusive control of the keys and of who may use them. The defining property is that Microsoft and its agents are cryptographically precluded from accessing your key material, which is the reason the service exists and the reason its operational model is stricter than the standard vault’s.

Managed HSM versus standard Key Vault: which should I use?

Start with standard Key Vault, because it serves the great majority of Azure workloads well at the lowest cost. Move to the premium tier when your keys need validated HSM hardware, since premium HSM-backed keys run on the same FIPS 140-3 Level 3 validated firmware as Managed HSM. Cross to Managed HSM only when a written requirement demands single-tenant hardware, a customer-controlled security domain, or isolation of key access from the cloud provider’s own administrators. The decisive question is not whether you need an HSM but whether the HSM must be yours alone and isolated from Azure’s platform administrators. If no clause demands that, the vault is the correct and cheaper choice. If a clause does demand it, Managed HSM is the rung, and the hourly cost and security-domain ceremony are the price of meeting the requirement.
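The ladder in this answer is simple enough to write as a function, which makes the order of the checks explicit. This is a sketch of the reasoning only: the field names are illustrative, and the inputs are what a written clause demands, not what a team would prefer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """What a written clause actually demands (field names are illustrative)."""
    single_tenant_hardware: bool = False
    customer_controlled_security_domain: bool = False
    isolation_from_provider_admins: bool = False
    validated_hsm_hardware: bool = False


def choose_rung(req: Requirement) -> str:
    """Return the lowest rung that satisfies the requirement."""
    if (req.single_tenant_hardware
            or req.customer_controlled_security_domain
            or req.isolation_from_provider_admins):
        return "Managed HSM"
    if req.validated_hsm_hardware:
        return "Key Vault Premium"
    return "Key Vault Standard"


print(choose_rung(Requirement()))
print(choose_rung(Requirement(validated_hsm_hardware=True)))
print(choose_rung(Requirement(isolation_from_provider_admins=True)))
```

Note what is absent from the inputs: there is no field for "sounds more secure." If none of the first three flags can be tied to a clause, the function never returns Managed HSM.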

What FIPS level and compliance does Managed HSM meet?

Managed HSM uses single-tenant HSMs validated to FIPS 140-3 Level 3. Microsoft updated the HSM fleet firmware so that both Key Vault Premium and Managed HSM now run on FIPS 140-3 Level 3 validated firmware, which means the raw FIPS level is no longer the clean separator it once was between the premium vault and Managed HSM. Confirm the exact current validation status against the Microsoft compliance documentation before citing a specific level in an audit, because the fleet status changes as firmware and certificates are revalidated. The compliance value of Managed HSM beyond the FIPS level is its single-tenancy, its customer-controlled security domain, and the local access model that isolates keys from the cloud provider, which together satisfy control requirements that the shared vault cannot, regardless of the validation level both happen to share.

How do bring-your-own-key and key import work?

The secure import flow uses wrapping so the key never exists in plaintext outside a validated module. You ask the destination Managed HSM to produce a key exchange key, an RSA key generated inside its hardware whose public half you export. On your source HSM you wrap your key with that public key, encrypting the material to the destination. You transfer only the wrapped blob and import it, and the destination unwraps it inside its own hardware boundary. The plaintext key never crosses a network, sits on a disk, or appears in host memory at any step. This is what lets an organization assert continuous hardware protection across a migration, which is frequently the exact wording in the requirement that drove the adoption. Mark imported compliance keys non-exportable unless a documented workflow requires otherwise, and record the provenance of each imported key.

How does Managed HSM access control work?

Managed HSM uses a local role-based access control system that lives inside the cluster, not in Azure Resource Manager. Azure RBAC governs management-plane operations on the resource, such as deleting the instance, but it does not grant access to keys. To use a key, an identity must hold a local role that only an instance administrator can assign. This is the property that delivers administrative isolation: a subscription owner who lacks a local role cannot read, list, or use the keys, no matter how much Azure authority they hold. Built-in local roles range from administrator through crypto officer and crypto user to narrow operation roles, and the least-privilege discipline is to scope application identities to the specific operations and the specific key they use rather than granting instance-wide roles.
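The separation between the two planes can be pictured as two independent lookups that never consult each other. This is a toy model under stated assumptions, not the service's implementation: the data structures, identity names, scope strings, and operation names are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Management plane: Azure roles held per identity (illustrative data).
AzureRoles = Dict[str, Set[str]]
# Data plane: local assignments keyed by (identity, scope) -> allowed operations.
# Scope "/" is the whole instance; "/keys/<name>" is a single key.
LocalRoles = Dict[Tuple[str, str], Set[str]]


def can_manage_resource(identity: str, azure: AzureRoles) -> bool:
    """Management-plane check: decided by Azure roles only."""
    return bool(azure.get(identity, set()) & {"Owner", "Contributor"})


def can_use_key(identity: str, operation: str, key: str, local: LocalRoles) -> bool:
    """Data-plane check: decided by local roles only; Azure roles are never consulted."""
    scopes = ("/", f"/keys/{key}")
    return any(operation in local.get((identity, s), set()) for s in scopes)


azure = {"subscription-owner": {"Owner"}}
local = {("storage-cmk", "/keys/storage-kek"): {"wrapKey", "unwrapKey"}}

print(can_manage_resource("subscription-owner", azure))
print(can_use_key("subscription-owner", "unwrapKey", "storage-kek", local))
print(can_use_key("storage-cmk", "unwrapKey", "storage-kek", local))
print(can_use_key("storage-cmk", "unwrapKey", "sql-kek", local))
```

The second function takes no Azure-role argument at all, which is the point: no amount of management-plane authority changes its answer. The key-scoped assignment likewise shows why a compromised consumer identity reaches only its own key.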

When do I need Managed HSM for customer-managed keys?

You need Managed HSM for customer-managed keys when a written requirement demands that the key-encryption key reside in single-tenant hardware or under a customer-controlled security domain. Services such as Azure Storage and Azure SQL support customer-managed keys backed by either Key Vault or Managed HSM, so the integration mechanics are similar; the choice between them is the same requirement match as everywhere else. If your compliance framework or contract requires single-tenancy or provider isolation for the root key protecting a data store, back the customer-managed key with Managed HSM. If it requires only validated HSM protection without naming tenancy, the premium vault’s HSM-backed key satisfies it at lower cost. The consuming service wraps its data-encryption key with the HSM key and asks the hardware to unwrap it on demand, so the root of trust sits where the requirement says it must.

What is the security domain and what happens if I lose it?

The security domain is an encrypted blob, generated during activation, that cryptographically ties the HSM partitions to a set of keys only you possess. You download it and protect it with a quorum of RSA key pairs you supply in an M-of-N arrangement. It is what guarantees Microsoft cannot reconstruct your instance and what lets you recover your keys into a new cluster after a regional disaster. The consequence of that guarantee is blunt: if you lose the security domain blob, or lose a quorum of the RSA keys that protect it, every key in the instance becomes permanently unrecoverable. There is no support path that recovers them, by design, because the same property that keeps Microsoft out keeps a careless owner out too. Back the security domain up to durable, access-controlled storage, distribute the custodian keys across people and locations, and rehearse recovery before you need it.
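The M-of-N arithmetic is worth making concrete, because it tells you exactly how many custodians you can afford to lose. The sketch below models only the counting. The 3-of-5 figures are an illustrative choice, not a recommendation or a service limit, and the class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityDomainPlan:
    """An M-of-N custody arrangement for the security domain."""
    total_custodian_keys: int  # N: RSA key pairs that protect the blob
    quorum: int                # M: how many must be present to recover

    def __post_init__(self) -> None:
        if not 1 <= self.quorum <= self.total_custodian_keys:
            raise ValueError("quorum must be between 1 and the number of custodian keys")

    def losses_tolerated(self) -> int:
        """How many custodian keys can be lost before recovery becomes impossible."""
        return self.total_custodian_keys - self.quorum

    def recoverable(self, blob_backup_available: bool, keys_retrievable: int) -> bool:
        """Recovery needs both the blob and a quorum of keys; either alone is useless."""
        return blob_backup_available and keys_retrievable >= self.quorum


plan = SecurityDomainPlan(total_custodian_keys=5, quorum=3)
print(plan.losses_tolerated())
print(plan.recoverable(blob_backup_available=True, keys_retrievable=2))
```

A rehearsal is in effect a measurement of `keys_retrievable` on an ordinary day, taken before an outage forces the question.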

Is Managed HSM the same as Azure Dedicated HSM or Azure Cloud HSM?

No. Managed HSM speaks the Key Vault data-plane API, so applications and Azure services that already integrate with Key Vault keys can target it with minimal change, and it is the right fit for customer-managed encryption keys and key operations through the familiar interface. Azure Cloud HSM is a newer, bare-metal HSM-as-a-service for lift-and-shift of applications that talk PKCS#11, JCE, or OpenSSL directly to a dedicated cluster, where the customer owns the full administrative stack. Azure Dedicated HSM is the older offering aimed at the same lift-and-shift scenarios, providing dedicated HSM appliances that the customer administers directly. They solve different problems and carry different operational models. For the workloads this guide addresses, customer-managed keys for Azure services and operations through the Key Vault interface, Managed HSM is the service in scope, and these offerings should not be conflated when matching a requirement to a product.

Can a subscription owner access keys in a Managed HSM?

Not through Azure roles. This is the core of the administrative isolation the service provides. A subscription or management-group owner can perform management-plane operations on the resource, such as reading its properties or deleting it, but Azure RBAC does not grant access to the keys themselves. To use, read, or list a key, an identity must hold a local role inside the instance, and only an instance administrator can assign that. A subscription owner who is not also a local HSM role holder is locked out of the cryptography no matter how much Azure authority they accumulate. That separation is usually the exact sentence a control framework reaches for when it demands that keys be isolated from the cloud provider’s high-privilege administrators, and it is what the standard vault cannot offer.

How much does Managed HSM cost compared to Key Vault?

Managed HSM bills primarily by the hour for the provisioned instance, regardless of how busy it is, because a single-tenant cluster is reserved for you. Key Vault, by contrast, charges per operation and, in the premium tier, per HSM-backed key, so a lightly used vault costs very little. The steady hourly cost of a reserved cluster is far higher than a per-operation vault for most workloads. For a workload with a genuine single-tenancy mandate, that cost is simply the price of the requirement. For a workload without one, it is money spent on assurance nobody asked for, with no compliance benefit. Verify current pricing on the Azure Key Vault pricing page, since rates change, and treat the cost difference as part of the decision rather than a footnote: the cheapest correct answer is almost always the vault until a clause forces the move.
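The shape of the comparison, a flat reserved cost against a metered one, can be worked through with placeholder numbers. Every rate below is a made-up placeholder chosen for easy arithmetic, not an Azure price, and the per-10,000-operations unit is an assumption about how the metered rate is quoted. Substitute figures from the pricing page before drawing any conclusion.

```python
HOURS_PER_MONTH = 730  # common billing approximation for an average month


def reserved_monthly_cost(hourly_rate: float) -> float:
    """Flat cost of a reserved instance: billed every hour, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


def metered_monthly_cost(operations: int, rate_per_10k_ops: float,
                         hsm_keys: int = 0, rate_per_key: float = 0.0) -> float:
    """Usage-based cost: scales with operations and HSM-backed key count."""
    return operations / 10_000 * rate_per_10k_ops + hsm_keys * rate_per_key


def break_even_operations(hourly_rate: float, rate_per_10k_ops: float) -> float:
    """Monthly operation count at which metered cost reaches the reserved cost."""
    return reserved_monthly_cost(hourly_rate) / rate_per_10k_ops * 10_000


# PLACEHOLDER rates in arbitrary currency units; NOT real prices.
PLACEHOLDER_HOURLY = 2.0
PLACEHOLDER_PER_10K = 0.25

print(reserved_monthly_cost(PLACEHOLDER_HOURLY))
print(metered_monthly_cost(1_000_000, PLACEHOLDER_PER_10K))
print(break_even_operations(PLACEHOLDER_HOURLY, PLACEHOLDER_PER_10K))
```

Whatever the real rates are, the structure holds: the reserved cost does not fall when the workload is quiet, so the break-even volume is the number to compare against your actual operation count.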

Can Managed HSM store secrets and certificates?

No. A Managed HSM instance holds HSM-backed keys only. It does not store secrets such as connection strings and passwords, nor certificates, the way a general-purpose vault does. This narrowing is intentional, because the service is a focused key fortress rather than a general store, and keeping it focused is part of what justifies its cost and ceremony. If your need is to store a database password, an API token, or a certificate, the standard vault remains the right home and Managed HSM is the wrong tool. A common architecture pairs the two: a Managed HSM holds the high-assurance keys that a mandate requires in single-tenant hardware, while a standard vault holds the secrets and certificates the same application needs, each store doing the job it is designed for.

How do I rotate a key in Managed HSM?

Rotation creates a new version of the existing key inside the hardware rather than a new key. The versionless key identifier continues to resolve to the current version, so consumers that reference the versionless identifier adopt the new version on their own schedule, while the previous version is retained so material wrapped under it can still be unwrapped. This is what prevents a rotation from instantly breaking access to already-encrypted data. The discipline is to rotate on a defined cadence, confirm that each consuming service has actually moved to the new version, and retire an old version only when nothing depends on it. Destroying an old version while encrypted data still references it is how a routine rotation turns into an outage, so verify dependence before retirement and never force the deletion of a version under time pressure.
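The version semantics described here can be modeled in a few lines, including the guard that stops a retirement from becoming an outage. This is a toy model with hypothetical names; in practice the dependency set has to come from your own inventory of what was wrapped under each version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VersionedKey:
    """A key whose rotations add versions rather than replace the key."""
    name: str
    versions: List[str] = field(default_factory=list)              # oldest first
    dependents: Dict[str, Set[str]] = field(default_factory=dict)  # version -> data still wrapped under it

    def rotate(self, new_version: str) -> None:
        """Add a new current version; earlier versions are retained."""
        self.versions.append(new_version)
        self.dependents.setdefault(new_version, set())

    def resolve(self, version: str = "") -> str:
        """A versionless lookup resolves to the current (newest) version."""
        if not version:
            return self.versions[-1]
        if version not in self.versions:
            raise KeyError(version)
        return version

    def retire(self, version: str) -> None:
        """Remove an old version only when nothing depends on it."""
        if version == self.versions[-1]:
            raise ValueError("cannot retire the current version")
        if self.dependents.get(version):
            raise ValueError(f"{version} still protects: {sorted(self.dependents[version])}")
        self.versions.remove(version)
        self.dependents.pop(version, None)


key = VersionedKey("storage-kek")
key.rotate("v1")
key.dependents["v1"].add("archive-2023")  # data wrapped under v1
key.rotate("v2")
print(key.resolve())      # versionless lookup follows the rotation
print(key.resolve("v1"))  # old version remains usable for unwrap
```

Calling `key.retire("v1")` at this point raises, because `archive-2023` has not been re-wrapped. That refusal is the behavior to want under time pressure.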

Can I export a key from Managed HSM?

Only if the key was created as exportable, and for most compliance scenarios it should not be. A key marked non-exportable cannot leave the hardware boundary, which is usually the entire point of placing it in single-tenant hardware. Where a workload genuinely needs to release a key to an attested environment such as a confidential-computing enclave, the service supports a secure key release policy that releases the key only to an environment whose attestation satisfies the policy. That capability widens the boundary by design, so write any release policy narrowly, tie it to a specific attestation, and treat an exportable key as a deliberate, documented exception that appears in the audit record. The default posture for a compliance key is non-exportable, and any departure from it should carry a recorded reason rather than being chosen for convenience.

How do I migrate keys from standard Key Vault to Managed HSM?

First decide whether to re-key or import. Re-key when the source was software-protected or when a fresh key is acceptable, because you cannot retroactively make a software key HSM-protected; create a new HSM-backed key in Managed HSM and re-wrap under it. Import when the requirement is to preserve the exact key material and the source is already an HSM, using the BYOK wrap-and-unwrap flow. Then sequence the consumer cutover carefully: stand up the key, grant each consumer’s identity its narrow key-scoped local role, re-point one consumer at a time, and verify access after each move. Keep the old key alive until every consumer has moved and every dependent piece of data has been re-wrapped where required, and plan a rollback for each consumer. Treat the whole migration as a change with a named owner and a soak period rather than a quick re-pointing of configuration values.
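The two decisions in this answer, which path to take and how to sequence the cutover, can be sketched as follows. The callables passed to `cutover` are hypothetical stand-ins for whatever your deployment tooling does to re-point, verify, and roll back a consumer.

```python
from typing import Callable, List


def migration_path(source_is_hsm_protected: bool, must_preserve_material: bool) -> str:
    """Import only when the exact material must survive AND it already lives in an HSM.

    A software-protected source cannot be made HSM-protected after the fact,
    so it is re-keyed even if continuity would have been preferred.
    """
    if source_is_hsm_protected and must_preserve_material:
        return "import"
    return "re-key"


def cutover(consumers: List[str],
            repoint: Callable[[str], None],
            verify: Callable[[str], bool],
            rollback: Callable[[str], None]) -> List[str]:
    """Move consumers one at a time; stop and roll back at the first failed check."""
    moved: List[str] = []
    for consumer in consumers:
        repoint(consumer)
        if not verify(consumer):
            rollback(consumer)
            break  # leave remaining consumers on the old key
        moved.append(consumer)
    return moved


# Illustrative run: the second consumer fails verification.
state = {}
moved = cutover(
    ["storage", "sql", "app"],
    repoint=lambda c: state.__setitem__(c, "new-key"),
    verify=lambda c: c != "sql",
    rollback=lambda c: state.__setitem__(c, "old-key"),
)
print(moved)
print(state)
```

Stopping at the first failure is a deliberate choice in this sketch: consumers that were never touched need no rollback, and the old key stays alive for all of them.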

Does Managed HSM support private endpoints?

Yes, and for a high-assurance store a private endpoint is close to mandatory. By default the instance has a public data-plane endpoint that is reachable from the internet, subject to access control, which is rarely acceptable for a service whose whole reason for existing is isolation. Configure a private endpoint so data-plane traffic flows over your virtual network, and where a private endpoint is not yet in place, restrict public network access with the firewall to known address ranges and set the default action to deny. Then verify that the consuming services actually resolve and use the private path, because a private endpoint that applications do not use protects nothing. An exposed single-tenant HSM undercuts the isolation story the service was bought to tell, so closing the network is among the highest-value hardening steps after handling the security domain.
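The effect of a default-deny firewall with an allow list is easy to reason about with a small model. This sketch uses Python's standard `ipaddress` module and reserved documentation address ranges. The function and its parameters are illustrative, not the service's rule engine.

```python
import ipaddress
from typing import Iterable


def public_call_allowed(source_ip: str,
                        allowed_ranges: Iterable[str],
                        default_action: str = "Deny") -> bool:
    """Decide whether a call arriving on the public endpoint passes the firewall.

    A source inside any allowed range passes; anything else falls through to
    the default action.
    """
    addr = ipaddress.ip_address(source_ip)
    if any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges):
        return True
    return default_action == "Allow"


# Documentation-only address ranges, used here as stand-ins for real networks.
ALLOWED = ["203.0.113.0/24"]

print(public_call_allowed("203.0.113.10", ALLOWED))
print(public_call_allowed("198.51.100.7", ALLOWED))
print(public_call_allowed("198.51.100.7", ALLOWED, default_action="Allow"))
```

The third call shows why the default action matters more than the allow list: with the default left at allow, the list restricts nothing.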

How do I recover a Managed HSM in a regional outage?

You provision a new instance in another region and use the security-domain recovery flow to restore your keys into it, which requires the security domain blob and a quorum of the custodian RSA keys. Because the security domain ties the keys to material only you hold, the recovery reconstitutes your keys without Microsoft being able to do so. The dependency is unforgiving: if the security domain backup is missing or a quorum of custodian keys cannot be assembled, there is no recovery path, by design. This is why a recovery rehearsal is the core of the disaster-recovery plan rather than optional rigor. Schedule a rehearsal in which named custodians actually retrieve their keys, a quorum is assembled, and a recovery into a non-production instance is performed end to end, so the plan is a tested control rather than a hopeful document.

Is Managed HSM overkill for my workload?

Probably, unless a written requirement says otherwise. The most common mistake is adopting the service because it sounds like the strongest option, without a clause demanding single-tenancy or administrative isolation. That choice inherits the hourly cost, the security-domain ceremony, and the local-role model for a benefit nobody can point to, and because the rigor was never driven by a real control objective it is often thin, which can leave the instance less safe than a properly hardened standard vault. Ask one question: can you point to the requirement that names single-tenant hardware, a customer-controlled security domain, or isolation from the provider’s administrators? If yes, the service is justified and the cost is the price of the mandate. If no, the standard vault, with the premium tier where validated hardware is needed, is the correct, cheaper, and frequently safer answer.