Azure Virtual Machines: The Complete Engineering Guide

Most engineers create their first Azure Virtual Machine by accepting a portal default, and the bill, the latency, and the 3 a.m. page all arrive later as separate surprises that nobody connects back to that one click. The gap between using a VM and understanding one is the gap between copying a size name and being able to predict, before deployment, how the machine will behave under load, what it will cost when idle, and which failure will take it down first. This guide closes that gap. The claim it argues is simple to state and easy to get wrong in practice: every production VM is really four coupled choices (the size family, the disk tier, the availability construct, and the cost commitment), and getting one wrong silently caps the other three.

[Figure: Azure Virtual Machines four-decision model for sizing, disks, availability, and cost]

No single documentation page joins those four decisions into one reasoning chain, which is why so many deployments are individually defensible and collectively wrong: a generously sized machine throttled by a cheap disk, a redundant pair that shares one failure domain, a reserved commitment on a workload that should have been interruptible. The reader who finishes this guide should be able to derive a size from a workload rather than guess it, name the failure modes before they happen, and read a monthly invoice without surprise.

What an Azure Virtual Machine actually is

An Azure Virtual Machine is not a single object. It is a small graph of resources that the platform provisions and bills independently, presented to you as one machine. The compute itself, the slice of a physical host running your guest operating system, is the piece people picture. Around it sit the resources that make it reachable and durable: a network interface that carries the private address and any associated rules, one or more managed disks that persist state, an optional public address, a network security group that filters traffic, and a placement decision that puts the machine in an availability set, an availability zone, or neither.

How is an Azure Virtual Machine actually assembled?

A VM is the compute resource plus its dependent resources: a NIC for connectivity, an OS disk and optional data disks for persistent state, an optional public IP, a security group for filtering, and a placement into an availability set or zone. The platform schedules that bundle onto a physical host. Each piece is provisioned and billed on its own.

That decomposition matters the moment something breaks, because the symptom rarely names the resource at fault. A machine that will not accept connections might have a healthy guest and a misconfigured security group. A machine that boots but performs terribly often has correct compute and a disk that caps throughput far below what the workload demands. Holding the resource graph in your head, rather than the single-machine abstraction the portal shows you, is the first discipline of running compute on Azure. When you delete a VM, the dependent disks, addresses, and interfaces frequently survive, which is both a safety feature and a common source of orphaned cost.

The compute slice runs on a physical host inside an Azure datacenter, scheduled by the platform’s allocator. You do not pick the host, and ordinarily you should not care which one you land on. The allocator’s behavior becomes visible only at the edges, when a requested size is unavailable in a region and you receive an allocation error, or when a resize needs a host with different hardware and forces the machine to move. Both cases are predictable once you understand that your VM is a guest on shared infrastructure whose capacity is finite per region and per series.

The host, the hypervisor, and what runs beneath your guest

Your guest operating system does not touch hardware directly. It runs on a hypervisor, a thin layer that partitions one physical server into isolated guests and arbitrates their access to the underlying processors, memory, storage controllers, and network cards. Azure runs a customized hypervisor across its fleet, and the practical consequence for you is isolation: a noisy neighbor on the same host should not see your memory or starve your scheduling, because the partition boundary is enforced below the guest. The cases where neighbors matter are subtle and bounded, chiefly shared physical resources such as a local disk’s aggregate throughput on certain storage optimized sizes, and Azure documents which sizes dedicate versus share each resource.

The physical host carries a fixed quantity of cores, memory, local storage, and network capacity, and the sizes you can request are slices of that capacity. This is why a region can run short of a particular size: the hosts that offer that size are finite, and when they fill, the allocator returns a capacity error rather than conjuring hardware. It is also why constrained-core sizes exist. A workload licensed per core, such as a commercial database, sometimes needs the memory and disk capacity of a large size but only a fraction of its cores active, so Azure offers variants that expose fewer schedulable vCPUs while keeping the larger size’s memory and disk limits, cutting the license bill without starving the machine of memory.

The allocator is the platform component that places your requested machine onto a host with the right hardware and free capacity. You never address it directly, yet its decisions surface at two edges worth naming. The first is placement at creation: a request for a specific size in a specific zone with a specific placement constraint can leave the allocator no valid host, producing an allocation failure that means scarcity, not misconfiguration. The second is movement during a resize, when a target size unavailable on the current host forces a relocation. Understanding that your machine is a scheduled guest on finite, shared infrastructure turns both of these from mysteries into expected behaviors with known responses.

Generation 1 and Generation 2 machines

A Generation 1 VM boots through an emulated BIOS, while a Generation 2 VM boots through UEFI, which unlocks larger OS disks, faster boot in some cases, and the security features that depend on UEFI. The generation is fixed at creation and cannot be switched in place, so choose Generation 2 unless a legacy requirement forces otherwise.

The generation is more consequential than its quiet default suggests. UEFI boot is the prerequisite for Trusted Launch, the security baseline that adds secure boot, a virtual Trusted Platform Module, and boot integrity monitoring, which together defend against rootkits and unsigned boot components by refusing to load anything that fails signature checks. Confidential VM sizes go further, encrypting the memory of the guest so that even the host operator cannot read it, which suits workloads handling regulated data where the threat model includes the infrastructure provider. None of this is available to a Generation 1 machine, and because the generation is immutable after creation, picking Generation 1 by inertia quietly forecloses the entire security baseline. The image you deploy must support the generation you choose, since the boot path differs, so verify image compatibility before committing rather than discovering the mismatch at first boot, a failure that surfaces through the boot diagnostics path covered later.

The series families and how to read a size name

Azure groups VM sizes into families tuned for different bottlenecks, and the family is the first of the four decisions. General purpose families balance CPU and memory for web servers, small databases, and development work. Compute optimized families raise the CPU-to-memory ratio for batch processing, application servers, and network appliances that are processor bound. Memory optimized families do the opposite, carrying far more RAM per core for relational databases, in-memory caches, and analytics. Storage optimized families pair high local disk throughput with the cores to drive it, suiting big data and transactional systems that hammer local storage. GPU families attach accelerators for training, inference, rendering, and visualization. A burstable tier sits alongside general purpose for workloads that idle most of the time and spike occasionally, banking credits while quiet and spending them under load.

Reading a size name is a skill that pays off immediately. A name such as Standard_D4s_v5 decomposes cleanly: the family letter signals the workload class, the number gives the vCPU count, the trailing lowercase letters encode capabilities, and the version suffix marks the hardware generation. The lowercase letters carry real meaning that engineers routinely skip. An s indicates the size supports premium storage, which gates your disk options. A d indicates a local temporary disk is attached. An a indicates an AMD processor rather than Intel. Skipping those letters is how someone provisions a size that cannot attach the premium disk their database needs, then spends an afternoon confused about why the option is greyed out.
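
The decomposition described above can be sketched as a small parser. This is a minimal illustration, not an official naming grammar: the regular expression, the `ParsedSize` type, and the `parse_size_name` function are hypothetical names introduced here, and the sketch recognizes only the three capability letters this guide discusses (s, d, a). Other letters and more elaborate names, such as constrained-core variants, are outside what it handles.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Capability letters described in the text; any other letter is reported
# as unrecognized rather than guessed at.
CAPABILITY_MEANINGS = {
    "s": "premium storage supported",
    "d": "local temporary disk attached",
    "a": "AMD processor",
}

# Assumed simple pattern: Standard_<FAMILY><vCPUs><capability letters>_v<version>
SIZE_PATTERN = re.compile(
    r"^Standard_(?P<family>[A-Z]+)(?P<vcpus>\d+)(?P<caps>[a-z]*)(?:_v(?P<version>\d+))?$"
)


@dataclass
class ParsedSize:
    family: str
    vcpus: int
    capabilities: List[str]
    unrecognized: List[str]
    version: Optional[int]


def parse_size_name(name: str) -> ParsedSize:
    """Split a size name into family, vCPU count, capability letters, and version."""
    match = SIZE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"size name does not fit the simple pattern: {name!r}")
    caps = match.group("caps")
    version = match.group("version")
    return ParsedSize(
        family=match.group("family"),
        vcpus=int(match.group("vcpus")),
        capabilities=[CAPABILITY_MEANINGS[c] for c in caps if c in CAPABILITY_MEANINGS],
        unrecognized=[c for c in caps if c not in CAPABILITY_MEANINGS],
        version=int(version) if version else None,
    )


parsed = parse_size_name("Standard_D4s_v5")
print(parsed.family, parsed.vcpus, parsed.capabilities, parsed.version)
```

A check like `"premium storage supported" in parsed.capabilities` before provisioning is one way to catch the greyed-out premium disk problem early, under the assumption that the name follows the simple pattern.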

Which VM series should you choose?

Match the family to the workload’s binding constraint. Choose general purpose when CPU and memory are balanced, compute optimized when the processor is the limit, memory optimized for databases and caches, storage optimized for local-disk-heavy systems, GPU for accelerated work, and burstable for mostly-idle services. Derive the family from the bottleneck, never from a default.

The deciding move is to identify what the workload is actually bound by before you open the size list. A web tier that spends its life waiting on a database is not CPU bound, so a compute optimized size wastes money on cores that sit at single-digit utilization. A reporting job that loads a large dataset into memory and grinds is memory bound, and giving it more cores without more RAM leaves it swapping. The size list is long and the temptation is to sort by price and pick something plausible, but the family choice is a reasoning step, not a shopping step. The cost-driven refinement, picking the smallest size in the right family that meets the load, is its own discipline covered in our guide to right-sizing Azure VMs to cut cost; the family decision comes first and constrains everything after it.
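
As a rough sketch of treating the family choice as a reasoning step, the function below derives a binding constraint from measured utilization and maps it to a family. The thresholds (`idle_below`, `balanced_within`) and the function names are hypothetical values chosen for illustration, not Azure guidance; real decisions should rest on measurements over a representative period.

```python
# Mapping from binding constraint to family, as described in the text.
FAMILY_BY_CONSTRAINT = {
    "balanced": "general purpose",
    "cpu": "compute optimized",
    "memory": "memory optimized",
    "local_disk": "storage optimized",
    "accelerator": "GPU",
    "mostly_idle": "burstable",
}


def binding_constraint(cpu_pct, mem_pct, local_disk_pct,
                       needs_accelerator=False,
                       idle_below=10.0, balanced_within=15.0):
    """Name the resource the workload is bound by (thresholds are illustrative)."""
    if needs_accelerator:
        return "accelerator"
    peak = max(cpu_pct, mem_pct, local_disk_pct)
    if peak < idle_below:
        return "mostly_idle"
    if local_disk_pct == peak:
        return "local_disk"
    if abs(cpu_pct - mem_pct) <= balanced_within:
        return "balanced"
    return "cpu" if cpu_pct > mem_pct else "memory"


def choose_family(cpu_pct, mem_pct, local_disk_pct, needs_accelerator=False):
    """Derive the family from the bottleneck rather than from a default."""
    constraint = binding_constraint(cpu_pct, mem_pct, local_disk_pct, needs_accelerator)
    return FAMILY_BY_CONSTRAINT[constraint]


# A reporting job that fills memory while its cores sit half idle.
print(choose_family(cpu_pct=40, mem_pct=92, local_disk_pct=10))
```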

To enumerate the sizes available in a region and their specifications before committing, the command line gives you the authoritative list:

az vm list-sizes --location eastus --output table

That output shows the vCPU count, the memory, the maximum data disk count, and the temporary storage for every size offered in the region, which is the raw material for a defensible family and size decision rather than a guess.

Processor architectures, hardware generations, and specialized sizes

The family answers what a workload is bound by, but two further axes shape the size decision: the processor architecture beneath the family and the hardware generation marked by the version suffix. Azure offers Intel, AMD, and Arm-based sizes, and the architecture is encoded in the size name. An a in the capability letters marks AMD silicon, and dedicated Arm families exist for workloads compiled for that instruction set. The architecture matters for two reasons. Price-to-performance differs between vendors for a given workload, so a CPU-bound batch job can sometimes run cheaper on AMD or Arm for the same throughput. Software compatibility also differs, because Arm sizes run only binaries built for Arm, which rules them out for software shipped solely as x86 and makes them ideal for containerized, recompiled, or interpreted workloads where the runtime already targets the architecture.

The version suffix tracks hardware generation, and newer generations generally deliver more performance per core and per dollar on the same family. A v5 size typically outperforms the v3 it succeeds at a comparable price, which means leaving a long-running workload on an old generation is a quiet, recurring overpayment that a migration to the current generation corrects. Generations are not always available in every region simultaneously, so a design that assumes the newest generation everywhere can hit availability gaps that the size-list command exposes per region before deployment. Checking generation availability in your target region is part of a defensible size decision rather than an afterthought.

When should you choose AMD or Arm over Intel?

Choose AMD when its price-to-performance beats Intel for your measured workload; AMD and Intel sizes both run x86 software unchanged, so compatibility is rarely the obstacle. Choose Arm when your software is built for Arm or runs on an architecture-agnostic runtime and you want the efficiency gain. Stay on Intel when a dependency requires it or when only Intel offers a needed feature.

Specialized sizes extend the catalog beyond the standard families. Confidential VM sizes encrypt guest memory and suit regulated workloads whose threat model includes the host. Isolated sizes dedicate an entire physical host to a single customer’s machine, which some compliance regimes require, at the cost of the flexibility a shared host provides. High-performance computing families pair fast interconnects with the cores for tightly coupled parallel work such as simulations. Sizes with local NVMe storage expose very high local throughput for caches and scratch space that tolerates the ephemerality of local disk. The discipline across all of these is the same as for the standard families: identify the binding requirement, whether it is memory confidentiality, host isolation, interconnect bandwidth, or local throughput, and let that requirement select the specialized size rather than reaching for a specialized size by reputation. The cost-driven refinement of choosing the smallest adequate size, including across architectures, is worth treating as its own exercise once the family is settled.

Managed disks and the performance ceiling everyone misses

The second decision, the disk tier, is where the most common and most expensive mistake lives. Engineers size a machine by vCPU and memory, watch it underperform, and blame the compute, when the real ceiling is a storage tier capping input and output operations far below what the application generates. The disk is frequently the true bottleneck, and it is invisible in the size name.

Azure managed disks come in tiers that trade cost against performance. Standard HDD sits at the bottom, cheap and slow, fit for backups, archives, and workloads indifferent to latency. Standard SSD raises consistency and throughput at modest cost for light production and web servers. Premium SSD delivers the low-latency, high-IOPS profile that production databases and latency-sensitive applications require, with performance scaling by provisioned size. Premium SSD v2 decouples capacity from performance, letting you dial IOPS and throughput independently of how many gigabytes you allocate, which removes the old habit of over-provisioning capacity just to buy speed. Ultra Disk sits at the top for the most demanding transactional and analytic systems, again with capacity, IOPS, and throughput set independently.

Why is the disk tier often the real performance ceiling?

Because each tier caps IOPS and throughput, and that cap travels with the disk no matter how large the VM is. A large compute size attached to a Standard SSD will stall when the application’s IOPS demand exceeds the disk ceiling, even with idle cores and free memory. The fix is the disk tier, not a bigger machine.

The exact IOPS and throughput ceiling for each tier and each disk size changes as Azure revises the offering, so the precise per-size numbers should be confirmed against the current official disk documentation before you design to them. The principle is stable even as the figures move: the tier sets a hard ceiling, the VM size itself imposes a second ceiling on aggregate disk performance, and the lower of the two wins. A premium disk capable of high throughput attached to a small VM whose own limit is lower delivers only the VM’s limit, and the reverse traps a large VM behind a slow disk. Both numbers have to be read together. The OS disk is not exempt: it carries its own IOPS and throughput limits, and treating it as free of constraint is a recurring misdiagnosis when a machine that should be fast feels sluggish during boot, logging, or paging.
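
The rule that the lower of the two ceilings wins can be written down directly. The sketch below is illustrative only: the `Limits` type and both functions are hypothetical names, and every number is a made-up placeholder rather than a published Azure limit, which should come from the current disk and VM size documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    iops: int
    mbps: int


def effective_limits(disk: Limits, vm: Limits) -> Limits:
    """The workload sees the lower of the disk ceiling and the VM ceiling."""
    return Limits(iops=min(disk.iops, vm.iops), mbps=min(disk.mbps, vm.mbps))


def iops_bottleneck(disk: Limits, vm: Limits) -> str:
    """Report which side caps IOPS, so the fix targets the right resource."""
    if disk.iops < vm.iops:
        return "disk"
    if vm.iops < disk.iops:
        return "vm"
    return "both"


# Placeholder figures, not real Azure limits.
fast_disk = Limits(iops=20000, mbps=900)
small_vm = Limits(iops=6400, mbps=96)
slow_disk = Limits(iops=500, mbps=60)
large_vm = Limits(iops=80000, mbps=1200)

print(effective_limits(fast_disk, small_vm), iops_bottleneck(fast_disk, small_vm))
print(effective_limits(slow_disk, large_vm), iops_bottleneck(slow_disk, large_vm))
```

Reading both numbers together, as the function forces you to, is what tells you whether the remedy is a faster disk or a larger size.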

Caching adds a further layer. Premium disks support a host-level read or read-write cache that can serve repeated reads from memory on the host rather than the disk, raising effective performance for the right access pattern and hurting it for the wrong one. A write-heavy log volume gains nothing from a read cache and can suffer from a misapplied write cache. Matching the cache setting to the access pattern is part of the tuning surface that the Azure VM performance tuning guide covers in the depth this overview only opens.

Changing a disk’s performance characteristics is a live operation in many cases. The command to adjust a managed disk’s tier or provisioned performance is direct:

az disk update --resource-group myRG --name myDataDisk --sku Premium_LRS

For the v2 and Ultra tiers, you provision IOPS and throughput as explicit values rather than inheriting them from a SKU, which is what makes those tiers the right answer when capacity and performance needs diverge.

Disk performance beyond the tier: bursting, striping, caching, and ephemeral disks

The tier sets the baseline ceiling, but several mechanisms shift the effective performance above or below it, and knowing them separates an engineer who tunes from one who guesses. Bursting lets a disk or a VM temporarily exceed its provisioned baseline to absorb spikes. Some disks offer credit-based bursting, banking unused capacity during quiet periods and spending it during short bursts, which suits workloads with a low steady rate and occasional peaks such as a boot storm or a periodic batch. Others offer on-demand bursting at an additional charge for sustained higher performance. The trap is designing for the burst rate and discovering under sustained load that the disk has fallen back to its lower baseline, so the rule is to size the baseline for the sustained demand and treat bursting as headroom for spikes, never as the steady-state budget.
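
To see why the burst rate is not a steady-state budget, a simplified credit-bucket model helps. This is a generic token-bucket sketch under stated assumptions, not Azure's actual credit accounting: the function names are hypothetical and the baseline, burst rate, and credit figures are invented for illustration.

```python
def burst_seconds(baseline_iops, burst_iops, demand_iops, credits):
    """Seconds a full credit bucket sustains demand above baseline.

    Simplified model: each second above baseline spends
    (served - baseline) credits, and nothing is earned while bursting.
    """
    if demand_iops <= baseline_iops:
        return float("inf")  # demand fits the baseline, no credits are spent
    served = min(demand_iops, burst_iops)
    return credits / (served - baseline_iops)


def baseline_is_adequate(baseline_iops, sustained_demand_iops):
    """The sizing rule from the text: the baseline must cover sustained demand."""
    return sustained_demand_iops <= baseline_iops


# Invented figures: a 500 IOPS baseline that can burst to 3500 IOPS.
duration = burst_seconds(baseline_iops=500, burst_iops=3500,
                         demand_iops=3500, credits=5_400_000)
print(f"burst lasts {duration:.0f} seconds before falling back to baseline")
print("baseline adequate for 2000 sustained IOPS:", baseline_is_adequate(500, 2000))
```

In this model a workload that needs 2000 IOPS continuously would run well for a while and then degrade, which is the trap the paragraph above describes.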

Striping is the technique that breaks past a single disk’s ceiling. By attaching several data disks and combining them into one logical volume through the guest operating system’s volume manager, the aggregate IOPS and throughput become the sum of the members, subject always to the VM size’s own aggregate limit. A workload that needs more performance than any single disk tier provides reaches it by striping across multiple disks rather than buying an exotic single disk, provided the VM size can drive the combined load. This is where the two ceilings interact most visibly: striping ten fast disks behind a small VM yields only the small VM’s limit, so the size and the disk layout must be designed together. Matching the stripe width to the workload and the VM’s aggregate capacity is a core tuning move covered in depth when we work through tuning Azure VM throughput and latency.
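
The interaction between stripe width and the VM's aggregate limit can be sketched as simple arithmetic. The function names are hypothetical, the figures are placeholders rather than published limits, and the sum of member IOPS is treated as an upper bound that a real stripe approaches only with a suitable layout and access pattern.

```python
import math


def striped_iops(member_iops, vm_iops_limit):
    """Upper bound for a striped volume: sum of members, capped by the VM."""
    return min(sum(member_iops), vm_iops_limit)


def disks_needed(target_iops, per_disk_iops, vm_iops_limit):
    """Smallest stripe width that reaches the target, if the VM can drive it."""
    if target_iops > vm_iops_limit:
        raise ValueError("target exceeds the VM's aggregate limit; choose a larger size")
    return math.ceil(target_iops / per_disk_iops)


# Placeholder figures: ten 5000 IOPS disks behind a VM capped at 12800 IOPS.
print(striped_iops([5000] * 10, vm_iops_limit=12800))
# The same disks, four wide, behind a VM capped at 80000 IOPS.
print(striped_iops([5000] * 4, vm_iops_limit=80000))
print(disks_needed(target_iops=18000, per_disk_iops=5000, vm_iops_limit=80000))
```

The first call shows the failure mode from the text: eight of the ten disks' worth of performance is paid for and never delivered.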

How does host caching change disk performance?

Host caching stores frequently accessed disk data in memory on the physical host, serving repeated reads or buffering writes closer to the compute than the disk itself. Read-only caching accelerates read-heavy volumes such as a database’s data files, while none suits write-heavy logs. A mismatched cache setting can slow a workload, so set it to the access pattern.

The caching modes carry real consequences that a default rarely gets right. A read-only cache helps a volume dominated by repeated reads and is the right choice for many database data disks, because the host serves the hot pages from memory rather than fetching them across the storage path each time. A read-write cache can help certain patterns but introduces a window where acknowledged writes sit in the host cache before reaching the disk, which is unacceptable for a transaction log that must be durable the instant it is acknowledged. The correct setting for a write-ahead log is usually no caching, so the durability guarantee holds. Applying a read cache to a write-heavy volume wastes host memory and can add overhead, which is why caching is a per-disk decision driven by what the volume does, not a global switch flipped once.
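
One way to keep caching a per-disk decision is to derive the setting from each volume's access profile. The sketch below is an assumption-laden illustration: the function name and the 0.8 read-heavy threshold are hypothetical, and it deliberately never recommends read-write caching, since the text treats that mode as a special case to evaluate by hand. The returned strings match the host caching setting names None, ReadOnly, and ReadWrite.

```python
def recommend_host_caching(read_fraction, durable_on_ack, read_heavy_threshold=0.8):
    """Suggest a host caching mode from a volume's access pattern.

    read_fraction: share of operations that are reads, from 0.0 to 1.0.
    durable_on_ack: True when acknowledged writes must already be on disk,
                    as with a transaction or write-ahead log.
    """
    if durable_on_ack:
        return "None"
    if read_fraction >= read_heavy_threshold:
        return "ReadOnly"
    return "None"


# Hypothetical database layout: one read-heavy data disk, one log disk.
layout = {
    "data": {"read_fraction": 0.90, "durable_on_ack": False},
    "log": {"read_fraction": 0.05, "durable_on_ack": True},
}
settings = {name: recommend_host_caching(**profile) for name, profile in layout.items()}
print(settings)
```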

Ephemeral OS disks change the economics for stateless machines. An ephemeral OS disk lives on the host’s local storage rather than on remote managed storage, which makes it free of the managed disk charge and faster for some operations, at the cost of being wiped if the machine moves hosts or is deallocated. For a stateless web tier behind a load balancer, where any instance can be rebuilt from an image in moments and holds no unique state, an ephemeral OS disk removes a cost and a dependency without losing anything that matters. For anything that keeps state on its OS disk, an ephemeral disk is a data-loss incident waiting to happen. The decision rule is whether the OS disk holds anything you cannot rebuild, and only when the answer is no does the ephemeral option apply.

Shared disks let a single managed disk attach to more than one VM at once, which exists specifically for clustered workloads that coordinate access through a cluster manager, such as a failover cluster running a shared-disk database. This is a narrow tool, not a general file-sharing mechanism, because without a cluster-aware filesystem or coordinator, two machines writing the same disk corrupt it. Reaching for a shared disk to share files between machines is a misuse that a proper file service handles correctly, and the shared disk is reserved for the clustering scenarios that genuinely require concurrent block access with their own coordination layer.

The availability construct and what SLA you actually get

The third decision determines what failure your VM survives, and it is the one most often left at its accidental default of nothing. A VM with no availability construct is a single instance on a single host. If that host fails, undergoes maintenance, or sits in a rack that loses power, the machine goes down. Azure offers two constructs to raise the odds, and they protect against different failures.

An availability set spreads VMs across fault domains and update domains within a single datacenter, so a hardware failure or a planned host update affects only part of your set rather than all of it. An availability zone spreads VMs across physically separate datacenters within a region, each with independent power, cooling, and networking, so a zone-level failure leaves the others running. The set protects against rack and host events inside one building; the zone protects against the loss of an entire building. They answer different questions, and the right one depends on how much failure you need to survive.

What SLA does a single VM receive?

A single VM qualifies for the highest single-instance SLA tier, commonly cited as 99.9 percent, only when every OS and data disk uses premium or ultra storage. Two or more VMs in an availability set reach a higher tier, and a zone-spread pair reaches the highest. Confirm the exact percentages against the current Azure SLA.

The disk requirement on the single-instance SLA is the detail that catches people. The highest single-VM availability commitment is contingent on all attached disks being premium or ultra; mix in a standard disk and the machine no longer qualifies for that tier. This couples the availability decision back to the disk decision, which is precisely the kind of hidden coupling the four-decision model exists to surface. A single VM, even at its best disk configuration, still receives a lower commitment than a multi-VM availability set or a zone-spread pair, because one instance has no redundancy when its host fails. Assuming a lone VM carries a high SLA is a misdiagnosis that surfaces only during an outage, which is the worst time to learn it.

The numbers themselves, the single-instance tier, the availability-set tier, and the zone tier, are published in the Azure SLA and revised periodically, so treat the commonly cited figures as a starting point to verify rather than a constant. What does not change is the ordering and the reasoning: more independent failure domains buy a higher commitment, and the construct you pick should follow from the downtime your workload can tolerate, not from whatever the deployment template defaulted to.
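
Translating a percentage into minutes makes the tiers easier to compare against the downtime a workload can tolerate. The sketch below is plain arithmetic over a 30-day month. The 99.9 figure is the commonly cited single-instance value mentioned above; the other two inputs are illustrative placeholders for higher tiers, and all of them should be checked against the current Azure SLA. The function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime permitted in the period at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# 99.9 is the commonly cited single-instance figure; 99.95 and 99.99 are
# placeholder inputs standing in for the higher tiers.
for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per 30 days")
```

At 99.9 percent the budget is roughly 43 minutes a month, which is the number to hold against the cost of an outage when deciding whether a lone VM is acceptable.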

Networking the machine: interfaces, filtering, and accelerated paths

The network interface is the resource that connects a VM to a virtual network, carrying its private address and binding it to a subnet. A machine can hold more than one interface, which serves designs that separate traffic onto distinct subnets, such as a network appliance with a management plane on one interface and a data plane on another. The interface count a size supports scales with the size, so a design that needs several interfaces must choose a size large enough to attach them, another coupling between the size decision and a requirement that the size name does not advertise. The public address, when present, attaches through the interface and can be dynamic or static; a dynamic address can change across a deallocation, which is why any machine that must present a stable public endpoint needs a static address reserved explicitly.

Traffic filtering happens through network security groups, which hold the rules that permit or deny traffic by source, destination, port, and protocol, evaluated in priority order, lowest priority number first, until a match decides the packet. A security group can attach to a subnet, to an interface, or to both, and when both apply, each group is evaluated in turn and traffic must be allowed by both, which is a frequent source of confusion when a packet is blocked by a subnet rule the interface-level view does not show. The discipline is to keep rules minimal and to know which security groups apply to a given interface, because a connection failure is far faster to diagnose against a small, documented rule set than against a sprawling one accumulated over time. Application security groups raise the abstraction by letting rules reference a named group of machines rather than raw addresses, so a rule can permit traffic from the web tier to the application tier by name, and adding a machine to the web tier inherits the rules without editing addresses.
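
The evaluation order is easier to reason about with a toy model. The sketch below is a deliberate simplification, not the platform's implementation: rules match only on a port and a source label rather than on address prefixes, service tags, and protocol, the fall-through deny stands in for the platform's default rules, and the `Rule` type and function names are hypothetical. It models inbound traffic, where a packet has to be allowed at the subnet level and at the interface level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    priority: int           # lower number is evaluated first
    action: str             # "Allow" or "Deny"
    port: Optional[int]     # None matches any port
    source: Optional[str]   # None matches any source label


def evaluate(rules: List[Rule], port: int, source: str) -> str:
    """First matching rule in priority order decides the packet."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port in (None, port) and rule.source in (None, source):
            return rule.action
    return "Deny"  # simplified stand-in for the default rules


def inbound_allowed(subnet_rules, nic_rules, port, source) -> bool:
    """Traffic must pass every attached group; None means no group attached."""
    for rules in (subnet_rules, nic_rules):
        if rules is not None and evaluate(rules, port, source) != "Allow":
            return False
    return True


subnet = [Rule(100, "Deny", 22, "internet"), Rule(200, "Allow", None, None)]
nic = [Rule(100, "Allow", 22, None), Rule(110, "Allow", 443, None)]

# The interface allows SSH, yet the subnet rule blocks it first.
print(inbound_allowed(subnet, nic, port=22, source="internet"))
print(inbound_allowed(subnet, nic, port=443, source="internet"))
```

The first call is the confusing case from the text: looking only at the interface-level rules, port 22 appears open.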

What does accelerated networking actually do?

Accelerated networking bypasses the host’s virtual switch and lets the VM’s interface talk to the physical network card more directly, cutting latency, raising packets per second, and reducing jitter and host CPU overhead. It suits network-intensive workloads and is enabled per interface on supported sizes, so a network-bound workload on a supported size has little reason to leave it off.

The mechanism behind accelerated networking is single-root I/O virtualization (SR-IOV), which presents a virtual function of the physical network card to the guest so that packets skip the software switching layer that otherwise sits between the guest and the wire. The gain is most visible on workloads that move many small packets, where the per-packet overhead of software switching dominates, and on latency-sensitive paths where shaving microseconds matters. Support depends on the size and the operating system, so a design that depends on the lower latency should confirm that both the size and the image support the feature rather than assume it. When it is available and the workload is network bound, leaving it off is leaving performance and host efficiency unclaimed, and the broader latency tuning that builds on it is part of the performance work the tuning guide develops. Proximity placement groups complement this by colocating machines in the same datacenter to minimize the network distance between them, which matters for tightly coupled tiers where every millisecond of inter-node latency compounds.

Configuration that matters, in commands

Creating a VM from the command line forces the decisions into the open in a way the portal’s defaults can hide. A minimal but deliberate creation names the image, the size, and the administrative access method explicitly:

az vm create \
  --resource-group myRG \
  --name myVM \
  --image Ubuntu2204 \
  --size Standard_D4s_v5 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --zone 1

Two of the four decisions are visible in those flags. The size carries the family and the compute. The zone is the availability construct. The image and credentials are setup rather than one of the four. What the command does not show, the disk tier of the OS disk and any data disks, is exactly the decision people forget, which is why a deliberate deployment specifies the OS disk SKU and attaches data disks with their tiers chosen rather than inherited.

Resizing is where a quiet platform behavior surprises people. A resize to a size supported by the current host can happen in place, with a restart if the machine is running, but a resize to a size the current host cannot serve requires the machine to be deallocated, moved to a host that can serve the new size, and started again. The command is the same in both cases; the disruption is not:

az vm resize --resource-group myRG --name myVM --size Standard_E8s_v5

When the resize forces a deallocation, the public IP can change unless it is static, in-memory state is lost, and the local temporary disk is wiped. Planning a resize as a potentially disruptive operation rather than an instant one prevents the outage that comes from treating it as free.

Boot and access diagnostics are the tools that turn a dead machine into a diagnosable one. Boot diagnostics captures the console output and a screenshot of the boot sequence, which is often the only window into a guest that will not come up. Enabling and reading it is straightforward:

az vm boot-diagnostics enable --resource-group myRG --name myVM
az vm boot-diagnostics get-boot-log --resource-group myRG --name myVM

The serial console builds on the same channel to give interactive access to the guest even when the network path is broken, which is the difference between fixing a misconfigured boot setting in minutes and rebuilding the machine. The full no-boot diagnosis sequence, the distinct causes and their tested fixes, is the subject of our walkthrough on fixing Azure VM boot and no-boot failures.

The VM agent and extensions round out the configuration surface. The agent runs inside the guest and lets the platform install extensions: a custom script that bootstraps the machine on first boot, a desired-state configuration that keeps it converged, a monitoring agent that ships metrics and logs. Extensions are how a VM stops being a hand-built pet and becomes a reproducible artifact, and a machine whose agent is unhealthy quietly loses the ability to receive them, which is a failure mode worth monitoring for directly.

Identity, the metadata service, and bootstrapping

A VM that needs to call other Azure services should not carry stored credentials, because a secret on disk is a secret to leak. Managed identity solves this by giving the machine an identity in the directory that the platform issues tokens for, so the VM authenticates to a storage account or a key vault without any secret embedded in its configuration. A system-assigned identity is tied to the machine’s lifecycle and deleted with it, which suits a single machine that owns its own access. A user-assigned identity is a standalone resource that several machines can share, which suits a fleet that should all carry the same access without configuring each one separately. The choice turns on whether the identity should live and die with one machine or persist and be shared, and using either is strictly better than planting a credential in a configuration file.

The instance metadata service is a non-routable, link-local endpoint at 169.254.169.254 that exposes information about the machine itself: its size, its region and zone, its network configuration, the tokens for its managed identity, and the scheduled-events feed that warns of upcoming maintenance. Because the endpoint is reachable only from within the VM and never from outside, it is a safe channel for a machine to learn about itself and to fetch its identity tokens at runtime. Software that needs to adapt to where it runs, or to acquire a token for a managed identity, reads the metadata service rather than hardcoding values, which keeps an image portable across regions and sizes because the machine discovers its own context at boot.
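
As a rough illustration of how software reads the metadata service, the sketch below builds the two requests the paragraph mentions, one for instance details and one for a managed identity token. The helper names are hypothetical, and the API versions shown are assumptions to check against the versions the service currently lists. The requests are only constructed here, not sent, because the endpoint answers only from inside a VM.

```python
import urllib.request
from urllib.parse import urlencode

# Link-local address of the instance metadata service; reachable only in-guest.
IMDS_BASE = "http://169.254.169.254/metadata"


def imds_request(path, params):
    """Build (but do not send) a GET request for the metadata service."""
    url = f"{IMDS_BASE}/{path}?{urlencode(params)}"
    # The service rejects requests that lack the Metadata: true header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def instance_request():
    # Assumed API version for the instance document.
    return imds_request("instance", {"api-version": "2021-02-01"})


def token_request(resource):
    # Asks for a managed identity token scoped to the given resource URI.
    # Assumed API version for the identity endpoint.
    return imds_request(
        "identity/oauth2/token",
        {"api-version": "2018-02-01", "resource": resource},
    )
```

From inside a VM, passing either request to `urllib.request.urlopen` with a short timeout should return a JSON document; anywhere else the call simply times out, which is the isolation property the paragraph describes.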

Bootstrapping is how a freshly created machine becomes a configured one without manual steps. Cloud-init, supported on many Linux images, runs declarative configuration at first boot from data passed at creation, installing packages, writing files, and starting services so the machine arrives ready. The custom script extension achieves a similar end across Linux and Windows by running a script the platform delivers through the agent, and desired-state configuration keeps a machine converged to a defined state over time rather than only at first boot. The principle uniting them is that a production machine should be reproducible from its definition rather than hand-built, because a machine nobody can recreate is a liability the day it fails. Choosing cloud-init for first-boot setup on Linux and an extension for ongoing convergence or cross-platform needs covers most cases.

Images and reproducible machine definitions

Every VM starts from an image, the template that supplies the operating system and any preinstalled software, and the image choice shapes both security and repeatability. Marketplace images provide maintained operating systems and preconfigured software stacks, which suit standard deployments that want a vendor-maintained base. A custom image captures a machine you have configured, so you can stamp out identical copies, and it comes in two flavors that matter. A generalized image has machine-specific information removed so each deployment gets fresh identity and hostname, which is what you want for stamping out many machines. A specialized image preserves the original machine’s state and identity, which suits restoring a specific machine rather than cloning a template.

The Azure Compute Gallery is the service that manages custom images at scale, versioning them and replicating them across regions so a global deployment pulls a local copy rather than reaching across the world to a single source. Versioning lets a fleet pin to a known-good image version and roll forward deliberately, and regional replication cuts the deployment latency and the cross-region dependency that a single-region image source imposes. A mature image practice treats the gallery as the source of truth: build an image once, test it, publish a version, replicate it to the regions that need it, and deploy machines from that version, so every machine in a tier is provably identical and an update is a new image version rather than a manual change applied machine by machine. This image discipline is what turns a fleet from a set of snowflakes into a reproducible system, and it pairs naturally with the infrastructure-as-code practice that defines the surrounding resources.

Maintenance, scheduled events, and capacity guarantees

Azure performs maintenance on the physical fleet, and how a machine experiences that maintenance depends on its construct and on whether the maintenance is planned or driven by a hardware fault. Most platform updates are designed to be transparent, applied without rebooting the guest, but some require a reboot or a brief pause, and a hardware failure can force an unplanned move. Fault domains and update domains are the mechanism that limits the blast radius inside an availability set: fault domains group machines by shared power and network hardware so a single hardware failure cannot take the whole set, and update domains group machines so a planned update reboots only one group at a time. Spreading a set across several of each is what converts a single point of failure into a partial, survivable event.
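
To make the blast-radius argument concrete, here is a small model of spreading instances across fault and update domains. The round-robin assignment is an assumption made for illustration, not a description of the platform's actual placement logic, and the function names are hypothetical.

```python
def place_instances(count, fault_domains, update_domains):
    """Assign each instance a (fault domain, update domain) pair round-robin.

    Round-robin is a simplifying assumption for this sketch.
    """
    return [(i % fault_domains, i % update_domains) for i in range(count)]


def survivors_after_fault(placement, failed_fault_domain):
    """Indexes of instances that do not sit in the failed fault domain."""
    return [
        i for i, (fd, _ud) in enumerate(placement)
        if fd != failed_fault_domain
    ]


def survivors_during_update(placement, updating_domain):
    """Indexes of instances still serving while one update domain reboots."""
    return [
        i for i, (_fd, ud) in enumerate(placement)
        if ud != updating_domain
    ]
```

With six instances over three fault domains, losing one domain leaves four instances serving; with a single fault domain, the same failure leaves none, which is the single point of failure the set is meant to remove.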

The scheduled events endpoint, exposed through the instance metadata service, gives a machine advance warning of an impending reboot or maintenance so it can react gracefully: drain connections, flush state, fail over a role, or acknowledge the event to let it proceed sooner. A machine that polls scheduled events can turn a disruptive maintenance window into a controlled handoff, which is the difference between a maintenance event that drops requests and one that completes cleanly. Building this awareness into a stateful workload is a reliability investment that pays off every time the platform touches the host.
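
A polling loop ultimately reduces to three questions: which events touch this machine, which of them warrant draining, and how to acknowledge one. The sketch below answers them against a sample document. The field names follow the scheduled events schema as commonly documented, but the sample values are invented, the helper names are hypothetical, and treating a Freeze as non-draining is an assumption a latency-sensitive workload may not share.

```python
import json

# Illustrative document; IDs, names, and the timestamp are made up.
SAMPLE_DOC = {
    "DocumentIncarnation": 7,
    "Events": [
        {
            "EventId": "evt-1",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["web-0"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
        },
        {
            "EventId": "evt-2",
            "EventType": "Freeze",
            "ResourceType": "VirtualMachine",
            "Resources": ["web-1"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
        },
    ],
}

# Event types this sketch treats as requiring a drain (an assumption).
DISRUPTIVE = {"Reboot", "Redeploy", "Preempt", "Terminate"}


def events_for(doc, vm_name):
    """Events whose resource list names this machine."""
    return [e for e in doc.get("Events", []) if vm_name in e.get("Resources", [])]


def needs_drain(event):
    return event["EventType"] in DISRUPTIVE


def ack_body(event):
    """JSON body that approves an event so it can proceed before NotBefore."""
    return json.dumps({"StartRequests": [{"EventId": event["EventId"]}]})
```

A real poller would fetch the document from the metadata service on an interval, drain when `needs_drain` is true, and post the acknowledgment body back once the handoff is complete.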

Capacity reservations address the other side of availability, which is being able to get the capacity at all. A capacity reservation holds a specified quantity of a given size in a region or zone for your use, so the capacity is guaranteed to be there when you start or scale into it rather than subject to the allocator finding free hosts at that moment. This matters for workloads that must be able to scale on demand into a region where the size is in contention, or for disaster recovery where you need to be certain the target capacity exists when you fail over. Dedicated hosts take this further by assigning entire physical servers to you, giving control over maintenance timing and physical placement for compliance or licensing reasons, at a cost and a management burden that only those requirements justify. For most workloads the shared host and on-demand allocation are correct, and reservations and dedicated hosts are tools reserved for the specific guarantees they provide.

The VM sizing decision table

The four decisions resolve fastest against a table that maps a workload profile to a family, a disk tier, and an availability construct, with the signal that decides each row. This is the artifact to keep beside you when a new workload arrives.

Workload profile	Series family	Disk tier	Availability construct	Deciding signal
CPU-bound batch or app server	Compute optimized	Standard SSD or Premium SSD	Zone pair if uptime matters	Sustained high vCPU utilization, low memory pressure
Memory-bound database or cache	Memory optimized	Premium SSD or Ultra	Zone pair, premium disks for single-instance SLA	Working set exceeds general-purpose RAM, swapping under load
Burstable mostly-idle service	Burstable	Standard SSD	Single instance or set	Long idle stretches with occasional short spikes
IO-bound transactional or analytic	Storage optimized	Premium SSD v2 or Ultra	Zone pair	Disk IOPS or throughput is the measured ceiling
GPU training or inference	GPU	Premium SSD	Single instance, scheduled	Accelerator utilization is the binding resource

The table is a starting point, not a verdict. The deciding signal column is the part to internalize, because it pushes you to measure the binding constraint rather than copy a row. Two workloads that look alike on paper can land in different rows once you know which resource each one actually saturates. The way to use it is to profile the workload first, find the one resource it runs out of soonest under realistic load, and let that resource pick the row, then adjust the disk tier and the availability column to the durability and uptime the workload actually requires. A row chosen from a measurement holds up in a design review; a row chosen from a guess gets challenged the moment someone asks why.
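
The profile-then-pick procedure can be sketched as a lookup keyed by the binding resource. The rows are copied from the table above; the function names, the utilization figures, and the `mostly_idle` flag are hypothetical, and a real decision would weigh more than a single peak number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    family: str
    disk_tier: str
    availability: str


# Rows taken from the decision table, keyed by the resource that binds.
TABLE = {
    "cpu": Row("Compute optimized", "Standard SSD or Premium SSD",
               "Zone pair if uptime matters"),
    "memory": Row("Memory optimized", "Premium SSD or Ultra",
                  "Zone pair, premium disks for single-instance SLA"),
    "disk": Row("Storage optimized", "Premium SSD v2 or Ultra", "Zone pair"),
    "gpu": Row("GPU", "Premium SSD", "Single instance, scheduled"),
}
BURSTABLE = Row("Burstable", "Standard SSD", "Single instance or set")


def binding_resource(peak_utilization):
    """The resource closest to saturation under realistic load."""
    return max(peak_utilization, key=peak_utilization.get)


def pick_row(peak_utilization, mostly_idle=False):
    """Let the measured constraint choose the starting row."""
    if mostly_idle:
        return BURSTABLE
    return TABLE[binding_resource(peak_utilization)]
```

For example, a profile of `{"cpu": 0.35, "memory": 0.40, "disk": 0.97, "gpu": 0.0}` lands on the storage-optimized row even if the workload was described as an "app server", which is the point of measuring before choosing.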

Failure modes and how to design against them

A VM fails in a small number of recognizable ways, and naming them in advance turns a frightening incident into a checklist. The no-boot failure, where the machine provisions but the guest never comes up, is read through boot diagnostics and the serial console; the causes range from a corrupt boot configuration to a full OS disk to a misapplied update. An OS provisioning timeout during creation usually points at an image or agent problem rather than the platform. An SSH connection refused or an RDP connection error most often traces to a security group rule, a missing public address, or a guest service that did not start, and the discipline is to check the network path and the guest service separately rather than assuming one.

Allocation failures are a different species, surfacing as AllocationFailed, SkuNotAvailable, or OverconstrainedAllocationRequest. These usually mean the region or zone cannot currently satisfy your size request, or that the size is restricted for your subscription in that location, not that your configuration is wrong. The response is to try a different size in the same family, a different zone, or a different region, and to design capacity-sensitive deployments so they are not pinned to a single scarce size in a single zone.

Disk IOPS saturation is the quiet failure that masquerades as a slow application: the cores are idle, the memory is free, and the disk queue is full because the workload is generating more operations than the tier permits. The fix is a higher disk tier or a redistribution of IO, not a bigger VM, which is the disk-as-ceiling lesson arriving as an incident.
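
The disk-as-ceiling reasoning is a minimum over two limits, the size's cap and the combined cap of the attached disks. The sketch below uses that simplified model to say which limit to raise; the numbers in the usage note are invented, and real limits also involve throughput, caching, and bursting, which this ignores.

```python
def effective_iops(vm_cap, disk_caps):
    """Achievable IOPS is bounded by both the VM size and the disks."""
    return min(vm_cap, sum(disk_caps))


def diagnose(demand, vm_cap, disk_caps):
    """Name the limit to raise when demand exceeds the achievable ceiling."""
    if demand <= effective_iops(vm_cap, disk_caps):
        return "ok"
    # Whichever limit is lower is the one actually throttling the workload.
    return "disk tier" if sum(disk_caps) <= vm_cap else "vm size"
```

With a hypothetical size capped at 12,800 IOPS and one disk capped at 5,000, a demand of 8,000 returns `"disk tier"`: a bigger VM would change nothing. Swap in two such disks on a size capped at 6,400 and the same demand returns `"vm size"` instead.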

Designing against these means treating each as a known quantity. Enable boot diagnostics before you need it. Keep security group rules minimal and documented so a connection failure is fast to diagnose. Avoid pinning to scarce sizes. Monitor disk queue depth alongside CPU and memory so saturation is visible before it becomes a complaint. Each failure mode has a dedicated diagnosis path, and the troubleshooting articles in this series carry the tested fixes for the named errors above.

A host failure is the failure mode the availability construct exists to address, and understanding how it presents prevents panic when it happens. When the physical host fails, an unprotected single machine goes down and the platform attempts to heal it by restarting it on a healthy host, which means a single VM can recover on its own but only after downtime measured in minutes, not seconds. A machine in an availability set or spread across zones sees only part of its capacity affected, because the construct guarantees the instances do not share the failed hardware. The lesson that lands during a host failure is the one the SLA section argued in the abstract: redundancy is not a checkbox for an audit, it is the difference between a partial degradation and a full outage when the hardware beneath you fails, which it eventually will.

Memory pressure and disk-full conditions are the guest-side failures that platform metrics cannot see, which is why they catch teams that monitor only from outside. A machine that exhausts its memory begins swapping or killing processes, presenting as mysterious slowness or service crashes that the host’s processor and network metrics do not explain, because the problem lives inside the guest. A full OS disk can prevent logging, block writes, and stop the machine from booting after a restart, turning a slow creep into a sudden outage at the next reboot. Both are invisible without the guest agent reporting memory and filesystem metrics, and both are preventable with alerts that fire as the resource approaches its limit rather than after it is gone. The recurring pattern across these guest-side failures is that they are silent from the host’s vantage point and obvious from the guest’s, which is the entire argument for installing the agent and watching the guest metrics rather than trusting the platform view alone.
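
The kind of check that only the guest can make is small. As a minimal sketch, the function below reads filesystem usage from inside the operating system and fires before the disk is full; the 85 percent threshold is an arbitrary assumption, and a production setup would ship this through the monitoring agent rather than a hand-rolled script.

```python
import shutil


def should_alert(used_bytes, total_bytes, warn_at=0.85):
    """True once usage reaches the warning fraction, before the disk is full."""
    return total_bytes > 0 and used_bytes / total_bytes >= warn_at


def check_filesystem(path="/", warn_at=0.85):
    """Read usage from inside the guest, where the host cannot see it."""
    usage = shutil.disk_usage(path)
    return should_alert(usage.used, usage.total, warn_at)
```

The warning threshold is the part that matters: alerting at the approach rather than at exhaustion is what leaves time to act before the next reboot turns the creep into an outage.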

A generalization mismatch is the subtler image-related failure. Deploying a generalized image expects the platform to inject fresh identity, while deploying a specialized image preserves the source machine’s identity, and confusing the two produces machines with duplicate identities or a failed first boot. The symptom, a machine that provisions but behaves as though it is impersonating another, traces back to capturing or deploying the image in the wrong mode. Knowing which mode an image holds before deploying it, and capturing custom images in the generalized mode when the goal is to stamp out many distinct machines, prevents an entire class of confusing first-boot failures that have nothing to do with the platform and everything to do with the image’s preserved state.

Scale sets: when one machine becomes many

A single VM is the right unit for a workload that runs as one machine, but many production workloads run as a fleet of identical machines behind a load balancer, scaling out and in with demand. A virtual machine scale set is the construct for that fleet, managing a group of identical machines from one definition, adding instances when load rises and removing them when it falls, and replacing unhealthy instances automatically. The scale set turns the four decisions from a per-machine choice into a fleet-wide policy: one size family, one disk configuration, one image, and one scaling rule applied uniformly, which is what makes a large fleet manageable rather than a hundred machines configured by hand.

Scale sets come in two orchestration modes that differ in how much control you trade for which conveniences. Uniform orchestration treats the instances as fungible copies driven by a single model, optimized for large stateless fleets where any instance is interchangeable. Flexible orchestration manages individual machines that can differ more from one another while still gaining the group’s scaling and availability features, which suits workloads that need scale set benefits without perfect uniformity. The choice follows the workload: a stateless web tier of identical machines fits uniform, while a more heterogeneous group that still wants automatic scaling and instance management fits flexible.

When should you use a scale set instead of individual VMs?

Use a scale set when the workload runs as a fleet of identical or near-identical machines that should scale with demand and self-heal, such as a stateless web or worker tier. Use individual VMs when each machine is distinct, holds unique state, or is small enough in number to manage directly. The dividing line is fungibility and scale.

Autoscale is the policy engine that makes a scale set elastic, adding and removing instances based on metrics such as processor utilization, queue depth, or a schedule. A well-designed autoscale rule scales out fast enough to absorb a load spike before users feel it and scales in conservatively enough to avoid thrashing, with cool-down periods that prevent a brief metric swing from triggering churn. The common mistakes are scaling on the wrong metric, such as scaling a queue-driven worker on processor utilization when queue length is the true signal of backlog, and scaling in too aggressively so the fleet flaps. Pairing a scale set with spot capacity for the interruptible portion of a fleet is a frequent cost design, letting the baseline run on committed capacity while burst capacity rides cheaper interruptible instances, an arrangement that draws on the interruptible-compute model we unpack in how spot VMs cut compute cost.
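
The asymmetry between scaling out and scaling in can be sketched as a decision function with separate cool-downs. This is a toy model, not the platform's autoscale engine: the rule fields, the metric (backlog per instance, following the queue-length advice above), and the thresholds in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    scale_out_above: float  # backlog per instance that triggers scale-out
    scale_in_below: float   # backlog per instance that permits scale-in
    out_cooldown_s: int     # short, so spikes are absorbed quickly
    in_cooldown_s: int      # long, so brief dips do not cause flapping
    out_step: int           # instances added per scale-out
    min_instances: int
    max_instances: int


def decide(rule, backlog_per_instance, instances, seconds_since_change):
    """Return the instance count the fleet should move to."""
    if (backlog_per_instance > rule.scale_out_above
            and seconds_since_change >= rule.out_cooldown_s):
        return min(instances + rule.out_step, rule.max_instances)
    if (backlog_per_instance < rule.scale_in_below
            and seconds_since_change >= rule.in_cooldown_s):
        # Scale in one instance at a time to stay conservative.
        return max(instances - 1, rule.min_instances)
    return instances
```

With `Rule(100, 20, 60, 600, 2, 2, 10)`, a backlog of 250 per instance ninety seconds after the last change grows a fleet of four to six, while a backlog of 5 at the same moment changes nothing because the ten-minute scale-in cool-down has not elapsed.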

Cost: stopped is not deallocated

The fourth decision, the cost commitment, starts with a billing distinction that costs people real money. A VM that you stop from inside the guest operating system is still allocated to a host, and you keep paying for the compute even though nothing is running. A VM that you deallocate through Azure releases the host, and compute billing stops; you continue to pay only for the disks and any static addresses that persist. The portal labels the released state “Stopped (deallocated)” precisely because the two states look similar and bill differently.

What is the difference between a stopped and a deallocated VM?

A stopped VM, halted from inside the guest, remains allocated to its host and continues to incur compute charges. A deallocated VM, stopped through Azure, releases the host and stops compute billing, leaving only disk and static-IP costs. The labels differ by one word and the bill differs by the full compute rate.

az vm deallocate --resource-group myRG --name myVM

That command is the one that actually saves money on a machine you are not using; a guest-level shutdown does not. For development and test machines that run only during working hours, automating deallocation is among the highest-return cost moves available, far simpler than re-architecting anything.
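
The size of that saving is simple arithmetic. The sketch below works in whole cents to avoid rounding noise; the hourly rate, the disk cost, and the working-hours schedule are invented figures for illustration, not Azure prices.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def monthly_cost_cents(compute_cents_per_hour, disk_cents_per_month,
                       allocated_hours):
    """Compute bills for every allocated hour; disks bill regardless."""
    return compute_cents_per_hour * allocated_hours + disk_cents_per_month


# Hypothetical dev machine: 20 cents/hour compute, 2000 cents/month of disk.
# A guest-level shutdown keeps the machine allocated, so it bills like always-on.
always_allocated = monthly_cost_cents(20, 2000, HOURS_PER_MONTH)

# Deallocated outside 10 working hours on 22 working days.
working_hours_only = monthly_cost_cents(20, 2000, 10 * 22)
```

Under these assumed figures the always-allocated machine costs 16,600 cents a month and the scheduled one 6,400, with the disk charge the same in both because deallocation does not touch it.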

Beyond the on-and-off decision, the cost commitment is a spectrum. Pay-as-you-go carries the highest hourly rate and the most flexibility. A reserved commitment over one or three years trades flexibility for a substantial discount on a steady, predictable workload. Spot capacity offers the deepest discount in exchange for interruptibility: the platform can reclaim the machine when it needs the capacity, which suits fault-tolerant, restartable work and ruins anything that cannot survive eviction. Choosing the commitment is choosing how much you are willing to pay for certainty, and the interruptible option in particular is covered in our explainer on spot VMs and cost-saving compute. The right-sizing discipline, picking the smallest adequate size, multiplies whatever commitment you choose, because a discount on an oversized machine still overspends.

Reservations and savings plans are two distinct commitment instruments that engineers routinely conflate, and the difference governs which one fits. A reservation commits to a specific size family in a region for one or three years and discounts that exact capacity, which suits a workload you know will run on a known family in a known place. A savings plan commits to a fixed hourly spend across compute for one or three years and applies the discount automatically to whatever eligible usage runs, which suits a fleet whose exact sizes and regions shift over time but whose total spend is steady. The deciding signal is predictability of shape versus predictability of spend: reserve when the capacity is stable and specific, choose a savings plan when the spend is stable but the placement moves. Committing to a reservation on a workload you later migrate to a different family wastes the commitment, which is why the stability of the shape, not just the spend, drives the reservation choice.
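
The shape-versus-spend distinction shows up clearly in a toy comparison. The model below assumes the reservation discount applies only while the workload stays on the reserved family, that usage after a migration bills at pay-as-you-go, and that the reservation is neither exchanged nor cancelled; the discount percentages are invented, and real terms may differ on every one of these points.

```python
def reservation_total(months, migrate_after, payg_monthly, discount_pct):
    """Cost when the reserved family is abandoned after `migrate_after` months.

    The reservation bills for the full term; usage on the new family is
    assumed to bill at the undiscounted rate.
    """
    reserved = payg_monthly * (100 - discount_pct) * months // 100
    unmatched = payg_monthly * max(0, months - migrate_after)
    return reserved + unmatched


def savings_plan_total(months, payg_monthly, discount_pct):
    """The plan discount follows the spend wherever the usage runs."""
    return payg_monthly * (100 - discount_pct) * months // 100
```

Over twelve months at 100 units of monthly pay-as-you-go spend, an assumed 40 percent reservation costs 720 if the workload never moves, beating an assumed 30 percent savings plan at 840. Migrate after six months and the reservation path rises to 1,320 while the plan stays at 840.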

Spot eviction behavior deserves precise understanding rather than a vague sense that the machine might disappear. You set an eviction policy that determines what happens when the platform reclaims capacity: the machine can be deallocated, preserving its disks so it can restart later when capacity returns, or deleted outright. You can also set a maximum price you are willing to pay: the machine runs while the spot price stays at or below that cap and capacity exists, and leaving the cap at its default of -1 means the machine is never evicted for price, only for capacity. Designing for spot means designing for eviction as a normal event, checkpointing progress so a reclaimed machine loses minimal work and treating the capacity as opportunistic rather than guaranteed. A batch pipeline that writes intermediate results durably and resumes from the last checkpoint barely notices an eviction, while a workload that holds hours of progress in memory loses it all, which is the line between work that belongs on spot and work that does not.
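
A checkpoint-and-resume loop is short enough to show in full. In this sketch the eviction is simulated with a counter, the work is a placeholder, and the checkpoint is a local JSON file; on a real spot machine the checkpoint would need to live on storage that survives the eviction, such as a managed disk or a storage account, never the temporary disk.

```python
import json
import os
import tempfile


def run_batch(items, checkpoint_path, evict_after=None):
    """Process items in order, recording progress durably after each one.

    Returns how many items this run processed. `evict_after` simulates the
    platform reclaiming the machine after that many items.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    processed = 0
    for i in range(done, len(items)):
        if evict_after is not None and processed >= evict_after:
            return processed  # simulated eviction: stop mid-run
        _result = items[i] * 2  # placeholder for the real work
        # Write to a temp file and rename so a crash never leaves a torn file.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"done": i + 1}, f)
        os.replace(tmp, checkpoint_path)
        processed += 1
    return processed


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        first = run_batch(list(range(10)), path, evict_after=3)
        second = run_batch(list(range(10)), path)
        return first, second
```

In `demo`, the first run is cut off after three items and the second picks up at the fourth, so no completed work is repeated and none is lost.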

Azure Hybrid Benefit changes the math for organizations that already hold eligible operating system or database licenses with Software Assurance, letting them apply those licenses to Azure machines and pay only for the compute rather than the license-inclusive rate. For a fleet running a licensed operating system or database, applying the benefit can cut a meaningful slice of the per-hour cost, and overlooking it is a common, recurring overpayment hiding in plain sight.

Bandwidth is the other line item that surprises people: inbound data transfer is generally free, but outbound data leaving a region is metered, so an architecture that moves large volumes of data across regions or out to the internet carries a transfer cost that the compute and disk figures do not show. Accounting for egress when designing a data-heavy or multi-region system prevents the bill that arrives from traffic nobody costed.

Protecting the data: encryption at rest and in the guest

Managed disks are encrypted at rest by default with platform-managed keys, so data on the disk is unreadable to anyone reading the underlying storage without the key. That default covers the baseline requirement, but workloads with stricter obligations have stronger options. Server-side encryption with customer-managed keys puts the encryption key in a key vault you control, so you hold the ability to rotate or revoke it and the disk becomes unreadable the moment you cut access, which satisfies regimes that require customer control of keys. Encryption at host extends protection to the temporary disk and the caches on the host itself, closing the gap that disk-only encryption leaves on local and cached data. For the strongest posture, double encryption applies two layers so a single compromised key does not expose the data.

Guest-side encryption is a distinct layer that operates inside the operating system rather than at the platform. Azure Disk Encryption uses the guest’s native facilities, dm-crypt on Linux and BitLocker on Windows, to encrypt volumes with keys held in a key vault, which some compliance frameworks specifically require because the encryption happens within the guest’s control. The choice between platform server-side encryption and guest-side encryption turns on where the requirement places the trust boundary: server-side with customer-managed keys gives you key control with less operational overhead, while guest-side encryption puts the encryption inside the operating system at the cost of managing it there. Stacking guest-side encryption on top of the default at-rest encryption is possible where a requirement demands defense in depth, and the right combination follows from the specific obligation rather than from a reflex to encrypt everything twice.

Backup, snapshots, and recovery

A disk that is encrypted and fast is still lost if it is corrupted, deleted, or hit by ransomware without a recoverable copy. Snapshots are point-in-time copies of a managed disk, cheap to take and fast to create a new disk from, which makes them ideal for a quick checkpoint before a risky change such as a major update or a schema migration. A snapshot taken immediately before a change gives a fast rollback if the change goes wrong, and taking one as a matter of habit before significant operations is low-cost insurance that repeatedly proves its worth.

A backup service raises this to a managed, scheduled discipline. Azure Backup takes application-consistent backups on a policy, retains them according to a retention rule, and stores them in a vault isolated from the machine so that losing the machine does not lose its backups. The distinction between a snapshot and a managed backup matters: a snapshot is a manual, single point-in-time copy useful for a known moment, while a backup is an automated, retained, isolated series that survives the loss of the source and supports recovery to any retained point. Application consistency is the property that makes a backup restorable into a running application rather than into a crash-consistent state that may need repair, achieved by coordinating with the guest to quiesce writes at the moment of capture. A recovery plan that has never been tested is a hope, not a plan, so periodically restoring a backup to a scratch machine and verifying it boots and the application runs is what turns a backup policy into an actual recovery capability.

Monitoring, diagnostics, and patching

A machine you cannot see is a machine you cannot operate, and Azure exposes two layers of visibility. Platform metrics, collected without anything installed in the guest, report what the host sees: processor utilization, disk and network throughput, and the like, which is enough to spot a saturated resource from outside. Guest metrics require an agent inside the operating system and report what the host cannot see, such as memory pressure, disk space inside the filesystem, and per-process detail, which is where many real problems live. Relying on platform metrics alone leaves you blind to memory and disk-space exhaustion that the host cannot observe, so a production machine carries the agent and ships guest metrics and logs to a workspace where they can be queried and alerted on.

Alerts turn metrics from a dashboard you might check into a signal that reaches you, firing when a metric crosses a threshold or a log query matches a condition, so a disk filling or a service failing pages someone before users notice. The discipline is to alert on the symptoms that precede an outage, such as a disk approaching full or a queue backing up, rather than only on the outage itself, so the response happens during the warning window rather than after the failure.

Patching closes the loop on keeping a machine healthy over time. An update management capability can assess which machines are missing updates, schedule patching in maintenance windows, and report compliance across a fleet, which converts patching from a manual chore that slips into a governed, auditable process. Combining boot diagnostics for the failure cases, guest metrics for the steady state, alerts for the warning signs, and managed patching for the maintenance is what operating a VM well actually looks like, well beyond the act of creating one.

Reproducing the machine as code

A machine clicked together in the portal is a machine nobody can recreate identically, which is acceptable for an experiment and a liability for production. Infrastructure as code captures the VM and its dependent resources, the interface, the disks, the security group, the placement, as a declarative definition that can be reviewed, versioned, and applied repeatedly to produce an identical result. Bicep and the underlying resource templates express Azure resources natively, while Terraform expresses them in a multi-cloud language, and both turn a machine from a hand-built artifact into a reproducible one defined in a file that lives in version control alongside the application.

The payoff is concrete and arrives at the worst moments. When a machine is lost, a reviewed definition rebuilds it exactly rather than from half-remembered portal clicks. When a second environment is needed, the same definition stamps out a matching one rather than a close-but-different approximation that behaves subtly unlike production. When a change is proposed, a code review catches the mistake before it reaches a running system, and a what-if preview shows exactly what a deployment will alter before it alters anything. The four decisions live in that definition as explicit, reviewable choices, the size, the disk SKUs, the zone, and any commitment, so the reasoning is recorded rather than lost in a console. Pairing the code that defines the resources with the image discipline that defines their contents produces a machine that is reproducible top to bottom, which is the standard a production fleet should hold itself to.

The start, stop, and hibernate lifecycle

A VM moves through a small set of states, and knowing them precisely prevents both billing surprises and data loss. Running is the active state where compute bills and the machine serves. A guest-level stop halts the operating system while leaving the machine allocated, so it still bills, which is the state people mistake for off. The deallocated state releases the host and stops compute billing, as the cost section established, but it also discards anything on the local temporary disk and can change a dynamic address, so it is off in the billing sense at the cost of ephemeral state. Deleted removes the compute resource while frequently leaving disks and addresses behind. Reading the state correctly is what tells you whether a machine is costing money, holding state, or safe to remove.

Hibernation adds a state that splits the difference between running and deallocated. When a machine hibernates, the contents of memory are written to the OS disk and the machine is deallocated, so compute billing stops, and when it resumes, memory is restored from disk and the machine continues where it left off rather than booting fresh. This matters for workloads with a long, expensive startup, such as a development machine with a heavy toolchain loaded or an application that warms a large cache, because hibernation preserves that warmed state across an off period without paying for compute while idle. The trade-offs are real: hibernation requires support from the size and the operating system, it consumes OS disk space to hold the memory image, and it is not instantaneous. The decision rule is whether resuming a warmed machine is worth more than the cost and constraints of hibernation, which it often is for stateful development machines and rarely is for a stateless tier that boots clean in moments.
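
The states described in this section reduce to two questions per state: does compute bill, and does in-memory state survive. A compact way to hold that in mind is a lookup like the one below, which encodes only what the section states; the enum and its names are illustrative and are not the platform's own power-state codes.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    GUEST_STOPPED = "stopped from inside the guest"
    DEALLOCATED = "stopped (deallocated)"
    HIBERNATED = "hibernated"
    DELETED = "deleted"


# (compute bills, memory contents survive to the next start)
EFFECTS = {
    VmState.RUNNING: (True, True),
    VmState.GUEST_STOPPED: (True, False),   # still allocated, so still billed
    VmState.DEALLOCATED: (False, False),    # host released, boots fresh
    VmState.HIBERNATED: (False, True),      # memory image saved to the OS disk
    VmState.DELETED: (False, False),        # disks and addresses may remain
}


def bills_compute(state):
    return EFFECTS[state][0]


def preserves_memory(state):
    return EFFECTS[state][1]


def costing_money_while_idle(states):
    """States from a fleet inventory that bill compute without running."""
    return [s for s in states if bills_compute(s) and s is not VmState.RUNNING]
```

Run over an inventory, `costing_money_while_idle` surfaces exactly the guest-stopped machines that people mistake for off.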

Does a deallocated VM still cost anything?

Deallocating a VM stops the compute charge, but its disks continue to bill because they persist independently, and a static public address you reserved continues to bill whether or not the machine runs. Deallocation removes the largest cost, the compute, while leaving the storage and reserved-address costs, so it lowers the bill without zeroing it.

Automating these transitions is where the lifecycle becomes a cost lever rather than a manual chore. A schedule that deallocates development and test machines outside working hours, that hibernates machines whose warmed state is worth keeping, and that leaves production running with the appropriate construct turns the lifecycle into policy. The automation should account for the side effects, reserving static addresses where a stable endpoint matters and never relying on the temporary disk for anything that must survive, so the off-hours saving never becomes an outage or a data-loss incident. The full no-boot recovery path, for the cases where a machine fails to come back from a stopped or deallocated state, is something we trace step by step in diagnosing why an Azure VM will not boot, which is worth knowing before you need it rather than during the incident.

When to use a VM and when to reach for something else

A VM is the right tool when you need full control of the operating system, when you are running software that expects a conventional machine, when you are migrating an existing server with minimal change, or when a workload’s licensing or compliance requires a dedicated guest. It is the most flexible compute on Azure and the most work to operate, because you own the patching, the configuration, and the scaling logic.

It is the wrong tool when a managed service would carry that operational weight for you. A stateless web application often belongs on a managed application host rather than a VM you patch by hand. An event-driven function belongs on a serverless platform that scales to zero. A containerized system at scale belongs on a managed orchestrator. The reasoning is the same in each case: a VM gives you control you may not need in exchange for operational burden you may not want, and the mature instinct is to reach for the most managed option that still meets the requirement, dropping to a VM only when the control is genuinely necessary.

The comparison sharpens when you name what each alternative removes from your plate. A managed application host runs your web or API code without you patching an operating system, configuring a web server, or wiring up scaling, which suits a standard web workload that does not need kernel access or unusual system dependencies. The cost is constraint: you accept the platform’s runtime, its limits, and its sandbox in exchange for shedding the operational load, and a workload that needs something the sandbox forbids has to climb back to a VM. A serverless function platform goes further, running individual pieces of code in response to events and scaling to zero when idle so you pay nothing between invocations, which is ideal for spiky, event-driven, short-lived work and wrong for anything long-running or latency-sensitive in a way that a cold start would harm.

A managed container service sits between these and the VM, running your containers with the platform handling the host fleet, the scaling, and much of the networking, which suits a containerized application that wants orchestration without you operating the control plane. A managed Kubernetes service offers the full orchestrator for systems that genuinely need its power, again with the platform owning the parts beneath your workloads. A batch service handles large-scale parallel compute by managing a pool of machines that run your jobs and scale with the queue, which suits embarrassingly parallel work such as rendering or simulation far better than hand-managing a fleet of VMs. The thread through all of these is that the VM is the floor of control and the ceiling of operational burden, and each managed service trades a slice of control for a larger slice of relief. The engineering judgment is to take the most relief you can without giving up control you actually need, and to drop to a VM deliberately, knowing exactly which capability forced the choice, rather than by default.
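The rule of taking the most managed option that still meets the requirement can be written down as a short decision function. This is a sketch under stated assumptions: the `Workload` fields and the order of the checks are one hypothetical way to encode the reasoning above, and the returned labels are the generic service categories from the text, not product names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags describing what the workload genuinely requires.
    needs_os_control: bool = False          # kernel access, unusual system dependencies
    embarrassingly_parallel: bool = False   # rendering, simulation, queue-driven jobs
    event_driven_short_lived: bool = False  # spiky work that tolerates cold starts
    containerized: bool = False
    needs_full_orchestrator: bool = False


def most_managed_option(w: Workload) -> str:
    """Pick a compute category, dropping to a VM only when control forces it."""
    if w.needs_os_control:
        return "virtual machine"
    if w.embarrassingly_parallel:
        return "batch service"
    if w.event_driven_short_lived:
        return "serverless functions"
    if w.containerized:
        if w.needs_full_orchestrator:
            return "managed Kubernetes"
        return "managed container service"
    return "managed application host"
```

The useful property is that the VM branch fires only on an explicit requirement, so the capability that forced the choice is always recorded in the input.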

How to think about Azure Virtual Machines

The single most useful mental model is the four-decision frame. When a workload arrives, ask in order what binds it (which sets the family), what its storage performance ceiling needs to be (which sets the disk tier), what failure it must survive (which sets the availability construct), and how predictable its usage is (which sets the cost commitment). The decisions are coupled: the single-instance SLA depends on the disk tier, the achievable IOPS depends on both the disk and the size, and every commitment discount depends on having right-sized first. Working them in order, with the couplings in view, is the difference between a machine that behaves as designed and one that surprises you on cost, latency, or uptime. To run and reproduce each of these decisions on a live machine, watching the disk-tier effect on throughput and the deallocation effect on billing firsthand, you can run the hands-on Azure labs and command library on VaultBook.
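The couplings named above can be checked mechanically once the four decisions are written down together. The sketch below is illustrative only: the `VmDesign` record, the tier labels, and the set of tiers treated as premium-class are assumptions for the example, and the current SLA wording should be the authority on which tiers qualify.

```python
from dataclasses import dataclass
from typing import List

# Assumed labels for tiers that count toward the top single-instance SLA tier.
PREMIUM_CLASS_TIERS = {"premium-ssd", "premium-ssd-v2", "ultra"}


@dataclass
class VmDesign:
    family: str            # decision 1: what binds the workload
    disk_tiers: List[str]  # decision 2: OS disk first, then data disks
    availability: str      # decision 3: "single", "availability-set", or "zones"
    instance_count: int
    commitment: str        # decision 4: "pay-as-you-go", "reserved", or "spot"
    right_sized: bool


def coupling_warnings(d: VmDesign) -> List[str]:
    """Return one warning per coupling the design violates."""
    warnings = []
    all_premium = all(t in PREMIUM_CLASS_TIERS for t in d.disk_tiers)
    if d.availability == "single" and not all_premium:
        warnings.append("single VM with a standard disk misses the top single-instance SLA tier")
    if d.availability != "single" and d.instance_count < 2:
        warnings.append("a multi-instance construct needs at least two VMs")
    if d.commitment == "reserved" and not d.right_sized:
        warnings.append("right-size before committing to a reservation")
    return warnings
```

A design that returns an empty list has not been proven correct, only cleared of the three couplings this sketch knows about.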

The four decisions also organize everything layered on top of them. The image and the bootstrapping define what the machine contains; the identity and the metadata service define how it authenticates and discovers itself; the encryption, backup, and monitoring define how it is protected and observed; and the infrastructure-as-code definition records the whole arrangement so it can be rebuilt. None of these displaces the four decisions, because a perfectly monitored, encrypted, code-defined machine on the wrong size with the wrong disk still underperforms and overspends. The order of operations is to get the four coupled choices right first, then layer the operational practices that keep the machine secure, recoverable, and reproducible, rather than polishing the operational surface of a machine whose foundation is wrong. An engineer who internalizes that order stops treating a VM as a single thing to provision and starts treating it as a small system to design, which is the shift this entire guide is built to produce.

The verdict

An Azure Virtual Machine rewards the engineer who treats it as four coupled decisions and punishes the one who treats it as a single default. The compute family follows from the bottleneck, the disk tier sets a performance ceiling the size name never reveals, the availability construct determines which failure you survive and quietly depends on the disk tier through the SLA, and the cost commitment turns on the difference between stopped and deallocated and on right-sizing before discounting. None of these is hard in isolation. The skill, and the entire wager of reasoning over recall, is holding all four together so that none silently caps the others. Derive the machine from the workload, name the failures before they arrive, and the VM becomes the predictable, controllable foundation it is meant to be rather than a recurring source of surprise.

Everything beyond the four decisions is the discipline of operating what you have designed: bootstrapping the machine from an image rather than building it by hand, giving it a managed identity rather than a stored secret, encrypting and backing up what matters, watching the guest metrics the host cannot see, and capturing the whole arrangement as code so it survives the day the machine does not. These practices do not compete with the four decisions; they sit on top of a foundation that has to be right first, because no amount of monitoring rescues an undersized machine behind a throttled disk, and no backup policy fixes a single VM assumed to carry an availability commitment it never qualified for. The engineers who run Azure compute well are the ones who reason from the workload to the size, from the size to the disk and the construct and the commitment, and only then to the operational layer, in that order, every time. Hold that sequence and the platform stops surprising you. Reach for the default and it will keep finding new ways to surprise you.

Frequently Asked Questions

Q: What is an Azure Virtual Machine and how is it assembled?

An Azure Virtual Machine is a compute resource the platform schedules onto a physical host, surrounded by dependent resources that the platform provisions and bills separately. Those dependents are a network interface for connectivity, an OS disk and optional data disks for persistent state, an optional public address, a network security group for traffic filtering, and a placement into an availability set or zone. You interact with all of this as one machine, but the underlying graph is what matters during troubleshooting, because the symptom rarely names the resource at fault. Deleting the VM often leaves the disks and addresses behind, which is both a recovery safeguard and a frequent cause of orphaned cost. Holding the resource graph in mind, rather than the single-machine abstraction, is the first habit of running compute well on Azure.

Q: Which Azure VM series and size should I choose?

Start by identifying the workload’s binding constraint, then match the family to it. General purpose suits balanced CPU and memory workloads such as web servers and small databases. Compute optimized fits processor-bound batch and application work. Memory optimized carries the RAM that databases and caches need. Storage optimized pairs high local-disk throughput with the cores to drive it. GPU families serve accelerated work, and burstable suits mostly-idle services that spike occasionally. Within the family, pick the smallest size that meets the measured load, since oversizing wastes money on every billing dimension. Use the size-list command to see vCPU, memory, disk count, and temporary storage per size in your region, and derive the choice from the bottleneck rather than from a default or a price sort.
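The two steps in that answer, matching the family to the binding constraint and then taking the smallest size that fits, can be sketched as follows. The constraint keys are hypothetical labels, and the vCPU and memory figures in the usage example are illustrative values that should be checked against the size list for your region.

```python
from typing import List, Optional, Tuple

# Binding constraint -> (family class, family letter), following the mapping above.
FAMILY_BY_CONSTRAINT = {
    "balanced": ("general purpose", "D"),
    "cpu": ("compute optimized", "F"),
    "memory": ("memory optimized", "E or M"),
    "local-disk": ("storage optimized", "L"),
    "gpu": ("GPU", "N"),
    "mostly-idle": ("burstable", "B"),
}

# (size name, vCPUs, memory in GiB)
Size = Tuple[str, int, int]


def smallest_fitting(sizes: List[Size], need_vcpus: int, need_mem_gib: int) -> Optional[str]:
    """Return the smallest size that meets the measured load, or None if none fits."""
    fitting = [s for s in sizes if s[1] >= need_vcpus and s[2] >= need_mem_gib]
    if not fitting:
        return None
    # Smallest by vCPU count first, then by memory.
    return min(fitting, key=lambda s: (s[1], s[2]))[0]
```

For example, with candidate sizes of 2, 4, and 8 vCPUs and a measured need of 3 vCPUs and 12 GiB, the function returns the 4 vCPU size rather than the largest one on the list.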

Q: Which managed disk type should an Azure VM use?

Choose the disk tier by the performance the workload requires, since each tier sets a hard ceiling on IOPS and throughput. Standard HDD fits backups and latency-indifferent data. Standard SSD suits light production and web servers. Premium SSD is the baseline for production databases and latency-sensitive applications. Premium SSD v2 lets you set IOPS and throughput independently of capacity, which avoids over-provisioning gigabytes just to buy speed. Ultra Disk serves the most demanding transactional and analytic systems with fully independent performance provisioning. Read the disk ceiling and the VM size ceiling together, because the lower of the two governs, and remember that the OS disk carries its own limits. The exact per-size figures should be verified against current Azure disk documentation, since they are revised over time.

Q: What SLA does a single Azure VM receive?

A single VM qualifies for the highest single-instance availability commitment only when every OS and data disk uses premium or ultra storage; with a standard disk in the mix, it no longer qualifies for that tier. The single-instance figure is commonly cited around 99.9 percent, but a lone VM has no redundancy when its host fails, so two or more VMs in an availability set reach a higher commitment, and two or more spread across availability zones reach the highest. The exact percentages live in the Azure SLA and are revised periodically, so confirm them against the current document rather than memorizing a constant. The reasoning is stable even when the numbers move: more independent failure domains buy a higher commitment, and the construct should follow from the downtime the workload can tolerate.
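An availability percentage is easier to reason about as a downtime budget. The arithmetic below is a sketch that assumes a 30-day month for simplicity; the 99.9 figure is the commonly cited single-instance number mentioned above, not a value read from the current SLA.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime per period that an availability percentage permits."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Roughly 43 minutes per 30-day month at 99.9 percent.
budget = allowed_downtime_minutes(99.9)
```

Running the same function against whatever percentages the current SLA document lists for availability sets and zones shows how much downtime budget each additional failure domain removes.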

Q: How is an Azure VM billed and what drives the cost?

Compute is billed per second of allocation while the machine is running or merely stopped from inside the guest, disks are billed by their provisioned tier and size whether or not the VM runs, and static public addresses and outbound data transfer add their own line items. The largest cost lever is the compute rate, which is driven by the size and the commitment model: pay-as-you-go is most flexible and most expensive, a reserved one-year or three-year commitment discounts steady workloads, and spot capacity offers the deepest discount for interruptible work. Right-sizing compounds every commitment discount, because a discount on an oversized machine still overspends. The single most common avoidable cost is leaving a machine merely stopped rather than deallocated, which keeps the compute meter running.
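The size of the stopped-versus-deallocated mistake is simple arithmetic. In the sketch below the hourly rate is a hypothetical placeholder, not a real Azure price, and 730 is the usual approximation for hours in a month (8,760 divided by 12).

```python
HOURS_PER_MONTH = 730  # approximation: 8760 hours per year / 12 months


def monthly_compute_cost(hourly_rate: float, allocated_hours: float) -> float:
    """Compute charge for the hours the machine stays allocated to a host."""
    return hourly_rate * allocated_hours


def saving_from_deallocation(hourly_rate: float, hours_needed: float) -> float:
    """Saving from deallocating outside the hours the machine is actually needed."""
    always_allocated = monthly_compute_cost(hourly_rate, HOURS_PER_MONTH)
    only_when_needed = monthly_compute_cost(hourly_rate, hours_needed)
    return always_allocated - only_when_needed


# Hypothetical dev machine: 0.20 per hour, needed 10 hours on 22 working days.
saving = saving_from_deallocation(0.20, 10 * 22)
```

With those assumed inputs the machine is needed for 220 of 730 hours, so roughly 70 percent of the compute charge is avoidable. Disk and static-address charges are unaffected and are deliberately left out of the calculation.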

Q: What is the difference between a stopped and a deallocated VM?

A VM stopped from inside the guest operating system stays allocated to its host and keeps incurring compute charges, because the platform is still reserving capacity for it. A VM deallocated through Azure releases the host, stops compute billing, and leaves only disk and static-address costs running. The portal labels the released state “Stopped (deallocated)” to distinguish it, and the two states look nearly identical while billing very differently. The practical consequence is that shutting a machine down from the command line inside the OS saves nothing, while issuing the deallocate command through Azure is what actually stops the compute meter. Deallocation also wipes the local temporary disk and can change a dynamic public address, so plan around those effects when scheduling machines off during idle hours.
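The billing difference between the two states can be captured in a small lookup that an audit script might use. This is a sketch: the status code strings follow the `PowerState/...` form that I understand the VM instance view to report, so verify them against real output before relying on them, and only the three states discussed above are covered.

```python
from typing import Dict, List, Optional

# True means the compute meter is running in that state.
COMPUTE_BILLED = {
    "PowerState/running": True,
    "PowerState/stopped": True,       # shut down inside the guest, host still reserved
    "PowerState/deallocated": False,  # host released, compute billing stops
}


def compute_meter_running(power_state_code: str) -> Optional[bool]:
    """Return whether compute is billed, or None for a state this sketch does not cover."""
    return COMPUTE_BILLED.get(power_state_code)


def stopped_but_billed(vms: Dict[str, str]) -> List[str]:
    """Names of machines that look off but still incur compute charges."""
    return sorted(name for name, code in vms.items() if code == "PowerState/stopped")
```

Feeding it a mapping of VM names to power state codes yields the list of machines worth deallocating, which is usually the fastest cost win in a neglected subscription.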

Q: Why does my Azure VM perform poorly even with a large size?

The most common cause is a disk tier that caps IOPS or throughput below what the application generates, leaving cores idle and memory free while the disk queue backs up. A large compute size attached to a Standard SSD will stall under a database workload that a premium or ultra disk would serve comfortably. The fix is a higher disk tier or a redistribution of IO, not a larger VM. Read both ceilings together, since the VM size itself imposes an aggregate disk-performance limit independent of the disk tier, and the lower of the two governs. The OS disk is not exempt and can throttle boot, logging, and paging. Monitoring disk queue depth alongside CPU and memory makes saturation visible before it becomes a user complaint.
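The rule that the lower of the two ceilings governs is a one-line calculation, shown here as a sketch. The IOPS figures in the usage note are hypothetical placeholders, not published limits for any disk tier or VM size.

```python
from typing import List, Tuple


def governing_ceiling(disk_iops: List[int], vm_iops_cap: int) -> Tuple[int, str]:
    """Return the effective IOPS ceiling and which side imposes it."""
    disk_total = sum(disk_iops)  # combined ceiling of the attached disks
    if disk_total <= vm_iops_cap:
        return disk_total, "disk"
    return vm_iops_cap, "vm-size"


def is_saturated(demand_iops: int, disk_iops: List[int], vm_iops_cap: int) -> bool:
    """True when the workload generates more IO than the governing ceiling allows."""
    ceiling, _ = governing_ceiling(disk_iops, vm_iops_cap)
    return demand_iops > ceiling
```

With a single disk capped at a hypothetical 500 IOPS on a size that allows 6,400, the disk governs and a 3,000 IOPS workload saturates it; with two 5,000 IOPS disks on the same size, the VM size becomes the limit instead, and a faster disk would change nothing.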

Q: Why does resizing an Azure VM sometimes require a restart?

A resize to a size the current hardware cluster can serve completes in place, typically with no more than a reboot of the running machine, but a resize to a size that hardware cannot serve requires the platform to deallocate the machine, move it to a host with the appropriate hardware, and start it again. The resize command is identical in both cases, so the extent of the disruption is not obvious until it happens. When the move is forced, in-memory state is lost, the local temporary disk is wiped, and a dynamic public address can change. Treat any resize that crosses hardware generations or jumps families as potentially disruptive, schedule it accordingly, and use a static address if a stable public endpoint matters. Planning the resize as a maintenance event rather than an instant change prevents the surprise outage.

Q: What causes an AllocationFailed or SkuNotAvailable error?

These errors mean the region or availability zone cannot currently satisfy the requested size, not that the configuration is wrong. Azure capacity is finite per series and per region, and a popular size in a constrained zone can be temporarily unavailable. The response is to request a different size within the same family, target a different zone, or choose another region, and to design capacity-sensitive deployments so they are not pinned to a single scarce size in a single zone. An overconstrained allocation request points at too many simultaneous constraints, such as a specific size plus a specific zone plus a proximity placement group, which together leave the allocator no valid host. Relaxing one constraint usually clears it. Building flexibility into size and zone selection is the durable prevention.
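Building flexibility into size and zone selection amounts to trying an ordered list of acceptable combinations instead of a single pinned one. The sketch below assumes a hypothetical `try_allocate` callback that returns True when a deployment attempt succeeds; it stands in for whatever deployment call you actually use and is not an Azure SDK function.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple


def first_available(
    sizes: List[str],
    zones: List[str],
    try_allocate: Callable[[str, str], bool],
) -> Optional[Tuple[str, str]]:
    """Try each acceptable size in every zone, in preference order.

    Returns the first (size, zone) pair that allocates, or None if all fail.
    """
    # product() varies the zone fastest, so the preferred size is tried in
    # every zone before the sketch falls back to the next size.
    for size, zone in product(sizes, zones):
        if try_allocate(size, zone):
            return size, zone
    return None
```

Listing sizes from the same family keeps the fallback within the workload's binding constraint, and a None result is the signal to consider another region.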

Q: How do I diagnose an Azure VM that will not boot?

Enable boot diagnostics, then read the captured console output and boot screenshot, which is often the only window into a guest that never comes up. The serial console builds on the same channel to give interactive access even when the network path is broken, letting you correct a misconfigured boot setting without rebuilding. Common causes include a corrupt boot configuration, a full OS disk, a misapplied operating system update, or a file system that needs repair. Work from the captured evidence rather than guessing, since the symptom of no boot has several distinct causes that demand different fixes. Enabling boot diagnostics before an incident, rather than after, is the difference between a fast recovery and a long one, because you cannot read history that was never captured.

Q: How do I read an Azure VM size name like Standard_D4s_v5?

The name decomposes into a family letter, a vCPU count, capability letters, and a version suffix. The family letter signals the workload class, so D marks general purpose, F compute optimized, E or M memory optimized, L storage optimized, N a GPU family, and B burstable. The number gives the vCPU count, so D4 carries four vCPUs. The lowercase letters encode capabilities that matter: s means the size supports premium storage, d means a local temporary disk is attached, and a means an AMD processor. The version suffix marks the hardware generation, with higher numbers indicating newer silicon. Reading those capability letters prevents the common mistake of choosing a size that cannot attach the premium disk a workload requires, then losing time to a greyed-out option.
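The decomposition can be expressed as a small parser. This is a sketch that covers only the simple pattern described above: it maps just the three capability letters named in the text, and it does not handle older naming styles or constrained-core names, for which it raises an error or reports letters as unrecognized.

```python
import re
from typing import List, NamedTuple, Optional

# Standard_<family letters><vCPUs><capability letters>[_v<version>]
_SIZE_RE = re.compile(r"^Standard_([A-Z]+)(\d+)([a-z]*)(?:_v(\d+))?$")

FAMILY_CLASS = {
    "D": "general purpose", "F": "compute optimized", "E": "memory optimized",
    "M": "memory optimized", "L": "storage optimized", "N": "GPU", "B": "burstable",
}
CAPABILITY = {
    "s": "premium storage supported",
    "d": "local temporary disk",
    "a": "AMD processor",
}


class SizeName(NamedTuple):
    family: str
    workload_class: str
    vcpus: int
    capabilities: List[str]
    version: Optional[int]


def parse_size_name(name: str) -> SizeName:
    """Split a size name into family, vCPU count, capabilities, and version."""
    match = _SIZE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized size name: {name!r}")
    family, vcpus, caps, version = match.groups()
    return SizeName(
        family=family,
        workload_class=FAMILY_CLASS.get(family[0], "unknown"),
        vcpus=int(vcpus),
        capabilities=[CAPABILITY.get(c, f"unrecognized letter {c!r}") for c in caps],
        version=int(version) if version is not None else None,
    )
```

Checking for "premium storage supported" in the parsed capabilities before choosing a disk tier is the programmatic form of avoiding the greyed-out premium option.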

Q: What is the difference between an availability set and an availability zone?

An availability set spreads VMs across fault domains and update domains inside a single datacenter, protecting against host hardware failures and planned maintenance that would otherwise take down every instance at once. An availability zone spreads VMs across physically separate datacenters within a region, each with independent power, cooling, and networking, protecting against the loss of an entire building. The set defends against rack and host events in one location; the zone defends against a location-level failure. They answer different questions, and the higher availability commitment comes from the zone construct because the failure domains are fully independent. Choose the set when in-datacenter redundancy suffices and the zone when you need to survive a datacenter-scale event, and remember that multi-instance constructs require more than one VM to deliver their benefit.

Q: Do I need premium disks just for the SLA?

For the highest single-instance availability commitment, yes: that tier requires every OS and data disk on the VM to be premium or ultra, and a single standard disk disqualifies the machine from it. This couples the disk decision to the availability decision in a way that surprises people who chose a standard disk to save money and then found their single VM no longer qualified for the commitment they assumed. If you are running multiple VMs in an availability set or across zones, the multi-instance commitment does not carry the same single-instance disk requirement, so the premium-disk obligation is specific to standing on the single-VM SLA. Decide the availability construct and the disk tier together rather than in isolation, since each constrains the other, and verify the current SLA wording before designing to it.

Q: How do Azure VM extensions and the VM agent work?

The VM agent is a lightweight process running inside the guest that lets the Azure platform manage the machine after deployment, principally by installing and running extensions. Extensions are small managed components that each perform a task: a custom script extension bootstraps the machine on first boot, a desired-state configuration extension keeps it converged to a defined state, and a monitoring extension ships metrics and logs to a workspace. Extensions are how a VM becomes a reproducible artifact rather than a hand-built machine nobody can recreate. A machine whose agent is unhealthy silently loses the ability to receive extensions, which can block patching, monitoring, and automation without an obvious error, so the agent’s health is worth monitoring directly. Keeping configuration in extensions rather than manual steps is what makes a fleet maintainable.

Q: Can I change a managed disk’s tier without recreating the VM?

In many cases, yes. You can change a managed disk’s tier or, on the v2 and ultra tiers, its provisioned IOPS and throughput as a live operation, which lets you raise performance for a workload that has outgrown its disk without rebuilding the machine. The disk update command targets the disk directly and changes the SKU or the provisioned performance values. Some transitions between tiers can require the disk to be detached or the VM to be deallocated first, so confirm the specific transition against current documentation before assuming it is fully online. The ability to dial performance on the v2 and ultra tiers without touching capacity is one reason those tiers suit workloads whose throughput needs change over time, since you adjust the number rather than the machine.

Q: When should I use a VM instead of a managed Azure service?

Choose a VM when you need full control of the operating system, when you run software that expects a conventional server, when you are migrating an existing machine with minimal change, or when licensing or compliance requires a dedicated guest. A VM is the most flexible compute Azure offers and the most operational work, because you own the patching, configuration, and scaling. Reach for a managed service instead when it would carry that burden for you: a stateless web application often belongs on a managed application host, an event-driven workload on a serverless platform that scales to zero, and a containerized system at scale on a managed orchestrator. The mature default is the most managed option that still meets the requirement, dropping to a VM only when the control it provides is genuinely necessary rather than merely familiar.

Q: How do I reduce Azure VM costs without hurting reliability?

Begin with the billing distinction that costs the most for the least reason: deallocate machines you are not using rather than merely stopping them, and automate deallocation for development and test machines that run only during working hours. Right-size next, choosing the smallest size in the correct family that meets the measured load, since oversizing inflates every cost dimension and a discount on an oversized machine still overspends. Then match the commitment to the usage pattern: a reserved one-year or three-year commitment discounts steady workloads, while interruptible spot capacity offers the deepest discount for fault-tolerant, restartable work that can survive eviction. None of these moves reduces reliability when chosen correctly, because deallocation, right-sizing, and reservations do not change availability, and spot is reserved for work designed to tolerate interruption.

Q: Why does my SSH or RDP connection to an Azure VM get refused?

A refused SSH connection or an RDP error almost always traces to one of three separable causes: a network security group rule blocking the port, a missing or misconfigured public address, or a guest service that did not start. Diagnose them independently rather than assuming one. Confirm the security group permits the port from your source, confirm the machine has a reachable address, and confirm the SSH daemon or remote desktop service is running inside the guest, using the serial console if the network path itself is the problem. Treating the connection failure as a single problem leads to flailing, while checking the network path and the guest service as distinct layers isolates the fault quickly. Keeping security group rules minimal and documented makes this diagnosis fast every time it recurs.
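A plain TCP probe helps separate the layers before you open the portal. The sketch below rests on a heuristic rather than a guarantee: an active refusal usually means the host was reached and nothing is listening on the port, while a timeout usually means packets are being dropped somewhere on the path, which is how a blocking security rule or a missing address tends to look. The label strings are my own, not Azure terminology.

```python
import socket
from typing import Optional


def classify_probe(exc: Optional[BaseException]) -> str:
    """Map the outcome of a connection attempt to the layer worth checking first."""
    if exc is None:
        return "open"             # port reachable and a service answered
    if isinstance(exc, ConnectionRefusedError):
        return "refused"          # host reached; check the guest service
    if isinstance(exc, socket.timeout):
        return "timeout"          # packets dropped; check NSG rule, address, route
    if isinstance(exc, socket.gaierror):
        return "name-resolution"  # hostname did not resolve
    return "other"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify_probe(None)
    except OSError as exc:  # covers refusal, timeout, and resolution errors
        return classify_probe(exc)
```

Calling `probe` with the machine's address and port 22 or 3389 gives a first hint about which of the three causes to examine, and the serial console remains the fallback when the network path itself is the problem.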

Q: What happens to my data when I deallocate or delete an Azure VM?

Deallocating a VM preserves the OS disk and any data disks, since those are independent managed resources billed separately from compute; only the local temporary disk is wiped, because it is ephemeral by design. Your persistent state survives a deallocation and returns when you start the machine again. Deleting a VM removes the compute resource but, unless you chose to delete them along with it, often leaves the disks, network interface, and public address behind as orphaned resources that continue to bill until you remove them explicitly. This is both a safety feature, because it prevents accidental data loss, and a common source of unexpected cost, because forgotten disks accumulate. Never store anything you need to keep on the temporary disk, and audit for orphaned disks after deleting machines so that deletion actually reduces your bill rather than leaving a quiet residue.
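The orphaned-disk audit reduces to filtering disks that no machine owns. The sketch below works on plain dictionaries and assumes each record carries a `managedBy` field that is empty when the disk is unattached, which is my understanding of the JSON shape the disk listing returns; confirm the field name against real output. The sample records in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def unattached_disks(disks: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return the names of disks that are not attached to any VM."""
    return [d["name"] for d in disks if not d.get("managedBy")]


def orphaned_size_gib(disks: List[dict]) -> int:
    """Total provisioned size of unattached disks, which still bill by tier and size."""
    return sum(d.get("diskSizeGb", 0) for d in disks if not d.get("managedBy"))
```

Given one record with `managedBy` set to a VM resource ID and another with it set to None, only the second name is returned. Review the result before deleting anything, because an unattached disk may be a deliberate recovery copy rather than residue.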

Q: When should I use hibernation on an Azure VM?

Use hibernation when a machine has an expensive startup worth preserving across an off period, such as a development machine with a heavy toolchain loaded or an application that warms a large in-memory cache. Hibernation writes the memory contents to the OS disk and deallocates the machine, stopping compute billing, then restores that memory on resume so the machine continues where it left off rather than booting fresh. The trade-offs are that the size and operating system must support it, the memory image consumes OS disk space, and resuming is not instantaneous. Skip hibernation for a stateless tier that boots clean in moments, since there is no warmed state worth preserving and a fresh boot is simpler. The decision turns on whether the warmed state is worth more than the disk space and the constraints, which favors stateful development and cache-heavy machines and disfavors fungible, stateless ones.