Azure Kubernetes Service (AKS) Explained

Azure Kubernetes Service sits in an awkward middle ground that trips up almost everyone who adopts it. The marketing line says it is managed Kubernetes, and engineers reasonably hear that as “Azure runs the cluster for me.” Then a pod crashes, an upgrade stalls, the virtual network runs out of addresses, and the on-call engineer opens a support ticket expecting Microsoft to fix something that was never theirs to fix. The gap between using AKS and understanding AKS is precisely the gap between what the service manages and what you still operate, and that line is rarely drawn clearly anywhere. This guide draws it explicitly, because the single most useful thing you can know about the platform is which failures land on Microsoft’s side and which land on yours.


The reader who finishes this piece should be able to design a cluster’s node pools and network plugin on purpose rather than by accepting a portal default, predict the failure modes before they arrive, and reason about the tier, the upgrade cadence, and the address plan well enough to defend each choice in a design review. None of that comes from a single documentation page, because the docs treat each topic in isolation. The size of a node, the type of its disk, the plugin that assigns addresses, the tier that backs the API server, and the cadence that keeps the whole thing patched are one connected reasoning chain, and the value here is connecting them.

What AKS Actually Is and the Mental Model to Hold

Kubernetes is an orchestrator. You hand it containers and a description of how they should run, and it schedules those containers onto machines, restarts them when they die, replaces machines that fail, and exposes the running workloads to the network. Running Kubernetes yourself means running the orchestrator’s brain (the API server, its backing datastore, the scheduler, and the controllers that make decisions) plus the fleet of worker machines that execute the work. The brain is fiddly to operate, its datastore demands quorum, and mistakes there are punished with cluster-wide outages.

Azure Kubernetes Service takes the brain off your hands. The orchestrator’s control components run inside a Microsoft-operated environment that you never log into and never patch. What remains yours is the fleet of worker machines, the addressing scheme they live inside, the identity model that governs who can talk to the API, and the rhythm of keeping versions current. The mental model to hold is two halves of one machine: a managed head you cannot see, bolted to a worker fleet you fully own and pay for. Most of the confusion about the platform dissolves once that picture is firm.

What does AKS actually manage?

AKS manages the Kubernetes control plane: the API server, the etcd datastore behind it, the scheduler, and the built-in controllers. You do not patch those components, size them, or back them up. Everything below the control plane, meaning the worker machines, the network model, identity wiring, and upgrades, remains your responsibility to design and operate.

That division has a direct financial consequence worth stating early. The control plane is offered free on the entry tier, so the bill you receive is overwhelmingly for the worker fleet and the resources it consumes: the virtual machines, their disks, the load balancers fronting your services, and the logs and metrics you collect. People who expect a large “Kubernetes charge” are surprised to find the orchestrator brain costs little or nothing while the machines underneath dominate the invoice. The pricing model mirrors the responsibility model. You pay for what you operate.

How AKS Works Internally at the Level You Need

A running AKS cluster has two planes. The control plane is the decision-making layer: it accepts your declarations of desired state through the API server, records them in etcd, and runs control loops that continuously compare desired state against observed state and act to close the difference. When you ask for five replicas of a service and only four are running, a controller notices the shortfall and the scheduler places the fifth. This loop is the heart of how Kubernetes self-heals, and it runs entirely inside Microsoft’s managed environment.
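The control loop can be pictured as one reconcile step run over and over. The sketch below is a toy in POSIX shell, not how the real controllers are written (they are Go programs reacting to changes observed through the API server); the function name `reconcile_step` and its output format are hypothetical, chosen only to show the compare-then-act shape.

```shell
# Toy model of one pass of a Kubernetes-style control loop.
# reconcile_step is a hypothetical helper: it compares a desired replica
# count with an observed one and prints the action that closes the gap.
reconcile_step() {
  desired=$1
  observed=$2
  if [ "$observed" -lt "$desired" ]; then
    echo "create $((desired - observed))"
  elif [ "$observed" -gt "$desired" ]; then
    echo "delete $((observed - desired))"
  else
    echo "noop"
  fi
}

# Five replicas requested, four running: the loop asks for one more pod,
# which the scheduler would then place on a node.
reconcile_step 5 4
```

Repeating the step until it reports `noop` is the essence of convergence; the real controllers mostly react to watch events rather than polling, but the compare-and-act logic is the same.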

The data plane is where your code actually executes. It is made of worker machines, called nodes, each running a kubelet agent that talks to the API server, a container runtime that pulls and runs images, and a network proxy that programs the routing rules so traffic reaches the right place. Your containers run inside pods, and pods run on nodes. The scheduler in the control plane decides which node a new pod lands on, but the kubelet on that node is what brings the pod to life and reports its health back. This separation matters enormously for diagnosis, a point worth returning to later: when something breaks, the first question is always which plane owns the failure.

The connection between the two planes runs over the network. Your kubectl commands hit the API server endpoint, which by default is a public address with authentication and authorization in front of it, and the kubelets on your nodes also reach the API server to receive their instructions and stream back status. You can make that API endpoint private so it is reachable only from inside your network, a choice covered in depth in the complete guide to securing AKS clusters, but the topology is the same either way: a managed head talking to an owned body across a network you control.

Why does the control plane versus data plane split matter for debugging?

Because the two planes fail for different reasons and have different owners. A control plane problem is rare and largely Microsoft’s to resolve. A data plane problem (a crashing container, an unschedulable pod, a node that went unhealthy) is almost always yours, rooted in your image, your resource requests, or your capacity. Knowing which plane broke tells you immediately where to look and who can fix it.

The Managed Boundary: A Responsibility Map

The whole argument of this guide reduces to one table. Print it, pin it above your desk, and consult it during the next incident. The claim it encodes is the most important thing to remember about the platform: the control plane is managed, but the worker fleet, the network model, identity wiring, and the upgrade cadence are not, and the large majority of real incidents live entirely on your side of that line.

Layer	Microsoft-managed	Customer-owned
API server, scheduler, controllers	Provisioned, patched, scaled, kept available per the chosen tier	Nothing
etcd datastore	Hosted, backed up, kept consistent	Nothing
Node operating system image	Image built and published by AKS	Choosing when to apply it through node-image upgrades
Worker virtual machines (node pools)	Nothing	Series, size, count, disk, scaling, health
Pod scheduling decisions	The scheduler logic	Requests, limits, affinities, taints that shape decisions
Cluster networking model	The plugin code	Choosing the network plugin (kubenet, Azure CNI, or Azure CNI Overlay) and sizing the address space
Ingress and load balancing	The provisioning integration	Installing and configuring the ingress controller and rules
Identity and access	The Entra integration plumbing	Granting roles, binding service accounts, scoping access
Kubernetes version	Versions made available and the upgrade mechanics	Deciding when to upgrade and staying inside the support window
Application workloads	Nothing	Everything: images, config, secrets, health probes

Read down the right column and a pattern jumps out. Almost every operational decision that determines whether your cluster is reliable lives there. The left column is real and valuable, removing the hardest part of running Kubernetes, but it is also narrow. The phrase “AKS handles it” is true only for the control plane, and treating it as true for the rest is the single most common conceptual error new operators make. The autoscaler being misconfigured, the address space being too small, the upgrade being skipped until the version fell out of support: each of those is a right-column failure wearing a left-column assumption.

Node Pools: System, User, and the Scale Set Underneath

A node pool is a group of identical worker machines that share a virtual machine size, an operating system, and a set of configuration knobs. Under the surface a node pool is an Azure virtual machine scale set, the same primitive that powers ordinary autoscaling fleets elsewhere on the platform. That single fact explains a great deal of node behavior. When the cluster adds capacity it asks the scale set to grow, when it removes capacity it asks the scale set to shrink, and when a machine misbehaves the remedy is often to reimage or replace the underlying scale set instance. Nodes are not pets and never were. They are cattle in a scale set, and the orchestrator treats them accordingly.

AKS distinguishes two kinds of node pool, and the distinction is operational rather than cosmetic. A system node pool hosts the critical add-on pods that keep the cluster functioning: the DNS service that lets pods resolve each other, the metrics pipeline, and the various managed components AKS injects. Every cluster needs at least one system node pool, and that pool should be protected from being starved by application workloads. A user node pool hosts your applications. Separating the two means a runaway deployment cannot evict the DNS pods and take the whole cluster’s name resolution down with it. The recommended shape for anything beyond a toy is a small, dedicated system pool running only the system workloads, with one or more user pools sized for the applications.

Splitting workloads across multiple user pools is one of the most underused design levers on the platform. A pool can have a different machine size, so a memory-hungry workload lands on memory-optimized machines while a general workload runs on cheaper general-purpose ones. A pool can run on spot instances, trading guaranteed availability for a steep discount on interruptible work. A pool can be tainted so that only pods that explicitly tolerate the taint schedule onto it, which is how you reserve expensive hardware, a GPU pool for instance, for the workloads that actually need it. The scheduling mechanics that make this work, requests and limits, node selectors, affinities, and tolerations, are the same Kubernetes primitives everywhere, but on AKS they map cleanly onto pool boundaries.
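The taint-based reservation works by matching: a node pool carries a taint, and only pods that list a matching toleration may schedule there. The sketch below models that check in POSIX shell for the exact-match case only. Real Kubernetes matching also supports the `Exists` operator, empty keys, and effect wildcards, so treat `tolerates` as a hypothetical simplification rather than the scheduler’s algorithm; the `sku=gpu:NoSchedule` taint string is an illustrative example.

```shell
# Simplified taint/toleration check (exact key=value:effect matching only).
# tolerates is a hypothetical helper: the first argument is one node taint,
# the remaining arguments are the pod's tolerations in the same format.
tolerates() {
  taint=$1
  shift
  for toleration in "$@"; do
    if [ "$toleration" = "$taint" ]; then
      return 0
    fi
  done
  return 1
}

gpu_taint="sku=gpu:NoSchedule"

# A training pod that declares the toleration may land on the GPU pool...
if tolerates "$gpu_taint" "sku=gpu:NoSchedule"; then
  echo "training pod: may schedule on GPU pool"
fi

# ...while a web pod with no tolerations is kept off it.
if ! tolerates "$gpu_taint"; then
  echo "web pod: repelled by GPU pool taint"
fi
```

A toleration only permits scheduling onto the tainted pool; it does not force it. Keeping GPU pods on the GPU pool also takes a node selector or affinity, which is why those primitives travel together with tolerations.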

What is the difference between a system and a user node pool?

A system node pool runs the cluster’s critical add-on pods such as CoreDNS and the metrics server, and every cluster requires at least one. A user node pool runs your application workloads. Keeping system components on a dedicated, protected pool stops an application surge from evicting the components the cluster itself depends on to stay healthy.

Sizing a node pool is a reasoning exercise, not a default to accept. The machine size sets the ceiling for how many pods can pack onto one node and how much each can request, and it also interacts with the networking model in a way the next section makes concrete. Picking a size means estimating the resource requests of the pods you intend to run, leaving headroom for the kubelet and operating system overhead that every node reserves, and then choosing a size that packs efficiently without leaving large slivers of stranded capacity. The same reasoning that goes into choosing a virtual machine series for a standalone workload applies here, and the complete guide to Azure virtual machines covers how to read a size name and match it to a workload profile, which transfers directly to node-pool sizing.
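The packing estimate is simple arithmetic: the number of pods that fit is the smallest of the CPU fit, the memory fit, and the pool’s maximum pods per node, with CPU and memory measured against the node’s allocatable capacity after system reservations. The sketch below assumes you already know the allocatable figures (for example from the Allocatable section that `kubectl describe node` prints); the helper name `pods_per_node` and the numbers in the example are hypothetical, not published values for any VM size.

```shell
# Estimate how many identical pods pack onto one node.
# pods_per_node is a hypothetical helper. Arguments:
#   1: node allocatable CPU in millicores (after OS and kubelet reservations)
#   2: node allocatable memory in MiB
#   3: pod CPU request in millicores
#   4: pod memory request in MiB
#   5: the node pool's maximum pods per node
pods_per_node() {
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  fit=$by_cpu
  if [ "$by_mem" -lt "$fit" ]; then fit=$by_mem; fi
  if [ "$5" -lt "$fit" ]; then fit=$5; fi
  echo "$fit"
}

# Illustrative numbers only: 3860m CPU and 12000 MiB allocatable,
# pods requesting 250m CPU and 1024 MiB, max 30 pods per node.
pods_per_node 3860 12000 250 1024 30
```

Here memory binds first at 11 pods, leaving about 1110 millicores of CPU stranded on every node, which is the signal to consider a memory-optimized size or to revisit the pod’s requests.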

The Networking Model: kubenet, Azure CNI, and the Address Planning Trap

Networking is where AKS clusters most often go wrong in a way that is expensive to fix later, because the foundational choice is made at creation time and is painful to change once workloads are running. The choice is which network plugin assigns addresses to your pods, and the two historical options behave so differently that picking the wrong one can force a cluster rebuild.

With kubenet, nodes receive an address from your virtual network subnet, but pods do not. Pods live in a separate, logically internal address range, and traffic leaving a pod is translated to the node’s address through network address translation. The appeal is frugality: you only consume subnet addresses for nodes, not for every pod, so a small subnet supports a large cluster. The cost is that pods are not first-class citizens of your virtual network. They cannot be addressed directly from outside, certain integrations that expect a routable pod address do not work, and the extra translation hop adds complexity to troubleshooting. It is worth knowing that kubenet is on a retirement path; Microsoft has announced that kubenet networking for AKS will be retired, with migration to Azure CNI Overlay as the recommended successor, so new clusters should not be built on it. Verify the exact retirement date against the current official Azure announcement before relying on it, as such dates shift.

With Azure CNI, every pod receives a real address from your virtual network subnet, the same address space the nodes use. Pods become routable network citizens, reachable and addressable like any other resource, which makes integrations clean and troubleshooting straightforward. The cost is address consumption, and this is the trap. Because each node reserves addresses for the maximum number of pods it might ever run, the subnet must be large enough to hold every node plus every pod those nodes could host, with room for the surge of extra addresses consumed transiently during an upgrade. Undersize the subnet and the cluster runs out of addresses and new pods fail to schedule; because a subnet in use by a running cluster cannot be cleanly resized, the remedy is frequently to rebuild in a larger subnet. This is the single most common self-inflicted AKS wound, and it is entirely preventable with arithmetic done up front.

Azure CNI Overlay is the modern reconciliation of the two. Nodes take addresses from the subnet, while pods take addresses from a separate overlay space that does not consume subnet addresses, combining CNI’s clean model with kubenet’s frugal address usage. For most new clusters it is the recommended starting point. There is also a Cilium-based data plane option layered on top for higher-performance networking and policy enforcement. The landscape evolves, so confirm the currently recommended default and the supported plugin variants against the official networking documentation before you commit.

Should I use kubenet or Azure CNI?

For a new cluster, choose Azure CNI Overlay. It gives pods their own address space without draining your subnet, sidestepping the address-exhaustion trap of traditional Azure CNI while avoiding the translation overhead and the retirement timeline of kubenet. Reserve traditional Azure CNI for cases that specifically need pods to hold routable subnet addresses.

The address arithmetic deserves a worked example, because the formula is where intuition fails. Under traditional Azure CNI, the addresses a node pool reserves equal the node count multiplied by the maximum pods per node, plus the node addresses themselves, plus a margin for the upgrade surge. If you run a pool that can scale to twenty nodes at a maximum of thirty pods each, that pool alone wants twenty times thirty-one, or 620 addresses, before the upgrade margin. A /24 subnet offers only 251 usable addresses once Azure takes the five it reserves in every subnet, so that single pool already needs more than double a /24. The default maximum pods per node differs by plugin, commonly cited around thirty for traditional Azure CNI and one hundred ten for kubenet, with higher defaults under the overlay model; treat these defaults as values to verify against current documentation rather than constants, because they have changed across releases. The discipline is simple to state and easy to skip: size the subnet for the maximum the cluster can ever grow to, not the size it launches at.
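The same arithmetic can be scripted so the subnet is sized before the cluster exists. The sketch below assumes traditional Azure CNI, where every node and every potential pod slot takes a subnet address, treats each surge node as a full node, and subtracts the five addresses Azure reserves in every subnet. The helper names are hypothetical, and the result is a floor to round up from, not a recommendation.

```shell
# Addresses a traditional Azure CNI node pool can claim: each node needs
# one address for itself plus one per pod slot, and so does each surge node.
cni_addresses() {
  max_nodes=$1
  max_pods=$2
  surge_nodes=$3
  echo $(( ($max_nodes + $surge_nodes) * ($max_pods + 1) ))
}

# Smallest subnet prefix whose usable addresses (total minus the five
# Azure reserves in every subnet) cover the requirement.
smallest_prefix() {
  needed=$1
  prefix=29
  while [ "$prefix" -gt 8 ]; do
    usable=$(( (1 << (32 - $prefix)) - 5 ))
    if [ "$usable" -ge "$needed" ]; then
      echo "$prefix"
      return 0
    fi
    prefix=$(( $prefix - 1 ))
  done
  echo "none"
  return 1
}

# Twenty nodes, thirty pods each, one surge node during upgrades.
needed=$(cni_addresses 20 30 1)
echo "addresses needed: $needed"
echo "smallest subnet: /$(smallest_prefix "$needed")"
```

For this pool the script reports 651 addresses and a /22. If several pools share the subnet, sum their `cni_addresses` results before calling `smallest_prefix`.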

Identity: Managed Entra Integration and Azure RBAC for Kubernetes

Authentication and authorization in a Kubernetes cluster have two distinct layers, and conflating them is a frequent source of access confusion. The first layer answers who you are, and the second answers what you may do. On a vanilla cluster both layers are configured by hand with certificates and role bindings. On AKS the first layer can be delegated to Microsoft Entra ID through the managed integration, so the people and workloads that authenticate to your cluster present Entra identities rather than static cluster credentials. This is a meaningful security win: access can be governed by the same directory, conditional-access policies, and group memberships that govern the rest of your estate, and revoking a person’s directory access revokes their cluster access too.

The authorization layer then has a choice of its own. You can authorize with native Kubernetes role-based access control, where roles and bindings live as objects inside the cluster, or you can authorize with Azure RBAC for Kubernetes, where the cluster defers authorization decisions to Azure’s own role assignments. The native path keeps authorization portable and familiar to Kubernetes practitioners. The Azure path centralizes authorization alongside every other Azure permission, so a single role assignment governs both the management of the cluster resource and actions inside it. Neither is universally correct. Teams deep in Kubernetes tooling often prefer native role-based access control for its portability, while teams standardizing on Azure governance prefer the centralized model. The hardening trade-offs (private API endpoints, network policy, and workload identity for pods that need to reach other Azure services) are treated thoroughly in the dedicated securing AKS clusters guide, and they build directly on the identity foundation described here.

The important conceptual point is that identity is firmly on the customer-owned side of the responsibility map. Microsoft provides the integration plumbing, the ability to wire Entra in and to defer to Azure RBAC, but the role assignments, the group design, the decision about whether the API server is public or private, and the workload-identity bindings that let a pod authenticate to a database without a stored secret are all yours to design. A cluster with a wide-open public API endpoint and cluster-admin handed out broadly is not insecure because AKS failed; it is insecure because the operator made those choices.

Tiers, Limits, and Quotas That Shape Design

AKS offers tiers for cluster management, and the difference between them is essentially the strength of the guarantee behind the control plane plus the scale ceiling. The Free tier provides a managed control plane at no charge with no financially backed availability guarantee. Microsoft still targets high availability internally on the free tier, but there is no service level agreement to claim against, and the free control plane is provisioned with limited resources that suit development, experimentation, and small clusters rather than production scale. The guidance has consistently been to keep free-tier clusters small, on the order of ten nodes or fewer, because the control plane is not provisioned to absorb the request load of a large fleet.

The Standard tier adds a financially backed uptime guarantee for the API server endpoint, commonly stated as 99.95 percent for clusters spread across availability zones and 99.9 percent for clusters that are not, in exchange for a fixed per-cluster hourly charge. Beyond the guarantee itself, the Standard tier provisions a more robust control plane that scales with load and supports far larger clusters, into the thousands of nodes. The Premium tier builds on Standard by adding long-term support, extending the window during which an older Kubernetes version remains supported so that teams with slow upgrade cycles or strict change-control regimes get more runway before a forced upgrade. Each tier’s exact price, the precise availability percentages, and the node-count ceilings are the kind of numbers that change, so flag every one of them for verification against the current official pricing and service level pages before you quote them in a design document.
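It helps to translate an availability percentage into time before quoting it. The sketch below converts a target expressed in basis points (9995 for 99.95 percent) into the downtime it tolerates over a 30-day month. The 30-day month and the helper name are assumptions for illustration; how Microsoft actually measures and credits downtime is defined by the current SLA text, not by this calculation.

```shell
# Downtime tolerated by an availability target over a 30-day month.
# allowed_downtime_seconds is a hypothetical helper; the target is given
# in basis points so integer shell arithmetic stays exact (9995 = 99.95%).
allowed_downtime_seconds() {
  target_bp=$1
  month_seconds=$(( 30 * 24 * 60 * 60 ))   # 2,592,000 seconds
  echo $(( $month_seconds * (10000 - $target_bp) / 10000 ))
}

echo "99.95% (zonal):     $(allowed_downtime_seconds 9995) s/month"
echo "99.9%  (non-zonal): $(allowed_downtime_seconds 9990) s/month"
```

That is 1296 seconds (about 21.6 minutes) against 2592 seconds (about 43.2 minutes): spreading across zones halves the API server downtime the guarantee tolerates.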

Does the AKS control plane have an SLA?

Only on the paid tiers. The Free tier offers a managed control plane with an internal availability target but no financially backed service level agreement. The Standard and Premium tiers include an uptime guarantee for the Kubernetes API server, typically 99.95 percent across availability zones and 99.9 percent without them, in return for a per-cluster management fee. Production clusters should run on a paid tier.

The tier choice has a design consequence that is easy to miss: the availability guarantee covers the API server endpoint, not your application. A 99.95 percent API server does nothing for an application running on a single node in a single zone. Real application availability comes from spreading nodes across availability zones, running enough replicas to survive a node loss, and configuring disruption budgets so that voluntary operations like upgrades do not take down more replicas than the application can spare. The tier protects the brain; you protect the body. Quotas matter here too, because every node is a virtual machine consuming regional compute quota, and a cluster that wants to autoscale into a region without sufficient quota will simply fail to add nodes, presenting as pods stuck waiting for capacity that never arrives.
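The quota failure is predictable with a short calculation: the vCPUs a cluster can demand at its autoscaler ceiling, plus surge nodes during upgrades, must fit inside the regional quota. The sketch below borrows the pool bounds from the cluster created later in this guide, assumes 4 vCPUs for Standard_D4s_v5 and 8 for Standard_D8s_v5, and uses a made-up quota of 100 vCPUs; read real quota with `az vm list-usage --location <region>`. The helper name is hypothetical.

```shell
# Peak vCPU demand for a pool: (max nodes + surge nodes) * vCPUs per node.
# pool_peak_vcpus is a hypothetical helper.
pool_peak_vcpus() {
  echo $(( ($1 + $2) * $3 ))
}

# Illustrative pools: system pool max 6 nodes of 4 vCPU, user pool max 10
# nodes of 8 vCPU, one surge node each during an upgrade.
system_peak=$(pool_peak_vcpus 6 1 4)
user_peak=$(pool_peak_vcpus 10 1 8)
total=$(( $system_peak + $user_peak ))

regional_quota=100   # hypothetical quota; read the real value from Azure
echo "peak vCPUs: $total (quota $regional_quota)"
if [ "$total" -gt "$regional_quota" ]; then
  echo "quota shortfall: $(( $total - $regional_quota )) vCPUs"
fi
```

Here the cluster could ask for 116 vCPUs against 100, so the last scale-out or surge requests would fail and surface as Pending pods. Both sizes belong to the same VM family, so they draw on one family quota as well as the regional total.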

The Cluster Autoscaler and Node Sizing

Scaling an AKS cluster happens at two levels that people routinely conflate. Pod-level scaling adds or removes copies of your workload in response to load, and node-level scaling adds or removes worker machines so there is somewhere for those pods to run. The cluster autoscaler is the node-level mechanism. It watches for pods that cannot be scheduled because no node has room, and when it sees them it asks the underlying scale set to add nodes; conversely, when nodes sit underused for long enough and their pods can be consolidated elsewhere, it drains and removes them. The autoscaler is configured per node pool with a minimum and maximum count, and those bounds are a genuine design decision, not a formality. A maximum set too low caps the cluster’s ability to absorb a surge, and pods pile up waiting; a minimum set too high keeps idle machines running and inflates the bill.
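The scale-up half of that behavior can be approximated as: how many nodes the pending pods would need, capped by the pool’s maximum. The real cluster autoscaler simulates scheduling against node templates and honors affinities, taints, and expander strategies, so the sketch below is a deliberately crude model; `nodes_to_add` is hypothetical, not an AKS or Kubernetes function, and exists only to show why the maximum bound matters.

```shell
# Crude model of a cluster autoscaler scale-up decision.
# Arguments: current nodes, pool max, pending pods, pods that fit per node.
nodes_to_add() {
  current=$1
  max=$2
  pending=$3
  per_node=$4
  # Ceiling division: nodes needed to host every pending pod.
  wanted=$(( ($pending + $per_node - 1) / $per_node ))
  headroom=$(( $max - $current ))
  if [ "$wanted" -gt "$headroom" ]; then wanted=$headroom; fi
  echo "$wanted"
}

# 14 pending pods at 11 per node need 2 nodes; with 9 of 10 nodes
# already running, the pool can add only 1, and 3 pods stay Pending.
nodes_to_add 9 10 14 11
```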

The interaction between pod-level and node-level scaling is where reasoning pays off. If your workload scales its pod count on a metric like processor usage, those new pods need somewhere to land, and only the cluster autoscaler can provide it by growing the node pool. The two must be tuned together: pod scaling that outruns the node pool’s maximum produces pending pods, while a node pool that scales eagerly without pod-level demand produces waste. The full treatment of how horizontal pod scaling, vertical pod scaling, and the cluster autoscaler interlock, including event-driven scaling for workloads that respond to queue depth rather than processor load, is the subject of the dedicated guide to AKS autoscaling with HPA, VPA, and the cluster autoscaler, which extends the node-sizing reasoning introduced here into a complete scaling strategy.
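The horizontal pod autoscaler’s core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that rule in integer shell arithmetic and then checks whether the resulting replicas fit in the node pool at its maximum. It ignores the HPA’s tolerance band, stabilization windows, and replica bounds, assumes the pool runs nothing else, and uses hypothetical helper names and numbers.

```shell
# HPA core rule: desired = ceil(current * currentMetric / targetMetric).
# Metrics are integers here (for example CPU utilization percent).
hpa_desired() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Can the pool, at its autoscaler maximum, hold that many replicas?
# Arguments: replicas, pool max nodes, replicas that fit per node.
fits_in_pool() {
  [ "$1" -le $(( $2 * $3 )) ]
}

replicas=$(hpa_desired 12 90 60)    # 12 pods at 90% CPU, target 60%
echo "HPA wants $replicas replicas"
if fits_in_pool "$replicas" 2 8; then
  echo "node pool max can hold them"
else
  echo "node pool max too low: expect Pending pods"
fi
```

The HPA asks for 18 replicas, but two nodes of eight pods each top out at 16, so the pool maximum, not the HPA, ends up capping throughput.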

Node sizing and autoscaling are not independent choices. A cluster of many small nodes scales in fine-grained steps and packs small pods efficiently, but pays more overhead per node because every node reserves a slice of its capacity for the operating system and kubelet, and it consumes more addresses under CNI. A cluster of fewer large nodes amortizes that overhead and consumes fewer addresses, but scales in coarse jumps and can strand capacity when a large node holds only a few pods. The right answer depends on the pod size distribution of the workload, and the discipline is to choose a node size that lets the autoscaler add capacity in increments that match how the workload actually grows.

The Upgrade and Node-Image Model

Upgrades are the responsibility most often neglected until it becomes urgent, and the neglect is understandable: a cluster that works today gives no daily signal that it is drifting toward an unsupported version. Kubernetes moves quickly, and AKS supports only a window of recent minor versions. Fall behind that window and you lose support, miss security patches, and eventually face a forced upgrade on Microsoft’s timetable rather than your own. The managed label does not exempt you here. AKS makes versions available and provides the upgrade mechanics, but deciding when to upgrade and staying inside the support window are squarely customer-owned, sitting on the right column of the responsibility map.
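Checking whether a cluster is drifting out of the window is a version comparison. AKS has generally supported three generally available minor versions at a time (often described as N-2); treat that window size as an assumption to verify against the current AKS version policy, and list what a region offers with `az aks get-versions --location <region>`. The helpers below are hypothetical, compare minor versions only, assume the same major version, and ignore patch levels and long-term support. The version numbers are illustrative.

```shell
# Extract the minor number from MAJOR.MINOR[.PATCH].
minor_of() {
  echo "$1" | cut -d. -f2
}

# Is the cluster's minor version within the newest `window` minors?
in_support_window() {
  cluster_minor=$(minor_of "$1")
  newest_minor=$(minor_of "$2")
  window=$3
  [ $(( $newest_minor - $cluster_minor )) -lt "$window" ]
}

if in_support_window 1.27.9 1.29.2 3; then
  echo "1.27 is inside an N-2 window"
fi
if ! in_support_window 1.26.6 1.29.2 3; then
  echo "1.26 has fallen out: plan the upgrade now"
fi
```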

There are two distinct things to keep current, and conflating them causes confusion. The Kubernetes version is the orchestrator version itself, the API version your manifests target and the behavior the control plane exhibits. The node image is the operating system image on the worker machines, which receives security patches and component updates independently of the Kubernetes version. You can refresh node images frequently to absorb security fixes without changing the Kubernetes version at all, and you should, because an unpatched node operating system is a standing exposure even on a current Kubernetes version. Treating the two as one thing leads teams either to upgrade Kubernetes more often than necessary chasing patches, or to leave node images stale because they were not aware the two move separately.

The upgrade itself proceeds node by node to preserve availability, and understanding the mechanics explains the address surge mentioned earlier. AKS adds a fresh node running the new version, drains an old node by evicting its pods so they reschedule onto available capacity, removes the drained node, and repeats. During that dance the cluster transiently runs extra nodes, which is why a CNI subnet sized to the exact steady-state node count will exhaust addresses mid-upgrade and stall. The surge is configurable, and a larger surge upgrades faster at the cost of more transient capacity. The eviction step also explains why disruption budgets matter: if an application declares that it must always keep a minimum number of replicas available, the drain will respect that and proceed carefully, whereas an application with no such declaration can have all its replicas evicted at once and suffer a blip. The upgrade exercises every resilience setting you did or did not configure, which is why the first real upgrade is so often where latent fragility surfaces.
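The drain’s respect for disruption budgets reduces to one number: how many more pods may be evicted right now. For a PodDisruptionBudget written with minAvailable, that is the currently healthy replicas minus the minimum, floored at zero, which Kubernetes reports as the budget’s allowed disruptions. The sketch below computes it in shell; the helper name is hypothetical, and it ignores the maxUnavailable and percentage forms.

```shell
# Evictions a PodDisruptionBudget (minAvailable form) permits right now.
disruptions_allowed() {
  healthy=$1
  min_available=$2
  allowed=$(( $healthy - $min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}

# Three healthy replicas with minAvailable=2: the drain may evict one,
# then must wait for a replacement to become healthy before the next.
disruptions_allowed 3 2
```

When minAvailable equals the replica count the result is always zero and the drain cannot proceed, a common reason upgrades stall. `kubectl get pdb` shows the live value in its ALLOWED DISRUPTIONS column.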

# List the Kubernetes versions this cluster can upgrade to
az aks get-upgrades --resource-group myRG --name myAKS --output table

# Upgrade the cluster's Kubernetes version (control plane and, by default, all node pools);
# substitute a version reported by the command above
az aks upgrade --resource-group myRG --name myAKS --kubernetes-version 1.29.2

# Refresh only the node image without changing the Kubernetes version
az aks nodepool upgrade --resource-group myRG --cluster-name myAKS \
  --name userpool --node-image-only

# Configure the upgrade surge for a node pool (how many extra nodes during upgrade)
az aks nodepool update --resource-group myRG --cluster-name myAKS \
  --name userpool --max-surge 33%
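To see what the `--max-surge 33%` setting implies, convert the percentage into surge nodes and then into transient addresses. The sketch assumes a percentage is rounded up to a whole node (verify the rounding rule against the current AKS upgrade documentation) and that, under traditional Azure CNI, each surge node claims one address for itself plus one per pod slot. The helper names and the 10-node, 30-pod figures are hypothetical.

```shell
# Surge nodes for a percentage max-surge, rounded up to a whole node
# (an assumption to verify), with a floor of one node.
surge_nodes() {
  nodes=$1
  percent=$2
  surge=$(( ($nodes * $percent + 99) / 100 ))
  if [ "$surge" -lt 1 ]; then surge=1; fi
  echo "$surge"
}

# Extra subnet addresses those surge nodes hold under traditional Azure CNI.
surge_addresses() {
  echo $(( $1 * ($2 + 1) ))
}

s=$(surge_nodes 10 33)              # 10-node pool, --max-surge 33%
echo "surge nodes: $s"
echo "transient addresses: $(surge_addresses "$s" 30)"   # max 30 pods/node
```

Four surge nodes hold 124 addresses that must be free mid-upgrade on top of the steady-state pool, which is exactly why a subnet sized to the steady-state count stalls the upgrade.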

How do AKS cluster and node upgrades work?

AKS upgrades roll node by node: a new node on the target version joins, an old node is drained and removed, and the cycle repeats until the pool is current. The Kubernetes version and the node operating-system image upgrade separately, so you can patch node images for security without changing the orchestrator version, and you stay supported by remaining inside the recent-version window.

Failure Isolation: Pod, Node, and Control Plane

The responsibility map pays its largest dividend during an incident, because the first diagnostic move on any AKS problem is to locate the failure on the pod, node, or control plane axis. Each level fails for different reasons, surfaces different signals, and points to a different fix, and misreading the level sends engineers chasing the wrong layer for hours.

A pod-level failure is by far the most common, and it is entirely yours. The pod is crashing, restarting, failing to pull its image, or being killed for exceeding its memory limit. The signal lives in the pod’s status, its events, and its container logs. A container that starts and immediately exits enters a restart loop that Kubernetes reports as a crash backoff, and the cause is in your code, your configuration, or a failing health probe, not in the platform. The complete root-cause walkthrough for that specific failure lives in the guide to fixing AKS CrashLoopBackOff, which catalogs the distinct causes and the command that confirms each. When the failure is instead a pod that never starts because nothing will schedule it, the signal is a Pending status with a scheduling event explaining why, and the causes (insufficient capacity, an unsatisfiable affinity, a taint with no matching toleration, an unbound storage claim) are walked through in the companion guide to fixing AKS pods stuck in Pending.

A node-level failure is less common and partly yours. A node that reports NotReady typically has a sick kubelet, a network partition, exhausted disk, or memory pressure severe enough that it is evicting pods to stay alive. The signal lives in the node’s status and conditions. Some node failures resolve by reimaging or replacing the scale set instance, since the node is cattle; others reveal a capacity or configuration problem you must address, such as a disk that keeps filling because of unrotated logs. The node sits at the boundary of the responsibility map: the image came from Microsoft, but the sizing, the disk, and the workload density that pushed it into pressure came from you.

A control plane failure is rare and largely Microsoft’s. If the API server is genuinely unreachable, kubectl times out against the endpoint. Once you have ruled out your own network path to it, the issue may be on the managed side, and the appropriate move is a support case, backed on a paid tier by its service level agreement. The crucial discipline is to reach this conclusion last, not first. The overwhelming majority of “the cluster is broken” reports are pod-level failures misdiagnosed as control plane outages, and the responsibility map exists precisely to redirect that first instinct toward the layer that actually broke.
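The triage order can be encoded as a first-pass lookup from the symptom you see to the plane and the likely owner. The sketch below is a hypothetical helper, not an AKS tool, and it simply restates the three paragraphs above. Most status strings are ones kubectl reports for pods and nodes (CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, NotReady, MemoryPressure, DiskPressure); `APITimeout` is a made-up label for kubectl timing out against the API server.

```shell
# First-pass triage: map an observed symptom to plane and likely owner.
classify_failure() {
  case "$1" in
    CrashLoopBackOff|ImagePullBackOff|ErrImagePull|OOMKilled|Pending)
      echo "pod: customer" ;;
    NotReady|MemoryPressure|DiskPressure)
      echo "node: shared (image from AKS, sizing and density yours)" ;;
    APITimeout)
      echo "control-plane: Microsoft, after ruling out your network path" ;;
    *)
      echo "unknown: start at the pod level" ;;
  esac
}

for symptom in CrashLoopBackOff NotReady APITimeout; do
  echo "$symptom -> $(classify_failure "$symptom")"
done
```

The default branch deliberately points at the pod level, mirroring the discipline of suspecting the control plane last.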

The Configuration That Matters: Standing Up a Cluster on Purpose

Bringing the threads together, here is what creating a cluster deliberately rather than by default looks like, with the choices that the responsibility map says are yours made explicitly rather than accepted. The command below is illustrative; verify that the VM sizes and availability zones it names are offered in your region before running it.

# Create a resource group
az group create --name myRG --location eastus

# Create a cluster with deliberate choices:
# - Standard tier for a financially backed control-plane SLA
# - Azure CNI Overlay so pods do not drain the subnet
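# - A small initial node pool, which becomes the system pool (taint it with
#   CriticalAddonsOnly=true:NoSchedule if it should run only system pods)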
# - Availability zones for node spread
# - Managed Entra integration with Azure RBAC for Kubernetes
# - The cluster autoscaler bounded with a sensible min and max
az aks create \
  --resource-group myRG \
  --name myAKS \
  --tier standard \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --node-count 2 \
  --node-vm-size Standard_D4s_v5 \
  --zones 1 2 3 \
  --enable-aad \
  --enable-azure-rbac \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 6 \
  --generate-ssh-keys

# Add a dedicated user node pool for applications
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myAKS \
  --name userpool \
  --mode User \
  --node-vm-size Standard_D8s_v5 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10 \
  --zones 1 2 3

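# Pull credentials so kubectl can talk to the cluster.
# With Entra integration the kubeconfig relies on the kubelogin plugin
# (az aks install-cli installs it); converting reuses your Azure CLI sign-in
az aks get-credentials --resource-group myRG --name myAKS
kubelogin convert-kubeconfig -l azurecli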

# Confirm the nodes are present and Ready
kubectl get nodes -o wide

Every flag in that command corresponds to a decision the responsibility map assigned to you. The tier sets the guarantee behind the control plane. The plugin and mode decide whether pods will exhaust your subnet. The zones decide whether a single data-center fault can take the whole cluster down. The Entra and Azure RBAC flags decide how identity is governed. The autoscaler bounds decide how the cluster absorbs load and how much idle capacity it carries. A cluster created with only the bare minimum flags works in a demo but falls short of its first production requirement. Each of these decisions silently took a default chosen for convenience rather than for your workload.

The Add-On Ecosystem and What It Quietly Owns

A bare cluster is rarely what teams run, because AKS ships a catalog of managed add-ons that bolt capabilities onto the cluster:

- monitoring that streams container metrics and logs into a workspace
- a managed ingress option
- key management integration that mounts secrets from a vault into pods
- policy enforcement that blocks non-compliant workloads at admission
- a service mesh add-on for traffic management between services

These add-ons blur the responsibility line in a useful way, because Microsoft takes over more of the operation of each, but they do not erase it. You still decide which add-ons to enable, you still configure their policies, and you still pay for the resources they consume. The log ingestion bill is a frequent surprise for teams that enabled verbose monitoring without considering volume.
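Treating add-ons as deliberate inclusions starts with knowing what is already on. This is a sketch using the Azure CLI's add-on commands against this guide's example cluster. The two add-ons enabled here are examples, not recommendations.

```bash
# Sketch: see which add-ons are on, then enable only the ones you have decided to own.
az aks addon list --resource-group myRG --name myAKS --output table

# Policy enforcement and Key Vault secrets mounting, enabled deliberately
az aks enable-addons --resource-group myRG --name myAKS \
  --addons azure-policy,azure-keyvault-secrets-provider

# Turning one off again is symmetrical, for example:
# az aks disable-addons --resource-group myRG --name myAKS --addons azure-keyvault-secrets-provider
```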

The mental discipline with add-ons is to treat each as a deliberate inclusion rather than a free default. Monitoring that captures everything produces a log bill that can rival the compute bill. A policy add-on that blocks workloads lacking resource limits is excellent governance, but it will reject deployments that worked yesterday. Roll it out in an audit-only mode first and switch to enforcement once the violations are understood. The ingress add-on simplifies getting traffic into the cluster, though many teams still prefer to install and manage their own ingress controller for the control it affords, a path covered end to end in the guide to configuring AKS ingress with NGINX and TLS. The pattern repeats: managed convenience on one side, configuration and cost ownership on the other.

When to Use AKS and When to Reach for Something Simpler

AKS is powerful, and power has a price in operational surface area. The honest question before adopting it is whether the workload needs Kubernetes at all. Many teams running clusters would be better served by a simpler container platform that hides the orchestrator entirely. The deciding factor is not the size of the application but the shape of the operational requirements.

Reach for AKS when you need the full Kubernetes feature set: fine-grained scheduling control, a rich ecosystem of operators and controllers, multi-container pods with sidecars, advanced traffic management, portability across clouds because your manifests are standard Kubernetes, or simply because your team already has deep Kubernetes expertise and tooling that you want to reuse. AKS is also the right call when you are running many heterogeneous services that benefit from sharing a cluster’s capacity and from the bin-packing efficiency that a scheduler provides.

Reach for something simpler when you are running a handful of stateless services that scale on processor or request load, when nobody on the team wants to operate node pools and upgrades, and when the Kubernetes feature set is more than the workload needs. A serverless container platform that scales to zero and abstracts the nodes away removes the entire right column of the responsibility map from your plate, at the cost of the control that column represents. The trade-off is real and bidirectional: you give up scheduling control, certain networking models, and ecosystem breadth in exchange for never sizing a node or running an upgrade again. For event-driven and bursty workloads especially, the simpler platform often wins decisively, and choosing AKS for them means signing up to operate a fleet you did not need.

When should I choose AKS over a simpler container platform?

Choose AKS when you genuinely need Kubernetes: fine-grained scheduling, sidecars, operators, multi-cloud portability, or existing team expertise and tooling. Choose a simpler serverless container platform when you run a few stateless services, want the nodes abstracted away, and do not need the full feature set, because that path removes node pools and upgrades from your responsibilities entirely.

The strongest version of this reasoning treats the responsibility map as a cost rather than a feature list. Everything in the customer-owned column is operational work you are agreeing to perform forever. If the workload's requirements justify that work, AKS is an excellent home for it. If they do not, the same workload on a simpler platform frees the team to spend its attention on the application instead of the infrastructure. The mistake is adopting Kubernetes because it is the prestigious choice rather than because the workload's requirements call for it.

The Single Best Way to Think About AKS

If you keep one idea from this guide, keep the managed boundary. AKS is a managed control plane bolted to an unmanaged worker fleet, network model, identity wiring, and upgrade cadence. The control plane being managed is genuine and valuable, removing the hardest and most dangerous part of running Kubernetes yourself, but its scope is narrow. Every reliability decision that actually determines whether your cluster stays healthy in production lives on your side of that boundary: how you size and spread the nodes, how you plan the address space, how you govern identity, how you bound the autoscaler, and how diligently you upgrade. The platform hands you a working brain and asks you to build and operate the body responsibly.

This framing turns vague anxiety into a checklist. Instead of worrying that the cluster might break in some unknowable way, you can walk the right column of the responsibility map and ask, for each row, whether you have made the decision deliberately. Did you choose the tier for the guarantee the workload needs, or accept the free default into production? Did you size the subnet for the cluster’s maximum growth, or for its launch size? Did you separate system and user workloads so an application surge cannot evict the cluster’s own components? Did you bound the autoscaler so it can absorb a surge without running idle machines? Are you upgrading inside the support window, and refreshing node images for security between version upgrades? A cluster that can answer yes to those questions is a cluster operated on purpose, and that is the entire difference between using AKS and understanding it.
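Most of that checklist can be read back from the cluster rather than from memory. This is a sketch: the JMESPath queries assume the property names of the AKS managed-cluster and agent-pool resources as the Azure CLI returns them, so confirm them against your CLI version. Subnet sizing is not covered here because it lives on the virtual network.

```bash
# Sketch: read back the decisions the checklist asks about.
az aks show --resource-group myRG --name myAKS \
  --query "{tier:sku.tier, version:kubernetesVersion, plugin:networkProfile.networkPlugin, mode:networkProfile.networkPluginMode, autoUpgrade:autoUpgradeProfile.upgradeChannel}" \
  --output table

# Per-pool mode, zones, autoscaler bounds, and taints (system/user separation)
az aks nodepool list --resource-group myRG --cluster-name myAKS \
  --query "[].{name:name, mode:mode, zones:join(',', availabilityZones || \`[]\`), min:minCount, max:maxCount, taints:join(',', nodeTaints || \`[]\`)}" \
  --output table

# Which Kubernetes versions can the cluster move to from here?
az aks get-upgrades --resource-group myRG --name myAKS --output table
```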

Strategic Verdict

Azure Kubernetes Service is the right platform for teams that need Kubernetes and are prepared to own the worker fleet, the network, identity, and upgrades that the managed control plane does not cover. Its great strength is removing the most failure-prone part of running the orchestrator while leaving you in full control of everything that shapes your workload’s behavior. Its great risk is the mismatch between the word “managed” and the reality of the responsibility map, a mismatch that leads teams to under-invest in the operational work that the platform quietly assigned them, and then to be surprised when the consequences arrive as address exhaustion, unsupported versions, or an application that the control plane’s service level agreement was never going to protect.

Adopt it deliberately. Decide whether the workload’s requirements actually name Kubernetes as the answer, and if they do not, prefer a simpler platform that takes the right column off your hands. If they do, then design the cluster against the responsibility map rather than against a portal default: a paid tier for production, an address plan sized for maximum growth, a separated system pool, zone spread, bounded autoscaling, governed identity, and a disciplined upgrade rhythm. The teams that treat the customer-owned column as the real work of running AKS are the ones whose clusters are quiet, and the teams that treat “managed” as a promise that the work is done are the ones who learn the responsibility map one incident at a time.

To put the model into practice on a real cluster, stand one up and inspect each layer for yourself: create a cluster, list its node pools, examine the network model it provisioned, and watch the autoscaler add and remove nodes under load. You can run the hands-on Azure labs and command library on VaultBook to reproduce every command in this guide in a sandbox, walk the node-pool and networking choices interactively, and confirm the responsibility map by breaking and fixing each layer in a place where breaking things is the point.
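The "watch the autoscaler" part of that exercise can be sketched with plain kubectl. The deployment name, replica count, and request sizes are hypothetical. The pause image does nothing except hold its resource reservation, which is all the scheduler looks at.

```bash
# Sketch: create pending pods and watch the cluster autoscaler respond.
kubectl create deployment autoscale-demo --image=registry.k8s.io/pause:3.9 --replicas=1
kubectl set resources deployment autoscale-demo --requests=cpu=500m,memory=512Mi

# Ask for more than the current nodes can hold
kubectl scale deployment autoscale-demo --replicas=40

# Pods the scheduler cannot place yet
kubectl get pods -l app=autoscale-demo --field-selector=status.phase=Pending

# Nodes joining as the autoscaler reacts (Ctrl+C to stop watching)
kubectl get nodes --watch

# Clean up; idle nodes are removed after the autoscaler's scale-down delay
kubectl delete deployment autoscale-demo
```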

Getting Traffic In: Services, the Load Balancer, and Ingress

A workload nobody can reach is not much use, and how traffic enters an AKS cluster is one of the areas where the managed integration and your own configuration meet. Inside the cluster, a Kubernetes Service gives a stable address and name to a set of pods that come and go, so callers do not chase individual pod addresses. The interesting part is how a Service that needs to be reachable from outside the cluster gets a public entry point, and this is where AKS reaches back into Azure on your behalf.

When you declare a Service of the load-balancer type, AKS responds by provisioning an Azure load balancer and wiring a public address through to the pods behind the Service. The integration is genuinely managed: you describe the intent in Kubernetes terms and the platform creates and configures the Azure resource. What remains yours is the decision of how many such entry points to create, because a separate load-balanced Service for every microservice is both wasteful and operationally noisy. The common and better pattern is a single ingress controller, one load-balanced entry point that terminates traffic and routes it to many internal Services by hostname and path, so one public address fronts the whole application. The controller itself, the routing rules, and the certificates that secure the traffic are yours to install and maintain, which is exactly the work covered in the guide to configuring AKS ingress with NGINX and TLS.
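Declaring that intent takes one imperative command. The deployment name "web" is hypothetical. The Service's target port defaults to the same value as --port unless you set --target-port.

```bash
# Sketch: expose a hypothetical deployment named "web" through an Azure load balancer.
kubectl expose deployment web --type=LoadBalancer --port=80

# EXTERNAL-IP shows <pending> until AKS finishes provisioning the Azure resources
kubectl get service web --watch
```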

The networking model from earlier shapes what is possible here. Under a model where pods hold routable addresses, the path from the load balancer to the pod is direct, while under address-translated models there is an extra hop. The outbound direction matters too. Pods reaching out to the internet do so through an outbound path that AKS configures, and at scale the number of simultaneous outbound connections can exhaust the available source network address translation (SNAT) ports. That failure presents as intermittent connection problems to external services under load. SNAT port exhaustion is a classic right-column surprise, invisible at low traffic and disruptive at high traffic. You solve it by provisioning the outbound path deliberately, for example with more outbound IP addresses or a NAT gateway, rather than accepting the default for a high-connection workload.
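Provisioning the outbound path deliberately can be sketched as follows. The numbers are illustrative and assume the standard load balancer's model, where each outbound IP contributes roughly 64,000 SNAT ports shared across nodes. Check that your maximum node count multiplied by the per-node port allocation fits within the number of IPs multiplied by 64,000.

```bash
# Sketch: more outbound capacity on the standard load balancer.
# 4 IPs x ~64,000 ports / 4,000 ports per node leaves room for about 64 nodes.
az aks update --resource-group myRG --name myAKS \
  --load-balancer-managed-outbound-ip-count 4 \
  --load-balancer-outbound-ports 4000

# Alternative at creation time: a managed NAT gateway for egress
# (the other az aks create flags from earlier are omitted here)
# az aks create ... --outbound-type managedNATGateway --nat-gateway-managed-outbound-ip-count 2
```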

How does a Service get a public IP on AKS?

You declare a Kubernetes Service of the load-balancer type, and AKS provisions an Azure load balancer with a public address and routes it to the pods behind the Service. For most applications a single ingress controller is preferable to many load-balanced Services, because one public entry point can route by hostname and path to every internal Service behind it.

Storage: Persistent Volumes, Disks, and Files

Containers are ephemeral by design, so any state that must survive a pod restart needs storage that outlives the pod, and AKS provides this through the standard Kubernetes storage abstractions backed by Azure storage services. A persistent volume claim is a request for durable storage, and AKS satisfies it by provisioning the underlying Azure resource through a storage driver. The two common backings are Azure managed disks, which attach to a single node and suit workloads that need fast block storage owned by one pod at a time, and Azure Files shares, which can be mounted by many pods at once and suit workloads that need shared access across nodes.
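The two backings map onto two claims. This is a sketch against storage classes AKS ships by default. The claim names and sizes are hypothetical, so confirm the class names on your cluster with kubectl get storageclass.

```bash
# Sketch: one disk-backed claim and one Files-backed claim.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data              # block storage, one node at a time
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-csi-premium
  resources:
    requests:
      storage: 64Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content       # a Files share many pods can mount
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 100Gi
EOF

kubectl get pvc
```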

The decision between them follows the access pattern. A database that wants exclusive, high-performance block storage takes a disk, and the disk’s performance tier, the same tiers that govern standalone virtual machine disks, sets the throughput and operation ceiling the workload can reach. A workload where several pods must read and write the same files takes a Files share, accepting the network-file-system performance characteristics that come with shared access. A subtle constraint follows from the disk model: because a managed disk attaches to one node, a pod using a disk can only run where its disk can attach, which in a zone-spread cluster means the pod and its disk must land in the same zone. Forgetting this produces a pod that cannot schedule because the only node with room is in a different zone from its disk, a Pending failure whose root cause is the storage topology rather than capacity.
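One way to soften the zone constraint is a disk class that provisions only once the pod has a node and uses zone-redundant storage. This is a sketch: the class name and the pod name are hypothetical, and zone-redundant disk availability varies by region, so check yours first.

```bash
# Sketch: zone-redundant premium disks, bound only once the pod is scheduled.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-premium-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
EOF

# A pod stuck Pending on storage topology usually says so in its events,
# e.g. "node(s) had volume node affinity conflict"
POD=db-0   # hypothetical pod name
kubectl describe pod "$POD"
```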

Storage sits firmly in the customer-owned column. The driver integration is managed, but choosing the storage type, the performance tier, the access mode, and the data-protection approach is yours, as is remembering that durable state in a cluster is a liability if it is not backed up, because the cluster’s self-healing protects the pods and not the data they hold. A node failure that the cluster heals by replacing the node does nothing for data that lived only on that node’s local storage, which is why genuinely stateful workloads use the durable Azure-backed volumes rather than the node’s own scratch space.

The Access Model: Credentials and kubectl

Talking to a cluster means presenting credentials to its API server, and AKS offers two flavors that are frequently confused. The administrative credentials are a static, certificate-based path that grants full control and bypasses the directory integration entirely, which is exactly why they are dangerous: they cannot be governed by conditional access, cannot be tied to a person’s directory identity, and cannot be revoked short of rotating the cluster’s certificates. The user credentials, when the cluster is integrated with the directory, route authentication through Entra so that access reflects directory identity and policy. The practical guidance is to disable the static administrative path on any cluster that matters and to grant access exclusively through the governed identity path, so that every action against the API server is attributable and revocable.

# Pull governed, Entra-backed credentials (preferred)
az aks get-credentials --resource-group myRG --name myAKS

# The static admin credentials bypass identity governance; avoid on real clusters
# az aks get-credentials --resource-group myRG --name myAKS --admin

# Inspect what the current identity is allowed to do
kubectl auth can-i --list

# A private API server is reachable only from inside the network;
# from outside you connect through a bastion, jump host, or the run-command path
az aks command invoke --resource-group myRG --name myAKS \
  --command "kubectl get pods -A"
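Disabling the static administrative path is an update on an Entra-integrated cluster. This is a sketch using Azure CLI flags for local accounts and certificate rotation. Admin credentials handed out before the change are certificate-based, so rotating the cluster certificates is what actually revokes them. Rotation is disruptive and belongs in a maintenance window.

```bash
# Sketch: stop the cluster issuing --admin (local account) credentials.
az aks update --resource-group myRG --name myAKS --disable-local-accounts

# Confirm the setting took effect
az aks show --resource-group myRG --name myAKS --query disableLocalAccounts

# Revoke admin kubeconfigs already in circulation by rotating certificates
az aks rotate-certs --resource-group myRG --name myAKS
```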

The access model intersects with the public-versus-private API endpoint decision. A public endpoint with strong identity in front of it is reachable from anywhere your engineers work, which is convenient and, with the directory integration enforced, defensible. A private endpoint is reachable only from inside your network, which is stronger but means engineers and pipelines must reach it through a network path you provide, whether a jump host, a bastion, or the platform’s run-command capability that proxies a command through the managed side. Neither is universally correct, and the trade-off between reachability and exposure is part of the broader hardening discussion that the securing AKS clusters guide develops in full.
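Both ends of that trade-off can be sketched with Azure CLI flags. The address range below is a reserved documentation range standing in for your own egress addresses. The private option is shown as a creation-time choice.

```bash
# Option 1 sketch: keep the public endpoint, but accept only known source ranges.
az aks update --resource-group myRG --name myAKS \
  --api-server-authorized-ip-ranges 203.0.113.0/24

# Option 2 sketch: a private API server, chosen when the cluster is created
# (the other flags from the earlier create command are omitted here)
# az aks create --resource-group myRG --name myPrivateAKS --enable-private-cluster ...
```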

Recurring Misdiagnoses and How to Avoid Them

The same wrong turns appear again and again across teams new to the platform, and naming them is the fastest way to skip the lessons that would otherwise be learned painfully. The first is treating the control plane as the thing to debug. A pod is failing, the application is down, and the instinct is to suspect the managed Kubernetes layer, when the failure is almost always in the pod itself. The responsibility map redirects this instinct: start at the pod, read its status, events, and logs, and only climb toward the node and then the control plane as each lower layer is cleared. The overwhelming majority of incidents never leave the pod level, and the rare control plane problem reveals itself as a genuinely unreachable API server, not as a single workload misbehaving.
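Starting at the pod means a short, fixed sequence of reads. The namespace and pod names are hypothetical placeholders. An OOMKilled reason at the end points at a memory limit rather than at the platform.

```bash
# Sketch: start at the pod and climb only when each layer is cleared.
NS=shop                          # hypothetical namespace
POD=checkout-5d9c7b8f6d-abcde    # hypothetical pod name

kubectl get pods -n "$NS" -o wide                 # status, restart count, node
kubectl describe pod "$POD" -n "$NS"              # events: scheduling, image pulls, probes
kubectl logs "$POD" -n "$NS" --previous           # output of the last crashed container
kubectl get events -n "$NS" --sort-by=.lastTimestamp

# Why did the last container instance die?
kubectl get pod "$POD" -n "$NS" \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
```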

The second is undersizing the virtual network for pod addresses under Azure CNI. A cluster launches comfortably, runs fine for months, then hits an address ceiling during a scale-up or an upgrade and cannot place new pods. The fix at that point is disruptive, often a rebuild in a larger subnet, and the prevention is trivial: compute the maximum addresses the cluster can ever consume at full scale plus the upgrade surge, and provision the subnet for that number from the start, or sidestep the whole class of problem with an overlay model that keeps pod addresses out of the subnet entirely. The arithmetic costs ten minutes at creation time and saves a rebuild later.
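The ten-minute arithmetic can be sketched in bash. It assumes traditional Azure CNI with static per-node allocation, where each node takes one address for itself plus one per pod slot. It also assumes upgrade surge nodes consume the same and that Azure reserves five addresses in every subnet. Verify these against current documentation, because dynamic allocation modes change the math.

```bash
#!/usr/bin/env bash
# Sketch: subnet sizing for traditional Azure CNI (assumptions in the lead-in).

# required_ips MAX_NODES SURGE_NODES MAX_PODS_PER_NODE
required_ips() {
  local nodes=$(( $1 + $2 ))          # autoscaler maximum plus upgrade surge
  echo $(( nodes + nodes * $3 ))      # one address per node + one per pod slot
}

# smallest_prefix REQUIRED_IPS -> smallest CIDR prefix length that fits
smallest_prefix() {
  local need=$(( $1 + 5 ))            # Azure reserves 5 addresses per subnet
  local prefix=29                     # /29 is Azure's smallest subnet
  while (( (1 << (32 - prefix)) < need )); do
    (( prefix-- ))
  done
  echo "$prefix"
}

# Example: pool max of 10 nodes, 1 surge node, 30 pods per node
ips=$(required_ips 10 1 30)
echo "addresses needed: $ips, subnet: /$(smallest_prefix "$ips")"
# prints: addresses needed: 341, subnet: /23
```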

The third is ignoring node-pool upgrades until a version falls out of support. Because a working cluster gives no daily prompt to upgrade, the work slips, the version ages, and eventually support lapses or a forced upgrade arrives at an inconvenient moment. The prevention is to treat upgrades as routine maintenance with a schedule rather than as an event triggered by an external deadline, refreshing node images for security between version upgrades and moving Kubernetes versions forward inside the support window deliberately. A fourth, related misdiagnosis is assuming the control plane’s availability guarantee protects the application; it protects the API server, and application availability comes from replicas, zone spread, and disruption budgets that you configure. Each of these is a right-column responsibility misread as a left-column guarantee, which is the through-line of nearly every avoidable AKS incident.
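A routine upgrade rhythm can be sketched with the Azure CLI's upgrade commands. The pool name matches the earlier creation example, and the version string is a placeholder to replace with a value from the get-upgrades output. The auto-upgrade channels at the end are one way to let the platform keep pace. Whether they suit your change-control process is a judgment call.

```bash
# Sketch: make upgrades routine rather than deadline-driven.
az aks get-upgrades --resource-group myRG --name myAKS --output table

# Refresh a pool's node image (OS security patches) without changing Kubernetes version
az aks nodepool upgrade --resource-group myRG --cluster-name myAKS \
  --name userpool --node-image-only

# Move Kubernetes forward inside the support window
TARGET_VERSION="x.y.z"   # placeholder: choose from the get-upgrades output
az aks upgrade --resource-group myRG --name myAKS --kubernetes-version "$TARGET_VERSION"

# Or delegate the cadence: patch-level auto-upgrades plus node image updates
az aks update --resource-group myRG --name myAKS \
  --auto-upgrade-channel patch --node-os-upgrade-channel NodeImage
```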

Requests, Limits, and Bin-Packing: How the Scheduler Decides

The scheduler’s job is to place pods onto nodes, and it makes that decision almost entirely on the basis of two numbers you attach to each container: the request and the limit. The request is what the pod reserves, the amount of processor and memory the scheduler sets aside for it on a node, and a node is considered full when the sum of its pods’ requests reaches its allocatable capacity, regardless of how much those pods are actually using at the moment. The limit is the ceiling, the most a container may consume before the platform throttles its processor or kills it for exceeding its memory allowance. Getting these two numbers right is the quiet foundation of an efficient, reliable cluster, and getting them wrong is behind a startling share of both waste and instability.

Set requests too high and the cluster reserves capacity that pods never use, so nodes fill on paper while sitting idle in reality, the autoscaler adds machines to satisfy reservations that correspond to no actual demand, and the bill climbs for nothing. Set requests too low and the scheduler packs more pods onto a node than it can truly support, the node comes under memory pressure, and the platform begins evicting pods to save itself, producing seemingly random restarts that are in fact the predictable result of overcommitment. The discipline is to base requests on observed usage, measuring what a workload actually consumes under representative load and setting the request to its steady-state need with modest headroom, then revisiting as the workload evolves.
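The request-based packing can be sketched in a few lines of bash. The node's allocatable figures are hypothetical (read real ones with kubectl describe node), and the sketch ignores DaemonSet and system pods that also hold requests. Doubling the CPU request more than halves the pods per node here, with no change in actual usage.

```bash
#!/usr/bin/env bash
# Sketch: how many identical pods fit on one node, judged only by requests.
# Units: CPU in millicores, memory in MiB. Actual usage never enters the math.

# pods_per_node ALLOC_CPU_M ALLOC_MEM_MI REQ_CPU_M REQ_MEM_MI
pods_per_node() {
  local by_cpu=$(( $1 / $3 )) by_mem=$(( $2 / $4 ))
  # The scarcer resource sets the ceiling
  if (( by_cpu < by_mem )); then echo "$by_cpu"; else echo "$by_mem"; fi
}

# Hypothetical node with 3860m CPU and 12000Mi memory allocatable
echo "request 500m/1Gi:  $(pods_per_node 3860 12000 500 1024) pods"
echo "request 1000m/1Gi: $(pods_per_node 3860 12000 1000 1024) pods"
```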

Limits carry their own subtlety, particularly for memory. A container that exceeds its memory limit is terminated, and if the workload’s memory use is spiky, a limit set at the average will kill it during normal peaks, presenting as a crash loop whose root cause is a limit set below the workload’s real peak need rather than a bug in the code. Processor limits behave differently, throttling rather than killing, which can quietly degrade latency-sensitive workloads in a way that is hard to spot because nothing crashes; the application simply gets slower under load as it is held below its limit. The relationship between requests and limits also sorts pods into quality-of-service tiers that determine which pods the platform sacrifices first under pressure, so a workload that must survive node pressure should have its requests and limits set so it falls into the most protected tier rather than the most expendable one. None of this is configured by AKS for you. It is workload design, sitting in the customer-owned column, and it is where the largest reliability and cost gains usually hide.
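The quality-of-service sorting follows simple rules, sketched here for a single-container pod. As a simplifying assumption, the sketch compares values as strings, so pass requests and limits in the same units ("500m" and "0.5" would wrongly look different). Kubernetes itself compares quantities.

```bash
#!/usr/bin/env bash
# Sketch of Kubernetes QoS classification for a single-container pod.

# qos_class REQ_CPU LIM_CPU REQ_MEM LIM_MEM   (pass "" for unset)
qos_class() {
  local req_cpu=$1 lim_cpu=$2 req_mem=$3 lim_mem=$4
  # A limit with no request makes Kubernetes default the request to the limit
  if [[ -z $req_cpu && -n $lim_cpu ]]; then req_cpu=$lim_cpu; fi
  if [[ -z $req_mem && -n $lim_mem ]]; then req_mem=$lim_mem; fi

  if [[ -z "$req_cpu$lim_cpu$req_mem$lim_mem" ]]; then
    echo BestEffort   # nothing set: typically evicted first under pressure
  elif [[ -n $lim_cpu && -n $lim_mem && $req_cpu == "$lim_cpu" && $req_mem == "$lim_mem" ]]; then
    echo Guaranteed   # requests equal limits for CPU and memory: most protected
  else
    echo Burstable    # anything in between
  fi
}

qos_class "" "" "" ""           # BestEffort
qos_class 500m 500m 1Gi 1Gi     # Guaranteed
qos_class 250m 500m 512Mi 1Gi   # Burstable
```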

How do resource requests and limits affect scheduling on AKS?

Requests are reservations the scheduler uses to decide whether a pod fits on a node, so a node fills when its pods’ total requests reach its capacity, not when actual usage does. Limits cap consumption: a container over its memory limit is killed and over its processor limit is throttled. Right-sizing both prevents the twin failures of idle waste and overcommitment-driven eviction.

Designing for Availability: Zones, Replicas, and Disruption Budgets

The control plane’s uptime guarantee, discussed earlier, protects the API server and says nothing about whether your application survives a node failure, a zone outage, or a routine upgrade. Application availability is something you engineer in the data plane, and it rests on three pillars that work together. The first is running more than one replica of anything that must stay available, so the loss of a single pod or the node beneath it does not take the workload offline. A single-replica deployment is a single point of failure no matter how reliable the platform beneath it, because the platform will faithfully restart that one replica but cannot serve traffic during the gap.

The second pillar is spreading those replicas across failure domains, which on Azure means availability zones. A node pool that spans three zones places its nodes in physically separate sections of a region with independent power and cooling, so a fault confined to one zone leaves nodes in the other two serving traffic. Combining zone spread with a scheduling rule that discourages placing all replicas of a workload in the same zone turns the zone architecture into real resilience rather than a configuration that merely happens to span zones while concentrating every replica in one. The cluster does not do this for you automatically in every case; the spread constraints are workload configuration you supply.
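A spread constraint in the workload's own manifest can be sketched as follows. The app name, image, and requests are placeholders. ScheduleAnyway makes the zone spread a preference, while DoNotSchedule would make it a hard rule that can leave pods Pending.

```bash
# Sketch: three replicas of a hypothetical "web" app, nudged across zones.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
EOF

# Where did the replicas land? Pod -> node, then node -> zone
kubectl get pods -l app=web -o wide
kubectl get nodes -L topology.kubernetes.io/zone
```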

The third pillar is the pod disruption budget, which is the mechanism that protects availability during voluntary disruptions such as the node-by-node drain of an upgrade. A disruption budget declares the minimum number of replicas that must remain available, and the platform honors it when draining nodes, refusing to evict so many replicas at once that the workload would drop below its floor. Without a disruption budget, an upgrade can drain a node and evict every replica of a workload simultaneously, producing an outage during what was supposed to be routine maintenance. With one, the same upgrade proceeds carefully, waiting for evicted replicas to come back elsewhere before evicting more. The upgrade, the autoscaler’s node removals, and any manual node maintenance all respect the budget, which is why it is the single most cost-effective availability setting a team can add: a few lines of configuration that convert a fragile workload into one that survives the operations the platform performs on it routinely. Each of these pillars is data-plane design, owned by you, and together they are what make the difference between a cluster that is available because nothing has gone wrong yet and one that stays available when something does.
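A disruption budget really is a few lines of configuration. This sketch targets a hypothetical "web" app running three replicas.

```bash
# Sketch: never let voluntary disruptions take "web" below two available replicas.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

# ALLOWED DISRUPTIONS shows how many pods a drain may evict right now
kubectl get pdb web-pdb
```

Setting minAvailable equal to the replica count leaves zero allowed disruptions, which can stall a node drain during an upgrade. Keep at least one replica of slack.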

Why the Managed Control Plane Is Worth More Than It Looks

It is easy to undervalue the managed control plane precisely because you never see it work, so it helps to spell out what running it yourself would entail. The etcd datastore that records the cluster’s entire desired state is a distributed, quorum-based system, which means it needs an odd number of members spread for fault tolerance, careful attention to disk latency because slow storage stalls the whole cluster, regular backups whose restoration you have actually tested, and version upgrades coordinated with the rest of the orchestrator. A corrupted or quorum-lost etcd is a cluster-wide outage of the worst kind, the sort where the cluster has forgotten what it was supposed to be doing. The API server in front of etcd must be made highly available, secured, and scaled to absorb the request load that every node and controller generates, and the schedulers and controllers behind it must be kept healthy and consistent. Operating all of this correctly is a specialized discipline, and getting it wrong takes down everything at once.

AKS lifts that entire burden, and the value is proportional to how badly self-managed control planes fail when neglected. This reframes the responsibility map in a more sympathetic light: the platform took the most dangerous, most specialized, least differentiated work off your plate and left you the work that is specific to your application and your environment. Sizing your nodes, planning your addresses, and governing your identity are not arbitrary chores; they are the decisions that genuinely depend on what you are running and therefore cannot be made for you by a platform that does not know your workload. The split is not Microsoft keeping the easy part and handing you the hard part. It is Microsoft taking the universal, perilous part and leaving you the part that is irreducibly yours because only you have the context to decide it.

What does it cost me to run the control plane myself instead of using AKS?

It costs the specialized, continuous work of operating a quorum-based etcd datastore, a highly available and secured API server, and the schedulers and controllers behind it, where a single mistake can cause a cluster-wide outage. AKS removes that universal, high-risk burden entirely, which is why even teams with deep Kubernetes skill usually let the platform own the control plane.

The platform also offers a spectrum of how much management you delegate, and knowing where you sit on it clarifies your responsibilities. At one end is a cluster you configure in detail, choosing every node size, plugin, and pool layout yourself, which is the model this guide has mostly described and which gives you the most control over the customer-owned column. Toward the other end is a more opinionated, automated experience that provisions production-ready clusters with infrastructure operations such as node provisioning, scaling, and network configuration handled for you according to embedded best practices, trading some of that control for less operational surface area. The automated experience does not erase the responsibility map; it applies sensible defaults to more of it and operates more of it on your behalf, which suits teams that want Kubernetes without configuring every knob. Choosing where to sit on this spectrum is itself a design decision: more control where your workload genuinely needs it, more automation where it does not, and an honest assessment of which rows of the responsibility map your team actually wants to own. Verify the current capabilities and availability of the automated option against official documentation, as the managed experience continues to expand.

Observability: Seeing Into a Cluster You Will Have to Debug

A cluster you cannot see into is a cluster you cannot operate, and observability is the quiet prerequisite for every diagnostic move this guide has described. AKS integrates with Azure’s monitoring stack to stream node and pod metrics, container logs, and control-plane logs into a workspace, and it can expose metrics in the open Prometheus format that the wider Kubernetes ecosystem expects, paired with dashboards for visualization. Enabling this is a customer decision with a customer cost, because the volume of logs and metrics a busy cluster produces is substantial, and a monitoring configuration that captures everything indiscriminately generates an ingestion bill that occasionally rivals the compute it is watching. The discipline is to collect what you will actually use to diagnose and to set retention deliberately rather than hoarding logs no one will read.
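Turning the integration on is a two-command sketch against the example cluster. Workspace selection is left to the defaults here, and in practice you would point it at a workspace whose retention you have set deliberately.

```bash
# Sketch: container logs and metrics into a Log Analytics workspace
# (add --workspace-resource-id with your workspace's resource ID to target it)
az aks enable-addons --resource-group myRG --name myAKS --addons monitoring

# Managed Prometheus-format metrics for dashboards and alerts
az aks update --resource-group myRG --name myAKS --enable-azure-monitor-metrics
```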

The reason observability is not optional becomes clear the moment you revisit the failure-isolation model. Diagnosing a pod that keeps restarting means reading its logs and its events, which requires that the logs were being captured before the failure rather than after you went looking. Diagnosing a node under memory pressure means having the node metrics that show the pressure building. Confirming that an upgrade respected disruption budgets, that the autoscaler added nodes when pods went pending, or that outbound connection failures correlate with traffic volume all depend on having the relevant signal already flowing. Observability set up after an incident tells you nothing about the incident. The practical pattern is to enable metrics and logs from day one, build a small number of dashboards that map onto the pod, node, and control-plane layers so you can locate a failure quickly, and configure alerts on the conditions that actually page a human, such as pods stuck pending, nodes going unready, or a workload dropping below its replica floor. Like every other row of the responsibility map, the platform provides the integration and you decide how to use it, and the teams whose clusters are calm are the ones who invested in seeing into them before they had to.

Why should I enable monitoring before I have a problem to investigate?

Because diagnosis depends on signal that was captured before the failure, not after. Pod logs, node metrics, and scheduling events only help if they were already flowing when the incident began. Enable metrics and logs from day one, build dashboards mapped to the pod, node, and control-plane layers, and alert on the conditions that genuinely require a response.

Frequently Asked Questions

Q: What does Azure Kubernetes Service actually manage on my behalf?

AKS manages the Kubernetes control plane: the API server that accepts your declarations, the etcd datastore that records them, the scheduler that places pods, and the built-in controllers that reconcile desired state against reality. You never log into these components, patch them, size them, or back them up, and on the entry tier they are provided free. Everything below the control plane remains yours to operate, which is the part that surprises people. The worker virtual machines that make up your node pools, the network plugin and address space they live inside, the identity wiring that governs access to the API, the storage that holds durable state, and the cadence of keeping Kubernetes versions and node images current are all customer responsibilities. The clean way to hold this is that AKS gives you a managed brain and asks you to build and operate the body. Most real incidents live on the body side, which is why understanding the boundary is the foundation of operating the platform well.

Q: Should I use kubenet or Azure CNI networking on a new AKS cluster?

For a new cluster, the modern recommendation is Azure CNI Overlay rather than either traditional kubenet or traditional Azure CNI. The overlay model gives pods their own private address space without consuming addresses from your virtual network subnet. That avoids the address-exhaustion trap traditional Azure CNI is famous for. It also avoids the per-node route tables that kubenet must maintain, along with the cluster-size ceiling those route tables impose. Traditional kubenet, which translates pod traffic to the node address and keeps pods off the subnet, is on a retirement path and should not anchor a new build; verify the current retirement timeline against the official Azure announcement, as the date can shift. Reserve traditional Azure CNI, where every pod takes a real subnet address, for the specific cases that need pods to be directly addressable on the virtual network, and when you choose it, size the subnet for the cluster’s maximum scale plus the upgrade surge from the start. The networking choice is made at creation and painful to change later, so it deserves deliberate thought rather than an accepted default.

Q: What is the difference between system and user node pools, and do I need both?

A system node pool hosts the critical add-on components that keep the cluster itself functioning, including the in-cluster DNS service and the metrics pipeline, and every cluster must have at least one. A user node pool hosts your application workloads. You can technically run everything on a single pool, and small development clusters often do, but any cluster you care about should separate the two. The reason is blast radius: if applications and system components share a pool, a runaway application that consumes the pool’s capacity can evict the DNS pods, and once name resolution fails inside the cluster, problems cascade in ways that are hard to diagnose. A small, dedicated system pool that runs only system components keeps the cluster’s own machinery insulated from whatever your applications do. AKS documents protecting that pool with the CriticalAddonsOnly=true:NoSchedule taint so application pods cannot schedule onto it. User pools then give you a place to vary machine size, use spot instances for interruptible work, or reserve specialized hardware behind taints, matching each workload to the capacity that fits it.
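To make the isolation concrete, here is a minimal Python sketch of how a taint on the system pool keeps application pods off it. It assumes the system pool carries the taint AKS documents for dedicated system pools, `CriticalAddonsOnly=true:NoSchedule`. It also simplifies Kubernetes toleration matching to the `Equal` and `Exists` operators on `NoSchedule` taints. The pod variables are hypothetical, and the DNS pod is modeled as tolerating the taint with `Exists`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # only "NoSchedule" is modeled in this sketch


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" compares values; "Exists" ignores the value
    value: str = ""
    effect: str = "NoSchedule"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified matching: same key and effect, then apply the operator rule."""
    if tol.key != taint.key or tol.effect != taint.effect:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def can_schedule(tolerations: list, taints: list) -> bool:
    """A pod fits a node only if every NoSchedule taint on it is tolerated."""
    return all(any(tolerates(t, taint) for t in tolerations) for taint in taints)


# System pool tainted as AKS recommends; the user pool carries no taints.
SYSTEM_POOL_TAINTS = [Taint("CriticalAddonsOnly", "true", "NoSchedule")]
USER_POOL_TAINTS: list = []

dns_pod = [Toleration("CriticalAddonsOnly", operator="Exists")]  # system add-on
web_pod: list = []  # hypothetical application pod with no tolerations

print(can_schedule(dns_pod, SYSTEM_POOL_TAINTS))  # True
print(can_schedule(web_pod, SYSTEM_POOL_TAINTS))  # False
print(can_schedule(web_pod, USER_POOL_TAINTS))    # True
```

A taint only keeps unwanted pods out. Steering system components onto the system pool is a separate mechanism (node selectors or affinity), and this sketch does not model it.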

Q: Does the AKS control plane come with a service level agreement?

It depends on the tier. The Free tier provides a managed control plane with no financially backed availability guarantee, only an internal target Microsoft aims for, and it is provisioned with limited resources suited to development and small clusters rather than production scale. The Standard and Premium tiers add a financially backed uptime guarantee for the Kubernetes API server endpoint, commonly stated as 99.95 percent for clusters spread across availability zones and 99.9 percent for clusters that are not, in exchange for a per-cluster management fee. The Premium tier further adds long-term support for older Kubernetes versions. Production clusters should run on a paid tier for the guarantee and the more capable control plane that comes with it. One caveat matters: the guarantee covers the API server, not your application. A highly available API server does nothing for an application running on a single node, so real availability still depends on replicas, zone spread, and disruption budgets that you configure. Confirm the current percentages and fees against the official pricing and service level pages before quoting them.
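To see what those percentages mean in practice, the small calculation below converts an uptime percentage into a monthly downtime allowance for the API server. It assumes a 30-day month purely for illustration. The actual service level document defines its own measurement period and how unavailability is counted, so treat the output as a rough scale, not contractual terms.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (illustrative period)


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of API-server unavailability an uptime percentage still permits."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return round(period_minutes * (100 - sla_percent) / 100, 2)


for label, pct in [("zone-redundant", 99.95), ("non-zonal", 99.9)]:
    print(f"{label}: {pct}% -> {allowed_downtime_minutes(pct)} min per 30 days")
```

The gap is roughly 21.6 minutes versus 43.2 minutes a month. That difference is what spreading the cluster across zones buys at the control-plane layer, and it says nothing about your own workloads’ availability.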

Q: How do AKS cluster upgrades and node-image upgrades differ?

They are two separate things that move on independent schedules, and treating them as one causes either unnecessary churn or stale, vulnerable nodes. A cluster upgrade changes the Kubernetes version, the orchestrator behavior and the API surface your manifests target, and AKS supports only a window of recent versions, so staying inside that window is a customer responsibility. A node-image upgrade refreshes the operating system image on the worker machines, delivering security patches and component updates without touching the Kubernetes version at all. The practical pattern is to refresh node images frequently to absorb security fixes between the less frequent Kubernetes version upgrades, so the nodes stay patched even when the orchestrator version holds steady. Both kinds of upgrade roll node by node, adding a fresh node, draining and removing an old one, and repeating, which is why they transiently consume extra capacity and why workloads need disruption budgets to come through them cleanly. Schedule both as routine maintenance rather than waiting for a support deadline to force the issue at an inconvenient time.
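The extra capacity a roll consumes can be estimated up front. AKS exposes a max surge setting on node pools that controls how many fresh nodes are added at once. The sketch below assumes surge is given either as a node count or as a percentage of the pool. The rule that a percentage rounds up to a whole node is an assumption here, so confirm it against the official upgrade documentation. The sketch reports the peak node count the pool reaches during the upgrade and the number of add-drain-remove batches.

```python
import math
from typing import Union


def surge_nodes(pool_size: int, max_surge: Union[int, str]) -> int:
    """Extra nodes added per batch: a fixed count, or a percentage of the pool.

    Rounding a percentage up to a whole node is an assumption of this sketch.
    """
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        pct = float(max_surge[:-1])
        return max(1, math.ceil(pool_size * pct / 100))
    return max(1, int(max_surge))


def rolling_upgrade_plan(pool_size: int, max_surge: Union[int, str]) -> dict:
    """Summarize a surge-based roll: add fresh nodes, drain old ones, repeat."""
    surge = min(surge_nodes(pool_size, max_surge), pool_size)
    return {
        "surge_nodes": surge,
        "batches": math.ceil(pool_size / surge),
        "peak_node_count": pool_size + surge,
    }


print(rolling_upgrade_plan(10, 1))      # slow and gentle: 10 batches, peak 11 nodes
print(rolling_upgrade_plan(10, "33%"))  # faster: 3 batches, peak 14 nodes
```

A larger surge finishes sooner but needs more quota and, under traditional Azure CNI, more subnet addresses at the peak. That is the same capacity the disruption budgets and address plan elsewhere in this guide must absorb.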

Q: Why did my AKS subnet run out of IP addresses?

This almost always happens under traditional Azure CNI, where every pod takes a real address from the virtual network subnet and each node reserves addresses up front for the maximum number of pods it can host. A subnet sized for the cluster’s launch state, rather than its maximum scale, fills as the cluster grows or during an upgrade, when the node-by-node roll transiently adds extra nodes and consumes a burst of additional addresses. Once the subnet is exhausted, new pods cannot get an address and fail to schedule, and because resizing the subnet under a running cluster is awkward, the remedy is frequently a disruptive rebuild in a larger subnet. The prevention is arithmetic done at creation: multiply the maximum node count by the maximum pods per node, add the node addresses, add a generous margin for the upgrade surge, and size the subnet for that total. Better still for most new clusters, use Azure CNI Overlay, which keeps pod addresses in a separate overlay space and removes this entire failure class because pods no longer draw from the subnet at all.
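The creation-time arithmetic is small enough to script. The sketch below follows the rule described above for traditional Azure CNI with static pre-allocation: one address per node plus one reserved per possible pod, counted for the surge nodes too. It also applies two Azure facts: every subnet holds back five addresses, and /29 is the smallest subnet Azure allows. It does not account for anything else you might place in the same subnet, such as internal load balancer frontends. The input numbers are illustrative only.

```python
AZURE_RESERVED_PER_SUBNET = 5  # Azure holds back 5 addresses in every subnet
SMALLEST_PREFIX = 29           # /29 is the smallest subnet Azure allows


def required_addresses(max_nodes: int, max_pods_per_node: int,
                       surge_nodes: int) -> int:
    """Addresses traditional Azure CNI needs at peak: one per node plus one
    pre-reserved per possible pod, including the upgrade surge nodes."""
    peak_nodes = max_nodes + surge_nodes
    return peak_nodes * (max_pods_per_node + 1)


def smallest_prefix(needed: int) -> int:
    """Longest prefix (smallest subnet) whose usable addresses cover `needed`."""
    for prefix in range(SMALLEST_PREFIX, 7, -1):  # try /29 down to /8
        if 2 ** (32 - prefix) - AZURE_RESERVED_PER_SUBNET >= needed:
            return prefix
    raise ValueError("no IPv4 subnet is large enough")


need = required_addresses(max_nodes=50, max_pods_per_node=30, surge_nodes=5)
print(need, f"/{smallest_prefix(need)}")  # 1705 /21
```

A /22 offers only 1,019 usable addresses, so this hypothetical cluster needs a /21 even though it might launch with a handful of nodes. Under Azure CNI Overlay, the pod term drops out of the subnet calculation entirely.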

Q: Should I expose an AKS workload with a load-balanced Service or an ingress controller?

A Kubernetes Service of the load-balancer type gives one workload its own dedicated external entry point, which AKS provisions by adding a frontend address and rules to the cluster’s Azure load balancer. That is fine for a single exposed workload, but creating a separate load-balanced Service for every microservice is wasteful and noisy, because each one consumes its own address and set of load-balancing rules. For anything beyond a single service, prefer one ingress controller. The controller owns a single entry point and routes incoming requests to many internal Services by hostname and path, so the whole application sits behind one front door. The deciding factor is the number of services you expose and how much shared routing logic they need. A handful of unrelated endpoints can each take a load-balanced Service; a multi-service application wants the consolidation, centralized certificate handling, and single point of routing and security control that an ingress controller provides. The trade-off is operational ownership: the ingress controller and its rules are yours to install and maintain, in exchange for the consolidation it buys.
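The routing an ingress controller performs behind its single front door can be pictured with a small table lookup. The Python sketch below loosely models prefix-style path matching: a prefix matches on whole path segments, and the longest matching prefix for the requested host wins. The host names and Service names are hypothetical. Real controllers add their own rules for default backends, exact matches, and rewrites, so this is not any specific controller’s implementation.

```python
from typing import Optional

# Hypothetical routing rules: (host, path prefix, backend Service name).
RULES = [
    ("shop.example.com", "/", "storefront"),
    ("shop.example.com", "/api", "api-gateway"),
    ("shop.example.com", "/api/orders", "orders"),
    ("admin.example.com", "/", "admin-portal"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """Match whole path segments, so /api matches /api/x but not /apix."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str) -> Optional[str]:
    """Pick the backend whose host matches and whose prefix is longest."""
    candidates = [(p, svc) for h, p, svc in RULES
                  if h == host and prefix_matches(p, path)]
    if not candidates:
        return None  # a real controller would use a default backend or return 404
    return max(candidates, key=lambda c: len(c[0].rstrip("/")))[1]


print(route("shop.example.com", "/api/orders/42"))  # orders
print(route("shop.example.com", "/apix"))            # storefront
print(route("admin.example.com", "/users"))          # admin-portal
print(route("unknown.example.com", "/"))             # None
```

Four backends share one entry point here. With load-balanced Services, each would need its own.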

Q: What causes intermittent outbound connection failures from AKS pods under load?

The usual culprit is source network address translation (SNAT) port exhaustion on the cluster’s outbound path. When many pods open many simultaneous connections to external destinations, those connections share a pool of translation ports on the outbound address, and a high-connection workload can exhaust that pool, at which point new outbound connections fail intermittently while existing ones succeed. The failure is invisible at low traffic and appears only under load, which makes it confusing to diagnose because the same code works fine in testing and fails in production peaks. The signal is intermittent connection timeouts to external services that correlate with traffic volume rather than with any particular destination. The fix is to provision the outbound path deliberately for the connection volume the workload generates rather than relying on the default outbound configuration sized for modest use. For example, add outbound IP addresses or raise the ports allocated per node on the cluster’s load balancer, or route egress through an Azure NAT Gateway. In the application, reduce unnecessary connection churn by reusing connections where possible. This is a classic example of a default that suits small workloads silently becoming a problem at scale, sitting squarely in the customer-owned column.
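A back-of-the-envelope check shows why the same code passes in testing and fails at peak. The sketch below uses a deliberately simplified model: every concurrent connection from a node to one external destination consumes its own translation port on that node. It ignores port reuse across different destinations and idle timeouts. The 1,024 ports per node is an illustrative allocation, not a statement of what your cluster actually receives.

```python
from dataclasses import dataclass


@dataclass
class OutboundCheck:
    ports_needed: int
    ports_allocated: int

    @property
    def exhausted(self) -> bool:
        return self.ports_needed > self.ports_allocated

    @property
    def utilization(self) -> float:
        return self.ports_needed / self.ports_allocated


def snat_check(pods_per_node: int, conns_per_pod: int,
               ports_per_node: int) -> OutboundCheck:
    """Simplified model: each concurrent connection from a node to a single
    external destination consumes its own translation port on that node."""
    return OutboundCheck(pods_per_node * conns_per_pod, ports_per_node)


# Hypothetical numbers; 1,024 ports per node is illustrative only.
quiet = snat_check(pods_per_node=20, conns_per_pod=10, ports_per_node=1024)
peak = snat_check(pods_per_node=20, conns_per_pod=60, ports_per_node=1024)
print(quiet.exhausted, round(quiet.utilization, 2))  # False 0.2
print(peak.exhausted, round(peak.utilization, 2))    # True 1.17
```

Only the connections per pod changed between the two cases. That is why the symptom tracks traffic volume rather than any particular destination, and why connection reuse in the application helps as much as adding ports.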

Q: How should I size AKS nodes, with many small nodes or fewer large ones?

The trade-off runs between packing efficiency and overhead, and the right answer depends on your pods’ size distribution. Many small nodes let the autoscaler add and remove capacity in fine-grained steps and pack small pods tightly, but each node reserves a slice of its capacity for the operating system and kubelet, so a fleet of small nodes loses more total capacity to that fixed overhead and, under traditional Azure CNI, consumes more addresses. Fewer large nodes amortize the per-node overhead and consume fewer addresses, but the autoscaler then adds capacity in coarse jumps, and a large node holding only a few pods strands capacity that the cluster paid for. The discipline is to choose a node size whose capacity is a sensible multiple of your typical pod’s requests, so pods pack with little waste and the autoscaler’s increments roughly match how the workload grows. Workloads with very different resource profiles benefit from separate node pools sized for each, rather than forcing one node size to serve every workload shape in the cluster.
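The packing trade-off can be compared numerically for a single workload shape. The sketch counts CPU in millicores and assumes identical pods. The per-node reservation values are hypothetical placeholders, because real reservations depend on the VM size and AKS version. The sketch reports node count, capacity lost to per-node overhead, and allocatable capacity left stranded once the pods are placed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    nodes: int
    overhead_m: int   # CPU reserved for the OS and kubelet across the fleet
    stranded_m: int   # allocatable CPU left unused once pods are placed


def plan_fleet(pod_count: int, pod_cpu_m: int,
               node_cpu_m: int, reserved_cpu_m: int) -> Fleet:
    """Size a pool for identical pods, counting CPU in millicores."""
    allocatable = node_cpu_m - reserved_cpu_m
    pods_per_node = allocatable // pod_cpu_m
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on this node size")
    nodes = -(-pod_count // pods_per_node)  # ceiling division
    return Fleet(
        nodes=nodes,
        overhead_m=nodes * reserved_cpu_m,
        stranded_m=nodes * allocatable - pod_count * pod_cpu_m,
    )


# Hypothetical reservations; real values depend on VM size and AKS version.
small = plan_fleet(pod_count=40, pod_cpu_m=500, node_cpu_m=2000, reserved_cpu_m=200)
large = plan_fleet(pod_count=40, pod_cpu_m=500, node_cpu_m=16000, reserved_cpu_m=800)
print(small)  # Fleet(nodes=14, overhead_m=2800, stranded_m=5200)
print(large)  # Fleet(nodes=2, overhead_m=1600, stranded_m=10400)
```

The small nodes lose more to fixed overhead, and the large nodes strand more capacity after a coarse scaling step. This is the tension the paragraph describes. Rerunning the comparison with your own pod requests is a quick way to pick a node size that packs well.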

Q: Is AKS a good fit for a small application, or is it overkill?

For many small applications, AKS is more platform than the workload needs, and a simpler container service that abstracts away nodes and upgrades would let the team spend its attention on the application instead of the infrastructure. The deciding factor is not the size of the application but the shape of its requirements. If the application is a handful of stateless services that scale on processor or request load and nobody on the team wants to operate node pools, networking, and upgrades, a serverless container platform removes the entire customer-owned column from your plate and is usually the better choice. AKS earns its operational cost when you genuinely need the Kubernetes feature set: fine-grained scheduling, sidecar containers, the ecosystem of operators and controllers, advanced traffic management, portability across clouds, or existing team expertise and tooling you want to reuse. Choosing Kubernetes for prestige rather than because the requirements name it leads to operating a fleet you did not need. Match the platform to the requirements, not to the trend.

Q: What is the most common way teams misdiagnose AKS problems?

The most common error is reaching for the control plane when a pod is at fault. An application goes down, and the instinct is to suspect the managed Kubernetes layer, when the failure is almost always in the pod itself: a crashing container, an image that will not pull, a health probe failing, or a memory limit set below the workload’s real peak. The right discipline is to start at the pod, read its status, events, and logs, then climb to the node only if the pod layer is clean, and consider the control plane last and only when the API server is genuinely unreachable after you have ruled out your own network path to it. The responsibility map exists to redirect the first instinct toward the layer that actually broke. The overwhelming majority of incidents never leave the pod level, and the time lost to inspecting the managed layer that was never the problem is time the application stays down. Diagnose from the bottom up, not the top down.
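The bottom-up order can be written down as a small decision procedure. The sketch below is hypothetical, not a real tool. It maps a few status reasons Kubernetes actually reports (`CrashLoopBackOff`, `ImagePullBackOff`, `ErrImagePull`, `OOMKilled`) to a first suspect, then checks the node, and only then looks at the API server path. The hint strings mention real kubectl commands, but the helper itself only returns text.

```python
from typing import Optional

# A few pod status reasons Kubernetes reports, mapped to a first suspect.
POD_REASONS = {
    "CrashLoopBackOff": "container keeps exiting; run kubectl logs <pod> --previous",
    "ImagePullBackOff": "image name, tag, or registry access is wrong",
    "ErrImagePull": "image name, tag, or registry access is wrong",
    "OOMKilled": "memory limit is below the workload's real peak",
}


def triage(pod_reason: Optional[str], pod_ready: bool,
           node_ready: bool, api_reachable: bool) -> str:
    """Check layers bottom-up (pod, node, then API) and stop at the first fault."""
    if pod_reason in POD_REASONS:
        return "pod: " + POD_REASONS[pod_reason]
    if not pod_ready:
        return "pod: read kubectl describe pod events, e.g. failing probes"
    if not node_ready:
        return "node: check node conditions such as MemoryPressure or DiskPressure"
    if not api_reachable:
        return "network: rule out your own path to the API server, then the control plane"
    return "no layer looks broken: widen the search"


print(triage("OOMKilled", pod_ready=False, node_ready=True, api_reachable=True))
```

The control plane appears only in the last branch, after your own network path. This mirrors the advice to consider the managed layer last.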

Q: Do I need to back up data in an AKS cluster, or does the platform handle it?

You need to back it up. The platform’s self-healing protects the running pods, not the data they hold. When a node fails, the cluster heals by replacing the node and rescheduling its pods, but any state that lived only on that node’s local scratch storage is gone, because local storage does not survive the node. Durable state must live on Azure-backed persistent volumes, managed disks for single-pod block storage or Azure Files shares for shared access, so that the data outlives any individual pod or node. Even then, the persistent volume is not a backup; it is durable storage that can still be deleted, corrupted, or lost to a regional event. Genuinely important state needs a backup strategy of its own, whether snapshots of the underlying disks, backups taken by the application, or a dedicated cluster backup tool. The principle is that Kubernetes manages availability of compute, not durability of data, and the data-protection responsibility is yours regardless of how managed the rest of the cluster feels.

Q: How do availability zones improve AKS resilience, and are they automatic?

Availability zones place a node pool’s machines in physically separate sections of an Azure region with independent power, cooling, and networking, so a fault confined to one zone leaves nodes in the others serving traffic. They are not fully automatic. You enable zone spread when you create the node pool. You still need to ensure your workloads actually distribute their replicas across the zones rather than concentrating in one, which is done with topology spread constraints or pod anti-affinity rules that discourage placing every replica of a workload in the same zone. A cluster can span three zones while a single-replica deployment runs entirely in one of them, in which case the zone architecture provides no protection for that workload. Real zone resilience comes from combining a zone-spread node pool, multiple replicas, and spread constraints so the replicas land in different zones. Zones also interact with storage. A locally redundant managed disk lives in a single zone, so a pod using such a disk must schedule in that disk’s zone. This constraint is worth remembering when a disk-backed pod will not schedule despite apparent free capacity elsewhere.
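Whether replicas really landed in different zones is easy to check from their placements. The sketch below computes two things. The first is the skew between the most and least populated zones, which is roughly the quantity a topology spread constraint’s `maxSkew` bounds, although Kubernetes’ exact definition has more nuance. The second is whether losing any single zone would still leave a replica running. The zone labels and placements are hypothetical.

```python
from collections import Counter


def zone_skew(placements: list, zones: list) -> int:
    """Gap between the most- and least-populated zones for one workload."""
    counts = Counter(placements)
    per_zone = [counts[z] for z in zones]  # Counter returns 0 for absent zones
    return max(per_zone) - min(per_zone)


def survives_any_zone_loss(placements: list, zones: list) -> bool:
    """True if losing any single zone still leaves at least one replica."""
    return all(any(p != lost for p in placements) for lost in zones)


ZONES = ["1", "2", "3"]
packed = ["1", "1", "1"]  # three replicas that all landed in zone 1
spread = ["1", "2", "3"]  # one replica per zone
print(zone_skew(packed, ZONES), survives_any_zone_loss(packed, ZONES))  # 3 False
print(zone_skew(spread, ZONES), survives_any_zone_loss(spread, ZONES))  # 0 True
```

Both layouts run three replicas on a three-zone cluster, but only the spread one survives a zone fault. That outcome is what the spread constraints buy.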

Q: What does a pod disruption budget do and why does it matter for upgrades?

A pod disruption budget declares the minimum number of replicas of a workload that must remain available during voluntary disruptions, and the platform honors it whenever it drains nodes. Its importance becomes obvious during an upgrade, which rolls node by node, draining each node by evicting its pods before removing it. Without a disruption budget, that drain can evict every replica of a workload at once if they happen to share a node or if several nodes drain in quick succession, producing an outage during what was meant to be routine maintenance. With a budget in place, the drain refuses to evict so many replicas that the workload would drop below its declared floor, waiting for evicted replicas to come back elsewhere before continuing. The same protection applies when the cluster autoscaler removes underused nodes and when an operator drains a node for maintenance. A disruption budget is a few lines of configuration that convert a fragile workload into one that survives the operations the platform performs on it routinely, which makes it one of the highest-value availability settings a team can add.
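The effect of a budget on a drain can be simulated in a few lines. The sketch models a budget expressed as a minimum available count, where permitted evictions equal healthy replicas above the floor. It then drains one node in batches under a stated simplification: evicted replicas are rescheduled elsewhere and become healthy again before the next batch. Raising an error when the budget allows nothing is a simplification too, since a real drain keeps retrying and waits instead.

```python
def disruptions_allowed(healthy: int, min_available: int) -> int:
    """Evictions a budget permits right now: healthy replicas above the floor."""
    return max(0, healthy - min_available)


def drain(replicas_on_node: int, healthy: int, min_available: int) -> list:
    """Simulate draining one node and return the eviction batch sizes.

    Simplification: evicted replicas are assumed to be rescheduled elsewhere
    and healthy again before the next batch, so `healthy` stays constant.
    """
    batches = []
    remaining = replicas_on_node
    while remaining > 0:
        allowed = disruptions_allowed(healthy, min_available)
        if allowed == 0:
            raise RuntimeError("drain blocked: budget allows no evictions")
        batch = min(allowed, remaining)
        batches.append(batch)
        remaining -= batch
    return batches


print(drain(replicas_on_node=3, healthy=3, min_available=0))  # [3]: all at once
print(drain(replicas_on_node=3, healthy=3, min_available=2))  # [1, 1, 1]
```

Without a floor, every replica on the node goes in one batch, which is an outage. With a floor of two, the drain proceeds one replica at a time. Setting the floor equal to the replica count blocks the drain entirely, so leave at least one disruption allowed or upgrades will stall.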

Q: What is an AKS node pool actually made of, and why are nodes called replaceable?

A node pool is a group of identical worker virtual machines, and underneath it is an Azure virtual machine scale set, the same primitive that powers ordinary autoscaling fleets elsewhere on the platform. That single fact explains most node behavior. When the cluster needs capacity it asks the scale set to grow; when it sheds capacity it asks the scale set to shrink; and when a single machine misbehaves, the remedy is often to reimage or replace that scale set instance rather than to nurse it back to health. This is why nodes are described as replaceable cattle rather than pets: the orchestrator assumes any node can be destroyed and recreated without ceremony, because the state that matters lives in the control plane and on durable volumes, not on the node itself. The practical consequence is that you should never hand-configure a node and expect the change to persist, because the next scale or upgrade operation can replace it. Anything that must be true of every node belongs in the node pool’s configuration or in the image, not in a manual tweak applied to one machine.

Q: Can I change the network plugin or subnet of an existing AKS cluster?

Largely no, and that is exactly why the networking choice deserves care at creation time. The network plugin is a foundational property set when the cluster is created, and switching between fundamentally different models, for example from a translated model to one where pods hold routable subnet addresses, is not a flip of a setting but generally a migration or rebuild. Likewise, the subnet a cluster lives in is not something you cleanly resize underneath a running cluster, so a subnet that turns out too small for the cluster’s growth typically forces a rebuild in a larger one. There are supported migration paths for specific transitions, such as moving toward the overlay model, and you should verify the current supported migrations against official documentation because the platform adds capabilities over time. The safe default is to treat the plugin and the address plan as decisions you make once, deliberately, sized for the cluster’s maximum future scale, rather than as settings you expect to adjust later when you discover they were too small.

Q: How is AKS billed, and why is my control plane nearly free while the bill is large?

AKS bills the managed control plane separately from everything underneath it, and the control plane is free on the entry tier and a modest fixed per-cluster fee on the paid tiers. The large part of the bill is the worker fleet and the resources it consumes: the virtual machines in your node pools, billed like any other compute, plus their disks, the load balancers fronting your services, the outbound networking, and the logs and metrics you ingest. This is why teams expecting a big Kubernetes charge are surprised to find the orchestrator brain costs little while the machines dominate the invoice. The pricing mirrors the responsibility model precisely: you pay for what you operate, which is the worker fleet, not for the control plane that Microsoft operates. The practical consequence is that cost control on AKS is overwhelmingly about the data plane: right-sizing pod requests so nodes are packed efficiently, using spot capacity for interruptible work, scheduling non-production clusters to shut down when idle, and being deliberate about log ingestion volume. Verify current tier fees against the official pricing page, as they change.

Q: What happens to my workloads if the AKS control plane becomes unavailable?

If the control plane is genuinely unreachable, your already-running pods generally keep running, because the actual work happens in the data plane on the nodes, and the kubelet continues to run the containers it was already told to run. What you lose is the ability to make changes and the self-healing loop: you cannot deploy, scale, or reschedule, and if a pod or node fails during the outage, the controllers that would normally replace it cannot act until the control plane returns. So a control plane outage is not an immediate application outage for steady-state traffic, but it removes the cluster’s ability to react to anything, which becomes an application problem the moment something in the data plane fails while the brain is offline. This is part of why the paid-tier guarantee and the more robust control plane it provisions matter for production, and why availability zones for the control plane reduce the chance that a single zone fault takes the brain down. In practice, genuine control plane outages are rare, and most reports of one turn out to be a local network path problem reaching the API server.

Q: What happens to my pods when an AKS node fails?

The control plane’s reconciliation loop notices that a node has stopped reporting healthy and marks it unavailable. After a grace period it reschedules the pods that were running on it onto other nodes that have room, recreating them fresh rather than moving the originals. By default, pods tolerate a not-ready or unreachable node for five minutes before they are evicted. This is the self-healing behavior that makes Kubernetes resilient: the desired state said a certain number of replicas should run, the node failure violated that, and a controller acts to restore it. Two conditions decide whether the failure is invisible or painful. First, there must be somewhere for the displaced pods to go, which means either spare capacity on existing nodes or a cluster autoscaler with headroom to add a node, or the rescheduled pods sit Pending until capacity appears. Second, the workload must run more than one replica, spread across nodes, so that the loss of one node does not take the only copy offline during the gap before rescheduling completes. A single-replica workload on a failed node is briefly down even though the platform recovers it, whereas a multi-replica, well-spread workload rides through the failure with the surviving replicas serving traffic. The compute heals automatically; any data that lived only on the failed node does not, which is why durable state belongs on Azure-backed volumes.
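The first condition, somewhere for the displaced pods to go, can be illustrated with a simple placement pass. The sketch uses first-fit decreasing bin packing on CPU requests in millicores. That is a stand-in heuristic chosen for illustration, not the Kubernetes scheduler’s actual scoring. The pod requests and free capacities are hypothetical.

```python
def reschedule(displaced_m: list, free_m: list) -> tuple:
    """First-fit decreasing placement of displaced pods onto surviving nodes.

    Returns (placed, pending): millicores placed per surviving node index,
    and the CPU requests of pods that found no room and would sit Pending.
    """
    free = list(free_m)
    placed: dict = {}
    pending: list = []
    for request in sorted(displaced_m, reverse=True):
        for i, room in enumerate(free):
            if request <= room:
                free[i] -= request
                placed[i] = placed.get(i, 0) + request
                break
        else:  # no surviving node had room for this pod
            pending.append(request)
    return placed, pending


# Pods from the failed node, then the same pods once the autoscaler adds a node.
print(reschedule([500, 500, 1000], [1000, 600]))        # ({0: 1000, 1: 500}, [500])
print(reschedule([500, 500, 1000], [1000, 600, 2000]))  # ({0: 1000, 1: 500, 2: 500}, [])
```

The first run leaves one pod Pending even though 100 millicores of free capacity remain, because free capacity fragmented across nodes is not the same as room for a pod. The extra node in the second run plays the role of autoscaler headroom.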

Q: What is the difference between a pod, a node, and the control plane?

These three terms name the layers you must keep distinct to operate and debug a cluster. A pod is the smallest deployable unit, one or more containers that share a network identity and run together, and it is where your application code actually executes. A node is a worker virtual machine that hosts pods, running the agent that talks to the control plane, the container runtime, and the network proxy; nodes are grouped into node pools backed by scale sets and are treated as replaceable cattle rather than pets. The control plane is the managed brain that decides which node each pod runs on, records desired state, and runs the loops that keep reality matching that desired state. The relationship is hierarchical: pods run on nodes, and the control plane orchestrates which pods run on which nodes. For debugging, this hierarchy is the diagnostic order. Start at the pod, the layer that fails most often and is entirely yours; climb to the node if the pod is healthy but its host is not; and consider the control plane last, since it fails rarely and is largely Microsoft’s to resolve.