Two virtual networks peered to the same central network do not talk to each other, and that single sentence is where most engineers first misread the hub-spoke topology. They build a clean diagram, draw a hub in the middle, draw three workload networks around it, connect every workload to the center, and assume the picture means full connectivity. Then a virtual machine in one workload network tries to reach a database in another, the packet leaves and never returns, and the diagram that looked correct turns out to describe a network that was never built to behave the way it was drawn. The gap is not a bug. It is the defining property of the design, and understanding why it exists is the difference between a hub-spoke network that scales to fifty workloads and one that quietly breaks the first time two workloads need to reach each other.

The hub-spoke topology is the most common enterprise networking pattern in Azure, and it is common precisely because it answers a question that flat networks cannot: how do you give dozens of isolated workloads a shared set of central services, a single controlled path to the internet, and one connection back to the corporate data center, without wiring every workload to every other workload and to every external endpoint by hand? The answer is a center that owns the shared infrastructure and a set of edges that own the work, joined by virtual network peering, and governed by a routing rule that most people never read closely until it costs them an afternoon of troubleshooting. This article builds the model from the platform behavior up, so that by the end the topology is not a diagram you copied but a set of rules you can reason from, design against, and debug when a packet goes missing.
What a hub-spoke network topology actually is
A hub-spoke network topology is a connectivity pattern in which one central virtual network, the hub, holds the services that every workload shares, and multiple workload virtual networks, the spokes, each connect to the hub through peering. The hub centralizes the firewall, the gateways, and shared name resolution, while each spoke stays isolated and carries only its own application. Traffic between spokes and out to other networks routes through the hub.
That definition packs in four decisions, and pulling them apart is the whole job of the rest of this article. The first decision is what lives at the center. The second is what stays at the edges. The third is how the center and the edges connect. The fourth, and the one that surprises people, is how the edges reach each other and the outside world once they are connected only to the center. Each decision follows from how Azure virtual networks actually behave, not from an arbitrary convention, and once the behavior is clear the design stops feeling like a recipe and starts feeling like the only sensible arrangement of the parts.
Start with the center. The hub is a virtual network whose job is not to run workloads but to host the infrastructure that every workload would otherwise have to duplicate. A firewall belongs in the hub because you want one place to inspect and control traffic leaving for the internet, not one firewall per workload. A VPN gateway or an ExpressRoute gateway belongs in the hub because the connection back to the corporate network is expensive to stand up and you want every workload to share the single circuit rather than paying for one gateway each. Shared DNS resolution belongs in the hub because name resolution should be consistent across the whole estate. Identity infrastructure, jump hosts, monitoring collectors, and any service that the organization treats as central rather than per-application all have a natural home in the hub. The principle is duplication avoidance: anything that every spoke would otherwise need its own copy of is a candidate for the hub.
Now the edges. A spoke is a virtual network that holds one workload or one environment, and nothing shared. A spoke might be the production environment for a single application, with its own subnets for web tier, application tier, and data tier. Another spoke might be the staging environment for the same application, deliberately kept in a separate network so that staging traffic and production traffic never mix on the same wire. A third spoke might belong to an entirely different team running an unrelated system. The point of putting each workload in its own spoke is isolation by default: a spoke cannot see another spoke unless the design explicitly routes that traffic, which means a misconfiguration or a compromise in one workload does not automatically expose the others. The spoke owns its address space, its subnets, its network security groups, and its application, and it borrows everything central from the hub.
What security benefit does putting each workload in its own spoke provide?
Isolation by default. A spoke cannot reach another spoke unless the design explicitly routes that traffic through the hub, so a workload is closed to its peers until you deliberately open a path. A compromise or a misconfiguration in one spoke does not automatically expose the others, because the absence of a route is the absence of reachability.
This default-closed posture is the quiet reason the topology is favored for regulated and multi-team estates, and it is worth contrasting with the alternative to see the value. In a single flat virtual network holding many workloads, every resource can reach every other resource at the network layer by default, and segmentation depends entirely on network security groups being correct on every subnet, with no structural backstop if one is misconfigured. The hub-spoke inverts the default: nothing reaches a spoke from outside it unless a route and a policy both permit it, so a forgotten rule fails closed rather than open. The cost of that safety is the routing work this article is largely about, because the same non-transitivity that keeps a compromised spoke contained is what forces you to build the deliberate paths between the spokes that should talk. Containment and connectivity are two faces of the same property, and the topology gives you the first for free and makes you earn the second.
The connection between a hub and a spoke is a virtual network peering. Peering joins two virtual networks so that resources in one can reach resources in the other using private addresses, with traffic staying on the Microsoft backbone rather than traversing a gateway or the public internet. In a hub-spoke design, you peer each spoke to the hub, and only to the hub. You do not peer spokes to each other, even when two spokes need to communicate, and the reason for that restraint is the rule that the entire topology is built around. If you want the full mental model of how peering establishes a connection, what the peering states mean, and how the two sides of a peering relationship have to agree, the Azure Virtual Network deep dive lays out the address-space and peering mechanics that this topology assumes you already understand.
Why does peering each spoke to the hub not connect the spokes to each other?
Virtual network peering is not transitive. If spoke A is peered to the hub and spoke B is peered to the hub, that does not create a path from A to B. Each peering connects exactly two networks, and Azure does not automatically forward traffic from one peering across the hub into another. A packet from A bound for B has nowhere to go, because A holds no route for B’s range and the hub has no instruction to pass traffic onward between its peerings.
This single property, non-transitivity, is the reason the hub-spoke topology has the shape it has and the reason most first builds fail in the same way. People reason by analogy to a physical switch: plug three devices into one switch and they all reach each other, because a switch forwards frames between any two ports. A virtual network is not a switch. A peering is a point-to-point relationship between two address spaces, and the hub does not act as a transit forwarder for free. When spoke A sends a packet destined for an address in spoke B, spoke A’s effective routes hold an entry for the hub’s range but none for spoke B’s, so nothing steers the packet across the A-to-hub peering, and the hub would not relay it into the hub-to-B peering on its own even if it arrived. The packet is dropped. Nothing logs an error in the obvious place, because nothing is broken from Azure’s point of view; the network is doing exactly what the routes tell it to do, which is nothing.
The hub-routes-for-the-spokes rule
Everything in this topology reduces to one claim, and it is worth stating in a form you can quote in a design review: because peering is non-transitive, the hub must carry spoke-to-spoke and on-premises routing through an appliance or a gateway, and that obligation is the entire reason the hub exists. Call it the hub-routes-for-the-spokes rule. The hub is not merely a convenient place to park shared services. The hub is the only thing that can make the spokes reach each other, reach the corporate network, and reach the internet through a controlled path, because non-transitive peering means the spokes cannot do any of those things on their own.
The rule reframes the whole design. A beginner sees the hub as a shared-services container and the peering as the connectivity, and is then baffled when the connectivity does not connect. The correct view inverts the emphasis: peering provides reachability between a spoke and the hub and nothing more, and the hub provides connectivity to everywhere else by actively routing on the spokes’ behalf. The shared services are real and important, but they are not why the topology works. The topology works because the hub carries a network virtual appliance, a firewall, or a gateway that forwards traffic the peerings alone cannot, and because user-defined routes in the spokes point that traffic at the hub in the first place.
Hold this rule in mind as the lens for the rest of the article. Every failure mode is a violation of it. Spoke-to-spoke traffic fails because the hub has no appliance to forward between spokes. On-premises traffic fails because the hub has no gateway, or because the spokes are not told to use it. Internet egress escapes inspection because no route forces spoke traffic through the hub firewall. Each problem is the same problem wearing different clothes: the hub was expected to route, and it was not built or instructed to do so.
What is the difference between reachability and routing in this design?
Reachability is whether a path exists between two endpoints; routing is the set of decisions that send a packet down a specific path. Peering gives a spoke reachability to the hub. It does not give the spoke routing to a third network. The hub-spoke topology depends on adding routing, through appliances and user-defined routes, on top of the reachability that peering provides.
The distinction matters because the two are easy to conflate and the symptoms of confusing them are misleading. When spoke A cannot reach spoke B, the instinct is to check whether the networks are connected, and the peering status shows Connected, so the connection looks healthy. The peering is healthy. Reachability between A and the hub, and between the hub and B, both exist. What does not exist is a routing decision that carries an A-to-B packet across the hub. The fix is never to re-create the peering. The fix is to give the hub something that forwards between the two peerings and to give the spokes routes that send the traffic to that forwarder. Engineers who internalize the reachability-versus-routing split stop reaching for peering when the real gap is a missing route, which is the single most common wasted hour in hub-spoke troubleshooting.
The InsightCrunch hub-spoke blueprint
The reference artifact for this topology is a blueprint that answers, in one place, the four questions a design must settle: what belongs in the hub, what belongs in a spoke, how each piece connects, and what routing makes the connection actually carry traffic. The table below is that blueprint. Treat it as the reference you check a design against, not as a deployment script; the addresses are illustrative and the specific resources depend on your estate.
| Element | Where it lives | How it connects | Routing required |
|---|---|---|---|
| Azure Firewall or NVA | Hub, dedicated subnet | N/A, it is the forwarder | Spokes send default and inter-spoke traffic here via UDR |
| VPN or ExpressRoute gateway | Hub, GatewaySubnet | N/A, it is the gateway | Gateway transit shares it; spokes learn on-premises routes through it |
| Shared DNS resolver | Hub, dedicated subnet | Spokes set custom DNS to its address | Spoke-to-hub peering carries DNS queries |
| Workload (web, app, data) | Spoke, workload subnets | Peered to hub only | UDR sends non-local traffic to the hub firewall |
| Spoke-to-hub link | Both sides of the peering | Peering on hub and on spoke | “Allow forwarded traffic” enabled on the peerings |
| Spoke-to-spoke path | Through the hub appliance | No direct spoke peering | UDR in each spoke routes the other spoke’s range to the hub |
| On-premises path | Through the hub gateway | Gateway transit on the peerings | Spokes use remote gateway; gateway propagates on-premises routes |
| Internet egress | Through the hub firewall | No direct spoke egress | UDR 0.0.0.0/0 in each spoke points at the firewall private IP |
The blueprint encodes the hub-routes-for-the-spokes rule line by line. Notice that every row in the lower half has a routing-required entry that points back at the hub. That column is the topology. Strip it away and you have a set of virtual networks peered to a center that does nothing useful; keep it and you have an enterprise network that routes every spoke’s traffic through one controlled, observable, central path. The remaining sections walk each row, explain the platform behavior that forces it, and give the command or configuration that realizes it.
How the routing logic actually works
The hub-spoke topology is governed by three routing mechanisms working together: system routes that Azure creates automatically, gateway transit that shares the hub’s gateway with the spokes, and user-defined routes that override the system routes to force traffic through the hub appliance. Understanding what each one does, and the order in which Azure evaluates them, is what lets you predict where a packet will go before you send it.
Every subnet in Azure starts with a set of system routes. These cover the subnet’s own virtual network address space, the peered networks, the gateway-propagated ranges, and a default route to the internet. When you peer a spoke to the hub, Azure adds a system route in the spoke for the hub’s address space and a system route in the hub for the spoke’s address space, which is what gives the two networks reachability without any manual configuration. These system routes are why a spoke can reach a shared DNS resolver in the hub the moment the peering is established: the route to the hub’s range exists automatically, and the DNS server sits inside that range.
System routes do not, however, create a route in spoke A for spoke B’s range. The peering between A and the hub teaches A about the hub’s range only, not about anything beyond it. So a packet in A destined for B falls through to the default system route, which sends unknown traffic toward the internet, where B’s private address does not exist, and the packet dies. This is the platform-level reason behind the non-transitivity you read about earlier: it is not that Azure refuses to connect the spokes, it is that the automatic routes only ever describe the directly peered network, never the network one hop beyond it.
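The gap can be modelled in a few lines. The sketch below is a simplification rather than Azure's implementation: the network names and address ranges are hypothetical, and it models only the prefixes that peering adds automatically.

```python
import ipaddress

# Hypothetical address plan; the ranges are illustrative only.
VNETS = {
    "hub": "10.0.0.0/16",
    "spoke-a": "10.1.0.0/16",
    "spoke-b": "10.2.0.0/16",
}

# Each peering joins exactly two networks; spokes peer to the hub only.
PEERINGS = [("spoke-a", "hub"), ("spoke-b", "hub")]


def system_routes(vnet):
    """Return the prefixes a network learns automatically: its own range
    plus the range of each directly peered network, and nothing beyond."""
    learned = {vnet: VNETS[vnet]}
    for left, right in PEERINGS:
        if left == vnet:
            learned[right] = VNETS[right]
        elif right == vnet:
            learned[left] = VNETS[left]
    return learned


def has_specific_route(vnet, destination_ip):
    """True if any automatically learned prefix covers the destination."""
    address = ipaddress.ip_address(destination_ip)
    return any(
        address in ipaddress.ip_network(prefix)
        for prefix in system_routes(vnet).values()
    )


print(has_specific_route("spoke-a", "10.0.1.4"))   # an address in the hub
print(has_specific_route("spoke-a", "10.2.1.10"))  # an address in spoke-b
print(has_specific_route("hub", "10.2.1.10"))      # the hub is peered to spoke-b
```

The second lookup is the one that fails: spoke-a learns the hub's range but never spoke-b's, even though the hub itself knows both.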
Gateway transit is the mechanism that lets the spokes use the hub’s gateway without each spoke owning a gateway of its own. When you enable gateway transit on the hub-to-spoke peering and “use remote gateway” on the spoke-to-hub peering, the spoke begins to receive the routes that the hub’s VPN or ExpressRoute gateway has learned from on-premises. Those routes are propagated into the spoke’s effective routes, so a packet in the spoke destined for a corporate subnet now has a route that points at the hub gateway, and the packet is carried across the peering, through the gateway, and down the tunnel. Gateway transit is the row in the blueprint that makes on-premises connectivity work for every spoke from a single shared circuit, and it is the cleanest example of the hub routing on the spokes’ behalf.
There is a wrinkle in how gateway-propagated routes behave that trips up engineers who expect them to act like static entries. The on-premises ranges arrive in the spoke through Border Gateway Protocol, advertised by the gateway and learned dynamically, which means they appear in the effective routes as a route type of VirtualNetworkGateway rather than as a system or user-defined entry. If the corporate network advertises a range and later withdraws it, the spoke’s route for that range disappears with it, and connectivity to that range vanishes without anyone touching the Azure configuration. The dynamic nature is a strength, because it keeps the spokes current with whatever the corporate network advertises, but it also means a routing problem in the spoke can originate entirely on the on-premises side, in a router that stopped advertising a prefix. When a spoke loses its path to a corporate subnet and the Azure configuration is unchanged, the on-premises advertisement is the place to look before anything in Azure.
How does Azure choose between competing routes?
Azure selects a route by longest prefix match first: the most specific route that matches the destination address wins, regardless of route type. When two matching routes share the same prefix length, route type breaks the tie in a fixed order, with user-defined routes taking priority over routes learned through Border Gateway Protocol, which in turn take priority over the system routes Azure creates automatically.
This two-stage selection is the engine behind every intentional behavior and every accidental one in the topology, so it pays to work an example by hand. Imagine a spoke whose effective routes contain three entries that could match traffic headed to a corporate subnet at 192.168.10.0/24. The system default route covers 0.0.0.0/0 and sends unknown traffic to the internet. A user-defined route you added covers 0.0.0.0/0 and sends everything to the hub firewall. A gateway-propagated route covers exactly 192.168.10.0/24 and points at the hub gateway. For a packet bound for 192.168.10.5, longest prefix match runs first and the /24 route wins outright over both /0 routes, so the packet goes to the gateway, not the firewall, even though your default UDR was meant to capture everything. That is usually the correct outcome, because on-premises traffic should reach the corporate network directly, but it is also the source of a recurring surprise: people add a broad UDR to the firewall expecting it to capture on-premises traffic for inspection, and the more specific propagated route silently wins, so the on-premises flow never reaches the firewall. To inspect on-premises traffic you must add a UDR that is at least as specific as the propagated route: a less specific route loses the longest-prefix comparison, while a route of equal prefix length wins the tie-break because user-defined routes outrank propagated ones.
Now consider two routes of the same prefix length, where the tie-break by route type decides the winner. A spoke has the system default to the internet and your user-defined default to the firewall, both covering 0.0.0.0/0. Same prefix length, so route type decides, and the user-defined route wins, which is exactly why placing that single UDR captures all egress. The mental shorthand that holds in nearly every case is this: specificity first, then your hand-written intent, then the dynamic learning from the corporate network, then the platform’s defaults. Reading the effective routes through that lens turns a confusing table into a predictable decision, and it is the skill that lets you say with confidence where a packet will go before you ever send one.
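The two-stage selection is compact enough to sketch directly. The code below is a minimal model of the rule as described here, not Azure's implementation; the `Route` class, the source labels, and the next-hop names are hypothetical, and the route set mirrors the worked example above.

```python
import ipaddress
from dataclasses import dataclass

# Tie-break order when prefix lengths are equal: lower number wins.
TYPE_PRIORITY = {"user": 0, "bgp": 1, "system": 2}


@dataclass(frozen=True)
class Route:
    prefix: str    # CIDR range the route covers
    source: str    # "user", "bgp", or "system"
    next_hop: str  # illustrative label for where the packet is sent


def select_route(routes, destination_ip):
    """Pick the route by longest prefix match, then by route type."""
    address = ipaddress.ip_address(destination_ip)
    matching = [r for r in routes if address in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    return min(
        matching,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,  # most specific first
            TYPE_PRIORITY[r.source],                    # then route type
        ),
    )


ROUTES = [
    Route("0.0.0.0/0", "system", "Internet"),
    Route("0.0.0.0/0", "user", "hub-firewall"),
    Route("192.168.10.0/24", "bgp", "hub-gateway"),
]

# The propagated /24 beats both defaults on prefix length.
print(select_route(ROUTES, "192.168.10.5").next_hop)

# Two /0 routes tie on length, so the user-defined route wins on type.
print(select_route(ROUTES, "8.8.8.8").next_hop)

# A user-defined /24 ties the propagated /24 on length and wins on type.
inspected = ROUTES + [Route("192.168.10.0/24", "user", "hub-firewall")]
print(select_route(inspected, "192.168.10.5").next_hop)
```

The third lookup shows why a user-defined route of equal length is enough to pull on-premises traffic through the firewall.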
How do user-defined routes force traffic through the hub firewall?
A user-defined route is a manual route you attach to a spoke subnet that overrides the system routes. To force traffic through the hub firewall, you create a route table with a route for 0.0.0.0/0 whose next hop is a virtual appliance at the firewall’s private IP, and associate it with the spoke subnets. Azure then sends all non-local spoke traffic to the firewall instead of straight to the internet.
The order of evaluation is what makes this work, and it is worth committing to memory because it explains both the intended behavior and the most common misconfigurations. Azure selects routes by longest prefix match first, and when two routes have the same prefix, by route type in a fixed priority: user-defined routes win over BGP-propagated routes, which win over system routes. So when you place a UDR for 0.0.0.0/0 pointing at the hub firewall, it beats the system default route to the internet, and all egress traffic from the spoke now lands on the firewall. When you place a UDR for another spoke’s specific range pointing at the firewall, its longer prefix beats the broad default, and inter-spoke traffic is steered to the firewall as well, where the firewall, which is reachable from both spokes through the hub, forwards it onward. The firewall, or any network virtual appliance in that role, is the transit forwarder that the non-transitive peering cannot provide on its own. The route table is how the spokes are told to use it.
There is a subtlety in the UDR for the firewall itself and for the gateway subnet that catches people. If you blanket every subnet, including the hub’s own firewall subnet, with a default route pointing at the firewall, you create a loop: traffic arrives at the firewall, the firewall’s subnet route says send it to the firewall, and it never leaves. The firewall subnet must keep its system default route to the internet so that inspected traffic can actually egress. Likewise, when you force on-premises-bound traffic through the firewall for inspection, the gateway subnet needs a route table that sends spoke-bound return traffic back through the firewall rather than directly, or the forward and return paths become asymmetric and a stateful firewall drops the reply. Asymmetric routing through a stateful appliance is one of the quieter hub-spoke failures, and it is always a routing problem, never a peering problem.
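Both traps lend themselves to a pre-deployment check. The sketch below assumes route tables are represented as plain dictionaries mapping a prefix to a next-hop IP, which is a hypothetical representation chosen for illustration, and it compares prefixes by exact match only. The firewall address is the same illustrative 10.0.1.4 used in the route examples in this article.

```python
FIREWALL_IP = "10.0.1.4"  # illustrative firewall private IP


def firewall_loop_prefixes(firewall_subnet_routes, firewall_ip=FIREWALL_IP):
    """Return prefixes on the firewall's own subnet whose next hop is the
    firewall itself; any hit means inspected traffic can never leave."""
    return sorted(
        prefix
        for prefix, next_hop in firewall_subnet_routes.items()
        if next_hop == firewall_ip
    )


def unreturned_spoke_ranges(gateway_subnet_routes, inspected_spoke_ranges,
                            firewall_ip=FIREWALL_IP):
    """Return spoke ranges whose on-premises traffic is inspected on the way
    out but whose replies would bypass the firewall on the way back."""
    return sorted(
        spoke_range
        for spoke_range in inspected_spoke_ranges
        if gateway_subnet_routes.get(spoke_range) != firewall_ip
    )


# A blanket default route applied to the firewall subnet by mistake.
print(firewall_loop_prefixes({"0.0.0.0/0": "10.0.1.4"}))

# The gateway subnet returns spoke 1 through the firewall but not spoke 2.
print(unreturned_spoke_ranges(
    {"10.1.0.0/16": "10.0.1.4"},
    ["10.1.0.0/16", "10.2.0.0/16"],
))
```

An empty list from both functions is the result to aim for; anything else names the subnet route that needs attention.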
The configuration that realizes the topology
With the model and the routing logic in place, the configuration is mostly a matter of translating the blueprint into resource definitions. The order of operations matters: create the networks and the hub services first, establish peerings with the correct transit and forwarding flags, then layer the route tables that force traffic through the hub. Building the route tables before the appliance they point at exists will simply blackhole traffic until the appliance is up.
What order should the resources be deployed in?
Networks and the hub services first, then peerings with the correct flags, then the route tables that force traffic through the hub. The order matters because each layer depends on the one below it: a route table that points at a firewall private IP does nothing useful until the firewall exists at that address, and a peering with gateway transit does nothing until the gateway is deployed and running.
Getting the order wrong does not usually produce an error; it produces a window of broken connectivity that resolves itself once the missing piece appears, which is more confusing than a clean failure because the symptom changes as the deployment progresses. If you apply a spoke’s default route to the firewall before the firewall has an address, the spoke sends its egress to a next hop that does not answer, and every outbound connection times out until the firewall comes up at the expected IP. If you enable use-remote-gateways on a spoke before the hub gateway finishes provisioning, the spoke has no propagated on-premises routes and corporate connectivity is dark until the gateway is ready. The practical defense, beyond simply sequencing the steps, is to pin the firewall’s private IP and the gateway’s placement deterministically so that the route tables can be written against known addresses, which is another argument for defining the whole topology as code: a declarative definition expresses these dependencies explicitly and the deployment engine orders the work correctly, where a human clicking through the portal has to remember the sequence and gets a partially broken network whenever the memory fails. Treat the deployment order as part of the design, not as an afterthought, because the topology is only correct when every layer is present and pointing at the layer beneath it.
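The ordering argument is a dependency graph, and a topological sort makes it explicit. The sketch below uses Python's standard `graphlib` module; the resource names and the dependency edges are assumptions that paraphrase the sequence described above, not an export from any deployment tool.

```python
from graphlib import TopologicalSorter

# Each key lists the resources that must exist before it is useful.
DEPENDS_ON = {
    "hub-vnet": set(),
    "spoke-vnet": set(),
    "firewall": {"hub-vnet"},
    "gateway": {"hub-vnet"},
    "peering": {"hub-vnet", "spoke-vnet", "gateway"},
    "route-table": {"firewall", "peering"},
}

# static_order() yields every resource after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())

for step, resource in enumerate(order, start=1):
    print(step, resource)
```

Several valid orders exist, so the exact sequence printed may vary; what holds in every one of them is that the route table comes after the firewall and the peering comes after the gateway.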
Begin with the peering, because the peering flags are where half of the topology’s behavior is decided. A peering has four switches that matter here, and each one maps to a row in the blueprint. The command below creates a peering from a spoke to the hub with the flags a routed hub-spoke needs.
    # Spoke side of the peering: allow forwarded traffic and use the hub's gateway
    az network vnet peering create \
      --name spoke1-to-hub \
      --resource-group rg-network \
      --vnet-name vnet-spoke1 \
      --remote-vnet vnet-hub \
      --allow-vnet-access true \
      --allow-forwarded-traffic true \
      --use-remote-gateways true

    # Hub side of the peering: allow gateway transit so the spoke can use the hub gateway
    az network vnet peering create \
      --name hub-to-spoke1 \
      --resource-group rg-network \
      --vnet-name vnet-hub \
      --remote-vnet vnet-spoke1 \
      --allow-vnet-access true \
      --allow-forwarded-traffic true \
      --allow-gateway-transit true
The flag named allow-forwarded-traffic is the one people skip and then spend an hour chasing. Allow-vnet-access governs whether the two networks can reach each other at all, and it defaults to enabled. Allow-forwarded-traffic governs whether the network will accept traffic that did not originate in the peered network but was forwarded by an appliance there. When spoke A’s traffic reaches the hub firewall and the firewall forwards it toward spoke B, that traffic arrives at the hub-to-B peering with a source address from spoke A, not from the hub. If allow-forwarded-traffic is off on that peering, Azure treats the forwarded packet as foreign and drops it. Spoke-to-spoke routing through a hub appliance therefore depends on allow-forwarded-traffic being enabled on the peerings the forwarded traffic crosses. This is a flag, not a route, and it is invisible in a routing table, which is why it is such a reliable source of confusion.
Allow-gateway-transit on the hub side and use-remote-gateways on the spoke side are the pair that implements gateway transit. They must be set together and in the right direction: the hub, which owns the gateway, allows transit, and the spoke, which borrows it, uses the remote gateway. Setting only one of the pair leaves the spoke without the propagated on-premises routes, and traffic to the corporate network falls through to the default route and dies in the same way unknown spoke traffic does. A spoke can use the remote gateway from exactly one peering, which is one of the structural reasons a spoke peers only to the hub and not to multiple hubs at once.
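Because these flags never show up in a route table, a small consistency check is a cheap safeguard. The sketch below is one way to express the pairing rules described above; the `PeeringSide` class and the problem messages are hypothetical, and the defaults are chosen for the example rather than taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class PeeringSide:
    """One side of a peering, carrying the four flags discussed above."""
    allow_vnet_access: bool = True
    allow_forwarded_traffic: bool = False
    allow_gateway_transit: bool = False
    use_remote_gateways: bool = False


def check_hub_spoke_peering(hub_side, spoke_side):
    """Return a list of problems; an empty list means the pair is
    consistent with a routed hub-spoke design."""
    problems = []
    if not (hub_side.allow_vnet_access and spoke_side.allow_vnet_access):
        problems.append("allow-vnet-access is off on one side")
    if not (hub_side.allow_forwarded_traffic
            and spoke_side.allow_forwarded_traffic):
        problems.append("allow-forwarded-traffic is off on one side")
    # The hub allows transit and the spoke uses it: both or neither.
    if hub_side.allow_gateway_transit != spoke_side.use_remote_gateways:
        problems.append("gateway transit pair is set on only one side")
    return problems


hub = PeeringSide(allow_forwarded_traffic=True, allow_gateway_transit=True)
spoke = PeeringSide(allow_forwarded_traffic=False, use_remote_gateways=True)
print(check_hub_spoke_peering(hub, spoke))
```

Here the gateway pair is set correctly, so the only problem reported is the forwarded-traffic flag left off on the spoke side.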
Next, the route table that forces egress and inter-spoke traffic through the firewall. The following creates a route table, adds the default route to the firewall, adds a route for a sibling spoke, and associates the table with a spoke subnet.
    # Create the route table for spoke subnets
    az network route-table create \
      --name rt-spoke1 \
      --resource-group rg-network

    # Default route: send all egress to the hub firewall private IP
    az network route-table route create \
      --name to-firewall-default \
      --resource-group rg-network \
      --route-table-name rt-spoke1 \
      --address-prefix 0.0.0.0/0 \
      --next-hop-type VirtualAppliance \
      --next-hop-ip-address 10.0.1.4

    # Inter-spoke route: send spoke2 traffic to the hub firewall as well
    az network route-table route create \
      --name to-spoke2-via-firewall \
      --resource-group rg-network \
      --route-table-name rt-spoke1 \
      --address-prefix 10.2.0.0/16 \
      --next-hop-type VirtualAppliance \
      --next-hop-ip-address 10.0.1.4

    # Associate the route table with the workload subnet
    az network vnet subnet update \
      --name snet-workload \
      --resource-group rg-network \
      --vnet-name vnet-spoke1 \
      --route-table rt-spoke1
The next-hop-type of VirtualAppliance is what tells Azure to forward the packet to a specific private IP inside the network rather than to a named gateway or out to the internet, and that private IP is the firewall’s address in the hub. The inter-spoke route is technically optional if your default route already covers it, because 0.0.0.0/0 will catch spoke2’s range too and send it to the firewall, but an explicit route for each spoke range makes the intent visible in the table and avoids surprises when someone later narrows the default route. Whichever you choose, the firewall must have a rule that permits the inter-spoke flow, or it will receive the packet and drop it on policy rather than on routing, which is a different failure to diagnose.
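Writing an explicit route per sibling spoke gets repetitive as the estate grows, so it helps to derive the list from the address plan. The sketch below is a hypothetical helper, not part of any Azure tooling: it builds plain dictionaries whose fields echo the route commands above, and the spoke1 and spoke3 ranges are assumed for illustration.

```python
def build_spoke_routes(spoke_name, spoke_ranges, firewall_ip):
    """Return the default route plus one explicit route per sibling spoke,
    all pointing at the hub firewall as a virtual appliance."""
    routes = [{
        "name": "to-firewall-default",
        "address_prefix": "0.0.0.0/0",
        "next_hop_type": "VirtualAppliance",
        "next_hop_ip": firewall_ip,
    }]
    for sibling, prefix in sorted(spoke_ranges.items()):
        if sibling == spoke_name:
            continue  # a spoke reaches its own range without a route
        routes.append({
            "name": f"to-{sibling}-via-firewall",
            "address_prefix": prefix,
            "next_hop_type": "VirtualAppliance",
            "next_hop_ip": firewall_ip,
        })
    return routes


# Assumed address plan; only spoke2's range appears in the commands above.
SPOKE_RANGES = {
    "spoke1": "10.1.0.0/16",
    "spoke2": "10.2.0.0/16",
    "spoke3": "10.3.0.0/16",
}

for route in build_spoke_routes("spoke1", SPOKE_RANGES, "10.0.1.4"):
    print(route["name"], route["address_prefix"], route["next_hop_ip"])
```

Generating the list this way keeps every spoke's table in step with the address plan, so adding a spoke does not depend on someone remembering to update its siblings by hand.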
Which flag is responsible when spoke-to-spoke traffic fails?
When spoke-to-spoke traffic fails in a routed hub-spoke, three settings are the usual culprits: the user-defined route that should send the sibling spoke’s range to the firewall is missing, allow-forwarded-traffic is disabled on the peering the forwarded packet crosses, or the firewall has no rule permitting the flow. Check them in that order.
The reason this trio appears so consistently is that each one sits at a different layer and each one fails silently. The missing route means the packet never heads toward the firewall at all; it falls to the default and is lost. The missing forwarded-traffic flag means the packet reaches the firewall and is forwarded, but the receiving peering rejects it as foreign. The missing firewall rule means routing and forwarding both succeed but policy denies the connection. Because the three failures look similar from the source virtual machine, which simply sees no response, the discipline is to walk the layers in order rather than guess. Effective routes tell you whether the route exists. Connection tracking on the firewall tells you whether the packet arrived. The firewall logs tell you whether policy allowed it. The companion article on VNet peering failures covers the peering-state and flag failures in depth, including the forwarded-traffic case, and is the right next stop when the peering itself looks suspect.
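Walking the layers in order can be written down as a short decision function. The sketch below is a hypothetical aid for reasoning, not a diagnostic tool: its three inputs stand for what you read from the effective routes, the peering settings, and the firewall policy, and the returned labels are invented for the example.

```python
def first_failing_layer(next_hop_for_sibling, forwarded_traffic_allowed,
                        firewall_rule_allows, firewall_ip="10.0.1.4"):
    """Return the first layer that explains a spoke-to-spoke timeout,
    checked in the order route, peering flag, firewall policy."""
    if next_hop_for_sibling != firewall_ip:
        return "route"            # packet never heads toward the firewall
    if not forwarded_traffic_allowed:
        return "peering-flag"     # forwarded packet rejected at the peering
    if not firewall_rule_allows:
        return "firewall-policy"  # routing works, policy denies the flow
    return None                   # all three layers look healthy


# Route still points at the internet: stop at the routing layer.
print(first_failing_layer("Internet", True, True))

# Route is correct but the forwarded-traffic flag is off.
print(first_failing_layer("10.0.1.4", False, True))

# Route and flag are correct but no firewall rule permits the flow.
print(first_failing_layer("10.0.1.4", True, False))
```

The function stops at the first failing layer on purpose, because a missing route makes the state of the later layers irrelevant until it is fixed.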
A worked end-to-end diagnosis
Reading the rules in the abstract is one thing; applying them under pressure when a production workload cannot reach a database is another. The walkthrough below takes a single concrete failure from the first symptom to the confirmed fix, naming the command at each step, so that the method is reproducible rather than a matter of intuition. The scenario is the most common one teams report: a virtual machine in spoke 1 cannot reach a database server in spoke 2, both spokes are peered to the hub, and the hub runs Azure Firewall. The symptom is a connection timeout from the application, with no error in the Azure portal anywhere obvious.
The first move is to resist re-creating the peering, which is the instinct and the wrong one. A timeout means packets are leaving and not coming back, which points at routing or policy rather than at a broken connection. So the diagnosis starts at the routing layer, on the source, by reading the effective routes for the virtual machine’s network interface in spoke 1 and looking specifically for the route that matches spoke 2’s address range.
    # Step 1: read the source VM's effective routes and find spoke 2's range
    # (starts_with on '10.2.' avoids also matching ranges such as 10.20.0.0/16)
    az network nic show-effective-route-table \
      --name nic-spoke1-vm \
      --resource-group rg-network \
      --query "value[?starts_with(addressPrefix[0], '10.2.')]" \
      --output table
There are two possible outcomes, and they branch the diagnosis cleanly. If the output contains no route for spoke 2’s range that points at the firewall, and the traffic instead falls to a system default whose next hop is None or Internet, the user-defined route is missing: the spoke has no instruction to send spoke 2 traffic to the firewall, so the packet is either dropped by the platform or sent toward the internet, where spoke 2’s private address does not exist. The fix is to add the route table from the configuration section and associate it with the subnet. If, instead, the route shows a next hop of the firewall’s private IP, routing is correct and the problem is downstream, which moves the diagnosis to the next layer.
Assume the route is correct and points at the firewall. The next question is whether the packet actually reaches spoke 2 after the firewall forwards it, and the cleanest way to test the real path rather than the intended one is Network Watcher connection troubleshoot, which sends genuine traffic from the source to the destination and reports the hop at which it fails and the reason.
# Step 2: test the real path from source VM to the destination in spoke 2
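# --source-resource takes the source virtual machine (name or ID), not its NIC;
# vm-spoke1 is a placeholder for the VM that owns nic-spoke1-vm
az network watcher test-connectivity \
--source-resource vm-spoke1 \
--dest-address 10.2.1.10 \
--dest-port 1433 \
--resource-group rg-network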
The result names the failure. If it reports the traffic blocked at the firewall, policy is denying the flow and the firewall needs a rule permitting spoke 1 to reach spoke 2 on the database port. If it reports the traffic reaching the firewall and being forwarded but never arriving at spoke 2, the suspicion turns to the allow-forwarded-traffic flag on the hub-to-spoke-2 peering, because a forwarded packet carrying spoke 1’s source address will be rejected by the receiving peering if that flag is off. Confirm the flag directly.
# Step 3: confirm allow-forwarded-traffic on the hub-to-spoke-2 peering
az network vnet peering show \
--name hub-to-spoke2 \
--resource-group rg-network \
--vnet-name vnet-hub \
--query "allowForwardedTraffic" \
--output tsv
If the value is false, the receiving peering is dropping the forwarded traffic, and enabling the flag completes the path. If it is true, and the route was correct, and connection troubleshoot showed the firewall forwarding, then the remaining suspect is the firewall policy itself, which is read from the firewall rules and the firewall logs. The order is deliberate and it is the whole point of the method: route first, real-path test second, forwarding flag third, firewall policy last. Each step rules out one layer with a single command, and because the layers fail in ways that look identical from the application, walking them in order is faster than any amount of inspired guessing. The same discipline applies to every hub-spoke failure, not just spoke-to-spoke; the destination and the specific flag change, but the layered method does not.
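If Step 3 prints false, the flag can be switched on in place rather than by re-creating the peering. The sketch below uses the generic --set argument of az network vnet peering update and reuses the names from Step 3; the function name is illustrative and not part of any tool.

```shell
# Enable allow-forwarded-traffic on an existing peering and print the new value.
# Arguments: peering name, VNet that owns the peering, resource group.
enable_forwarded_traffic() {
  az network vnet peering update \
    --name "$1" \
    --vnet-name "$2" \
    --resource-group "$3" \
    --set allowForwardedTraffic=true \
    --query "allowForwardedTraffic" \
    --output tsv
}

# Example: enable_forwarded_traffic hub-to-spoke2 vnet-hub rg-network
```

Peerings come in pairs, so it is worth running the same check from Step 3 against the spoke-side peering as well before moving on to firewall policy.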
One more refinement makes this method robust against the subtlest failure, the asymmetric return path. Even when the forward path is perfect, a stateful firewall will drop the reply if the return traffic from spoke 2 does not come back through the same firewall, because the firewall has no record of a connection it never saw the start of. To rule this out, read the effective routes on a network interface in spoke 2 and confirm that the route back to spoke 1’s range also points at the firewall. If spoke 2 routes the return traffic directly or out to the internet, the path is asymmetric and the stateful firewall is dropping a reply to a request it forwarded, which presents as a one-way connectivity failure that is genuinely confusing until the return route is checked. Symmetry is a property of both ends, so a complete diagnosis reads the routes on both spokes, not only on the one that reported the problem.
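The return-path check is the Step 1 command pointed at the other spoke. A minimal sketch, assuming a network interface named nic-spoke2-db in spoke 2 and a spoke 1 range that begins with 10.1; both are placeholders, since the article does not name them.

```shell
# Show the effective routes on one NIC whose prefix contains a given fragment.
# Arguments: NIC name, resource group, prefix fragment (for example 10.1).
routes_to_range() {
  az network nic show-effective-route-table \
    --name "$1" \
    --resource-group "$2" \
    --query "value[?contains(addressPrefix[0], '$3')].{prefix:addressPrefix[0], nextHopType:nextHopType, nextHopIp:nextHopIpAddress[0], state:state}" \
    --output table
}

# Forward path, read on the source:   routes_to_range nic-spoke1-vm rg-network 10.2
# Return path, read on the far side:  routes_to_range nic-spoke2-db rg-network 10.1
```

The path is symmetric only if both calls show the same firewall private IP as the next hop.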
The failure modes and the tools that expose them
A hub-spoke network has a small, well-defined set of failure modes, and each maps to a specific diagnostic. The discipline that separates fast diagnosis from slow guessing is matching the symptom to the layer and then using the tool that reads that layer directly rather than inferring from the application’s silence. The recurring cases below are the ones engineers actually report, framed as patterns the model explains.
The first pattern is spoke-to-spoke traffic failing with no appliance in the hub. A team builds the hub-spoke, peers the spokes to the hub, and then discovers that two workloads cannot reach each other. The diagram showed them connected through the center, but the center has no forwarder. There is no firewall, no network virtual appliance, and no route pointing anywhere, so the spokes have reachability to the hub and to nothing beyond it. The fix is structural, not a setting: the hub needs an appliance to forward between spokes, and the spokes need routes pointing the sibling range at that appliance. Until the hub has something that routes, the spokes cannot reach each other, and no amount of peering re-creation changes that, because peering was never the missing piece.
The second pattern is gateway transit configured on only one side. The spoke needs to reach the on-premises network, the hub has a working VPN gateway, but the spoke still cannot reach corporate subnets. The peering shows Connected, the gateway shows as provisioned, and the tunnel is healthy, yet the route to on-premises is absent from the spoke. The cause is almost always that allow-gateway-transit was set on the hub but use-remote-gateways was not set on the spoke, or the reverse. The two flags are a matched pair, and a single one accomplishes nothing. The fix is to set both, after which the on-premises routes propagate into the spoke and the path opens.
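Setting the matched pair can be sketched as two updates, hub side first, because the spoke-side flag is only accepted once the hub side offers its gateway. The peering and network names in the example (hub-to-spoke1, spoke1-to-hub, vnet-spoke1) are assumptions for illustration.

```shell
# Set both halves of the gateway-transit pair.
# Arguments: resource group, hub VNet, hub-side peering, spoke VNet, spoke-side peering.
enable_gateway_transit() {
  # Hub side: offer the hub's gateway to the peered spoke.
  az network vnet peering update \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --set allowGatewayTransit=true --output none || return 1
  # Spoke side: use the gateway the hub is offering.
  az network vnet peering update \
    --resource-group "$1" --vnet-name "$4" --name "$5" \
    --set useRemoteGateways=true --output none
}

# Example:
# enable_gateway_transit rg-network vnet-hub hub-to-spoke1 vnet-spoke1 spoke1-to-hub
```

After both updates, the on-premises prefixes should appear in the spoke's effective routes with a virtual network gateway next hop.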
The third pattern is internet egress escaping inspection. The organization stood up a hub firewall, intending all spoke traffic to the internet to pass through it for inspection and logging, and then discovers, on auditing the firewall logs, that some spoke's traffic is absent from them because it is reaching the internet directly. The cause is a missing user-defined route. Without a UDR for 0.0.0.0/0 pointing at the firewall, the spoke subnet uses its system default route straight to the internet, bypassing the hub entirely. The fix is the route table from the configuration section, associated with every spoke subnet that must be inspected. This failure is dangerous precisely because it is silent: the application works, traffic flows, and nothing breaks, so the gap is discovered only when someone audits the firewall logs and finds traffic missing.
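Because the failure is silent, it helps to look for it directly instead of waiting for an audit. A minimal sketch that lists the subnets of one spoke that have no route table associated at all; vnet-spoke1 in the example is a placeholder name.

```shell
# List subnets in a VNet that have no route table associated.
# Arguments: resource group, VNet name.
subnets_without_route_table() {
  az network vnet subnet list \
    --resource-group "$1" \
    --vnet-name "$2" \
    --query '[?routeTable == `null`].name' \
    --output tsv
}

# Example: subnets_without_route_table rg-network vnet-spoke1
```

Any subnet name this prints is a candidate for uninspected egress. An empty result only shows that a route table is attached, not that it contains the 0.0.0.0/0 route, so the effective routes remain the final word.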
The fourth pattern is the routing loop created by over-applying the default route. An engineer, wanting to be thorough, associates the same firewall-pointing route table with every subnet in the estate, including the firewall’s own subnet in the hub. Now traffic that the firewall is trying to send out to the internet hits the firewall subnet’s route table, which says send everything to the firewall, so the firewall sends its own egress back to itself. The traffic never leaves, and the symptom is total egress failure that appeared the moment the route table was applied broadly. The fix is to exclude the firewall’s subnet from the firewall-pointing route table, leaving it with its system default route to the internet so inspected traffic can actually exit. The rule of thumb is that the subnet hosting the appliance must never be told to route through the appliance, because that is the one place the loop closes on itself.
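A quick way to check for the loop is to ask whether the firewall's own subnet carries a route table. The sketch assumes the hub uses Azure Firewall, whose subnet is always named AzureFirewallSubnet; a third-party appliance would sit in a subnet with a name of your choosing.

```shell
# Print the ID of the route table attached to the firewall subnet, if any.
# Arguments: resource group, hub VNet name.
firewall_subnet_route_table() {
  az network vnet subnet show \
    --resource-group "$1" \
    --vnet-name "$2" \
    --name AzureFirewallSubnet \
    --query "routeTable.id" \
    --output tsv
}

# Example: firewall_subnet_route_table rg-network vnet-hub
```

Empty output means no route table is attached. If an ID comes back and it is the spoke route table that sends 0.0.0.0/0 to the firewall, the loop described above is in place.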
The fifth pattern is configuration drift across a growing estate, which is less a single failure than a class of intermittent ones. As spokes are added by hand over months, small inconsistencies accumulate: one spoke’s subnet was created later and never got its route table associated, another spoke had allow-forwarded-traffic left at the default on one of its peerings, a third was given a route table that points at an old firewall IP from before the firewall was rebuilt. Each inconsistency produces a connectivity failure for exactly one spoke or one flow, which makes it hard to attribute because the topology works for everything else. The diagnosis is the same layered method as any single failure, but the prevention is structural: define the topology as code so that every spoke is provisioned identically, and the drift that causes these one-off failures cannot accumulate in the first place. Drift is the failure mode that scales with the estate, and it is the strongest practical argument for not building a large hub-spoke by hand.
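Until the estate is defined as code, drift can at least be made visible. The sketch below prints the flags of every peering on a list of networks so an odd one stands out in a side-by-side read; the network names in the example are placeholders.

```shell
# Print peering state and flags for each VNet given.
# Arguments: resource group, then one or more VNet names.
audit_peerings() {
  rg="$1"
  shift
  for vnet in "$@"; do
    echo "== $vnet"
    az network vnet peering list \
      --resource-group "$rg" \
      --vnet-name "$vnet" \
      --query "[].{name:name, state:peeringState, forwarded:allowForwardedTraffic, gatewayTransit:allowGatewayTransit, useRemoteGw:useRemoteGateways}" \
      --output table
  done
}

# Example: audit_peerings rg-network vnet-hub vnet-spoke1 vnet-spoke2
```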
How do I confirm where a spoke’s traffic is actually going?
The authoritative tool is the effective routes view on the network interface. It shows the complete route table Azure has computed for that interface, merging system routes, peering routes, gateway-propagated routes, and user-defined routes in priority order, so it tells you exactly which next hop a packet to any destination will take. If the next hop is wrong, the topology is misconfigured at the routing layer.
# Show the effective routes for a spoke VM's network interface
az network nic show-effective-route-table \
--name nic-spoke1-vm \
--resource-group rg-network \
--output table
Reading the effective routes is the fastest way to settle the reachability-versus-routing question for any specific destination. If you are diagnosing failed spoke-to-spoke traffic, look up the sibling spoke’s range in the output. If the next hop is the firewall’s private IP, the route is correct and the problem is downstream at the forwarded-traffic flag or the firewall policy. If the next hop is None or Internet, the UDR is missing and the packet is being sent to die. If you are diagnosing failed egress inspection, look up 0.0.0.0/0. If its next hop is Internet rather than the firewall appliance, egress is bypassing the hub. The effective routes view collapses a great deal of guesswork into a single command, because it shows the result of all the route-priority arithmetic rather than the inputs you have to reason about by hand.
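For the egress case the full table can be narrowed to the default route alone. A minimal sketch that reuses the NIC and resource group names from the command above; the function name is illustrative.

```shell
# Show only the 0.0.0.0/0 entries from a NIC's effective routes.
# Arguments: NIC name, resource group.
default_route() {
  az network nic show-effective-route-table \
    --name "$1" \
    --resource-group "$2" \
    --query "value[?addressPrefix[0]=='0.0.0.0/0'].{source:source, state:state, nextHopType:nextHopType, nextHopIp:nextHopIpAddress[0]}" \
    --output table
}

# Example: default_route nic-spoke1-vm rg-network
```

More than one row can come back, typically the system default alongside a user-defined route, and the row to read is the one whose state is Active.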
Network Watcher complements the effective routes view with two tools that test the path rather than just describe it. Connection troubleshoot sends real traffic from a source to a destination and reports where it succeeds or fails, including the hop at which a packet is dropped and whether the drop was a security rule or a route. Next hop, given a source virtual machine and a destination address, returns the next hop the platform would choose, which is the effective-routes lookup distilled to a single answer for one destination. Used together, the effective routes view tells you what the routing table says, and the Network Watcher tools tell you whether traffic actually behaves the way the table claims, which is how you catch a stateful-firewall asymmetry that the route table alone cannot reveal. The diagnostic patterns here generalize; the broader treatment of route tables, next-hop types, and how the platform composes a final routing decision is worth reading alongside this article in the route-tables reference within the series.
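The next hop lookup can be sketched as below. The VM name and the source address in the example are placeholders, and the source address is assumed to be the private IP of that VM's network interface.

```shell
# Ask Network Watcher which next hop the platform picks for one destination.
# Arguments: resource group, source VM name, source IP, destination IP.
next_hop() {
  az network watcher show-next-hop \
    --resource-group "$1" \
    --vm "$2" \
    --source-ip "$3" \
    --dest-ip "$4" \
    --query "{type:nextHopType, ip:nextHopIpAddress, routeTable:routeTableId}" \
    --output table
}

# Example: next_hop rg-network vm-spoke1 10.1.1.4 10.2.1.10
```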
What is the single most common hub-spoke mistake?
Treating peering as connectivity. Engineers peer every spoke to the hub, see the status read Connected, and assume the network is finished, when peering has only given each spoke reachability to the hub. The missing pieces are the appliance in the hub that forwards between spokes and the user-defined routes that send traffic to it, and their absence is invisible until a flow that needs them fails.
This mistake is common because the platform makes the incomplete state look complete. The peerings show healthy, the spokes can reach the shared services in the hub, and a quick test of spoke-to-hub traffic succeeds, so the topology appears to work. It is only when a flow that depends on the hub routing actively, spoke-to-spoke or inspected egress, is exercised that the gap shows, and by then the design has often been declared done and handed off. The defense is to test the flows that depend on the hub-routes-for-the-spokes rule explicitly, as part of standing up the topology, rather than assuming that healthy peerings imply a working network. Stand up two spokes, force a connection between them through the hub, and confirm it on the firewall logs before trusting the design; the five minutes that takes saves the afternoon the silent gap would otherwise cost. Every other mistake in this article is a specific instance of this general one: the hub was expected to route, and the routing was never built.
How the topology interacts with the rest of the network
A hub-spoke does not exist in isolation. It connects to the corporate data center, to the public internet, to Azure platform services, and sometimes to other hubs in other regions, and each of those connections is shaped by the same hub-routes-for-the-spokes rule. The hub is the seam between the spokes and everything external, which means every external interaction is an opportunity to apply the rule cleanly or to violate it and create a gap.
On-premises connectivity is the clearest case. The corporate network reaches Azure through a VPN tunnel or an ExpressRoute circuit, and that connection terminates on a gateway in the hub. Gateway transit then shares that single gateway with every spoke, so a thirty-spoke estate needs exactly one gateway rather than thirty. The on-premises routes the gateway learns are propagated into each spoke that uses the remote gateway, and traffic from a spoke to a corporate subnet routes to the hub gateway and down the circuit. The reverse direction matters too: the corporate network must know the address ranges of the spokes, which it learns through the gateway’s route advertisement, so that return traffic finds its way back. Centralizing on-premises connectivity in the hub is one of the strongest reasons to adopt the topology in the first place, because it turns a per-workload connectivity problem into a single shared circuit.
The return direction deserves its own attention, because on-premises connectivity is where one-way failures most often hide. For a spoke to reach a corporate subnet, the spoke needs a route to that subnet through the hub gateway, which gateway transit supplies. For the reply to come back, the corporate routers need a route to the spoke’s address range, and they learn it through the gateway’s route advertisement, which means the spoke ranges must be summarized or advertised in a way the corporate side accepts. When a spoke can send to on-premises but never hears back, the forward routes in Azure are usually fine and the gap is that the corporate network never learned the spoke’s range, or learned it and a corporate firewall is dropping the return. The lesson is that on-premises connectivity is a two-party agreement: Azure must route to the corporate network and the corporate network must route back to the spokes, and a complete design verifies both directions rather than assuming the tunnel coming up means traffic flows both ways. The choice between a VPN gateway and an ExpressRoute circuit changes the performance and the cost of this connection but not its structure; either terminates in the hub and either shares with the spokes through the same gateway-transit mechanism, so the routing reasoning is identical regardless of which physical connection carries it.
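Both directions can be read from the hub gateway when it is a VPN gateway running BGP. A minimal sketch under that assumption; the gateway name and peer address in the examples are placeholders, and an ExpressRoute circuit exposes its route tables through different commands.

```shell
# Routes the hub gateway has learned (what Azure knows about on-premises).
# Arguments: resource group, gateway name.
gateway_learned_routes() {
  az network vnet-gateway list-learned-routes \
    --resource-group "$1" --name "$2" --output table
}

# Routes the hub gateway advertises to one BGP peer (what on-premises is told).
# Arguments: resource group, gateway name, BGP peer IP.
gateway_advertised_routes() {
  az network vnet-gateway list-advertised-routes \
    --resource-group "$1" --name "$2" --peer "$3" --output table
}

# Examples:
# gateway_learned_routes rg-network vgw-hub
# gateway_advertised_routes rg-network vgw-hub 192.168.0.1
```

If a spoke's range is missing from the advertised list, the corporate side has no way to route the reply, however healthy the tunnel looks.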
Central egress through the hub firewall is the second external interaction, and it is where the topology earns its security value. By forcing every spoke’s internet-bound traffic through a firewall in the hub, the organization gains one place to apply egress rules, one place to log outbound connections, and one place to apply threat intelligence and fully qualified domain name filtering. The user-defined routes that send 0.0.0.0/0 to the firewall are what make this happen, and the firewall’s own rules decide what is allowed out. The choice of what sits in that central position, whether the managed Azure Firewall, a third-party network virtual appliance, or a combination with network security groups at the subnet edge, is a real design decision with cost and capability trade-offs; the comparison of Azure Firewall against an NVA and an NSG lays out which control belongs where and why an NSG alone cannot do the egress filtering the central firewall provides.
What the central firewall buys beyond a simple allow-or-deny is control over where traffic may go by name rather than by address, and that capability is the practical reason organizations route egress through it. A workload that must reach a specific software repository, a package feed, or an external partner endpoint can be permitted to reach exactly those fully qualified domain names and nothing else, so a compromised process inside the spoke cannot quietly exfiltrate data to an arbitrary destination because the firewall has no rule permitting that destination. Address-based filtering cannot achieve this cleanly, because the addresses behind a name change and because many services sit behind shared address ranges, so a name-based rule is both more precise and more durable. Threat-intelligence filtering adds a second layer, denying traffic to and from addresses and names known to be malicious without anyone having to maintain the list by hand. Centralizing these controls in one firewall in the hub means the policy is written once and applied to every spoke’s egress, rather than reinvented per workload, which is both less work and less likely to leave a gap.
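A name-based allow rule might look like the sketch below. It assumes an Azure Firewall managed with classic rules and the azure-firewall CLI extension installed; a firewall governed by a firewall policy is configured through the policy's rule collection groups instead. The collection name, rule name, priority, and the example FQDN are all placeholders.

```shell
# Allow one source range to reach one FQDN over HTTPS, and nothing else by this rule.
# Arguments: resource group, firewall name, source CIDR, target FQDN.
allow_fqdn_egress() {
  az network firewall application-rule create \
    --resource-group "$1" \
    --firewall-name "$2" \
    --collection-name allow-approved-fqdns \
    --name allow-https-to-approved-host \
    --priority 200 \
    --action Allow \
    --source-addresses "$3" \
    --protocols Https=443 \
    --target-fqdns "$4"
}

# Example: allow_fqdn_egress rg-network fw-hub 10.1.0.0/16 packages.example.com
```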
Why does central egress matter for compliance and incident response?
Because it produces one authoritative record of everything that left the estate and one place to change what is allowed to leave. When every spoke’s outbound traffic passes through a single firewall, the firewall logs answer the auditor’s question of what connected to what, and an incident responder has one chokepoint to tighten rather than dozens of scattered egress paths to chase.
The value compounds during an actual incident, which is when a scattered network design hurts most. If a workload is suspected of being compromised, a central egress point lets you see exactly what destinations it has been reaching and, if needed, cut its outbound access at the hub immediately, across every path it might use, by changing one set of rules. A flat or per-workload egress design forces the responder to find and close every path the compromised workload could use, under time pressure, with incomplete knowledge of what paths exist. The hub-spoke’s central funnel turns that frantic search into a single, confident action, and the same logs that satisfy a routine audit become the timeline that reconstructs what happened. The topology’s security value and its operational value are the same property viewed from two angles: everything goes through one place, so everything can be seen and controlled from one place.
How does the hub mediate access to Azure platform services?
Spokes typically reach Azure platform services such as storage and databases through private endpoints, and the private DNS that resolves those endpoints is centralized in the hub. The spoke’s traffic to a private endpoint stays on the backbone, and the private DNS zones linked to the hub, or shared with the spokes, ensure the service name resolves to the private address rather than the public one across the whole estate.
Centralizing private DNS in the hub follows the same duplication-avoidance logic as every other shared service. Each workload could maintain its own private DNS zones, but that quickly becomes inconsistent and error-prone as the estate grows, with one spoke resolving a storage account privately and another resolving it publicly because someone forgot to link a zone. Putting the private DNS resolution in the hub, and pointing the spokes’ DNS settings at a resolver there, gives one consistent answer for every name across every workload. The traffic to the resolved private endpoint still flows according to the routing rules already described, through the firewall if a route forces it there, which means private connectivity to platform services and central egress inspection compose cleanly rather than fighting each other.
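The forgotten zone link is easy to check for. The sketch lists which virtual networks a private DNS zone is linked to; the resource group in the example is a placeholder, and privatelink.blob.core.windows.net stands in for whichever zone the estate actually uses.

```shell
# List the virtual network links on a private DNS zone.
# Arguments: resource group that holds the zone, zone name.
zone_links() {
  az network private-dns link vnet list \
    --resource-group "$1" \
    --zone-name "$2" \
    --query "[].{link:name, vnet:virtualNetwork.id, state:virtualNetworkLinkState}" \
    --output table
}

# Example: zone_links rg-network privatelink.blob.core.windows.net
```

With resolution centralized, the hub, or the network hosting the resolver, should appear in the list. A workload that resolves the public address is usually one whose DNS settings do not point at that resolver.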
How do multiple hubs connect across regions?
A single hub serves a single region cleanly, but an estate that spans regions needs more than one hub, and the hubs themselves are connected by peering so that a spoke in one region can reach a spoke in another. Each region keeps its own hub with its own firewall and gateway, the hubs peer to each other, and routing between regions traverses the hub-to-hub link rather than a direct spoke-to-spoke peering across regions.
The cross-region case stretches the non-transitivity rule across two hops instead of one, and the consequence is more routing to maintain, not a new mechanism. A packet from a spoke in region A bound for a spoke in region B travels to its local hub, across the hub-to-hub peering to region B’s hub, and then to the destination spoke, which means every leg needs its routes and forwarding flags lined up. The spoke in region A needs a route for region B’s spoke range pointing at its local firewall, the region A hub needs to forward to region B’s hub, the hub-to-hub peering needs allow-forwarded-traffic, and region B’s hub needs to forward into its local spoke. Each additional region multiplies the route tables and the flags that have to be correct, and the manual burden grows faster than the number of hubs because the connections between them grow quadratically: a full mesh of n hubs needs n(n-1)/2 hub-to-hub links, so four hubs need six and eight need twenty-eight. This is the scale at which a hand-built design starts to lose to a managed backbone, because the managed service computes the inter-region routing for you instead of leaving it as a mesh of route tables to maintain.
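The hub-to-hub leg can be sketched as a peering pair created with forwarded traffic allowed in both directions. The hubs are assumed to live in different resource groups, so the remote side is passed by resource ID; every name in the example is a placeholder.

```shell
# Peer two hubs in both directions with forwarded traffic allowed.
# Arguments: resource group A, hub A VNet, resource group B, hub B VNet.
peer_hubs() {
  rg_a="$1"; hub_a="$2"; rg_b="$3"; hub_b="$4"
  # Look up each hub's resource ID so the peering works across resource groups.
  id_a=$(az network vnet show --resource-group "$rg_a" --name "$hub_a" --query id --output tsv) || return 1
  id_b=$(az network vnet show --resource-group "$rg_b" --name "$hub_b" --query id --output tsv) || return 1
  az network vnet peering create \
    --resource-group "$rg_a" --vnet-name "$hub_a" \
    --name "$hub_a-to-$hub_b" --remote-vnet "$id_b" \
    --allow-vnet-access --allow-forwarded-traffic || return 1
  az network vnet peering create \
    --resource-group "$rg_b" --vnet-name "$hub_b" \
    --name "$hub_b-to-$hub_a" --remote-vnet "$id_a" \
    --allow-vnet-access --allow-forwarded-traffic
}

# Example: peer_hubs rg-network-a vnet-hub-a rg-network-b vnet-hub-b
```

The peering is only the link. The routes on each firewall subnet and in each spoke still have to point the remote region's ranges at the right next hop.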
There are also platform limits that shape a large hub-spoke and that a production design must respect rather than discover. A virtual network supports a bounded number of peerings, which caps how many spokes a single hub can directly serve and forces very large estates toward either multiple hubs or a managed backbone. A route table supports a bounded number of routes, which caps how many explicit inter-spoke entries a spoke subnet can carry and is one more reason to prefer a broad default route to the firewall over an exhaustive per-spoke list. These limits are raised over time and should be verified against the current official figures rather than memorized as constants, but their existence, not their exact value, is the design input: a topology that ignores them will hit a wall at scale that no amount of correct routing logic can route around. The interaction between the hub-spoke and the broader Azure network, including how subnets, address space, and the default routing behave before you add any of this topology on top, is laid out in the Azure Virtual Network deep dive, which is the foundation this whole pattern assumes.
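Tracking how close the estate is to those limits takes two counts. A minimal sketch; the route table name in the example is a placeholder, and the results should be compared against the currently published limits, which are deliberately not hard-coded here.

```shell
# Count the peerings on one VNet (typically the hub).
# Arguments: resource group, VNet name.
peering_count() {
  az network vnet peering list \
    --resource-group "$1" --vnet-name "$2" \
    --query "length(@)" --output tsv
}

# Count the routes in one route table.
# Arguments: resource group, route table name.
route_count() {
  az network route-table route list \
    --resource-group "$1" --route-table-name "$2" \
    --query "length(@)" --output tsv
}

# Examples:
# peering_count rg-network vnet-hub
# route_count rg-network rt-spoke1
```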
How to design a hub-spoke for production
Designing a production hub-spoke is mostly a matter of planning for growth and for the operational reality that the topology is managed by hand. The pattern that works at three spokes does not automatically work at fifty, and the difference is not the routing logic, which is identical, but the amount of manual configuration the routing logic requires and the discipline needed to keep it consistent.
Address planning comes first and is hard to change later. The hub and every spoke draw from a single planned address space, with no overlaps, because peering between overlapping ranges is not allowed and renumbering a live workload is painful. A common approach reserves a contiguous block for the whole topology, carves the hub from one end, and assigns spokes from the rest with room for each spoke to grow its subnets. Leaving generous gaps between spoke ranges costs nothing in an empty address space and saves a migration later when a workload needs more room. The hub itself needs dedicated subnets for the firewall, the gateway, and any shared resolver, each sized according to the appliance’s requirements, which are not all the same and some of which cannot be resized after creation.
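Before carving a new spoke it helps to see what is already allocated. The sketch lists every virtual network in the current subscription with its address prefixes, so a proposed range can be checked against the existing ones by eye; it does not detect overlaps itself.

```shell
# List every VNet in the current subscription with its address prefixes.
list_address_spaces() {
  az network vnet list \
    --query "[].{name:name, location:location, prefixes:join(', ', addressSpace.addressPrefixes)}" \
    --output table
}

# Example: list_address_spaces
```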
Should spokes be split by environment or by application?
Both patterns are valid and they answer different needs. Splitting by application gives each system its own spoke so teams operate independently and a problem in one application stays in its spoke. Splitting by environment puts production, staging, and development in separate spokes so traffic and blast radius never cross environment boundaries. Large estates often combine the two, with a spoke per application per environment.
The decision turns on what you most need to keep apart, and the routing model makes either choice mechanical rather than special. If the dominant concern is team autonomy and per-application isolation, a spoke per application lets each team own its network without touching a neighbor’s, and the hub routes between them only where two applications genuinely integrate. If the dominant concern is keeping environments from contaminating each other, which matters for compliance and for preventing a staging mistake from reaching production data, a spoke per environment draws the hard line where it counts. The combined approach, a spoke for each application in each environment, gives the finest isolation at the cost of the most spokes to manage, which loops directly back to the scaling pressure: the more finely you slice, the more peerings and route tables you maintain, and the sooner infrastructure as code stops being optional. There is no universally correct slice; there is the slice that matches the boundary your organization most needs to enforce, chosen deliberately rather than inherited from a diagram someone copied.
The routing configuration grows at least linearly with the number of spokes, and faster when inter-spoke routes are listed explicitly, and that is the operational pressure point. Each new spoke needs its peering pair with the correct flags, its route table associated with each subnet, and, if you route inter-spoke traffic explicitly, a route entry for every sibling it must reach. At three spokes this is trivial. At thirty spokes, hand-maintaining peerings and route tables becomes a source of drift and error, where one spoke has a flag set wrong or a route table that was never associated, and the inconsistency surfaces as an intermittent connectivity problem that is hard to attribute. The defense is to stop configuring by hand. Define the topology as infrastructure as code, so that every spoke is created from the same template with the same flags and the same route-table associations, and the consistency is enforced by the definition rather than by the operator’s memory. This is the point in a hub-spoke’s life where the manual design starts to feel like a burden, and it is the natural moment to evaluate whether a managed alternative would carry the routing for you.
Defining the topology as code is worth treating concretely rather than as advice, because the value is in the specifics. A spoke module that takes the spoke’s address range, the hub’s firewall IP, and the list of sibling ranges, and produces the virtual network, the subnets, the peering pair with the correct flags, the route table, and the subnet associations, turns adding a spoke into a single parameterized call that cannot forget a flag or skip an association. The flags that humans get wrong, allow-forwarded-traffic and the gateway-transit pair, are set once in the module and applied identically every time, so the drift that produces one-off connectivity failures has no way to creep in. A Bicep or Terraform definition of the spoke also makes the topology reviewable: a change to a spoke’s routing shows up as a diff in version control rather than as an undocumented portal edit, and the whole estate’s routing can be reasoned about by reading the definitions rather than by clicking through dozens of route tables. The investment in a spoke module pays for itself the moment the estate grows past the handful of spokes a person can hold in their head.
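The shape of such a module can be sketched in the same CLI the rest of this article uses, as a stand-in for a Bicep or Terraform definition and not a replacement for one. It is simplified on purpose: it creates one subnet, routes everything through a single default route instead of a list of sibling ranges, assumes hub and spoke share a resource group, and leaves out the gateway-transit pair. Every resource name it generates is a hypothetical convention.

```shell
# Create a spoke with its peering pair, route table, and subnet association.
# Arguments: resource group, hub VNet, spoke name, spoke CIDR, subnet CIDR, firewall private IP.
add_spoke() {
  rg="$1"; hub="$2"; spoke="$3"; cidr="$4"; subnet_cidr="$5"; fw_ip="$6"

  # 1. The spoke network with one workload subnet.
  az network vnet create --resource-group "$rg" --name "vnet-$spoke" \
    --address-prefixes "$cidr" \
    --subnet-name snet-workload --subnet-prefixes "$subnet_cidr" \
    --output none || return 1

  # 2. The peering pair, with forwarded traffic allowed on both sides.
  az network vnet peering create --resource-group "$rg" --name "hub-to-$spoke" \
    --vnet-name "$hub" --remote-vnet "vnet-$spoke" \
    --allow-vnet-access --allow-forwarded-traffic --output none || return 1
  az network vnet peering create --resource-group "$rg" --name "$spoke-to-hub" \
    --vnet-name "vnet-$spoke" --remote-vnet "$hub" \
    --allow-vnet-access --allow-forwarded-traffic --output none || return 1

  # 3. The route table that sends everything to the hub firewall.
  az network route-table create --resource-group "$rg" --name "rt-$spoke" \
    --output none || return 1
  az network route-table route create --resource-group "$rg" \
    --route-table-name "rt-$spoke" --name default-via-firewall \
    --address-prefix 0.0.0.0/0 \
    --next-hop-type VirtualAppliance --next-hop-ip-address "$fw_ip" \
    --output none || return 1

  # 4. The association that is easiest to forget by hand.
  az network vnet subnet update --resource-group "$rg" --vnet-name "vnet-$spoke" \
    --name snet-workload --route-table "rt-$spoke" --output none
}

# Example: add_spoke rg-network vnet-hub spoke3 10.3.0.0/16 10.3.1.0/24 10.0.1.4
```

A script like this repeats the same steps every time but keeps no state and cannot show a diff, which is the part a declarative Bicep or Terraform definition adds.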
Network security groups belong in the design too, and their placement follows the same layered logic as the routing. The central firewall in the hub handles egress and inter-spoke policy, but it does not replace network security groups at the subnet level inside each spoke, which enforce the workload’s own segmentation: the web tier may reach the application tier, the application tier may reach the data tier, and nothing reaches the data tier from outside its spoke. The firewall is the coarse, central control for traffic crossing the hub; the network security groups are the fine, local control for traffic within a spoke. They compose rather than compete, and a production design uses both, with the firewall deciding what may cross between spokes and to the internet, and the network security groups deciding what may move between the tiers of a single workload. Trying to enforce intra-spoke tier segmentation at the central firewall is possible but wasteful, because it routes local traffic out to the hub and back for a decision a local rule could make in place, and trying to enforce egress filtering with network security groups fails outright because a network security group cannot filter by fully qualified domain name.
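The data-tier rule described above can be sketched as one allow and one deny on the data subnet's network security group. The explicit deny matters because the default rules permit inbound traffic from the VirtualNetwork tag, which includes peered networks. The rule names, priorities, port, and the example addresses are placeholders.

```shell
# Allow only the application tier to reach the data tier on the database port.
# Arguments: resource group, NSG name, application-tier CIDR, data-tier CIDR.
restrict_data_tier() {
  az network nsg rule create --resource-group "$1" --nsg-name "$2" \
    --name allow-app-to-data --priority 100 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$3" \
    --destination-address-prefixes "$4" \
    --destination-port-ranges 1433 --output none || return 1
  # Deny every other inbound flow from the virtual network, peered networks included.
  az network nsg rule create --resource-group "$1" --nsg-name "$2" \
    --name deny-other-vnet-inbound --priority 4000 \
    --direction Inbound --access Deny --protocol '*' \
    --source-address-prefixes VirtualNetwork \
    --destination-address-prefixes "$4" \
    --destination-port-ranges '*' --output none
}

# Example: restrict_data_tier rg-network nsg-spoke2-data 10.2.2.0/24 10.2.1.0/24
```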
How should a production hub-spoke be monitored?
A production hub-spoke is monitored at the choke point it creates: the hub. Because all inter-spoke and egress traffic passes through the hub firewall, the firewall logs are a near-complete record of what the estate is doing, and flow logs on the network security groups capture what moves within and at the edge of each spoke. Connection metrics and the effective routes view give the live picture when something fails.
The deeper point is that the topology’s central choke point, which is its security value, is also its observability value. A flat network with workloads talking directly to each other and to the internet has traffic scattered across many paths that no single log captures. The hub-spoke deliberately funnels everything through one firewall, which means one set of logs answers most questions about who talked to whom and what left for the internet. Pointing those logs and the network security group flow logs at a central log workspace, and alerting on the patterns that matter, such as a spoke suddenly attempting egress it never made before, turns the topology into something you can watch rather than something you hope is behaving. The monitoring story is one more reason the central path is worth the routing effort it demands: the same funnel that lets you control traffic lets you see it.
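Sending the firewall's logs to a central workspace can be sketched as a single diagnostic setting. The setting name is a placeholder, both arguments are full resource IDs, and the two log categories shown are the classic Azure Firewall rule logs; the categories a given firewall offers should be confirmed before relying on them.

```shell
# Send Azure Firewall rule logs to a Log Analytics workspace.
# Arguments: firewall resource ID, workspace resource ID.
send_firewall_logs() {
  az monitor diagnostic-settings create \
    --name fw-to-workspace \
    --resource "$1" \
    --workspace "$2" \
    --logs '[{"category":"AzureFirewallApplicationRule","enabled":true},{"category":"AzureFirewallNetworkRule","enabled":true}]' \
    --output none
}

# Example (IDs elided): send_firewall_logs "$FIREWALL_ID" "$WORKSPACE_ID"
```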
When does Virtual WAN replace a hand-built hub-spoke?
Virtual WAN becomes the better choice when the manual management of peerings, route tables, and gateways across many spokes and multiple regions outweighs the control of hand-building the topology. Virtual WAN provides a Microsoft-managed hub with managed routing and routing intent, so the platform carries the spoke-to-spoke and on-premises routing that you otherwise wire by hand, in exchange for less granular control.
The decision is genuinely a trade, not an upgrade, which is why it deserves a real comparison rather than a default. A hand-built hub-spoke gives you complete control over the appliance in the hub, the exact routes, and the firewall, at the cost of building and maintaining all of it yourself. Virtual WAN gives you a managed hub that handles the routing automatically and connects regions and branches without the manual peering mesh, at the cost of fitting your design into what the managed service supports. The deciding factors are scale and operational appetite: a handful of spokes in one region rarely justifies the move, while many spokes across several regions, or a need to connect branch offices and software-defined wide-area networks, pushes hard toward the managed option. The full decision, with the deciding factor named branch by branch, is the subject of the dedicated comparison of hub-spoke against Virtual WAN, which is where to go when you are weighing the switch for a real estate rather than learning the manual pattern.
The closing verdict
The hub-spoke topology is not a diagram you copy; it is a routing rule you apply. Peering gives a spoke reachability to the hub and nothing more, and because peering is non-transitive, the hub must actively carry spoke-to-spoke, on-premises, and internet traffic through an appliance or a gateway, with user-defined routes in the spokes pointing the traffic at the hub. Internalize that one rule and the entire design follows: the shared services live in the hub because duplication is wasteful, the workloads live in isolated spokes because isolation is the default safe posture, the peerings connect spokes only to the center, and the routing does the rest of the work that the peering deliberately does not. Every failure mode you will meet is a violation of the rule, and every diagnosis starts at the effective routes, because the routing table is where the topology is true or false. Build it by hand while the estate is small and the control is worth the maintenance; reach for the managed alternative when the manual routing across many spokes and regions becomes the thing you spend your time fighting. Either way, the hub routes for the spokes, and that is the whole topology.
If you carry one practical habit away from this article, let it be the order of diagnosis, because it is where the rule meets the keyboard. When a flow fails, do not re-create the peering and do not widen a firewall rule on a hunch. Read the effective routes on the source first and learn where the packet is actually told to go, test the real path with Network Watcher second, check the forwarded-traffic flag on the receiving peering third, and read the firewall policy and logs last. That sequence walks the layers from routing to policy in the order they fail, and it turns a confusing timeout into a single answered question at each step. The topology rewards engineers who reason from how the platform behaves rather than from how the diagram looks, and the effective routes view is where the platform tells you the truth. Build the design deliberately, test the flows that depend on the hub before trusting it, define it as code once it outgrows what one person can hold in mind, and revisit the managed alternative when the manual routing across many spokes and regions becomes the work itself. The hub-spoke is a small set of rules applied with discipline, and discipline is what makes it scale.
Frequently Asked Questions
Q: What is a hub-spoke network topology in Azure?
A hub-spoke network topology is a connectivity pattern in which a central virtual network called the hub holds the services every workload shares, such as a firewall, a VPN or ExpressRoute gateway, and shared name resolution, while multiple workload virtual networks called spokes each connect to the hub through virtual network peering. The spokes stay isolated from one another and carry only their own application. Traffic between spokes, traffic to the corporate network, and traffic out to the internet all route through the hub, because the hub owns the appliances and gateways that forward it. The pattern exists to give many isolated workloads a shared set of central services and one controlled path to external networks without wiring every workload to every other workload by hand, which is why it is the most common enterprise networking shape in Azure.
Q: What belongs in the hub versus the spokes?
The hub holds anything that every workload would otherwise have to duplicate: the central firewall or network virtual appliance that inspects and controls outbound traffic, the VPN or ExpressRoute gateway that connects back to the corporate network, shared DNS resolution and private DNS zones, and shared infrastructure such as jump hosts, identity components, and monitoring collectors. A spoke holds exactly one workload or one environment and nothing shared. A spoke owns its own address space, its own subnets for the application tiers, its own network security groups, and its own application, and it borrows every central service from the hub. The dividing principle is duplication avoidance paired with isolation: if every workload would need its own copy, it belongs in the hub, and if it is specific to one workload, it stays in that workload’s spoke so a problem there does not spread.
Q: Why can two spokes peered to the same hub not reach each other?
Because virtual network peering is not transitive. A peering connects exactly two networks, and Azure does not automatically forward traffic from one peering across the hub into another. When spoke A is peered to the hub and spoke B is peered to the hub, A has a route to the hub’s range and B has a route to the hub’s range, but A has no route to B’s range and the hub has no instruction to forward A’s traffic into the B peering. The packet from A therefore has nowhere useful to go: nothing in A’s route table carries it toward B, and even a packet that did arrive in the hub would find nothing there instructed to forward it into the B peering. To connect the spokes you must place an appliance or firewall in the hub that forwards between them, add user-defined routes in each spoke that point the sibling’s range at that appliance, and enable allow-forwarded-traffic on the peerings the forwarded traffic crosses. The non-transitivity is the defining property of the topology, not a defect.
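The non-transitivity can be made concrete with a toy model. The sketch below is plain Python rather than any Azure API, and the network names and the `directly_reachable` helper are hypothetical. It treats each peering as a link between exactly two networks and deliberately never follows a second hop.

```python
# Toy model of non-transitive peering: a peering links exactly two networks,
# and reachability is never extended across a second peering.
peerings = {
    frozenset({"hub", "spoke-a"}),
    frozenset({"hub", "spoke-b"}),
}


def directly_reachable(source: str, destination: str) -> bool:
    """Return True only when the two networks share a peering."""
    return frozenset({source, destination}) in peerings


print(directly_reachable("spoke-a", "hub"))      # spoke to hub: a peering exists
print(directly_reachable("spoke-a", "spoke-b"))  # spoke to spoke: no peering exists
```

The model has no path search on purpose. Anything beyond a direct peering has to be supplied by a forwarder in the hub plus routes that point at it.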
Q: How do peering and gateway transit work together to connect spokes?
Peering gives each spoke reachability to the hub by adding automatic system routes for the hub’s address range. Gateway transit then lets the spokes use the hub’s gateway without owning one each. You enable allow-gateway-transit on the hub side of the peering and use-remote-gateways on the spoke side, and the spoke begins to receive the on-premises routes that the hub’s VPN or ExpressRoute gateway has learned. Those propagated routes give the spoke a path to corporate subnets that points at the hub gateway. The two flags are a matched pair and must be set together in the correct direction, the hub allowing transit and the spoke using the remote gateway. A spoke can use the remote gateway from only one peering, which is one reason a spoke peers solely to the hub. Together, peering provides the local reachability and gateway transit provides the shared path to everything the gateway reaches.
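The two constraints described above, the matched flag pair and the single remote gateway per spoke, lend themselves to a simple pre-deployment check. The following is a minimal sketch in plain Python under stated assumptions: the `PeeringPair` type, its field names, and the hub names are hypothetical stand-ins for the real peering settings, not Azure resource properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeeringPair:
    remote: str                          # network on the far side of the spoke's peering
    remote_allows_gateway_transit: bool  # allow-gateway-transit on the remote (hub) side
    spoke_uses_remote_gateways: bool     # use-remote-gateways on the spoke side


def gateway_transit_problems(spoke_peerings):
    """List violations of the gateway transit rules for one spoke's peerings."""
    problems = []
    users = [p for p in spoke_peerings if p.spoke_uses_remote_gateways]
    if len(users) > 1:
        problems.append("use-remote-gateways is set on more than one peering")
    for p in users:
        if not p.remote_allows_gateway_transit:
            problems.append(f"{p.remote}: spoke uses remote gateways "
                            "but the remote side does not allow gateway transit")
    return problems


good = [PeeringPair("hub-east", True, True)]
mismatched = [PeeringPair("hub-east", False, True)]
two_hubs = [PeeringPair("hub-east", True, True), PeeringPair("hub-west", True, True)]

print(gateway_transit_problems(good))
print(gateway_transit_problems(mismatched))
print(gateway_transit_problems(two_hubs))
```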
Q: How does a central firewall in the hub control egress for every spoke?
You force every spoke’s outbound traffic through the firewall with user-defined routes. In each spoke subnet you associate a route table containing a route for 0.0.0.0/0 whose next-hop type is VirtualAppliance and whose next-hop address is the firewall’s private IP in the hub. Because user-defined routes take priority over the system default route to the internet, all non-local spoke traffic now lands on the firewall instead of egressing directly. The firewall then applies its rules, logs the connections, and forwards the allowed traffic out. This gives the organization one place to apply egress policy, one set of logs for outbound connections, and one place to enforce fully qualified domain name and threat-intelligence filtering. The firewall’s own subnet must keep its system default route so inspected traffic can actually leave, and the firewall must have rules permitting the flows, or routing will succeed while policy silently denies the connection.
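The way a user-defined default route displaces the system one can be modeled with the standard library. This is a simplified sketch, not the platform's implementation: it assumes the selection rule is longest prefix match with ties broken in favor of user-defined routes over BGP routes over system routes, the system route list is abbreviated, and the `Route` type, the address ranges, and the firewall address 10.0.1.4 are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Lower number wins when two routes have the same prefix.
SOURCE_PRIORITY = {"user": 0, "bgp": 1, "system": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    source: str  # "user", "bgp", or "system"


def select_route(routes, destination):
    """Pick the longest matching prefix, breaking ties by route source."""
    address = ipaddress.ip_address(destination)
    matches = [r for r in routes if address in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    return min(
        matches,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                       SOURCE_PRIORITY[r.source]),
    )


# Hypothetical spoke 10.1.0.0/16 peered to hub 10.0.0.0/16 (abbreviated list).
system_routes = [
    Route("10.1.0.0/16", "VnetLocal", "system"),
    Route("10.0.0.0/16", "VNetPeering", "system"),
    Route("0.0.0.0/0", "Internet", "system"),
]
forced_egress = Route("0.0.0.0/0", "VirtualAppliance 10.0.1.4", "user")

print(select_route(system_routes, "203.0.113.10").next_hop)
print(select_route(system_routes + [forced_egress], "203.0.113.10").next_hop)
print(select_route(system_routes + [forced_egress], "10.0.1.4").next_hop)
```

The third lookup shows why traffic to the hub itself is unaffected: the peering route for the hub's range is more specific than the default, so it still wins.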
Q: What is the difference between allow-vnet-access and allow-forwarded-traffic on a peering?
Allow-vnet-access controls whether the two peered networks can reach each other at all, and it is enabled by default, so disabling it severs the peering’s basic connectivity. Allow-forwarded-traffic controls whether a network will accept traffic that did not originate in the peered network but was forwarded into it by an appliance, such as a firewall. In a routed hub-spoke, when spoke A’s traffic is forwarded by the hub firewall toward spoke B, the packet arrives at the hub-to-B peering carrying spoke A’s source address rather than the hub’s. If allow-forwarded-traffic is disabled on that peering, Azure treats the forwarded packet as foreign and drops it, which breaks spoke-to-spoke routing even when every route is correct. The flag is invisible in a routing table, so a missing allow-forwarded-traffic setting is one of the most reliably confusing hub-spoke failures, presenting as dropped traffic with healthy peerings and correct effective routes.
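The forwarded-traffic check described above comes down to one comparison on the packet's source address. The sketch below is a simplified model in plain Python, not platform code: the `peering_accepts` helper and the address ranges are hypothetical, and it assumes allow-vnet-access is already enabled on the peering.

```python
import ipaddress


def peering_accepts(packet_source: str, remote_vnet_prefix: str,
                    allow_forwarded_traffic: bool) -> bool:
    """Model the decision on a peering for traffic arriving from the remote network.

    A source inside the remote network's range means the traffic originated there.
    Any other source means it was forwarded through the remote network, which is
    accepted only when allow-forwarded-traffic is enabled.
    """
    source = ipaddress.ip_address(packet_source)
    originated_in_peer = source in ipaddress.ip_network(remote_vnet_prefix)
    return originated_in_peer or allow_forwarded_traffic


hub_range = "10.0.0.0/16"       # hypothetical hub, as seen from spoke B's peering
spoke_a_vm = "10.1.0.5"         # hypothetical source in spoke A, forwarded by the hub

print(peering_accepts(spoke_a_vm, hub_range, allow_forwarded_traffic=False))
print(peering_accepts(spoke_a_vm, hub_range, allow_forwarded_traffic=True))
print(peering_accepts("10.0.1.4", hub_range, allow_forwarded_traffic=False))
```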
Q: How do I diagnose spoke-to-spoke traffic that is failing?
Walk the layers in order rather than guessing, because the three usual causes fail at different points and look identical from the source virtual machine. First, check the effective routes on the source interface for the sibling spoke’s range; if the next hop is not the hub firewall, the user-defined route is missing and the packet is being sent to the internet to die. Second, if the route is correct, verify that allow-forwarded-traffic is enabled on the peerings the forwarded packet crosses, because a disabled flag causes the receiving peering to reject the forwarded traffic. Third, if routing and forwarding are both correct, check the firewall logs and rules, because the firewall may be receiving the packet and denying it on policy. The discipline is route first, forwarding flag second, firewall policy third. Network Watcher connection troubleshoot and next hop confirm where real traffic actually goes when the static views leave any doubt.
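The ordering matters more than any single check, so it can help to write it down as a function that stops at the first failing layer. This is an illustrative sketch only: the three boolean inputs are assumed to come from the effective routes view, the peering settings, and the firewall logs respectively, and the `first_failing_layer` name is hypothetical.

```python
def first_failing_layer(next_hop_is_firewall: bool,
                        forwarded_traffic_allowed: bool,
                        firewall_allows_flow: bool) -> str:
    """Return the first layer that explains the failure, in routing-to-policy order."""
    if not next_hop_is_firewall:
        return "route"
    if not forwarded_traffic_allowed:
        return "forwarding flag"
    if not firewall_allows_flow:
        return "firewall policy"
    return "none found"


# A flow with a correct route and flag but a denying rule is a policy problem.
print(first_failing_layer(True, True, False))
# A missing route is reported first even when later layers are also wrong.
print(first_failing_layer(False, False, False))
```

Returning at the first failure mirrors the manual discipline: there is no point reading firewall logs for a packet that the source's routes never sent toward the firewall.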
Q: Why is my spoke reaching the internet without going through the hub firewall?
The spoke subnet is missing the user-defined route that forces egress through the firewall. Without a UDR for 0.0.0.0/0 pointing at the firewall’s private IP, the subnet uses its system default route, which sends unknown and internet-bound traffic straight out to the internet, bypassing the hub entirely. The application keeps working, so nothing appears broken, which is why this gap is usually discovered only when someone audits the firewall logs and finds traffic that should be there is missing. The fix is to create a route table with the 0.0.0.0/0 route whose next-hop type is VirtualAppliance and next-hop address is the firewall, then associate that route table with every spoke subnet that must be inspected. Verify the result by reading the effective routes for an interface in the subnet and confirming that the next hop for 0.0.0.0/0 is now the firewall appliance rather than Internet.
Q: What causes asymmetric routing through a hub firewall, and why does it drop traffic?
Asymmetric routing happens when one direction of a connection passes through the stateful firewall and the other direction does not, so the firewall sees only half of the flow. A stateful firewall tracks connections and expects to see both the request and the matching reply; when a packet arrives that belongs to a connection whose opening request took a path around the firewall, the firewall has no record of that connection and drops the packet. In a hub-spoke this commonly appears when spoke traffic bound for on-premises is forced through the firewall but the gateway subnet lacks a route table that sends the spoke-bound traffic back through the same firewall. The fix is to give the gateway subnet a route table that routes the spoke ranges back through the firewall, so the forward and return paths are symmetric and the stateful appliance sees the complete flow.
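The drop is easier to see with a toy connection tracker. The sketch below is a deliberately simplified model in plain Python, not how any particular firewall is implemented: the `StatefulFirewall` class and the addresses are hypothetical, and real appliances track ports, protocol, and timeouts as well. The asymmetric case shown is one illustrative instance, where an on-premises host opens a connection that reaches the spoke without crossing the firewall while the spoke's reply is routed through it.

```python
class StatefulFirewall:
    """Tracks connections it has seen and only admits replies to known ones."""

    def __init__(self):
        self._connections = set()

    def forward_request(self, source, destination):
        """Record a new connection and pass the opening packet."""
        self._connections.add((source, destination))
        return "forwarded"

    def forward_reply(self, source, destination):
        """Pass a reply only if the matching request was seen earlier."""
        if (destination, source) in self._connections:
            return "forwarded"
        return "dropped"


on_prem_host, spoke_vm = "192.168.10.20", "10.1.0.5"

# Symmetric: the request and the reply both cross the firewall.
symmetric = StatefulFirewall()
symmetric.forward_request(on_prem_host, spoke_vm)
print(symmetric.forward_reply(spoke_vm, on_prem_host))

# Asymmetric: the request bypassed the firewall, so only the reply arrives.
asymmetric = StatefulFirewall()
print(asymmetric.forward_reply(spoke_vm, on_prem_host))
```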
Q: Can a spoke use gateways in more than one hub at the same time?
No. A spoke can use the remote gateway from exactly one peering. The use-remote-gateways setting can be enabled on only one of a spoke’s peerings, because the spoke can borrow on-premises connectivity from a single hub gateway, not from several at once. This is one of the structural reasons a spoke peers only to its hub in the classic single-hub design. Topologies that need a spoke to reach gateways in multiple hubs, such as multi-region designs, solve it differently, often by connecting the hubs to each other and routing between regions through the hub-to-hub link, or by moving to a managed backbone such as Virtual WAN that handles cross-region routing for you. Trying to enable use-remote-gateways on two peerings of the same spoke will not give you redundant gateway access; the configuration is simply not permitted, and the design has to account for the single-gateway constraint from the start.
Q: How does a hub-spoke topology scale as the number of spokes grows?
The routing logic is identical at three spokes and at fifty, but the manual configuration scales linearly and becomes the operational pressure point. Each new spoke needs its peering pair with the correct flags, a route table associated with each of its subnets, and route entries for any sibling spokes it must reach. At small numbers this is trivial; at large numbers, hand-maintaining peerings and route tables across the estate becomes a source of configuration drift, where one spoke has a flag set wrong or a route table that was never associated, surfacing as an intermittent connectivity problem that is hard to attribute. The defense is to stop configuring by hand and define the whole topology as infrastructure as code, so every spoke is created from the same template with the same flags and associations, and consistency is enforced by the definition. When the manual management across many spokes and regions outweighs the control, that is the signal to evaluate Virtual WAN.
Q: Do I need a firewall in the hub, or is a network virtual appliance enough?
You need something in the hub that forwards traffic and applies policy, and that something can be the managed Azure Firewall or a third-party network virtual appliance, depending on the capabilities you require. The managed firewall gives you application and network rules, threat intelligence, and fully qualified domain name filtering without managing the underlying instances, and it is the common default for central egress. A third-party network virtual appliance makes sense when you need a specific capability the managed firewall does not offer, such as a particular vendor’s deep inspection or an existing operational toolchain. Both play the same structural role in the topology: they are the transit forwarder that non-transitive peering cannot provide, the next hop your spoke route tables point at. Network security groups complement either one at the subnet edge but cannot replace it, because an NSG does not perform fully qualified domain name egress filtering or act as a routing next hop for inter-spoke traffic.
Q: What address planning mistakes break a hub-spoke later?
The two most damaging mistakes are overlapping address ranges and under-sized allocations. Peering between two networks with overlapping address space is not permitted, so if a spoke’s range overlaps the hub’s or another spoke’s, you cannot peer it without renumbering, and renumbering a live workload is painful. Allocate the entire topology from a single planned block with no overlaps and leave generous gaps between spoke ranges so a workload can grow its subnets without colliding with a neighbor. The second mistake is under-sizing the hub’s dedicated subnets for the gateway and the firewall; some of these subnets have minimum-size requirements and cannot be resized while the gateway or firewall is deployed in them, which forces a rebuild if they were created too small. Plan the address space once, with room to grow, before deploying anything, because address changes after workloads are live are the hardest changes to make in any virtual network design.
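Both planning rules, no overlaps and generous gaps, can be checked before anything is deployed. The following is a minimal sketch using Python's standard `ipaddress` module; the `find_overlaps` and `carve_with_gaps` helpers, the 10.0.0.0/12 block, and the network names are assumptions chosen for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges):
    """Return every pair of named ranges whose address space overlaps."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(networks, 2)
            if networks[a].overlaps(networks[b])]


def carve_with_gaps(block, prefix_length, names):
    """Give each name every second slice of the block, leaving a free slice between."""
    slices = list(ipaddress.ip_network(block).subnets(new_prefix=prefix_length))
    if 2 * (len(names) - 1) >= len(slices):
        raise ValueError("block is too small for this many networks with gaps")
    return {name: str(slices[2 * i]) for i, name in enumerate(names)}


plan = carve_with_gaps("10.0.0.0/12", 16, ["hub", "spoke-a", "spoke-b"])
print(plan)
print(find_overlaps(plan))

# A plan that cannot be peered: the spoke sits inside the hub's range.
print(find_overlaps({"hub": "10.0.0.0/16", "spoke-a": "10.0.128.0/17"}))
```

The skipped slice after each network is the room to grow: a spoke that needs more space can expand into its neighbor slice without renumbering anything else.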
Q: Why does the effective routes view matter more than the peering status when debugging?
Because the peering status only tells you that two networks are connected, while the effective routes view tells you where a packet to any specific destination will actually go. A hub-spoke failure is almost always a routing problem, not a connectivity problem, and the peering status will read Connected throughout the failure, which is misleading. The effective routes view on a network interface shows the complete computed route table, merging system routes, peering routes, gateway-propagated routes, and user-defined routes in priority order, so a single lookup tells you the real next hop for the destination you care about. If a spoke cannot reach its sibling and the effective route for the sibling’s range shows a next hop of Internet rather than the firewall, you have found the problem in one command. The peering status would have told you only that the network you do not need to fix is healthy.
Q: How does the hub centralize on-premises connectivity for all spokes?
The corporate network connects to Azure through a VPN tunnel or an ExpressRoute circuit that terminates on a single gateway in the hub. Gateway transit then shares that one gateway with every spoke that enables use-remote-gateways, so a thirty-spoke estate needs exactly one gateway rather than thirty. The on-premises routes the gateway learns are propagated into each participating spoke, giving every spoke a path to corporate subnets that points at the hub gateway. The corporate network, in turn, learns the spoke address ranges through the gateway’s route advertisement, so return traffic finds its way back to the right spoke. This centralization is one of the strongest reasons to adopt the topology, because it converts a per-workload connectivity problem, where every workload would otherwise need its own expensive gateway and circuit, into a single shared circuit that the hub owns and every spoke borrows through transit.
Q: Should inter-spoke routes be explicit per spoke, or covered by the default route?
Either approach works, and the choice is about clarity and future-proofing rather than function. A default route of 0.0.0.0/0 pointing at the hub firewall already catches every sibling spoke’s range, because the spoke holds no more specific route for that range, so the traffic falls through to the broad default and routes to the firewall without any explicit per-spoke entry. The argument for adding an explicit route for each sibling spoke’s range is that it makes the intent visible in the route table and protects you if someone later narrows or removes the broad default route, at which point the sibling traffic would otherwise lose its path. The cost is more route entries to maintain as spokes are added. Many teams use the default route for egress and add explicit inter-spoke routes only where the inter-spoke flow is important enough to document in the table, balancing clarity against maintenance.
Q: How does private DNS fit into a hub-spoke topology?
Private DNS is centralized in the hub for the same reason every other shared service is: consistency and duplication avoidance. Spokes reach Azure platform services such as storage and databases through private endpoints, and resolving those endpoints to their private addresses requires private DNS zones. Rather than each spoke maintaining its own zones, which drifts into inconsistency as the estate grows, the private DNS zones are linked to the hub or shared with the spokes, and the spokes point their DNS settings at a resolver in the hub. Every workload then gets one consistent answer for every name, so a storage account resolves to its private address from every spoke rather than privately from one and publicly from another. The traffic to the resolved private endpoint still follows the routing rules of the topology, flowing through the firewall if a route forces it there, so private connectivity to platform services and central egress inspection compose without conflict.
Q: Do network security groups replace the central firewall in a hub-spoke?
No, they serve different layers and a production design uses both. The central firewall in the hub is the coarse control for traffic crossing between spokes and out to the internet, and it can filter by fully qualified domain name, apply threat intelligence, and act as the routing next hop that non-transitive peering needs. Network security groups are the fine, local control inside each spoke, enforcing the workload’s own segmentation between its web, application, and data tiers. A network security group cannot perform fully qualified domain name egress filtering and cannot act as a routing next hop for inter-spoke traffic, so it cannot replace the firewall. Conversely, enforcing intra-spoke tier segmentation at the central firewall is wasteful, because it routes local traffic out to the hub and back for a decision a local rule could make in place. The two controls compose: the firewall decides what crosses the hub, the network security groups decide what moves within a spoke.
Q: Why should I define a hub-spoke as infrastructure as code rather than building it in the portal?
Because the topology’s correctness depends on a small set of easily forgotten flags and associations that scale linearly with the number of spokes, and human configuration drifts. A spoke module in Bicep or Terraform that takes the spoke’s address range, the hub firewall IP, and the sibling ranges, and produces the network, subnets, peering pair with the correct flags, route table, and associations, makes adding a spoke a single parameterized call that cannot skip allow-forwarded-traffic or forget to associate a route table. It also makes the routing reviewable, since a change appears as a diff in version control rather than as an undocumented portal edit, and the whole estate’s routing can be understood by reading the definitions. Portal builds are fine for learning the pattern or for a two-spoke proof of concept, but a production estate that grows over time accumulates the one-off inconsistencies that produce intermittent, hard-to-attribute connectivity failures, and the code definition is what prevents them.
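The shape of such a module can be sketched without committing to a tool. The example below is plain Python standing in for a Bicep or Terraform module, so it only builds a description and deploys nothing; the `spoke_definition` function and every key name in the returned dictionary are hypothetical and are not real resource property names.

```python
def spoke_definition(name, address_space, subnets, firewall_ip, sibling_ranges=()):
    """Build one spoke's description with the flags and associations always set."""
    routes = [{"prefix": "0.0.0.0/0",
               "next_hop_type": "VirtualAppliance",
               "next_hop_ip": firewall_ip}]
    routes += [{"prefix": sibling,
                "next_hop_type": "VirtualAppliance",
                "next_hop_ip": firewall_ip} for sibling in sibling_ranges]
    return {
        "network": {"name": name, "address_space": address_space,
                    "subnets": list(subnets)},
        "peerings": {
            "spoke_to_hub": {"allow_vnet_access": True,
                             "allow_forwarded_traffic": True,
                             "use_remote_gateways": True},
            "hub_to_spoke": {"allow_vnet_access": True,
                             "allow_forwarded_traffic": True,
                             "allow_gateway_transit": True},
        },
        # Every subnet is associated by construction, so none can be forgotten.
        "route_table": {"name": f"{name}-rt", "routes": routes,
                        "associated_subnets": list(subnets)},
    }


spoke = spoke_definition("spoke-a", "10.2.0.0/16", ["web", "app", "data"],
                         firewall_ip="10.0.1.4", sibling_ranges=["10.4.0.0/16"])
print(spoke["route_table"]["associated_subnets"])
print(spoke["peerings"]["hub_to_spoke"]["allow_forwarded_traffic"])
```

The caller supplies only what differs between spokes. The flags and the route table association are not parameters at all, which is what removes the opportunity to get them wrong.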
Q: How do spokes in different regions reach each other in a multi-region hub-spoke?
Through the hubs, not through a direct cross-region spoke peering. Each region has its own hub with its own firewall and gateway, the hubs are peered to each other, and a packet from a spoke in one region to a spoke in another travels to its local hub, across the hub-to-hub peering to the remote hub, and then to the destination spoke. The non-transitivity rule now applies across two hops, so every leg needs its routing aligned: the source spoke routes the remote spoke’s range to its local firewall, the local hub forwards to the remote hub, the hub-to-hub peering has allow-forwarded-traffic enabled, and the remote hub forwards into its local spoke. Each region added multiplies the route tables and flags that must be correct, and the connections between hubs grow combinatorially, which is the scale at which a hand-built design tends to lose to a managed backbone such as Virtual WAN that computes the inter-region routing for you rather than leaving it as a mesh to maintain by hand.
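The growth in hub-to-hub connections is easy to quantify. The sketch below assumes the hubs form a full mesh, with every hub peered to every other hub, which is an assumption for illustration rather than a requirement of the pattern; the `hub_to_hub_links` name is hypothetical.

```python
def hub_to_hub_links(hub_count: int) -> int:
    """Number of hub pairs in a full mesh: n * (n - 1) / 2."""
    return hub_count * (hub_count - 1) // 2


for hubs in (2, 3, 4, 6):
    print(hubs, "hubs ->", hub_to_hub_links(hubs), "hub-to-hub links")
```

Each link is itself a peering pair with its own flags, and each hub also needs routes for the spoke ranges behind every other hub, so the configuration to keep aligned grows faster than the count of regions.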