VNet Peering vs VPN vs ExpressRoute

Three Azure connectivity options sit in front of every engineer who needs to join two networks, and the choice between VNet peering, a VPN gateway, and ExpressRoute is where a surprising number of designs go wrong before a single packet moves. The mistake is rarely a misconfigured route. It is reaching for the wrong primitive at the start, because all three promise to connect things and the marketing language blurs what each one is actually for. One links virtual networks inside Azure at near wire speed. One builds an encrypted tunnel across the public internet to reach an office or another cloud. One leases a private circuit through a connectivity provider so your traffic never touches the internet at all. Treat them as interchangeable and you end up paying for a private circuit when a peering link would have done, or pushing production database replication through an internet tunnel that throttles under load.

This guide settles the choice with a rule you can defend in a design review, a decision table you can paste into a wiki, and the routing and transitivity behavior that explains why the obvious arrangement sometimes fails to pass traffic. The aim is not to memorize a feature matrix. It is to leave able to name what you are connecting and how private and fast that connection must be, then watch the option fall out of those two answers before cost ever enters the conversation.

The connect-what-and-how-private rule

Here is the claim this whole article defends, and it is worth stating plainly before the detail buries it. The connectivity choice reduces to two questions, and answering them picks the option before you compare a single price. The first question is what you are connecting: two Azure virtual networks, an on-premises site, or another provider’s network. The second question is how private and how fast that path must be: does the traffic need to stay off the public internet, and does it need predictable bandwidth and low jitter rather than best-effort throughput.

Answer those two and the option is almost forced. Connecting Azure virtual networks to each other, where both endpoints already live on the Microsoft backbone, points at VNet peering, because it keeps the traffic on that backbone at the latency and bandwidth the backbone provides, with nothing to provision but the link itself. Connecting an on-premises site or a remote network where the requirement is “reachable and encrypted” rather than “fast and guaranteed” points at a VPN gateway, because a site-to-site VPN rides the internet you already pay for and can be built end to end in an afternoon with no provider involved. Connecting an on-premises site or datacenter where the requirement is “private, predictable, and high bandwidth” points at ExpressRoute, because only a private circuit through a provider keeps the traffic off the internet with a committed bandwidth tier behind it.
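Because the rule is two questions and a lookup, it can be written down as one, which is a quick way to check that a design review's answer follows from the requirement rather than from the bill. This is a minimal Bash sketch, assuming hypothetical labels for the answers (azure-vnet, on-premises, reachable-encrypted, private-predictable) that are not Azure terms. Cost is deliberately absent, because the rule applies it only after the nominee is chosen.

```bash
# choose_connectivity WHAT NEED
#   WHAT: azure-vnet | on-premises   (on-premises = an on-premises site or datacenter)
#   NEED: reachable-encrypted | private-predictable
# The labels are hypothetical, chosen for this sketch; they are not Azure terms.
choose_connectivity() {
  local what="$1" need="$2"
  case "$what" in
    azure-vnet)
      # Both ends already live on the Microsoft backbone, which is private,
      # so the privacy answer does not change the nominee.
      echo "VNet peering" ;;
    on-premises)
      case "$need" in
        reachable-encrypted) echo "VPN gateway" ;;
        private-predictable) echo "ExpressRoute" ;;
        *) echo "unknown need: $need" >&2; return 1 ;;
      esac ;;
    *) echo "unknown endpoint: $what" >&2; return 1 ;;
  esac
}

choose_connectivity azure-vnet private-predictable    # VNet peering
choose_connectivity on-premises reachable-encrypted   # VPN gateway
choose_connectivity on-premises private-predictable   # ExpressRoute
```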

The rule matters because it inverts the usual reasoning. Engineers tend to start from cost, see that peering is cheap and ExpressRoute is expensive, and reverse into a justification. That reasoning produces a VPN where an ExpressRoute circuit was needed, because the bill looked friendlier, and then a painful migration eight months later when the tunnel cannot carry the replication traffic. Start instead from what and how-private, let those two answers nominate the option, and only then check whether the cost is acceptable or whether a cheaper option genuinely meets the requirement. Cost is a constraint you apply to the nominee, not the criterion that selects it.

What each option actually connects

Before the comparison can mean anything, you have to hold an accurate model of what each option joins and how the bytes travel. The three options operate at different layers of the connectivity stack, and conflating them is the source of most early design errors.

What does VNet peering connect, and how?

VNet peering connects two Azure virtual networks so that resources in each reach the other by private IP, as though they shared one address space. Azure stitches the two networks together at the backbone, so traffic between peered virtual networks stays on the Microsoft network and never traverses a gateway or the public internet. There is no tunnel and no encryption overhead in the data path, and the link adds essentially no latency beyond the backbone itself.

Peering comes in two forms that behave the same in the data path but differ in scope. Regional VNet peering joins two virtual networks in the same Azure region. Global VNet peering joins virtual networks in different regions, so a virtual network in East US can reach one in West Europe by private IP, with the traffic crossing Microsoft’s global backbone rather than the internet. Both forms require non-overlapping address spaces, because once the networks share a routing fabric, two subnets claiming the same range cannot be told apart. The relationship is established on both sides, and a link configured on only one virtual network sits in an Initiated state until the other side completes it, at which point both move to Connected and traffic flows.
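Since overlapping ranges rule out a peering, it pays to check candidate address spaces while planning, before any link is attempted. The sketch below is a plain Bash check for two IPv4 CIDR blocks. It assumes canonical dotted-quad input without leading zeros, and ip_to_int and cidr_overlap are hypothetical helper names for this example, not part of the Azure CLI.

```bash
# ip_to_int A.B.C.D: dotted-quad IPv4 address -> 32-bit integer.
# Assumes canonical input with no leading zeros (Bash would read "010" as octal).
ip_to_int() {
  local o1 o2 o3 o4
  IFS=. read -r o1 o2 o3 o4 <<< "$1"
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# cidr_overlap NET/LEN NET/LEN: exit status 0 if the two ranges overlap.
cidr_overlap() {
  local a b len mask
  a=$(ip_to_int "${1%/*}")
  b=$(ip_to_int "${2%/*}")
  # Two CIDR blocks overlap exactly when they agree on the bits covered by
  # the shorter (wider) prefix: then the wider block contains the narrower.
  len=$(( ${1#*/} < ${2#*/} ? ${1#*/} : ${2#*/} ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (a & mask) == (b & mask) ))
}

cidr_overlap 10.0.0.0/16 10.1.0.0/16 && echo "overlap" || echo "no overlap"   # no overlap
```

Checking every pair of ranges across the networks you intend to peer, on-premises ranges included if a gateway will later join them, catches the conflict while renumbering is still cheap.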

The behavior that defines peering, and the one engineers most often get wrong, is that it is non-transitive. If network A peers with a hub network B, and B also peers with network C, A cannot reach C through B by default. The link joins exactly the two networks named in it and propagates nothing further. Making A reach C requires either a direct A-to-C peering or a routing appliance in the hub plus user-defined routes that send A’s traffic to that appliance and onward to C. The transitivity question deserves its own treatment later, because the assumption that it chains through a hub breaks more hub-and-spoke designs than any other single mistake. The deeper mechanics of the virtual network itself, including how address space and subnets and the default routing behavior are laid out, are covered in the Azure Virtual Network deep dive that explains the address space and routing boundary, and peering builds directly on that foundation.

What does a VPN gateway connect, and how?

A VPN gateway connects a virtual network to a network outside Azure, most often an on-premises site, across an encrypted tunnel that rides the public internet. The gateway is a managed pair of virtual machines that Azure deploys into a dedicated gateway subnet, and it terminates IPsec tunnels, negotiated with IKE, from a peer device at the other end, encrypting every packet so the traffic is protected even though it crosses shared internet links.

There are several connection shapes the gateway supports. A site-to-site VPN connects a whole network, such as an office or datacenter, to the virtual network through a tunnel between the Azure gateway and an on-premises VPN device with a public IP. A point-to-site VPN connects an individual client machine, such as a laptop, to the virtual network without any on-premises device, which suits developers and small remote-access cases. A VNet-to-VNet VPN connects two Azure virtual networks through gateways on each side; it has occasional uses, but peering is almost always the better choice for joining two Azure networks because it avoids the gateway and the encryption overhead entirely. The throughput a VPN gateway can carry depends on its SKU, and that ceiling is the lever that decides whether a VPN remains adequate as a workload grows. The full gateway model, including the SKU tiers and the active-active and zone-redundant arrangements, is treated in the Azure VPN Gateway deep dive that walks the gateway and tunnel model.

The defining trait of the VPN path is that it traverses the public internet, encrypted. That is its strength and its limit at once. The strength is that it needs no provider relationship and no physical circuit: if both ends have internet connectivity and a public IP, the tunnel can be built quickly and cheaply. The limit is that internet transit is best-effort, so bandwidth, latency, and jitter vary with conditions outside your control, and the gateway SKU caps aggregate throughput well below what a dedicated circuit delivers.

What does ExpressRoute connect, and how?

ExpressRoute connects an on-premises network to Azure over a private circuit provisioned through a connectivity provider, so the traffic never touches the public internet. Rather than tunneling across shared internet links, you establish a dedicated connection from your network into a Microsoft edge location through a partner, and BGP sessions exchange routes so your on-premises ranges and the Azure ranges become mutually reachable over that private path.

The circuit has a provisioned bandwidth tier, and that committed bandwidth is the reason ExpressRoute exists where a VPN would not suffice. Because the path is private and the bandwidth is provisioned rather than borrowed from the internet, the connection delivers predictable throughput and far steadier latency than an internet tunnel, which is what makes it the right choice for heavy, latency-sensitive, or compliance-bound traffic such as database replication, large data transfer, or workloads that must demonstrably stay off the public internet. ExpressRoute also supports peering types that determine what the circuit reaches: private peering carries traffic to your virtual networks by private IP, and Microsoft peering carries traffic to Microsoft public services over the circuit rather than the internet. The architecture, the provider models, and the resiliency design are covered in the Azure ExpressRoute deep dive that lays out the circuit and provider model.

The defining trait is privacy with a committed bandwidth tier behind it, bought at the price of a provider relationship and a higher recurring cost. ExpressRoute is not faster than peering for intra-Azure traffic, and it is not a replacement for peering between virtual networks. It is the option for the on-premises-to-Azure path when that path must be private and predictable rather than merely reachable.

The InsightCrunch connectivity decision table

The decision table below is the findable artifact for this article, and it is built to be paste-able into a design document. Each row scores one option on what it connects, the latency it adds, the bandwidth it offers, whether the path is private, the relative cost, and whether the connection is transitive, and the final column names the single deciding signal that points at that option. Read across a row to confirm a candidate; read down the deciding-signal column to find the option a requirement nominates.

Option	What it connects	Latency added	Bandwidth	Private path	Relative cost	Transitive	Deciding signal
VNet peering	Azure VNet to Azure VNet (same or cross region)	Negligible, backbone only	Backbone capacity, no SKU cap in the link	Yes, stays on Microsoft backbone	Low, charged on data transferred	No, point to point only	You are joining two Azure virtual networks
VPN gateway (site-to-site)	Azure VNet to on-premises or remote network	Variable, internet plus encryption	Capped by gateway SKU, internet bound	No, encrypted over public internet	Low to moderate, gateway hours plus egress	Via gateway transit and routing	You need on-premises reach quickly and cheaply, bandwidth modest
ExpressRoute	On-premises or datacenter to Azure over a provider circuit	Low and steady, private path	Provisioned tier, high and predictable	Yes, never touches the internet	Higher, circuit plus port plus provider	Via gateway transit and routing	You need private, predictable, high bandwidth to on-premises

A few clarifications keep the table honest. The latency column is relative, not a guaranteed number: peering inherits the backbone latency between the two regions, a VPN inherits whatever the internet path between the sites delivers plus the cost of encryption, and ExpressRoute inherits the steadier latency of a private circuit. The bandwidth column reflects the structural ceiling: a peering link does not impose its own throughput cap the way a gateway SKU does, a VPN gateway’s aggregate is bounded by its SKU, and ExpressRoute is bounded by the provisioned circuit tier you pay for. The transitive column is the one to read most carefully, because peering is non-transitive on its own while the gateway-based options can carry traffic onward when gateway transit and user-defined routes are configured, and that asymmetry shapes every hub-and-spoke design. Verify the exact gateway SKU throughput figures and the ExpressRoute circuit tiers against the current official Azure limits before you commit a number to a capacity plan, because both change over time.

How the three compare on latency and bandwidth

Latency and bandwidth are where the structural differences between the options become operational. The numbers are not the point; the shape of each path is. Peering rides the Microsoft backbone with no tunnel and no SKU-imposed cap on the link, so intra-Azure traffic between peered networks runs at backbone latency and at whatever throughput the workload and the underlying network can sustain. A VPN gateway rides the public internet, so its latency is whatever the internet path between the two sites happens to be on a given day, plus the small but real cost of encrypting and decrypting every packet, and its aggregate throughput is bounded by the gateway SKU you chose. ExpressRoute rides a private circuit with provisioned bandwidth, so it delivers steadier latency than the internet and a throughput ceiling you bought rather than one you hope the internet provides.

How much latency does each option add?

Peering adds essentially nothing beyond the backbone path between the two networks. A VPN gateway adds the variable internet round-trip between the sites plus per-packet encryption overhead, so latency moves with internet conditions. ExpressRoute adds the latency of the private circuit, which is low and far steadier than internet transit because the path is dedicated rather than shared.

That difference in shape matters more than any single millisecond figure. A chatty application that makes many small round trips is punished by jitter, where the round-trip time swings rather than staying flat, and an internet-bound VPN is the most likely of the three to deliver jitter because it shares links with everyone else’s traffic. A bulk transfer that moves large volumes cares more about sustained throughput than about jitter, and there the question becomes whether the gateway SKU or the circuit tier can keep up. The practical reading is that latency-sensitive, chatty workloads on the on-premises path favor ExpressRoute for its steadiness, while modest, bursty, or tolerant traffic can live on a VPN, and intra-Azure traffic between virtual networks belongs on peering regardless because nothing else competes with the backbone for that path.
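Jitter is easiest to see by comparing two paths with the same average round trip. The sketch below assumes you have already collected round-trip samples in milliseconds with whatever probe you use, and it reports a mean and a simple jitter figure. The measure, the mean absolute difference between consecutive samples, is a simplification chosen for illustration rather than the smoothed estimator defined in RFC 3550, and rtt_stats is a hypothetical helper name.

```bash
# rtt_stats: read round-trip times in ms (one per line) on stdin and print
# the mean RTT and a simple jitter figure: the mean absolute difference
# between consecutive samples (a simplification, not RFC 3550's estimator).
rtt_stats() {
  awk '
    { rtt[NR] = $1; sum += $1 }
    NR > 1 { d = $1 - rtt[NR - 1]; if (d < 0) d = -d; jit += d }
    END {
      if (NR < 2) { print "need at least two samples" > "/dev/stderr"; exit 1 }
      printf "mean=%.1f jitter=%.1f\n", sum / NR, jit / (NR - 1)
    }'
}

printf '%s\n' 20 20 20 20 | rtt_stats   # mean=20.0 jitter=0.0
printf '%s\n' 10 30 10 30 | rtt_stats   # mean=20.0 jitter=20.0
```

Both series average 20 ms, but the second swings by 20 ms on every exchange, which is the pattern a chatty application feels as slowness even when the mean looks healthy.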

Bandwidth follows the same logic. The peering link itself does not throttle; throughput between peered virtual networks is governed by the resources at each end and the backbone, not by a per-link SKU. A VPN gateway has an aggregate throughput ceiling set by its SKU, and exceeding it does not fail loudly so much as degrade: tunnels saturate, retransmits climb, and the application slows under load in a way that is easy to misread as an application problem rather than a gateway ceiling. ExpressRoute’s bandwidth is the tier you provisioned, and scaling it means moving to a larger circuit tier rather than tuning a knob. The single most common bandwidth surprise is a VPN chosen for a modest initial workload that grows past the gateway SKU, which is exactly the migration the connect-what-and-how-private rule is designed to prevent: if the workload was always going to need predictable high bandwidth to on-premises, the rule nominates ExpressRoute at the start.
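Forecasting the requirement rather than the launch-day load can be as simple as projecting growth against the ceiling you are about to buy. The sketch below assumes compound monthly growth, which is an assumption about your workload rather than a property of Azure. The ceiling is an input you take from the current official gateway SKU or circuit limits; no SKU figure is built in, and months_until_ceiling is a hypothetical helper.

```bash
# months_until_ceiling START_MBPS MONTHLY_GROWTH_PCT CEILING_MBPS [HORIZON]
# Projects compound monthly growth from START_MBPS and prints the first month
# (0 = today) in which demand exceeds CEILING_MBPS, or "none" within HORIZON
# months (default 36). CEILING_MBPS comes from the official limits, not here.
months_until_ceiling() {
  awk -v s="$1" -v g="$2" -v c="$3" -v h="${4:-36}" 'BEGIN {
    d = s
    for (m = 0; m <= h; m++) {
      if (d > c) { print m; exit }
      d *= 1 + g / 100
    }
    print "none"
  }'
}

# Placeholder figures: 100 Mbps today, 10% monthly growth, a 150 Mbps ceiling.
months_until_ceiling 100 10 150   # 5
```

If the answer lands inside the planning horizon, the rule's nominee for that path was ExpressRoute or a larger gateway SKU all along, and the migration can be scheduled rather than discovered.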

How privacy differs: what stays off the public internet

Privacy is the axis where the three options separate most cleanly, and for many designs it is the axis that decides everything because a compliance requirement leaves no room for negotiation. The question is simple to ask and consequential to answer: does the traffic ever traverse the public internet, even encrypted?

Which options keep traffic off the public internet?

VNet peering keeps traffic on the Microsoft backbone, so peered virtual networks communicate without touching the internet. ExpressRoute keeps on-premises traffic on a private provider circuit, so it never touches the internet either. A site-to-site VPN does traverse the public internet, encrypted by IPsec, so the data is protected in transit but the path itself is the shared internet rather than a private one.

That distinction is not academic when a regulatory regime or an internal security policy forbids transiting the public internet for certain data, regardless of encryption. In that case the VPN is disqualified on the privacy axis alone, even though its encryption protects confidentiality, because the requirement is about the path and not only about whether the bytes are readable. ExpressRoute exists precisely for that requirement on the on-premises path, and peering satisfies it natively for the intra-Azure path because the backbone is private by construction. Where the policy cares about confidentiality rather than the path itself, an encrypted VPN can be acceptable and far cheaper, so the privacy decision turns on reading the requirement precisely. “Must not traverse the internet” and “must be encrypted in transit” are different requirements, and they nominate different options.

There is a second privacy nuance worth naming. Keeping traffic between an Azure resource and a platform service such as storage or a database off the internet is a different problem from connecting whole networks, and it is usually solved with a private endpoint rather than with any of the three options here. Reaching for ExpressRoute to make a single service private is a category error, because a private endpoint projects that one service into your virtual network with a private IP at a fraction of the cost. The reflexive-ExpressRoute trap is exactly this confusion of “I need this private” with “I need ExpressRoute,” and it deserves direct treatment later, because it is one of the two most expensive misreadings in this space.

Transitivity and why it does not chain

Transitivity is the behavior that most often defeats an otherwise sound topology, because the intuitive expectation is wrong. Engineers assume that if A connects to a hub and the hub connects to C, then A reaches C through the hub. For peering, that assumption is false by default, and discovering it during an incident rather than during design is a familiar and avoidable pain.

Is VNet peering transitive?

No. VNet peering joins exactly the two virtual networks in the link and propagates no further. If a spoke peers with a hub and the hub peers with another spoke, the two spokes cannot reach each other through the hub by default. Making spoke-to-spoke traffic flow requires either a direct peering between the spokes or a routing appliance in the hub with user-defined routes that forward the traffic.

Understanding why peering behaves this way makes the fix obvious. Peering exchanges the address ranges of the two directly peered networks and installs the routes for those ranges, but it does not re-advertise a third network’s ranges learned through a separate peering. The hub knows about spoke A because it peers with A, and it knows about spoke C because it peers with C, but A’s route table never learns C’s range from the hub, so A simply has no route to C. There are two production-grade ways to make the traffic flow. The first is to place a network virtual appliance or Azure Firewall in the hub and add user-defined routes on each spoke that send the other spoke’s range to that appliance’s private IP, so the appliance becomes the next hop that bridges the spokes. The second, where the only requirement is reaching on-premises rather than another spoke, is gateway transit: a spoke can use the hub’s VPN or ExpressRoute gateway to reach on-premises by enabling gateway transit on the hub peering and the corresponding setting on the spoke, which lets the spoke borrow the hub’s gateway without its own.
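The route-learning behavior can be modeled in a few lines to make the missing route visible. This is a toy Bash model, not an Azure API: the VNet names, ranges, and the learned_ranges helper are hypothetical, and it captures only the one property described above, that each network installs routes for its direct peers' ranges and for nothing learned through them.

```bash
# Toy model of non-transitive peering (not an Azure API). Names and ranges
# are hypothetical. Each VNet installs routes only for the ranges of VNets
# it peers with directly; nothing is re-advertised onward.
declare -A RANGE=(
  [spoke-a]=10.1.0.0/16
  [hub]=10.0.0.0/16
  [spoke-c]=10.3.0.0/16
)
# Each entry is one bidirectional peering with both sides Connected.
PEERINGS=("spoke-a hub" "hub spoke-c")

# learned_ranges VNET: print the remote ranges VNET has routes for.
learned_ranges() {
  local vnet="$1" pair left right
  for pair in "${PEERINGS[@]}"; do
    read -r left right <<< "$pair"
    if [[ $left == "$vnet" ]]; then echo "${RANGE[$right]}"; fi
    if [[ $right == "$vnet" ]]; then echo "${RANGE[$left]}"; fi
  done
}

learned_ranges spoke-a   # 10.0.0.0/16 only: no route to spoke-c
learned_ranges hub       # 10.1.0.0/16 and 10.3.0.0/16
```

Adding "spoke-a spoke-c" to PEERINGS is the direct-peering fix. The appliance-plus-UDR fix sits outside this toy, because it changes the next hop for a range rather than which ranges a peering installs.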

The gateway options behave more transitively than raw peering because the gateway and the route table can be configured to carry traffic onward, which is exactly why hub-and-spoke designs put the gateway and the firewall in the hub and route the spokes through them. The mistake to avoid is assuming peering alone delivers any of this. It gives you fast, private, point-to-point links between named networks, and everything beyond a single hop is a routing decision you make on purpose with user-defined routes and an appliance, not a property you get for free. The way these pieces assemble into a full topology, and the choice between building it by hand and letting a managed service do it, is the subject of the hub-spoke versus Virtual WAN comparison that weighs the manual topology against the managed one.

Cost: how the three compare and what drives the bill

Cost is the constraint you apply after the connect-what-and-how-private rule has nominated an option, and understanding what actually drives each bill keeps you from selecting the wrong primitive to save money that the design will spend later anyway. The three options have structurally different cost models, and the cheapest sticker is not the cheapest outcome when it forces a migration.

VNet peering is billed on the data transferred between the peered networks, with rates that differ for traffic within a region versus traffic that crosses regions on a global peering. There is no gateway to pay for and no fixed circuit charge; you pay for the bytes that move. That makes peering inexpensive for most intra-Azure communication, and it makes the cost scale with usage rather than sitting as a fixed monthly line. The number to watch is cross-region data transfer on global peering, because a chatty cross-region pattern can run up egress in a way that a same-region design would not, which is a reason to keep tightly coupled components in one region where the topology allows.

A VPN gateway has two cost components: the gateway itself, billed per hour for as long as it is deployed, at a rate that rises with the SKU, plus the data egress charged on traffic leaving Azure. The gateway hours are a fixed monthly floor regardless of how much traffic flows, so a lightly used VPN still carries the gateway cost. That fixed floor is modest compared to ExpressRoute, which is why a VPN is the budget-friendly option for on-premises reach where the bandwidth requirement is modest. The trap is choosing the VPN purely because the gateway hours are cheap, then discovering the workload needs more than the SKU can carry, at which point you are either paying for a larger gateway SKU or migrating to ExpressRoute under pressure.

ExpressRoute carries the highest recurring cost because there are more components: the ExpressRoute circuit charged at a metered or unlimited data plan, the port or bandwidth tier you provision, and the connectivity provider’s own charge for the physical link into the Microsoft edge, which is billed by the provider and not by Azure. The total is materially higher than a VPN, and that is the cost you accept in exchange for a private, predictable, high-bandwidth path. The way to read the ExpressRoute bill is that you are buying determinism: predictable bandwidth, steadier latency, and a path that stays off the internet, and those properties have a price that a best-effort internet tunnel does not. The mistake is comparing the ExpressRoute sticker to the VPN sticker as though they buy the same thing; they do not, and the connect-what-and-how-private rule exists to keep the comparison from happening at all when the requirement already rules out the VPN.

The durable cost discipline is to size the option to the requirement and revisit it on a schedule. Peering costs track usage, so the lever there is topology and region placement. VPN costs are a fixed gateway floor plus egress, so the lever is the SKU and the amount of traffic leaving Azure. ExpressRoute costs are circuit plus provider, so the lever is the provisioned tier and the data plan, and the discipline is to provision the tier the workload genuinely needs rather than the largest one available. Confirm the current rates and SKU prices against the official Azure pricing at the time you build the plan, because connectivity pricing is revised periodically and a figure that was right last year may mislead a capacity plan today.
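The three cost models differ in shape more than in any single rate, and a short calculation makes the shape visible: peering has no fixed component, a VPN has a gateway-hours floor, and ExpressRoute is dominated by fixed monthly charges. This sketch assumes you supply every rate from the official pricing pages. The numbers in the usage lines are placeholders, not Azure prices, and the function names are hypothetical. ExpressRoute's fixed charges are passed in as one sum you compute yourself, and its per-GB term applies only on a metered data plan.

```bash
# All rates are inputs; the figures in the usage lines are placeholders,
# not Azure prices. Each function prints a monthly total.

# Peering: pay only for the GB that move (no fixed component).
peering_cost() {  # GB_PER_MONTH RATE_PER_GB
  awk -v gb="$1" -v r="$2" 'BEGIN { printf "%.2f\n", gb * r }'
}

# VPN: gateway hours are a fixed floor, plus egress leaving Azure.
vpn_cost() {  # GATEWAY_HOURS HOURLY_RATE EGRESS_GB EGRESS_RATE
  awk -v h="$1" -v hr="$2" -v gb="$3" -v r="$4" \
    'BEGIN { printf "%.2f\n", h * hr + gb * r }'
}

# ExpressRoute: fixed monthly charges you sum yourself (circuit tier and data
# plan, provider link, any other fixed components), plus a per-GB term that
# applies only on a metered data plan (pass 0 GB on an unlimited plan).
er_cost() {  # FIXED_MONTHLY METERED_GB RATE_PER_GB
  awk -v f="$1" -v gb="$2" -v r="$3" 'BEGIN { printf "%.2f\n", f + gb * r }'
}

# With no traffic at all, only the fixed floors remain (placeholder rates):
peering_cost 0 0.01          # 0.00
vpn_cost 730 0.20 0 0.05     # 146.00
er_cost 1500 0 0.025         # 1500.00
```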

Configuring each option: the commands that realize the connection

A model is only useful if it translates into the configuration that builds it, so this section walks the minimal working setup for each option using Azure CLI, with the verification step that proves the path is up. These commands are the skeleton; the deep dives linked above carry the full SKU and resiliency detail.

Establishing a regional or global VNet peering requires a link on each side, because peering is bidirectional and a one-sided link sits in an Initiated state until the partner completes it. The pattern looks like this:

# Peer VNet A to VNet B (run for the A side)
az network vnet peering create \
  --name peer-a-to-b \
  --resource-group rg-network \
  --vnet-name vnet-a \
  --remote-vnet vnet-b \
  --allow-vnet-access true \
  --allow-forwarded-traffic true

# Complete the link from the B side
az network vnet peering create \
  --name peer-b-to-a \
  --resource-group rg-network \
  --vnet-name vnet-b \
  --remote-vnet vnet-a \
  --allow-vnet-access true \
  --allow-forwarded-traffic true

# Verify both links report Connected
az network vnet peering list \
  --resource-group rg-network \
  --vnet-name vnet-a \
  --query "[].{name:name, state:peeringState}" -o table

The verification step is the one that matters: both peerings must report a peeringState of Connected. A link stuck at Initiated means the partner side was never created, and a link in Disconnected means one side was deleted or the address space changed. The allow-forwarded-traffic flag is the one engineers miss in hub-and-spoke designs, because it permits the network to accept traffic that originated elsewhere and was forwarded through an appliance, which is precisely what spoke-to-spoke routing through a hub firewall depends on.
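Once a network has more than a handful of peerings, reading the table by eye stops scaling, and a small filter turns the verification step into a check that fails loudly. This Bash sketch assumes tab-separated output from the az query shown in its usage comment, where the JMESPath multiselect list emits each peering's name and then its state. check_peerings is a hypothetical helper name.

```bash
# check_peerings: read "name<TAB>state" lines on stdin, report every peering
# that is not Connected on stderr, and return 1 if any was found.
check_peerings() {
  local name state bad=0
  while IFS=$'\t' read -r name state; do
    [[ -z $name ]] && continue
    if [[ $state != Connected ]]; then
      echo "peering $name is $state, not Connected" >&2
      bad=1
    fi
  done
  return "$bad"
}

# Usage against a live subscription (requires az login):
#   az network vnet peering list --resource-group rg-network --vnet-name vnet-a \
#     --query "[].[name, peeringState]" -o tsv | check_peerings
printf 'peer-a-to-b\tConnected\n' | check_peerings && echo "all peerings Connected"
```

The non-zero exit status lets the same check gate a deployment pipeline, so a link left at Initiated stops the rollout instead of surfacing as an unreachable service later.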

Standing up a site-to-site VPN requires a gateway subnet, a public IP, the gateway itself, a local network gateway that represents the on-premises side, and the connection that ties them together with a shared key. The gateway deployment is the slow step, often taking the better part of an hour, so it is worth scripting:

# A dedicated subnet named GatewaySubnet is mandatory
az network vnet subnet create \
  --resource-group rg-network \
  --vnet-name vnet-a \
  --name GatewaySubnet \
  --address-prefixes 10.0.255.0/27

# Public IP for the gateway
az network public-ip create \
  --resource-group rg-network \
  --name vpngw-pip \
  --allocation-method Static --sku Standard

# Create the VPN gateway (this step is slow)
az network vnet-gateway create \
  --resource-group rg-network \
  --name vpngw-a \
  --vnet vnet-a \
  --public-ip-addresses vpngw-pip \
  --gateway-type Vpn --vpn-type RouteBased \
  --sku VpnGw1 --no-wait

# Represent the on-premises side
az network local-gateway create \
  --resource-group rg-network \
  --name onprem-lng \
  --gateway-ip-address 203.0.113.10 \
  --local-address-prefixes 192.168.0.0/16

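# The gateway was created with --no-wait; block until provisioning
# finishes, because the connection needs a finished gateway to attach to
az network vnet-gateway wait \
  --resource-group rg-network \
  --name vpngw-a \
  --created

# Tie them together with a shared key
az network vpn-connection create \
  --resource-group rg-network \
  --name s2s-onprem \
  --vnet-gateway1 vpngw-a \
  --local-gateway2 onprem-lng \
  --shared-key "REPLACE_WITH_STRONG_PSK"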

The subnet must be named exactly GatewaySubnet, because Azure looks for that name and will not deploy the gateway into an arbitrarily named subnet. The verification step is to query the connection status and confirm it reaches Connected once the on-premises device is configured with the matching parameters:

az network vpn-connection show \
  --resource-group rg-network \
  --name s2s-onprem \
  --query "{status:connectionStatus, ingress:ingressBytesTransferred, egress:egressBytesTransferred}" -o table

A connectionStatus of Connecting that never reaches Connected almost always points at a mismatch between the Azure side and the on-premises device: the shared key differs, the IKE or IPsec parameters do not align, or the local address prefixes do not match what the device advertises.

ExpressRoute setup is necessarily more involved because it begins with the connectivity provider rather than with Azure: you create the circuit, hand the service key to the provider who provisions the physical link, configure BGP peering, and then connect a virtual network gateway of the ExpressRoute type to the circuit. Because the provider step lives outside Azure, the ExpressRoute commands realize only the Azure side, and the circuit reaches a provisioned state only after the provider completes their portion, which is the structural reason ExpressRoute has a longer lead time than a VPN that you can build end to end yourself in an afternoon.
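Both the VPN connection and the ExpressRoute circuit reach their working state only after something outside your script finishes, the on-premises device in one case and the provider in the other, so a polling helper is a natural companion to the commands above. This is a minimal sketch: wait_for_state is a hypothetical helper, the attempt counts and delays are illustrative, the circuit name er-circuit is hypothetical, and the az calls appear only as commented usage. It assumes the circuit's provider-side state is read from its serviceProviderProvisioningState property.

```bash
# wait_for_state WANT ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY_SECONDS between tries, until
# its output (whitespace stripped) equals WANT. Returns 1 if it never does.
wait_for_state() {
  local want="$1" attempts="$2" delay="$3" got i
  shift 3
  for (( i = 1; i <= attempts; i++ )); do
    got="$("$@" 2>/dev/null | tr -d '[:space:]')"
    if [[ $got == "$want" ]]; then
      echo "reached $want after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: state is '${got:-unknown}'" >&2
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}

# Usage (requires az login; counts and delays are illustrative):
# VPN: wait for the tunnel once the on-premises device is configured.
#   wait_for_state Connected 40 30 \
#     az network vpn-connection show --resource-group rg-network \
#       --name s2s-onprem --query connectionStatus -o tsv
# ExpressRoute: wait for the provider to provision its side of a
# hypothetical circuit named er-circuit.
#   wait_for_state Provisioned 96 900 \
#     az network express-route show --resource-group rg-network \
#       --name er-circuit --query serviceProviderProvisioningState -o tsv
```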

How hub-and-spoke composes peering with a gateway

The three options are not mutually exclusive, and the most common production topology uses two of them together. A hub-and-spoke design places shared services in a central hub virtual network and connects each workload virtual network, the spokes, to that hub by peering. The hub holds the things every spoke needs to share: the VPN or ExpressRoute gateway that reaches on-premises, a firewall or network virtual appliance that inspects and controls traffic, and often shared platform services such as DNS forwarders. The spokes hold the workloads and peer to the hub, borrowing the hub’s gateway through gateway transit rather than each deploying its own.

This composition is where peering and a gateway combine to produce behavior neither delivers alone. Peering joins each spoke to the hub privately and at backbone speed. Gateway transit lets a spoke reach on-premises through the hub’s single gateway, which is both cheaper and simpler than putting a gateway in every spoke. Enabling it requires two settings that work as a pair: on the hub-to-spoke peering you allow gateway transit, and on the spoke-to-hub peering you allow the use of the remote gateway, after which the spoke learns the on-premises routes through the hub’s gateway and sends that traffic across the peering to the hub and onward. The arrangement scales because adding a workload means adding a spoke and a peering, not a new gateway, and the on-premises connectivity is provisioned once in the hub.
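The two gateway-transit settings map onto one hub-side and one spoke-side peering command. The sketch below prints the commands rather than running them, so they can be reviewed first. It assumes the hub and spoke live in the same resource group (otherwise --remote-vnet needs the full resource ID), that the hub gateway already exists, and that the spoke has no gateway of its own. gateway_transit_cmds and the VNet names are hypothetical; --allow-gateway-transit and --use-remote-gateways are the Azure CLI peering options for the two settings.

```bash
# gateway_transit_cmds RG HUB_VNET SPOKE_VNET
# Prints (does not run) the hub-side then spoke-side peering commands.
# Inside an unquoted heredoc a backslash-newline is removed, so each
# command is emitted on a single line.
gateway_transit_cmds() {
  local rg="$1" hub="$2" spoke="$3"
  cat <<EOF
az network vnet peering create --resource-group $rg \
--name $hub-to-$spoke --vnet-name $hub --remote-vnet $spoke \
--allow-vnet-access true --allow-forwarded-traffic true \
--allow-gateway-transit true
az network vnet peering create --resource-group $rg \
--name $spoke-to-$hub --vnet-name $spoke --remote-vnet $hub \
--allow-vnet-access true --allow-forwarded-traffic true \
--use-remote-gateways true
EOF
}

# Review first, then run: gateway_transit_cmds rg-network vnet-hub vnet-spoke1 | bash
gateway_transit_cmds rg-network vnet-hub vnet-spoke1
```

The hub side is printed first on purpose: the spoke's use-remote-gateways setting depends on a hub that already offers its gateway.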

The catch that the transitivity discussion already flagged returns here in its operational form. Spoke-to-spoke traffic does not flow through the hub by default even when both spokes peer with the hub, because peering is non-transitive and the hub does not re-advertise one spoke’s range to another. To make spoke A reach spoke C, you route both spokes through a firewall or appliance in the hub using user-defined routes: each spoke gets a route that sends the other spoke’s range to the hub appliance’s private IP as the next hop, the appliance receives the forwarded traffic, and it relays it to the destination spoke. This is why hub-and-spoke designs that intend any spoke-to-spoke communication must plan the hub appliance and the route tables from the start, and why allow-forwarded-traffic must be enabled on the peerings, because the appliance is forwarding traffic that did not originate in the network it is forwarding from. A design that assumed peering would carry spoke-to-spoke traffic for free discovers the gap the first time two workloads in different spokes need to talk, and the fix is a routing change rather than a link change, which is exactly the distinction the connect-what-and-how-private rule and the transitivity behavior are meant to make obvious in advance.
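The spoke-side route tables follow a fixed pattern, one route per other spoke with the hub appliance as next hop, so they are easy to generate. This sketch prints the az commands instead of running them. The appliance IP 10.0.1.4, the subnet name workload, the rt-<vnet> naming, and the spoke_udr_cmds helper are all hypothetical, and it assumes the route table can be created in the same resource group and region as the spoke.

```bash
# spoke_udr_cmds RG APPLIANCE_IP SPOKE_VNET SUBNET OTHER_SPOKE_RANGE...
# Prints (does not run) the az commands that give one spoke subnet a route
# table sending each other spoke's range to the hub appliance as next hop.
spoke_udr_cmds() {
  local rg="$1" nva_ip="$2" vnet="$3" subnet="$4" range i=0
  shift 4
  local rt="rt-$vnet"
  echo "az network route-table create --resource-group $rg --name $rt"
  for range in "$@"; do
    i=$(( i + 1 ))
    echo "az network route-table route create --resource-group $rg" \
         "--route-table-name $rt --name to-spoke-$i --address-prefix $range" \
         "--next-hop-type VirtualAppliance --next-hop-ip-address $nva_ip"
  done
  echo "az network vnet subnet update --resource-group $rg" \
       "--vnet-name $vnet --name $subnet --route-table $rt"
}

# Spoke A reaches spoke C (10.3.0.0/16) through a hub appliance at 10.0.1.4;
# run the mirror image for spoke C with spoke A's range.
spoke_udr_cmds rg-network 10.0.1.4 vnet-spoke-a workload 10.3.0.0/16
```

Generating both directions from the same inventory keeps the routes symmetric, which matters because a stateful firewall drops replies that return by a different path than the request took.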

Real-world scenarios and the deciding factor in each

Patterns repeat across the cases engineers report, and naming the deciding factor in each one turns the abstract rule into a set of recognizable situations. Each scenario below is a problem shape with the factor that points at the option.

The first pattern is choosing a VPN for a quick on-premises link and outgrowing its bandwidth. A team needs to reach an on-premises system, the requirement is modest, and a site-to-site VPN stands up in an afternoon for a small monthly cost. Months later the workload grows, replication or batch transfer saturates the gateway SKU, and the application slows in a way that looks like an application problem until someone reads the gateway metrics. The deciding factor that should have been weighed at the start is the trajectory of the bandwidth requirement: if the path was always going to carry heavy or growing traffic, the connect-what-and-how-private rule nominated ExpressRoute from the beginning, and the VPN was a false economy. Where the bandwidth genuinely will stay modest, the VPN remains the right call, and the lesson is to forecast the requirement rather than the launch-day load.

The second pattern is needing ExpressRoute for predictable private throughput. A workload moves large volumes between on-premises and Azure, or it has a compliance requirement that the traffic stay off the public internet, or it is latency-sensitive enough that internet jitter is unacceptable. Here the deciding factor is the combination of privacy and predictable bandwidth, and it points cleanly at ExpressRoute despite the higher cost, because no internet tunnel delivers a private path with a committed bandwidth tier. The cost is the price of determinism, and the rule treats it as a constraint to accept, not a reason to reconsider, once privacy and predictable throughput are genuine requirements.

The third pattern is peering for fast intra-Azure connectivity. Two virtual networks in Azure need to communicate by private IP at backbone speed, whether they sit in the same region or in different regions. The deciding factor is simply that both endpoints are Azure virtual networks, which the rule resolves immediately to peering, because nothing else competes with the backbone for an intra-Azure path and a gateway between two Azure networks adds cost and overhead for no benefit. The only sub-decision is regional versus global peering, decided by whether the networks share a region, and the only number to watch is cross-region data transfer cost on global peering.

The fourth pattern is a hub-and-spoke design composing peering with a gateway. An organization with multiple workloads wants shared on-premises connectivity and centralized inspection. The deciding factor is the need to share a gateway and a firewall across many workloads, which points at a hub holding the gateway and appliance, spokes peered to it, and gateway transit configured so the spokes borrow the hub’s gateway. The composition is the answer, not any single option, and the design hinges on planning the route tables for spoke-to-spoke traffic up front.

The fifth pattern is expecting transitive peering through a hub and being surprised when it fails. Two spokes peer with a hub, a workload in one spoke tries to reach a workload in the other, and the traffic does not flow because peering is non-transitive. The deciding factor is recognizing that any spoke-to-spoke requirement is a routing requirement, not a link requirement, so it demands a hub appliance and user-defined routes rather than an additional peering. The fix is to route both spokes through the hub firewall, or to peer the two spokes directly if the topology and the inspection policy allow it.

The sixth pattern is cost driving a VPN-versus-ExpressRoute decision. A team weighs the two for an on-premises path and is tempted by the VPN’s lower bill. The deciding factor is whether the requirement includes privacy from the internet or predictable high bandwidth, because if either is a genuine requirement the VPN is disqualified regardless of cost, and if neither is, the VPN is the right economical choice. The rule keeps the cost comparison from selecting the option: cost decides between two options that both meet the what-and-how-private requirement, and it never overrides a privacy or bandwidth requirement that already ruled one of them out.
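The six scenarios reduce to the same two questions, which a small function can encode. This is a sketch under simplifying assumptions: the `Endpoint` enum and the flag names are hypothetical, every non-Azure endpoint is treated alike, and "needs predictable bandwidth" is a yes-or-no judgment you make after forecasting the trajectory, not something the code computes.

```python
from enum import Enum


class Endpoint(Enum):
    AZURE_VNET = "azure-vnet"
    ON_PREMISES = "on-premises"
    OTHER_PROVIDER = "other-provider"


def choose_connectivity(a: Endpoint, b: Endpoint, *,
                        must_avoid_internet: bool = False,
                        needs_predictable_bandwidth: bool = False,
                        same_region: bool = True) -> str:
    """Apply the connect-what-and-how-private rule. Price is deliberately not an input."""
    # Question 1: what is being connected?
    if a is Endpoint.AZURE_VNET and b is Endpoint.AZURE_VNET:
        # The backbone path is already private, so the privacy flags change nothing.
        return "regional peering" if same_region else "global peering"
    # Question 2: how private and how predictable must the off-Azure path be?
    if must_avoid_internet or needs_predictable_bandwidth:
        return "ExpressRoute"  # an internet tunnel is disqualified, whatever it costs
    return "site-to-site VPN"  # both options qualify here, so the cheaper one wins
```

The final line is the only place cost enters, and only because both on-premises options already satisfy the requirement at that point; the privacy and bandwidth checks run first, so a cheaper tunnel can never override them.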

The counter-readings that waste money and break topologies

Three misreadings cause most of the expensive mistakes in this space, and engaging them directly is more useful than restating the rule, because they are intuitive enough that smart engineers fall into them.

The first counter-reading is reaching for ExpressRoute reflexively whenever a requirement says private. Privacy is the headline property of ExpressRoute, so the word triggers the association, and a team provisions an expensive circuit to satisfy a requirement that a cheaper primitive already meets. The correction is to ask what specifically must be private. If the requirement is that a single platform service such as a storage account or a database be reachable privately rather than over its public endpoint, the answer is a private endpoint, which projects that one service into the virtual network with a private IP at a tiny fraction of an ExpressRoute circuit’s cost. If the requirement is that two Azure virtual networks communicate privately, the answer is peering, because the backbone is already private. ExpressRoute earns its cost only when the private path that must be built is the on-premises-to-Azure path and the alternative is an internet tunnel that the requirement forbids. Naming the exact thing that must be private, and the exact path it travels, dissolves the reflex: most things that need to be private do not need ExpressRoute, and conflating the property with the product is what runs up the bill.

The second counter-reading is assuming peering chains transitively through a hub. The mental model of a hub as a meeting point implies that anything connected to the hub can reach anything else connected to it, and for many networking technologies that is true. Peering breaks the model because it joins exactly two networks and propagates no further. The correction is to treat every multi-hop path as a routing decision: peering gives single-hop private connectivity between named networks, and any path that traverses a third network requires user-defined routes and a forwarding appliance, or a direct peering between the endpoints. The danger of this misreading is that the topology looks correct in a diagram, with every spoke connected to the hub, while the spoke-to-spoke path silently does not exist until traffic tries to use it. Designing the route tables and the hub appliance from the start, rather than discovering the gap in production, is the discipline this counter-reading demands.

A third misreading is quieter but just as costly: ignoring the bandwidth ceiling of a VPN gateway until it bites. Because a VPN does not fail loudly when it saturates, a team can run for months without realizing the gateway SKU is the bottleneck, attributing the slowdown to the application or the on-premises network. The correction is to monitor the gateway throughput against its SKU ceiling from day one and to forecast the bandwidth trajectory rather than the launch-day load, so the move to a larger SKU or to ExpressRoute is a planned decision rather than an emergency.

Failure modes and the diagnostic tools that expose them

Each option fails in characteristic ways, and knowing the diagnostic that exposes each one turns a frustrating outage into a quick confirmation. The failures are distinct enough that the symptom usually names the option.

Peering failures are almost always one of three things: a link that never reached Connected, an address-space overlap, or a missing route for spoke-to-spoke traffic. A link stuck in Initiated means only one side was created, and the fix is to create the partner peering. A link in Disconnected means one side was removed or the address space changed under it, and the fix is to recreate the link with consistent address spaces. The diagnostic is to list the peerings and read the peeringState, and to check the effective routes on a network interface to confirm the expected ranges are present. When two networks cannot communicate despite a Connected peering, the effective routes on the source interface reveal whether a route to the destination range exists at all, which immediately distinguishes a link problem from a routing problem.
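The two-step diagnostic (state first, then effective routes) can be scripted so that it reports which layer failed. The inputs would come from reading the peering, for example the `peeringState` field in `az network vnet peering list` output, and from the source interface's effective routes, for example `az network nic show-effective-route-table`; the function and its messages are a hypothetical sketch, not part of either tool.

```python
import ipaddress
from typing import Iterable, Tuple

# Interpretation of each peeringState value; None means the link is healthy.
STATE_DIAGNOSIS = {
    "Connected": None,
    "Initiated": "only one side exists: create the partner peering",
    "Disconnected": "a side was removed or an address space changed: recreate the link",
}


def diagnose_peering(peering_state: str, effective_prefixes: Iterable[str],
                     destination_ip: str) -> Tuple[str, str]:
    """Separate a link problem from a routing problem for one source/destination pair."""
    problem = STATE_DIAGNOSIS.get(peering_state, f"unexpected state {peering_state!r}")
    if problem is not None:
        return "link", problem
    dest = ipaddress.ip_address(destination_ip)
    # A Connected link still needs a route on the source NIC that covers the destination.
    if not any(dest in ipaddress.ip_network(p) for p in effective_prefixes):
        return "routing", f"no effective route covers {destination_ip}"
    return "ok", "link and route present: check filtering next"
```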

VPN gateway failures cluster around the tunnel and the parameters. A connection that stays in Connecting and never reaches Connected points at a mismatch with the on-premises device: a wrong shared key, misaligned IKE or IPsec settings, or local address prefixes that do not match what the device advertises. A tunnel that connects but passes no traffic points at routing or at security rules blocking the flow. A tunnel that connects and then degrades under load points at the gateway SKU ceiling. The diagnostics are the connection status and the ingress and egress byte counters, which show whether traffic is moving at all, plus the gateway metrics that reveal saturation against the SKU. Network Watcher’s connection troubleshooting and packet capture confirm where a flow stops when the status alone is ambiguous.
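The three VPN symptoms map cleanly onto three checks, which is easy to capture as a triage helper. A minimal sketch: the function name, its parameters, and the 0.9 saturation threshold are hypothetical choices, and `utilization` assumes you have already divided the observed gateway throughput by the SKU's published ceiling.

```python
def classify_vpn_symptom(status: str, ingress_bytes: int, egress_bytes: int,
                         utilization: float, saturation: float = 0.9) -> str:
    """Map VPN connection signals to the layer most likely at fault.

    utilization: observed gateway throughput divided by the SKU's published ceiling.
    saturation: hypothetical "near the ceiling" threshold; tune it to your gateway.
    """
    if status != "Connected":
        # The tunnel never came up: both ends disagree on the connection parameters.
        return "tunnel parameters: shared key, IKE/IPsec settings, local address prefixes"
    if ingress_bytes == 0 or egress_bytes == 0:
        # Up, but no traffic in at least one direction.
        return "routing or security rules"
    if utilization >= saturation:
        return "gateway SKU ceiling"
    return "no fault at this layer: use connection troubleshooting or packet capture"
```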

ExpressRoute failures are usually about BGP or the provider side. If the circuit is provisioned but routes are not exchanged, the BGP session is the suspect, and the diagnostic is to inspect the route table the circuit advertises and receives. If the circuit is not provisioned at all, the provider has not completed their portion, which is outside Azure and resolved with the provider. Because the path is private and the provider owns the physical link, ExpressRoute troubleshooting often involves the provider’s support in a way that peering and VPN do not. The common thread across all three options is to confirm the control-plane state first (the link, connection, or circuit status), then the routing (the effective routes or the BGP advertisements), and only then the data plane with a connectivity test, because diagnosing in that order isolates the layer that failed rather than guessing at it. When the connectivity layer is healthy but traffic still does not flow, the filtering that an NSG imposes on top is a frequent culprit, and the Network Security Group deep dive that explains how NSG rules evaluate traffic is the reference for confirming a rule is not silently dropping the flow.
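The layered order is simple to encode so that a runbook script cannot skip ahead. A minimal sketch: the layer names follow this section, with NSG filtering placed before the end-to-end connectivity test, and the check functions are hypothetical placeholders for whatever reads status, effective routes or BGP advertisements, and effective security rules in your environment.

```python
from typing import Callable, Dict, Optional

# Check order from the text: state, then routes, then rules, then a live test.
LAYERS = ("control plane", "routing", "filtering", "data plane")


def first_failing_layer(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run each layer's check in order and stop at the first failure.

    checks maps a layer name to a zero-argument function that returns True when
    that layer is healthy. Later checks are never run once an earlier one fails,
    so a symptom is attributed to the lowest layer that is actually broken.
    """
    for layer in LAYERS:
        if not checks[layer]():
            return layer
    return None
```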

How connectivity interacts with the rest of the network

None of the three options operates in isolation, and the connection working at the link or tunnel or circuit layer does not guarantee that traffic flows, because routing, filtering, and name resolution all sit on top. Understanding these interactions prevents the frustrating case where the connection reports healthy but the application still cannot reach across it.

Routing is the first interaction. Establishing connectivity installs routes for the connected ranges, but user-defined routes can override them, and a route table that sends traffic to a firewall or appliance changes the path regardless of what the link or gateway provides. In a hub-and-spoke design the route tables are doing as much work as the peerings, because they decide whether spoke-to-spoke traffic reaches the hub appliance and whether on-premises-bound traffic uses the hub gateway. Reading the effective routes on an interface, rather than the configured routes, shows the path traffic will actually take after Azure resolves system routes, peering routes, gateway-propagated routes, and user-defined routes together, and that effective view is the authoritative answer when a path is in doubt.
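How Azure picks one route from several candidates can be sketched with the two core rules: longest prefix match first, and, for identical prefixes, a user-defined route over a BGP route over a system route. This sketch deliberately omits documented exceptions, such as system routes for virtual network and peering traffic being preferred even over more specific BGP routes, and the `Route` type and next-hop labels are hypothetical. Use it to reason about a route table, not as a substitute for the effective routes Azure reports.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional

# For identical prefixes: user-defined beats BGP, which beats system.
SOURCE_RANK = {"user": 0, "bgp": 1, "system": 2}


@dataclass(frozen=True)
class Route:
    prefix: str    # CIDR, e.g. "10.2.0.0/16"
    source: str    # "user", "bgp", or "system"
    next_hop: str  # illustrative label, not an Azure next-hop type name


def select_route(routes: Iterable[Route], destination: str) -> Optional[Route]:
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    # Longest prefix wins; source rank only breaks ties between equal-length prefixes.
    return min(matches, key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                                       SOURCE_RANK[r.source]))
```

In a spoke whose table holds a system route to the hub range and user-defined routes for the other spoke and for 0.0.0.0/0, traffic to the hub stays on the peering, while traffic to the other spoke and to the internet is sent to the hub appliance.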

Filtering is the second interaction. Network security groups evaluate traffic independently of the connectivity layer, so a healthy peering or tunnel still carries nothing if an NSG rule denies the flow. A frequent confusion is a Connected peering with no traffic, where the link is fine and an NSG on the destination subnet is dropping the source range. Confirming the NSG rules and the effective security rules on the interface separates a filtering problem from a connectivity problem, and the order to check is connectivity first, then routing, then filtering, because each layer can independently stop traffic.
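NSG evaluation runs independently of the link, and a small model makes that concrete: rules are checked in priority order, lowest number first, and the first match decides. This is a simplified sketch: it omits protocols, destination addresses, service tags, and Azure's default rules (including the ones that allow traffic within the virtual network), and the `Rule` type is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    priority: int        # lower number is evaluated first
    source: str          # source CIDR
    port: Optional[int]  # destination port; None matches any port
    action: str          # "Allow" or "Deny"


def evaluate(rules: Iterable[Rule], source_ip: str, port: int) -> Tuple[str, Optional[int]]:
    """Return (action, priority of the matching rule); the first match by priority wins."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if src in ipaddress.ip_network(rule.source) and rule.port in (None, port):
            return rule.action, rule.priority
    # Stand-in for the deny rule at the bottom of the default rule set.
    return "Deny", None
```

With an allow for 10.1.0.0/16 at priority 100 and a deny for 10.0.0.0/8 at priority 200, a source in a newly peered 10.3.0.0/16 spoke arrives over a Connected peering and is still denied, which is the "healthy link, no traffic" case.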

Name resolution is the third interaction, and it is the one that most often makes a working connection look broken to an application. Connecting two networks at the IP layer does nothing for name resolution; an application that resolves a hostname needs DNS that returns the right private IP across the connected networks. In a hub-and-spoke design this usually means a DNS forwarder in the hub or Azure Private DNS zones linked to the relevant virtual networks, so that names resolve to private IPs consistently across spokes and on-premises. A connection that passes IP traffic fine but fails for an application that connects by name is almost always a DNS problem rather than a connectivity problem, and recognizing that early saves hours of inspecting tunnels that are working correctly.

Designing connectivity for production

Pulling the threads together, a production-grade connectivity design follows from the rule and the interactions rather than from a default. Start by enumerating what must connect to what: which Azure virtual networks, which on-premises sites, which other providers. For each pair, apply the connect-what-and-how-private rule: Azure-to-Azure pairs get peering, on-premises pairs get a VPN or ExpressRoute depending on the privacy and bandwidth requirement, and the requirement is read precisely so that a privacy or bandwidth need that disqualifies the VPN is honored rather than overridden by cost.

Then design the topology that composes the chosen options. For more than a couple of workloads sharing on-premises connectivity, a hub-and-spoke with the gateway and firewall in the hub, spokes peered to it, and gateway transit enabled is the durable shape, because it provisions the expensive on-premises connectivity once and scales by adding spokes. Plan the route tables alongside the peerings, because spoke-to-spoke traffic and inspected egress are routing decisions that must exist from the start, and enable allow-forwarded-traffic on the peerings that carry forwarded flows. Decide name resolution deliberately, with a hub forwarder or linked private DNS zones, so applications that connect by name work across the topology.

Build in the verification and the headroom. For peering, confirm both links report Connected and the effective routes carry the expected ranges. For a VPN, confirm the connection reaches Connected and monitor the throughput against the SKU ceiling so a bandwidth ceiling is a planned upgrade rather than a surprise. For ExpressRoute, confirm the circuit is provisioned and the BGP session exchanges the expected routes, and design the resiliency the workload needs, because a single circuit is a single point of failure for the on-premises path. Across all three, treat the connectivity layer, the routing layer, the filtering layer, and the DNS layer as four things that each must be correct, and verify them in that order when something does not work. A design built this way rarely produces the migration-under-pressure that the false-economy VPN produces, because the option was chosen to fit the requirement and the trajectory from the start.

Regional versus global VNet peering, and when the distinction bites

Peering looks like one feature, but the regional and global variants differ in ways that matter once a design spans more than one Azure region. Regional peering joins two virtual networks in the same region, and global peering joins virtual networks in different regions, with the data path crossing Microsoft’s global backbone rather than the internet in both cases. The functional behavior is the same: resources reach each other by private IP, the link is non-transitive, and the address spaces must not overlap. The differences live in cost and in a few capability nuances that a multi-region design has to account for.

The cost difference is the one that shows up on the bill. Data transferred over a global peering crosses regions and is charged at the inter-region rate, which is higher than the intra-region rate that applies to a regional peering. A design that scatters tightly coupled, chatty components across regions and stitches them with global peering can accumulate inter-region transfer cost that a same-region placement would avoid entirely. The discipline is to keep components that talk constantly in the same region where the topology allows, and to reserve global peering for the traffic that genuinely needs to cross regions, such as replication to a secondary region or a global service reaching a regional backend. Global peering is the right tool for those cross-region paths; it is the wrong tool for components that only ended up in different regions by accident of deployment rather than by design.
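The placement argument comes down to multiplication, which makes it easy to check before deciding where components live. A sketch with placeholder rates: the per-GB figures below are invented for illustration and are not Azure prices, and real billing details (for example, how inbound and outbound transfer are each charged, and which region pair applies) must come from the current Azure pricing page.

```python
# Placeholder per-GB rates for illustration only; these are NOT Azure prices.
INTRA_REGION_PER_GB = 0.01
INTER_REGION_PER_GB = 0.04


def monthly_transfer_cost(gb_per_day: float, cross_region: bool, days: int = 30) -> float:
    """Transfer cost for one traffic flow over a regional or global peering."""
    rate = INTER_REGION_PER_GB if cross_region else INTRA_REGION_PER_GB
    return gb_per_day * days * rate


def avoidable_cost(gb_per_day: float, days: int = 30) -> float:
    """Extra monthly spend from placing a chatty pair in different regions."""
    return (monthly_transfer_cost(gb_per_day, True, days)
            - monthly_transfer_cost(gb_per_day, False, days))
```

Under these placeholder rates, two components exchanging 500 GB a day would cost 600 a month across regions against 150 within one region; the absolute numbers will differ with real prices, but the ratio between the two rates is what makes co-location worth checking.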

There are also historical capability nuances that a careful design verifies against current behavior. Some services and features that work across a regional peering have, at various points, had restrictions when reached across a global peering, particularly around certain load-balancing and basic-tier scenarios. The platform evolves, and a restriction that applied in one year may be lifted in another, so the right posture is to confirm the current cross-region behavior of any service you intend to reach over a global peering against the official documentation at the time you design, rather than assuming the regional behavior carries over unchanged. The durable point is that regional and global peering are the same primitive with different reach and different cost, and a multi-region design treats the region boundary as a real cost and capability boundary rather than as a transparent extension of one flat network.

What the gateway SKU actually governs

Because the VPN gateway’s SKU is the lever that decides whether a VPN remains adequate, it is worth understanding what the SKU governs beyond a single throughput number. The SKU sets the aggregate throughput the gateway can carry across all its tunnels, the number of tunnels and connections it supports, and whether features such as active-active configuration and zone redundancy are available. A larger SKU raises the aggregate throughput ceiling and the connection count, and it enables the resiliency arrangements that a production gateway usually needs.

The reason the SKU matters so much in the VPN-versus-ExpressRoute decision is that it is the structural ceiling the connect-what-and-how-private rule is asking you to forecast. When the rule asks how fast the path must be, the honest answer for a VPN is bounded by the SKU, and the question becomes whether the largest practical VPN SKU carries the workload’s projected traffic with headroom. If the projected traffic approaches or exceeds what the SKU range can deliver, the VPN is not the right primitive regardless of how cheap the smallest SKU looks, because the path will saturate and the only remedies are a larger SKU at higher cost or a migration to ExpressRoute. If the projected traffic sits comfortably inside a modest SKU’s ceiling with room to grow, the VPN is the economical and correct choice. Active-active configuration deserves a note because it serves resiliency rather than raw throughput: it runs two gateway instances so a single instance failure or maintenance event does not drop the on-premises connectivity, which a production VPN generally wants. The combination of throughput headroom and resiliency is what a production VPN SKU is sized for, and reading the SKU as merely a throughput number misses the connection-count and resiliency dimensions that a real design depends on.
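Forecasting the trajectory against a SKU ceiling is a compound-growth calculation. A minimal sketch, assuming steady month-over-month growth and a 70 percent headroom target; both are illustrative choices, and `ceiling_mbps` should be the published aggregate throughput for the SKU under consideration, not a number taken from this example.

```python
import math
from typing import Optional


def months_until_threshold(current_mbps: float, monthly_growth: float,
                           ceiling_mbps: float, headroom: float = 0.7) -> Optional[int]:
    """Months until projected throughput first exceeds headroom * ceiling.

    Assumes compound growth: traffic(m) = current * (1 + growth) ** m.
    Returns 0 if already past the threshold, None if traffic never grows.
    """
    limit = ceiling_mbps * headroom
    if current_mbps >= limit:
        return 0
    if monthly_growth <= 0:
        return None
    return math.ceil(math.log(limit / current_mbps) / math.log(1 + monthly_growth))
```

With 100 Mbps today, 10 percent monthly growth, and a hypothetical 1,000 Mbps ceiling, the function returns 21: the path crosses 70 percent of the ceiling in under two years, which is the moment to plan a larger SKU or ExpressRoute rather than the moment to discover the limit.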

The verification habit that keeps a VPN honest is to monitor the gateway’s throughput metric against the SKU’s published ceiling continuously, so that the gateway approaching its limit is a signal you act on with a planned change rather than a degradation users report first. A gateway that routinely runs near its ceiling is telling you the bandwidth trajectory has outgrown the primitive, and that signal, caught early, turns the eventual move to a larger SKU or to ExpressRoute into a scheduled project rather than an incident. Confirm the current SKU throughput figures, tunnel counts, and feature availability against the official Azure documentation when you size a gateway, because these values are revised as the platform evolves and a stale figure can undersize a production path.
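Alerting on a single sample near the ceiling produces noise, while alerting on a sustained run catches the trajectory described above. A sketch, assuming evenly spaced throughput samples and illustrative values for the threshold (80 percent) and window (six samples); in Azure the same idea would normally be expressed as a metric alert on the gateway's throughput metric rather than in application code.

```python
from typing import Sequence


def sustained_breach(samples_mbps: Sequence[float], ceiling_mbps: float,
                     threshold: float = 0.8, window: int = 6) -> bool:
    """True when each of the last `window` samples exceeds threshold * ceiling.

    A single spike does not fire; a run of high samples does.
    """
    if len(samples_mbps) < window:
        return False
    limit = threshold * ceiling_mbps
    return all(sample > limit for sample in samples_mbps[-window:])
```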

ExpressRoute resiliency and the peering types

ExpressRoute carries more architectural weight than a VPN because it is usually the private path for an entire organization’s on-premises connectivity, so its resiliency and its peering types deserve a closer look than the comparison table can hold. An ExpressRoute circuit is provisioned through a connectivity provider, and a single circuit, however reliable, is still a single path. Production designs that depend on ExpressRoute for critical on-premises connectivity build redundancy, whether through a second circuit in a different peering location, a VPN as a backup path that takes over if the circuit fails, or both, because the cost of the on-premises path going dark usually justifies the redundancy investment. Treating a single ExpressRoute circuit as inherently highly available is a mistake; the resiliency comes from the design around the circuit, not from the circuit alone.
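The value of a second path can be estimated with the standard parallel-availability formula, A = 1 - (1 - a1)(1 - a2), which holds only if the paths fail independently. That independence assumption is why a second circuit belongs in a different peering location: paths that share a location or provider fail together more often than the formula predicts. The availability figures used below are hypothetical, not Azure SLA values.

```python
def parallel_availability(*path_availabilities: float) -> float:
    """Availability of redundant paths, assuming they fail independently.

    The connection is down only when every path is down at the same time.
    """
    all_down = 1.0
    for availability in path_availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down
```

Two independent paths with hypothetical availabilities of 0.999 and 0.99 combine to 0.99999; if both depend on the same peering location, the real figure can be far lower than the formula suggests.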

The peering types on an ExpressRoute circuit determine what the circuit reaches, and conflating them causes confusion about what traffic the circuit carries. Private peering carries traffic to your Azure virtual networks by private IP, which is the peering type most people mean when they say ExpressRoute: it is how on-premises systems reach virtual machines, databases, and other resources in your virtual networks over the private circuit. Microsoft peering carries traffic to Microsoft public services, such as certain platform endpoints, over the circuit rather than over the internet, for organizations that want even that traffic kept on the private path. Understanding which peering type a given traffic flow uses clarifies what the circuit is actually carrying and prevents the assumption that establishing a circuit automatically routes every kind of traffic privately. A circuit with only private peering configured reaches your virtual networks privately but does not change how on-premises systems reach Microsoft public services, which still go over the internet unless Microsoft peering is configured for them.

The structural reason ExpressRoute has a longer lead time and a more involved lifecycle than a VPN traces back to the provider relationship. Standing up the circuit means coordinating with the connectivity provider who provisions the physical link into a Microsoft edge location, configuring BGP so routes are exchanged, and connecting an ExpressRoute-type gateway in your virtual network to the circuit. Each of those steps can introduce a failure mode the VPN does not have: the provider has not completed provisioning, the BGP session is not established, or the gateway is not connected to the circuit. Diagnosing ExpressRoute therefore often spans Azure and the provider, and the right first question when an ExpressRoute path misbehaves is which side owns the layer that failed, because a provider-owned physical link problem and an Azure-owned BGP or gateway problem are resolved in entirely different places.

A worked decision: choosing connectivity for a representative environment

To make the rule concrete, walk a representative environment through it end to end. An organization runs three Azure workloads in two regions, has an on-premises datacenter that holds a database the workloads replicate from, and has a compliance requirement that the replication traffic not traverse the public internet. The connectivity design follows from applying the rule pair to each connection rather than from picking a favorite option.

Start with the Azure-to-Azure connections. The three workloads need to reach shared services and, in some cases, each other. Both endpoints are Azure virtual networks, so the rule resolves immediately to peering. The workloads in the same region are joined by regional peering, and the workload in the second region is joined to the shared hub by global peering, with the inter-region transfer cost noted and the chatty components kept in one region where possible to limit it. Because more than one workload shares connectivity, a hub-and-spoke shape is the durable choice: a hub virtual network holds the shared services and the on-premises gateway, the three workloads are spokes peered to the hub, and gateway transit is enabled so the spokes reach on-premises through the hub’s single gateway. Spoke-to-spoke communication, where two workloads must talk, is planned as a routing decision with user-defined routes through a hub firewall, because peering will not carry it transitively.

Now the on-premises connection, which is where the compliance requirement decides everything. The replication traffic must not traverse the public internet, so the rule’s privacy axis disqualifies a VPN regardless of its lower cost, because an encrypted internet tunnel still travels the internet and the requirement is about the path rather than the encryption. The on-premises path is therefore ExpressRoute, configured with private peering so on-premises systems reach the virtual networks privately, and the higher cost is accepted as the price of the compliance requirement and the predictable replication bandwidth. The design adds a backup path for resiliency, because a single circuit is a single point of failure for the on-premises connectivity that the whole environment depends on. The gateway lives in the hub so it is provisioned once and shared by all three spokes through gateway transit.

The result is a design that composes peering and ExpressRoute, each doing its one job: peering joins the Azure networks privately and at backbone speed, and ExpressRoute carries the on-premises replication over a private circuit that satisfies the compliance requirement. No single option was chosen as a favorite; each connection was resolved by asking what it connects and how private and fast it must be, and the cost entered only to confirm the nominees rather than to select them. The same environment, if the compliance requirement were absent and the replication volume modest, would have used a VPN for the on-premises path instead, and the only thing that changed the answer was reading the requirement precisely. That sensitivity to the requirement, rather than to the price tag, is the whole point of the rule.

Common configuration mistakes that produce silent failures

Several configuration mistakes recur across connectivity designs, and they share the unpleasant property of producing silent failures rather than loud errors, which is what makes them expensive to diagnose. Naming them turns a multi-hour investigation into a quick check.

The first is the one-sided peering. Peering requires a link on each virtual network, and creating only one leaves it in an Initiated state that looks almost like success at a casual glance in the portal. The traffic does not flow, and the cause is not an error message but the absence of the partner link. The check is to confirm both peerings report Connected, and the habit is to always create both sides as a pair, ideally in the same script, so a one-sided peering never ships.

The second is forgetting allow-forwarded-traffic on the peerings in a hub-and-spoke design. When a hub appliance forwards traffic between spokes, the spoke networks receive traffic whose source is another network, and a link that does not allow forwarded traffic drops it. The connectivity looks correct, the routes look correct, and the traffic still vanishes, because the link is silently discarding forwarded packets. The check is to enable allow-forwarded-traffic on the peerings that carry forwarded flows, and the symptom that points at it is spoke-to-spoke traffic that fails even though the hub appliance and the routes are configured.

The third is the address-space overlap that surfaces only when two networks are joined. Two virtual networks designed independently may use the same private range, which is harmless until they are peered, at which point the overlapping range cannot be routed because two destinations claim it. The link may refuse to establish, or routing may behave unpredictably for the overlapping range. The check is to plan address spaces across the whole estate so that any networks that might ever be joined have non-overlapping ranges, which is far cheaper than re-addressing a network after the fact.
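Because an overlap only hurts once networks are joined, checking the whole estate's address plan up front is cheap. A minimal sketch using Python's standard `ipaddress` module; the VNet names and ranges in the example are hypothetical, and the input would come from whatever inventory or infrastructure-as-code source holds your address spaces.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(address_spaces: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (vnet_a, cidr_a, vnet_b, cidr_b) for every overlapping pair of ranges.

    address_spaces maps a VNet name to its list of CIDR address spaces.
    """
    overlaps = []
    for (name_a, spaces_a), (name_b, spaces_b) in combinations(address_spaces.items(), 2):
        for cidr_a in spaces_a:
            for cidr_b in spaces_b:
                if ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b)):
                    overlaps.append((name_a, cidr_a, name_b, cidr_b))
    return overlaps
```

Running it over an estate where an independently designed network reused part of a spoke's range flags exactly that pair, before any peering attempt does.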

The fourth is the missing or misconfigured DNS that makes a working connection look broken to applications. IP connectivity says nothing about name resolution, and an application that connects by name across joined networks needs DNS that returns the right private IP. A design that joins the networks but leaves each with its own isolated DNS produces applications that cannot find each other by name despite a healthy connection. The check is to plan name resolution deliberately, with a hub forwarder or linked private DNS zones, and the symptom that points at it is IP-level tests succeeding while name-based application connections fail.

The fifth, specific to gateway transit, is enabling only one of the two settings the transit pair requires. Gateway transit needs allow gateway transit on the hub-side peering and use remote gateways on the spoke-side peering, and enabling one without the other leaves the spoke unable to use the hub gateway. The check is to treat the two settings as a pair that must both be set, and the symptom is a spoke that cannot reach on-premises even though the hub gateway is healthy and the link is Connected.
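Because the two settings only work as a pair, a pre-deployment check should report whichever one is missing. A sketch: the `PeeringSide` type and the messages are hypothetical, though the field names echo the `allowGatewayTransit` and `useRemoteGateways` properties on an Azure peering resource.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeeringSide:
    """Hypothetical view of one side of a peering; not an Azure SDK type."""
    allow_gateway_transit: bool = False
    use_remote_gateways: bool = False


def gateway_transit_gaps(hub_to_spoke: PeeringSide, spoke_to_hub: PeeringSide,
                         hub_has_gateway: bool = True) -> List[str]:
    """List what stops the spoke from using the hub gateway; empty means ready."""
    gaps = []
    if not hub_has_gateway:
        gaps.append("hub has no gateway")
    if not hub_to_spoke.allow_gateway_transit:
        gaps.append("allow gateway transit on the hub-to-spoke peering")
    if not spoke_to_hub.use_remote_gateways:
        gaps.append("use remote gateways on the spoke-to-hub peering")
    return gaps
```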

Monitoring connectivity so failures announce themselves

A connectivity design is only as good as the visibility you have into it, because all three options can fail quietly, and the difference between a five-minute fix and an afternoon of guessing is usually whether the right signal was already being watched. Each option has metrics and tools that, watched continuously, turn a degradation into an alert rather than a user complaint.

For peering, the signals are the peering state and the effective routes. A monitoring posture that alerts when a link leaves the Connected state catches the one-sided or broken peering before an application notices, and periodic verification that the effective routes on key interfaces carry the expected ranges catches the case where a user-defined route or an address change quietly removed a path. Peering itself does not throttle, so there is no throughput ceiling to alarm on, but cross-region data transfer on global peering is worth tracking as a cost signal, because a chatty cross-region pattern shows up as rising transfer charges before anyone calls it a problem.

For a VPN gateway, the signals are the connection status, the tunnel ingress and egress byte counters, and the gateway throughput against its SKU ceiling. Alerting when a connection leaves Connected catches a dropped tunnel immediately, and watching throughput against the SKU ceiling catches the saturation that otherwise degrades silently. The byte counters confirm whether traffic is actually moving when the status alone is ambiguous, distinguishing a tunnel that is up but idle from one that is up and carrying load. Network Watcher’s connection monitor can continuously probe a path and record when it stops passing traffic, which converts an intermittent connectivity problem into a timeline you can correlate with changes rather than a vague report that it is sometimes slow.

For ExpressRoute, the signals are the circuit provisioning state, the BGP session state, and the throughput against the provisioned tier. A circuit that leaves the provisioned state or a BGP session that drops is the kind of event that should page someone, because the on-premises path the whole environment depends on is down. Throughput against the provisioned tier tells you whether the circuit is approaching its ceiling, which, like the VPN SKU, is a signal to plan an upgrade before the ceiling becomes a constraint. Because ExpressRoute spans Azure and the provider, monitoring both the Azure-side signals and any health information the provider exposes gives the complete picture, and knowing which side a signal comes from tells you immediately where to take the problem when something fails. Across all three, the principle is that the control-plane state, the routing, and the throughput are the three families of signal, and watching them in advance is what lets a connectivity failure announce itself rather than hide until an application surfaces it.
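The ExpressRoute signals can be triaged the same way, with each finding saying where to take it. A hedged sketch: provider_state mirrors the circuit's serviceProviderProvisioningState property, bgp_up stands in for whatever your BGP session monitor reports, and the severities, the threshold, and the triage_expressroute name are all assumptions for illustration.

```python
from typing import List, Tuple


def triage_expressroute(provider_state: str, bgp_up: bool,
                        used_mbps: float, provisioned_mbps: float,
                        warn_fraction: float = 0.8) -> List[Tuple[str, str]]:
    """Return (severity, finding) pairs for one circuit; an empty list means healthy."""
    findings = []
    if provider_state != "Provisioned":
        # Provider-side provisioning: the physical link is the provider's layer.
        findings.append(("page", f"provider state is {provider_state}: raise it with the connectivity provider"))
    if not bgp_up:
        # Routing layer: check the circuit peering configuration and the edge routers.
        findings.append(("page", "BGP session down: check the circuit peering and the edge routers"))
    if used_mbps >= warn_fraction * provisioned_mbps:
        findings.append(("plan", f"{used_mbps:.0f} of {provisioned_mbps:.0f} Mbps in use: plan a tier upgrade"))
    return findings
```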

When a VNet-to-VNet VPN makes sense over peering

The comparison table treats joining two Azure virtual networks as peering’s job, which is almost always correct, but there is a narrow case where a VNet-to-VNet VPN is chosen instead, and understanding it sharpens the rule rather than contradicting it. A VNet-to-VNet VPN connects two Azure virtual networks through a VPN gateway on each side, encrypting the traffic between them, where peering would connect them directly on the backbone without a gateway or encryption.

The reason to prefer peering for nearly every Azure-to-Azure connection is that it avoids the gateway cost, the gateway throughput ceiling, and the encryption overhead, while delivering the same private-IP reachability at backbone speed. A VNet-to-VNet VPN adds a gateway on each side, caps the throughput at the gateway SKU, and adds encryption overhead, all to accomplish something peering does more cheaply and faster. So the default for two Azure networks is peering, full stop, and reaching for a VNet-to-VNet VPN as the first choice is usually a mistake.

The narrow case where the VPN approach is considered is when there is a specific requirement that peering cannot satisfy and the gateway can. The most common is a requirement that the traffic between the two Azure networks be encrypted in transit at the network layer, which peering does not do because it relies on the backbone’s isolation rather than on encryption. If a policy demands that even intra-Azure traffic between these particular networks be encrypted, a VNet-to-VNet VPN provides that encryption where peering does not, and the cost and the throughput ceiling are accepted as the price of meeting the policy. Even then, ask whether the policy truly requires network-layer encryption between Azure networks on the private backbone, or whether application-layer encryption such as TLS already satisfies the intent at lower cost; the latter is often the case and avoids the gateway entirely. The rule still holds: two Azure networks are peering by default, and the VPN is the exception only when a specific requirement, almost always network-layer encryption, makes the gateway’s cost worth paying.

Security considerations across the three options

Connectivity and security are separate concerns that interact, and a connectivity design that ignores the security posture of each option leaves gaps that the connection itself does not reveal. Each option has a different security profile, and reading it correctly is part of choosing well.

Peering’s security profile is that it relies on the isolation of the Microsoft backbone rather than on encryption in the data path. Traffic between peered networks is private in the sense that it stays on the backbone and does not traverse the internet, but the peering itself does not encrypt it. For most designs the backbone isolation is sufficient, and applications that need confidentiality use application-layer encryption such as TLS on top. The security control that most often matters for peering is the network security group, because peering makes two networks reachable to each other and the NSG decides which flows are actually permitted. The default AllowVnetInBound rule treats a peered network’s address space as part of the VirtualNetwork service tag, so a peering without considered NSG rules opens broad reachability between the networks, and the discipline is to pair the connectivity with the filtering that limits it to the flows the design intends.
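Why unfiltered peering opens broad reachability becomes visible in how NSG rules are evaluated: rules are checked in priority order, the first match decides, and the default allow rule covers the VirtualNetwork tag. A simplified model, assuming inbound rules that match only on source prefix and destination port (real rules also match protocol, destination, and more) and omitting the load-balancer default rule; the address ranges are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    priority: int        # lower number is evaluated first
    source: str          # a CIDR, "VirtualNetwork", or "*"
    port: Optional[int]  # None matches any destination port
    allow: bool


# Simplified defaults: AllowVnetInBound (65000) and DenyAllInBound (65500).
DEFAULTS = [Rule(65000, "VirtualNetwork", None, True), Rule(65500, "*", None, False)]


def is_allowed(rules: List[Rule], src_ip: str, port: int, vnet_tag: List[str]) -> bool:
    """Return the access of the first rule, by priority, that matches the flow."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source == "VirtualNetwork":
            sources = vnet_tag          # own VNet plus peered address spaces
        elif rule.source == "*":
            sources = ["0.0.0.0/0"]
        else:
            sources = [rule.source]
        if rule.port in (None, port) and any(ip in ipaddress.ip_network(s) for s in sources):
            return rule.allow
    return False


# Destination subnet lives in 10.1.0.0/16, which is peered with 10.2.0.0/16.
VNET_TAG = ["10.1.0.0/16", "10.2.0.0/16"]
print(is_allowed(DEFAULTS, "10.2.0.5", 22, VNET_TAG))  # defaults let the peered host reach SSH

# Pair the peering with filtering: allow only HTTPS from one peered subnet.
CUSTOM = DEFAULTS + [
    Rule(100, "10.2.0.0/24", 443, True),
    Rule(200, "10.2.0.0/16", None, False),
]
print(is_allowed(CUSTOM, "10.2.0.5", 443, VNET_TAG), is_allowed(CUSTOM, "10.2.0.5", 22, VNET_TAG))
```

The deny at priority 200 targets only the peered range, so traffic from the local VNet still falls through to the default allow.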

The VPN gateway’s profile is that it encrypts traffic in transit across the internet, which is its security strength, but the encryption protects confidentiality on a path that is still the public internet. The security questions for a VPN are the strength of the shared key or the certificate-based authentication, the IKE and IPsec parameters that govern the encryption, and the on-premises device’s own posture, because the tunnel is only as secure as both ends. A weak pre-shared key or outdated cipher parameters undermine the encryption the VPN exists to provide, so the security posture of a VPN is an active configuration concern rather than a property you get for free by establishing the tunnel.
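Treating the VPN's posture as active configuration can start with an audit of the policy and the key. A sketch under loud assumptions: the field names and value strings are modeled on the options Azure's custom IPsec/IKE policy exposes (check current documentation), the WEAK baseline and minimum key length are one possible policy rather than an official list, and audit_vpn and new_shared_key are hypothetical helpers. The key generator uses letters and digits only, to sidestep device-specific character restrictions.

```python
import secrets
import string
from typing import List

# One possible baseline of values to reject. This is an assumption, not an
# official Azure list; align it with your organization's cryptographic standard.
WEAK = {
    "ikeEncryption": {"DES", "DES3"},
    "ikeIntegrity": {"MD5", "SHA1"},
    "dhGroup": {"None", "DHGroup1", "DHGroup2"},
    "ipsecEncryption": {"None", "DES", "DES3"},
    "ipsecIntegrity": {"MD5", "SHA1"},
    "pfsGroup": {"PFS1", "PFS2"},
}
MIN_KEY_LENGTH = 32  # assumption: use whatever length your policy requires


def audit_vpn(policy: dict, shared_key: str) -> List[str]:
    """Flag unset or below-baseline IPsec/IKE parameters and a short shared key."""
    findings = []
    for field, weak_values in WEAK.items():
        value = policy.get(field)
        if value is None:
            findings.append(f"{field} not set explicitly")
        elif value in weak_values:
            findings.append(f"{field}={value} is below baseline")
    if len(shared_key) < MIN_KEY_LENGTH:
        findings.append(f"shared key is shorter than {MIN_KEY_LENGTH} characters")
    return findings


def new_shared_key(length: int = 40) -> str:
    """Random alphanumeric pre-shared key drawn from the OS CSPRNG."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The same key must then be configured on both ends, and whatever policy passes the audit must be mirrored on the on-premises device.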

ExpressRoute’s profile is that the path is private but, by default, not encrypted, because the privacy comes from the circuit being dedicated rather than from cryptography. For most organizations the private circuit’s isolation is the security property they wanted, but where a requirement demands encryption in addition to a private path, that encryption is layered on top, whether through IPsec over the circuit or through application-layer encryption, rather than assumed from the circuit alone. The common thread is that none of the three options is a complete security solution by itself: peering relies on backbone isolation and needs NSGs to limit reachability, a VPN provides encryption whose strength depends on configuration, and ExpressRoute provides a private path whose encryption, if required, is added separately. A connectivity design that treats the connection as the security boundary, rather than as one layer that NSGs and encryption complete, leaves the gaps that an audit eventually finds.

Connecting to another cloud or a third-party network

The rule’s first question, what you are connecting, included a third case alongside Azure networks and on-premises sites: another provider’s network, such as a second public cloud or a partner’s environment. This case is common in multi-cloud designs, and it resolves through the same reasoning, with the path usually landing on a VPN or, in specific arrangements, on a private interconnect that behaves like ExpressRoute for that path.

The most common way to reach another cloud is a site-to-site VPN between the Azure VPN gateway and a compatible gateway in the other provider, treating the other cloud’s network the way you would treat an on-premises site: an encrypted tunnel across the internet that makes the two networks mutually reachable. The deciding factor is the same privacy-and-bandwidth question. If the cross-cloud traffic is modest and an encrypted internet path is acceptable, the VPN is the straightforward choice and stands up quickly because both clouds offer compatible gateway options. If the cross-cloud traffic must be private and high bandwidth, the design reaches for a private interconnect, which the major clouds and the connectivity providers offer as a way to join two clouds over a dedicated path rather than the internet, behaving for that path much as ExpressRoute does for the on-premises path. VNet peering does not apply across clouds, because it is an Azure-internal primitive that joins Azure virtual networks and has no meaning for a network outside Azure.

The reason this matters for the comparison is that it confirms the generality of the rule. The option is selected not by which cloud you are using but by what you are connecting and how private and fast the path must be. An Azure-to-Azure connection is peering, an Azure-to-anything-outside-Azure connection where reach suffices is a VPN, and an Azure-to-anything-outside-Azure connection where the path must be private and predictable is a dedicated circuit or interconnect. The same two questions that resolve an on-premises connection resolve a multi-cloud one, and recognizing that keeps a multi-cloud design from inventing special cases the existing rule already covers. The practical caution unique to cross-cloud paths is that the other side is owned by a different provider with its own gateway behavior, throughput limits, and pricing, so the design coordinates two providers’ constraints rather than one and the troubleshooting spans both. That two-sided complexity makes the connect-what-and-how-private rule even more valuable, because it fixes the option before the complexity begins.

How the right choice changes as an environment grows

A connectivity choice that is correct on day one can become wrong as an environment grows, and the durable designs are the ones that anticipate the trajectory rather than optimizing only for the launch. The growth pressures that change the answer are predictable enough to plan for.

The most common trajectory is the VPN that outgrows its bandwidth. A workload launches with modest on-premises traffic, a VPN is the correct economical choice, and over months or years the traffic grows as data accumulates and replication intensifies, until the gateway SKU becomes the ceiling. The design that anticipated this either chose ExpressRoute at the start because the trajectory was clear, or it built the VPN with a planned migration path and monitored the throughput so the move happened on schedule rather than under duress. The failure is the design that treated the launch-day load as permanent and discovered the ceiling as an incident. The discipline is to forecast the bandwidth trajectory when applying the rule, because how fast the path must be is a question about the future as much as the present.
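Forecasting the trajectory can be as simple as compound growth solved for time. A minimal sketch, assuming peak traffic grows by a steady monthly fraction g and that you want to move before reaching a chosen share (headroom) of the SKU ceiling: solving current * (1 + g) ** m = headroom * ceiling gives m = ln(headroom * ceiling / current) / ln(1 + g). The figures in the example call are illustrative assumptions, not SKU limits.

```python
import math


def months_until_ceiling(current_mbps: float, ceiling_mbps: float,
                         monthly_growth: float, headroom: float = 1.0) -> float:
    """Months until peak traffic reaches headroom * ceiling under compound growth.

    Solves current * (1 + g) ** m == headroom * ceiling for m. Returns 0.0 when
    the target is already reached and math.inf when traffic is not growing.
    """
    target = headroom * ceiling_mbps
    if current_mbps >= target:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    return math.log(target / current_mbps) / math.log(1 + monthly_growth)


# Illustrative: 200 Mbps peak, an assumed 1000 Mbps ceiling, 5% monthly growth,
# and a plan to move at 80% of the ceiling.
print(round(months_until_ceiling(200, 1000, 0.05, headroom=0.8), 1))
```

With those inputs the answer is a little over 28 months, which is the window for scheduling the migration rather than discovering it as an incident.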

A second trajectory is the flat network that should have been hub-and-spoke. An environment starts with one or two virtual networks joined by peering, more workloads appear, each is peered to the others in a growing tangle, and the topology becomes unmanageable because peering is point-to-point and the number of links grows much faster than the number of networks: a full mesh of n networks needs n(n-1)/2 peerings, so ten networks need 45, where a hub with nine spokes needs nine. The design that anticipated growth introduced a hub early, so new workloads join as spokes to the hub rather than peering to every existing network, and the shared gateway and firewall live in the hub from the start. Migrating a flat mesh to a hub-and-spoke after it has grown is more disruptive than building the hub when the second or third workload appears, so the discipline is to adopt the hub shape before the mesh becomes painful.
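The divergence between the two shapes is easy to tabulate. A small sketch; the function names are hypothetical and the counts are of logical links between networks.

```python
def mesh_links(n: int) -> int:
    """Peerings for a full mesh: every pair of n networks needs its own link."""
    return n * (n - 1) // 2


def hub_spoke_links(n: int) -> int:
    """Peerings for one hub plus n - 1 spokes: each spoke links only to the hub."""
    return max(n - 1, 0)


for n in (3, 5, 10, 20):
    print(f"{n:>3} networks: mesh {mesh_links(n):>4}, hub-and-spoke {hub_spoke_links(n):>3}")
```

Each link is configured as two peering resources, one on each network, so the resource counts are double these figures. The hub-and-spoke count buys spoke-to-hub reachability only; spoke-to-spoke traffic still needs routes through the hub.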

A third trajectory is the single ExpressRoute circuit that becomes a single point of failure as the environment’s dependence on it grows. A circuit that was acceptable when one workload used it becomes a critical dependency when the whole estate routes on-premises traffic through it, and the resiliency that was optional becomes mandatory. The design that anticipated this planned the redundant circuit or the backup VPN path before the dependence became critical, so the resiliency was in place rather than retrofitted after an outage made the case for it. Across all three trajectories, the pattern is the same: the right choice depends on where the environment is heading, not only on where it is, and the rule’s second question, how private and fast the path must be, is answered honestly only when the trajectory is part of the answer.
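The case for a second path can be put in numbers. A minimal sketch, assuming path failures are independent, which shared fate (same building, same provider fiber, same edge router) can violate; the availability figures in the example are illustrative, not published SLAs, and the helper names are hypothetical.

```python
def combined_availability(*paths: float) -> float:
    """Availability of parallel paths where any single surviving path suffices.

    Assumes independent failures, so the chance that all paths are down at once
    is the product of their individual unavailabilities.
    """
    all_down = 1.0
    for availability in paths:
        all_down *= 1.0 - availability
    return 1.0 - all_down


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected minutes of outage in a month at the given availability."""
    return (1.0 - availability) * days * 24 * 60


single = 0.999                               # illustrative figure
with_backup = combined_availability(0.999, 0.99)
print(downtime_minutes_per_month(single), downtime_minutes_per_month(with_backup))
```

Under the independence assumption, adding even a less available backup path cuts expected downtime by roughly two orders of magnitude, which is why the redundant circuit or backup VPN is worth planning before the dependence becomes critical.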

Closing verdict

The verdict is the rule restated as a decision you can defend. Connecting two Azure virtual networks is VNet peering, every time, because the backbone is private and fast and a gateway between two Azure networks buys nothing. Connecting an on-premises site where the requirement is reach rather than guaranteed private bandwidth is a VPN gateway, because an encrypted internet tunnel is cheap and quick and adequate when the traffic is modest. Connecting an on-premises site where the requirement is a private path off the internet or predictable high bandwidth is ExpressRoute, because only a provider circuit delivers both, and its cost is the price of determinism rather than an extravagance.

The two answers that select the option are what you are connecting and how private and fast the path must be, and cost is the constraint you apply to the nominee rather than the criterion that picks it. The two traps to refuse are reaching for ExpressRoute whenever something must be private, when a private endpoint or peering usually meets the need, and assuming peering chains transitively through a hub, when any multi-hop path is a routing decision that demands user-defined routes and an appliance. Hold the rule, read the requirement precisely, plan the routing and the DNS alongside the connectivity, and the three options stop competing and start composing, each doing the one job it is built for. To see each option work, peer two networks, stand up a tunnel, and watch the routing behave in the hands-on Azure labs and command library on VaultBook, where the configuration in this article runs against a live environment so the behavior becomes something you have seen rather than only read.

Frequently Asked Questions

Q: VNet peering vs VPN vs ExpressRoute, which should I choose?

Choose by answering two questions before you compare cost. First, what are you connecting: two Azure virtual networks, an on-premises site, or another provider’s network. Second, how private and fast must the path be: must it stay off the public internet, and does it need predictable high bandwidth. Two Azure virtual networks point at peering, because the backbone is already private and fast. An on-premises site where reach is the requirement and bandwidth is modest points at a VPN gateway, because an encrypted internet tunnel is cheap and quick. An on-premises site where the path must be private off the internet or carry predictable high bandwidth points at ExpressRoute, because only a provider circuit delivers both. Cost is the constraint you apply to that nominee, not the criterion that selects it, so a privacy or bandwidth requirement that disqualifies the VPN holds even when the VPN looks cheaper.
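The two questions reduce to a few lines of logic. A sketch of the rule as this article states it, not an official Azure decision tool; nominate and its parameter names are hypothetical, and cost is deliberately absent because it is applied to the nominee afterwards.

```python
def nominate(target: str, must_avoid_internet: bool,
             needs_predictable_bandwidth: bool,
             requires_network_layer_encryption: bool = False) -> str:
    """Apply the two questions and return the nominee; cost is checked afterwards.

    target: "azure_vnet", "on_premises", or "other_cloud".
    must_avoid_internet: the requirement forbids transiting the internet, not
    merely "encrypt in transit" (an encrypted VPN satisfies that weaker phrasing).
    """
    if target == "azure_vnet":
        # The backbone is already private and fast; a gateway is the exception,
        # kept for a network-layer encryption mandate that TLS does not satisfy.
        return "VNet-to-VNet VPN" if requires_network_layer_encryption else "VNet peering"
    if target not in ("on_premises", "other_cloud"):
        raise ValueError(f"unknown target: {target!r}")
    if must_avoid_internet or needs_predictable_bandwidth:
        return "ExpressRoute" if target == "on_premises" else "private interconnect"
    return "site-to-site VPN"
```

Note that the privacy and bandwidth flags are ignored for two Azure networks: the backbone already answers both questions.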

Q: How do peering, VPN, and ExpressRoute compare on latency and bandwidth?

Peering rides the Microsoft backbone with no tunnel and no link-level throughput cap, so it adds negligible latency and is bounded only by the endpoints and the backbone. A VPN gateway rides the public internet, so its latency is whatever the internet path delivers plus encryption overhead, and its aggregate throughput is capped by the gateway SKU. ExpressRoute rides a private circuit with a provisioned bandwidth tier, so it delivers steadier latency than the internet and a throughput ceiling you purchased rather than one the internet provides. The practical reading is that chatty, latency-sensitive on-premises traffic favors ExpressRoute for steadiness, modest or tolerant traffic can live on a VPN, and intra-Azure traffic belongs on peering because nothing competes with the backbone for that path. Verify exact SKU and circuit-tier figures against current official limits before committing them to a capacity plan.

Q: How do peering, VPN, and ExpressRoute compare on cost?

The cost models differ structurally. Peering is billed on data transferred, with a higher rate for cross-region traffic on global peering and no gateway or circuit charge, so it scales with usage and stays inexpensive for most intra-Azure communication. A VPN gateway costs the gateway itself, billed per hour at a rate that rises with the SKU, plus egress on traffic leaving Azure, so it carries a modest fixed floor regardless of traffic. ExpressRoute costs the most because it adds the circuit, the provisioned bandwidth tier, and the connectivity provider’s own charge for the physical link. The trap is comparing the ExpressRoute sticker to the VPN sticker as though they buy the same thing; they do not, because ExpressRoute buys a private predictable path that a best-effort tunnel cannot match. Confirm current rates against official pricing when you plan, because connectivity pricing is revised periodically.
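The structural difference is clearer as a formula than as a price list. A sketch under stated assumptions: every rate is an input you take from current official pricing (none is hard-coded), the 730-hour month is a common approximation, and the Rates fields are hypothetical names. The ExpressRoute side also includes the virtual network gateway a VNet connects through, which this model treats as an optional hourly input.

```python
from dataclasses import dataclass
from typing import Dict

HOURS_PER_MONTH = 730  # common approximation for monthly estimates


@dataclass
class Rates:
    """Every field is an input taken from current official pricing."""
    peering_per_gb: float          # use the global rate for cross-region peering
    vpn_gateway_per_hour: float    # hourly rate of the chosen VPN gateway SKU
    egress_per_gb: float           # data leaving Azure over the VPN
    er_circuit_per_month: float    # circuit at the chosen bandwidth tier
    er_provider_per_month: float   # the connectivity provider's own charge
    er_gateway_per_hour: float = 0.0  # ExpressRoute virtual network gateway
    er_egress_per_gb: float = 0.0     # 0 if the plan includes outbound data


def monthly_costs(gb_per_month: float, r: Rates) -> Dict[str, float]:
    """Usage-only vs fixed floor plus usage vs circuit plus provider plus usage."""
    return {
        "peering": gb_per_month * r.peering_per_gb,
        "vpn": HOURS_PER_MONTH * r.vpn_gateway_per_hour + gb_per_month * r.egress_per_gb,
        "expressroute": (r.er_circuit_per_month + r.er_provider_per_month
                         + HOURS_PER_MONTH * r.er_gateway_per_hour
                         + gb_per_month * r.er_egress_per_gb),
    }
```

Evaluating it at zero traffic makes the fixed floors visible: the peering figure drops to nothing, while the VPN and ExpressRoute figures keep a charge that exists before a single byte moves.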

Q: Which connectivity options keep traffic off the public internet?

VNet peering keeps traffic on the Microsoft backbone, so peered virtual networks never touch the internet. ExpressRoute keeps on-premises traffic on a private provider circuit, so it never touches the internet either. A site-to-site VPN does traverse the public internet, encrypted by IPsec, so the data is protected in transit but the path is the shared internet rather than a private one. The distinction matters when a compliance requirement forbids transiting the internet regardless of encryption, because that requirement disqualifies the VPN on the path alone even though its encryption protects confidentiality. Read the requirement precisely: must the traffic not traverse the internet, or must it be encrypted in transit, because those two phrasings nominate different options, and an encrypted VPN is acceptable and cheaper when only confidentiality is required.

Q: Which option connects an on-premises network to Azure?

Both a VPN gateway and ExpressRoute connect on-premises networks to Azure; VNet peering does not, because it is an Azure-internal primitive that joins virtual networks. The choice between the two on-premises options turns on privacy and bandwidth. A VPN gateway builds an encrypted tunnel from your on-premises VPN device across the internet to the Azure gateway, which is quick and cheap and adequate when the bandwidth requirement is modest and an internet path is acceptable. ExpressRoute provisions a private circuit through a connectivity provider, which never touches the internet and carries a committed bandwidth tier, making it the choice when the path must be private or must carry predictable high bandwidth. Many production environments use both, with ExpressRoute as the primary private path and a VPN as a backup that takes over if the circuit fails.

Q: How does transitivity differ across peering, VPN, and ExpressRoute?

VNet peering is non-transitive: it joins exactly the two networks in the link and propagates no further, so two spokes peered to a common hub cannot reach each other through the hub by default. The gateway-based options behave more transitively because a gateway and a route table can be configured to carry traffic onward, which is why hub-and-spoke designs put the gateway in the hub and route spokes through it with gateway transit. To make spoke-to-spoke traffic flow through a hub, you place a firewall or network virtual appliance in the hub and add user-defined routes on each spoke that send the other spoke’s range to the appliance, or you peer the spokes directly. The key reframing is that any multi-hop path is a routing decision you make on purpose, not a property peering provides for free, and designing the route tables up front prevents the silent spoke-to-spoke gap.
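The gap between what peering provides and what is often assumed can be modeled as two reachability functions. A small sketch: peered is the non-transitive behavior, chained is the transitive assumption, and both names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def peered(links: Set[FrozenSet[str]], a: str, b: str) -> bool:
    """Peering is non-transitive: a reaches b only over a direct link."""
    return a == b or frozenset((a, b)) in links


def chained(links: Set[FrozenSet[str]], a: str, b: str) -> bool:
    """What is often assumed instead: reachability along any chain of links."""
    seen, frontier = {a}, [a]
    while frontier:
        node = frontier.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return b in seen


links = {frozenset(("hub", "spoke1")), frozenset(("hub", "spoke2"))}
for x, y in combinations(["hub", "spoke1", "spoke2"], 2):
    if chained(links, x, y) and not peered(links, x, y):
        print(f"{x} <-> {y}: needs a UDR and an appliance, or a direct peering")
```

Every pair the loop prints is a path that exists only in the assumed transitive model, which is exactly where the route tables must be designed.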

Q: Is global VNet peering different from regional peering?

Functionally they behave the same: resources reach each other by private IP over the Microsoft backbone, the link is non-transitive, and address spaces must not overlap. The differences are cost and a few capability nuances. Global peering joins virtual networks in different regions and charges data transfer at the higher inter-region rate, while regional peering joins networks in one region at the lower intra-region rate, so a chatty cross-region pattern on global peering can accumulate transfer cost a same-region design would avoid. Some services and features have, at various points, had restrictions when reached across a global peering, so confirm the current cross-region behavior of any service you intend to reach over global peering against the official documentation. The discipline is to keep tightly coupled components in one region where the topology allows and reserve global peering for traffic that genuinely must cross regions.

Q: Why does my hub-and-spoke design fail to route spoke-to-spoke traffic?

Because VNet peering is non-transitive, two spokes peered to a common hub do not reach each other through the hub by default; the hub does not re-advertise one spoke’s range to another, and each spoke simply lacks a route to the other. The fix is to make spoke-to-spoke a routing decision: place a firewall or network virtual appliance in the hub, add user-defined routes on each spoke that send the other spoke’s range to the appliance’s private IP as the next hop, and enable allow-forwarded-traffic on the peerings so the spokes accept traffic forwarded by the appliance. Alternatively, peer the two spokes directly if the topology and inspection policy allow it. The symptom that points here is spoke-to-spoke traffic failing while spoke-to-hub traffic works, which tells you the connectivity is fine and the missing piece is a route.
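Longest-prefix matching shows why the user-defined route fixes the gap. A simplified sketch: the route table is modeled on Azure system routes (including the default that drops 10.0.0.0/8 traffic not covered by a more specific route; confirm against your own effective routes), equal-length ties broken by route source are not modeled, and the appliance IP 10.0.1.4 is hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str, Optional[str]]  # (prefix, next-hop type, next-hop IP)


def next_hop(route_table: List[Route], dest_ip: str) -> Optional[Route]:
    """Longest-prefix match over a route table; None if no prefix matches."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [r for r in route_table if ip in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


# Spoke 1 (10.1.0.0/16) peered to the hub (10.0.0.0/16); spoke 2 is 10.2.0.0/16.
spoke1 = [
    ("10.1.0.0/16", "VnetLocal", None),
    ("10.0.0.0/16", "VNetPeering", None),
    ("10.0.0.0/8", "None", None),          # default system route: drop
    ("0.0.0.0/0", "Internet", None),
]
print(next_hop(spoke1, "10.2.0.4"))        # spoke 2 falls through to the drop route

# The fix: a user-defined route sending spoke 2's range to the hub appliance.
spoke1_fixed = spoke1 + [("10.2.0.0/16", "VirtualAppliance", "10.0.1.4")]
print(next_hop(spoke1_fixed, "10.2.0.4"))
```

The mirror-image route is needed on spoke 2 for return traffic, alongside allow-forwarded-traffic on the peerings.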

Q: Why is my VPN connection stuck in Connecting and never reaching Connected?

A connection that stays in Connecting almost always reflects a mismatch between the Azure gateway and the on-premises VPN device. The usual culprits are a shared key that differs between the two sides, IKE or IPsec parameters that do not align, or local address prefixes configured on the Azure side that do not match what the on-premises device actually advertises. Confirm the shared key is identical on both ends, align the IKE and IPsec settings so the proposals match, and check that the local network gateway’s address prefixes reflect the real on-premises ranges. The diagnostics are the connection status and the ingress and egress byte counters, which show whether any traffic moves, plus Network Watcher’s tools to confirm where a flow stops. Once the parameters align on both ends, the connection moves to Connected and traffic flows.
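The three usual culprits can be compared side by side once both configurations are written down. A sketch under stated assumptions: both sides are transcribed by hand into dicts, the proposals are tuples of whatever parameter strings each side uses, prefixes are compared exactly (so a summarized range shows up as a mismatch), and diagnose plus all example values are hypothetical.

```python
import hmac
import ipaddress
from typing import List


def diagnose(azure: dict, onprem: dict) -> List[str]:
    """List mismatches that typically keep a connection in Connecting.

    azure["local_prefixes"]: prefixes on the Azure local network gateway.
    onprem["actual_prefixes"]: ranges the on-premises network really uses.
    """
    issues = []
    # compare_digest compares without exiting early on the first differing byte.
    if not hmac.compare_digest(azure["shared_key"].encode(), onprem["shared_key"].encode()):
        issues.append("shared key differs between the two ends")
    for field in ("ike_proposal", "ipsec_proposal"):
        if azure[field] != onprem[field]:
            issues.append(f"{field} mismatch: {azure[field]} vs {onprem[field]}")
    configured = {ipaddress.ip_network(p) for p in azure["local_prefixes"]}
    actual = {ipaddress.ip_network(p) for p in onprem["actual_prefixes"]}
    for net in sorted(actual - configured, key=str):
        issues.append(f"local network gateway is missing {net}")
    for net in sorted(configured - actual, key=str):
        issues.append(f"local network gateway lists {net}, which on-premises does not use")
    return issues


azure_side = {
    "shared_key": "example-key-1",
    "ike_proposal": ("AES256", "SHA256", "DHGroup14"),
    "ipsec_proposal": ("GCMAES256", "GCMAES256", "PFS2048"),
    "local_prefixes": ["192.168.0.0/24"],
}
onprem_side = dict(azure_side, actual_prefixes=["192.168.0.0/24", "192.168.1.0/24"])
print(diagnose(azure_side, onprem_side))
```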

Q: Do I need ExpressRoute just to make a service private?

Usually not, and reaching for ExpressRoute whenever something must be private is one of the most expensive misreadings in this space. If the requirement is that a single platform service such as a storage account or database be reachable privately rather than over its public endpoint, the answer is a private endpoint, which projects that one service into your virtual network with a private IP at a tiny fraction of an ExpressRoute circuit’s cost. If the requirement is that two Azure virtual networks communicate privately, the answer is peering, because the backbone is already private. ExpressRoute earns its cost only when the private path you must build is the on-premises-to-Azure path and an internet tunnel is forbidden. Name the exact thing that must be private and the exact path it travels, and the reflex usually dissolves, because most things that need to be private do not need an ExpressRoute circuit.

Q: Can I use a VPN gateway to connect two Azure virtual networks?

Yes, through a VNet-to-VNet VPN, but it is almost never the right default, because peering connects two Azure virtual networks directly on the backbone without a gateway, without an encryption-overhead penalty, and without a SKU-imposed throughput ceiling. A VNet-to-VNet VPN adds a gateway on each side, caps throughput at the gateway SKU, and adds encryption overhead, all to accomplish what peering does more cheaply and faster. The narrow case where the VPN approach is considered is a requirement that the intra-Azure traffic be encrypted at the network layer, which peering does not provide because it relies on backbone isolation rather than cryptography. Even then, ask whether application-layer encryption such as TLS already satisfies the intent at lower cost. The default for two Azure networks is peering; the VPN is the exception only when a specific encryption requirement makes the gateway’s cost worth paying.

Q: What happens if two virtual networks have overlapping address spaces?

They cannot be joined usefully, because once two networks share a routing fabric, two subnets claiming the same range cannot be distinguished and the overlapping range cannot be routed. Azure rejects a peering between virtual networks whose address spaces overlap, and over a VPN or ExpressRoute path the overlapping range cannot be routed reliably. The prevention is to plan address spaces across the whole estate so that any networks that might ever be joined use non-overlapping private ranges, which is far cheaper than re-addressing a network after it is in production. If an overlap already exists between networks that must connect, the options are to re-address one network, which is disruptive, or to introduce network address translation through an appliance for the overlapping ranges, which adds complexity. The durable answer is upfront address planning, treating the private address space as a shared resource across the organization rather than a per-network decision made in isolation.
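Treating the address space as a shared resource suggests checking the whole plan before anything is joined. A minimal sketch using Python's ipaddress module; the estate layout and find_overlaps name are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(estate: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (network_a, range_a, network_b, range_b) for every overlapping pair."""
    clashes = []
    for (a, ranges_a), (b, ranges_b) in combinations(estate.items(), 2):
        for ra in ranges_a:
            for rb in ranges_b:
                na, nb = ipaddress.ip_network(ra), ipaddress.ip_network(rb)
                if na.version == nb.version and na.overlaps(nb):
                    clashes.append((a, ra, b, rb))
    return clashes


estate = {
    "hub": ["10.0.0.0/16"],
    "app": ["10.1.0.0/16"],
    "legacy": ["10.1.128.0/20"],       # sits inside app's range
    "onprem": ["192.168.0.0/16"],
}
for clash in find_overlaps(estate):
    print(clash)
```

Including on-premises and other-cloud ranges in the same dictionary catches the overlaps that only surface when a VPN or circuit is added later.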

Q: Does gateway transit let a spoke use the hub’s gateway?

Yes, gateway transit lets a spoke reach on-premises through the hub’s single VPN or ExpressRoute gateway rather than deploying its own, which is cheaper and simpler than a gateway per spoke. It requires two settings that work as a pair: on the hub-to-spoke peering you allow gateway transit, and on the spoke-to-hub peering you allow the use of the remote gateway. Enabling only one of the two leaves the spoke unable to use the hub gateway, which is a common silent failure where the gateway is healthy, the link is Connected, and the spoke still cannot reach on-premises. With both settings enabled, the spoke learns the on-premises routes through the hub’s gateway and sends that traffic across the peering to the hub and onward. This is the mechanism that lets a hub-and-spoke design provision expensive on-premises connectivity once and share it across many workloads.

Q: How do I keep global VNet peering costs from surprising me?

Global peering charges data transfer at the higher inter-region rate, so the surprise comes from chatty components scattered across regions running up cross-region transfer that a same-region placement would avoid. The first lever is topology and region placement: keep tightly coupled, frequently communicating components in the same region where the design allows, and reserve global peering for traffic that genuinely must cross regions, such as replication to a secondary region. The second lever is visibility: track cross-region data transfer as a cost signal so a chatty cross-region pattern shows up as rising charges before it becomes a budget problem rather than after. The third is to question whether a component truly needs to live in a different region or only ended up there by deployment accident, because consolidating an accidentally split workload into one region removes the inter-region transfer entirely. Confirm current transfer rates against official pricing when you model the cost.

Q: Is a single ExpressRoute circuit highly available on its own?

No. A single circuit, however reliable, is still a single path, and treating it as inherently highly available is a mistake when the on-premises connectivity it carries is critical. Resiliency comes from the design around the circuit, not from the circuit alone. Production designs build redundancy through a second circuit in a different peering location, a VPN as a backup path that takes over if the circuit fails, or both, because the cost of the on-premises path going dark usually justifies the redundancy investment. The right time to plan this is before the environment’s dependence on the circuit becomes critical, because retrofitting resiliency after an outage is more disruptive than designing it in. The first question when an ExpressRoute path misbehaves is which side owns the failed layer, since a provider-owned physical link problem and an Azure-owned BGP or gateway problem are resolved in entirely different places.

Q: My peering shows Connected but traffic does not flow, what is wrong?

A Connected peering with no traffic almost always means the problem is in a layer above the connectivity itself, because the link being connected only confirms the two networks are joined. Check the layers in order. First, routing: read the effective routes on the source interface to confirm a route to the destination range actually exists, since a user-defined route or a missing route can leave traffic with nowhere to go. Second, filtering: confirm the network security group on the destination subnet or interface is not denying the source range, because NSGs evaluate independently of the connectivity layer and a deny rule silently drops the flow. Third, in hub-and-spoke designs, confirm allow-forwarded-traffic is enabled if the traffic was forwarded by an appliance, since a peering that disallows forwarded traffic drops it. If IP-level tests pass but name-based connections fail, the problem is DNS rather than connectivity.

Q: How do I choose between a VPN and ExpressRoute when cost is the concern?

Let the privacy and bandwidth requirements decide first, and let cost decide only between options that both meet those requirements. If the on-premises path must stay off the public internet for a compliance reason, the VPN is disqualified regardless of its lower cost, because an encrypted internet tunnel still travels the internet. If the path must carry predictable high bandwidth that exceeds what a VPN gateway SKU can deliver, the VPN is again disqualified, because it will saturate. Only when neither privacy from the internet nor predictable high bandwidth is a genuine requirement does the cost comparison legitimately select the VPN as the economical choice. The mistake is letting the lower VPN sticker override a real privacy or bandwidth requirement, which produces a painful migration to ExpressRoute later. Cost is a constraint applied to qualified nominees, never the criterion that selects the option over a genuine requirement.

Q: Does establishing connectivity also handle name resolution between networks?

No, and this catches many engineers off guard. Connecting two networks at the IP layer, whether by peering, VPN, or ExpressRoute, does nothing for DNS, so an application that connects by hostname needs name resolution that returns the right private IP across the connected networks. A connection that passes IP traffic fine but fails for an application connecting by name is almost always a DNS problem rather than a connectivity problem, and recognizing that early saves hours of inspecting healthy tunnels. In a hub-and-spoke design the usual solution is a DNS forwarder in the hub or Azure Private DNS zones linked to the relevant virtual networks, so names resolve to private IPs consistently across spokes and on-premises. Plan name resolution deliberately as a layer alongside connectivity, routing, and filtering, because all four must be correct for an application to work across joined networks.
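A quick way to confirm that a name resolves privately from a given network is to compare the resolver's answers against the range you expect. A sketch, assuming it runs on a VM inside the querying network; resolved_addresses and check_private_resolution are hypothetical helpers, and any host name or range you pass is a placeholder.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolved_addresses(hostname: str) -> List[str]:
    """Addresses this machine's resolver returns for hostname (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def check_private_resolution(addresses: Iterable[str], expected_cidr: str) -> List[str]:
    """Return any resolved address outside the private range you expect.

    An empty result means every answer lands in expected_cidr. A public address
    here often means the private DNS zone is not linked to, or forwarded from,
    the querying network.
    """
    expected = ipaddress.ip_network(expected_cidr)
    return [a for a in addresses if ipaddress.ip_address(a) not in expected]
```

From a VM in a spoke, passing resolved_addresses for the service's host name and the private endpoint subnet's CIDR should return an empty list; anything it returns points at name resolution rather than at the tunnel or the peering.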

Q: Can I connect Azure to another public cloud, and which option fits?

Yes, and the same rule applies. VNet peering does not work across clouds because it is an Azure-internal primitive, so the choice is between a VPN and a private interconnect. The most common approach is a site-to-site VPN between the Azure VPN gateway and a compatible gateway in the other provider, treating the other cloud like an on-premises site: an encrypted internet tunnel that makes the two networks mutually reachable, which suits modest cross-cloud traffic. Where the cross-cloud path must be private and high bandwidth, the design reaches for a private interconnect that the major clouds and connectivity providers offer, behaving for that path much as ExpressRoute does for the on-premises path. The caution unique to cross-cloud paths is that the other side is owned by a different provider with its own gateway behavior, limits, and pricing, so the design and the troubleshooting span two providers, which makes resolving the option early with the rule even more valuable.