Azure Networking Fundamentals for Engineers

A client resolves a name, and its request crosses a virtual network, passes through a security boundary, and arrives at a service. When it works, nobody thinks about the path. When it fails, an engineer stares at a timeout and starts guessing, and the guessing is where hours disappear. The guessing almost always has the same cause: distinct decisions, made at distinct layers, get treated as one. Routing decides where a packet goes. Filtering decides whether it is allowed to go there. Name resolution decides what address the packet aims at in the first place. These are three separate questions answered by three separate mechanisms. An engineer who keeps them separate can place any connectivity problem on a map within a minute; an engineer who blends them widens a security rule to fix a routing fault, or rewrites a route table to fix a name that never resolved.


This article builds the model that the rest of the networking work in this series leans on. It is the reasoning layer underneath every troubleshooting guide: once you can say which route a packet takes and which rule evaluates it, the specific errors stop being mysteries and become positions on a known path. The claim it defends is small and load-bearing. In Azure networking, routing chooses the path and security groups filter it, so almost every connectivity question reduces to two sub-questions, which route and which rule, and keeping those two apart is the core skill. Call it the route-then-filter model. Everything below is an elaboration of that one sentence, with the configuration that realizes it, the failure modes that break it, and the diagnostic tools that expose where on the path a packet actually died.

The route-then-filter model, stated precisely

Picture a single packet. It originates at a network interface somewhere, carries a source address and a destination address, and wants to reach the destination. Two things must happen for it to arrive, and they happen in a fixed order that the platform never reverses.

First, Azure consults the effective routes attached to the originating interface’s subnet and decides the next hop: where to send the packet on its way to the destination. This is a lookup against a routing table, and the answer is an address or a gateway or a special target like “Internet” or “None.” The route does not ask whether the packet is welcome. It only answers where the packet goes next. If no route matches, or if the matching route points at a next hop that drops the packet, the packet never reaches the destination regardless of how permissive every security rule is. A route problem is a geography problem: the packet went to the wrong place or to a place that swallowed it.

Second, assuming a route exists and points somewhere reachable, a Network Security Group evaluates the packet against an ordered set of rules and decides allow or deny. This evaluation happens on the way out of the source interface and again on the way in to the destination interface, and a deny at either point ends the packet’s journey with no reply. The NSG does not ask where the packet should go. It only answers whether this packet, with this source, destination, port, and protocol, is permitted on this segment. A filter problem is a permission problem: the packet reached the right boundary and was turned away.

The order matters because it tells you which question to ask first when something breaks. If the route is wrong, the filter never gets a chance to weigh in, because the packet is already lost. If the route is right and the filter denies, the packet arrives at the boundary and is dropped there. The symptoms differ in a way you can use. A routing failure that black-holes traffic usually looks like a timeout with no response at all, because the packet went nowhere a reply could come from. A filter denial on inbound also looks like a timeout, but the diagnostic tools can show you that the packet reached the destination’s effective rules and was denied by a specific rule, which a pure routing black-hole cannot show. Learning to read that difference is most of what separates a fast diagnosis from an afternoon of changing settings at random.

Name resolution sits before both of these, conceptually upstream of the packet itself. Before there is a destination address to route toward, something has to turn a name like myservice.database.windows.net into an address. That translation is a separate transaction against a DNS resolver, and it succeeds or fails on its own terms, with its own failure modes, none of which involve routing or filtering. A name that resolves to the wrong address sends a perfectly well-routed, perfectly well-permitted packet to the wrong place. A name that does not resolve at all means the client never even forms a packet to route. Treating a resolution failure as a routing or filtering failure is another great conflation, and it produces the most circular debugging of all, because the engineer who assumes the address is correct spends the entire session examining routes and rules for a packet that was aimed at the wrong target from the start.

So the model has three layers, and they answer three questions in this order. What address am I aiming at, which is DNS. How does a packet get there, which is routing. Is the packet allowed through, which is filtering. Hold those three apart and every connectivity problem in Azure becomes a matter of identifying which layer failed, then using the tool that inspects that layer. Blend them and you will fix the wrong layer, which either does nothing or creates a new problem while the original persists.

The InsightCrunch packet-path map

The findable artifact for this article is a single table that traces one request from a client to a service and names, at each stop, what decision is made and what controls it. Keep this map in view while you debug. When a connectivity problem appears, you do not start by changing anything; you start by placing the failure on a row of this map, because the row tells you which mechanism owns the decision and therefore which tool inspects it.

Stop	Decision made here	What controls it	What a failure looks like	Tool that inspects it
1. Client resolves the name	What IP address is the destination?	DNS: client OS resolver, then the VNet DNS setting (Azure-provided or custom), then the zone that answers	Wrong IP returned, or no answer (NXDOMAIN/timeout). Packet aimed at the wrong target or never formed	`nslookup`/`dig` from the client, the VNet DNS server setting, the relevant DNS zone
2. Source interface picks the next hop	Where does the packet go on its way out?	Effective routes on the source subnet: system routes plus any user-defined route table	Packet sent to a black-hole next hop, or no matching route, or sent to an appliance that drops it	Effective routes on the NIC; Network Watcher Next Hop
3. Source NSG evaluates egress	Is the packet allowed to leave this segment?	Outbound rules of the NSG on the source NIC and the source subnet, evaluated by priority	Outbound deny by a rule (often a custom rule above the default allow)	Effective security rules on the NIC; Network Watcher IP Flow Verify
4. The path crosses the boundary	Does the packet stay on the Azure backbone, traverse a peering, a gateway, or the internet edge?	VNet peering, VPN/ExpressRoute gateway, or the internet edge, as selected by the route’s next hop type	Missing peering or gateway route; non-transitive peering with no route through a hub	Effective routes (next hop type); the peering and gateway configuration
5. Destination NSG evaluates ingress	Is the packet allowed in to this segment?	Inbound rules of the NSG on the destination subnet and the destination NIC, evaluated by priority	Inbound deny by a rule; the destination’s default-deny with no allow above it	Effective security rules on the destination NIC; IP Flow Verify against the destination
6. Destination processes and replies	Does the service accept the connection and answer?	The destination’s own listener, its host firewall, and the return path back through stops 2 to 5	Connection refused (nothing listening), or an asymmetric return path through a stateful hop that never saw the request and drops the reply	A listener check on the destination; the reverse-direction route and rule evaluation

The map repays a habit. When a ticket says “service A cannot reach service B,” you do not open the NSG first because NSGs are where you last found a problem. You walk the rows. Did the name resolve to the address you expected? If not, the problem is stop one and no amount of route or rule work will help. If the address is right, what is the next hop from A’s subnet toward B’s address? If the next hop is a black-hole or an appliance, the problem is stop two. Only if the address and the next hop are both correct do you reach for the security rules at stops three and five. The discipline of walking the map in order is the single most effective change most engineers can make to how they debug Azure networking, because it stops them from fixing stop five for a problem that lives at stop one.
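The walk can be sketched in Python as a function that checks the stops in map order and reports the first one that fails. This is a minimal sketch, not an Azure tool: the `Observations` fields are hypothetical names for facts you gather by hand (the `dig` answer, the winning next hop from the effective routes, IP Flow Verify verdicts, a listener check), and stop four is folded into the next-hop check for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Observations:
    """Hypothetical facts gathered by hand; not the output format of any Azure tool."""
    resolved_ip: Optional[str]   # answer from nslookup/dig (None = no answer)
    expected_ip: str             # address the service should have
    next_hop_type: str           # winning next hop from the effective routes
    egress_allowed: bool         # IP Flow Verify, Outbound, on the source NIC
    ingress_allowed: bool        # IP Flow Verify, Inbound, on the destination NIC
    listener_up: bool            # something accepts connections on the port


# Stops in the order the map says to walk them. Stop 4 (which edge carries the
# packet) is folded into the next-hop check; an appliance next hop passes here,
# and whether that appliance actually forwards is a separate check.
STOPS: List[Tuple[str, Callable[[Observations], bool]]] = [
    ("1 name resolution", lambda o: o.resolved_ip == o.expected_ip),
    ("2 next hop", lambda o: o.next_hop_type != "None"),
    ("3 source NSG egress", lambda o: o.egress_allowed),
    ("5 destination NSG ingress", lambda o: o.ingress_allowed),
    ("6 destination listener", lambda o: o.listener_up),
]


def first_failing_stop(obs: Observations) -> Optional[str]:
    """Return the earliest stop whose check fails, or None if every check passes."""
    for name, check in STOPS:
        if not check(obs):
            return name
    return None
```

The ordering is the point: if the name resolves to a public address and the destination NSG also denies, the function reports stop one, because no rule change matters until the packet is aimed at the right target.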

The map also explains why some fixes feel like they work and then the problem returns. Widening an NSG rule appears to fix an intermittent connectivity issue if the real fault was an asymmetric route that only sometimes mattered; the rule change coincided with the route settling, and the engineer credits the rule. The map keeps you honest: a rule change can only ever fix a stop-three or stop-five problem, so if the symptom is consistent with stop two, the rule change is a coincidence and the route fault is still there, waiting.

The VNet and subnet: the address and routing boundary

Every other piece of the model hangs off the virtual network, so the model starts there. A VNet is a private address space you carve out, usually from the RFC 1918 ranges. Azure will also accept other ranges, including public ones, which it then treats as private to the VNet; what it will not tolerate is overlap with the address spaces you peer with, and overlap with networks you connect through a gateway causes routing conflicts. Within that space you cut subnets, and the subnet is the unit that matters for both routing and filtering, because route tables and Network Security Groups attach at the subnet level. A VNet does not route or filter on its own; it is the container that gives addresses meaning and the boundary within which the default behavior applies. For the full treatment of address planning, subnet sizing, reserved addresses, and the service and delegated subnet types, the dedicated Azure Virtual Network deep dive is the companion to this section; here the focus is only on what the VNet contributes to the packet path.
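The containment and overlap rules the VNet imposes can be checked with Python's standard `ipaddress` module before anything is deployed. A minimal sketch: the function names are illustrative, and the five-address reservation reflects Azure's documented behavior of holding back the first four addresses and the last address of every subnet.

```python
import ipaddress
from typing import Dict, List

# Azure reserves five addresses in every subnet: the first four and the last.
AZURE_RESERVED_PER_SUBNET = 5


def check_plan(vnet_cidr: str, subnet_cidrs: List[str]) -> Dict[str, int]:
    """Check that subnets sit inside the VNet and do not overlap each other.

    Returns the usable address count per subnet; raises ValueError on a bad plan.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    for s in subnets:
        if not s.subnet_of(vnet):
            raise ValueError(f"{s} is outside the VNet range {vnet}")
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    return {str(s): s.num_addresses - AZURE_RESERVED_PER_SUBNET for s in subnets}


def can_peer(vnet_a: str, vnet_b: str) -> bool:
    """Two VNets can only be peered if their address spaces do not overlap."""
    return not ipaddress.ip_network(vnet_a).overlaps(ipaddress.ip_network(vnet_b))
```

For example, `check_plan("10.1.0.0/16", ["10.1.1.0/24", "10.1.2.0/26"])` reports 251 and 59 usable addresses, and `can_peer("10.1.0.0/16", "10.1.128.0/17")` is false because the second range sits inside the first.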

The first thing the VNet gives you is a default routing fabric. The moment you create a VNet, Azure populates a set of system routes that make the obvious things work without any configuration. Traffic destined for another address in the same VNet routes directly within the VNet. Traffic destined for a peered VNet routes across the peering. Traffic destined for the internet routes out the default internet path. Traffic destined for on-premises ranges routes toward a gateway if one exists. These system routes are invisible in the sense that you do not create them, but they are entirely visible in the effective routes view on any interface, and reading that view is how you confirm what the default fabric is actually doing. The default fabric is permissive about reach: by default, every resource in a VNet can route to every other resource in the same VNet, and to peered VNets, and out to the internet. Reach is the default. Restriction is something you add.

That last point is where another great conflation hides: the assumption that a subnet isolates. It does not, by default. Two virtual machines in two different subnets of the same VNet can reach each other over any port the destination is listening on, because the system route for intra-VNet traffic carries the packet and there is no NSG denying it unless you placed one. Engineers who come from on-premises networks where a VLAN boundary implied a firewall are repeatedly surprised by this. The subnet is a routing and filtering attachment point, not a wall. If you want isolation between subnets, you create it with Network Security Groups; the subnet boundary alone buys you nothing in the way of filtering. This is not a flaw, it is the model: routing gives reach, and you subtract reach deliberately with filters. The whole article rests on that asymmetry.

The second thing the VNet gives you is a DNS setting, and this is the seam where name resolution attaches to the network. Every VNet has a DNS configuration that is either the default Azure-provided resolver or a custom set of DNS server addresses you specify. The VNet’s DNS setting determines what resolver the resources in that VNet use when they look up a name, which means the VNet is also the boundary at which name resolution behavior is decided. A VNet pointed at the Azure-provided resolver gets automatic resolution of Azure service names and of other resources in the VNet; a VNet pointed at a custom DNS server gets whatever that server is configured to answer, which is how hybrid name resolution is built. The point for the model is that DNS is a property of the VNet, set once and inherited by everything in it, and a misconfigured VNet DNS setting breaks name resolution for every resource at once, which is a distinctive symptom worth recognizing.
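The inheritance, and why a bad setting fails every resource at once, can be shown with a toy model. A minimal sketch under stated assumptions: the `VNet` class and helper names are hypothetical, every VM is assumed to inherit the VNet setting as the paragraph describes (Azure also allows a per-NIC DNS override, which this toy ignores), and 168.63.129.16 is the well-known address of the Azure-provided resolver.

```python
from dataclasses import dataclass, field
from typing import List, Set

AZURE_PROVIDED_DNS = "168.63.129.16"  # the platform resolver's well-known virtual address


@dataclass
class VNet:
    name: str
    custom_dns: List[str] = field(default_factory=list)  # empty list means Azure-provided DNS
    vms: List[str] = field(default_factory=list)


def resolvers_for(vnet: VNet) -> List[str]:
    """The resolver list every VM in this VNet inherits."""
    return list(vnet.custom_dns) if vnet.custom_dns else [AZURE_PROVIDED_DNS]


def vms_without_dns(vnet: VNet, unreachable: Set[str]) -> List[str]:
    """If every resolver the VNet names is unreachable, every VM loses resolution together."""
    if all(server in unreachable for server in resolvers_for(vnet)):
        return list(vnet.vms)
    return []
```

The all-or-nothing result is the distinctive symptom: one broken custom server listed in the VNet setting takes name resolution away from every VM in that VNet at the same moment.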

The third thing the VNet gives you is the set of edges through which it connects to the wider world: the peerings to other VNets, the gateways to on-premises networks, the private endpoints that project services into the address space, and the internet edge. Each edge appears in the routing fabric as a next hop type, which is the bridge between this section and the next. The VNet is the boundary; the edges are where the boundary opens; and routing is the mechanism that decides which edge a given packet uses. Hold the VNet in mind as the stage on which routing and filtering perform, and the next two sections, which take routing and filtering one at a time, fall into place.

Routing: how a packet decides where to go

Routing is the first half of the model and the half engineers understand least, because most of it happens through system routes they never see until something forces them to look. The mechanism is a longest-prefix-match lookup, the same idea that governs any IP network: among all the routes whose destination prefix contains the packet’s destination address, the most specific one wins, and its next hop is where the packet goes. Azure layers three sources of routes into the table that the lookup runs against, and the order of precedence among those sources is the thing to memorize.
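Longest-prefix match itself is small enough to write out. A minimal sketch using Python's `ipaddress` module; the `(prefix, next hop type)` tuples are an illustrative representation, not the shape of any Azure API.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (destination prefix, next hop type)


def longest_prefix_match(routes: List[Route], destination: str) -> Optional[Route]:
    """Return the route whose prefix contains the destination with the longest prefix length."""
    dst = ipaddress.ip_address(destination)
    matching = [(p, hop) for p, hop in routes if dst in ipaddress.ip_network(p)]
    if not matching:
        return None  # no route at all: the packet has nowhere to go
    return max(matching, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
```

With routes for 0.0.0.0/0, 10.1.0.0/16, and 10.1.5.0/24, a packet to 10.1.5.9 matches all three and takes the /24, a packet to 10.1.9.9 takes the /16, and a packet to 8.8.8.8 falls through to the default route.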

System routes come first in existence, last in precedence. The platform creates them automatically for the destinations that should just work: the VNet’s own address space (next hop type VirtualNetwork), the internet (next hop type Internet), peered VNet ranges (next hop type VNetPeering), and the reserved private and shared ranges 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and 100.64.0.0/10, which are dropped (next hop type None) rather than sent toward the internet unless a more specific route, such as your own VNet range, covers the destination. You do not edit system routes. You override them, and the override is where user-defined routes come in.
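The default fabric can be modeled as a list of system routes evaluated by longest-prefix match. A minimal sketch, simplified: Azure rewrites the reserved None route when your address space coincides with one of those ranges, and listing VNet routes first so that an exact-prefix tie resolves to VirtualNetwork is this sketch's stand-in for that behavior. The function names are illustrative.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple

# Reserved ranges Azure adds as system routes with next hop None (RFC 1918 and RFC 6598).
RESERVED_DROPPED = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10"]


def system_routes(vnet_prefixes: Iterable[str],
                  peered_prefixes: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Build the default system routes for a VNet (VNet routes listed first)."""
    routes = [(p, "VirtualNetwork") for p in vnet_prefixes]
    routes += [(p, "VNetPeering") for p in peered_prefixes]
    routes.append(("0.0.0.0/0", "Internet"))
    routes += [(p, "None") for p in RESERVED_DROPPED]
    return routes


def next_hop(routes: List[Tuple[str, str]], destination: str) -> Optional[str]:
    """Longest prefix wins, so a VNet's /16 beats the reserved /8 it sits inside."""
    dst = ipaddress.ip_address(destination)
    matching = [(ipaddress.ip_network(p).prefixlen, hop)
                for p, hop in routes if dst in ipaddress.ip_network(p)]
    if not matching:
        return None
    # max() keeps the first of equal-length matches, which is why VNet routes come first.
    return max(matching, key=lambda m: m[0])[1]
```

For a VNet of 10.1.0.0/16 peered with 10.2.0.0/16, a packet to 10.1.3.4 stays in the VNet, 10.2.0.5 crosses the peering, 10.9.9.9 is silently dropped, and 8.8.8.8 goes out the internet route.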

User-defined routes, the UDRs that live in a route table you attach to a subnet, take precedence over system routes with the same prefix, and a UDR with a more specific prefix wins by longest-prefix match anyway. This is the lever you pull to change where traffic goes: send all internet-bound traffic to a firewall appliance, send a specific range to a network virtual appliance, or black-hole a range deliberately by routing it to None. A UDR’s next hop type can be VirtualAppliance (with an explicit next hop IP), VirtualNetworkGateway, VirtualNetwork, Internet, or None. The full treatment of building, prioritizing, and debugging route tables, including the subtle precedence interactions and forced tunneling, lives in Azure route tables and UDRs explained; the model only needs you to hold one fact: a UDR beats any system route with the same or a less specific prefix, so the moment you attach a route table, you can change or break the default fabric for that subnet.

The third source is routes learned over BGP from a gateway, which matters for hybrid connectivity. When a VPN or ExpressRoute gateway is in the picture and BGP is enabled, the on-premises routes it advertises enter the table, and these interact with system routes and UDRs by the platform’s precedence rules. The practical effect is that on-premises destinations become reachable because a route to them exists with the gateway as next hop. A missing or withdrawn BGP route is a common reason on-premises connectivity fails while everything inside Azure works, and it is a routing problem, stop two on the map, not a firewall problem, even though it often gets escalated as one.

The precedence order, stated as a rule you can apply under pressure, is this: for a given destination, the longest matching prefix wins first, whatever its source; only when two routes have the identical prefix does the source decide, with a UDR winning over a BGP-learned route, which wins over a system route. That is why a UDR for 0.0.0.0/0 does not capture traffic to your own VNet range: the VNet’s system route is more specific and wins the prefix match before source is ever consulted. If you find yourself reasoning about ties between identical prefixes from different sources, you usually have a route table you should simplify. The reason to know the order is diagnostic: when a packet goes somewhere you did not expect, you ask which route won, and the precedence rule tells you where to look first, which is almost always a UDR you or a policy added.
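Azure's documented selection runs longest-prefix match first and uses the route's source only to break ties between identical prefixes (User, then BGP, then system). A minimal sketch of that two-key ordering; the `Route` class and the source labels, borrowed from the effective-routes view, are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Tie-break order between routes with the SAME prefix; lower rank wins.
SOURCE_RANK = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}


@dataclass
class Route:
    source: str    # "User" (UDR), "VirtualNetworkGateway" (BGP-learned), or "Default" (system)
    prefix: str
    next_hop: str


def select_route(routes: List[Route], destination: str) -> Optional[Route]:
    dst = ipaddress.ip_address(destination)
    matching = [r for r in routes if dst in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    # Longest prefix first; source rank only separates routes with the same prefix.
    return min(matching, key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                                        SOURCE_RANK[r.source]))
```

With a system 10.1.0.0/16 route and a forced-tunneling UDR for 0.0.0.0/0, internet traffic goes to the appliance (same /0 prefix, UDR wins the tie) while traffic to 10.1.2.3 stays on the VNet route (the /16 wins the prefix match before source is considered).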

The single most useful routing concept for debugging is the effective route, because effective routes are the merged, post-precedence table that a specific network interface actually uses. You do not reason about system routes and UDRs separately when debugging; you ask the platform to show you the effective routes on the interface in question, and it returns the final answer for every destination prefix, with the winning next hop. The command pattern is short enough to keep in muscle memory.

# Show the effective routes on a specific NIC: the merged, post-precedence table.
az network nic show-effective-route-table \
  --name myvm-nic \
  --resource-group my-rg \
  --output table

# The output names, per destination prefix, the source (Default, User, VirtualNetworkGateway),
# the address prefix, the next hop type, and the next hop IP. Read the row whose
# prefix contains your destination: its next hop is where your packet actually goes.

Reading that table is the whole skill. Suppose a packet to 10.2.0.5 is not arriving. You pull the effective routes on the source NIC and look for the most specific prefix containing 10.2.0.5. If you find a User route with prefix 10.2.0.0/16 and next hop type VirtualAppliance pointing at 10.0.1.4, you now know the packet is being sent to an appliance at 10.0.1.4, and the next question is whether that appliance forwards it or drops it. If instead you find only a Default route with next hop VirtualNetwork, the routing is fine and the problem is downstream, at a filter or the destination itself. Either way you have converted a vague “it does not connect” into a precise statement about where the packet goes, which is exactly what the model is for.
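The same reading can be automated by running the command above with `--output json` instead of `--output table` and picking the winning row programmatically. A minimal sketch under loud assumptions: the field names (`value`, `state`, `addressPrefix`, `nextHopType`, `nextHopIpAddress`, `source`) are assumed from the shape of Azure's effective-route objects and should be checked against your CLI version, and the function name is hypothetical. Rows whose state is not Active are skipped, on the assumption that overridden routes are listed but marked Invalid.

```python
import ipaddress
import json
from typing import Any, Dict, Optional


def winning_route(effective_json: str, destination: str) -> Optional[Dict[str, Any]]:
    """Find the Active effective route whose prefix most specifically contains destination.

    Field names are assumed, not guaranteed; verify them against real CLI output.
    """
    data = json.loads(effective_json)
    rows = data["value"] if isinstance(data, dict) else data
    dst = ipaddress.ip_address(destination)
    best_net, best_row = None, None
    for row in rows:
        if row.get("state") != "Active":
            continue  # overridden routes are listed but do not carry traffic
        for prefix in row.get("addressPrefix", []):
            net = ipaddress.ip_network(prefix)
            if dst in net and (best_net is None or net.prefixlen > best_net.prefixlen):
                best_net, best_row = net, row
    if best_row is None:
        return None
    return {
        "prefix": str(best_net),
        "source": best_row.get("source"),
        "next_hop_type": best_row.get("nextHopType"),
        "next_hop_ip": best_row.get("nextHopIpAddress", []),
    }
```

Fed a table shaped like the scenario above, a lookup for 10.2.0.5 returns the User route to the VirtualAppliance at 10.0.1.4, which is the precise statement the manual reading produces.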

The next hop type None deserves a special mention because it is both a tool and a trap. As a tool, routing a range to None is how you deliberately black-hole traffic you want to forbid at the routing layer rather than the filter layer; packets to that range are dropped silently. As a trap, a UDR with next hop None that someone added to block a test range, then forgot, will silently swallow production traffic the day that range starts being used, and because the drop is silent and at the routing layer, it produces a clean timeout with no NSG log to point at. When a timeout has no corresponding NSG denial in the flow logs, a None route is one of the first suspects, precisely because it leaves no filtering evidence.
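Because a forgotten None route leaves no filtering evidence, it is worth linting route tables for it directly. A minimal sketch: the function name and the idea of a hand-maintained list of ranges that carry real traffic are assumptions for illustration. An overlap makes a route a suspect rather than a proven culprit, since a more specific route could still win for a given address; confirm with the effective routes.

```python
import ipaddress
from typing import List, Tuple


def none_routes_overlapping(udrs: List[Tuple[str, str]],
                            live_ranges: List[str]) -> List[Tuple[str, str]]:
    """Flag UDRs with next hop None that overlap ranges carrying real traffic."""
    suspects = []
    for prefix, next_hop in udrs:
        if next_hop != "None":
            continue  # only black-hole routes are of interest here
        black_hole = ipaddress.ip_network(prefix)
        for live in live_ranges:
            if black_hole.overlaps(ipaddress.ip_network(live)):
                suspects.append((prefix, live))
    return suspects
```

A test range such as 10.50.0.0/16 routed to None and later reused for a production subnet at 10.50.4.0/24 is exactly the pair this flags.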

Filtering: how a packet is allowed or denied

Filtering is the second half of the model, and it layers on top of routing rather than replacing it. A Network Security Group is an ordered set of rules, each with a priority number, a direction (inbound or outbound), a source, a destination, a port range, a protocol, and an action of Allow or Deny. When a packet reaches a point where an NSG applies, the platform walks the rules in priority order from lowest number to highest, and the first rule that matches the packet’s five-tuple decides its fate. Lower priority numbers are evaluated first, so a deny at priority 100 beats an allow at priority 200 for the same traffic, because the deny is reached first and matching stops there. The complete rule-evaluation model, including application security groups, service tags, augmented rules, and flow logs, is the subject of the Network Security Groups deep dive; the model needs three facts from it.

The first fact is where NSGs attach and how the attachments stack. An NSG can attach to a subnet and to a network interface, and both can apply to the same packet. For inbound traffic to a VM, the subnet NSG is evaluated first, then the NIC NSG; for outbound traffic from a VM, the NIC NSG is evaluated first, then the subnet NSG. The packet must be allowed at every layer that applies; a deny at any one of them stops it. This stacking is why “I allowed it on the subnet but it still fails” is a routine support pattern: an allow on the subnet means nothing if a NIC NSG denies, because both must permit. When you debug filtering, you never look at one NSG; you look at the effective security rules on the interface, which merge the subnet and NIC rules into the single ordered list the platform actually applies.
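The stacking rule reduces to a fixed evaluation order plus "every attached NSG must allow". A minimal sketch that takes each NSG's verdict as given (for example from IP Flow Verify or from evaluating its rules); the function name is illustrative, and `None` stands for "no NSG attached at this layer".

```python
from typing import List, Optional, Tuple


def stacked_verdict(direction: str, subnet_verdict: Optional[str],
                    nic_verdict: Optional[str]) -> Tuple[str, Optional[str]]:
    """Combine per-NSG verdicts ("Allow" or "Deny", None when no NSG is attached).

    Inbound is evaluated subnet first, then NIC; outbound is NIC first, then subnet.
    Every attached NSG must allow; the first deny ends the packet and names the layer.
    """
    order: List[Tuple[str, Optional[str]]] = [("subnet", subnet_verdict), ("nic", nic_verdict)]
    if direction == "Outbound":
        order.reverse()
    for layer, verdict in order:
        if verdict == "Deny":
            return "Deny", layer
    return "Allow", None
```

The routine support case is `stacked_verdict("Inbound", "Allow", "Deny")`: the subnet allow is real, but the NIC NSG still stops the packet.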

The second fact is the default rules, which are the baseline every NSG carries before you add anything. The defaults allow inbound traffic from within the VNet, allow inbound traffic from the Azure load balancer, and deny all other inbound traffic; they allow outbound traffic to the VNet, allow outbound traffic to the internet, and deny all other outbound. These defaults encode the route-then-filter philosophy directly: inbound from the VNet is allowed because intra-VNet reach is the default, and everything else inbound is denied because exposure should be deliberate. The defaults sit at priorities 65000, 65001, and 65500, with the catch-all DenyAll at 65500, while custom rules can only use priorities 100 through 4096, so every rule you add is evaluated before any default and an allow you add takes effect before the catch-all deny is reached. Understanding the defaults tells you what an empty NSG does: it permits the VNet to talk to itself and the load balancer to reach its backends, and it blocks the internet from initiating inbound. Most of what people configure is poking specific holes in that default-deny for the traffic they actually want.
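What an empty NSG decides can be checked by writing the six default rules down as data. A minimal sketch with simplified service tags, labeled as assumptions: the VirtualNetwork tag is reduced to one hypothetical VNet range (the real tag also covers peered and connected ranges), AzureLoadBalancer is reduced to 168.63.129.16, the address that tag translates to, and Internet is reduced to "not the VNet".

```python
import ipaddress
from typing import Callable, List, Optional, Tuple

# Simplified stand-ins for service tags (assumptions, not the tags' full definitions).
VNET_SPACE = ["10.1.0.0/16"]          # hypothetical VNet range standing in for VirtualNetwork
LOAD_BALANCER = ["168.63.129.16/32"]  # the address behind the AzureLoadBalancer tag


def _in(ip: str, cidrs: List[str]) -> bool:
    return any(ipaddress.ip_address(ip) in ipaddress.ip_network(c) for c in cidrs)


Match = Callable[[str, str], bool]
DEFAULT_RULES: List[Tuple[int, str, str, Match, str]] = [
    (65000, "Inbound", "AllowVnetInBound", lambda s, d: _in(s, VNET_SPACE) and _in(d, VNET_SPACE), "Allow"),
    (65001, "Inbound", "AllowAzureLoadBalancerInBound", lambda s, d: _in(s, LOAD_BALANCER), "Allow"),
    (65500, "Inbound", "DenyAllInBound", lambda s, d: True, "Deny"),
    (65000, "Outbound", "AllowVnetOutBound", lambda s, d: _in(s, VNET_SPACE) and _in(d, VNET_SPACE), "Allow"),
    (65001, "Outbound", "AllowInternetOutBound", lambda s, d: not _in(d, VNET_SPACE), "Allow"),
    (65500, "Outbound", "DenyAllOutBound", lambda s, d: True, "Deny"),
]


def empty_nsg(direction: str, src: str, dst: str) -> Optional[Tuple[str, str]]:
    """What an NSG with no custom rules decides: the defaults, lowest priority number first."""
    for _priority, rule_dir, name, matches, access in sorted(DEFAULT_RULES, key=lambda r: r[0]):
        if rule_dir == direction and matches(src, dst):
            return access, name
    return None
```

The results match the prose: VNet-to-VNet and load-balancer probes are allowed in, an internet source is denied by DenyAllInBound, and outbound traffic to the internet is allowed.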

The third fact is that NSGs are stateful, which closes a loop that otherwise traps engineers. When an inbound rule allows a connection, the return traffic for that connection is allowed out automatically without a matching outbound rule, and when an outbound rule allows a connection, the return traffic is allowed back in automatically. You do not write paired rules for both directions of an established connection. This statefulness is why a single inbound-allow on port 443 is enough to serve HTTPS; the responses flow back on the established connection without an explicit outbound allow for the ephemeral ports. The trap appears with asymmetric routing, covered later, where the return traffic takes a different path than the request and arrives at a hop that never saw the outbound half of the connection, so its state table has no record and it drops the return as unsolicited. Statefulness is a per-hop property; it only helps when both directions traverse the same stateful device.
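Statefulness, and the way asymmetric routing defeats it, comes down to a per-hop flow table. A toy model with hypothetical names: each hop records the five-tuples of requests it forwarded and admits a reply only if the reversed tuple is in its own table.

```python
from typing import Set, Tuple

Flow = Tuple[str, int, str, int, str]  # (src ip, src port, dst ip, dst port, protocol)


class StatefulHop:
    """Toy stateful filter: admits a reply only for a flow it saw as a request."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.flows: Set[Flow] = set()

    def forward_request(self, src: str, sport: int, dst: str, dport: int,
                        proto: str = "TCP") -> None:
        self.flows.add((src, sport, dst, dport, proto))

    def accepts_reply(self, src: str, sport: int, dst: str, dport: int,
                      proto: str = "TCP") -> bool:
        # A reply is the request's five-tuple with source and destination swapped.
        return (dst, dport, src, sport, proto) in self.flows
```

If a request from 10.2.0.5:51000 to 10.1.0.4:443 passes through hop A, the reply is accepted when it comes back through A, but a hop B on an asymmetric return path has no record of the flow and drops the reply as unsolicited.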

To inspect filtering, two tools matter, and both answer the question “would this specific packet be allowed?” rather than “what are the rules in the abstract.” The effective security rules view shows the merged, ordered rule set on an interface, and IP Flow Verify simulates a single five-tuple against that set and returns the verdict and the rule that decided it.

# The merged inbound and outbound rules actually applied to this NIC.
az network nic list-effective-nsg \
  --name myvm-nic \
  --resource-group my-rg \
  --output json

# Simulate one packet and get back ALLOW/DENY plus the deciding rule name.
az network watcher test-ip-flow \
  --vm myvm \
  --nic myvm-nic \
  --direction Inbound \
  --protocol TCP \
  --local 10.1.0.4:443 \
  --remote 10.2.0.5:51000 \
  --resource-group my-rg

IP Flow Verify is the tool that ends most filtering arguments, because it does not describe rules, it returns a verdict for the exact packet you describe and names the rule that produced it. If it returns Allow and the traffic still fails, the problem is not the NSG, and you have just saved yourself from editing rules that were never the issue; the failure is at routing, at DNS, or at the destination’s own listener or host firewall. If it returns Deny and names a rule, you have the exact rule to fix, and you fix that one rule rather than widening a range and hoping. The discipline the model asks for is to run IP Flow Verify before changing any rule, because the verdict tells you whether you are even on the right stop of the map. An engineer who edits NSG rules without first confirming a denial is frequently fixing stop five for a problem that lives at stop two.

Name resolution: the separate concern that everyone blames on the network

Routing and filtering both operate on a packet that already has a destination address. Name resolution is what produces that address, and it is a wholly separate transaction with its own machinery, its own failure modes, and its own diagnostic tools. The reason it earns a full section is that it is the layer engineers most reliably misdiagnose, because a name resolution failure presents as a connectivity failure to anyone not looking for it. The application logs “could not connect to myservice.database.windows.net,” the engineer reads “could not connect,” and an hour later they are deep in route tables for a packet that was never even formed, because the name never resolved to an address to route toward.

The chain of resolution in Azure has a fixed shape. A client wants to resolve a name. Its operating system resolver sends the query to whatever DNS server the network interface is configured to use, and in a VNet that configuration is inherited from the VNet’s DNS setting. If the VNet uses the Azure-provided resolver, queries go to the platform resolver at the well-known virtual address 168.63.129.16, which answers for Azure service names, for public names by recursing out, and for the names of resources in the VNet. If the VNet uses custom DNS servers, queries go to those servers, and the answer depends entirely on how they are configured, which is the foundation of hybrid resolution where a custom server forwards some queries to on-premises DNS and some to Azure. The deep treatment of public and private zones, VNet links, auto-registration, conditional forwarding, and split-horizon resolution is in Azure DNS and Private DNS zones explained; the model needs you to internalize that the address the client ends up with is the product of this chain, and any link in it can return the wrong answer or no answer.

The most consequential interaction between DNS and the rest of the network is the private endpoint, because it is where a DNS mistake produces a connectivity failure that looks exactly like a firewall problem but is not. When you create a private endpoint for a service, you get a network interface in your subnet with a private IP, and the intent is that the service’s public name should resolve to that private IP for resources inside the VNet. That resolution does not happen by magic; it happens because a private DNS zone for the service answers the name with the private IP, and that zone is linked to the VNet. If the zone is missing, or not linked, or the VNet’s DNS setting bypasses it, the service name resolves to its public IP instead, and the client sends traffic to the public endpoint, which the service may be configured to reject now that a private endpoint exists. The symptom is a connection that fails or hangs, the engineer assumes the private endpoint’s networking is wrong, and they examine routes and NSGs that are all correct, because the actual fault is that the name resolved to the wrong address. This single pattern, the private endpoint that resolves publicly, is responsible for a large share of “private endpoint not working” tickets, and it is a stop-one problem every time.
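The private-endpoint failure can be reproduced with a toy resolver. A minimal sketch with hypothetical data: it mirrors the usual pattern in which the service name is a CNAME to a `privatelink` name, a linked private zone answers that name with the private IP, and everything else falls back to a public answer (203.0.113.10, a documentation address). The `uses_azure_dns` flag stands in for a custom DNS server that does not forward to Azure and therefore never consults the zone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical public DNS data for illustration only.
PUBLIC_CNAME = {"myservice.database.windows.net": "myservice.privatelink.database.windows.net"}
PUBLIC_A = {"myservice.privatelink.database.windows.net": "203.0.113.10"}


@dataclass
class PrivateZone:
    name: str                 # e.g. "privatelink.database.windows.net"
    records: Dict[str, str]   # relative record name -> private IP
    linked_vnets: Set[str]    # VNets whose Azure-provided resolution consults this zone


def resolve(fqdn: str, vnet: str, zones: List[PrivateZone],
            uses_azure_dns: bool = True) -> Optional[str]:
    target = PUBLIC_CNAME.get(fqdn, fqdn)  # follow the CNAME to the privatelink name
    if uses_azure_dns:  # a custom server that does not forward to Azure never sees the zone
        for zone in zones:
            suffix = "." + zone.name
            if vnet in zone.linked_vnets and target.endswith(suffix):
                record = target[: -len(suffix)]
                if record in zone.records:
                    return zone.records[record]
    return PUBLIC_A.get(target)  # otherwise the public answer wins
```

All three broken configurations from the paragraph (zone missing, zone not linked to this VNet, DNS setting bypassing it) return the same public address, which is why they are indistinguishable from the client until you look at the answer itself.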

The diagnostic move for DNS is to resolve the name from the exact place the application runs and compare the answer to what you expect. You do not infer the address; you ask for it.

# From inside the VNet (e.g. on the VM), resolve the name and read the IP returned.
nslookup myservice.database.windows.net

# Or with dig, which makes the answer section explicit.
dig +short myservice.database.windows.net

# Compare the returned IP to your expectation:
#  - A public IP when you expected a private endpoint IP => DNS, not networking.
#    The private DNS zone is missing, unlinked, or bypassed by the VNet DNS setting.
#  - NXDOMAIN or a timeout => the resolver chain is broken; check the VNet DNS server
#    setting and, for custom DNS, the forwarder configuration.
#  - The expected private IP => DNS is fine; move to routing (stop 2) and filtering.

The comparison is the whole technique. If the name resolves to the address you expected, name resolution is not your problem and you move to stop two on the map. If it resolves to an unexpected address, you have found the fault and no route or rule change would ever have helped, because the packet was correctly routed and correctly permitted to the wrong place. If it does not resolve at all, the client never formed a packet, so every route and every rule is irrelevant, and the fix is in the resolver chain: the VNet DNS setting, the custom DNS server, or the zone that should have answered. Running this one command before touching the network saves more debugging time than any other single habit, because it conclusively separates the upstream layer from the two layers the model spends most of its time on.
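The comparison can be scripted so it runs the same way every time from the VM where the application lives. A minimal sketch: `resolve_here` uses the standard `socket.getaddrinfo`, which asks whatever resolver the machine is configured with, and `classify` encodes the three outcomes above. The function names are hypothetical, and the private-versus-public test assumes your VNet uses RFC 1918 space.

```python
import ipaddress
import socket
from typing import List

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def resolve_here(name: str) -> List[str]:
    """IPv4 answers from this machine's configured resolver; [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def _is_rfc1918(ip: str) -> bool:
    return any(ipaddress.ip_address(ip) in net for net in RFC1918)


def classify(answers: List[str], expected_ip: str) -> str:
    if not answers:
        return "stop 1: no answer; check the VNet DNS setting and any custom forwarder"
    if expected_ip in answers:
        return "DNS ok; move on to stop 2 (routing)"
    if _is_rfc1918(expected_ip) and not any(_is_rfc1918(a) for a in answers):
        return "stop 1: public answer for a private endpoint; check the private zone and its VNet link"
    return "stop 1: unexpected address; find out which zone or server answered"
```

Used as `classify(resolve_here("myservice.database.windows.net"), "10.1.3.4")`, with the private endpoint's IP as the expectation, it tells you in one line whether the problem is stop one or whether you can move on to routing.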

A subtle point closes this section. The Azure-provided resolver lives at a virtual address that the platform makes reachable from inside the VNet, and traffic to it is not something you route or filter in the normal way; it is platform plumbing. This is why a VNet pointed at the Azure-provided resolver “just works” for Azure names, and why pointing a VNet at a custom DNS server that itself cannot reach the platform resolver or an on-premises forwarder breaks resolution in ways that look like a network outage. When you put a custom DNS server in the path, you have taken ownership of the resolver chain, and its reachability and forwarding become your responsibility, sitting upstream of everything the route-then-filter model describes.

The edges: peering, gateways, and how VNets reach beyond themselves

A VNet is a closed address space until you open an edge, and each edge is a different way for a packet to leave the VNet and reach somewhere else. The edges matter to the model because each one shows up in the routing fabric as a next hop type, and routing is what selects which edge a given destination uses. Understanding the edges is understanding where the path can extend and where it can silently fail to extend.

VNet peering is the edge between two VNets. When you peer VNet A with VNet B, the platform adds system routes so that A’s resources can reach B’s address ranges and the reverse, and the traffic travels on the Azure backbone without touching the internet. Peering is fast and private, but it has one property that trips up nearly everyone the first time: it is not transitive. If A peers with a hub VNet H, and H peers with B, A cannot reach B through H by default, because peering only establishes reach between the two directly peered VNets, not through an intermediary. Reachability through a hub has to be built explicitly, with UDRs in the spokes that steer spoke-to-spoke traffic to something in the hub that forwards it: a network virtual appliance that forwards A’s traffic to B, or a hub gateway used as the next hop with gateway transit enabled on the peerings. The assumption of transitive peering is a routing-model error: people imagine the hub “connects” the spokes the way a switch does, but peering establishes point-to-point reach, and a hub-and-spoke topology only carries spoke-to-spoke traffic when you build the transit path explicitly. The connectivity options across peering, VPN, and a private circuit, and how a hub composes them, are weighed in detail in the comparison work later in the series; here the load-bearing fact is non-transitivity, because it is the reason a spoke-to-spoke timeout so often turns out to be a missing route, not a missing rule.
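Non-transitivity is easiest to see by putting the correct model next to the mistaken one. A minimal sketch with hypothetical names: `reachable` grants reach only across a direct peering or an explicitly built transit path, while `naive_reachable` performs the graph search that the "hub is a switch" mental model implies. Transit paths are modeled one direction at a time for simplicity; real traffic needs the return path built as well.

```python
from typing import AbstractSet, FrozenSet, Set, Tuple


def reachable(src: str, dst: str, peerings: Set[FrozenSet[str]],
              transit_paths: AbstractSet[Tuple[str, str]] = frozenset()) -> bool:
    """Peering grants reach only between the two VNets it directly joins.

    transit_paths lists (src, dst) pairs built explicitly through a hub,
    for example UDRs steering spoke traffic to an appliance that forwards it.
    """
    return frozenset({src, dst}) in peerings or (src, dst) in transit_paths


def naive_reachable(src: str, dst: str, peerings: Set[FrozenSet[str]]) -> bool:
    """The mistaken mental model: treat peerings like switch links and search the graph."""
    seen, frontier = {src}, [src]
    while frontier:
        node = frontier.pop()
        for pair in peerings:
            if node in pair:
                (other,) = pair - {node}
                if other == dst:
                    return True
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return False
```

For spokes A and B each peered only with a hub, the naive search says A reaches B and the correct model says it does not until a transit path is added, which is the gap a spoke-to-spoke timeout usually falls into.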

A VPN gateway is the edge between a VNet and an on-premises network or another VNet over an encrypted tunnel that traverses the public internet. The gateway terminates the tunnel, and routes to the on-premises ranges appear in the VNet’s routing fabric with the gateway as next hop, either statically configured or learned over BGP. Because the tunnel rides the internet, a VPN gateway’s throughput and latency are bounded by the internet path and the gateway SKU, which is why it suits modest or bursty hybrid traffic and gets outgrown by steady high-volume workloads. The model’s interest in the VPN gateway is narrow: it is a next hop for on-premises destinations, and when on-premises connectivity fails while intra-Azure works, the first question is whether the route to on-premises exists and points at the gateway, which is stop two and stop four on the map, not a filtering question.

An ExpressRoute gateway is the edge to a private circuit provided through a connectivity provider, giving a dedicated path to on-premises that does not traverse the public internet and that offers higher and more predictable bandwidth than a VPN. From the model’s point of view it behaves like the VPN gateway in the one respect that matters: it is a next hop for on-premises ranges, and the routes to those ranges are learned over BGP and must be present for connectivity to work. A withdrawn or missing ExpressRoute route is, again, a routing failure that tends to get escalated as a firewall failure, and the discipline of checking the effective routes before the rules catches it.

Private endpoints are a different kind of edge, one that brings a service into the VNet rather than connecting the VNet outward. A private endpoint is a network interface in your subnet with a private IP that represents a specific instance of an Azure service, so that traffic to the service stays within your address space and on the backbone. Its connectivity depends on two things in the model’s terms: a route to the private IP, which is the ordinary intra-VNet system route and rarely the problem, and DNS that resolves the service name to that private IP, which is frequently the problem as the DNS section described. The private endpoint is the clearest example of why DNS sits in the model as its own layer: the routing to a private endpoint is trivial and almost never broken, while the resolution to it is the entire game.
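Because resolution is the entire game for a private endpoint, the useful check is whether the name resolves to a private address from where the application runs. This is a minimal sketch assuming a Linux host with glibc's `getent` and IPv4 answers; the function names are hypothetical.

```bash
# Succeeds if an IPv4 address falls in an RFC 1918 private range.
is_private_ipv4() {
  case $1 in *.*.*.*) ;; *) return 1 ;; esac   # IPv4 only in this sketch
  local IFS=.
  set -- $1
  [ "$1" -eq 10 ] && return 0
  [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ] && return 0
  [ "$1" -eq 192 ] && [ "$2" -eq 168 ] && return 0
  return 1
}

# Resolve a name the way this host does and classify the answer. Assumes a
# Linux host with glibc's getent, run where the application runs.
check_private_endpoint_dns() {
  local addr
  addr=$(getent ahostsv4 "$1" | awk '{ print $1; exit }')
  if [ -z "$addr" ]; then
    echo "no answer for $1: check the resolver chain"
    return 2
  fi
  if is_private_ipv4 "$addr"; then
    echo "private: $addr"
  else
    echo "public: $addr (private DNS zone missing or not linked?)"
    return 1
  fi
}

# Usage: check_private_endpoint_dns sqlsrv-prod.database.windows.net
```

A "public" answer points at the zone or its link, not at routes or rules.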

The internet edge is the default outbound path and, where you allow it, an inbound path. Outbound to the internet works by default through the system route with next hop Internet, subject to the outbound NSG rules and to whatever address translation the platform applies. Inbound from the internet requires a public IP and an NSG that permits the traffic, and it is the edge you most deliberately restrict, because exposure here is exposure to everyone. The model treats the internet edge like any other: routing decides whether traffic uses it, filtering decides whether the traffic is allowed, and the two questions stay separate.

The unifying idea across all the edges is that each one is a next hop type in the routing fabric. When you debug a connectivity problem that crosses a boundary, you are really asking which edge the routing selected and whether that edge is configured to carry the traffic. A spoke-to-spoke failure is a missing transit route. An on-premises failure is a missing or withdrawn gateway route. A private endpoint failure is almost always DNS. Reading the edge through the lens of routing, rather than treating each edge as its own mysterious subsystem, keeps the model coherent: there is one path, made of stops, and the edges are simply the stops where the packet leaves one VNet’s address space for another’s or for the world.

Load balancer and gateway touchpoints in the path

Most real traffic does not go straight from one interface to another; it passes through a load balancer or an application gateway that fronts a pool of backends. These devices are touchpoints on the packet path, and the model places them precisely so that their failure modes do not get confused with routing or filtering faults.

An Azure Load Balancer operating at layer four distributes connections across a backend pool by hashing each flow’s five-tuple and checks backend health with a probe. Its relevance to the model is twofold. First, the probe traffic originates from a platform source and must be allowed inbound by the backends’ NSGs, which is exactly why one of the default NSG rules permits the Azure load balancer: without that allow, the probe fails, the backend is marked unhealthy, and it receives no traffic, producing a “service is up but gets no requests” symptom that is a filtering problem at the probe, not at the request. Second, because the distribution decision is made per connection, a client can land on a different backend each time it connects, so a problem that affects one backend appears intermittently, which can masquerade as a flaky network when it is actually one unhealthy member of the pool. The choice between a layer-four load balancer and a layer-seven application gateway, and the features each layer unlocks, is a decision the comparison articles in the series make; the model only needs the load balancer placed as a touchpoint where a health probe and a distribution decision happen.
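The intermittency follows from per-flow hashing, which a toy model shows. This sketch uses `cksum` purely as a stand-in hash; it does not reproduce Azure's actual algorithm, and `pick_backend` is a hypothetical name.

```bash
# Toy per-flow distribution: the same five-tuple always maps to the same
# backend index, but a new connection (new source port) may map elsewhere.
# cksum stands in for the real hash; Azure's algorithm is not reproduced here.
pick_backend() {
  # $1: five-tuple as "src_ip src_port dst_ip dst_port proto"; $2: pool size
  local h
  h=$(printf '%s' "$1" | cksum | awk '{ print $1 }')
  echo $(( h % $2 ))
}

# Two connections from one client, differing only in source port:
#   pick_backend "10.0.0.4 50001 10.1.1.10 443 TCP" 3
#   pick_backend "10.0.0.4 50002 10.1.1.10 443 TCP" 3
```

If one of the three backends is unhealthy, only the connections that hash to it fail, which is why the symptom looks random from the client.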

An Application Gateway operating at layer seven terminates the connection, inspects the request, and can route by path or host, apply a web application firewall, and re-originate the connection to a backend. Because it terminates and re-originates, the backend sees the gateway’s address as the source, not the original client, which changes what the backend’s NSG rules must allow and changes what appears in logs. The gateway also has its own health probe and its own subnet requirements. For the model, the application gateway is a touchpoint where the connection is split into two: client to gateway, and gateway to backend, each with its own routing and filtering. A failure between client and gateway is a different stop from a failure between gateway and backend, and treating the gateway as a single opaque hop is how engineers miss that the front half is healthy while the back half is denied.

The lesson the touchpoints teach is that the packet path is sometimes two paths joined at a device that terminates connections, and each segment gets the full route-then-filter analysis on its own. When a load-balanced or gateway-fronted service misbehaves, you ask which segment failed before you ask whether it was a route or a rule, because the segment narrows the search before the layer does.

How the pieces compose: a worked end-to-end trace

The model is easiest to trust once you watch it carry a real request from end to end. Take a concrete case: an application on a VM in subnet app (10.1.1.0/24) of VNet prod needs to read from an Azure SQL Database reached through a private endpoint whose interface sits in subnet data (10.1.2.0/24) of the same VNet, and the application connects to sqlsrv-prod.database.windows.net. Walk the stops.

At stop one, the application resolves sqlsrv-prod.database.windows.net. The VM inherits VNet prod’s DNS setting. If prod points at the Azure-provided resolver and a private DNS zone for the database service is linked to prod, the name resolves to the private endpoint’s IP in 10.1.2.0/24, say 10.1.2.5. If that zone were missing or unlinked, the name would resolve to a public IP and the rest of the trace would describe a packet correctly sent to the wrong place. Assume it resolves to 10.1.2.5; stop one passes, and the application now has a destination address.

At stop two, the VM’s interface in subnet app picks the next hop for 10.1.2.5. The effective routes on that NIC contain a system route for the VNet’s own range with next hop VirtualNetwork, and 10.1.2.5 falls inside the VNet range, so unless a UDR overrides it, the next hop is VirtualNetwork, meaning the packet is delivered within the VNet directly to the destination interface. If a UDR on subnet app sent the VNet range or that specific prefix to a firewall appliance, the packet would instead go to the appliance, and whether it arrived would depend on the appliance forwarding it. Assume no such UDR; the route is VirtualNetwork and stop two passes.
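The reason the system route wins at stop two is plain prefix containment. The sketch below tests whether an address falls inside a CIDR range; the VNet address space 10.1.0.0/16 is an assumption, since the trace only names the two /24 subnets.

```bash
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds if address $1 lies inside prefix $2 (e.g. 10.1.0.0/16).
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Network mask with the top $bits bits set (0 for /0, all ones for /32).
  mask=$(( 0xFFFFFFFF ^ ((1 << (32 - bits)) - 1) ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Stop two in the trace, assuming VNet prod is 10.1.0.0/16 (hypothetical):
#   ip_in_cidr 10.1.2.5 10.1.0.0/16 && echo "system VNet route applies"
```

In az CLI output the same system route typically appears with next hop type VnetLocal.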

At stop three, the outbound NSG rules on the VM’s NIC and on subnet app evaluate the packet leaving toward 10.1.2.5 on port 1433. The default outbound allow to the VNet permits it unless a custom outbound deny at a lower priority number matches first. Assume the default allow applies; stop three passes, and the packet is on the backbone toward the destination interface.

At stop four, the path stays inside the VNet because the destination is in the same VNet, so no peering, gateway, or internet edge is involved; the backbone carries the packet directly. Stop four is a pass-through in this case, which is the common situation for intra-VNet private endpoint traffic and part of why these failures are so reliably DNS rather than routing.

At stop five, the inbound NSG rules that apply to the private endpoint in subnet data evaluate the arriving packet. Private endpoints have particular behavior here that the NSG deep dive covers, but the model’s view is that inbound traffic to the destination segment must be permitted from the source for the connection to be accepted. If subnet data has an inbound rule allowing 1433 from subnet app or from the VNet, the packet is allowed; if a custom inbound deny matches first, it is dropped and the connection fails at stop five with a verdict IP Flow Verify would name. Assume the allow applies; stop five passes.

At stop six, the database service accepts the connection on 1433 and replies, and the reply traverses the path in reverse, allowed back by the statefulness of the NSGs that permitted the forward direction. The application gets its connection.

Now break it deliberately and watch the model localize the fault. If the private DNS zone is unlinked, stop one returns a public IP, every later stop operates correctly on the wrong address, and the only tool that reveals the truth is the nslookup at stop one. If someone attaches a UDR to subnet app that routes the VNet range to a decommissioned appliance, stop two sends the packet to a dead next hop, the effective route table shows the appliance as next hop, and Next Hop in Network Watcher confirms it, while every NSG is innocent. If a custom inbound deny is added to subnet data above the allow, stop five drops the packet, IP Flow Verify returns Deny and names the rule, and DNS and routing are both fine. Three different faults, three different stops, three different tools, and the model tells you which is which before you change a single setting. That is the entire value of holding the layers apart.

Six patterns the model explains

Engineers report the same connectivity failures repeatedly, and each one is a clean illustration of a layer being misread. Naming them as patterns lets you recognize a ticket as an instance of a known shape rather than a fresh mystery. Each pattern below names the stop where it lives, the symptom that distinguishes it, and the command that confirms it.

Why does traffic route correctly but still get dropped?

This is the canonical filtering failure and the most common ticket. The packet reaches the destination boundary, the effective routes are all correct, and yet the connection times out, because an inbound NSG rule denies it. The distinguishing detail is that the route is provably fine: the effective route table on the source NIC shows the right next hop, and Next Hop confirms it, so stop two is cleared. The fault is at stop five, an inbound deny on the destination subnet or NIC, often a custom rule someone added at a low priority number that matches more broadly than they intended and shadows the allow below it. The confirmation is IP Flow Verify against the destination interface for the exact five-tuple; it returns Deny and names the rule. The fix is to correct that one rule, not to widen the range, because a broad widening to make the symptom disappear creates an exposure the next audit will flag. The pattern teaches the model’s first discipline: prove the route before you touch a rule, and let the verdict name the rule rather than guessing which one.
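Shadowing is easiest to see when the ordering rule is written down: rules are evaluated in ascending priority number and the first match decides. This toy evaluator uses a hypothetical one-line-per-rule format and ignores source, destination, and protocol, so it models only the ordering, not real NSG matching.

```bash
# Toy NSG evaluator. Input format is hypothetical: one "priority action port"
# rule per line, with "*" meaning any port. Source, destination, and protocol
# are ignored; only the first-match-by-priority behavior is modeled.
evaluate_nsg() {
  # $1: destination port of the flow being evaluated
  sort -n | awk -v port="$1" '
    $3 == "*" || $3 == port { print $2 " (priority " $1 ")"; found = 1; exit }
    END { if (!found) print "Deny (no matching rule)" }'
}

# A broad deny at 100 shadows the intended allow at 200:
#   printf '%s\n' "200 Allow 1433" "100 Deny *" | evaluate_nsg 1433
```

The verdict names the deciding rule, mirroring what IP Flow Verify reports, which is why the fix is to correct that rule rather than add a broader allow.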

Why does a UDR send my traffic into a black hole?

A user-defined route can point at a next hop that does not forward the packet, and when it does, the traffic vanishes with no filtering evidence at all. The usual cause is a UDR with next hop type VirtualAppliance pointing at a network virtual appliance that is powered off, misconfigured, or decommissioned, or a UDR with next hop None added to block a range and never removed. The symptom is a clean timeout, and crucially there is no NSG denial to find, because the packet never reached a boundary where an NSG applied; it was dropped at the routing layer. This absence of filtering evidence is the tell. The confirmation is the effective route table on the source NIC, which shows the offending User route and its next hop, and Next Hop in Network Watcher, which returns the next hop the platform will actually use for the destination. The fix is to correct or remove the UDR, or to restore the appliance it points at. The pattern teaches that a timeout with no NSG log is a routing suspect first, because routing drops are silent in a way filtering drops are not.
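Why a single UDR can swallow traffic follows from route selection: the longest matching prefix wins, and for equal prefixes a user route beats a BGP route, which beats a system route. The sketch below models that selection over a simplified, hypothetical input format (`prefix nextHopType source`, where source is User, BGP, or Default); it is an illustration, not the platform's implementation.

```bash
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1"); net=$(ip_to_int "${2%/*}"); bits=${2#*/}
  mask=$(( 0xFFFFFFFF ^ ((1 << (32 - bits)) - 1) ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Longest matching prefix wins; ties go to User, then BGP, then Default.
select_route() {
  # $1: destination IP; stdin: "prefix nextHopType source" lines (hypothetical format)
  local dest=$1 best_len=-1 best_rank=0 best_hop=None prefix hop source len rank
  while read -r prefix hop source; do
    [ -n "$prefix" ] || continue
    ip_in_cidr "$dest" "$prefix" || continue
    len=${prefix#*/}
    case $source in User) rank=3 ;; BGP) rank=2 ;; *) rank=1 ;; esac
    if [ "$len" -gt "$best_len" ] ||
       { [ "$len" -eq "$best_len" ] && [ "$rank" -gt "$best_rank" ]; }; then
      best_len=$len best_rank=$rank best_hop=$hop
    fi
  done
  echo "$best_hop"   # None also covers "no route matched": the packet is dropped
}
```

With a forgotten `10.1.2.0/24 None User` route alongside the system `10.1.0.0/16` route, traffic to 10.1.2.5 selects None and is dropped before any NSG sees it.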

Why does the name fail to resolve while the network is fine?

The application reports it cannot connect, and the instinct is to examine the network, but the name never resolved to an address, so no packet was ever routed or filtered. The symptom that distinguishes it is that the failure happens before any connection attempt reaches the wire: the resolver returned NXDOMAIN, or a public IP where a private endpoint IP was expected, or it timed out. The confirmation is a resolution from the application’s own host, nslookup or dig, comparing the returned address to the expected one. A public IP where a private one was expected points at a missing or unlinked private DNS zone or a VNet DNS setting that bypasses it; an NXDOMAIN or timeout points at a broken resolver chain, usually a custom DNS server that cannot forward correctly. The fix is in the resolver chain, never in routes or rules. The pattern teaches the model’s most time-saving habit: resolve the name from the right place before assuming the problem is connectivity, because a name fault wears a connectivity costume.
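The three outcomes in that pattern map to three different fixes, which a small classifier makes explicit. This sketch assumes `dig` is installed on the application host; `diagnose_dns` and `query_dns` are hypothetical names, and the status is parsed from dig's header line.

```bash
# Map one resolution attempt to the likely fault.
diagnose_dns() {
  # $1: dig status (NOERROR, NXDOMAIN, SERVFAIL) or empty on timeout
  # $2: address returned, $3: address expected
  case $1 in
    NOERROR)
      if [ "$2" = "$3" ]; then echo "resolution ok: move on to routing"
      else echo "wrong address $2: check private DNS zone link and VNet DNS setting"; fi ;;
    NXDOMAIN) echo "name not found: check the resolver chain and forwarders" ;;
    SERVFAIL|"") echo "resolver failed or timed out: check the custom DNS server" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Run on the application host (assumes dig is installed).
query_dns() {
  # $1: name, $2: expected address
  local status addr
  status=$(dig +time=2 +tries=1 "$1" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p')
  addr=$(dig +short +time=2 +tries=1 "$1" | tail -n 1)
  diagnose_dns "$status" "$addr" "$2"
}

# Usage: query_dns sqlsrv-prod.database.windows.net 10.1.2.5
```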

Why can two subnets reach each other when I expected isolation?

This is the inverse failure, where reach exists that the engineer did not intend. Two VMs in different subnets of the same VNet communicate freely, and a security review flags it as an unexpected open path. The cause is the default fabric: the system route for the VNet’s own range carries intra-VNet traffic, and the default NSG rules allow inbound from the VNet, so unless you added a deny, the subnets can talk. The symptom is not a failure at all but a finding, traffic flowing where a subnet boundary was assumed to block it. The confirmation is the effective security rules on the interfaces, which show the default VNet-inbound allow with no custom deny above it, and a successful IP Flow Verify between the subnets. The fix is to add explicit inbound denies or scoped allows to enforce the segmentation you want, using application security groups to express it by workload rather than by raw address. The pattern teaches that the subnet is not a wall and that isolation is something you subtract from the default reach, deliberately.

Why does on-premises connectivity fail while everything in Azure works?

Resources inside the VNet reach each other and reach the internet, but traffic to on-premises ranges times out. Because intra-Azure traffic is healthy, the instinct to blame a general network problem is wrong; the fault is specific to the on-premises destinations, which means it is specific to the route that should carry them. The cause is a missing or withdrawn route to the on-premises ranges with the gateway as next hop, either because BGP is not advertising the prefix, the gateway connection is down, or a UDR is shadowing the gateway route. The symptom’s specificity, only on-premises destinations failing, is the tell. The confirmation is the effective route table on a source NIC, checking whether a route to the on-premises range exists with next hop VirtualNetworkGateway, and Next Hop for an on-premises address. The fix is in the routing and the gateway, restoring the advertisement or the connection. The pattern teaches that a failure scoped to one set of destinations is a routing question about those destinations, not a filtering question about the source.
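The confirmation step can be scripted. This sketch assumes a logged-in az CLI and placeholder names, and it assumes the effective-route JSON exposes `addressPrefix` as a list and `nextHopType` as a string; it also checks only for an exact prefix match, whereas a gateway may advertise a summary or a more specific range.

```bash
# Effective routes for a NIC as "prefix nextHopType" lines. Assumes a logged-in
# az CLI and that each route exposes addressPrefix (a list) and nextHopType.
effective_routes() {
  # $1: resource group, $2: NIC name
  az network nic show-effective-route-table --resource-group "$1" --name "$2" \
    --query "value[].[addressPrefix[0], nextHopType]" --output tsv
}

# Succeeds if stdin contains a route for exactly $1 via the gateway.
has_gateway_route() {
  # $1: on-premises prefix, e.g. 192.168.0.0/16 (exact match only in this sketch)
  awk -v want="$1" '$1 == want && $2 == "VirtualNetworkGateway" { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage (placeholders):
#   effective_routes my-rg myvm-nic | has_gateway_route 192.168.0.0/16 \
#     || echo "no gateway route: check BGP advertisement or the connection"
```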

Why did connectivity break after a topology change, and is it asymmetric?

After adding a firewall appliance, changing peering, or reworking route tables, a connection that worked starts failing intermittently or in one direction only. The cause is asymmetric routing: the request takes one path and the reply takes another, and a stateful device on one path never saw the other half of the connection, so it drops the return as unsolicited. The symptom is direction-dependent or intermittent failure that appeared right after a change, and the timing is the strongest clue. The confirmation is to trace the forward path and the reverse path separately, using the effective routes on both interfaces and Next Hop in each direction, and to look for a stateful hop, a firewall or NVA, that is on one path but not the other. The fix is to make the paths symmetric, usually by ensuring both directions traverse the same appliance through matching UDRs, or by using a design that does not depend on symmetry. The pattern teaches that statefulness is per-hop and that a topology change can split the forward and reverse paths in a way that no single NSG or route, read in isolation, reveals; you must read both directions.
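Reading both directions amounts to comparing two hop lists for a stateful device that appears in only one. The hop lists in this sketch are hypothetical space-separated names you would assemble yourself from effective routes and Next Hop in each direction; `one_sided_stateful` is an illustrative name.

```bash
# Print any stateful device that appears on only one of the two paths.
# Succeeds if at least one such device is found (i.e. the path is asymmetric).
one_sided_stateful() {
  # $1: forward hops, $2: reverse hops, $3: stateful devices to watch
  local dev in_fwd in_rev found=1
  for dev in $3; do
    case " $1 " in *" $dev "*) in_fwd=1 ;; *) in_fwd=0 ;; esac
    case " $2 " in *" $dev "*) in_rev=1 ;; *) in_rev=0 ;; esac
    if [ "$in_fwd" -ne "$in_rev" ]; then
      echo "$dev"
      found=0
    fi
  done
  return $found
}

# Example: the request crosses hub-fw, the reply bypasses it.
#   one_sided_stateful "spoke-a hub-fw spoke-b" "spoke-b spoke-a" "hub-fw"
```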

Designing the network so the model stays legible

A network you can debug quickly is a network designed so that the route-then-filter model maps cleanly onto it. The design choices below are not about adding features; they are about keeping the two decisions, path and permission, separable and inspectable, so that when something breaks the map still applies.

Keep routing intent explicit and minimal. Every UDR you add overrides the default fabric and becomes a thing future-you must remember when debugging. A route table with three deliberate routes is legible; a route table with twenty routes accreted over years, several of which shadow each other, is a place where packets disappear for reasons nobody can reconstruct. When you must steer traffic to an appliance, do it with the narrowest prefix that achieves the goal, document why the route exists in its name or in the surrounding infrastructure code, and remove routes whose reason has passed. The discipline pays off precisely at the moment of an incident, when the effective route table is the first thing you read and a clean table gives a clean answer.

Express filtering by workload, not by address. Network Security Groups support application security groups, which let you write a rule that allows a workload role to reach another role without hardcoding addresses that change as the deployment scales. A rule that says the web tier may reach the data tier on 1433 survives a redeployment that changes every IP; a rule that allows 10.1.1.4 to reach 10.1.2.5 breaks the moment either address moves and leaves a stale allow that is both a failure waiting to happen and a security smell. Designing filtering around roles keeps the rule set small and meaningful, which keeps the effective security rules view readable when you need to find the deny.
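A role-based rule of that kind might look like the following sketch. The names my-rg, data-nsg, web-asg, and data-asg are placeholders, and it assumes the application security groups already exist and that the VM NICs are members of them (the commented commands show one way to set that up).

```bash
# Inbound allow from the web role to the data role on 1433, by ASG rather than IP.
# Placeholder names; assumes the ASGs exist and NICs are already members.
allow_web_to_data() {
  # $1: resource group, $2: NSG name
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name allow-web-to-data-sql \
    --priority 200 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-asgs web-asg \
    --destination-asgs data-asg \
    --destination-port-ranges 1433
}

# Setup, once (placeholders):
#   az network asg create --resource-group my-rg --name web-asg
#   az network nic ip-config update --resource-group my-rg --nic-name web-vm-nic \
#     --name ipconfig1 --application-security-groups web-asg
```

The rule never mentions an address, so a redeployment that renumbers every VM leaves it correct.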

Make DNS a deliberate decision, not an accident. Decide for each VNet whether it uses the Azure-provided resolver or a custom server, and if custom, treat the resolver chain as a first-class part of the network with its own monitoring, because a resolver outage presents as a total network outage to every application at once. For private endpoints, link the private DNS zones to the VNets that need them as part of the same deployment that creates the endpoints, so that the resolution and the endpoint are never out of step. The single most common production DNS fault, the private endpoint that resolves publicly, is prevented entirely by treating the zone link as part of the endpoint’s definition rather than a follow-up step someone might forget.
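Treating the zone link as part of the endpoint's definition can be as simple as keeping the commands together. This sketch uses the SQL Database private link zone name and placeholder resource names; it assumes the zone does not already exist (zone creation fails otherwise, and the sketch does not handle that).

```bash
# Create the private DNS zone for Azure SQL and link it to a VNet in one step.
# Placeholder names; assumes the zone does not exist yet.
link_sql_private_zone() {
  # $1: resource group, $2: VNet name
  az network private-dns zone create \
    --resource-group "$1" --name privatelink.database.windows.net &&
  az network private-dns link vnet create \
    --resource-group "$1" --zone-name privatelink.database.windows.net \
    --name "$2-link" --virtual-network "$2" --registration-enabled false
}

# Tie the endpoint's A record to the zone (placeholder endpoint name):
#   az network private-endpoint dns-zone-group create --resource-group my-rg \
#     --endpoint-name sql-pe --name default \
#     --private-dns-zone privatelink.database.windows.net --zone-name sql
```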

Design for symmetric paths when stateful devices are involved. If a firewall or network virtual appliance sits in the path, ensure both directions of every connection traverse it, because the statefulness that makes the device useful also makes it drop return traffic it never saw the request for. Hub-and-spoke designs that route spoke egress through a hub firewall must route the return through the same firewall, which means the UDRs on both ends and the appliance’s own routing have to agree. Asymmetry introduced by a well-meaning optimization is one of the hardest faults to find after the fact, so it is cheapest to design out at the start by making the path deterministic in both directions.

Segment with intent and verify the segmentation. Because the default fabric grants intra-VNet reach, the segmentation you want is something you impose and should confirm rather than assume. After you place the denies and scoped allows that express your intended boundaries, verify them with IP Flow Verify in both the should-pass and should-fail directions, so that you have positive evidence the segmentation does what you think. A segmentation you designed but never tested is a segmentation you do not actually have, and the test is cheap.

To build any of this with your hands and watch the model behave, run the hands-on Azure labs and command library on VaultBook. The lab environment lets you stand up a VNet with subnets, attach route tables and security groups, create a private endpoint and its DNS zone, and then trace a packet through the stops of the map while you deliberately break one layer at a time, which is the fastest way to make the route-then-filter model intuitive rather than theoretical. The tested command library covers the effective-routes, effective-rules, and IP Flow Verify invocations used throughout this article, so you can confirm each stop on a network you control before you ever have to do it under the pressure of a production incident. Building the failure on purpose, in a sandbox, is how the map stops being a diagram and becomes a reflex.

The diagnostic toolkit, mapped to the stops

The tools matter only in relation to the stop they inspect, and the model’s payoff is knowing which tool answers which question so you stop using the wrong one. Here is the toolkit, organized by the decision each tool reveals rather than by the menu it lives under.

For name resolution, stop one, the tool is a resolution from the application’s own host: nslookup or dig, run where the application runs, compared against the expected address. Nothing else conclusively tells you what address the application is actually aiming at, and no platform tool substitutes for asking the resolver the question from the right place.

For routing, stop two and stop four, the tools are the effective route table and Next Hop. The effective route table on a NIC shows the merged, post-precedence routes and the winning next hop for every prefix, which tells you where a packet to any destination will go. Next Hop in Network Watcher answers the same question for a single destination address with a single call, returning the next hop type and IP, and it is the fastest way to settle “where does this packet go” without reading a whole table.

# Next Hop: where does a packet from this VM to this destination actually go?
az network watcher show-next-hop \
  --vm myvm \
  --resource-group my-rg \
  --source-ip 10.1.1.4 \
  --dest-ip 10.1.2.5 \
  --nic myvm-nic
# Returns a next hop type such as VnetLocal, VirtualAppliance,
# VirtualNetworkGateway, Internet, or None, plus the next hop IP. A type of None
# means the packet is black-holed: a routing fault, with no NSG to blame.

For filtering, stop three and stop five, the tools are the effective security rules and IP Flow Verify. The effective security rules view merges subnet and NIC NSGs into the ordered list actually applied, and IP Flow Verify simulates one five-tuple and returns Allow or Deny with the deciding rule named. Use IP Flow Verify before editing any rule, because its verdict tells you whether the NSG is even the layer at fault.
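Both filtering tools are reachable from the az CLI. In this sketch the resource group, NIC, and VM names are placeholders; the destination is assumed to be a VM (db-vm at 10.1.2.4, hypothetical), since IP Flow Verify evaluates a VM's NIC, and `--local` is the VM's own side of the flow.

```bash
# Effective security rules: subnet and NIC NSGs merged, as actually applied.
effective_rules() {
  # $1: resource group, $2: NIC name
  az network nic list-effective-nsg --resource-group "$1" --name "$2"
}

# IP Flow Verify for one TCP five-tuple; prints the verdict and deciding rule.
verify_flow() {
  # $1: resource group, $2: VM, $3: Inbound|Outbound,
  # $4: local IP:port (the VM side), $5: remote IP:port
  az network watcher test-ip-flow --resource-group "$1" --vm "$2" \
    --direction "$3" --protocol TCP --local "$4" --remote "$5" \
    --query "[access, ruleName]" --output tsv
}

# Usage (hypothetical VM and addresses):
#   verify_flow my-rg db-vm Inbound 10.1.2.4:1433 10.1.1.4:50000
```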

For the whole connection at once, Connection Troubleshoot runs an end-to-end check between a source and a destination and reports where it failed, which is a good first move when you do not yet know which stop to suspect, because it points you at a stop and then you switch to the specific tool for that stop.

# Connection Troubleshoot: end-to-end check that reports the failing hop.
az network watcher test-connectivity \
  --resource-group my-rg \
  --source-resource myvm \
  --dest-address sqlsrv-prod.database.windows.net \
  --dest-port 1433
# Read the result for the hop where it failed, then switch to the stop-specific
# tool: a name issue sends you to nslookup, a routing issue to Next Hop, a
# filtering issue to IP Flow Verify.

For traffic visibility over time, NSG flow logs record the allow and deny decisions on a per-flow basis, which is how you find a denial that happened minutes ago when nobody was watching. Flow logs answer the historical question that the live tools cannot: what actually happened to the traffic that failed before you started looking. The deep treatment of flow logs and rule evaluation belongs to the NSG deep dive, but their place in the model is clear: they are the record that lets you reconstruct which stop a past failure died at, and the absence of a denial in them for a timed-out flow is itself a pointer back to routing.

The toolkit’s organizing principle is that you choose the tool by the stop you suspect, and you suspect the stop by walking the map. An engineer who reaches for IP Flow Verify reflexively will sometimes get a clean Allow and be left confused, because the fault was at routing or DNS; an engineer who walks the map first reaches for the right tool the first time. The tools are excellent, but they are answers to specific questions, and the model is what tells you which question to ask.

A triage procedure you can run under pressure

When a connectivity incident lands and people are waiting, the temptation is to start changing the thing you changed last time. The model gives you a faster path: a fixed triage order that localizes the fault in three or four questions, each answered by one command. Run it in order and stop at the first failing answer, because the first failing layer is the fault and the layers below it are irrelevant until it is fixed.

The first question is always about the address. Resolve the destination name from the source host and compare the result to what you expect. If the address is wrong or absent, you are done localizing: the fault is name resolution, the resolver chain is the place to look, and you have spent thirty seconds proving that routes and rules are not the issue. If the address is correct, the application is aiming at the right target and you move on.

The second question is about the path. Ask Next Hop where a packet from the source to that address goes, or read the effective route table for the prefix that contains it. If the next hop is None, or an appliance that should not be there, or missing for an on-premises destination, the fault is routing, and you fix the route or the appliance or the gateway advertisement. If the next hop is the sensible one, VnetLocal for an in-VNet destination, a peering for a peered VNet, a gateway for on-premises, then the path is right and you move on.

The third question is about permission. Run IP Flow Verify for the exact five-tuple in the inbound direction at the destination, and if you suspect egress filtering, in the outbound direction at the source. If it returns Deny, you have the rule name and you fix that rule. If it returns Allow in both directions, filtering is not the fault, and you have proven it rather than assumed it.

The fourth question, reached only when the first three pass, is about the destination itself and the symmetry of the return path. A connection refused with the address, route, and rules all correct means nothing is listening on the destination, which is an application or host-firewall question, not an Azure networking one. An intermittent or one-directional failure with the first three clean points at asymmetric routing, and you trace the reverse path the way you traced the forward path, looking for a stateful hop that sits on one path and not the other.

This order is not arbitrary; it follows the packet’s own sequence, address then path then permission then destination, so each question is only meaningful once the previous one has passed. Asking about a rule before confirming the route is asking whether a guest is allowed into a building the taxi never arrived at. The procedure feels slower than jumping straight to the NSG, but in practice it is faster, because it spends a few seconds proving each layer instead of an hour fixing the wrong one. Keep the four questions and their four commands somewhere you can see them during an incident, and the model becomes a reflex rather than a thing you reconstruct under stress.
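The four questions can be sketched as one ordered check that stops at the first failing answer. The inputs are assumptions about how you record each command's result (resolved and expected address, actual and expected Next Hop type, the IP Flow Verify verdict, and whether anything is listening); `triage` is a hypothetical helper, not a platform tool.

```bash
# Ordered triage: stop at the first failing layer.
triage() {
  # $1 resolved address, $2 expected address
  # $3 next hop type from Next Hop, $4 next hop type you expect (e.g. VnetLocal)
  # $5 IP Flow Verify verdict, $6 destination listening: yes or no
  if [ "$1" != "$2" ]; then echo "1 address: fix name resolution"; return; fi
  if [ "$3" != "$4" ]; then echo "2 path: next hop is $3, expected $4"; return; fi
  if [ "$5" != "Allow" ]; then echo "3 permission: fix the rule IP Flow Verify named"; return; fi
  if [ "$6" != "yes" ]; then echo "4 destination: nothing is listening"; return; fi
  echo "4 return path: all checks pass, trace the reverse direction for asymmetry"
}

# Example: triage 10.1.2.5 10.1.2.5 None VnetLocal Allow yes
```

Because each check returns as soon as it fails, a later layer is never blamed while an earlier one is broken.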

What the model deliberately leaves to the specialized articles

The route-then-filter model is a frame, not a complete reference, and being clear about its edges keeps it trustworthy. It tells you which layer a fault lives in and which tool inspects that layer; it does not replace the depth each layer rewards. The precise precedence interactions among many overlapping user-defined routes, the forced-tunneling configuration, and the subtleties of BGP route selection belong to the routing deep dive. The full rule-evaluation order across augmented rules, service tags, application security groups, and the exact behavior of NSGs with private endpoints belong to the filtering deep dive. The mechanics of public and private zones, conditional forwarding, split-horizon resolution, and the resolver’s internal behavior belong to the DNS deep dive. The trade-offs among peering, VPN, and a private circuit, and the topology patterns that compose them, belong to the connectivity comparison and the architecture articles.

What the model gives you that none of those individually give you is the connecting tissue: the single path made of stops, the rule that routing precedes filtering and resolution precedes both, and the map that turns any connectivity question into a position you can inspect. The specialized articles go deep on one stop; the model is what lets you stand at the whole path and say, with confidence and in seconds, which stop to go deep on. Use them together, and the cluster of networking articles in this series stops being a pile of separate topics and becomes one coherent way of reasoning about how a packet gets from a client to a service in Azure.

Verdict

In Azure networking, routing chooses the path and security groups filter it, and name resolution decides the address both of them operate on, three separate decisions made by three separate mechanisms in a fixed order. Almost every connectivity problem reduces to identifying which of those three failed, and the cost of not separating them is the hours engineers lose fixing the wrong layer. The packet-path map is the artifact that keeps them separate: six stops, each owning one decision, each inspected by one tool. Walk the map in order, prove each layer before changing it, and let the tools name the fault rather than guessing at it. An engineer who internalizes the route-then-filter model and the triage order it implies will diagnose in minutes what an engineer who blends the layers chases for an afternoon, and that difference, repeated across every incident, is the whole return on learning the fundamentals deliberately rather than absorbing them by accident.

Frequently Asked Questions

Q: What are the Azure networking fundamentals an engineer actually needs?

The fundamentals are fewer than the portal’s menu suggests. An engineer needs the virtual network as the private address space and boundary, the subnet as the attachment point for routing and filtering, the routing fabric of system routes and user-defined routes that decides where a packet goes, the Network Security Group that decides whether a packet is allowed, name resolution that produces the address in the first place, and the edges (peering, gateways, private endpoints, the internet) through which a VNet reaches beyond itself. Above all, an engineer needs the relationship among these: routing chooses the path, filtering permits or denies it, and resolution decides the target, in that order. Memorizing portal blades teaches you where settings live; understanding this relationship teaches you what they do to a packet, which is what lets you reason about a problem you have never seen before instead of matching it to one you have.

Q: How do VNets and subnets form the base of Azure networking?

The virtual network defines a private address space, and subnets carve that space into segments. The VNet is not a router or a firewall on its own; it is the container that gives addresses meaning and supplies the default routing fabric and the DNS setting that every resource inside inherits. The subnet matters because it is the unit that route tables and Network Security Groups attach to, so the path decision and the permission decision are both made at the subnet boundary. A common mistake is to treat the subnet as a wall that isolates traffic, but by default resources in different subnets of the same VNet can reach each other, because the system route carries intra-VNet traffic and the default rules allow it. Reach is the default; isolation is something you add with filtering. Holding that asymmetry in mind is the difference between expecting the network to behave and being surprised by it.

Q: How does routing work by default and when I add user-defined routes?

By default, Azure populates system routes that make the obvious destinations reachable: the VNet’s own range, peered VNets, and the internet. It also adds system routes that drop traffic bound for a few reserved ranges. You never edit these; you override them with user-defined routes in a route table attached to a subnet. The lookup is longest-prefix-match, so the most specific route for a destination wins, and when routes from different sources share the same prefix, a user-defined route beats a route learned over BGP, which beats a system route. The practical consequence is that attaching a route table is how you change or break the default fabric, sending traffic to a firewall appliance, to a gateway, or to a deliberate black hole with next hop None. When debugging, you do not reason about the sources separately; you read the effective route table on the interface, which shows the merged, post-precedence answer for every prefix, and that table tells you exactly where a packet will go.
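The lookup can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Azure’s implementation: the `Route` type, the `effective_route` function, the source labels `User`, `BGP`, and `System`, and the example table are all hypothetical. It assumes longest-prefix-match runs first and that source precedence only breaks ties between routes for the same prefix.

```python
import ipaddress
from dataclasses import dataclass

# Tie-break among routes with the identical prefix: lower rank wins.
# These labels are illustrative, not the exact strings Azure reports.
SOURCE_RANK = {"User": 0, "BGP": 1, "System": 2}


@dataclass(frozen=True)
class Route:
    prefix: str          # e.g. "10.0.0.0/16"
    source: str          # "User", "BGP", or "System"
    next_hop_type: str   # e.g. "VnetLocal", "VirtualAppliance", "None"
    next_hop_ip: str = ""


def effective_route(routes, destination):
    """Return the route a packet to `destination` uses, or None if no route covers it.

    Longest prefix wins first; source precedence only breaks ties between
    routes with the same prefix.
    """
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, SOURCE_RANK[r.source]),
    )


# Hypothetical route table for one subnet.
table = [
    Route("10.0.0.0/16", "System", "VnetLocal"),
    Route("0.0.0.0/0", "System", "Internet"),
    Route("0.0.0.0/0", "User", "VirtualAppliance", "10.0.100.4"),
    Route("192.168.0.0/16", "BGP", "VirtualNetworkGateway"),
    Route("192.168.50.0/24", "User", "None"),
]
print(effective_route(table, "8.8.8.8").next_hop_type)       # VirtualAppliance
print(effective_route(table, "192.168.50.9").next_hop_type)  # None
```

The user-defined `0.0.0.0/0` route to an appliance displaces the system internet route with no other change, and a single more specific `/24` with next hop None black-holes part of a range that BGP otherwise serves.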

Q: How does NSG filtering fit into the networking model?

Filtering layers on top of routing. Once routing has decided where a packet goes, a Network Security Group decides whether it is allowed, by walking its rules in priority order from the lowest number and stopping at the first match. NSGs attach to subnets and to network interfaces, and both apply, so a packet must be permitted at every layer it passes; an allow on the subnet means nothing if a NIC rule denies. The defaults allow intra-VNet traffic and load balancer probes and deny all other inbound traffic, which encodes the model directly: reach inside the VNet is granted, exposure is deliberate. NSGs are stateful, so return traffic for an allowed connection flows back automatically. The key to placing filtering correctly is that it can only ever explain a failure where the packet reached a boundary; a packet that never arrived because routing sent it nowhere produces no NSG evidence, which is why you confirm the route before you examine the rules.
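Here is a minimal sketch of that evaluation, assuming inbound traffic is checked by the subnet’s NSG and then the NIC’s, and that each NSG is a flat list of rules matched on source CIDR and destination port only. The `Rule` type and the `evaluate` and `inbound_verdict` functions are hypothetical; real rules also match protocol, destination, and service tags.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is evaluated first
    access: str        # "Allow" or "Deny"
    source: str        # CIDR such as "10.0.0.0/16", or "*" for any source
    dest_port: object  # an int, or "*" for any port
    name: str


def matches(rule, src_ip, dest_port):
    src_ok = rule.source == "*" or (
        ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source)
    )
    port_ok = rule.dest_port == "*" or rule.dest_port == dest_port
    return src_ok and port_ok


def evaluate(rules, src_ip, dest_port):
    """First match in ascending priority order decides; later rules are never read."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, src_ip, dest_port):
            return rule.access, rule.name
    # Real NSGs always end in a default deny, so this is only a safety net.
    return "Deny", "no-match"


def inbound_verdict(subnet_nsg, nic_nsg, src_ip, dest_port):
    """Inbound: the subnet NSG is checked, then the NIC NSG; both must allow.
    Pass None for a layer that has no NSG attached."""
    for layer, rules in (("subnet", subnet_nsg), ("nic", nic_nsg)):
        if rules is None:
            continue
        access, name = evaluate(rules, src_ip, dest_port)
        if access == "Deny":
            return "Deny", f"{layer}:{name}"
    return "Allow", ""
```

A subnet rule that allows HTTPS from the whole VNet does not help a source that a NIC rule denies; the verdict names the layer and the rule, much as IP Flow Verify names the deciding rule.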

Q: How does name resolution work in Azure networking, and why is it separate?

Name resolution is the transaction that turns a name into an address before any packet is routed or filtered, and it is separate because it has its own machinery and its own failures. A resource sends its query to the DNS server its VNet is configured to use, either the Azure-provided resolver or a custom server, and the answer depends on that resolver and the zones it consults. It is separate from connectivity because a name can resolve to the wrong address, sending a perfectly routed and permitted packet to the wrong place, or fail to resolve at all, so no packet is ever formed. Treating resolution as part of the network is the conflation that produces the most circular debugging, because the engineer examines routes and rules for a packet that was misaimed or never created. The fix is to resolve the name from the application’s host and compare the answer to what you expect before assuming the problem is the network.
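The resolve-and-compare step can be scripted from the application’s host. This is a minimal sketch using the standard library’s `socket.getaddrinfo`, which asks the host’s configured resolver; the `resolve_all` and `check_resolution` names and the messages they return are hypothetical, and the resolver is injectable so the logic can be exercised without a live DNS server.

```python
import socket


def resolve_all(name):
    """Return every address the host's resolver gives for `name`,
    or an empty set if resolution fails (NXDOMAIN, timeout, and so on)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def check_resolution(name, expected_ip, resolver=resolve_all):
    """Compare the answer to the address you expect before looking at the network."""
    answers = resolver(name)
    if not answers:
        return "resolution failed: no address, so no packet is ever formed"
    if expected_ip not in answers:
        return f"wrong target: got {sorted(answers)}, expected {expected_ip}"
    return "resolution ok: move on to routing"
```

Run it on the host that makes the failing call, not on your workstation, because the two may use different resolvers and get different answers.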

Q: How do VNets connect to on-premises networks and to each other?

A VNet connects to other VNets through peering, which establishes private reach on the Azure backbone between the two directly peered networks. Peering is not transitive, so two spokes peered to a common hub cannot reach each other through the hub unless you build a transit path with a gateway or an appliance. A VNet connects to on-premises through a VPN gateway, which terminates an encrypted tunnel over the internet, or through an ExpressRoute gateway, which uses a private circuit from a provider for higher and more predictable bandwidth. In the model, each of these edges is a next hop type in the routing fabric: peering, a gateway, or the internet. When connectivity across an edge fails, the first question is whether the route to that destination exists and points at the right edge, because a missing or withdrawn route is a far more common cause than a filtering rule, even though it is often escalated as one.
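Non-transitivity is easy to state and easy to forget, so here is a toy reachability check. It is a minimal sketch: `can_reach` is a hypothetical function, and `transit_hubs` stands in for a hub that really forwards spoke-to-spoke traffic (a gateway or appliance plus the routes pointing at it). Without that, only direct peerings count.

```python
def can_reach(peerings, src, dst, transit_hubs=frozenset()):
    """Peering grants reach only between directly peered VNets.

    A path through an intermediate VNet is allowed only if that VNet is
    listed in `transit_hubs`, a stand-in for a hub built to forward traffic.
    """
    links = {frozenset(pair) for pair in peerings}
    if src == dst or frozenset((src, dst)) in links:
        return True
    # One transit hop, only through a hub that actually forwards.
    return any(
        frozenset((src, hub)) in links and frozenset((hub, dst)) in links
        for hub in transit_hubs
    )


hub_and_spoke = [("hub", "spoke-a"), ("hub", "spoke-b")]
print(can_reach(hub_and_spoke, "spoke-a", "spoke-b"))                        # False
print(can_reach(hub_and_spoke, "spoke-a", "spoke-b", transit_hubs={"hub"}))  # True
```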

Q: What is the difference between a routing problem and a filtering problem?

A routing problem is a geography problem: the packet went to the wrong place or to a place that swallowed it, so it never reached the destination at all. A filtering problem is a permission problem: the packet reached the destination boundary and was turned away by a rule. The symptoms can both look like a timeout, but they differ in a usable way. A routing black hole leaves no filtering evidence, because the packet never reached a Network Security Group, so a timeout with no corresponding denial in the flow logs points at routing. A filtering denial can be reproduced with IP Flow Verify, which returns Deny and names the rule, while a routing fault shows up in the effective route table and in Next Hop as a wrong or None next hop. Confirming which one you have, before changing anything, is the single habit that prevents the most wasted effort, because the fix for one does nothing for the other.

Q: Why does my traffic time out with no NSG denial in the logs?

A timeout with no denial in the NSG flow logs is the signature of a routing failure, because filtering produces evidence and routing black holes do not. If a user-defined route sends the destination prefix to a next hop of None, or to a network virtual appliance that is powered off or misconfigured, the packet is dropped at the routing layer before it ever reaches a Network Security Group, so there is nothing for the NSG to log. The absence of a denial, which feels like a dead end, is actually the clue: it tells you the packet never reached a filtered boundary. Confirm it by reading the effective route table on the source interface and by running Next Hop for the destination address; a next hop of None or an unexpected appliance is your answer. The fix is in the route table or the appliance, not in any rule, and editing rules while this fault persists changes nothing.

Q: Does a subnet isolate traffic by default in Azure?

No. A subnet is an attachment point for routing and filtering, not a wall. By default, resources in different subnets of the same virtual network can reach each other on any port the destination is listening on, because the system route carries intra-VNet traffic and the default Network Security Group rules allow inbound from the VNet. Engineers from on-premises backgrounds, where a VLAN often implies a firewall, are repeatedly surprised by this. If you want isolation between subnets, you create it deliberately with Network Security Groups, adding denies or scoped allows that subtract from the default reach. After you impose the segmentation, verify it with IP Flow Verify in both the should-pass and should-fail directions, because a segmentation you designed but never tested is one you cannot be sure you have. The model’s underlying rule is that reach is the default and restriction is something you add on purpose.

Q: Why does a private endpoint connection fail when the routing looks correct?

Because the failure is almost always name resolution, not routing. A private endpoint is a network interface in your subnet with a private IP, and the routing to that private IP is the ordinary intra-VNet route, which is rarely broken. What breaks is the resolution: the service’s name must resolve to the private IP, and that only happens when a private DNS zone for the service is linked to the VNet and the VNet’s DNS setting consults it. If the zone is missing, unlinked, or bypassed, the name resolves to the service’s public IP instead, and the packet is correctly routed and permitted to the wrong target. The tell is that the routing inspection passes cleanly while the connection still fails. Confirm it by resolving the name from inside the VNet and checking whether you get the private IP or a public one. The detailed setup and failure handling live in the dedicated DNS and private endpoint articles, but the model’s lesson is to suspect resolution first.
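Once you have the answer from inside the VNet (from nslookup, dig, or `socket.getaddrinfo` on a VM there), classifying it is mechanical. A minimal sketch, assuming you know the private endpoint’s IP; `private_endpoint_check` is a hypothetical name, and the addresses in the usage lines are invented.

```python
import ipaddress


def private_endpoint_check(answers, endpoint_ip):
    """Classify what a service name resolved to from inside the VNet."""
    if not answers:
        return "no answer: resolution failed"
    if endpoint_ip in answers:
        return "private endpoint ip: resolution is correct"
    if all(ipaddress.ip_address(a).is_global for a in answers):
        # The private DNS zone never answered; the public record did.
        return "public ip: private DNS zone missing, unlinked, or bypassed"
    return "unexpected private ip: check which zone answered"


print(private_endpoint_check({"10.0.5.4"}, "10.0.5.4"))
print(private_endpoint_check({"20.60.1.2"}, "10.0.5.4"))
```

The second case is the signature the paragraph describes: routing to a public address works, filtering may even allow it, and the connection still reaches the wrong target.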

Q: In what order should I troubleshoot an Azure connectivity problem?

Follow the packet’s own sequence and stop at the first failing layer. First, resolve the destination name from the source host and compare it to the address you expect; if it is wrong or absent, the fault is name resolution and you stop there. Second, check the path with Next Hop or the effective route table; if the next hop is None, an unexpected appliance, or missing, the fault is routing. Third, run IP Flow Verify for the exact five-tuple; if it returns Deny, the fault is filtering and you have the rule name. Fourth, reached only when the first three pass, check whether anything is listening on the destination and whether the return path is symmetric, because an intermittent or one-directional failure with clean routing and rules points at asymmetric routing. This order works because each question is only meaningful once the previous one passes, and it is faster than jumping to the Network Security Group because it spends seconds proving each layer instead of an hour fixing the wrong one.
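The triage order can be written down as a function that takes what each tool reported and returns the first failing layer. This is a sketch under assumptions: the `Observations` fields are hypothetical summaries of nslookup, Next Hop, and IP Flow Verify output, not their real formats, and `expected_hops` is the set of next-hop types you consider correct for this destination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    resolved_ip: Optional[str]    # answer on the source host; None if resolution failed
    expected_ip: str              # the address the name should resolve to
    next_hop_type: Optional[str]  # from Next Hop; None if no route exists
    flow_verdict: str             # "Allow" or "Deny" from IP Flow Verify
    deny_rule: str = ""           # rule name IP Flow Verify reported, if any
    listener_up: bool = True
    return_path_symmetric: bool = True


def triage(obs, expected_hops):
    """Walk the layers in the packet's own order and stop at the first failure."""
    if obs.resolved_ip != obs.expected_ip:
        return "resolution"
    if obs.next_hop_type in (None, "None") or obs.next_hop_type not in expected_hops:
        return "routing"
    if obs.flow_verdict == "Deny":
        return f"filtering: {obs.deny_rule}"
    if not obs.listener_up:
        return "destination: nothing listening"
    if not obs.return_path_symmetric:
        return "asymmetric routing"
    return "no network-layer fault found"
```

The order of the `if` statements is the point: when resolution and routing are both broken, the function reports resolution, because a routing answer for the wrong address tells you nothing.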

Q: What exactly is the route-then-filter model?

It is the claim that in Azure networking, routing chooses the path a packet takes and Network Security Groups filter whether the packet is allowed on that path, so the two decisions are distinct, made by distinct mechanisms, in a fixed order that never reverses. Routing runs first and answers where the packet goes; if no route exists or the route black holes the packet, filtering never gets a say. Filtering runs at the boundary and answers whether the packet may pass; a deny ends the journey with no reply. Name resolution sits upstream of both, deciding the address they operate on. The model’s value is diagnostic: it turns any connectivity question into “which route and which rule,” and it keeps you from fixing a filter for a routing fault or a route for a resolution fault. Almost every connectivity problem in Azure resolves cleanly once you decide which of those three layers actually failed.

Q: Are Network Security Groups stateful, and what does that mean for me?

Yes, NSGs are stateful, which means that when a rule allows a connection, the return traffic for that established connection is permitted automatically without a matching rule in the opposite direction. A single inbound allow on port 443 is enough to serve HTTPS, because the responses flow back on the established connection. This statefulness is a per-hop property, and that detail matters when you introduce a stateful device like a firewall into the path. If the request bypasses the firewall and the reply is routed through it, the firewall never recorded the start of the connection, so it treats the return traffic as unsolicited and drops it. That is the mechanism behind asymmetric-routing failures after a topology change. The takeaway is that statefulness saves you from writing paired rules in the simple case, but it depends on both directions of a connection traversing the same stateful device, which is something your routing design has to guarantee.
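A toy connection tracker makes the asymmetric case concrete. This is a deliberately small sketch: `StatefulFirewall` and `deliver` are hypothetical, the flow key ignores source ports and protocol, and it models only the reply check described above, not everything a real firewall does with a half-seen connection.

```python
class StatefulFirewall:
    """Toy connection tracker: a reply passes only if the firewall saw the request."""

    def __init__(self):
        self.seen = set()  # (client, server, port) for requests that crossed the firewall

    def request(self, client, server, port):
        self.seen.add((client, server, port))

    def reply_allowed(self, server, client, port):
        return (client, server, port) in self.seen


def deliver(request_via_fw, reply_via_fw, client="10.1.0.4", server="10.2.0.4", port=443):
    """Return True if the reply reaches the client under the given routing."""
    fw = StatefulFirewall()
    if request_via_fw:
        fw.request(client, server, port)
    if reply_via_fw:
        return fw.reply_allowed(server, client, port)
    return True  # the reply never touches the firewall in this toy


print(deliver(True, True))   # True: symmetric path, state exists for the reply
print(deliver(False, True))  # False: request bypassed, reply looks unsolicited
```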

Q: Which tool should I use to find where an Azure packet is dropped?

Choose the tool by the layer you suspect, which you determine by walking the packet-path map. For name resolution, resolve the name from the source host with nslookup or dig and compare the address. For routing, use Next Hop in Network Watcher to see where a packet to a destination goes, or read the effective route table on the interface. For filtering, use IP Flow Verify to simulate the exact five-tuple and get an Allow or Deny verdict with the deciding rule named, and read the effective security rules for the merged rule set. When you do not yet know which layer to suspect, run Connection Troubleshoot, which performs an end-to-end check and reports the failing hop, then switch to the specific tool for that hop. NSG flow logs answer the historical question of what happened to traffic that failed before you started looking. The tools are precise answers to specific questions, and the model is what tells you which question to ask first.

Q: Why can two virtual machines in the same VNet reach each other without any configuration?

Because reach inside a virtual network is the default behavior. The platform creates a system route for the VNet’s own address range with a next hop that delivers traffic directly within the VNet, and the default Network Security Group rules allow inbound traffic from the VNet. With nothing added, those two defaults combine to let any resource in the VNet reach any other on any port the destination is listening on. This is intentional and reflects the model’s asymmetry: routing grants reach broadly, and you subtract from it with filtering when you want boundaries. If you intended the two machines to be isolated, the configuration you are missing is a Network Security Group rule that denies the traffic, not a change to routing. Confirm the current behavior with IP Flow Verify between the machines, which will show an Allow decided by the default VNet-inbound rule, and then add the scoped deny that expresses the isolation you actually want.
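The default rules and the scoped deny can be modeled together. This is a simplified sketch, not IP Flow Verify itself: `flow_verify` and `source_matches` are hypothetical, ports are ignored, the VirtualNetwork tag is reduced to one assumed address space, and the AzureLoadBalancer tag is reduced to the platform address 168.63.129.16. The default rule names and priorities follow Azure’s defaults.

```python
import ipaddress

VNET = ipaddress.ip_network("10.0.0.0/16")  # assumed address space of the VNet
PLATFORM_IP = "168.63.129.16"               # stand-in for the AzureLoadBalancer tag

# Simplified default inbound rules: (name, priority, access, source selector).
DEFAULT_INBOUND = [
    ("AllowVnetInBound", 65000, "Allow", "vnet"),
    ("AllowAzureLoadBalancerInBound", 65001, "Allow", "lb"),
    ("DenyAllInBound", 65500, "Deny", "any"),
]


def source_matches(selector, src):
    if selector == "any":
        return True
    if selector == "vnet":
        return ipaddress.ip_address(src) in VNET
    if selector == "lb":
        return src == PLATFORM_IP
    return ipaddress.ip_address(src) in ipaddress.ip_network(selector)


def flow_verify(custom_rules, src):
    """Toy inbound check: returns (access, deciding rule name)."""
    for name, _priority, access, selector in sorted(
        custom_rules + DEFAULT_INBOUND, key=lambda r: r[1]
    ):
        if source_matches(selector, src):
            return access, name


print(flow_verify([], "10.0.1.9"))  # ('Allow', 'AllowVnetInBound')
isolate_web = [("deny-from-web-subnet", 200, "Deny", "10.0.1.0/24")]
print(flow_verify(isolate_web, "10.0.1.9"))  # ('Deny', 'deny-from-web-subnet')
```

Check both directions after adding the deny: the web subnet should now fail, and a source elsewhere in the VNet should still pass on `AllowVnetInBound`.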

Q: Does a name resolution failure really look like a network problem?

Yes, and that disguise is why it wastes so much time. When an application cannot resolve a name, it logs something like “could not connect to” the destination, and an engineer reads “could not connect” and starts examining the network. But no packet was ever routed or filtered, because the name never produced an address to aim at, or produced the wrong one. The failure happens upstream of the wire entirely. The way to strip off the disguise is to resolve the name from the application’s own host and look at the answer: a wrong address, a public IP where you expected a private endpoint, or an NXDOMAIN or timeout all point at resolution rather than connectivity. Running that one command before touching routes or rules conclusively separates the resolution layer from the path and permission layers, and it is the highest-return habit in the entire model because the failure so convincingly impersonates a network outage.

Q: What is a next hop, and how do I see which one my traffic uses?

A next hop is the place a packet is sent on its way toward its destination, the answer the routing lookup produces. It has a type, such as VnetLocal for a destination inside the same VNet, VnetPeering for a peered VNet, VirtualNetworkGateway for on-premises through a gateway, VirtualAppliance for a network virtual appliance with an explicit IP, Internet for the default outbound edge, or None for a deliberate or accidental black hole. You see the next hop for a specific destination with the Next Hop feature in Network Watcher, which takes a source and a destination address and returns the type and IP the platform will use, or you read the effective route table on the interface and find the most specific prefix containing your destination. A next hop type of None is an immediate finding: the packet is being dropped at the routing layer, which explains a timeout that has no corresponding Network Security Group denial.

Q: Why is a connectivity failure that affects only certain destinations a routing question?

Because the scope of the failure points at the mechanism. Routing decisions are made per destination prefix, so a fault that affects only some destinations, such as everything on-premises while everything inside Azure works, is almost certainly a route specific to those destinations. Filtering, by contrast, tends to fail per port or per source-and-destination pair rather than cleanly along a whole class of addresses, and name resolution tends to fail for specific names rather than for routable ranges. When you see a failure neatly bounded by a set of destination addresses, the first thing to check is the route for that range: does it exist, does it point at the right next hop, has a BGP advertisement been withdrawn, is a user-defined route shadowing it. Reasoning from the shape of the failure to the likely layer saves a step, and a destination-scoped failure has the shape of a routing problem far more often than a filtering one.
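Reasoning from the shape of a failure can be sketched as a check: do all the failing destinations land on one route that no working destination uses? This is a heuristic sketch, not a diagnostic tool; `shared_failing_route` is hypothetical, and it considers prefixes only, ignoring sources and next hops.

```python
import ipaddress


def shared_failing_route(route_prefixes, failing, working):
    """Return the route prefix that every failing destination matches and no
    working destination matches, or None if the failure is not shaped like one route."""

    def best_match(ip):
        addr = ipaddress.ip_address(ip)
        nets = [ipaddress.ip_network(p) for p in route_prefixes]
        matches = [n for n in nets if addr in n]
        return max(matches, key=lambda n: n.prefixlen, default=None)

    failing_routes = {best_match(ip) for ip in failing}
    if len(failing_routes) != 1:
        return None
    (suspect,) = failing_routes
    if suspect is None or any(best_match(ip) == suspect for ip in working):
        return None
    return str(suspect)


prefixes = ["10.0.0.0/16", "192.168.0.0/16", "0.0.0.0/0"]
print(shared_failing_route(prefixes, ["192.168.1.10", "192.168.7.20"], ["10.0.3.4"]))
# 192.168.0.0/16
```

A non-None answer is where to start reading: does that route still exist, point at the right next hop, and still have its BGP advertisement behind it.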

Q: How do I reason about traffic that passes through a load balancer or application gateway?

Treat the device as a touchpoint that may split one path into two. A layer-four load balancer distributes connections across a backend pool and checks each backend with a health probe, and that probe must be allowed inbound by the backends’ rules, which is why a default rule permits the Azure load balancer; a failing probe marks a backend unhealthy and it silently receives no traffic. A layer-seven application gateway terminates the client connection and re-originates a new one to the backend, so the backend sees the gateway’s address as the source and the path becomes two segments, client to gateway and gateway to backend, each with its own routing and filtering. When a fronted service misbehaves, first decide which segment failed, then apply the route-then-filter analysis to that segment. Treating the device as a single opaque hop is how engineers miss that the front half is healthy while the back half is denied, or that one pool member is unhealthy while the rest serve fine.
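The two-segment view explains a common mistake with application gateways: writing the backend rule for the client’s address. A minimal sketch covering only the filtering half; `allowed` and `check_fronted_path` are hypothetical, each segment’s filter is reduced to a list of allowed source CIDRs, and the addresses are invented.

```python
import ipaddress


def allowed(allow_sources, src):
    """True if `src` falls in any allowed source CIDR (a toy inbound filter)."""
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(c) for c in allow_sources)


def check_fronted_path(client_ip, gateway_ip, frontend_allow, backend_allow):
    """A layer-seven gateway splits one connection into two segments.
    The backend's filter sees the gateway's address, not the client's."""
    if not allowed(frontend_allow, client_ip):
        return "front segment denied (client -> gateway)"
    if not allowed(backend_allow, gateway_ip):
        return "back segment denied (gateway -> backend)"
    return "both segments permitted"


# Backend rule written for the client range: the backend never sees that address.
print(check_fronted_path("198.51.100.7", "10.0.0.10", ["0.0.0.0/0"], ["198.51.100.0/24"]))
# back segment denied (gateway -> backend)
```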

Q: What does the Azure networking fundamentals model intentionally leave out?

The model is a frame for placing faults, not a complete reference for any one layer. It tells you that a problem lives in resolution, routing, or filtering, and which tool inspects that layer, but it does not replace the depth each layer rewards. The detailed precedence among many overlapping user-defined routes, forced tunneling, and BGP route selection belong to the routing deep dive. The full rule-evaluation order across service tags, application security groups, augmented rules, and private endpoint behavior belong to the filtering deep dive. The mechanics of public and private zones, conditional forwarding, and split-horizon resolution belong to the DNS deep dive. The trade-offs among peering, VPN, and a private circuit belong to the connectivity comparison. What the model adds that none of those give alone is the connecting tissue: one path, made of stops, with a fixed order, so you can stand at the whole path and decide in seconds which stop to go deep on.