Azure Front Door vs CDN vs Application Gateway

Three services in Azure sit in front of an application and speak HTTP, and engineers reach for whichever name they heard first. Azure Front Door versus CDN versus Application Gateway is the comparison that decides whether a request enters at the nearest edge location or crosses the internet to a single region, whether a response comes from a cache or from your servers, and where a web application firewall inspects the traffic. Treating the three as interchangeable produces the two failures this comparison exists to prevent: putting a regional gateway where a worldwide entry point belongs, and standing up three products that compete when the right design layers them.

The confusion is reasonable. All three terminate TLS, all three understand paths and host headers, and all three can carry a WAF in some form. Read the marketing pages and they blur together into “the thing that fronts my web app.” The differences that matter are not in the feature checklist. They are structural: scope (does the resource live everywhere or in one region), purpose (does it accelerate and cache content or proxy and balance an application), and placement (where in the path the security and the routing decisions happen). Get those three axes right and the choice becomes mechanical.

This guide treats the three as what they are: a global layer-7 entry point, a global content-caching network, and a regional layer-7 reverse proxy. By the end you will be able to place any incoming need against the right tool by asking two questions in order, and you will know how to compose them when one alone does not cover the requirement.

The global-versus-regional-then-caching rule

Here is the decision rule the rest of this article defends, stated once so you can carry it: the first cut is global versus regional, and the second cut is whether caching is the point. Azure Front Door and Azure CDN are global; Application Gateway is regional. Among the two global options, CDN exists to cache and accelerate content while Front Door exists to route and secure an application at the edge. Of the three, Application Gateway is the one that proxies and load-balances inside a region. Two questions, asked in that order, place the right tool nearly every time.

The rule works because it follows the architecture rather than the brochure. A global service answers from a network of edge locations spread across the world, so the client connects to the nearest one and the distance to your origin stops mattering for the first hop. A regional service is a resource you deploy into one region’s virtual network, and a client in another continent reaches it across the open internet. That single structural fact, where the service physically lives, determines latency behavior, failover scope, and what it takes to make the service worldwide.

Is the first question always global versus regional?

Yes, because scope is the axis you cannot change later without redesigning. A regional Application Gateway never becomes global by configuration; you make it worldwide by deploying one per region and placing a global service in front. Front Door and CDN are global by construction. Decide scope first, then refine.

What does the second question actually decide?

Whether the workload is content delivery or application proxying. CDN caches static assets near users and accelerates dynamic fetches; it is built around the cache. Front Door routes requests to application origins with health probing, path rules, and an edge WAF. If the job is serving files fast, the answer leans CDN; if the job is fronting an app, it leans Front Door.

Can the two questions ever give the wrong answer?

They can under-specify when the design needs both a global entry and regional proxying, which is the composition case rather than a contradiction. The rule still holds: global versus regional selects the outer layer, and a regional Application Gateway sits behind it. The questions choose layers, not a single winner, when the requirement spans both.
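
The two questions, including the composition case, can be sketched as a small function. This is an illustrative sketch only: the function name, its parameters, and the returned labels are hypothetical and do not correspond to any Azure API.

```python
def choose_layers(needs_global_entry: bool,
                  caching_is_the_point: bool,
                  needs_regional_proxy: bool = False) -> list[str]:
    """Return the layers, outermost first, that the two-question rule selects.

    All names are hypothetical; the function only encodes the rule in prose.
    """
    layers = []
    # Question 1: global versus regional selects the outer layer.
    if needs_global_entry:
        # Question 2: among the global options, is caching the whole job?
        layers.append("CDN" if caching_is_the_point else "Front Door")
    # A regional proxy is either the only layer or sits behind the global one.
    if needs_regional_proxy or not needs_global_entry:
        layers.append("Application Gateway")
    return layers


print(choose_layers(True, True))          # ['CDN']
print(choose_layers(True, False, True))   # ['Front Door', 'Application Gateway']
print(choose_layers(False, False))        # ['Application Gateway']
```

The function returns a list rather than a single name because the rule chooses layers, not one winner.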

The value of stating the rule this plainly is that it survives the feature churn. Azure renames things, folds one product’s capabilities into another, and ships new SKUs, but the structural distinction between a worldwide edge network and a region-bound proxy does not move. When a new acceleration feature appears on one of these services, you can place it immediately by asking which layer it belongs to rather than relearning the whole comparison.

How Azure Front Door actually works

Azure Front Door is a global, layer-7 entry point that receives client requests at Microsoft’s edge network and forwards them to your application origins. The mechanism that makes it global is anycast: Front Door advertises its IP addresses from many points of presence simultaneously, and the internet’s routing delivers each client to the nearest healthy edge location. A user in Sydney and a user in Frankfurt hit the same Front Door profile but enter through different physical sites, and from there the request travels toward your origin over Microsoft’s backbone rather than over the public internet.

At that edge location, Front Door makes several decisions before the request ever reaches your servers. It terminates TLS, so the handshake completes close to the user instead of after a transcontinental round trip. It evaluates routing rules, matching the host header and the path against the routes you defined to pick an origin group. It can serve a cached response directly from the edge if caching is enabled and the object is fresh. And it can run a web application firewall policy that inspects the request and blocks malicious patterns at the edge, before the traffic consumes any of your origin’s capacity.

What does Front Door route to?

Front Door routes to origins grouped into origin groups. An origin is a backend endpoint identified by hostname or IP, such as an App Service, a storage account serving static content, a public load balancer, or a regional Application Gateway. The origin group is the unit Front Door health-probes and load-balances across, distributing requests and removing unhealthy members.

The origin group is the heart of Front Door’s resilience. You place two or more origins in a group, Front Door probes each one on a schedule, and it directs traffic only to origins that pass the probe. When you spread origins across regions, Front Door becomes a global load balancer: it sends each user to the closest healthy origin and fails over automatically when a region goes dark. This is the behavior that makes Front Door the natural worldwide entry point for a multi-region application, and it is why a regional gateway cannot fill the same role on its own.
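
The closest-healthy-origin behavior can be modeled in a few lines. This is a deliberately simplified sketch, not Front Door’s actual algorithm: the real service weighs additional settings such as origin priority and weight, and the `Origin` class, its fields, and the sample latencies here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Origin:
    name: str
    region: str
    healthy: bool        # result of the most recent health probe
    latency_ms: float    # latency as measured from this edge location


def pick_origin(origin_group: list[Origin]) -> Optional[Origin]:
    """Return the lowest-latency origin that passed its probe, or None."""
    healthy = [o for o in origin_group if o.healthy]
    if not healthy:
        return None  # nothing to route to; a real edge would return an error
    return min(healthy, key=lambda o: o.latency_ms)


group = [
    Origin("app-us", "eastus", healthy=True, latency_ms=40.0),
    Origin("app-eu", "westeurope", healthy=True, latency_ms=120.0),
]
print(pick_origin(group).name)   # app-us

group[0].healthy = False         # the US region goes dark
print(pick_origin(group).name)   # app-eu
```

The failover needs no operator action in this model: once the probe result flips, the next selection simply skips the failed origin.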

Front Door comes in tiers, and the tier governs which capabilities you get. The Standard and Premium tiers replaced the earlier classic profiles, and the Premium tier adds private connectivity to origins through Private Link, a managed rule set for the WAF, and bot protection. Treat the exact tier feature split and any pricing as values to confirm against the current official Azure documentation at read time, because Microsoft revises tier boundaries and folds features in over time. The durable point is that Front Door is the layer that lives at the edge worldwide and routes to your application, with security and caching available at that same edge.

When you set up Front Door routing rules in detail, the routes, origin groups, and rule sets are where the behavior is defined, and a misconfigured route is the most common reason traffic reaches the wrong origin or returns an error. The end-to-end setup of those routing rules deserves its own treatment, and the steps to define routes, attach origin groups, and order rule precedence are covered in the dedicated walkthrough on configuring Front Door routing rules.

A Front Door failure usually shows up as a 502 from the edge when the origin is unreachable or unhealthy, and diagnosing it means separating an edge problem from an origin problem. The probe configuration, the origin health, and the path the probe takes are the first things to check, and the full diagnosis of why Front Door returns a 502 origin error walks through each cause and its confirming signal.

How Azure CDN actually works

A content delivery network exists to put copies of your content close to users so that a request for a file is answered from a nearby cache rather than from a distant origin. Azure CDN is the global caching and acceleration layer in this comparison. When a user requests an object, the request lands at the nearest CDN edge node. If that node already holds a fresh copy, it serves the object immediately, and the round trip to your origin never happens. If the node does not hold the object, it fetches it once from the origin, stores it according to your caching rules, and serves every subsequent request for that object from the edge until it expires.
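
The hit-or-miss flow described above can be sketched as a toy edge node. This is a minimal model under stated assumptions: the `EdgeCache` class, the fixed time-to-live, and the fake origin are hypothetical, and a real CDN derives freshness from response headers and caching rules rather than a single TTL.

```python
import time
from typing import Callable


class EdgeCache:
    """Toy model of one edge node: serve fresh copies, fetch on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[bytes, float]] = {}  # path -> (body, expiry)

    def get(self, path: str) -> tuple[bytes, str]:
        now = time.monotonic()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"            # fresh copy: origin is not contacted
        body = self._fetch(path)              # miss or expired: one origin fetch
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


origin_calls = []


def fake_origin(path: str) -> bytes:
    origin_calls.append(path)                 # record every request the origin sees
    return f"content of {path}".encode()


edge = EdgeCache(fake_origin, ttl_seconds=60)
for _ in range(3):
    print(edge.get("/img/logo.png")[1])       # MISS, then HIT, then HIT
print(len(origin_calls))                      # 1
```

Three client requests produce one origin fetch, which is the load reduction the edge exists to provide.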

The benefit is two-sided. Users see lower latency because the bytes travel a shorter distance, and your origin sees far less load because the edge absorbs the repeated requests. For a site that serves images, scripts, stylesheets, downloads, or video segments, a CDN can turn the origin from a bottleneck into a service that only handles cache misses and dynamic requests. This is the workload the CDN is shaped around: high-volume, cacheable content delivered to a geographically spread audience.

What is the difference between caching and acceleration?

Caching stores a copy of a response at the edge and serves it without contacting the origin. Acceleration improves the delivery of content that cannot be cached, such as personalized or dynamic responses, by carrying the request over an optimized backbone path and reusing connections rather than re-traversing the open internet for every hop.

When is a CDN the right tool over Front Door?

When the dominant need is serving cacheable content fast to a worldwide audience and you do not need application-level routing, health-based origin failover, or an edge WAF in front of an app. A media library, a static site, or a downloads endpoint fits the CDN’s purpose directly, and reaching for a full application front door would add machinery the workload does not use.

Azure’s CDN story has been consolidating. Microsoft has been folding content-delivery capabilities into Front Door so that a single global edge product handles both application routing and content caching, and some classic CDN profiles and partner offerings have been on a retirement path. Because that lineup changes, verify the current set of CDN products and providers against the official Azure documentation when you choose, rather than assuming a specific profile type still exists. The durable concept is unchanged regardless of the product name: a CDN is the global layer whose reason to exist is caching and acceleration of content, distinct from an application’s routing and proxy needs.

The practical consequence of the consolidation is that the line between “CDN” and “Front Door caching” has blurred at the product level even though the conceptual roles stay separate. Front Door can cache, which means a single resource can serve as both the application’s global entry and its content cache. You still decide by purpose: if caching content is the whole job, the caching-focused option is simplest; if you need routing, failover, and an edge WAF in front of an application and also want caching, the application front door covers both.

How Application Gateway actually works

Application Gateway is a regional, layer-7 reverse proxy that you deploy into a subnet inside your virtual network. Unlike the two global services, it does not live at the edge and it does not span regions. It is a resource bound to one region, sitting on a private network, receiving requests on a frontend IP and forwarding them to a pool of backend targets. Because it lives inside the VNet, it can reach private backends directly, which is one of its defining advantages over a purely public global service.

The request path through Application Gateway is a chain of components, and understanding the chain is what lets you debug it. A listener accepts traffic on a frontend IP and port for a given protocol and hostname. A routing rule ties that listener to a backend pool and a set of backend HTTP settings. The backend pool is the collection of targets, which can be IP addresses, fully qualified domain names, NICs, or scale sets. The backend HTTP setting defines how the gateway talks to those targets: the protocol, the port, the timeout, cookie-based session affinity, and which health probe applies. A health probe checks each backend and the gateway sends traffic only to healthy members.

What is the difference between a backend pool and a backend HTTP setting?

The backend pool is who the targets are, the list of servers or endpoints that can receive traffic. The backend HTTP setting is how the gateway communicates with them: the protocol, the port, the request timeout, the affinity behavior, and the probe. One pool can be reused with different backend HTTP settings, and the two are configured separately for that reason.
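
The who-versus-how separation can be made concrete with a small data model. This is a sketch for illustration: the class and field names are hypothetical and are not the property names used by Azure Resource Manager, and the addresses and hostnames are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendPool:
    name: str
    targets: tuple[str, ...]       # who: IP addresses or FQDNs


@dataclass(frozen=True)
class BackendHttpSetting:
    name: str
    protocol: str                  # how: "Http" or "Https"
    port: int
    request_timeout_s: int
    cookie_affinity: bool
    probe_path: str


@dataclass(frozen=True)
class RoutingRule:
    listener_host: str
    pool: BackendPool
    setting: BackendHttpSetting


pool = BackendPool("app-pool", ("10.0.1.4", "10.0.1.5"))
web = BackendHttpSetting("web-443", "Https", 443, 30, False, "/healthz")
admin = BackendHttpSetting("admin-8443", "Https", 8443, 120, True, "/admin/healthz")

# The same pool is reused; only the "how" differs between the two rules.
rules = [
    RoutingRule("www.example.com", pool, web),
    RoutingRule("admin.example.com", pool, admin),
]
for r in rules:
    print(r.listener_host, "->", r.pool.name, "via", r.setting.name)
```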

Application Gateway routes by path and by host. Path-based rules send requests for different URL paths to different backend pools, so a single gateway can front an images pool, an API pool, and a static pool. Multi-site hosting lets one gateway serve several hostnames, each with its own listener and rule set. The v2 SKU autoscales, supports zone redundancy, and is the current generation for new deployments. The WAF v2 SKU adds an inspecting web application firewall to the same regional proxy. As always, confirm SKU capabilities and any throughput or connection limits against the current official documentation, because the SKU lineup and its numbers change.
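
Path-based routing can be sketched as a prefix lookup. The longest-prefix matching used here is an assumption chosen for clarity; confirm the gateway’s actual pattern syntax and evaluation order in the official documentation. The function and pool names are hypothetical.

```python
def route_by_path(path: str, path_rules: dict[str, str], default_pool: str) -> str:
    """Return the backend pool for a URL path using longest-prefix matching."""
    best_prefix = ""
    best_pool = default_pool
    for prefix, pool in path_rules.items():
        # Prefer the most specific (longest) prefix that matches the path.
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool


example_path_rules = {
    "/images/": "images-pool",
    "/api/": "api-pool",
    "/static/": "static-pool",
}
print(route_by_path("/images/logo.png", example_path_rules, "default-pool"))  # images-pool
print(route_by_path("/api/v1/orders", example_path_rules, "default-pool"))    # api-pool
print(route_by_path("/checkout", example_path_rules, "default-pool"))         # default-pool
```

A request that matches no rule falls through to the default pool, which is why a path map always needs one.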

The comparison engineers most often confuse is Application Gateway against the layer-4 load balancer, because both balance traffic inside a region. The distinction is the layer: Application Gateway understands HTTP, so it can route by URL path and host and terminate TLS, while the load balancer operates on TCP and UDP without seeing the application protocol. That layer-4 versus layer-7 decision inside a region is its own comparison, and the reasoning for choosing the load balancer versus Application Gateway is worked through in the dedicated piece on that pairing.

When Application Gateway carries a WAF, the firewall runs regionally, inside the gateway, inspecting requests after they have crossed the internet to your region. That is a different placement from the edge WAF in Front Door, and the placement has real consequences for how early malicious traffic is stopped. Configuring the WAF on Application Gateway, its policy modes, and its rule tuning are covered in the dedicated guide to setting up Application Gateway WAF.

The InsightCrunch edge decision table

The reference artifact for this comparison is a decision table that maps each common need to the right tool and names the signal that decides it. The table is the fast path: find the row that matches your requirement, read the deciding signal, and you have your answer without rerunning the whole analysis. The claim it encodes is the global-versus-regional-then-caching rule, made concrete for the needs engineers actually bring.

Need	Front Door	CDN	Application Gateway	Deciding signal
Global routing to multi-region app	Yes, primary fit	No	No	Worldwide entry with health-based failover across regions
Cache and accelerate static content	Yes, caching available	Yes, primary fit	No	Content delivery near users; CDN is the caching-first option
Regional layer-7 reverse proxy	No	No	Yes, primary fit	HTTP routing and load balancing inside one region, reaching private backends
Web application firewall at the edge	Yes, edge WAF	Folded into Front Door	No	Inspection at the point of presence before traffic reaches the region
Web application firewall in the region	No	No	Yes, regional WAF	Inspection inside the VNet, close to the backends
Route by URL path to backend pools	Yes, route rules	No	Yes, path rules	Both do path routing; the layer below decides global versus regional
Reach private VNet backends directly	Premium, via Private Link	No	Yes, native	Private connectivity to origins on the internal network
Worldwide low latency for users	Yes, anycast edge	Yes, edge nodes	No	Edge presence shortens the first hop; a regional resource cannot

Read the table as a set of rows you scan, not as a scorecard you total. The wrong way to use it is to count the “Yes” marks and pick the service with the most. The right way is to find the row that names your dominant requirement and follow the deciding signal. If the dominant need is global routing to a multi-region application, the first row settles it. If the dominant need is regional proxying to private backends, the third and seventh rows settle it. The table exists to turn a requirement into a tool in one lookup.

The rows also reveal the composition cases, which the table makes visible and a feature list hides. The path-routing row shows that Front Door and Application Gateway both route by path, which is precisely why a design can use both: Front Door routes globally to the right region, and the Application Gateway in that region routes by path to the right backend pool. The two WAF rows show that you can inspect twice, at the edge and again in the region, when the threat model justifies it. The table is therefore not only a chooser but a map of how the layers stack.

Front Door versus CDN versus Application Gateway: the global cut

The most useful thing to take from this comparison is the global-versus-regional split, because it is the cut that the other distinctions hang from. Front Door and CDN are global services. Each is a single logical resource that is present at edge locations around the world, and a client always enters through the nearest one. There is no “which region is my Front Door in” question, because the answer is everywhere the edge network reaches. Application Gateway is the opposite: it is a regional resource that you place into a specific region’s virtual network, and a client far away reaches it across the open internet with all the latency that implies.

Why does global versus regional decide so much?

Because scope sets latency behavior, failover reach, and the path traffic takes. A global service answers from the nearest edge, so a distant user gets a short first hop and Microsoft’s backbone carries the rest. A regional service answers only from its region, so a distant user crosses the public internet end to end and any failover stays within that region unless something global sits in front.

A worked illustration makes the difference concrete. Suppose your application runs in two regions for resilience, one in the United States and one in Europe, and your users are spread across both continents and Asia. With Application Gateway alone you would deploy a gateway in each region, but nothing decides which gateway a given user should reach or fails them over when a region drops. Each gateway is an island. Put Front Door in front and the picture changes: the user enters at the nearest edge, Front Door health-probes both regional origins, and it sends each request to the closest healthy region and reroutes automatically when one region fails. The regional gateways still do their job inside each region, but the worldwide decision now has an owner.

The reverse mistake is just as common: using a global service when the workload is single-region and internal. If an application serves one region, lives entirely inside a virtual network, and never needs worldwide entry or edge caching, fronting it with a global product adds a layer that buys nothing for that workload and complicates the path. A regional Application Gateway is the right size for a regional application. The global cut cuts both ways: do not regionalize a worldwide need, and do not globalize a regional one.

There is a subtler reason the global cut matters for operations. When the entry point is global and health-aware, a regional outage becomes a reroute rather than an incident, because the global layer stops sending traffic to the failed region on its own. When the entry point is regional, a regional outage takes the application down for everyone routed there, because nothing above it knows to move them. The resilience you can build is bounded by the scope of your outermost layer, which is the strongest practical argument for getting the global-versus-regional decision right before anything else.

Where the web application firewall sits in each

All three can offer a web application firewall in some form, which is exactly why the feature checklist misleads. The question that matters is not whether a WAF exists but where it inspects. Placement changes how early an attack is stopped, how much of your capacity malicious traffic consumes, and how you operate the rules.

Front Door’s WAF runs at the edge. The policy is evaluated at the point of presence where the client connected, which means a blocked request is rejected near the user and never travels to your region or touches your origin’s capacity. For a global application under broad attack, edge inspection is the difference between absorbing the attack at Microsoft’s edge and absorbing it on your own servers. The edge WAF sees traffic for the whole application worldwide because every request enters through the edge.

Application Gateway’s WAF runs regionally, inside the gateway, on the virtual network. A request reaches it only after crossing the internet into your region, so the inspection happens closer to the backends and farther from the attacker. That placement has its own merits: the regional WAF sits in the same network as the application, it can be the single chokepoint for traffic that did not arrive through a global edge, and it inspects requests destined for that region’s backends specifically. The trade-off is that malicious traffic has already consumed a network path into your region before it is rejected.

Should I run a WAF at the edge or in the region?

Run it at the edge when the application is global and you want attacks stopped before they reach your infrastructure. Run it in the region when traffic enters regionally or when the WAF must sit in the same network as the backends. Run both when a global entry feeds regional gateways and the threat model justifies inspecting twice.

The CDN’s place in the WAF story has shifted with the consolidation. Edge security that once attached to CDN profiles now lives in Front Door, which is the global edge product that carries the WAF. Treat the edge WAF as a Front Door capability rather than a CDN one, and verify the current arrangement against the official documentation, since Microsoft continues to consolidate these features. The conceptual placement is stable: edge inspection belongs to the global front door, regional inspection belongs to the regional gateway, and the CDN’s job remains caching rather than application firewalling.

When both layers carry a WAF, the rules are tuned for their position. The edge policy can stop broad, volumetric, and signature-based attacks early; the regional policy can enforce stricter, application-specific rules close to the backends where the application context is clearest. Tuning the regional WAF, choosing detection versus prevention mode, and managing custom rules are covered in the guide to configuring Application Gateway WAF, and the same conceptual modes apply at the edge with the policy attached to Front Door instead.

Caching at the edge versus at the origin

Caching is the capability that most clearly separates the content-delivery role from the application-proxy role. A global edge service can hold a copy of a response at the point of presence and serve it without contacting your origin at all. A regional Application Gateway does not cache; it proxies every request through to a backend. That single difference explains why a content-heavy workload belongs behind a global caching layer and why an application-proxy workload does not gain the same benefit from one.

When caching happens at the edge, the first request for an object travels to the origin, the edge stores the response, and every later request for that object within its freshness window is served from the edge. The origin sees one fetch instead of thousands, and users worldwide get the object from a nearby node. The levers are the cache key, which decides what counts as the same object, and the freshness rules, which decide how long the edge serves a stored copy before revalidating. Query strings, headers, and the path all factor into the cache key, and getting the key wrong either fragments the cache so nothing is reused or collapses distinct responses so users get the wrong one.

Why does my origin still get heavy traffic behind a cache?

Usually because responses are not cacheable as configured or the cache key is too specific. If the origin sends no-cache headers, sets cookies on cacheable assets, or varies on a header that changes per request, the edge cannot reuse a stored copy and forwards everything. A fragmented cache key, such as one that includes a unique query parameter, has the same effect.
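
The effect of an over-specific cache key can be shown with a short sketch. This is an illustration only: the `cache_key` function and its `keep_params` argument are hypothetical, and real edge products expose query-string handling as configuration rather than code.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def cache_key(url: str, keep_params: frozenset[str] = frozenset()) -> str:
    """Build a cache key from host, path, and only the listed query parameters."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in keep_params
    )
    query = urlencode(kept)
    key = f"{parts.netloc.lower()}{parts.path}"
    return f"{key}?{query}" if query else key


a = "https://cdn.example.com/app.js?v=3&session=abc123"
b = "https://cdn.example.com/app.js?session=zzz999&v=3"

everything = frozenset({"v", "session"})
version_only = frozenset({"v"})

# Keeping the per-user parameter fragments the cache: two keys for one object.
print(cache_key(a, everything) == cache_key(b, everything))      # False
# Keeping only the version parameter lets both requests share one entry.
print(cache_key(a, version_only) == cache_key(b, version_only))  # True
```

Sorting the kept parameters also makes the key independent of query-string order, so equivalent URLs map to the same stored copy.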

Caching at the origin is a different layer with a different purpose. An application can cache computed results, database query outputs, or rendered pages in memory or in a distributed cache, which reduces the work the backend repeats. Edge caching and origin caching are complementary rather than alternatives: the edge cuts the network distance and the request volume reaching your region, while the origin cache cuts the computation a backend repeats once a request does arrive. A content workload leans on the edge; a compute-heavy application leans on the origin; many systems use both.

Application Gateway’s lack of a content cache is not an oversight; it reflects its role. A reverse proxy that load-balances and routes inside a region is concerned with which backend handles a request, not with avoiding the backend entirely. If you need both regional proxying and content caching, you do not make Application Gateway cache; you put a caching layer in front of it. That is another instance of composition: the global caching edge sits ahead of the regional proxy, each doing the job it is built for.

How each routes to origins and backends

The three services all forward traffic somewhere, but what they forward to and how they reach it differ in ways that matter for network design. Front Door routes to origins, which are public endpoints by default and can be private endpoints through Private Link on the Premium tier. CDN routes to an origin only on a cache miss, fetching the object once to populate the edge. Application Gateway routes to a backend pool of targets that live in or are reachable from its virtual network, including private IP addresses that never appear on the public internet.

The reachability distinction is the practical crux. A global edge service is, by nature, on the public internet, so by default it reaches origins over public endpoints unless you use the private-connectivity feature of the higher tier. A regional Application Gateway sits inside the virtual network, so it reaches private backends natively, with no public exposure of those backends at all. This is why a common production shape places public-facing global entry at the edge and keeps the actual application servers private behind a regional gateway: the gateway can talk to private IPs that the global layer would otherwise need Private Link to reach.

Can these services reach a private backend that has no public IP?

Application Gateway can natively, because it lives in the virtual network and routes to private addresses directly. Front Door can on the Premium tier through Private Link to supported origin types. A standard global edge service without private connectivity expects a reachable public endpoint, so a fully private backend is reached either through the gateway in the VNet or through the private-link feature.

Health probing is the shared mechanism that turns routing into resilient routing. Front Door probes the origins in an origin group and routes only to healthy ones, which is how it fails over across regions. Application Gateway probes the members of a backend pool and removes unhealthy ones from rotation inside the region. The two probe at different scopes, global and regional, but the principle is identical: do not send a request to a target that has not proven it can answer. A probe path that returns healthy while the application is broken is one of the most common reasons a proxy routes confidently to a dead backend, so design the probe path to reflect real backend health.
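
A probe target that reflects real backend health might look like the following sketch. It assumes a hypothetical health endpoint that runs dependency checks before answering; the function names and the stand-in checks are illustrative and not part of any Azure SDK.

```python
from typing import Callable


def probe_response(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, str]]:
    """Return (status_code, detail) for a health endpoint that tests real dependencies."""
    detail = {}
    for name, check in checks.items():
        try:
            detail[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as unhealthy
            detail[name] = f"error: {exc}"
    healthy = all(state == "ok" for state in detail.values())
    return (200 if healthy else 503), detail


def database_reachable() -> bool:
    return False   # stand-in for a real connectivity check


def cache_reachable() -> bool:
    return True


status, detail = probe_response({"database": database_reachable, "cache": cache_reachable})
print(status, detail)   # 503 {'database': 'failed', 'cache': 'ok'}
```

A static page that always returns 200 would keep this backend in rotation even with its database down; tying the status code to the dependencies is what lets the probe remove it.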

How they compose rather than compete

The framing that does the most damage is treating Front Door, CDN, and Application Gateway as three candidates competing for one slot. In real production designs they layer. The canonical shape for a global, resilient web application places a global edge in front and regional proxies behind it, and each does the part of the job it is built for. Seeing this composition is what finally dissolves the “which one” question into “which one at which layer.”

The reference shape is Front Door in front of regional Application Gateways. Front Door is the worldwide entry point: it terminates TLS at the edge near the user, runs the edge WAF, optionally caches content, health-probes the regional origins, and routes each user to the nearest healthy region. In each region, an Application Gateway receives the traffic Front Door sent, runs its regional WAF, routes by path to the right backend pool, and reaches the private application servers inside the virtual network. The two are not redundant; they own different layers. Front Door owns the global decision of which region and whether to serve from cache; Application Gateway owns the regional decision of which backend and how to reach it privately.

Why would I run both Front Door and Application Gateway?

Because they solve different layers of the same problem. Front Door provides global entry, edge security, worldwide health-based failover, and optional caching; Application Gateway provides regional path routing, a regional WAF, and native access to private backends. A global application that keeps its servers private gets the worldwide layer from Front Door and the regional, private-reaching layer from the gateway.

Composition with the CDN follows the same logic. If the workload is a global application that also serves heavy static content, the global edge can both route the application and cache the content, since the edge product carries both roles after the consolidation. If you keep a separate caching layer, it sits at the same global tier, fronting the content paths while the application paths flow through to the regional proxies. The rule that keeps the design honest is that caching, edge security, and global routing all belong at the global layer, while regional proxying, regional security, and private backend access all belong at the regional layer.

The failure to compose shows up as two recognizable anti-patterns. The first is a regional gateway used where a global entry is needed, which leaves a multi-region application without a worldwide failover owner and without edge inspection or caching. The second is a global edge pointed straight at servers that should have been private behind a regional gateway, which either exposes backends publicly or forces private-connectivity features the design did not plan for. Both are cured by placing each requirement at its proper layer rather than asking one service to cover layers it was not built for.

The failure modes and the diagnostics that expose them

Each layer fails in characteristic ways, and knowing the signature of each failure is what turns a vague “the site is down” into a located cause. The diagnostic discipline is the same one this series applies everywhere: name the distinct causes, find the confirming signal for each, and fix the matching one rather than changing everything at once.

A global edge returning a 502 almost always means the client reached the edge but the edge could not reach your origin, or the origin answered in a way the edge rejected. The confirming signals are the origin’s own health, the probe configuration, and whether the origin’s certificate and hostname match what the edge expects. An origin that is healthy when you hit it directly but fails behind the edge points at the probe path, the host header the edge sends, or a TLS mismatch between edge and origin. Working a 502 from the global edge means separating the edge from the origin and testing each independently, and the full cause-by-cause diagnosis of a Front Door 502 origin error covers the probe, the host header, and the certificate checks in order.

How do I tell an edge problem from an origin problem?

Hit the origin directly, bypassing the edge, with the same host header and path. If the origin answers correctly when reached directly but fails through the edge, the problem is in the edge configuration or the path between edge and origin, such as the probe, the host header rewrite, or a certificate mismatch. If the origin fails when hit directly, the problem is the origin itself.
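The comparison above can be written down as a small decision function. This is a minimal sketch, not a diagnostic tool: `ProbeResult` and `locate_fault` are hypothetical names, and it assumes you have already made the two requests yourself (one straight at the origin, one through the edge, same host header and path) and recorded the status codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one manual HTTP test; status is None when no response arrived."""
    status: Optional[int]


def is_ok(result: ProbeResult) -> bool:
    # Assumption: any 2xx or 3xx counts as a correct answer for this check.
    return result.status is not None and 200 <= result.status < 400


def locate_fault(direct: ProbeResult, via_edge: ProbeResult) -> str:
    """Place the failure at a layer by comparing the direct and edge results."""
    if not is_ok(direct):
        # The origin fails even without the edge in the path.
        return "origin"
    if not is_ok(via_edge):
        # Origin is fine alone: suspect probe, host header, or certificate.
        return "edge-or-path"
    return "none"
```

The order of the checks matters: the origin is tested first because a broken origin makes the edge result uninformative.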

A regional gateway has its own failure signatures. A gateway returning a backend error usually means the backend pool members are failing their health probe, the backend HTTP setting points at the wrong port or protocol, or the path the gateway requests does not exist on the backend. The confirming signal is the backend health view, which tells you whether the gateway considers each member healthy and why a probe failed. A gateway that reports all backends unhealthy while the servers are actually up almost always has a probe configured against a path or port the backend does not serve, or a host header the backend rejects.

The CDN’s characteristic problem is not an outage but a correctness or efficiency issue: stale content served past its intended freshness, a cache that never populates because responses are uncacheable, or a cache key that fragments so badly the hit rate stays near zero. The confirming signal is the cache status on the response, which tells you whether a given request was a hit, a miss, or uncacheable. A low hit rate with mostly uncacheable responses points at the origin’s caching headers; a low hit rate with mostly misses points at a cache key that is too specific.
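The hit, miss, and uncacheable split can be turned into a rough triage rule. The sketch below assumes you have already counted responses by cache status from your own logs; the function name and the 0.5 threshold for a "low" hit rate are illustrative assumptions, not values from any Azure documentation.

```python
def diagnose_cache(hits: int, misses: int, uncacheable: int,
                   low_hit_rate: float = 0.5) -> str:
    """Map cache-status counts to the likely cause of a poor hit rate."""
    total = hits + misses + uncacheable
    if total == 0:
        return "no traffic"
    if hits / total >= low_hit_rate:
        return "healthy"
    # Low hit rate: decide which kind of non-hit dominates.
    if uncacheable > misses:
        # Responses are being refused by the cache: look at origin headers.
        return "origin caching headers"
    # Responses are cacheable but rarely reused: the key fragments too much.
    return "cache key too specific"
```

The threshold is a knob to tune per workload; a mostly dynamic site will never reach a high hit rate and should not be judged by one.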

The unifying diagnostic habit across all three is to locate the failure at a layer before changing anything. Determine whether the problem is global or regional, edge or origin, cache or proxy, and only then apply the fix for that layer. Changing configuration at the wrong layer is how a one-layer problem becomes a multi-layer outage, and the layered model this whole comparison is built on is also the map you use to find where a failure actually lives.

Designing the edge for production

A production design starts from the requirement, not the product, and the global-versus-regional-then-caching rule turns the requirement into a layered shape. Begin by asking whether the application serves users worldwide or within one region. A worldwide application needs a global entry point so that distant users get a short first hop and a regional outage becomes a reroute rather than an incident. A single-region internal application needs only a regional proxy, and adding a global layer buys nothing it will use.

For a worldwide application, the next decision is whether to keep the backends private. Most production designs do, because exposing application servers directly to the internet widens the attack surface and removes a control point. Keeping backends private means a regional proxy that lives in the virtual network has to sit between the global edge and the servers, because the gateway can reach private addresses that a public edge would otherwise need a private-connectivity feature to touch. The shape that results, a global edge in front of regional gateways in front of private backends, is the default for a serious multi-region web application precisely because each layer covers a need the others cannot.

How do I design for a regional outage?

Place a health-aware global entry in front of origins in at least two regions, and let it probe each region and route only to healthy ones. When a region fails its probe, the global layer stops sending traffic there and the remaining region absorbs it. Without a global health-aware layer, a regional outage takes down everyone routed to that region.
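A toy model makes the failover behavior concrete. This is a sketch under stated assumptions: the region names are placeholders, the probe results are supplied as a plain dictionary, and `route` is a hypothetical function rather than anything the edge service exposes.

```python
from typing import Dict, List, Optional


def route(preference: List[str], probe_healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first region in preference order whose probe passed.

    A region with no probe result is treated as unhealthy.
    """
    for region in preference:
        if probe_healthy.get(region, False):
            return region
    # No healthy region left: nothing to route to.
    return None


# Example: the preferred region fails its probe, traffic moves to the other.
health = {"region-a": False, "region-b": True}
chosen = route(["region-a", "region-b"], health)
```

With only one region in the preference list, a failed probe leaves `route` with nothing to return, which is the single-region outage the paragraph describes.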

Caching enters the design where content is heavy and cacheable. If the application serves large volumes of static assets, enabling caching at the global edge cuts both user latency and origin load, and the cache key and freshness rules become part of the design rather than an afterthought. If the application is almost entirely dynamic and personalized, caching contributes little and the edge earns its place through routing, security, and failover instead. The design question is not whether the edge can cache but whether this workload has content worth caching.

Security placement follows the same layered logic. Put the broad, volumetric, signature-based inspection at the edge so that attacks are stopped before they cross into your region and consume capacity. Put the stricter, application-aware rules at the regional gateway, close to the backends where the application context is clearest. For many applications a single WAF at the chosen entry point is enough; for high-value targets, inspecting at both layers is justified by the threat model. The decision is driven by where the traffic enters and how much you want stopped before it reaches your infrastructure.

Finally, design the health probes deliberately, because the probe is what makes the routing resilient rather than merely configured. A probe should hit a path that actually exercises the application’s readiness, not a static page that returns healthy while the application behind it is broken. At the global layer the probe decides regional failover; at the regional layer it decides backend rotation. A probe that lies about health is the quiet cause of a layer that routes confidently to a dead target, and getting the probe right is as much a part of the design as the routing rules themselves. To build and test these layered shapes hands-on, including standing up a global entry, attaching origins, and watching failover behave, you can run the hands-on Azure labs and command library on VaultBook and reproduce each layer of the design end to end.

How the edge layer interacts with the rest of the network

The edge and proxy layers do not exist in isolation; they sit on top of the virtual network, the DNS configuration, the TLS setup, and the backend services, and each of those interactions is a place where a design succeeds or fails. Understanding how the front door layer touches the rest of the network is what lets you reason about why a request behaves the way it does once it leaves the client.

DNS is the first interaction. A global edge service publishes a hostname that resolves through anycast to the nearest edge, and you point your own custom domain at that hostname with a CNAME or the appropriate alias record. A regional gateway has a frontend IP or a DNS name in its region, and you point your domain at that. Getting the DNS layer right is what makes the custom domain actually flow through the intended entry point, and a domain that resolves to the wrong target is a common reason traffic skips the layer you built. The resolution chain, the custom domain binding, and the certificate that matches the domain all have to line up.

How does TLS termination work across the layers?

The global edge terminates the client TLS handshake at the point of presence, close to the user, then connects to the origin over a separate connection, either TLS or plain HTTP depending on configuration. A regional gateway terminates TLS at the gateway and can re-encrypt to the backend for end-to-end TLS. Where TLS terminates decides where the certificate lives and where the handshake latency is paid.

The virtual network interaction is where the regional and global layers differ most sharply. Application Gateway is deployed into a dedicated subnet, and that subnet’s network security rules, route tables, and size all affect how the gateway operates and scales. The gateway needs room to scale instances into its subnet, and a subnet that is too small or that has restrictive rules can throttle or break it. A global edge service is not in your virtual network at all by default; it reaches your origins from the outside, which is why the private-connectivity feature exists for the cases where the origin must stay off the public internet. The mental model is that the regional layer is inside your network and the global layer is outside it reaching in.
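Whether a gateway subnet has room to scale is simple arithmetic, sketched below. The sketch relies on two facts to confirm against current documentation: Azure reserves five addresses in every subnet, and each gateway instance takes one private address, plus one more if a private frontend is configured. The function names and the example CIDR ranges are illustrative.

```python
import ipaddress

# Azure holds back five addresses in every subnet (confirm in current docs).
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Addresses left for resources after Azure's reserved ones."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AZURE_RESERVED_PER_SUBNET, 0)


def subnet_fits(cidr: str, max_instances: int, private_frontend: bool = False) -> bool:
    """True if the subnet can hold the gateway at its maximum scale."""
    needed = max_instances + (1 if private_frontend else 0)
    return usable_addresses(cidr) >= needed
```

Sizing against the maximum instance count, not the current one, is the point: the subnet is hard to resize once the gateway lives in it.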

Backend services interact with the proxy layer through their own readiness and their own limits. A backend that is slow to start, that returns errors under load, or that closes connections aggressively will surface as a proxy-layer failure even though the cause is the backend. The proxy faithfully reports what the backend does, so a gateway returning errors is often a backend problem wearing a gateway’s error code. This is why the diagnostic discipline insists on testing the backend directly before blaming the proxy: the layer that reports the failure is not always the layer that caused it. Reasoning about these interactions, the way the layer below a layer-7 proxy shapes its behavior, is the same reasoning the comparison between the regional load balancer and Application Gateway turns on, and the layered network view ties the whole edge story to the rest of the Azure networking stack.

Choosing among them by cost and need

Cost should follow the need rather than lead it, because choosing the cheapest layer for a need it does not fit produces a design that fails differently and often more expensively. That said, the three services have different cost shapes, and understanding the shape helps you avoid paying for capability you will not use or under-provisioning a layer the workload depends on.

A global edge service generally bills on the data that flows through it and the requests it handles, with tier-based differences in features and rates, and caching can reduce cost by serving from the edge instead of fetching from the origin repeatedly. A regional gateway generally bills on the provisioned capacity and the data processed, with the WAF SKU costing more than the plain SKU because it does more work. A content-caching layer bills on the data served from the edge and the volume of cache fills. Treat every specific rate, unit, and tier price as a value to confirm against the current official pricing, because Azure revises pricing and tier boundaries regularly, and a number that is right today ages quickly.

How do I choose without overpaying?

Match the layer to the dominant need, then size it to the actual traffic rather than a worst-case guess. Do not buy a global edge for a single-region internal app, and do not skip a global layer for a worldwide app to save on the edge, because the regional outage it prevents costs more than the edge. Let caching offset edge cost where content is cacheable.

The false economy to avoid is treating the layers as substitutes on price. A regional gateway is cheaper than a global edge for a single workload in isolation, but using it where a global entry is needed does not save money; it removes the worldwide failover and edge inspection the application required, which surfaces as an outage or an attack absorbed on your own servers. The cost comparison is only valid between options that actually fit the same need, which is why the need question comes first and the cost question second. When two options genuinely fit, then cost, operational simplicity, and the team’s familiarity become the deciding factors, and the cheaper or simpler option wins on the margin.

The durable cost principle mirrors the durable design principle: caching, edge security, and global routing earn their cost at the global layer for a worldwide workload, and regional proxying with private backend access earns its cost at the regional layer. Paying for the right layer is rarely the expensive mistake. Paying for the wrong layer, or skipping a layer the workload needed, is where the real cost shows up, usually as an incident rather than a line on the bill.

Recurring scenarios and the deciding factor in each

The comparison becomes muscle memory once you have walked the recurring scenarios and seen the deciding factor in each. These are the patterns engineers report, framed as a situation and the signal that resolves it, so you can match a new requirement to the closest pattern.

A global application that someone tried to front with a regional gateway. The symptom is that distant users have high latency and a regional outage takes down everyone routed there, with no automatic failover. The deciding factor is scope: the workload is worldwide, so the entry point must be global. The fix is to place a global edge in front of the regional gateways so that users enter at the nearest edge and failover spans regions. The regional gateways stay; they were never the problem, only the wrong outermost layer.

Static content that was being served straight from the origin. The symptom is that the origin carries heavy load serving the same files repeatedly and users far away wait for distant bytes. The deciding factor is that the content is cacheable and the audience is spread out, which is exactly the content-delivery role. The fix is to put a global caching layer in front so the edge serves the files and the origin handles only misses and dynamic requests.

Front Door composing in front of regional gateways. The symptom is not a failure but a design question: how to make a multi-region application both globally routed and regionally proxied to private backends. The deciding factor is that the two needs sit at different layers, global routing and regional private proxying, so the answer is composition rather than choosing one. Front Door owns the global decision and the regional gateway owns the private regional one.

Edge WAF placement for a worldwide app under attack. The symptom is that malicious traffic reaches the region and consumes capacity before being blocked. The deciding factor is where inspection should happen, and for a global application the answer is at the edge so attacks are stopped before they cross into the region. The fix is to attach the WAF policy to the global edge rather than relying solely on a regional WAF.

Caching at the edge versus at the origin for a compute-heavy app. The symptom is a backend repeating expensive computation while edge caching does little because the responses are dynamic. The deciding factor is what kind of work is being repeated: network distance and request volume call for edge caching, repeated computation calls for an origin cache. The fix is to cache at the layer that matches the repeated work, and to use both when the workload has both.

Choosing by global reach and caching need on a fresh design. The symptom is the blank-page question of which service to start with. The deciding factor is the two-question rule applied directly: global versus regional first, then whether caching is the point. A worldwide content site starts with the caching edge; a worldwide application starts with the application edge; a regional internal app starts with the regional gateway. The rule turns the blank page into a starting layer in two questions.
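The two-question rule is small enough to write as a function. This is only a restatement of the rule in code, with hypothetical names; it returns the starting layer, not a complete design, and composition may add layers afterward.

```python
def starting_layer(worldwide: bool, caching_is_the_point: bool) -> str:
    """Apply the two cuts in order: scope first, then caching."""
    if not worldwide:
        # First cut: a regional workload starts at the regional proxy.
        return "regional gateway"
    # Second cut: among global options, is caching the reason to exist?
    return "caching edge" if caching_is_the_point else "application edge"
```

Note that the caching question is never asked for a regional workload, which mirrors the order the rule prescribes.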

How a request flows through the global edge, step by step

Tracing a single request through the global edge end to end is the fastest way to make the abstract roles concrete, because each step is a decision point you can configure and later debug. The flow is the same shape whether the edge is routing an application or serving content, and naming each step gives you the vocabulary to say exactly where a problem lives.

The request begins when the client resolves your custom domain, which through anycast points at the nearest point of presence. The client opens a connection to that edge location and starts a TLS handshake, which the edge terminates locally. Terminating close to the user is the latency win: the expensive handshake completes over a short hop rather than a transcontinental one. Once the secure channel is open, the edge has the decrypted request and can inspect it.

What happens at the edge before my origin sees the request?

The edge terminates TLS, evaluates the web application firewall policy, applies any rule-set transformations such as header rewrites or redirects, checks the cache for a fresh stored copy, and only then selects an origin group and forwards the request if the cache did not answer it. Several decisions complete at the edge before your origin is ever contacted.

After TLS termination the edge evaluates the WAF policy if one is attached, and a blocked request stops here, rejected near the user without ever reaching your region. A request that passes the WAF then runs through any rule sets you configured, which can rewrite headers, redirect, or alter the path before routing. Next the edge consults the cache: if caching is enabled and a fresh copy of the requested object exists at this location, the edge returns it immediately and the origin is never contacted. A cache miss or an uncacheable request moves to routing, where the edge matches the host and path against your routes and selects an origin group.

With an origin group chosen, the edge applies its load-balancing method among the healthy origins in that group and forwards the request across Microsoft’s backbone to the selected origin. The backbone path is the second latency advantage: instead of traversing the public internet hop by hop, the request rides an optimized network from the edge to the origin. The origin processes the request and answers, the edge optionally stores the response in its cache according to your freshness rules, and the response travels back to the client over the connection that is already open. Every step in that chain is something you set, which is why a problem usually localizes to one step rather than to “the edge” as a whole.
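The order of decisions at the edge can be sketched as a short pipeline. This is a simplified model, not the service's implementation: TLS termination is assumed to have happened already, the WAF is a caller-supplied predicate, the cache is a dictionary keyed on host and path, and routes are matched by path prefix with the first match winning. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    host: str
    path: str


def handle_at_edge(
    req: Request,
    waf_blocks: Callable[[Request], bool],
    rules: List[Callable[[Request], Request]],
    cache: Dict[Tuple[str, str], str],
    routes: List[Tuple[str, str]],
) -> Tuple[str, Optional[str]]:
    """Run one request through WAF, rule sets, cache, then routing."""
    if waf_blocks(req):
        return ("blocked", None)          # stops near the user
    for rule in rules:
        req = rule(req)                   # header or path transformations
    key = (req.host, req.path)
    if key in cache:
        return ("cache-hit", cache[key])  # origin never contacted
    for prefix, origin_group in routes:
        if req.path.startswith(prefix):
            return ("forwarded", origin_group)
    return ("no-route", None)
```

Each early return corresponds to a step where a request can end without touching the origin, which is why a problem can usually be pinned to one step.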

The load-balancing method among origins is itself a design lever. Routing by latency sends each user to the lowest-latency healthy origin, which suits an active-active multi-region deployment where every region serves traffic. Routing by priority sends all traffic to a primary origin and fails over to a secondary only when the primary is unhealthy, which suits an active-passive design where the second region is a standby. Weighted distribution splits traffic by assigned weights, which suits gradual rollouts or capacity-based splitting. The method you pick encodes your resilience model, and it interacts directly with the health probe that decides which origins are eligible in the first place.
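The three methods differ only in how they choose among origins that already passed the probe. The sketch below models that with a hypothetical `Origin` record; it assumes a lower priority number means more preferred and that weights are positive, and it is not how the service itself is configured.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Origin:
    name: str
    healthy: bool
    latency_ms: float
    priority: int  # assumption: lower value is preferred
    weight: int    # assumption: positive integer


def _healthy(origins: List[Origin]) -> List[Origin]:
    # The health probe decides eligibility before any method runs.
    return [o for o in origins if o.healthy]


def pick_by_latency(origins: List[Origin]) -> Optional[Origin]:
    """Active-active: lowest-latency healthy origin."""
    return min(_healthy(origins), key=lambda o: o.latency_ms, default=None)


def pick_by_priority(origins: List[Origin]) -> Optional[Origin]:
    """Active-passive: the primary unless it is unhealthy."""
    return min(_healthy(origins), key=lambda o: o.priority, default=None)


def pick_weighted(origins: List[Origin], rng: random.Random) -> Optional[Origin]:
    """Weighted split among healthy origins."""
    candidates = _healthy(origins)
    if not candidates:
        return None
    weights = [o.weight for o in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In every method the unhealthy origin is removed first, so a fast or high-priority origin that fails its probe never receives traffic.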

What changes when origins go private

The default assumption for a global edge service is that origins are reachable on public endpoints, because the edge itself lives on the public internet. Many production designs want the opposite: application servers that have no public exposure at all. The two ways to satisfy a private-origin requirement are the regional gateway inside the virtual network and the private-connectivity feature on the higher edge tier, and choosing between them is a real design decision rather than a formality.

The regional gateway approach keeps the application servers private behind an Application Gateway that lives in the virtual network. The global edge points at the gateway, and the gateway reaches the private backends natively because it shares their network. This is the composition shape, and it has the advantage that the regional layer also gives you path routing and a regional WAF in the same move. The servers never have a public address; only the gateway is reachable, and even the gateway can be fronted so that only the global edge reaches it.

Do I need Private Link if I already use a regional gateway?

Usually not, because the gateway inside the virtual network already reaches private backends and gives the application a private path. Private Link on the edge tier matters when you want the global edge itself to connect privately to an origin without a regional gateway in between, for origin types that support it. The two are alternative ways to keep origins off the public internet.

The private-connectivity feature takes a different route: it lets the global edge connect to a supported origin through a private link rather than over a public endpoint, so the origin can stay private without a regional gateway standing in front of it. This suits designs where you want the global edge to reach a private origin directly and do not need regional path routing or a regional WAF in between. The trade-off is that you give up the regional proxy’s capabilities, so the choice comes down to whether you need that regional layer for routing and security or only need the origin to be private. Confirm which origin types support the private-connectivity feature against the current documentation, since the supported set expands over time.
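The trade-off between the two private-origin mechanisms reduces to a few yes-or-no questions, sketched here. The function and its parameter names are hypothetical, and whether a given origin type supports the private-connectivity feature is an input you must confirm yourself.

```python
def private_origin_mechanism(
    needs_regional_routing: bool,
    needs_regional_waf: bool,
    origin_supports_private_link: bool,
) -> str:
    """Choose how to keep an origin off the public internet."""
    if needs_regional_routing or needs_regional_waf:
        # The regional proxy's capabilities are required anyway.
        return "regional gateway"
    if origin_supports_private_link:
        # Only privacy is needed and the edge can connect privately.
        return "edge private link"
    # Privacy is needed but the edge cannot reach this origin privately.
    return "regional gateway"
```

The regional gateway is the fallback in both directions, which matches its role as the layer that natively shares the backends' network.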

The security consequence of getting this right is significant. An application whose servers have no public IP cannot be attacked directly on the internet; an attacker can only reach the edge or the gateway, where your inspection and rate controls live. An application whose servers were exposed publicly because the design pointed a global edge straight at them has a wider attack surface than the architecture diagram implies. Keeping origins private, by whichever mechanism fits, is one of the highest-value decisions in the whole design, and it is a decision the layered model makes obvious.

The components of a regional gateway in depth

To configure and debug a regional gateway you have to know its parts and how they chain, because every routing decision and every failure traces back to one of them. The chain runs from the frontend, through a listener, into a routing rule, out through a backend HTTP setting, to a backend pool, validated by a health probe. Each part has a job, and a misconfiguration in any one produces a recognizable symptom.

The frontend is the IP and port the gateway listens on, public or private. A listener binds to a frontend IP, a port, a protocol, and optionally a hostname, and it is what accepts an incoming connection. A basic listener serves a single site; a multi-site listener serves several hostnames on the same gateway, each matched by host header. The listener is where TLS terminates when the gateway handles HTTPS, so the certificate for the site lives on the listener. A request that matches no listener is not served, which is a common cause of a request that never reaches a routing rule at all.

Why does my gateway report a backend as unhealthy when the server is up?

Almost always because the health probe targets a path, port, or host the backend does not answer the way the probe expects. A probe hitting a path that returns an error, a port the backend does not serve, or a host header the backend rejects marks the member unhealthy even though the application is running. Align the probe with what the backend actually serves.
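Aligning the probe with the backend is a three-way comparison of path, port, and host. The sketch below models both sides with hypothetical records and lists every mismatch; it assumes you describe by hand what the backend serves, since nothing here queries a real gateway or server.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Probe:
    path: str
    port: int
    host: str


@dataclass
class Backend:
    served_paths: Set[str]
    listening_ports: Set[int]
    accepted_hosts: Set[str]


def probe_mismatches(probe: Probe, backend: Backend) -> List[str]:
    """List each way the probe disagrees with what the backend serves."""
    problems = []
    if probe.port not in backend.listening_ports:
        problems.append("port")
    if probe.host not in backend.accepted_hosts:
        problems.append("host header")
    if probe.path not in backend.served_paths:
        problems.append("path")
    return problems
```

An empty list means the probe is aligned, so an unhealthy member then points at the application itself.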

A routing rule connects a listener to a backend pool through a backend HTTP setting. A basic rule sends everything from its listener to one pool; a path-based rule splits by URL path so different paths reach different pools, which is how one gateway fronts an API, a static path, and an images path at once. The backend pool is the set of targets, and the backend HTTP setting is the contract for talking to them: protocol, port, timeout, session affinity, and the probe to use. Because the pool and the HTTP setting are separate, you can reuse one pool with different settings or apply one setting across pools, and a mismatch between the setting and what the backend actually serves is a frequent source of backend errors.
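A path-based rule can be modeled as an ordered list of patterns, each pointing at a pool and an HTTP setting. This is a simplified sketch: it assumes patterns either end in `*` (prefix match) or match exactly, that the first matching entry wins, and that unmatched paths fall to a default. The pool and setting names are placeholders.

```python
from typing import List, Tuple

# (path pattern, backend pool, backend HTTP setting)
PathRule = Tuple[str, str, str]


def select_pool(path: str, path_rules: List[PathRule],
                default: Tuple[str, str]) -> Tuple[str, str]:
    """Return the (pool, setting) pair for a request path."""
    for pattern, pool, setting in path_rules:
        if pattern.endswith("*"):
            matched = path.startswith(pattern[:-1])
        else:
            matched = path == pattern
        if matched:
            return (pool, setting)
    # No pattern matched: use the rule's default pool and setting.
    return default


rules = [
    ("/api/*", "api-pool", "https-8443"),
    ("/images/*", "image-pool", "http-80"),
]
default_target = ("web-pool", "http-80")
```

Because the pool and the setting travel as a pair, a path can land on the right servers and still fail if the setting names the wrong port or protocol.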

Rewrite rules let the gateway modify request and response headers and the URL as traffic passes through, which covers needs such as adding a header the backend requires or rewriting a path the backend expects in a different form. The v2 generation autoscales by adding capacity as load rises, supports zone redundancy for resilience within a region, and is the current SKU for new deployments. The WAF v2 variant adds the regional firewall to the same proxy. Each of these is a knob that shapes behavior, and knowing which knob does what is the difference between configuring the gateway on purpose and copying a template you cannot debug.

Reading the signals each layer emits

A design you cannot observe is a design you cannot operate, and each layer emits signals that tell you what it is doing. Knowing which signal to read for which question is what turns a guess into a diagnosis, and it is the operational half of the layered model this comparison is built on.

The global edge emits logs and metrics that distinguish what happened at the edge from what happened at the origin. The signal that separates the two is whether a response was served from cache, forwarded to an origin, or blocked by the WAF, and the response carries indicators of the cache outcome and the origin that answered. When a request behaves unexpectedly, the first read is whether the edge handled it locally or passed it through, because that single fact tells you whether to look at edge configuration or origin behavior. The WAF emits its own signal for blocked requests, which is how you tell a security block apart from an application error that happens to look similar.

Which signal tells me if the edge or the origin caused a slow response?

The edge logs distinguish the time spent at the edge from the time waiting on the origin. A response that was slow because the origin was slow shows origin latency in the signal, while a response served from cache shows little origin involvement at all. Reading whether the origin was contacted, and how long it took, localizes the latency to a layer.
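Localizing latency is a subtraction once you have the two timings. The sketch below uses hypothetical inputs, a total request time and an origin wait time in milliseconds plus a cache flag, standing in for whatever fields your edge logs actually expose; check the real field names in your own log schema.

```python
from typing import Optional


def localize_latency(total_ms: float, origin_ms: Optional[float],
                     served_from_cache: bool) -> str:
    """Attribute a slow response to the edge or to the origin."""
    if served_from_cache or origin_ms is None:
        # The origin was not contacted, so the time was spent at the edge.
        return "edge"
    edge_ms = total_ms - origin_ms
    # Whichever side consumed more of the total owns the latency.
    return "origin" if origin_ms > edge_ms else "edge"
```

A single request is a weak signal; applying the same split across many log entries shows which layer dominates.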

The regional gateway emits a backend health view and access logs that report which backend served each request and whether members are passing their probe. The backend health view is the first read for any routing problem, because it tells you directly whether the gateway considers each backend healthy and, when a probe fails, gives the reason. The access log tells you which backend handled a request and how it responded, which is how you confirm that path-based routing is sending traffic where you intended rather than where you assumed.

The content-caching layer emits a cache status on each response that classifies the request as a hit, a miss, or uncacheable, which is the single most useful signal for tuning a cache. A workload with a low hit rate and mostly uncacheable responses has a headers problem at the origin; a workload with a low hit rate and mostly misses has a cache-key problem; a workload serving stale content has a freshness problem. The cache status turns “the cache is not working” into a specific, fixable cause. Across all three layers, the discipline is identical: read the signal that distinguishes the layers before you change anything, and the signal will tell you which layer owns the problem.

Moving from a regional design to a global one

Applications are not always born global. A common trajectory is a single-region application fronted by a regional gateway that grows a worldwide audience and a resilience requirement it did not start with. Knowing how to evolve that design without rebuilding it is the practical payoff of understanding the layers, because the move is additive rather than a teardown.

The starting point is a regional gateway in front of backends in one region, which is the right shape for a single-region application. The first pressure is usually latency for distant users or a resilience requirement that one region cannot meet. The evolution is to add a global edge in front of the existing gateway rather than to replace the gateway. The gateway keeps doing regional path routing and private backend access; the new global layer adds worldwide entry, edge security, and the ability to fail over once a second region exists. The existing investment is preserved, and the design gains a layer rather than swapping one out.

How do I add global reach to a single-region app without rebuilding it?

Place a global edge in front of the regional gateway you already run, pointing the edge at the gateway as an origin. The gateway continues to route regionally to private backends; the edge adds worldwide entry, edge security, and optional caching. When you add a second region with its own gateway, the global edge gains health-based failover across both.

The second region is the step that turns the global layer from a convenience into resilience. Until a second region exists, the global edge improves latency and adds edge security but cannot fail over, because there is only one place to send traffic. Adding a second region with its own gateway and backends gives the global edge two healthy origins to choose between, at which point it becomes the worldwide failover owner the resilient design requires. The routing method you choose, latency for active-active or priority for active-passive, encodes whether the second region serves traffic always or only on failover.

The migration risk to plan for is the cutover of the custom domain and the certificate, because that is where the new global layer actually takes over the traffic. Until the domain points at the global edge, the edge is configured but unused; once it does, every request flows through the new layer, so the edge, its routes, its WAF policy, and its origin health all need to be correct before the switch. Validating the edge against the existing gateway as its origin, before moving the domain, is how you make the cutover a non-event rather than an outage. The layered model makes the whole evolution legible: you are adding the global layer on top of the regional one, not replacing the application’s front end.

The path-routing overlap and why it misleads

The single feature that most often convinces engineers that these services are interchangeable is path-based routing, because both the global application edge and the regional gateway can route by URL path. When the same capability appears on two products, the natural conclusion is that they are alternatives and you should pick the cheaper or more familiar one. That conclusion mistakes a shared feature for a shared role, and untangling it is what makes the composition design click into place.

Path routing at the global edge answers a global question: given a request that just entered at the nearest point of presence, which origin group should handle it, and is that origin group in the right region. The edge matches the host and path against its routes, selects an origin group, and forwards the request across the backbone to a healthy member, possibly in a different region from where the request entered. The path rule at the edge is part of choosing where in the world the request goes and whether the cache or the WAF intervened first.

Path routing at the regional gateway answers a regional question: given a request that has already arrived in this region, which backend pool inside this virtual network should serve it. The gateway matches the path against its rules, selects a backend pool, applies the backend HTTP setting, and reaches the chosen private backend. The path rule at the gateway is part of choosing which server inside one region handles the request and how the gateway talks to it.

The two path rules therefore operate at different scopes on the same request, which is exactly why a design uses both rather than choosing between them. A request can be routed by path at the edge to send a content path to a cache-friendly origin and an application path to a regional gateway, and then routed by path again at that gateway to send an API path to one backend pool and a static path to another. The path-routing overlap is not redundancy; it is the same kind of decision made twice at two scopes, global then regional. Once you see that, the urge to treat the products as substitutes because they share a feature dissolves, and the layered design stops looking like wasted duplication and starts looking like two distinct decisions that happen to use the same mechanism.
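The two-scope idea can be sketched in a few lines of Python. This is an illustration only, assuming simple longest-prefix matching rather than the exact matching semantics of either product, and every name in it (content-origin, regional-gateway, api-pool, static-pool, web-pool) is hypothetical. The same route function is called twice on one request: once with edge rules to pick an origin group, then with gateway rules to pick a backend pool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    prefix: str  # path prefix to match, for example "/api/"
    target: str  # hypothetical origin group or backend pool name


def route(path: str, rules: List[Rule], default: str) -> str:
    """Return the target of the longest matching prefix, or the default."""
    matches = [rule for rule in rules if path.startswith(rule.prefix)]
    if not matches:
        return default
    return max(matches, key=lambda rule: len(rule.prefix)).target


# Global scope: which origin group should receive the request.
EDGE_RULES = [Rule("/static/", "content-origin")]

# Regional scope: which backend pool inside the virtual network serves it.
GATEWAY_RULES = [Rule("/api/", "api-pool"), Rule("/assets/", "static-pool")]


def handle(path: str) -> Tuple[str, Optional[str]]:
    """Apply the edge decision first, then the gateway decision if needed."""
    origin_group = route(path, EDGE_RULES, default="regional-gateway")
    if origin_group != "regional-gateway":
        return origin_group, None  # served by the content origin, no gateway hop
    backend_pool = route(path, GATEWAY_RULES, default="web-pool")
    return origin_group, backend_pool


print(handle("/static/logo.png"))
print(handle("/api/orders"))
```

The mechanism is identical at both levels; only the rule table and the meaning of the target change, which is the point of the paragraph above.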

The related source of confusion is redirect and rewrite handling, which both layers also support. The edge can redirect or rewrite as part of shaping traffic before it leaves the point of presence, and the gateway can rewrite headers and paths as traffic passes through it in the region. Again the capability is shared but the scope differs: an edge redirect happens worldwide near the user, while a gateway rewrite happens regionally near the backend. Deciding where a redirect or a header change belongs follows the same rule as everything else in this comparison, which is to place the operation at the layer whose scope matches the intent, global or regional, and to stop reading a shared feature as evidence that the layers are the same thing.

Closing verdict

Front Door versus CDN versus Application Gateway is not a contest with one winner, and treating it as one is the mistake that produces both the regional gateway used where a global entry belongs and the three products stood up to compete when they should layer. The verdict is the rule: cut on global versus regional first, then on whether caching is the point, and the right layer falls out. Front Door is the global application entry with edge routing, edge security, worldwide failover, and optional caching. CDN is the global content layer whose reason to exist is caching and acceleration, increasingly folded into the same edge product. Application Gateway is the regional layer-7 proxy that routes by path and host inside one region and reaches private backends natively.

The strongest production designs compose them: a global edge in front for worldwide entry, security, and caching, and regional gateways behind it for path routing and private backend access. The deciding factor is always the layer a requirement belongs to, not the longest feature list. Decide scope before anything else, because scope is the one axis you cannot reconfigure later, and let caching and security placement follow from where the traffic enters and what it carries. Get the layers right and the three services stop competing in your head and start cooperating in your architecture.

The reason this rule outlasts the product churn is that it is built on structure rather than features. Microsoft will keep renaming things, folding one product’s capabilities into another, and shipping new SKUs, but the difference between a worldwide network of presence points and a region-bound proxy in a single virtual network does not move. When a fresh acceleration or security feature lands on one of these services, you place it in seconds by asking which layer it belongs to, global or regional, rather than relearning the whole comparison from a new marketing page. The two-question rule is portable across every revision because the questions are about where a service lives and what its purpose is, and neither of those is something a release note changes.

Carry one more habit out of this comparison: the same layered map that chooses your tools also locates your failures. When something breaks, ask whether the problem is global or regional, edge or origin, cache or proxy, and read the signal that distinguishes those layers before changing a single setting. A worldwide application that reroutes around a dead region, keeps its servers private behind a regional proxy, and inspects attacks at the edge is not running three competing products; it is running one coherent design in which each layer does the job it was built for. That coherence, not a feature count, is what tells you the choice was right.

Frequently Asked Questions

Q: Front Door versus CDN versus Application Gateway, which should I choose?

Choose by asking two questions in order. First, is the workload global or regional? Front Door and CDN are global and live at edge locations worldwide; Application Gateway is regional and lives in one region’s virtual network. Second, among the global options, is caching the point? CDN exists to cache and accelerate content, while Front Door exists to route and secure an application at the edge with health-based failover. A worldwide application that fronts servers picks Front Door, a worldwide content site that serves files picks the caching layer, and a single-region application that proxies to private backends picks Application Gateway. Many real designs do not pick one at all; they layer a global edge in front of regional gateways, because the requirement spans both scopes. Treat the choice as selecting the right layer for each need rather than crowning a single winner.
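For readers who think in code, the two-question rule reduces to a small function. The sketch below is only a mnemonic, not a sizing or selection tool; the function names, the Layer enum, and the needs_regional_proxy flag are assumptions introduced for illustration.

```python
from enum import Enum
from typing import List


class Layer(Enum):
    FRONT_DOOR = "Front Door"
    CDN = "CDN"
    APPLICATION_GATEWAY = "Application Gateway"


def choose_layer(is_global: bool, caching_is_the_point: bool) -> Layer:
    """Question one: global or regional. Question two: is caching the point."""
    if not is_global:
        return Layer.APPLICATION_GATEWAY
    if caching_is_the_point:
        return Layer.CDN
    return Layer.FRONT_DOOR


def plan_layers(is_global: bool, caching_is_the_point: bool,
                needs_regional_proxy: bool) -> List[Layer]:
    """Layer a regional gateway behind the global edge when both scopes apply."""
    entry = choose_layer(is_global, caching_is_the_point)
    layers = [entry]
    if entry is Layer.FRONT_DOOR and needs_regional_proxy:
        layers.append(Layer.APPLICATION_GATEWAY)
    return layers


# A worldwide application with private regional backends layers both.
print([layer.value for layer in plan_layers(True, False, True)])
```

The order of the checks matters: scope is tested before caching, mirroring the order in which the questions are asked.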

Q: Which of these edge options are global and which are regional?

Front Door and CDN are global services. Each is a single logical resource present at points of presence around the world, and a client normally enters through the nearest one, so there is no region to assign. Application Gateway is regional: you deploy it into a specific region’s virtual network, and clients elsewhere reach it across the public internet. This scope difference is the axis you cannot reconfigure later, which is why it comes first in any decision. A regional gateway does not become global by changing a setting; you make a regional design worldwide by deploying a gateway per region and placing a global service in front to route across them. Scope determines latency behavior, failover reach, and the path traffic takes, so deciding global versus regional settles more of the design than any feature comparison does.

Q: Which of the three provide caching and content acceleration?

The global edge services provide caching and acceleration; the regional gateway does not. A CDN is built around the cache: it stores a copy of an object at the nearest edge node and serves later requests from there, so the origin handles only misses and dynamic requests. Acceleration improves delivery of content that cannot be cached by carrying it over an optimized backbone and reusing connections. Front Door can also cache, since Microsoft has folded content-delivery capabilities into the same edge product, so a single global resource can serve as both an application entry and a content cache. Application Gateway is a reverse proxy that forwards every request to a backend and holds no content cache, because its job is choosing which backend handles a request rather than avoiding the backend. If caching content is the dominant need, the caching-first global layer is the simplest fit.
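The hit-or-miss behavior described above can be modeled in a few lines. This is a toy single-node cache with a fixed time-to-live, assuming nothing about how a real edge honors cache-control headers, query strings, or purges; EdgeCache, fetch_from_origin, and the 60-second default are all hypothetical.

```python
import time


class EdgeCache:
    """Toy model of one edge node: serve hits locally, fetch misses from origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}       # path -> (expires_at, body)
        self.origin_hits = 0   # how many requests actually reached the origin

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]    # cache hit: the origin is never contacted
        # Cache miss or expired entry: fetch once and remember the result.
        self.origin_hits += 1
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body


cache = EdgeCache(lambda path: "body of " + path)
cache.get("/logo.png")
cache.get("/logo.png")
print(cache.origin_hits)
```

A reverse proxy with no cache is the same class with the store removed: every call to get would reach the origin, which is the contrast the answer draws with Application Gateway.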

Q: Which offer a web application firewall, and where does it inspect?

Front Door and Application Gateway both offer a WAF, but they inspect in different places, and placement is what matters. The Front Door WAF runs at the edge, at the point of presence where the client connected, so a blocked request is rejected near the user and never reaches your region or consumes origin capacity. The Application Gateway WAF runs regionally, inside the gateway on the virtual network, so it inspects after traffic has crossed into your region, closer to the backends. Edge inspection stops broad attacks before they cross into your infrastructure; regional inspection sits in the same network as the application and enforces stricter, context-aware rules. The CDN’s edge security has folded into Front Door, so treat the edge WAF as a Front Door capability. High-value applications sometimes inspect at both layers, with broad rules at the edge and application-specific rules in the region.

Q: How does each route to origins or backends?

Front Door routes to origins collected into origin groups. Those origins are public endpoints by default and can be private through Private Link on the Premium tier. Front Door health-probes each group and load-balances across its healthy members, including across regions. CDN routes to an origin only on a cache miss, fetching an object once to populate the edge. Application Gateway routes to a backend pool of targets reachable from its virtual network, including private IP addresses with no public exposure, using listeners and routing rules to direct traffic and backend HTTP settings to define how it talks to each target. The reachability difference is the practical crux: the regional gateway reaches private backends natively because it lives in the network, while a global edge reaches public endpoints unless you add private connectivity. Health probing turns routing into resilient routing at both scopes.

Q: How do I choose among them by cost and need?

Let need lead and cost follow. Match the layer to the dominant requirement first, then size it to the actual traffic rather than a worst-case guess. A global edge generally bills on data and requests with tier-based features, and caching can offset its cost by serving from the edge instead of fetching repeatedly. A regional gateway generally bills on provisioned capacity and data processed, with the WAF SKU costing more because it does more. The false economy is treating the layers as price substitutes: a regional gateway is cheaper in isolation, but using it where a global entry is needed removes the worldwide failover the workload required, which surfaces as an outage rather than a saving. Cost only decides between options that genuinely fit the same need. Confirm every rate and tier price against current official pricing, because Azure revises both regularly.

Q: Can I use Front Door and Application Gateway together, and which goes in front?

Yes, and the global edge goes in front. The canonical design for a resilient global application places Front Door as the worldwide entry point and regional Application Gateways behind it. Front Door terminates TLS at the edge, runs the edge WAF, optionally caches, health-probes the regional origins, and routes each user to the nearest healthy region. In each region, an Application Gateway receives that traffic, runs its regional WAF, routes by path to the right backend pool, and reaches the private application servers inside the virtual network. The two are not redundant; they own different layers. Front Door owns the global decision of which region and whether to serve from cache, and the gateway owns the regional decision of which backend and how to reach it privately. Pointing the global edge at each region’s gateway as an origin is what stitches the layers together.

Q: Can Application Gateway serve a global audience on its own?

Not on its own. Application Gateway is a regional resource bound to one region’s virtual network, so a user on another continent reaches it across the open internet with the full latency that implies, and a regional outage takes down everyone routed there with no automatic failover. You can deploy a gateway in several regions, but nothing then decides which gateway a given user should reach or moves them when a region fails, so each gateway is an island. To serve a global audience with regional gateways, you place a global service in front to route each user to the nearest healthy region and to fail over automatically. The gateway is the right tool for regional layer-7 proxying, including reaching private backends, but worldwide entry and cross-region failover belong to a global layer above it.

Q: What happened to Azure CDN, and is it merging into Front Door?

Microsoft has been consolidating its content-delivery capabilities into Front Door so that one global edge product handles both application routing and content caching, and some classic CDN profiles and partner offerings have been on a retirement path. The practical effect is that the line between a standalone CDN and Front Door caching has blurred at the product level, even though the conceptual roles stay separate: a CDN is the global layer whose reason to exist is caching and acceleration, distinct from an application’s routing needs. Because this lineup changes, verify the current set of content-delivery products and providers against the official Azure documentation when you choose, rather than assuming a specific profile type still exists. The durable concept survives the product churn: caching and acceleration of content is a global-layer job, and you decide by purpose regardless of which product currently carries it.

Q: Which option reaches a private backend that has no public IP?

Application Gateway reaches private backends natively, because it is deployed into the virtual network and routes to private addresses directly with no public exposure of those backends. Front Door can reach private origins on its Premium tier through Private Link for supported origin types. A standard global edge without private connectivity expects a reachable public endpoint, so a fully private backend is served either through the regional gateway in the virtual network or through the edge’s private-connectivity feature. The most common production shape keeps application servers private behind a regional gateway and points the global edge at the gateway, which gives private backend access and regional routing in one move. Choosing Private Link on the edge instead suits designs that want the global edge to connect privately without a regional gateway in between, at the cost of the gateway’s path routing and regional WAF.

Q: Do I still need a separate CDN if Front Door can cache?

Often not. Because Front Door can cache content at the edge, a single global resource can serve as both your application’s worldwide entry and its content cache, which removes the need for a separate caching product in many designs. You would keep a distinct caching layer when the workload is almost entirely content delivery with no application routing, failover, or edge WAF requirement, in which case the caching-first option is the simplest fit and the full application front door would add machinery the workload never uses. The decision is by purpose: if the job is fronting an application that also serves cacheable content, one edge product covers both; if the job is purely serving files fast to a worldwide audience, the caching-focused layer alone is enough. Either way, caching is a global-layer capability, so the choice is which global product carries it.

Q: Where should TLS terminate when I stack these layers?

The global edge terminates the client TLS handshake at the point of presence, close to the user, which is a deliberate latency win because the expensive handshake completes over a short hop rather than a transcontinental one. From there the edge opens its own connection to its origin, which can be the regional gateway. The regional gateway terminates TLS again and can re-encrypt to the backend for end-to-end encryption inside the region. Where TLS terminates decides where each certificate lives and where the handshake latency is paid, so plan the certificate for the custom domain on the layer that faces the client and any backend certificates on the layers behind it. End-to-end TLS is a choice you make when the traffic must stay encrypted all the way to the backend rather than only between the client and the public-facing layers.

Q: How does failover differ between a global edge and a regional gateway?

The global edge fails over across regions; the regional gateway fails over across backends within one region. Front Door health-probes the origins in an origin group, which you spread across regions, and routes only to healthy ones, so a regional outage becomes a reroute to the surviving region rather than an incident. Application Gateway health-probes the members of a backend pool and removes unhealthy ones from rotation inside its region, so it survives a backend failure but not the loss of its whole region. The scopes stack: the global layer owns cross-region resilience and the regional layer owns within-region resilience. The resilience you can build is bounded by the scope of your outermost layer, which is why a multi-region application needs a global health-aware entry on top of its regional gateways rather than relying on the gateways alone.
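The stacked scopes can be sketched as two nested health checks. The model below is a simplification: it assumes a region counts as a healthy origin only if its gateway is reachable and at least one backend passes its probe, which in practice depends on how the probe path is configured. The Region class, the serve function, and the region and backend names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Region:
    name: str
    reachable: bool = True                       # False models a regional outage
    backends: Dict[str, bool] = field(default_factory=dict)  # name -> probe result

    def healthy_backends(self) -> List[str]:
        """Regional scope: the gateway drops unhealthy backends from rotation."""
        if not self.reachable:
            return []
        return [name for name, ok in self.backends.items() if ok]


def serve(regions: List[Region]) -> Optional[Tuple[str, str]]:
    """Global scope: the edge skips any region with no healthy backend."""
    for region in regions:  # list order stands in for edge preference
        healthy = region.healthy_backends()
        if healthy:
            return region.name, healthy[0]
    return None


regions = [
    Region("region-a", backends={"vm1": True, "vm2": True}),
    Region("region-b", backends={"vm3": True}),
]
regions[0].backends["vm1"] = False   # backend failure: handled inside region-a
print(serve(regions))
regions[0].reachable = False         # regional outage: handled by the edge
print(serve(regions))
```

Remove the outer loop and the second failure has no answer, which is the sense in which resilience is bounded by the scope of the outermost layer.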

Q: Is an edge WAF enough, or do I also need a regional WAF?

It depends on where traffic enters and how much you want stopped before it reaches your infrastructure. For a global application whose traffic all arrives through the edge, an edge WAF can be enough, because it inspects every request near the user and blocks attacks before they cross into your region. You add a regional WAF when the threat model justifies inspecting twice, when some traffic enters regionally rather than through the global edge, or when the firewall must sit in the same network as the backends to enforce application-specific rules with full context. Many applications run a single WAF at their chosen entry point; high-value targets run both, with broad volumetric and signature rules at the edge and stricter context-aware rules in the region. The deciding factor is the traffic’s entry path and the value of stopping attacks earlier versus closer to the application.
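A two-layer inspection can be pictured as two rule lists evaluated in order, with the edge list first. The rules below are toy predicates invented for illustration; they are not managed rule sets and do not reflect how either product expresses WAF policy. The request is a plain dictionary with hypothetical keys (path, query, user_agent, source).

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, str]
WafRule = Tuple[str, Callable[[Request], bool]]

# Broad signature-style checks that make sense near the user.
EDGE_RULES: List[WafRule] = [
    ("scanner-user-agent", lambda r: "sqlmap" in r.get("user_agent", "").lower()),
    ("sql-injection-signature", lambda r: "' or 1=1" in r.get("query", "").lower()),
]

# Application-specific checks that need context from inside the region.
REGIONAL_RULES: List[WafRule] = [
    ("admin-from-outside",
     lambda r: r.get("path", "").startswith("/admin") and r.get("source") != "internal"),
]


def inspect_request(request: Request) -> Optional[Tuple[str, str]]:
    """Return (layer, rule) for the first rule that blocks, or None if allowed."""
    for layer, rules in (("edge", EDGE_RULES), ("regional", REGIONAL_RULES)):
        for name, matches in rules:
            if matches(request):
                return layer, name
    return None


print(inspect_request({"path": "/", "user_agent": "sqlmap/1.7"}))
print(inspect_request({"path": "/admin/users", "source": "internet"}))
```

A request blocked at the edge never reaches the regional list, which is the capacity argument for inspecting early; a request that only the regional list catches shows why some designs keep both.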

Q: Does adopting Front Door remove the need for Application Gateway?

No, not when the design needs regional layer-7 proxying or private backend access. Front Door and Application Gateway solve different layers: Front Door provides global entry, edge security, worldwide failover, and optional caching, while Application Gateway provides regional path routing, a regional WAF, and native access to private backends inside the virtual network. A global application that keeps its servers private still needs the gateway to reach those private backends and to route by path within each region, with Front Door supplying the worldwide layer on top. Front Door does replace the gateway when the design has no need for regional proxying or private access, for example when origins are public endpoints and global routing with edge security is the whole requirement. The question is whether you need the regional layer at all, not whether the global layer can technically reach an origin.

Q: When is a single regional Application Gateway enough on its own?

A single regional gateway is enough when the application serves one region, lives inside a virtual network, and needs no worldwide entry, edge caching, or cross-region failover. For a regional internal application or a workload whose users are concentrated in one region, the gateway provides exactly the right capabilities: layer-7 routing by path and host, TLS termination, a regional WAF, and native access to private backends, without the extra layer a global service would add. Reaching for a global edge in that case buys nothing the workload uses and complicates the path. The signal that you have outgrown a single gateway is a worldwide audience suffering latency, or a resilience requirement that one region cannot meet, at which point you add a global edge in front rather than replacing the gateway. Right-sizing the entry to the workload’s scope is the whole point of the rule.

Q: How do I decide between routing by latency and routing by priority at the edge?

Routing by latency sends each user to the lowest-latency healthy origin, which suits an active-active multi-region deployment where every region serves traffic at once and you want each user on the closest one. Routing by priority sends all traffic to a designated primary origin and fails over to a secondary only when the primary is unhealthy, which suits an active-passive design where the second region is a standby rather than a constant participant. The choice encodes your resilience and capacity model: active-active spreads load and uses every region, while active-passive keeps a warm standby and a simpler traffic picture. Weighted distribution is a third option, splitting traffic by assigned weights for gradual rollouts or capacity-based splitting. The routing method interacts directly with the health probe, since only origins that pass the probe are eligible, so a sound probe is what makes any of these methods behave as intended.
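The three methods differ only in how they pick from the set of healthy origins, which a short sketch makes concrete. It treats each method in isolation and assumes positive weights; a real edge may combine priority, latency, and weight in one decision, so read this as a model of the ideas rather than of the product. The Origin fields and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Origin:
    name: str
    healthy: bool      # result of the health probe
    latency_ms: float  # measured latency from the edge
    priority: int      # lower number is preferred
    weight: int        # share of traffic for weighted distribution


def eligible(origins: List[Origin]) -> List[Origin]:
    """Only origins that pass the probe take part in any method."""
    return [o for o in origins if o.healthy]


def by_latency(origins: List[Origin]) -> Optional[Origin]:
    candidates = eligible(origins)
    return min(candidates, key=lambda o: o.latency_ms) if candidates else None


def by_priority(origins: List[Origin]) -> Optional[Origin]:
    candidates = eligible(origins)
    return min(candidates, key=lambda o: o.priority) if candidates else None


def by_weight(origins: List[Origin], rng=random) -> Optional[Origin]:
    candidates = eligible(origins)
    if not candidates:
        return None
    return rng.choices(candidates, weights=[o.weight for o in candidates], k=1)[0]


origins = [
    Origin("primary", healthy=True, latency_ms=80.0, priority=1, weight=90),
    Origin("secondary", healthy=True, latency_ms=20.0, priority=2, weight=10),
]
print(by_latency(origins).name, by_priority(origins).name)
```

Every method starts from the same eligible filter, so a probe that reports a broken origin as healthy undermines all three equally.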

Q: Why does fronting a regional app with a global edge sometimes add no value?

Because a global edge earns its keep through worldwide entry, edge caching, edge security, and cross-region failover, and a single-region internal application with concentrated users exercises none of those. If the audience is in one region, the latency win from a nearby edge is small. If the content is dynamic and personalized, caching contributes little. If there is only one region, the global layer cannot fail over because there is nowhere to fail over to. In that situation the edge adds a layer to operate and debug without supplying a capability the workload uses, which is the mirror image of the more common mistake of regionalizing a global need. The value of a global edge scales with how global, cacheable, attacked, or multi-region the workload actually is. Match the entry layer to the workload’s real scope, and add the global layer when the workload grows into needing it rather than by default.