A resource in Azure produces telemetry whether or not anyone is listening. The platform meters it, generates platform metrics, and writes audit records to the activity log automatically. The detailed signal (the per-request logs, the firewall verdicts, the gateway access entries, the database query statistics) stays inside the resource and is discarded unless you have told the platform where to send it. That instruction is the diagnostic setting. When you configure diagnostic settings correctly, every resource forwards its logs and metrics to a destination you control, and the data is there the day an incident starts. When you skip the configuration, the resource looks healthy in the portal blade right up to the moment you open a Log Analytics query during an outage and discover the table is empty for the window you most need.

That gap is the most expensive misconfiguration in Azure observability, and it is silent. Nothing warns you that a resource has no diagnostic setting. The blade renders, the metrics charts populate from the always-on platform metrics, and the resource serves traffic. The absence only surfaces later, when an auditor asks for ninety days of access logs you never captured, or when a 3 a.m. page sends you to a query that returns no rows because the network security group was never wired to forward its logs. Getting diagnostic settings right is not a nice-to-have hardening step you schedule for a quiet sprint. It is the precondition for every other thing you will try to do with Azure Monitor, and the cost of getting it wrong is paid entirely in the future, at the worst possible time.
This guide walks the full configuration end to end. You will learn what a diagnostic setting actually is and what it is not, the destinations a setting can target and how to choose among them, how to pick the log categories that matter without drowning in the ones that do not, and the verification step that proves data is flowing rather than assuming it. Then it moves to the part that separates a tidy demo from a governed estate: applying diagnostic settings at scale with Azure Policy so that new resources are covered the moment they are created, rather than the day someone remembers to click through a blade. Throughout, the focus stays on the decisions that the defaults get wrong and the ingestion bill that careless category selection produces.
What a diagnostic setting is, and what it is not
A diagnostic setting is a small configuration object attached to a single Azure resource. Its job is narrow and specific: it names which categories of that resource’s diagnostic data should be exported, and it names one or more destinations the data should be exported to. Nothing more. It does not generate the data, it does not store the data, and it does not analyze the data. It is a pipe. The resource emits diagnostic data on the inside; the diagnostic setting connects that internal emission to an external destination so the data leaves the resource and lands somewhere you can keep and query it.
The word “pipe” matters because it shapes every later decision. A pipe carries flow in one direction from one source to one or more sinks. A diagnostic setting attaches to exactly one resource. If you have forty storage accounts, you have forty diagnostic settings to think about, one per account, because each account is its own source and each needs its own pipe. There is no parent setting that covers a resource group or a subscription by inheritance. This single fact is the root of most diagnostic-settings pain in real estates, and the entire second half of this article exists to answer it. For now, hold the model in mind: one resource, one pipe, configured deliberately.
The data that flows through the pipe comes in two broad shapes, and the distinction governs what you can do with it downstream. The first shape is logs, which are event records. A log entry describes a discrete thing that happened: a request arrived, a rule allowed or denied a packet, a query ran and took a certain number of milliseconds, a key was read from a vault. Logs are categorical, which is to say each resource type defines its own named categories of log, and you choose which categories to export. The second shape is metrics, which are numeric time series. A metric is a value sampled at an interval: CPU percentage every minute, request count per minute, used capacity in bytes. Platform metrics are collected automatically and retained for a rolling window without any diagnostic setting at all, but a diagnostic setting lets you route those metrics to the same destination as the logs so that the two live together and can be correlated in a single workspace.
Why does a resource show metrics but no logs?
Because platform metrics are always collected and retained for ninety-three days regardless of configuration, while logs require a diagnostic setting to leave the resource. A brand-new resource with no diagnostic setting will populate its metrics charts immediately and return nothing from a log query. The charts are not proof that logging works.
This asymmetry trips up nearly everyone the first time. An engineer opens a freshly deployed Application Gateway, sees the metrics blade rendering request counts and backend health, and reasonably concludes that monitoring is on. Then an application owner reports intermittent 502 responses, the engineer opens Log Analytics to read the access log, and there is nothing there. The metrics were never the logs. They came for free from the platform’s always-on collection. The access log, which is what actually tells you which backend returned the 502 and why, was waiting on a diagnostic setting that nobody created. The lesson is to treat the populated metrics chart as meaningless evidence about logging state and to verify log flow explicitly, which is a step this guide covers in detail later.
There is one more category of telemetry that sits adjacent to diagnostic settings and is constantly confused with them: the activity log. The activity log is a subscription-level record of control-plane operations, the writes and deletes and role assignments that change your resources, as opposed to the data-plane operations inside a component. It is collected automatically and kept for ninety days in the platform, and it is not configured through a per-resource diagnostic setting. It has its own export path, the diagnostic setting on the activity log itself, configured once at the subscription scope. Conflating the activity log with resource logs is one of the most common reasons an engineer believes a setting exists when it does not, and a later section pulls the two apart carefully.
The pipe-and-scale rule
Here is the claim this article is built around, stated plainly so you can quote it and act on it: a diagnostic setting is the pipe for exactly one resource, so consistent telemetry across a subscription comes from enforcing settings at scale with Azure Policy, never from clicking through each resource. Call it the pipe-and-scale rule. The first half names the mechanism, the per-resource pipe. The second half names the only configuration approach that survives contact with a real, growing environment, the policy-driven enforcement that applies the pipe to every resource automatically.
The rule exists because the per-resource nature of the setting and the per-resource nature of human attention do not match. When you create a resource in the portal, the diagnostic setting is not part of the creation wizard for most services. You deploy the component, it starts serving, and the diagnostic setting is a separate action on a separate blade that you have to remember to perform. People are good at the first action because it is the point of the work and bad at the second action because it is invisible plumbing. The result, in every estate that relies on manual configuration, is drift: some resources have settings, some do not, the ones that do were configured at different times with different categories, and nobody can say with confidence which resources are covered. The portal will not tell you, because there is no single view that lists every service and its diagnostic-setting status without a deliberate query.
Drift is not a tidiness problem. It is a coverage problem, and coverage gaps in telemetry are invisible until an incident lands on the one resource that was missed. The economics are brutal precisely because the failure is silent and delayed. A manual process that achieves ninety-five percent coverage feels like success, but the five percent that slipped through is exactly the set of resources whose logs you will eventually need, by the iron law that the component you forgot to instrument is the one that breaks. Enforcement at scale is the answer not because it is elegant but because it changes the default. With a policy in place, a new asset is covered the instant it exists, with no human in the loop to forget. The pipe still attaches to one resource, but the attaching is automatic and universal.
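The coverage arithmetic is easy to make concrete. The following is a minimal sketch, not a supported tool: it assumes you have already produced an inventory with one line per resource in the hypothetical format `<resource-id> <diagnostic-setting-count>`, and it reports the coverage figure along with the resources that have no setting at all. The function name `coverage_report` and the inventory format are assumptions made for illustration.

```shell
# Assumed inventory format (hypothetical): "<resource-id> <diagnostic-setting-count>" per line.
# Collecting it, for example by looping `az monitor diagnostic-settings list`, is left out.
coverage_report() {
  awk '
    { total++ }
    $2 > 0  { covered++ }
    $2 == 0 { print "UNCOVERED " $1 }
    END {
      if (total > 0)
        printf "covered %d of %d (%.1f%%)\n", covered, total, 100 * covered / total
    }
  '
}

# Placeholder IDs stand in for a real inventory.
printf '%s\n' \
  "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct> 1" \
  "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault> 0" \
  "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/applicationGateways/<gw> 2" \
  | coverage_report
```

The percentage is the number that feels like success; the `UNCOVERED` lines are the ones that matter, because they name the resources an incident will eventually land on.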
The rest of this guide treats the pipe-and-scale rule as the spine. The single-resource configuration sections teach you the pipe so that you understand exactly what the policy will apply, because a policy that deploys a setting you do not understand is worse than no policy at all. The scale section then shows you how to turn that understood, deliberate setting into a fleet-wide standard that defends itself against drift.
The InsightCrunch diagnostic-settings checklist
Before the detail, here is the findable artifact this article centers on, a checklist you can run for any service or any fleet. Each row names the decision, the action, and the gotcha that bites people at that step.
| Step | Decision | Action | The gotcha at this step |
|---|---|---|---|
| 1. Pick categories | Which log categories and whether to send all metrics | Read the resource type’s available categories, select the ones tied to your diagnostic questions | Selecting every category “to be safe” multiplies ingestion cost; many categories are verbose and rarely queried |
| 2. Choose destination | Query, archive, or stream | Log Analytics for query, storage account for cheap long-term archive, Event Hubs for streaming to a SIEM | A storage-account destination is not queryable with KQL; choosing it when you meant to query leaves you unable to investigate |
| 3. Decide the mix | One destination or several | A single setting can fan out to all three destinations at once | Sending the same verbose categories to all three triples the ingestion and storage cost for data you query from only one |
| 4. Enforce at scale | Per-resource or policy-driven | Author or assign a built-in deployIfNotExists policy per resource type and scope it at a management group or subscription | Policy effects need a managed identity and a remediation task to cover existing resources; assignment alone only covers future ones |
| 5. Verify flow | Assume or confirm | Run a category-specific query in the workspace and confirm rows arrive within the expected latency | Platform metrics populating the chart is not proof; you must query the log table itself |
The five steps are ordered deliberately. Categories before destination, because the destination choice depends on whether you intend to query, archive, or stream the categories you picked. Destination before the mix, because the mix is just the destination choice extended to more than one sink. Enforcement after the single-resource decisions, because you cannot sensibly enforce a configuration you have not yet settled. Verification last, always, because every step before it is a hypothesis until data lands in the destination and a query returns it.
Prerequisites and the correct order of operations
Configuring a diagnostic setting requires three things to exist first, and getting their order wrong is a common early stumble. The first prerequisite is the component itself, which is obvious, and the second is the destination, which is less obvious because people sometimes try to configure the setting before the workspace it points at exists. You cannot route logs to a Log Analytics workspace that has not been created, so the workspace comes first. The third prerequisite is permission, and it is the one that produces confusing failures, because the permission model spans two planes.
To create a diagnostic setting on a resource you need write access to that resource’s diagnostic settings, which the Monitoring Contributor role grants, along with the broader Contributor and Owner roles. To point that setting at a Log Analytics workspace you also need permission on the workspace, because you are establishing a write relationship into it. The classic failure here is an engineer who has Contributor on the resource group containing the source resource but no role on the workspace, which sits in a different resource group owned by the platform team. The portal lets them select categories, lets them pick the workspace from the dropdown, and then fails on save with an authorization error that names the workspace rather than the source. The fix is to grant the principal the appropriate role on the workspace, not to keep retrying on the source. Sorting permission out before you start saves a confusing detour.
The correct order of operations, then, is workspace first, permissions second, setting third, verification fourth. If you are setting up a new landing zone, the workspace and the permission model are part of the platform foundation and should already exist before any workload resource is deployed, which is exactly the model the Azure Monitor and Log Analytics guide lays out for the destination side. Treating the workspace as foundational rather than per-workload is what lets a single policy later point hundreds of resources at one well-governed sink.
Do I need a separate workspace per resource or per team?
No. Diagnostic settings can route many resources into one shared workspace, and a small number of workspaces aligned to access boundaries is easier to govern than a sprawl of one-per-resource workspaces. Use separate workspaces to draw security or data-sovereignty boundaries, not to mirror your resource layout, because cross-workspace queries add friction.
The workspace-design decision deserves a moment because it constrains everything downstream. A workspace is the unit of access control and the unit of retention configuration, so the right number of workspaces is the number of distinct access-and-retention boundaries you have, not the number of resources or teams. Many organizations land on a small set: one per environment, or one per major security boundary, with everything inside that boundary routed to the same workspace so that correlation across resources is a single query rather than a federated one. The temptation to give every application its own workspace feels organized but fragments your data exactly along the lines you will most often want to query across, so resist it unless an access boundary genuinely requires it. The destination model is covered in depth in the dedicated workspace guide; for diagnostic settings, the operative point is that the workspace must exist and be reachable before the pipe can be built.
One subtlety in the order of operations concerns regions. A Log Analytics workspace lives in a region, and there is no hard requirement that a resource and its destination workspace share a region; cross-region routing of diagnostic data works. There can be data-egress cost and a data-residency consideration when telemetry crosses a regional or geographic boundary, so for a data-sovereignty-sensitive workload you may deliberately keep the workspace in the same geography as the component. That is a design choice rather than a technical blocker, and it belongs in the prerequisites stage where you decide which workspace a given service will target, before you start clicking.
Configuring a single diagnostic setting, step by step
Start with the single resource because the policy you will write later simply automates exactly this. The portal path is the clearest place to learn the shape of the object, and the command-line path is what you will reach for once you understand it. Take a storage account as the worked example, because storage exercises every interesting feature: it has multiple log categories, it splits along sub-services, and it is one of the services where the default behavior surprises people.
In the portal you open the storage account, expand the Monitoring section in the left navigation, and select Diagnostic settings. The first thing you see is not the account itself but a list of its sub-resources, because storage diagnostic settings are configured per service: the account-level metrics live in one place, and the blob, file, queue, and table services each have their own log categories. This is a property of storage specifically, and it is worth naming because an engineer who configures the account and walks away has logged nothing about blob requests, since blob logs are configured on the blob sub-resource, not the account. You click into the blob service, choose “Add diagnostic setting,” give it a name, and you are presented with two columns: the log categories on the left and the destination options on the right.
The same operation from the Azure CLI makes the structure explicit, and the command is what your automation will eventually run. The verb is az monitor diagnostic-settings create, and its essential parameters are the resource it attaches to, the destination, and the categories. A minimal command pointing a resource at a Log Analytics workspace looks like this:
```shell
az monitor diagnostic-settings create \
  --name "send-to-law" \
  --resource "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>/blobServices/default" \
  --workspace "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>" \
  --logs '[{"category":"StorageRead","enabled":true},{"category":"StorageWrite","enabled":true},{"category":"StorageDelete","enabled":true}]' \
  --metrics '[{"category":"Transaction","enabled":true}]'
```
Three things in that command repay attention. The --resource value points at the blob sub-resource, not the bare account, which is the command-line manifestation of the per-service rule the portal showed you. The --logs array names categories explicitly and sets each to enabled, which means you are opting categories in rather than getting them all by default. And the --metrics array routes the metric time series to the same workspace as the logs, which is what lets you correlate a spike in transaction count against a burst of StorageRead entries in one place. If you omit the workspace and supply a storage-account ID instead, the same command archives to a storage account; supply an Event Hubs authorization rule and it streams. A single setting can carry all three destination arguments at once, which is the fan-out the checklist mentioned.
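To make the fan-out concrete, here is a hedged sketch of one setting carrying all three destination arguments. Everything in angle brackets is a placeholder, the setting name `fan-out` and the `run` helper are illustrative, and the category list is only an example. The script prints the command by default rather than executing it, so you can read the shape without touching a subscription.

```shell
# Dry-run by default: the command is printed, not executed (display only, quoting is flattened).
# Set DRY_RUN=0 to execute, which needs the Azure CLI and a signed-in session.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Placeholder IDs: substitute your own before executing.
SRC="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>/blobServices/default"
LAW="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
ARCHIVE="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<archive-acct>"
EH_RULE="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.EventHub/namespaces/<namespace>/authorizationRules/<rule>"

# One setting, three sinks: query (workspace), archive (storage), stream (Event Hubs).
run az monitor diagnostic-settings create \
  --name "fan-out" \
  --resource "$SRC" \
  --workspace "$LAW" \
  --storage-account "$ARCHIVE" \
  --event-hub-rule "$EH_RULE" \
  --logs '[{"category":"StorageWrite","enabled":true},{"category":"StorageDelete","enabled":true}]'
```

Note that every category in `--logs` goes to all three sinks, which is exactly why the category list deserves a second look before you add the second and third destination.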
PowerShell expresses the same object through New-AzDiagnosticSetting and a set of New-AzDiagnosticSettingLogSettingsObject builders, and a Bicep or ARM template expresses it declaratively as a Microsoft.Insights/diagnosticSettings resource nested under the target. The declarative form is the one that matters for the scale story, because the policy you write later deploys precisely that template fragment. The point of running the imperative command by hand first is to internalize the shape so that the declarative version reads as familiar rather than mysterious.
What is the difference between enabling a category and enabling a category group?
Enabling a named category exports exactly that one log type. A category group, where a resource type offers one, bundles several categories under a single name such as “allLogs” or “audit,” so enabling the group exports its whole set and automatically picks up new categories Azure adds later. Groups trade precise cost control for future-proof coverage.
Category groups are a relatively recent addition and they change the cost calculus, so it is worth being deliberate about them. The “allLogs” group is convenient and dangerous in equal measure: convenient because you never miss a newly introduced category, dangerous because “all” includes the verbose, high-volume categories you would never have selected individually. The “audit” group, where a resource type defines one, is usually the better default for a security or compliance posture, because it captures the access and change records that auditors ask for without the chatty operational categories that drive ingestion volume. The rule of thumb is to use a named-category list when you know exactly which diagnostic questions you need to answer and cost discipline matters, and to use the audit group when compliance coverage is the goal and you would rather not maintain a category list as Azure evolves. Reach for “allLogs” only when you have budgeted for the volume or when an investigation genuinely needs everything.
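The difference between the two forms is visible in the JSON you pass to `--logs`: a named category uses the `category` key and a group uses `categoryGroup`. The helper below is a small sketch that builds either array; the function name `logs_json` is hypothetical, and whether a given group such as `audit` exists depends on the resource type, so check what the resource offers before relying on it.

```shell
# logs_json <key> <name>...
#   key is "category" for named categories or "categoryGroup" for a group.
# Runs in a subshell so its working variables do not leak into the caller.
logs_json() (
  key=$1
  shift
  out=""
  for name in "$@"; do
    out="${out:+$out,}{\"$key\":\"$name\",\"enabled\":true}"
  done
  printf '[%s]\n' "$out"
)

# Precise cost control: an explicit list you maintain.
logs_json category StorageRead StorageWrite StorageDelete

# Future-proof coverage: one group name, membership managed by Azure.
logs_json categoryGroup audit
```

Either output can be passed as the value of `--logs` on `az monitor diagnostic-settings create`, for example `--logs "$(logs_json categoryGroup audit)"`.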
Once the setting is created, it appears in the component’s diagnostic-settings list and begins forwarding immediately, though “immediately” carries a latency caveat that the verification section makes precise. The setting is now the pipe. Everything that follows is about choosing what flows through it and where it flows to.
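A quick way to check both halves, that the setting exists and that rows are arriving, is sketched below for the blob example. It is a sketch under assumptions: the IDs are placeholders, `StorageBlobLogs` is the workspace table that blob resource logs land in, the `run` helper is illustrative, and `az monitor log-analytics query` may require the Log Analytics CLI extension in your environment. By default the script only prints the commands.

```shell
# Dry-run by default: commands are printed, not executed (display only, quoting is flattened).
# Set DRY_RUN=0 to execute, which needs the Azure CLI and a signed-in session.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

SRC="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>/blobServices/default"
WORKSPACE_GUID="<workspace-guid>"   # the workspace (customer) ID, not the ARM resource ID

# 1. Confirm the setting is attached to the resource.
run az monitor diagnostic-settings list --resource "$SRC"

# 2. Confirm rows are landing, broken down by category, over the last hour.
QUERY='StorageBlobLogs | where TimeGenerated > ago(1h) | summarize Rows = count() by Category'
run az monitor log-analytics query --workspace "$WORKSPACE_GUID" --analytics-query "$QUERY"
```

An empty result from the second command shortly after creating the setting is not yet a failure, because ingestion takes time and a quiet resource produces no events; generate some traffic and query again before concluding anything.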
Choosing the destination: query, archive, or stream
A diagnostic setting can target three kinds of destination, and they are not interchangeable. Choosing the wrong one is the misconfiguration that hurts most after the fact, because the data is technically being captured the whole time, just in a form that cannot answer the question you eventually ask. The three destinations map cleanly to three intentions, and naming the intention first is how you avoid the trap.
The first destination is a Log Analytics workspace, and its intention is query. Data routed here lands in tables you can interrogate with Kusto Query Language, correlate across resources, visualize, alert on, and join against other telemetry. This is the destination you want whenever the answer to “why did this happen” will come from reading the logs, which is to say almost always for operational and troubleshooting work. The cost is per gigabyte ingested plus retention beyond the included period, and that cost is the lever the ingestion section dissects. If you can only afford one destination and your purpose is to understand and troubleshoot your systems, it is the workspace.
The second destination is a storage account, and its intention is archive. Data routed here lands as append blobs in a container, organized by resource and time, in a JSON-lines format. It is cheap, it can be kept for years under a lifecycle policy that tiers it to cool and archive storage, and it satisfies the compliance requirement to retain raw records for a long horizon. What it cannot do is answer a query interactively. You cannot run KQL against a storage account. To investigate archived data you must export it, ingest it elsewhere, or read the blobs directly, all of which are deliberate, slow operations. The trap is choosing storage because it is cheap and then discovering during an incident that your logs are in a form you cannot search in the moment you need to. Storage is the right destination for “keep this for seven years in case an auditor asks,” and the wrong destination for “I need to find the failing request now.”
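To see why reading the archive is a deliberate operation, it helps to know where a given hour of data lives. The sketch below builds the blob path for one category, resource, and hour, assuming the layout Azure Monitor documents for resource logs in storage: a container named `insights-logs-<category>` in lower case, an upper-cased resource ID, and one `PT1H.json` blob per hour. Treat the layout as an assumption to confirm against a real container; the function name `archive_blob_path` is hypothetical.

```shell
# archive_blob_path <category> <resource-id> <YYYY> <MM> <DD> <HH>
# Prints "<container>/<blob path>" for the hourly blob holding that category's records.
archive_blob_path() (
  container="insights-logs-$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  rid=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  printf '%s/resourceId=%s/y=%s/m=%s/d=%s/h=%s/m=00/PT1H.json\n' \
    "$container" "$rid" "$3" "$4" "$5" "$6"
)

archive_blob_path StorageRead \
  "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>/blobServices/default" \
  2024 05 01 13
```

Finding a failing request in the archive means working out which hourly blobs to download and then searching them yourself, which is the slow path the paragraph above warns about.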
The third destination is an Event Hub, and its intention is stream. Data routed here is delivered as a real-time event stream that an external consumer reads. The canonical consumer is a security information and event management platform, a SIEM, that ingests Azure telemetry alongside logs from the rest of the estate for centralized correlation and alerting. The intention is integration: the data is not staying in Azure to be queried in a workspace, it is leaving to be processed by a system that owns the security or analytics workflow. Event Hubs is the right destination when a downstream platform is the system of record for the data and Azure is only a source. It is the wrong destination when you actually wanted to query the data in Azure and reached for streaming out of unfamiliarity.
Can one diagnostic setting send to more than one destination?
Yes. A single diagnostic setting can target a Log Analytics workspace, a storage account, and an Event Hub simultaneously, fanning the same selected categories to all three. This is the normal pattern for a service that needs interactive query, long-term archive, and SIEM streaming at once, configured as one object rather than three.
The fan-out is convenient but it is also where cost discipline quietly fails, so configure it with intent rather than reflex. The mistake is to enable the same broad category set across all three destinations because the interface makes it easy. The categories you query interactively in a workspace are rarely the same as the categories you need to archive for compliance, which are rarely the categories your SIEM consumes. A more disciplined pattern is sometimes more than one diagnostic setting on the same component, each tuned to its destination: a workspace setting with the operationally useful categories at query-grade cost, and a storage setting with the compliance categories at archive-grade cost. A resource can carry several diagnostic settings, and using that capability to right-size each destination separately is how you avoid paying query-tier ingestion for data you only ever needed to archive.
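The per-destination pattern can be sketched as two commands against the same resource. The split of categories here is purely illustrative, the setting names and the `run` helper are hypothetical, and all IDs are placeholders; the point is only that each setting carries its own category list. The script prints the commands by default rather than running them.

```shell
# Dry-run by default: commands are printed, not executed (display only, quoting is flattened).
# Set DRY_RUN=0 to execute, which needs the Azure CLI and a signed-in session.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

SRC="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<acct>/blobServices/default"
LAW="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
ARCHIVE="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<archive-acct>"

# Query-grade setting: only the categories you read during incidents go to the workspace.
run az monitor diagnostic-settings create \
  --name "ops-to-workspace" --resource "$SRC" --workspace "$LAW" \
  --logs '[{"category":"StorageWrite","enabled":true},{"category":"StorageDelete","enabled":true}]'

# Archive-grade setting: the fuller record goes to cheap storage for retention.
run az monitor diagnostic-settings create \
  --name "full-to-archive" --resource "$SRC" --storage-account "$ARCHIVE" \
  --logs '[{"category":"StorageRead","enabled":true},{"category":"StorageWrite","enabled":true},{"category":"StorageDelete","enabled":true}]'
```

In this sketch the high-volume read category is archived but never ingested into the workspace, so it costs storage rates rather than query-tier ingestion rates.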
There is a practical limit worth knowing: a single resource supports at most five diagnostic settings, so the multi-setting pattern is for a small number of deliberately separated destinations, not an unbounded sprawl. In practice two or three settings per resource cover the query, archive, and stream split comfortably.
Choosing categories and the all-metrics option
Categories are where the cost of an estate is set, one resource at a time, mostly without anyone noticing. Each resource type publishes its own list of available log categories, and the list reflects the resource’s nature: a key vault offers audit-event categories, a network security group offers event and rule-counter categories, an Application Gateway offers access, performance, and firewall categories, a SQL database offers query-store, audit, and timeout categories. There is no universal category set; the only way to choose well is to look at what a given resource type offers and map each category to a question you actually need to answer.
The mapping discipline is the whole game. For each available category, ask a concrete question: if this resource broke at 3 a.m., which category would hold the evidence I would reach for first? The access log for a gateway, the audit log for a vault, the event and rule-counter categories for a security group. Those are the categories you enable without hesitation. Then ask the harder question for the remaining categories: have I ever actually queried this category, and if a problem in its domain occurred, would I look here or somewhere else? Many categories are verbose performance or diagnostic traces that sound useful, generate enormous volume, and get queried approximately never. Those are the categories that quietly double an ingestion bill while contributing nothing to an investigation. Enabling them “to be thorough” is the single most common cost mistake in Azure observability, and it is committed with good intentions every time.
The metrics side of the setting is simpler and the choice is usually clearer. The “AllMetrics” option routes the resource’s platform metric time series to the destination alongside the logs. Because platform metrics are already collected and retained for free in the metrics store, the value of routing them through a diagnostic setting is correlation and retention: having the numeric series in the same Log Analytics workspace as the logs means you can join a metric spike to the log events around it in one query, and you can keep the metrics beyond the platform’s rolling window. The cost of enabling AllMetrics is generally modest compared to verbose log categories, because metric data is compact relative to event logs, so enabling it is a reasonable default when you are routing to a workspace you query. It is less obviously worthwhile when you are only archiving to storage, where the metrics add blob volume without adding query convenience.
Which log categories should I enable for a typical resource?
Enable the categories tied to the questions you would ask during an incident or audit for that resource: access and audit logs almost always, plus the one or two operational categories specific to the resource type. Skip verbose performance-trace categories unless you have a measured need, because they dominate ingestion volume while rarely answering a real diagnostic question.
A worked example makes the discipline concrete. For a network security group, the event and rule-counter categories are the high-value signal, because they show which rules were applied to traffic and how often each one allowed or denied it, which is exactly what you need when an application cannot reach a dependency and you suspect the security group. Per-connection flow records are a separate feature, NSG flow logs, which is enabled through Network Watcher rather than through a diagnostic setting. For an Application Gateway, the access log and the firewall log carry the answers to the two most common questions, which backend served a request and whether the web application firewall blocked something, while the performance log is lower value for most teams. For a key vault, the audit category that records who read or wrote which secret is the one auditors and incident responders both want, and it is comparatively low volume, which makes it an easy yes. The pattern across all of them is the same: a small number of high-signal, moderate-volume categories carry most of the diagnostic value, and a small number of high-volume categories carry most of the cost, and the two sets rarely overlap. Choosing categories is the act of keeping the first set and declining the second.
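The keep-and-decline step can be written down as a filter. In this sketch the list of available categories arrives on standard input, one name per line (in practice you might produce it from `az monitor diagnostic-settings categories list --resource <resource-id>`, which is an assumption about your workflow), and the arguments are the categories you have decided you need. The function name `pick_categories` is hypothetical.

```shell
# pick_categories <wanted>...
#   stdin: the categories the resource type offers, one per line
#   stdout: the wanted categories that are actually offered
#   stderr: a warning for any wanted category the resource type does not offer
pick_categories() (
  available=$(cat)
  for want in "$@"; do
    if printf '%s\n' "$available" | grep -Fxq -- "$want"; then
      printf '%s\n' "$want"
    else
      printf 'not offered by this resource type: %s\n' "$want" >&2
    fi
  done
)

# Application Gateway: keep access and firewall, decline the performance log.
printf '%s\n' \
  ApplicationGatewayAccessLog \
  ApplicationGatewayPerformanceLog \
  ApplicationGatewayFirewallLog \
  | pick_categories ApplicationGatewayAccessLog ApplicationGatewayFirewallLog
```

Keeping the wanted list explicit, rather than enabling whatever is offered, is what makes the decision reviewable: the list is short enough to justify line by line.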
One trap deserves explicit naming because it produces a particularly frustrating empty-table experience. Some resource types require that a specific feature be turned on at the resource before its diagnostic data exists to be exported. Audit logging on a SQL database, for instance, has its own enablement that is related to but distinct from the diagnostic setting that routes the audit category onward. If the feature is off at the source, enabling its category in the diagnostic setting forwards nothing, because nothing is being produced. When a category you enabled returns no rows and you are sure the setting is correct, check whether the source feature that generates that category is itself enabled, a failure mode the missing logs and metrics troubleshooting guide covers as one of its primary causes.
Activity log versus resource logs
The single most common reason an engineer believes telemetry is captured when it is not is the conflation of the activity log with resource logs. They are different signals, collected differently, configured differently, and answering different questions, and treating them as one thing leaves a gap exactly where people assume coverage exists.
The activity log is a subscription-scoped record of control-plane operations. Every time something writes to, deletes, or modifies a resource, or assigns a role, or starts a deployment, an entry lands in the activity log. It answers the question “who changed what, when,” which is the question of administration and governance. It is collected automatically for every subscription, retained in the platform for ninety days with no configuration, and visible in the portal’s Activity Log blade without anyone setting anything up. Because it is always there, it feels like the baseline of logging, and an engineer who has glanced at the Activity Log blade can come away believing the resources are logging when in fact only the control plane is.
Resource logs, the data a diagnostic setting exports, are data-plane records. They answer the question “what happened inside the component,” which is the question of operation and troubleshooting. The access requests a gateway served, the queries a database ran, the secrets a vault returned, the packets a security group evaluated. This data is not collected to any durable destination automatically; it requires the per-resource diagnostic setting, and it is discarded if no setting routes it. The gap, then, is precise: the activity log will faithfully record that someone deleted a storage account, but only a resource-log diagnostic setting will tell you what requests that account was serving in the minutes before it was deleted. One is the change record, the other is the operational record, and you need both for most real investigations.
Is the activity log configured through a diagnostic setting too?
Yes, but at the subscription scope rather than per resource. The activity log has its own diagnostic setting, configured once, that exports the same control-plane records to a Log Analytics workspace, storage account, or Event Hub. This is how you retain activity-log data beyond the platform’s ninety days and query it alongside resource logs in the same workspace.
Exporting the activity log to a workspace is a step many estates skip and later regret, because the ninety-day platform retention is shorter than most audit horizons and because correlating a control-plane change against the resource-log behavior it caused is far easier when both live in one queryable place. The configuration is a single diagnostic setting at the subscription level that selects the activity-log categories of interest, such as administrative, security, policy, and service-health events, and routes them to the destination. Because it is one setting per subscription rather than one per resource, it does not suffer the drift problem that per-resource settings do, but it is also easy to forget entirely, precisely because it is a one-time action that is not attached to any particular resource you work with. Add it to the landing-zone foundation alongside the workspace, so that the subscription’s control-plane history is captured from day one; history that was never exported cannot be backfilled later.
The clean mental separation to carry away is this: the activity log is the subscription’s change history, configured once at the subscription, and resource logs are each resource’s operational history, configured per asset and best enforced at scale. When someone says “we have logging,” ask which of the two they mean, because the answer is very often only the first, and the gap is the second.
The settings the defaults get wrong
Diagnostic settings have a default that is the most consequential default in Azure observability, and it is the absence of a default. A new resource has no diagnostic setting at all unless something creates one. There is no out-of-the-box setting, no “basic logging” that ships enabled, no inherited configuration from the resource group or subscription. The starting state of every resource is silence, and that silence is the default the rest of this section is about correcting, because correcting it deliberately and universally is the entire job.
Beyond the absence, a second default trips people: when you do create a setting through the portal, nothing is pre-selected for you in a way that matches your intent. The portal presents the category list with nothing enabled, so the default of inaction is an empty setting that forwards no logs even though it exists. An engineer who creates a diagnostic setting, names it, picks the workspace, and saves without ticking any categories has created a real object that forwards metrics if AllMetrics was selected and forwards no logs at all, which produces the especially confusing state of a setting that exists, looks configured, and populates nothing in the log tables. When a diagnostic setting is present but a log query is empty, the first thing to check is whether any log categories are actually enabled inside it, because an all-metrics-no-logs setting is a common and easily overlooked cause.
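A minimal sketch of that first check, assuming the setting has been read back as a dictionary shaped like the ARM `properties` object, with a `logs` list and a `metrics` list whose entries carry an `enabled` flag. The function name and the sample data are illustrative and not part of any Azure SDK.

```python
def summarize_setting(properties: dict) -> dict:
    """Report which log and metric entries a diagnostic setting actually enables."""
    def label(entry: dict) -> str:
        # A log entry names either a single category or a category group.
        return entry.get("category") or entry.get("categoryGroup") or "<unnamed>"

    logs = [label(e) for e in properties.get("logs", []) if e.get("enabled")]
    metrics = [label(e) for e in properties.get("metrics", []) if e.get("enabled")]
    return {
        "enabled_logs": logs,
        "enabled_metrics": metrics,
        "metrics_only": bool(metrics) and not logs,
        "forwards_nothing": not metrics and not logs,
    }


# Hypothetical setting saved with AllMetrics ticked and no log category enabled.
saved = {
    "workspaceId": "<workspace-resource-id>",
    "logs": [{"category": "AuditEvent", "enabled": False}],
    "metrics": [{"category": "AllMetrics", "enabled": True}],
}
summary = summarize_setting(saved)
print(summary["metrics_only"])  # True: the setting exists but forwards no logs
```

A `metrics_only` result is the all-metrics-no-logs state described above: a real object that looks configured and leaves the log tables empty.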
A third default-related trap is retention. Historically a diagnostic setting carried per-category retention-day fields, which only ever applied to the storage account destination, and those fields are now deprecated: retention is governed by the workspace’s own retention configuration or by a lifecycle management policy on the storage account, not by the diagnostic setting. Engineers who learned the older model sometimes set a retention value in the setting and assume their data will be kept for that period, when in fact the workspace retention setting is what governs how long workspace data lives. Configure retention where it actually lives, on the workspace and on the storage account, and treat any retention field on the setting itself as legacy.
Why does my diagnostic setting exist but no logs appear?
The most common cause is that the setting was saved with no log categories enabled, so it forwards metrics or nothing while the log tables stay empty. The second most common is that the source feature generating those logs is turned off at the component, so there is nothing to forward even though the category is checked.
A fourth subtlety concerns where the data actually lands in the workspace, because the default table layout changed and the difference matters for queries. Diagnostic data routed to a workspace can arrive in resource-specific tables, where each resource type writes to its own dedicated table with typed columns, or in a single shared table, AzureDiagnostics, that holds many resource types’ logs in a more generic schema. The resource-specific layout is generally the better choice because the typed columns make queries clearer and the data is easier to manage, and for many resource types it is now the standard, but a setting created under the older model may still be writing to the shared table. The practical consequence is that a query against the table you expect can return nothing because the data is in the other table. When logs seem missing, confirming which table layout the setting uses, and querying the table the data actually lands in, resolves a surprising share of empty-result confusion.
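The table question can be sketched as a small lookup. This assumes the setting has been read back as a dictionary and that it carries the ARM property `logAnalyticsDestinationType`, which is `Dedicated` for the resource-specific layout and unset otherwise. Some resource types write to dedicated tables regardless of that property, so when it is not set the sketch returns both candidates rather than guessing. The function name is hypothetical.

```python
def tables_to_check(properties: dict, dedicated_table: str) -> list:
    """Return the workspace tables worth querying, most likely first."""
    if properties.get("logAnalyticsDestinationType") == "Dedicated":
        # Resource-specific layout: logs land in the typed, per-type table.
        return [dedicated_table]
    # Property unset or legacy: the shared table is the likely target,
    # but check the dedicated table too before declaring the data missing.
    return ["AzureDiagnostics", dedicated_table]


# Example: an Application Gateway setting created under the older model.
print(tables_to_check({}, "AGWAccessLogs"))
```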
The throughline of every default trap is the same: the platform does very little for you by default, and the small amount it does do is easy to misread as more than it is. Correcting the defaults means creating settings deliberately, enabling categories explicitly, configuring retention where it actually lives, and verifying the destination table rather than assuming it. The next section turns assumption into confirmation.
Verification: proving the telemetry actually flows
Every step up to this point is a hypothesis. The setting exists, the categories are enabled, the destination is selected, and none of that is evidence that data is arriving. Verification is the step that converts the hypothesis into a fact, and it is the step most often skipped, because the configuration looks finished and the next task beckons. Skipping it is how a team operates for months in the comfortable belief that a service is logging, right up to the incident that reveals the table has been empty the whole time. The verification habit is cheap and the absence of it is expensive, which is the entire argument for doing it every time.
The verification itself is a query, not a glance. Open the destination workspace and run a query against the specific table the resource’s logs should land in, scoped to the resource and to a recent time window, and confirm that rows are present. For a storage account’s blob logs in the resource-specific layout, that is a query against the StorageBlobLogs table filtered to the account; for an Application Gateway, the AGWAccessLogs table filtered to the gateway. The query you run is the same query you would run during an incident, which is the point: verification is a rehearsal of the investigation, and if the rehearsal returns rows, the investigation will too. A populated metrics chart proves nothing about logs, as the opening section warned, so the verification must hit the log table directly.
Latency is the wrinkle that makes naive verification misleading. Diagnostic data does not appear in the destination workspace the instant the asset emits it. There is an ingestion pipeline between the resource and the queryable table, and end-to-end latency from a few minutes to longer is normal and varies by resource type and data category. An engineer who creates a setting and immediately queries the table will find it empty and may conclude the setting is broken, when the data is simply still in flight. The correct verification waits past the expected latency, generates some activity on the service so there is fresh data to forward, and then queries. If the table is still empty well beyond the expected latency window, that is a real signal worth investigating, and the ingestion delay troubleshooting guide walks through distinguishing normal latency from a genuine stall.
How long after configuring a setting should I wait before checking?
Allow at least several minutes, and up to fifteen or more for some resource types, before treating an empty table as a problem. Generate fresh activity on the component so there is new data to forward, then query. An empty result inside the latency window is expected; an empty result well past it is a real signal worth investigating.
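The waiting rule can be written down as a small decision helper. This is a sketch under stated assumptions: the fifteen-minute default comes from the guidance above and is not a platform guarantee, and the function name and return strings are illustrative.

```python
def interpret_empty_table(minutes_since_setting: float,
                          activity_generated: bool,
                          latency_window_minutes: float = 15.0) -> str:
    """Classify an empty log table observed after a setting was created."""
    if not activity_generated:
        # With no fresh activity there may simply be nothing to forward yet.
        return "generate activity, then re-query"
    if minutes_since_setting <= latency_window_minutes:
        return "expected: data may still be in flight"
    return "investigate: empty past the latency window"


print(interpret_empty_table(3, activity_generated=True))
print(interpret_empty_table(90, activity_generated=True))
```

Widen `latency_window_minutes` for resource types you have observed to be slower, rather than tearing down a setting that was checked too soon.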
A complete verification confirms three things in order. First, that the setting exists and has the intended categories enabled, which you can read back from the setting itself rather than trusting your memory of having configured it. Second, that the destination is the one you intended, because a setting pointing at the wrong workspace will route data faithfully to a place you never query. Third, that rows actually arrive in the table for the categories you enabled, after the latency window, in response to real activity on the resource. Confirming all three closes the loop. Confirming only the first, which is what reading the setting in the portal does, leaves the most important question, whether data is flowing, unanswered.
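The three confirmations can be sketched as one ordered check. The assumptions: the setting has been read back as a dictionary shaped like the ARM `properties` object, the row counts come from a query you have already run against the destination table, and the workspace IDs below are hypothetical placeholders.

```python
from __future__ import annotations


def verify_setting(properties: dict,
                   intended_workspace_id: str,
                   intended_categories: set[str],
                   rows_by_category: dict[str, int]) -> list[str]:
    """Return the failed checks, in the order the three confirmations run."""
    failures = []

    # 1. The setting enables the intended categories.
    enabled = {e.get("category") for e in properties.get("logs", []) if e.get("enabled")}
    missing = intended_categories - enabled
    if missing:
        failures.append(f"categories not enabled: {sorted(missing)}")

    # 2. The destination is the intended workspace (IDs compared case-insensitively).
    actual = (properties.get("workspaceId") or "").lower()
    if actual != intended_workspace_id.lower():
        failures.append("destination is not the intended workspace")

    # 3. Rows arrived for every intended category.
    silent = sorted(c for c in intended_categories if rows_by_category.get(c, 0) == 0)
    if silent:
        failures.append(f"no rows for: {silent}")

    return failures


props = {
    "workspaceId": "/example/workspaces/old-law",
    "logs": [{"category": "AuditEvent", "enabled": True}],
}
problems = verify_setting(
    props,
    intended_workspace_id="/example/workspaces/governed-law",
    intended_categories={"AuditEvent"},
    rows_by_category={"AuditEvent": 0},
)
print(problems)
```

In this example the first check passes and the other two fail, which is the wrong-workspace case: the setting reads as configured, yet the workspace you query stays empty.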
Build verification into the workflow rather than treating it as an optional afterthought. When you configure a setting, schedule the verification query for after the latency window and actually run it. When you enforce settings at scale with policy, the verification generalizes into a compliance-and-query check across the fleet, which the scale section addresses. The principle scales but does not change: a diagnostic setting is not done when it is saved, it is done when a query against its destination returns the data it was supposed to forward.
Applying diagnostic settings at scale with Azure Policy
Everything to this point has been about one resource and one pipe, and a single resource is the wrong unit for a real estate. An estate has hundreds or thousands of resources, created continuously by many teams through many mechanisms, and a per-resource manual setting cannot keep pace. The resource created at 2 p.m. by a pipeline you did not run has no diagnostic setting because nobody clicked the blade, and it will have none until somebody notices, which is to say possibly never. The pipe-and-scale rule names the only durable answer: enforce diagnostic settings at scale with Azure Policy, so that the pipe is attached automatically to every resource that should have one, the moment it exists.
The mechanism is a policy with the deployIfNotExists effect. A deployIfNotExists policy evaluates each resource in scope against a condition, and where the condition is not met, it deploys a remediation, in this case a diagnostic setting, to bring the resource into compliance. Azure publishes built-in deployIfNotExists policies for diagnostic settings on most loggable resource types, each one parameterized with the destination workspace and the categories to enable, so in many cases you assign a built-in rather than authoring one from scratch. The policy watches for resources of its target type, checks whether a diagnostic setting routing to the specified workspace exists, and where it does not, creates one. New resources are covered as they appear; the drift that defeats manual configuration is structurally prevented.
Two requirements make deployIfNotExists work, and missing either produces a policy that looks assigned but enforces nothing, so name them explicitly. The first is a managed identity. Because the policy deploys a resource on your behalf, it needs an identity with permission to create diagnostic settings and to write to the destination workspace. That identity is created as part of the policy assignment; the portal grants it the roles the policy definition declares, while an assignment made through a template, the CLI, or an SDK needs those role assignments added explicitly. A deployIfNotExists assignment without a properly permissioned managed identity will evaluate resources, find them noncompliant, and fail to remediate, because it cannot perform the deployment. The second is scope. A policy assigned at a single resource group covers only that group; to cover an estate you assign at a management group or subscription so that inheritance carries the policy down to every resource group and resource beneath it. The full effect model, the role requirements, and the assignment mechanics are the subject of the Azure Policy governance guide, which this article leans on for the enforcement layer.
Does assigning the policy fix existing resources automatically?
No. A deployIfNotExists assignment covers resources created after the assignment automatically, but existing noncompliant resources are flagged rather than fixed until you run a remediation task. The remediation task uses the policy’s managed identity to deploy the missing diagnostic setting across the already-existing resources, closing the gap the assignment alone leaves open.
The existing-resource gap is the detail that turns a confident “we enforced logging” into a quiet coverage hole, so internalize it. When you assign a deployIfNotExists policy, Azure begins evaluating compliance and will remediate resources created from that point forward, but the resources that already existed when you assigned the policy are marked noncompliant and left as they are. Bringing them into line requires an explicit remediation task, which iterates the noncompliant resources and uses the policy’s managed identity to deploy the diagnostic setting each one is missing. The sequence for a real rollout is therefore: author or select the built-in policy, assign it at a management group or subscription with a managed identity and the destination parameters, confirm the compliance view populates, then run a remediation task to sweep the existing fleet, and finally rely on the policy to cover everything new. Skip the remediation step and you have protected the future while leaving the present exactly as drifted as it was.
Scaling category selection across many resource types is the part that takes design rather than a single assignment. Because each resource type has its own category list, the diagnostic-settings policies are per-type, and a comprehensive rollout assigns many of them, often grouped into a policy initiative so they assign and report as one unit. The category choices you made deliberately for the single-resource case become the parameters of these policies, which is why understanding the pipe first matters: the policy is only as good as the category and destination decisions encoded into it, and a thoughtlessly broad built-in assignment will enforce the very over-collection the cost section warns against, only now across the entire estate at once. Enforce the disciplined configuration, not merely some configuration.
The ingestion cost consequence
Diagnostic settings are where a Log Analytics bill is written, and the bill is written one category at a time, mostly by people who never see it. Ingestion into a workspace is priced per gigabyte, so the cost of your observability is a direct function of how many bytes of log your diagnostic settings forward, which is a direct function of which categories you enabled across how many resources. A single verbose category enabled fleet-wide through a policy can move a workspace bill by a meaningful amount, and because the person enabling the category and the person paying the bill are usually different, the feedback loop that would discourage over-collection is broken. Naming the cost mechanism explicitly is how you restore the loop.
The volume drivers are predictable once you know to look. High-frequency, per-request log categories are the dominant cost in most estates: anything that writes a log line for every transaction, request, or evaluation produces volume proportional to traffic, and at scale that volume is large. Verbose performance and trace categories are the second driver, generating detailed records that are voluminous and rarely queried. The asymmetry the categories section described is the cost asymmetry restated: the high-value categories you query are usually moderate volume, and the high-volume categories that dominate the bill are usually the ones you never query, which means a careful category selection can cut ingestion substantially while losing almost no diagnostic value. The savings come from declining to forward data you would not have read.
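The arithmetic is worth running once with your own numbers. The sketch below uses a hypothetical per-gigabyte rate and hypothetical fleet-wide daily volumes; neither reflects real Azure pricing or any measured estate, and only the shape of the calculation is the point.

```python
PRICE_PER_GB = 2.50  # hypothetical rate; look up the current price for your region

# Hypothetical fleet-wide daily volumes: (category, GB per day, queried in practice?)
categories = [
    ("audit", 4.0, True),
    ("access", 30.0, True),
    ("per-request trace", 220.0, False),
    ("verbose performance", 90.0, False),
]


def monthly_cost(gb_per_day: float, price_per_gb: float = PRICE_PER_GB, days: int = 30) -> float:
    """Ingestion cost for a month at a flat per-gigabyte rate."""
    return gb_per_day * days * price_per_gb


everything = sum(gb for _, gb, _ in categories)
queried_only = sum(gb for _, gb, queried in categories if queried)

print(f"forward everything:   {monthly_cost(everything):,.2f} per month")
print(f"forward queried only: {monthly_cost(queried_only):,.2f} per month")
print(f"reduction:            {1 - queried_only / everything:.0%}")
```

With these invented volumes the two categories nobody queries account for roughly nine tenths of the bill, which is the asymmetry the paragraph describes.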
The destination fan-out is the second cost trap, and it is subtler because it feels like good coverage. Sending the same broad category set to a workspace, a storage account, and an Event Hub means paying to move and hold that data three times over, and only the workspace portion buys you interactive query. If a category exists for compliance retention, it belongs in cheap storage, not in the query-priced workspace; if it exists for SIEM correlation, it belongs in the Event Hub. Routing every category to every destination because the interface allows it is how a bill inflates without anyone choosing to inflate it. The disciplined pattern, separate settings tuned per destination, exists precisely to put each category in the cost tier that matches its purpose.
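One way to make the per-destination pattern concrete is to record each category's purpose and derive the settings from that, instead of ticking every box for every sink. In this sketch the purpose labels, the function name, and the assignment of purposes to the Application Gateway categories are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict

# The purpose of a category decides its single destination tier.
DESTINATION_FOR_PURPOSE = {
    "interactive-query": "workspace",
    "compliance-retention": "storage",
    "siem-correlation": "eventhub",
}


def plan_settings(category_purposes: dict[str, str]) -> dict[str, list[str]]:
    """Group categories into one setting per destination instead of fanning out."""
    plan = defaultdict(list)
    for category, purpose in category_purposes.items():
        plan[DESTINATION_FOR_PURPOSE[purpose]].append(category)
    return {dest: sorted(cats) for dest, cats in plan.items()}


# Hypothetical purposes for an Application Gateway's log categories.
plan = plan_settings({
    "ApplicationGatewayAccessLog": "interactive-query",
    "ApplicationGatewayFirewallLog": "siem-correlation",
    "ApplicationGatewayPerformanceLog": "compliance-retention",
})
for destination, cats in plan.items():
    print(destination, cats)
```

Each key in the result corresponds to one diagnostic setting on the resource, so no category is paid for in a tier it does not need.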
How do I reduce diagnostic ingestion cost without losing coverage?
Audit which categories you actually query, then stop forwarding the verbose categories nobody reads, especially the high-volume per-request and trace categories. Route compliance-only data to cheap storage rather than the query-priced workspace, and avoid fanning the same broad category set to multiple destinations. The savings come from declining data you never read.
There are structural cost levers beyond category selection that belong in the same decision, because the right combination depends on volume. Workspaces offer commitment-tier pricing that trades a daily volume commitment for a lower per-gigabyte rate, which is worth modeling once your ingestion is predictable and substantial. Some data can be routed to a lower-cost ingestion tier intended for high-volume, low-query logs, which fits exactly the verbose categories you want to retain cheaply but rarely query interactively. And the workspace retention setting governs how long ingested data is kept at full query cost before archiving, so aligning retention to actual need rather than a generous default trims cost on data already collected. These levers compound with category discipline rather than replacing it: the cheapest gigabyte is the one you chose not to forward, and the second cheapest is the one you forwarded to the tier and retention that match how you actually use it.
The cost discipline and the coverage discipline are the same discipline viewed from two sides. Over-collection is both a cost problem and, paradoxically, a coverage problem, because a workspace drowning in unqueried verbose logs is harder to search and slower to query than one that holds the signal you actually use. Configuring diagnostic settings well means forwarding the data that answers your real questions, to the destination and tier that match how you will use it, across every resource through enforcement, and nothing more. That sentence is the whole article compressed, and it is also the cost-control strategy.
Common misconfigurations and their symptoms
Most diagnostic-settings problems reduce to a small set of recurring misconfigurations, each with a recognizable symptom, and learning to read the symptom back to the cause is what turns a confusing empty table into a quick fix. The patterns below are the ones engineers report most often, paired with the configuration error that produces each.
The first and most common is the resource with no setting at all, and its symptom is a log query that returns nothing for a resource the team believed was monitored. The cause is the absence of a default combined with manual configuration: the resource was created, the setting was never added, and the metrics chart populating from platform collection masked the gap. The confirmation is to read the resource’s diagnostic-settings list and find it empty. The fix is to add the setting, and the durable fix is the policy that would have added it automatically. This single pattern is the reason the scale section exists.
The second is the setting that exists but enables no log categories, and its symptom is nearly identical to the first, an empty log table, with the misleading difference that a setting is present, which sends the investigator looking everywhere except the setting’s contents. The cause is a save that selected a destination and perhaps AllMetrics but ticked no log categories. The confirmation is to open the setting and see the empty category selection. The fix is to enable the categories, and the lesson is that the presence of a setting is not the presence of logging.
The third is the wrong-destination setting, and its symptom is that data is clearly being forwarded somewhere, the component is configured, but the workspace you query returns nothing. The cause is a setting pointing at a different workspace, often a default or a leftover from a previous design, so the data lands faithfully in a place nobody looks. The confirmation is to read the destination from the setting and discover it is not the workspace you expected. The fix is to repoint the setting, and the prevention is the policy parameter that pins the destination workspace fleet-wide so individual settings cannot drift to the wrong sink.
Why does a brand-new resource have no logs even though others do?
Because diagnostic settings are per-resource and do not inherit, so a new resource starts with no setting regardless of how its siblings are configured. If a policy enforces settings, the new resource is covered automatically after evaluation; if configuration is manual, it has no logging until someone adds a setting to it specifically.
The fourth pattern is the source feature that is off, and its symptom is an empty result for one specific category while other categories on the same setting return rows. The cause is that the category’s underlying feature is not enabled at the asset, so the diagnostic setting forwards a category that is producing nothing. The confirmation is that the setting is correct and other categories work, which isolates the problem to the one silent category. The fix is to enable the source feature, after which the category begins forwarding. This pattern is distinctive because it presents as a partial gap rather than a total one, which points directly at the source rather than the setting.
The fifth is the latency misread, and its symptom is an empty table immediately after configuration that fills in later. The cause is verifying inside the ingestion latency window and concluding the setting failed when the data was in flight. The confirmation is that waiting and re-querying returns rows. The fix is patience and a verification habit that respects the latency window. This one is not a configuration error at all, which is exactly why it wastes time: engineers tear down and rebuild a working setting because they checked too soon.
The sixth is the scope-and-remediation gap in policy-driven estates, and its symptom is that new resources are logging but a set of older resources are not, despite a policy being assigned. The cause is a deployIfNotExists assignment that was never followed by a remediation task, so the future is covered and the existing fleet is flagged but untouched. The confirmation is the compliance view showing noncompliant existing resources. The fix is to run the remediation task. Reading these six patterns against the symptom in front of you resolves the large majority of diagnostic-settings investigations without guesswork.
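The six patterns can be read as a decision order, sketched below. The field names, the fifteen-minute latency default, and the return strings are illustrative assumptions; the ordering simply follows the confirmations described above, from whether a setting exists down to whether a single category is silent.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    setting_exists: bool
    log_categories_enabled: bool
    destination_is_expected: bool
    other_categories_have_rows: bool
    minutes_since_setting: float
    policy_assigned: bool = False
    resource_predates_policy: bool = False


def diagnose(o: Observation, latency_window_minutes: float = 15.0) -> str:
    """Map what you observe about an empty log table to the likely pattern."""
    if not o.setting_exists:
        if o.policy_assigned and o.resource_predates_policy:
            return "6: policy assigned but remediation task never run"
        return "1: no diagnostic setting on the resource"
    if not o.log_categories_enabled:
        return "2: setting exists but enables no log categories"
    if not o.destination_is_expected:
        return "3: setting points at a different destination"
    if o.minutes_since_setting <= latency_window_minutes:
        return "5: still inside the ingestion latency window"
    if o.other_categories_have_rows:
        return "4: source feature for the silent category is off"
    return "unclassified: investigate ingestion"


# An older resource in a policy-governed subscription with no setting at all.
print(diagnose(Observation(False, False, False, False, 0.0,
                           policy_assigned=True, resource_predates_policy=True)))
```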
Making the configuration repeatable as code
A diagnostic setting clicked into a portal is a fact about one service at one moment, undocumented and unrepeatable, and the goal is the opposite: a configuration that is declared, versioned, reviewed, and applied identically everywhere. There are two complementary routes to that goal, and a mature estate uses both. The first is declaring the setting as code on the resource that owns it, so the pipe is part of the component’s own definition. The second is the policy enforcement already described, which applies settings to resources that did not declare their own. They are not in conflict; they cover different cases, and together they leave no service uncovered.
Declaring the setting alongside its resource is the cleaner model where you own the resource’s deployment template. In Bicep or in an ARM template, a diagnostic setting is a Microsoft.Insights/diagnosticSettings resource scoped to the target, and you author it once in the module that deploys the resource so that every deployment of that module carries its logging configuration. A Bicep fragment that attaches a setting to a resource and routes it to a workspace reads cleanly:
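// targetResource and logAnalyticsWorkspaceId are declared elsewhere in the module.
resource diag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'send-to-law'
  scope: targetResource // attaches the setting to the resource it monitors
  properties: {
    workspaceId: logAnalyticsWorkspaceId // destination Log Analytics workspace
    logs: [
      {
        categoryGroup: 'audit' // audit bundle rather than an explicit category list
        enabled: true
      }
    ]
    metrics: [
      {
        category: 'AllMetrics'
        enabled: true
      }
    ]
  }
}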
The value of this form is that the logging configuration travels with the component through every environment and every redeployment, reviewed in the same pull request as the service itself, and never drifts because it is regenerated from the template each time. The scope property is what attaches the setting to its target, the workspaceId pins the destination, and the categoryGroup of audit selects the audit bundle rather than an explicit category list, which is the future-proof choice for a compliance posture. Terraform expresses the same object through an azurerm_monitor_diagnostic_setting resource with equivalent fields, and the choice between Bicep and Terraform follows whatever your estate already uses rather than any property of diagnostic settings specifically.
Should I use template-declared settings or policy enforcement?
Use both. Declare the setting in the component’s own template where you control the deployment, so logging ships with the asset and is reviewed alongside it. Use policy enforcement to catch resources created outside your templates, by other teams or click-ops, so nothing slips through. The template covers what you build; the policy covers everything else.
The division of labor between the two routes is the practical design. Resources your teams deploy through your own infrastructure-as-code pipelines should declare their diagnostic settings in their templates, because that is the most precise and reviewable place to specify exactly the categories and destination each resource needs, and it keeps logging configuration honest in code review. Resources created by other means, a team using the portal, a third-party tool, a service that provisions resources on your behalf, will not carry a template-declared setting, and those are exactly the resources the policy catches. The policy is the safety net under the template, the enforcement that guarantees coverage even for the resources your templates never touched. An estate that relies only on templates leaves the click-ops resources uncovered, and an estate that relies only on policy enforces a one-size category set rather than the tuned configuration a template can express. Using both gives you precision where you control the deployment and coverage everywhere else.
There is a real payoff to running and reproducing these configurations in a sandbox before you commit them to a production estate, because the per-type category lists, the destination behaviors, and the policy remediation flow all have details that are faster to learn by doing than by reading. You can run the hands-on Azure labs and command library on VaultBook to build a diagnostic setting against a live service, watch the data land in a workspace, attach a deployIfNotExists policy at a scope, trigger a remediation task, and confirm the fleet comes into compliance, all in an environment where a mistake costs nothing. The command and template library there gives you tested starting points for the CLI, Bicep, and policy forms this article describes, which is the fastest way to turn the reading into a configuration you trust.
Worked scenario: bringing a drifted subscription into compliance
The patterns above are clearest when walked through end to end against a realistic estate, so consider a subscription that grew organically over a year. Several teams deployed workloads through a mix of pipelines, portal clicks, and a third-party provisioning tool. Logging was configured by whoever remembered, which means coverage is patchy: some virtual machines forward their guest data, most network security groups forward nothing, a handful of key vaults audit correctly because one security-minded engineer wired them up, and the activity log was never exported anywhere. Nobody can state the coverage with confidence, which is itself the symptom. This is the ordinary starting state, and the goal is to reach a governed standard without manually touching hundreds of items.
The first move is assessment, not configuration. Before changing anything, you want a list of every loggable component in the subscription and whether it currently forwards data. Azure Resource Graph answers this with a single query that enumerates the estate and joins against the diagnostic-export objects that exist, producing the coverage map the portal will not give you. The map is sobering and useful in equal measure: it names exactly which assets are silent, and it gives you a baseline number to measure remediation against. Skipping assessment and jumping straight to enforcement works, but you lose the before-and-after measurement that proves the rollout did what you intended, and measurement is what turns a hopeful change into a verified one.
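However the inventory is gathered, the coverage map itself reduces to a small computation. The sketch below assumes you already hold a list of resource identifiers and, for each, the diagnostic settings read back as dictionaries; the resource names are hypothetical, and a resource counts as covered only if some setting enables at least one log entry.

```python
def coverage_map(resource_ids: list, settings_by_resource: dict) -> dict:
    """Summarize which resources forward logs and which are silent."""
    silent = []
    for rid in resource_ids:
        settings = settings_by_resource.get(rid, [])
        forwards_logs = any(
            entry.get("enabled")
            for setting in settings
            for entry in setting.get("logs", [])
        )
        if not forwards_logs:
            silent.append(rid)
    total = len(resource_ids)
    covered = total - len(silent)
    return {
        "total": total,
        "covered": covered,
        "silent": silent,
        "coverage": covered / total if total else 0.0,
    }


resources = ["nsg-web", "kv-app", "vm-01", "agw-edge"]
before = coverage_map(resources, {
    "kv-app": [{"logs": [{"category": "AuditEvent", "enabled": True}]}],
    # A metrics-only setting does not count as log coverage.
    "vm-01": [{"logs": [], "metrics": [{"category": "AllMetrics", "enabled": True}]}],
})
print(before["silent"])             # ['nsg-web', 'vm-01', 'agw-edge']
print(f"{before['coverage']:.0%}")  # 25%
```

Running the same function again after remediation gives the after number, and the difference between the two is the measurement that shows what the rollout changed.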
The second move is to establish the destination foundation if it does not already exist. One governed workspace, sized and access-controlled for the subscription, becomes the query destination, and a storage account with a lifecycle policy becomes the compliance archive for the categories that exist only to satisfy retention. With the foundation in place, you decide the category policy per service type, applying the mapping discipline from earlier: audit and access categories everywhere, the one or two operational categories each service type needs, and a deliberate refusal of the verbose performance categories that would inflate ingestion. This decision, made once and written down, becomes the parameter set every policy assignment will carry.
The third move is enforcement through a policy initiative. Rather than assigning dozens of individual deployIfNotExists policies one at a time, you group the per-type diagnostic policies into one initiative, parameterize it with the governed workspace and the category decisions, and assign the initiative at the subscription, or at a management group if the standard should span several subscriptions. The assignment creates the managed identity, and you grant that identity the roles it needs to create diagnostic settings and to reach the workspace. At this point the future is handled: any resource created from now on is evaluated and brought into line automatically, with no engineer in the loop to forget.
The fourth move closes the existing gap that the assignment leaves open. The compliance view now lists every pre-existing component as noncompliant, because the assignment covers the future and flags the past. You trigger a remediation task per policy in the initiative, and the managed identity sweeps the noncompliant components, deploying the missing export configuration to each one. The coverage map you built in the first move is the verification artifact: rerun the Resource Graph query after remediation completes and confirm the silent assets now forward data. The before-and-after numbers are the proof that the rollout worked, and they are the report you hand to whoever asked whether the estate is governed.
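Because remediation runs per policy, an initiative with several members needs several tasks. A minimal sketch of generating one request body per member follows, assuming the remediation resource accepts the assignment ID, the member's reference ID, and a discovery mode; the task names and the IDs shown are hypothetical.

```python
def remediation_bodies(assignment_id, reference_ids):
    """Return one remediation request body per policy in the initiative."""
    return {
        f"remediate-{ref}": {
            "properties": {
                "policyAssignmentId": assignment_id,
                # Selects which member policy of the initiative to remediate.
                "policyDefinitionReferenceId": ref,
                # Sweep resources already known to be noncompliant.
                "resourceDiscoveryMode": "ExistingNonCompliant",
            }
        }
        for ref in reference_ids
    }


# Hypothetical assignment ID and member reference IDs.
assignment = "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/policyAssignments/<name>"
tasks = remediation_bodies(assignment, ["diag-keyvault", "diag-sql"])
for name in tasks:
    print(name)
```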
The fifth move is the one teams skip and pay for: the activity log. The per-component work above never touches the subscription’s control-plane history, because the activity log is exported by its own subscription-scoped configuration. Add that export to the governed workspace as a one-time action, selecting the administrative, security, policy, and health categories, so that the change record lives alongside the operational records and the ninety-day platform window is no longer the limit of your audit reach. With that final action, the subscription has moved from “logging configured by whoever remembered” to “logging enforced as a standard, verified by query, and covering both the data plane and the control plane.” That transition, repeated per subscription, is what the pipe-and-scale rule looks like in practice.
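The subscription-scoped export can be sketched as a request path and body. The category names below are believed to match the activity log's published list but should be confirmed against what your subscription exposes; the setting name and the placeholder IDs are assumptions for illustration.

```python
# ASSUMPTION: category names as commonly published for the activity log;
# confirm them before relying on this list.
ACTIVITY_LOG_CATEGORIES = (
    "Administrative",
    "Security",
    "Policy",
    "ServiceHealth",
    "ResourceHealth",
)


def activity_log_export(subscription_id, workspace_id, name="activity-to-workspace"):
    """Return the (scope path, request body) for a subscription-level export."""
    path = (
        f"/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Insights/diagnosticSettings/{name}"
    )
    body = {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": c, "enabled": True} for c in ACTIVITY_LOG_CATEGORIES],
        }
    }
    return path, body


path, body = activity_log_export("<subscription-id>", "<workspace-resource-id>")
print(path)
print([entry["category"] for entry in body["properties"]["logs"]])
```

Note that the path hangs off the subscription itself rather than off any resource, which is exactly why per-resource policies never create it.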
The two specialized destination patterns
Most workloads need the query destination and nothing more, but two specialized patterns recur often enough to deserve their own treatment, because each carries a configuration subtlety that the generic workspace path does not. The first is streaming to a security platform, and the second is archiving for long-horizon compliance. Both are destination choices the checklist named, and both are where a careless configuration quietly fails to deliver what the team assumed.
Streaming to a SIEM through an Event Hub is the pattern when a security team owns a central analytics platform that ingests telemetry from across the organization, not only Azure. The diagnostic export points at an Event Hub, the SIEM consumes the stream, and Azure becomes one source feeding a system of record that lives elsewhere. The subtlety is that the Event Hub is a throughput-bounded pipe with its own capacity model, so a high-volume stream of verbose categories can saturate it and cause the consumer to fall behind or drop data. The discipline here is to stream the categories the security platform actually correlates on, typically the audit and security-relevant ones, rather than fanning every operational category into a stream that was never sized for it. Streaming is for integration, and the integration works only when the volume matches the pipe’s capacity and the consumer’s appetite.
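The sizing check is plain arithmetic. The sketch below assumes the commonly cited standard-tier figures of roughly 1 MB per second or 1,000 events per second of ingress per throughput unit; treat both constants and the headroom factor as assumptions to verify against your tier. The event rate means Event Hub events, which may batch several log records each.

```python
import math

# ASSUMPTION: per-throughput-unit ingress limits; verify for your tier.
INGRESS_BYTES_PER_TU = 1_000_000
INGRESS_EVENTS_PER_TU = 1_000


def throughput_units_needed(events_per_second, avg_event_bytes, headroom=1.5):
    """Estimate throughput units for a stream, with headroom for bursts."""
    by_bytes = events_per_second * avg_event_bytes / INGRESS_BYTES_PER_TU
    by_events = events_per_second / INGRESS_EVENTS_PER_TU
    # Whichever limit is hit first determines the capacity required.
    return max(1, math.ceil(max(by_bytes, by_events) * headroom))


# Hypothetical streams: audit-only versus every operational category.
print(throughput_units_needed(200, 1_500))
print(throughput_units_needed(6_000, 2_000))
```

Comparing the two calls makes the discipline concrete: the narrow audit stream fits in a single unit, while the fan-everything stream needs many times that.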
Archiving to storage for compliance is the pattern when a regulation or an internal policy requires raw records be kept for years and the team has no intention of querying them interactively unless an auditor or an investigation demands it. The export points at a storage account, a lifecycle management policy tiers the blobs to cool and then archive storage as they age, and the cost per gigabyte falls dramatically compared to keeping the same data in a query-priced workspace. The subtlety is the one the destinations section warned about: storage data is not queryable with KQL, so the day an investigation does need the archived records, retrieving them is a deliberate operation of rehydrating archive-tier blobs and parsing the JSON-lines format, not an interactive query. Teams that understand this configure storage archive as a complement to a workspace, not a replacement for it, keeping the recent and operationally useful window in the queryable destination and the long tail in cheap storage.
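To make the retrieval cost tangible, here is a minimal sketch of what "parsing the JSON-lines format" involves once a blob has been rehydrated and downloaded. The field names follow the common resource-log shape but are assumptions, and the sample records are invented for illustration.

```python
import json


def filter_archived_records(lines, category, operation_prefix=""):
    """Return parsed records matching a category and an operation prefix."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("category") != category:
            continue
        if not record.get("operationName", "").startswith(operation_prefix):
            continue
        matches.append(record)
    return matches


# Invented sample content standing in for one downloaded blob.
sample = [
    '{"time": "2024-01-05T10:00:00Z", "category": "AuditEvent", "operationName": "SecretGet"}',
    '{"time": "2024-01-05T10:00:02Z", "category": "AuditEvent", "operationName": "VaultGet"}',
    "",
    '{"time": "2024-01-05T10:00:03Z", "category": "Other", "operationName": "SecretGet"}',
]
hits = filter_archived_records(sample, "AuditEvent", "Secret")
print(len(hits), hits[0]["time"])
```

Every filter you would write as one KQL clause becomes a loop over downloaded files, which is the practical reason to keep the recent window in a workspace.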
When should I send the same data to both a workspace and storage?
When you need interactive query over a recent window and cheap retention over a long horizon for the same categories. Keep the operationally useful window in the workspace where you query during incidents, and archive the same or a compliance subset to storage at a fraction of the cost. Tune each destination’s category set rather than fanning one broad selection to both.
The combined pattern, query in a workspace and archive in storage, is the most common production shape for a regulated workload, and the way to configure it well is to think of the two destinations as serving two different time horizons. The workspace holds the recent window at full query capability, retained for as long as your investigations realistically reach back, often a few months. The storage archive holds the long horizon at minimal cost, retained for the years a compliance regime demands, never queried unless something forces it. Configuring them as separate export definitions, each with the category set that matches its horizon, is how you avoid paying query-tier ingestion for data whose only job is to sit in archive against a future audit that may never come. The cost section’s lesson applies precisely here: the cheapest gigabyte is one you chose not to forward to the expensive destination, and the second cheapest is one you sent to the tier that matches how you will actually use it.
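One way to keep the two horizons separate is to derive both export definitions from a single mapping of category to horizon. This is a sketch under stated assumptions: the setting names, the category names, and the placeholder IDs are hypothetical, and the body shape follows the usual workspaceId, storageAccountId, and logs properties.

```python
def settings_by_horizon(horizons, workspace_id, storage_account_id):
    """Build one diagnostic setting body per destination.

    horizons maps a category name to the horizons it serves:
    "query" (workspace), "archive" (storage), or both.
    """

    def logs_for(horizon):
        return [
            {"category": name, "enabled": True}
            for name, wanted in sorted(horizons.items())
            if horizon in wanted
        ]

    return {
        "to-workspace": {
            "properties": {"workspaceId": workspace_id, "logs": logs_for("query")}
        },
        "to-archive": {
            "properties": {
                "storageAccountId": storage_account_id,
                "logs": logs_for("archive"),
            }
        },
    }


# Hypothetical category decisions for one service type.
decisions = {
    "audit": {"query", "archive"},
    "access": {"query"},
    "compliance-trace": {"archive"},
}
settings = settings_by_horizon(decisions, "<workspace-id>", "<storage-id>")
for name, body in settings.items():
    print(name, [entry["category"] for entry in body["properties"]["logs"]])
```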
There is a related scenario worth naming because it catches teams off guard: ingestion cost spiking without an obvious cause. The usual culprit is a newly deployed fleet of a chatty service type, brought online by a policy that enforces a broad category set, suddenly forwarding high-volume per-request data that nobody budgeted for. The spike is the cost section’s warning made real, and the fix is to revisit the policy’s category parameters rather than to disable logging in a panic. Tightening the enforced categories to the high-signal set, and routing any compliance-only categories to storage rather than the workspace, brings the bill back down without sacrificing the coverage that the enforcement was protecting. The lesson recurs: enforcement is only as good as the category discipline encoded into it, and a cost spike is usually a signal that the discipline slipped, not that logging itself was a mistake.
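The triage for a spike is to rank categories by volume and price the effect of rerouting the worst offenders. The sketch below uses invented daily volumes and invented per-gigabyte prices purely to show the arithmetic; substitute your own measured figures and current pricing.

```python
def top_categories(gb_per_day, limit=3):
    """Return the highest-volume categories, largest first."""
    ranked = sorted(gb_per_day.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def monthly_saving(gb_per_day, rerouted, workspace_price, storage_price, days=30):
    """Estimate the saving from sending some categories to storage instead."""
    moved_gb = sum(gb_per_day[name] for name in rerouted) * days
    return moved_gb * (workspace_price - storage_price)


# Invented figures: daily GB per category and hypothetical prices per GB.
usage = {"access": 2.0, "audit": 0.5, "performance": 40.0}
print(top_categories(usage, limit=1))
saving = monthly_saving(usage, ["performance"], workspace_price=2.50, storage_price=0.02)
print(f"estimated monthly saving: {saving:.2f}")
```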
A per-service-type category starting point
Because every service type publishes its own category list, there is no single answer to “what should I enable,” but there is a defensible starting point for the common types that covers the high-signal data while declining the volume drivers. The table below is a reference you can lift into a category standard and then tune, organized by the question each selection is meant to answer rather than by an exhaustive enumeration.
| Service type | High-signal categories to enable | What it answers | What to leave off |
|---|---|---|---|
| Key vault | Audit events | Who read or wrote which secret, and when | Verbose policy-evaluation traces unless investigating |
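| Network security group | Event and rule-counter logs (flow logs are configured separately through Network Watcher, not a diagnostic setting) | Which rules were applied and how often traffic was allowed or denied | Nothing major; volume scales with traffic, so size the destination |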
| Application Gateway | Access and firewall logs | Which backend served a request and what the firewall blocked | Performance log for most teams |
| Storage (per service) | Read, write, delete on the relevant sub-service | Which client performed which data-plane operation | Categories on sub-services you do not use |
| SQL database | Audit, query timeouts, blocking | Who accessed data and which queries stalled | High-volume query-store export unless tuning |
| App Service | HTTP and application logs, audit | Request behavior and application-level events | Verbose platform traces unless debugging |
Treat the table as a conversation starter for a category standard, not a mandate. The last column matters as much as the second, because the volume you decline is the cost you avoid, and the discipline of naming what to leave off is what keeps an enforced standard from becoming an enforced overspend. When a service type is not in the table, apply the same reasoning from scratch: read its published categories, keep the audit and access entries, keep the one or two operational categories tied to the questions you would ask during an incident, and decline the verbose performance traces unless you have a measured reason to keep them. The reasoning generalizes even when the specific names do not.
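A category standard is easier to enforce when it lives as data rather than as a table in a document. The sketch below encodes a few rows using descriptive labels taken from the table, not the exact category identifiers each service publishes; the labels, the structure, and the function name are all illustrative.

```python
# Descriptive labels only; replace with each service's published identifiers.
CATEGORY_STANDARD = {
    "key vault": {"enable": {"audit"}, "leave_off": {"policy evaluation"}},
    "application gateway": {
        "enable": {"access", "firewall"},
        "leave_off": {"performance"},
    },
    "sql database": {
        "enable": {"audit", "timeouts", "blocking"},
        "leave_off": {"query store"},
    },
}


def plan_categories(service_type, published):
    """Sort a service's published categories into enable, decline, unreviewed."""
    rule = CATEGORY_STANDARD.get(service_type)
    if rule is None:
        raise KeyError(f"no standard yet for {service_type!r}; review it from scratch")
    available = set(published)
    return {
        "enable": sorted(available & rule["enable"]),
        "decline": sorted(available & rule["leave_off"]),
        # Anything the standard does not mention needs a human decision.
        "unreviewed": sorted(available - rule["enable"] - rule["leave_off"]),
    }


plan = plan_categories(
    "application gateway", ["access", "firewall", "performance", "health probe"]
)
print(plan)
```

The unreviewed bucket is the useful part: it surfaces categories a service added after the standard was written, so they get a deliberate decision instead of a default.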
How do I keep coverage from regressing over time?
Treat the enforcement policy as the floor and add a periodic audit on top of it. Rerun a Resource Graph coverage query on a schedule and alert when a loggable component appears without an export configuration, which catches gaps from new service types the policy initiative does not yet cover. Governance is a standing process, not a one-time rollout.
The governance habit that keeps an estate covered is to assume the standard will erode and to build the check that catches the erosion. A policy initiative covers the service types it knows about, but Azure adds new service types and your teams adopt them, and a type the initiative does not yet include will deploy with no telemetry export and no warning. The defense is a scheduled coverage audit: a Resource Graph query that enumerates loggable components and flags any without an export configuration, run on a cadence and wired to an alert so a human sees the gap. When the alert fires, you extend the initiative to cover the new type, and the floor rises to include it. This standing loop, the policy as the automatic floor and the audit as the catch for what the floor misses, is what separates an estate that was governed once from an estate that stays governed.
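The audit's output is most useful when it distinguishes the two kinds of gap: a type the initiative never covered versus a covered type that is still waiting on remediation. A minimal sketch, assuming the list of silent resources and the list of covered types are already available from your own tooling; the names and IDs are hypothetical.

```python
def triage_silent(silent, covered_types):
    """Split silent resources by whether the initiative covers their type."""
    covered = {t.lower() for t in covered_types}
    result = {"extend_initiative": {}, "remediate": {}}
    for res in silent:
        rtype = res["type"].lower()
        bucket = "remediate" if rtype in covered else "extend_initiative"
        result[bucket].setdefault(rtype, []).append(res["id"])
    return result


# Hypothetical audit output and initiative scope.
silent = [
    {"id": "kv2", "type": "Microsoft.KeyVault/vaults"},
    {"id": "cache1", "type": "Microsoft.Cache/redis"},
    {"id": "cache2", "type": "Microsoft.Cache/redis"},
]
gaps = triage_silent(silent, ["Microsoft.KeyVault/vaults", "Microsoft.Web/sites"])
print(gaps)
```

An entry under extend_initiative is the signal to raise the floor; an entry under remediate points at a pending or failed remediation instead.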
A final piece of the durable picture is treating telemetry coverage as part of the definition of done for any new workload. When a team ships a workload, the review that approves it should confirm that its components forward the categories the standard requires, in the same way the review confirms the workload is secured and scaled. Baking the expectation into the delivery process means coverage is established at birth rather than retrofitted after an incident exposes its absence. The combination of enforcement, audit, and a delivery-time expectation is belt, suspenders, and a tailor, and together they make the empty-table-during-an-incident experience a memory rather than a recurring surprise.
The verdict
Configuring diagnostic settings well is not a complicated task, but it is an easy task to do incompletely, and the incompleteness is invisible until it is expensive. The model is simple enough to state in a sentence: a diagnostic setting is the pipe that carries one component’s logs and metrics to a destination you choose, and because it attaches to one resource at a time, consistent telemetry across an estate comes from enforcing the pipe at scale with policy rather than clicking each service. That is the pipe-and-scale rule, and it is the difference between an estate that can answer questions during an incident and one that discovers its logs were never captured at the moment it needs them most.
The decisions that matter are few and they reward deliberateness. Choose categories by mapping each to a question you would actually ask, which keeps the high-signal data and declines the high-volume data that drives cost without earning its keep. Choose the destination by naming the intention, query in a workspace, archive in storage, stream to an Event Hub, and do not reach for the wrong one out of habit. Separate the activity log from resource logs in your thinking, because believing you have logging when you have only the control-plane change record is the most common false comfort in Azure observability. Verify by querying the destination table after the latency window rather than trusting a populated metrics chart, because the chart proves nothing about logs. Enforce at scale with a deployIfNotExists policy, supply it the managed identity it needs, scope it at a management group or subscription, and remember the remediation task that brings existing resources into line, because the assignment alone only covers the future. Make the configuration repeatable, declaring settings in templates where you own the deployment and relying on policy to cover everything else.
Do those things and the empty-table-during-an-incident experience, the single most demoralizing moment in operating Azure, stops happening, because the data is already there, in the destination you chose, for every component, captured automatically the moment each resource was born. That is the entire return on getting diagnostic settings right, and it is paid out exactly when you need it.
One last framing worth carrying away ties the whole guide together. Observability is often treated as a product you buy and a dashboard you admire, but the dashboard is downstream of a decision made far earlier and far less glamorously: whether each component’s telemetry was ever routed somewhere durable. No query language, no alert rule, and no expensive analytics platform can recover data that was discarded at the source because no export pipe existed. The humble per-component export configuration is therefore the foundation the entire observability stack rests on, and enforcing it universally with policy is the single highest-leverage thing you can do for the day an incident arrives. Get the foundation right, automatically and everywhere, and everything built on top of it has the data it needs.
Frequently asked questions
What is a diagnostic setting in Azure?
A diagnostic setting is a configuration object attached to a single Azure resource that routes that resource’s logs and selected metrics to one or more destinations: a Log Analytics workspace, a storage account, or an Event Hub. It does not generate or store data itself; it is the pipe connecting a resource’s internal telemetry to a destination where you can keep and query it. Without a setting, the detailed logs are discarded.
How do I configure diagnostic settings to route logs to Log Analytics?
Create a diagnostic setting on the component, select the log categories you want, and choose your Log Analytics workspace as the destination. In the portal this is the Diagnostic settings blade under Monitoring; from the CLI it is az monitor diagnostic-settings create with the --workspace parameter and a --logs array naming categories. The workspace must already exist and you need write permission on it.
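As a rough sketch of the CLI form, the snippet below assembles the argument list in Python so the --logs array is valid JSON and correctly quoted. The setting name, the placeholder IDs, and the category are assumptions; the command is only printed here, not executed.

```python
import json
import shlex


def diagnostic_settings_command(name, resource_id, workspace_id, categories):
    """Build the az CLI argument list for creating one diagnostic setting."""
    logs = [{"category": category, "enabled": True} for category in categories]
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", name,
        "--resource", resource_id,
        "--workspace", workspace_id,
        "--logs", json.dumps(logs),
    ]


# Placeholder IDs and a hypothetical category name.
command = diagnostic_settings_command(
    "to-workspace", "<resource-id>", "<workspace-resource-id>", ["AuditEvent"]
)
print(shlex.join(command))
```

Passing the list to a process runner avoids shell-quoting mistakes in the JSON array, which is a common source of failed invocations.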
What destinations can a diagnostic setting send to?
Three: a Log Analytics workspace for interactive KQL query, a storage account for cheap long-term archive, and an Event Hub for streaming to an external system such as a SIEM. A single setting can target all three at once, fanning the same categories to each. Choose by intention, because storage data is not queryable with KQL and streamed data is meant to be consumed by another system, usually one outside Azure.
How do I choose which log categories to enable?
Map each available category to a question you would actually ask during an incident or audit for that service, enable the categories that hold the answers, and decline the verbose performance and trace categories you would never query. The high-signal categories are usually moderate volume; the high-volume categories usually go unqueried, so careful selection cuts cost while keeping diagnostic value.
How do I apply diagnostic settings at scale across many resources?
Assign an Azure Policy with the deployIfNotExists effect, parameterized with the destination workspace and categories, at a management group or subscription scope. The policy deploys a diagnostic setting to any resource of its target type that lacks one. Because category lists are per resource type, you assign multiple such policies, often grouped into an initiative so they assign and report as one unit.
What is the difference between the activity log and resource logs?
The activity log is a subscription-scoped record of control-plane operations (who created, changed, or deleted what), collected automatically and kept for ninety days. Resource logs are data-plane records of what happened inside a component, exported only through a per-resource diagnostic setting. You need both: the activity log shows the change, and the resource log shows the operational behavior around it.
How do diagnostic settings affect ingestion cost?
Log Analytics ingestion is priced per gigabyte, so cost is a direct function of how many bytes your settings forward, which depends on the categories enabled across all your resources. Verbose, high-frequency categories dominate volume and cost, and fanning the same categories to multiple destinations multiplies it. Disciplined category selection and per-destination tuning are the main cost levers.
Why does my resource show metrics but no logs?
Platform metrics are collected automatically and retained without any configuration, so the metrics charts populate immediately on a new service. Logs require a diagnostic setting to leave the component and are discarded otherwise. A populated metrics chart is not evidence that logging is configured; you must verify log flow by querying the log table directly.
Why does my diagnostic setting exist but no logs appear?
The most common cause is that the setting was saved with no log categories enabled, so it forwards metrics or nothing while the log tables stay empty. The next most common is that the underlying source feature generating a category is disabled at the resource, so there is nothing to forward. Confirm categories are enabled and the source feature is on.
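The first of those checks is mechanical. A minimal sketch, assuming you have the setting's JSON in hand and that its log entries carry either a category or a categoryGroup alongside an enabled flag; the sample setting is invented.

```python
def enabled_log_categories(setting):
    """Return the names of log categories the setting actually forwards."""
    logs = setting.get("properties", {}).get("logs", [])
    return [
        entry.get("category") or entry.get("categoryGroup")
        for entry in logs
        if entry.get("enabled")
    ]


# Invented setting that was saved with metrics only.
metrics_only = {
    "properties": {
        "logs": [{"category": "AuditEvent", "enabled": False}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    }
}
if not enabled_log_categories(metrics_only):
    print("setting exists but forwards no log categories")
```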
Does assigning a deployIfNotExists policy fix existing resources?
No. The assignment covers resources created afterward automatically, but existing noncompliant resources are flagged rather than remediated until you run a remediation task. The remediation task uses the policy’s managed identity to deploy the missing settings across the existing fleet, closing the gap. Skip it and you protect only the future.
Why does my deployIfNotExists policy not remediate anything?
Most often the policy assignment lacks a properly permissioned managed identity, so it can evaluate resources and mark them noncompliant but cannot perform the deployment that would fix them. The identity needs permission to create diagnostic settings and to write to the destination workspace. Confirm the assignment created an identity and that it holds the required roles.
How long should I wait before checking that a new setting works?
Allow at least several minutes and up to fifteen or more for some resource types, generate fresh activity on the service so there is new data to forward, then query the destination table. An empty result inside the latency window is expected and not a failure; an empty result well past the window is a real signal worth investigating as a possible stall.
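That rule can be written down so a verification script does not raise false alarms. The sketch below uses a fifteen-minute default window taken from the guidance above; treat it as a tunable assumption, since some resource types take longer. The function and label names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def classify_result(setting_created_at, checked_at, row_count,
                    latency_window=timedelta(minutes=15)):
    """Interpret a destination-table query made after creating a setting."""
    if row_count > 0:
        return "flowing"
    if checked_at - setting_created_at <= latency_window:
        return "wait"  # empty inside the window is expected
    return "investigate"  # empty well past the window is a real signal


created = datetime(2024, 1, 5, 10, 0, tzinfo=timezone.utc)
print(classify_result(created, created + timedelta(minutes=5), 0))
print(classify_result(created, created + timedelta(minutes=45), 0))
```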
Can a single component have more than one diagnostic setting?
Yes, up to five per resource. The multi-setting pattern is useful for separating destinations by purpose: one setting with operational categories routed to a workspace at query cost, another with compliance categories routed to cheap storage. This lets you right-size each destination’s category set and cost tier rather than fanning one broad set everywhere.
Should I configure retention in the diagnostic setting?
No. The per-category retention fields on the setting only ever applied to the storage destination, and they are deprecated. Retention is governed by the workspace’s own retention configuration and by a lifecycle management policy on the storage account. Set retention where it actually lives and treat any retention field on the setting itself as legacy.
Do I need a separate Log Analytics workspace for each resource?
No. Route many resources into a shared workspace and use a small number of workspaces aligned to access and data-residency boundaries rather than to your resource layout. One workspace per resource fragments your data along the lines you most often want to query across and complicates correlation, so use separate workspaces only where an access boundary genuinely requires one.
Should I declare diagnostic settings in templates or enforce them with policy?
Both. Declare the setting in the resource’s own Bicep, ARM, or Terraform template where you control the deployment, so logging ships with the component and is reviewed alongside it. Use policy enforcement to cover resources created outside your templates by other teams or tools. The template gives precise per-resource configuration; the policy guarantees no service slips through uncovered.
Why does one category return no rows while others on the same setting work?
Because the underlying feature that generates that specific category is not enabled at the resource, so the setting forwards a category that is producing nothing while the other categories work normally. This presents as a partial gap rather than a total one, which points at the source feature rather than the setting. Enable the feature at the component and the category begins forwarding.
Is the activity log captured by default for the long term?
The activity log is collected automatically and kept in the platform for ninety days, but that window is shorter than most audit horizons. To retain it longer and query it alongside resource logs, configure a diagnostic setting on the activity log at the subscription scope that exports it to a workspace or storage. It is a one-time configuration that is easy to forget precisely because it is not attached to any single asset.
What happens to logs if I delete a diagnostic setting?
Deleting a setting stops the forwarding from that point forward; the resource resumes discarding the detailed logs that the setting was carrying. Data already ingested into the destination is unaffected and remains subject to the destination’s retention. There is no backfill, so any logs produced while no setting exists are lost permanently, which is why coverage gaps matter even when they are later corrected.
Can I see every component in a subscription and whether it forwards telemetry?
Yes, with Azure Resource Graph. A single query enumerates loggable components across the subscription and joins them against the export definitions that exist, producing a coverage map the portal does not offer in one view. Run it before a rollout to baseline coverage and on a schedule afterward to catch components that appear without an export, which is how you keep an enforced standard from quietly regressing.