A cache is the fastest way to make a slow system feel fast, and the fastest way to make a correct system serve wrong answers. Both happen for the same reason: the cache holds a copy of data that lives somewhere else, and a copy is only as good as the discipline that keeps it honest. Most teams reach for caching patterns with Azure Cache for Redis the moment a database starts groaning under read load, drop a few GET and SET calls into the hot path, watch latency fall, and ship. Weeks later a customer sees a price that changed an hour ago, a feature flag that was turned off still reads as on, or a deleted record reappears, and the bug is almost impossible to reproduce because it depends on what happens to be sitting in memory at that instant.

Caching patterns with Azure Cache for Redis: cache-aside, invalidation, TTL, and stampede avoidance - Insight Crunch

That class of bug is not a Redis bug. Redis does exactly what it was told. The bug is a pattern bug: the team adopted the read path of caching without the write path, the speed without the correctness contract. This article treats caching as a design pattern rather than a pair of commands. You will leave able to apply cache-aside correctly, decide where invalidation belongs and where a time-to-live belongs, run Redis as a session store across stateless instances, pick the tier and persistence the workload actually needs, and recognize and defuse a cache stampede before it takes the source system down with it. The throughline is one rule, stated below, that explains the large majority of cache correctness incidents and tells you exactly what to check first.

What caching patterns actually solve

A caching pattern is a contract about three things: when a value enters the cache, when it leaves, and what a reader sees in the window where the cached copy and the source disagree. The reason caching looks deceptively simple is that the first of those, putting a value in, is trivial, and the other two, getting it out at the right time and bounding the disagreement, are where every hard problem lives. A team that thinks of caching as “store the result so we do not query again” has implemented one third of a pattern and inherited the other two thirds as latent bugs.

The dominant pattern in practice is cache-aside, also called lazy loading. The application, not the cache, owns the logic. On a read, the application asks Redis for the key. If the value is present, that is a hit and the application returns it. If the value is absent, that is a miss: the application reads from the system of record, writes the result into Redis with an expiry, and returns it to the caller. The cache fills itself on demand, one missed key at a time, which is why only data that is actually requested ever occupies memory. Nothing is preloaded, nothing is wasted on keys no one reads.

The write side is where discipline enters. When the underlying record changes, the cached copy is now a lie. Cache-aside handles this by invalidating: on a write to the source, the application deletes the cached key rather than trying to update it in place. The next read misses and repopulates from fresh data. Deleting instead of updating is deliberate, and the reason matters: an update-in-place under concurrency can interleave with a stale read and write an old value back over a new one, while a delete is idempotent and cannot resurrect stale data. The cost is one extra miss after each write, which is almost always cheaper than the bug an in-place update invites.

How does the cache-aside pattern work?

Cache-aside puts the application in control. On a read it checks Redis first, returns the value on a hit, and on a miss loads from the source of truth, stores the result with a time-to-live, and returns it. On a write it updates the source and deletes the affected key so the next read repopulates.

That single paragraph hides the whole correctness story, and it is worth slowing down on the word “deletes.” The instinct of most engineers writing their first cache layer is to keep the cache in sync by writing the new value to both the database and Redis on every update. That is a different pattern, write-through, and it has its place, but adopting it by reflex inside a cache-aside design is how subtle races appear. Picture two operations on the same key happening close together: request A reads from the database and is about to populate the cache with the value it read; meanwhile request B updates the database and writes the new value to the cache. If A’s cache write lands after B’s, the cache now holds A’s older value with no event left to correct it, and only the time-to-live will eventually save you. Two concurrent writers can cross the same way, with the older cache write landing last. Delete-on-write removes the writer-against-writer interleavings entirely, because deletes that arrive in any order still leave the cache empty, and an empty cache is never wrong, only slower. It narrows the reader-against-writer case rather than eliminating it: a reader that loaded the old value can still populate after the delete, which is the residual race the time-to-live exists to bound.

The invalidate-and-bound rule

Here is the claim this article is built around, and the one to carry into every cache design review. Call it the invalidate-and-bound rule: a cache is correct only when every write to the source invalidates the cached copy and every cached entry carries a time-to-live that bounds how long a missed invalidation can serve stale data. Invalidation handles the staleness you can see coming, the explicit writes. The time-to-live handles the staleness you cannot, the write that happened in a path you forgot, the event that failed to fire, the out-of-band change a database trigger made. The two are not redundant. Invalidation gives you freshness; the expiry gives you a ceiling on the damage when invalidation is missed. Strip either one out and you get a recognizable failure: drop invalidation and reads serve old data until the expiry; drop the time-to-live and a single missed invalidation serves old data forever. When a cache bug lands on your desk, the rule tells you where to look before you touch anything: find the write path that should have invalidated the key, and find the entry that should have expired but did not. The large majority of real incidents are one of those two, and naming them turns a mysterious intermittent bug into a two-line checklist.

This is the same discipline the series applies everywhere: name the root requirement, then derive the design from it rather than from the happy path. The performance gain from caching is real and easy. The correctness contract is the part that separates a cache that speeds the system from a cache that quietly corrupts it.

The InsightCrunch caching pattern table

Before walking the Azure mechanics, fix the map of patterns in one place. The table below is the findable artifact for this article: each row is a pattern, the problem it solves, the correctness obligation it carries, and the failure you get if you skip that obligation. Read it as a decision aid, not a menu to adopt all at once. A typical production system uses cache-aside for most reads, a write strategy chosen per data class, a time-to-live on every entry, and a stampede control on the handful of keys hot enough to matter.

| Pattern | What it does | Problem it solves | Correctness obligation | Failure if skipped |
| --- | --- | --- | --- | --- |
| Cache-aside (lazy loading) | App reads cache, on miss loads source and populates with a TTL | Repeated reads of the same data hammering the source | Delete the key on every source write | Stale reads until expiry |
| Read-through | Cache library loads the source on a miss transparently | Same as cache-aside, with load logic centralized in the client | Library must still be told when to invalidate | Stale reads and a hidden coupling to the loader |
| Write-through | App writes source and cache together in the write path | Read-after-write must see the new value immediately | Write ordering and failure handling so cache and source agree | Cache and source diverge on a partial write |
| Write-behind (write-back) | App writes cache, an async worker flushes to the source | Absorbing write bursts the source cannot take synchronously | Durable queue, retry, and source-of-truth recovery | Acknowledged writes lost on a node failure |
| Invalidation on write | Source write deletes or versions the affected keys | Bounding staleness for data you know just changed | Cover every write path, including out-of-band ones | A forgotten write path serves stale data |
| TTL bounding | Every entry expires after a fixed or jittered interval | Capping staleness when invalidation is missed | Pick a TTL the business can tolerate as a worst case | Forever-stale data after one missed invalidation |
| Session and distributed state | Redis holds session or shared state for stateless app tiers | Scaling app instances horizontally without sticky routing | Persistence or graceful loss, and per-key expiry | Logouts and lost state on a cache restart |
| Stampede control | Single-flight, early recompute, or jitter on hot keys | Many concurrent misses crushing the source at once | Apply only where a key is hot enough to matter | A thundering herd takes the source down on expiry |

The named claim, the invalidate-and-bound rule, sits across two of these rows: invalidation on write and TTL bounding. Every other row is an optimization on top of a correct cache. If you adopt write-behind for throughput but forget the TTL, you have a faster way to serve stale data. If you add stampede control but never invalidate, you have efficiently coordinated the serving of old values. The obligations column is the part that does not get skipped, because skipping it is exactly where the production incident comes from.

The Azure services that realize the patterns

Redis is the engine; Azure is how you run it as a managed service so you are not patching nodes and wiring replication by hand. For most of this series’ lifetime the managed offering has been Azure Cache for Redis, sold in tiers, and the tier you choose is not only a performance knob: it decides which patterns you can implement safely. The platform is also in transition, which matters for any design you start today, so it is worth stating the landscape plainly and then framing the durable behavior that survives the transition.

Azure Cache for Redis has shipped in Basic, Standard, Premium, and the Enterprise and Enterprise Flash tiers. Basic is a single node with no replication and no availability guarantee; it is a development convenience and has no business on a production read path, because a single node restart empties the cache and there is no replica to serve through it. Standard runs a primary and a replica with automatic failover and carries a connectivity service-level agreement, which is the floor for production caching. Premium adds clustering for horizontal scale, data persistence, virtual network injection, and more replicas, which is the tier where session stores and large datasets belong. The Enterprise and Enterprise Flash tiers run on Redis Enterprise software and add the Redis modules, active geo-replication, and the highest availability targets, with Enterprise Flash extending capacity onto flash storage at a lower price per gigabyte. Treat every specific size, replica count, and availability percentage as a value to confirm against the current official limits at the time you build, because Azure revises these regularly.

The transition matters. Microsoft has introduced Azure Managed Redis as the successor offering, built on Redis Enterprise, and has announced a retirement path for the older tiers. Verify the current timelines and feature parity against the official retirement guidance before committing a new system, because dates and preview-versus-GA status move. The reason this does not change a single pattern in this article is the point worth holding onto: cache-aside, invalidation, time-to-live, session storage, persistence, and stampede avoidance are properties of how you use Redis, not of which Azure SKU sells it to you. A design that is correct on Azure Cache for Redis Premium is correct on Azure Managed Redis, because the correctness lives in your application’s read and write paths. Build to the patterns, choose the current managed offering, and the design ports.

Which Redis tier and persistence do I need?

Choose by obligation, not by size. If you only cache idempotent reads and can tolerate a cold start, a replicated production tier without persistence is enough. If Redis holds session state or counters you cannot lose, pick a tier that offers persistence and enable it. Confirm tier capabilities against current Azure documentation.

Persistence is the feature that turns Redis from a disposable speed layer into a store you can lose less from. Azure offers two persistence formats. RDB persistence takes point-in-time snapshots of the dataset on a configurable schedule and writes them to durable storage, so after a catastrophic loss the cache reconstructs from the most recent snapshot; the exposure is everything written since that snapshot. Append-only file persistence logs write operations and flushes the log roughly once per second, so the exposure on recovery is bounded to about the last second of writes rather than to the snapshot interval. The trade-off is the usual one: AOF gives a tighter recovery point at a throughput and storage cost, RDB gives cheaper, lighter snapshots with a larger possible data loss window. Note that an every-write AOF mode was retired in favor of the once-per-second flush because the synchronous variant degraded performance too far to recommend; design around the once-per-second guarantee and confirm the current options before relying on a specific recovery point. For a pure cache-aside layer over a durable database, persistence is often unnecessary, because the database is the source of truth and a cold cache simply re-warms through misses. For a session store or a counter that has no other home, persistence is the difference between a node restart that is invisible and one that logs every user out.
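The two exposure windows are easy to compare with arithmetic. A rough sketch in Python, where the write rate and the fifteen-minute snapshot interval are assumed figures for illustration, not Azure defaults:

```python
def worst_case_lost_writes(writes_per_second: float, exposure_seconds: float) -> float:
    """Upper bound on writes lost if the node dies at the end of the exposure window."""
    return writes_per_second * exposure_seconds


WRITES_PER_SECOND = 200          # assumed steady write rate
RDB_INTERVAL_SECONDS = 15 * 60   # assumed snapshot schedule
AOF_FLUSH_SECONDS = 1            # the roughly once-per-second flush

rdb_exposure = worst_case_lost_writes(WRITES_PER_SECOND, RDB_INTERVAL_SECONDS)
aof_exposure = worst_case_lost_writes(WRITES_PER_SECOND, AOF_FLUSH_SECONDS)
```

At these assumed numbers the snapshot schedule puts about 180,000 writes at risk and the append-only log about 200. Whether either figure is acceptable depends on whether those writes exist anywhere else.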

A reference design walked through

Concrete beats abstract, so here is cache-aside end to end against a realistic example: a product catalog where reads vastly outnumber writes and a price or description changes a few times a day. The source of truth is a database, and Redis sits in front of it. The pattern is identical whether the database is relational or a document store; if your source is a document database, the engineering depth on partitioning and request units is covered in the Cosmos DB material at /2022/02/14/azure-cosmos-db-engineering-guide/, and the reasoning about which property to key on transfers directly to how you shape cache keys.

The read path is a check, a branch, and a populate. Expressed in C# against the common StackExchange.Redis client, it reads as follows.

// Assumes using directives for System, System.Threading.Tasks,
// System.Text.Json, and StackExchange.Redis. _db is an IDatabase and
// _repository is the application's own data-access layer.
public async Task<Product?> GetProductAsync(string productId)
{
    string key = $"product:{productId}";

    // 1. Try the cache first.
    RedisValue cached = await _db.StringGetAsync(key);
    if (cached.HasValue)
    {
        return JsonSerializer.Deserialize<Product>(cached!);
    }

    // 2. Miss: load from the source of truth.
    Product? product = await _repository.LoadProductAsync(productId);
    if (product is null)
    {
        return null; // see the negative-caching note below
    }

    // 3. Populate the cache with a bounded TTL, then return.
    string payload = JsonSerializer.Serialize(product);
    await _db.StringSetAsync(key, payload, TimeSpan.FromMinutes(10));
    return product;
}

Three things in that snippet carry the whole pattern. The key is namespaced as product:{id} so that keys for different entity types never collide and so you can reason about, and selectively clear, a whole class of keys. The populate call sets a time-to-live of ten minutes, which is the bound half of the invalidate-and-bound rule expressed in one argument; pick the value from how stale the business can tolerate this data being in the worst case, not from a default you copied. And the miss path reads from the repository, the system of record, which is the only component allowed to be authoritative.

The write path is the half that teams skip, and it is two lines longer than they expect.

public async Task UpdateProductAsync(Product product)
{
    // 1. Write to the source of truth first.
    await _repository.SaveProductAsync(product);

    // 2. Invalidate the cached copy. Delete, do not overwrite.
    string key = $"product:{product.Id}";
    await _db.KeyDeleteAsync(key);
}

Order matters here. Write the source first, then delete the cache key. If you delete the cache first and then write the source, a concurrent read that misses can repopulate the cache from the old database value in the window before your write lands, and you are stale again with no event left to fix it. Writing the source first and deleting second means that any read which repopulates after your delete reads the new value. There is still a narrow race, covered in the consistency section, but source-first-then-delete is the ordering that minimizes it and is the one to default to.

A confirming check belongs in your toolkit so you can prove the layer behaves. With redis-cli you can exercise the cache directly to watch a populate and an invalidation, and to inspect the time-to-live the application set.

# Connect to the cache with redis-cli over TLS (substitute your host and key).
redis-cli -h mycache.redis.cache.windows.net -p 6380 --tls -a "$REDIS_KEY"

# After a read populates the key, inspect it and its remaining TTL.
> GET product:42
"{\"Id\":\"42\",\"Name\":\"...\",\"Price\":19.99}"
> TTL product:42
(integer) 587          # seconds remaining of the 10-minute bound

# Simulate an invalidation and confirm the next read will miss.
> DEL product:42
(integer) 1
> EXISTS product:42
(integer) 0

That TTL command is the single most useful diagnostic in a cache incident. If a key you expected to expire returns -1, it was set without an expiry, and you have found an unbounded entry, the second half of the rule. If a key returns a value you know is stale and has a positive TTL, you are inside the staleness window and the question is whether the write path that should have deleted it ran at all. Two commands, and the invalidate-and-bound rule has told you which branch of the bug you are in.
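The same triage can be written down as a small helper. This is a minimal sketch in Python, assuming the standard Redis TTL return values (-2 for a key that does not exist, -1 for a key with no expiry, otherwise the seconds remaining); the function name and the messages are illustrative.

```python
def triage_ttl(ttl_seconds: int, value_is_stale: bool) -> str:
    """Map a TTL reading to the branch of the invalidate-and-bound rule to check."""
    if ttl_seconds == -2:
        # Key does not exist: the next read will miss and repopulate.
        return "absent: next read repopulates from the source"
    if ttl_seconds == -1:
        # Key exists with no expiry: the bound half of the rule is missing.
        return "unbounded: entry was set without a TTL"
    if value_is_stale:
        # Bounded but stale: look for the write path that should have deleted it.
        return "stale within bound: find the write path that skipped invalidation"
    return "healthy: bounded and fresh"
```

Whether the value is stale still has to come from comparing it with the source by hand; the helper only names which half of the rule to inspect.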

Should I cache a negative result?

Yes, with care. If a lookup for a missing record hits the database on every request, a hot 404 can stampede the source as effectively as a hot hit. Cache the absence with a short time-to-live, shorter than a normal entry, so a record created shortly after is not masked. Negative caching is a real lever and a real footgun.

The negative-caching decision is a good illustration of why caching is a pattern and not a switch. Caching the absence of a record protects the source from a flood of misses for keys that do not exist, which is a common and sometimes adversarial load. But a negative entry that lives as long as a positive one means a record created at 12:01 stays invisible until 12:11, which surprises users and support staff alike. The resolution is a deliberately shorter time-to-live on negative entries, often a fraction of the positive one, so you get the protection against the miss flood while keeping the window in which a newly created record is masked down to seconds. Make the choice explicitly, document the two different expiries, and you have turned a footgun into a tool.
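The two-expiry idea can be sketched as follows. The example is in Python with an in-memory stand-in for Redis so it runs on its own; the sentinel value, the thirty-second negative expiry, and the helper names are assumptions for illustration, not part of any client library.

```python
import json
import time

NOT_FOUND = "__missing__"   # sentinel marking a cached absence (illustrative)
POSITIVE_TTL = 600          # seconds; the ten-minute bound used above
NEGATIVE_TTL = 30           # seconds; deliberately much shorter (assumed value)


class FakeCache:
    """In-memory stand-in for Redis GET/SET with expiry, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_product(cache, load_from_source, product_id):
    """Cache-aside read that also caches 'not found' with a shorter TTL."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached == NOT_FOUND:
        return None                      # cached absence: skip the source
    if cached is not None:
        return json.loads(cached)        # normal hit

    product = load_from_source(product_id)
    if product is None:
        cache.set(key, NOT_FOUND, NEGATIVE_TTL)
        return None
    cache.set(key, json.dumps(product), POSITIVE_TTL)
    return product
```

A create path would also delete the key, like any other write, so the short expiry only has to cover creations that bypass the cache-aware code.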

Redis as a session and distributed-state store

The second major pattern Redis serves is not caching at all in the strict sense; it is shared state. When you scale an application tier horizontally, the instances are interchangeable, and any request can land on any instance. State that used to live in a single process, a logged-in user’s session, a shopping cart, a rate-limit counter, now has nowhere to live unless it lives outside the process. The two ways to solve this are sticky sessions, where a load balancer pins a user to one instance, and a shared store, where every instance reads and writes the same external state. Sticky sessions are fragile: lose the instance and you lose every session pinned to it, and you cannot rebalance load freely. A shared session store in Redis is the pattern that lets the application tier stay genuinely stateless, which is what makes the horizontal scaling described in the Azure App Service deep dive at /2022/01/10/azure-app-service-deep-dive/ work without surprising users when an instance recycles.

The mechanics are straightforward and the obligations are specific. Each session is a key, usually a session identifier, holding a serialized blob of session data, with a time-to-live that matches the session timeout so abandoned sessions clean themselves up. A sliding expiration, where each request extends the time-to-live, keeps active users logged in while idle sessions expire on schedule. The first obligation is durability: a session store is not a disposable cache, because losing it logs everyone out, so this is exactly the workload that justifies persistence and a production tier with replication and failover. The second obligation is the expiry discipline that is identical to the caching case, because a session that never expires is a memory leak and a security exposure. The third is serialization stability: if you change the shape of the session object and an old blob is still in the store, your deserialization must tolerate it or you will throw on read for every user with a live session at deploy time.

How do I use Redis as a session store?

Store each session under its own key, serialize the session data as the value, and set a time-to-live equal to your session timeout, refreshing it on each request for a sliding window. Use a production tier with replication, enable persistence because losing the store logs users out, and version your serialized format so a deploy does not break live sessions.
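Those three obligations fit in a short sketch. The example below is Python against an in-memory stand-in for Redis; the twenty-minute timeout, the envelope with a "v" field, and the "roles" field added in version 2 are all hypothetical choices made for illustration.

```python
import json
import time

SESSION_TTL = 1200        # seconds; assumed 20-minute session timeout
FORMAT_VERSION = 2        # bump when the session shape changes


class FakeCache:
    """In-memory stand-in for Redis GET/SET/EXPIRE, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def expire(self, key, ttl_seconds):
        if self.get(key) is None:
            return False
        self._data[key] = (self._data[key][0], self._clock() + ttl_seconds)
        return True


def save_session(cache, session_id, data):
    blob = json.dumps({"v": FORMAT_VERSION, "data": data})
    cache.set(f"session:{session_id}", blob, SESSION_TTL)


def load_session(cache, session_id):
    """Return session data and slide the expiry, or None if absent or expired."""
    key = f"session:{session_id}"
    blob = cache.get(key)
    if blob is None:
        return None
    envelope = json.loads(blob)
    data = envelope.get("data", {})
    if envelope.get("v", 1) < FORMAT_VERSION:
        # Tolerate an older blob: fill in fields added in later versions.
        data.setdefault("roles", [])
    cache.expire(key, SESSION_TTL)   # sliding window: each read extends the TTL
    return data
```

The version check is what keeps a deploy from throwing on every live session: an old blob is upgraded on read instead of being rejected.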

The distinction between a session store and a cache is worth holding clearly, because the obligations diverge. A cache holds a copy of data that exists elsewhere; lose it and you re-fetch, paying in latency but not in correctness. A session store often holds the only copy of data; lose it and the data is gone. That single difference, whether the source of truth lives elsewhere, decides almost everything downstream: whether you need persistence, whether you can tolerate a cold start, how seriously you treat failover, and how much you should worry about the eviction policy reclaiming a key under memory pressure. Many outages trace to a team treating a session store like a cache, leaving persistence off and an aggressive eviction policy on, and then being surprised when a memory spike evicts live sessions. State the source-of-truth question first and the rest of the configuration follows.

Invalidation and TTL: the two disciplines

The invalidate-and-bound rule names two obligations, and they fail in different ways, so they deserve separate treatment. Invalidation is the active discipline: you know data changed, so you remove the stale copy now. The time-to-live is the passive discipline: you cannot guarantee you caught every change, so you cap the worst case. Engineers who internalize only one of these build caches that fail predictably, and the failure mode tells you which discipline they dropped.

How do I handle cache invalidation and TTL?

Invalidate on every write by deleting the affected keys, including out-of-band writes from jobs, triggers, and other services. Set a time-to-live on every entry as a backstop, sized to the staleness the business tolerates in the worst case. Invalidation gives freshness; the TTL caps the damage when an invalidation is missed. Use both, never one.

Invalidation is harder than it looks because the hard part is coverage, not mechanism. Deleting a key is one line. Knowing every place a value can change is the design problem. The obvious write path, your own update endpoint, is easy to instrument. The dangerous paths are the ones that do not run through your cache-aware code: a nightly batch job that adjusts prices directly in the database, a database trigger that cascades a change, a second service that owns part of the same data, an administrator running a manual correction. Each of those changes the source of truth without deleting the cached copy, and each produces the canonical incident, a value that is stale and stays stale until the time-to-live rescues it. The discipline is to inventory every writer to a cached entity and to ensure each one either invalidates or routes through code that does. Where a writer cannot be made cache-aware, the time-to-live is your only defense, which is one more reason it is not optional.

There is a spectrum of invalidation granularity worth choosing deliberately. Key-level invalidation deletes exactly the entry that changed, which is precise and cheap and is the default for entity reads. Sometimes a single write invalidates many derived entries: updating a product might invalidate a category listing, a search result page, and a homepage block that all embed it. Tracking those dependencies by hand is error-prone, so teams reach for tag-based or versioned invalidation. A versioned scheme embeds a version number in the key, product:42:v7, and bumps the version on write so old keys are orphaned and expire on their own rather than being hunted down; the cost is that orphaned entries occupy memory until they expire, which is fine when memory is ample and the time-to-live is modest. Choose key-level invalidation when the mapping from write to affected keys is simple and known, and reach for versioning when a single write fans out to many derived views you would otherwise have to enumerate.
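A versioned scheme can be sketched in a few lines. The example is Python with an in-memory stand-in for Redis; keeping the version in its own counter key and appending a view name to the key are assumptions of this sketch, one of several ways to lay the keys out.

```python
class FakeCache:
    """In-memory stand-in for Redis GET/SET/INCR; expiry is omitted for brevity."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ttl_seconds):
        self._data[key] = value   # a real cache would honor ttl_seconds

    def incr(self, key):
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]


def versioned_key(cache, product_id, view):
    """Build a key such as product:42:v7:listing from the current version."""
    version = cache.get(f"product:{product_id}:version") or 0
    return f"product:{product_id}:v{version}:{view}"


def on_product_write(cache, product_id):
    """One increment orphans every derived view; old keys expire on their own."""
    return cache.incr(f"product:{product_id}:version")
```

The price is one extra lookup per read to fetch the version, and the counter key itself has to survive: if it is evicted, the version restarts and can collide with orphaned entries that have not yet expired.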

The time-to-live decision is a business decision wearing an engineering costume. The question is not “what is a good default,” it is “how stale can this specific data be in the worst case before someone is harmed.” A product description can be ten minutes stale with no consequence. A price might tolerate a minute. An inventory count that gates whether you can sell an item might tolerate only seconds, or might not be a caching candidate at all. A feature flag that controls a kill switch should arguably bypass the cache or carry a very short bound, because the whole point of a kill switch is immediacy. Setting one time-to-live for the entire application is the mistake; the right design assigns expiries by data class, derived from the tolerance of each class, and writes that reasoning down so the next engineer does not undo it. The time-to-live is the maximum staleness you are promising your users, expressed in a single argument to a SET call, and it deserves the thought a promise to users deserves.
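One way to write that reasoning down is a table of expiries keyed by data class. A minimal sketch in Python; the class names and numbers echo the examples above and are assumptions to replace with tolerances the business has actually agreed to.

```python
# Worst-case staleness per data class, in seconds.
TTL_BY_CLASS = {
    "product_description": 600,  # ten minutes stale is harmless
    "price": 60,                 # tolerate about a minute
    "inventory_count": 5,        # seconds at most, if cached at all
    "kill_switch_flag": 0,       # 0 means: do not cache, read the source
}


def ttl_for(data_class: str) -> int:
    """Return the TTL for a data class; unknown classes fail loudly."""
    try:
        return TTL_BY_CLASS[data_class]
    except KeyError:
        raise ValueError(f"no staleness tolerance recorded for {data_class!r}")


def should_cache(data_class: str) -> bool:
    """A class with a zero tolerance bypasses the cache entirely."""
    return ttl_for(data_class) > 0
```

Failing on an unknown class is the design choice that matters here: a new kind of data cannot quietly inherit someone else’s expiry.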

A subtle interaction between the two disciplines is worth flagging: a very long time-to-live makes invalidation coverage matter more, and a very short one makes it matter less but raises load on the source. If your expiry is one minute, a missed invalidation is at most a one-minute bug, and you might reasonably run with imperfect coverage. If your expiry is a day, every gap in invalidation coverage is a day-long stale-data bug, so you must be near-perfect on coverage. There is a design tension here with the stampede problem covered next: short expiries reduce staleness risk but increase miss rate and the chance of a herd hitting the source. The resolution is to size the time-to-live to the staleness tolerance first, then apply stampede control to the hot keys whose short expiries would otherwise hurt, rather than lengthening expiries past the tolerance to dodge the herd.

The cache stampede and how to avoid it

A cache stampede, also called a thundering herd or a dogpile, is the failure mode that turns a cache from a shield into a liability at the worst possible moment. It happens like this. A single key is hot, requested hundreds or thousands of times a second, and it is served entirely from cache, so the source sees almost no load for it. Then the key expires, or the cache node restarts and the key is gone. Now every one of those concurrent requests misses at the same instant, and every one of them independently decides to load the value from the source. The source, which was comfortably handling near-zero traffic for this key, is suddenly hit by the full concurrent request rate it was being shielded from. If recomputing the value is expensive, a database query, a downstream call, a heavy aggregation, the source buckles, latency spikes, timeouts cascade, and in the worst case the recompute itself fails under the load so the cache never repopulates and the herd never subsides. The cache made the system fragile precisely because it was doing its job well: the better the cache shields the source, the larger the herd when the shield drops.

How do I avoid a cache stampede?

Stop many simultaneous misses from all recomputing the same value. The main techniques are single-flight locking, where the first miss takes a short lock and recomputes while the others wait; early recomputation, where a value refreshes shortly before it expires; and jittered expiries, so keys do not all expire together. Apply these only to hot keys.

The first and most common technique is the single-flight lock, sometimes called request coalescing or mutex-based recomputation. When a request misses on a hot key, it attempts to acquire a short-lived lock for that key, commonly with a SET key value NX PX milliseconds that succeeds for exactly one requester. The winner recomputes the value from the source and populates the cache; the losers either wait briefly and re-read the cache, which now hits, or serve the previous stale value if one is available. The effect is that the source sees one recompute instead of ten thousand, and the herd is collapsed into a single request plus a brief wait for everyone else. The lock must have its own short expiry so that a crash by the lock holder does not deadlock every other requester forever, and the wait-and-retry must be bounded so a stuck recompute degrades gracefully rather than hanging the whole tier. This is the technique to reach for first because it directly attacks the mechanism: many misses becoming many recomputes.
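The lock-then-recompute flow can be sketched as follows. The example is Python against an in-memory stand-in whose method names are illustrative rather than a real client API: `only_if_absent` plays the role of the NX flag, and `delete_if_equals` stands for a compare-and-delete that on real Redis would have to run atomically on the server. The timeouts are assumed values.

```python
import time
import uuid


class FakeCache:
    """In-memory stand-in for Redis; method names are illustrative."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value, ttl_seconds, only_if_absent=False):
        """Mirrors SET with an expiry; only_if_absent mirrors the NX flag."""
        if only_if_absent and self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def delete_if_equals(self, key, expected):
        """Compare-and-delete, so a holder never releases someone else's lock."""
        if self.get(key) == expected:
            del self._data[key]
            return True
        return False


def get_with_single_flight(cache, key, recompute, ttl_seconds=600,
                           lock_ttl_seconds=5, max_wait_seconds=2.0,
                           poll_seconds=0.05, sleep=time.sleep):
    value = cache.get(key)
    if value is not None:
        return value

    token = uuid.uuid4().hex                 # identifies this lock holder
    lock_key = f"lock:{key}"
    if cache.set(lock_key, token, lock_ttl_seconds, only_if_absent=True):
        try:
            value = recompute()              # the single trip to the source
            cache.set(key, value, ttl_seconds)
            return value
        finally:
            cache.delete_if_equals(lock_key, token)   # release only our own lock

    # Lost the race: poll the cache for a bounded time, then fall back.
    waited = 0.0
    while waited < max_wait_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        value = cache.get(key)
        if value is not None:
            return value
    return recompute()                       # degrade gracefully, do not hang
```

The lock expiry covers a crashed holder, the token keeps a slow holder from deleting a lock that has since passed to someone else, and the bounded wait means a stuck recompute costs each waiter a couple of seconds rather than a hung request.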

The second technique is early recomputation, often implemented probabilistically. Rather than waiting for a key to expire and then having the herd discover the expiry simultaneously, each reader, as the key approaches its expiry, has a small and rising probability of volunteering to refresh the value early while continuing to serve the still-valid cached copy. Because the volunteering is randomized and happens before expiry, one reader typically refreshes the value before it ever goes stale, and the key’s effective expiry is reset without any reader ever experiencing a miss. The well-known formulation scales a random factor by how expensive the recompute is and compares the result with the time remaining before expiry, so expensive-to-recompute keys refresh earlier and cheaper ones later. The appeal is that no reader ever waits and the source sees a steady trickle of refreshes rather than a periodic spike. The cost is added complexity in the read path and the need to track each entry’s compute cost and remaining lifetime.
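The decision each reader makes can be written as one function. This is a sketch in Python of the commonly cited formulation, in which a reader refreshes when the current time plus a randomized head start reaches the expiry; the parameter names are illustrative, and the injectable random source exists only to make the behavior easy to check.

```python
import math
import random


def should_refresh_early(now, expiry, recompute_cost, beta=1.0, rng=random.random):
    """Decide whether this reader volunteers to refresh before expiry.

    now, expiry    -- timestamps in seconds
    recompute_cost -- how long the last recompute took, in seconds
    beta           -- values above 1.0 refresh earlier, below 1.0 later
    """
    # 1 - rng() lies in (0, 1], so the logarithm is defined and <= 0.
    head_start = -recompute_cost * beta * math.log(1.0 - rng())
    return now + head_start >= expiry
```

A reader that gets `True` recomputes and rewrites the entry while everyone else keeps serving the cached copy. The head start is usually small, so refreshes cluster near the expiry, and a larger recompute cost stretches it so that expensive keys are refreshed sooner.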

The third technique is the cheapest to apply and is often enough on its own: jitter the expiries. When a system warms its cache in a burst, at startup, after a deploy, or after a flush, every key is written at nearly the same moment, and if they all carry the same time-to-live they all expire at the same moment, producing a synchronized herd across many keys at once. Adding a random spread to each expiry, say a base of ten minutes plus or minus a random ninety seconds, desynchronizes the expiries so misses are spread over a window instead of arriving in a spike. Jitter does nothing for a single hot key expiring on its own, which is why it pairs with single-flight rather than replacing it, but it is one line of code and it eliminates the entire class of synchronized-expiry stampedes that startup and deploy create. The discipline is to never set a fixed time-to-live across a population of keys that filled together; always add jitter.
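The one line in question looks roughly like this. A minimal sketch in Python using the ten-minute base and ninety-second spread from the example above; the function name and the one-second floor are assumptions.

```python
import random


def jittered_ttl(base_seconds=600, spread_seconds=90, rng=random.uniform):
    """Return the base TTL plus or minus a random spread, never below 1 second."""
    return max(1, round(base_seconds + rng(-spread_seconds, spread_seconds)))
```

Pass the result wherever a fixed expiry would otherwise go, so that keys written in the same burst expire across a three-minute window instead of in the same second.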

A fourth approach, serving stale while revalidating, deserves a mention because it changes the user experience favorably. Instead of treating an expired entry as absent, you treat it as stale-but-usable for a short grace period: a reader that finds an expired entry returns it immediately and triggers a background refresh, so the user never waits and the source sees one refresh. This is the same idea as early recomputation viewed from the other side of expiry, and it composes with single-flight, the background refresh taking the lock. The trade-off is that you have explicitly chosen to serve slightly stale data to avoid latency, which is fine for a product description and wrong for a price or a permission check. As always, the staleness tolerance of the data class decides whether the technique is allowed.
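
A minimal sketch of the grace-period idea follows, using a plain dictionary in place of the cache and an injected clock so the behavior is easy to follow. `Entry`, `read_swr`, the `ttl` and `grace` values, and the `schedule` hook are all hypothetical; in a real deployment `schedule` would hand the refresh to a background worker that takes the single-flight lock, whereas the default here runs it inline.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    value: Any
    fresh_until: float    # soft expiry: after this the value is stale but usable
    stale_until: float    # hard expiry: end of the grace period


def read_swr(store, key, now, load, ttl=600.0, grace=60.0,
             schedule=lambda job: job()):
    def refresh():
        value = load()
        store[key] = Entry(value, now + ttl, now + ttl + grace)
        return value

    entry = store.get(key)
    if entry is None or now >= entry.stale_until:
        return refresh()          # true miss: the caller waits for the load
    if now >= entry.fresh_until:
        schedule(refresh)         # stale but usable: refresh out of band
    return entry.value            # the caller gets the copy it found
```

The stale reader returns the old value even though the refresh has been triggered, which is exactly the trade the technique makes: no waiting, at the price of one slightly old answer.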

The meta-point about stampedes is that they are a property of hot keys, not of the cache as a whole, and the engineering effort should follow the heat. The vast majority of keys are requested infrequently enough that a miss costs one cheap source read and no one notices. A small number of keys carry most of the traffic, and those are the ones where a stampede control earns its complexity. Identifying the hot keys, through the monitoring covered in the Azure Well-Architected Framework performance pillar (/2024/06/10/azure-well-architected-framework/), lets you apply single-flight and early recomputation surgically rather than burdening every read path with machinery it does not need. Premature stampede control on cold keys is as much a mistake as missing it on hot ones; measure first, then protect what the measurement shows is hot.

Consistency between the cache and the source of truth

Every cache introduces a second copy of the truth, and two copies can disagree. The honest framing is that a cache-aside layer is eventually consistent by construction: there is a window, bounded by your invalidation latency and your time-to-live, in which a reader can see a value that no longer matches the source. The engineering question is not how to eliminate that window, which is impossible without giving up the cache, but how to size it to what the data class tolerates and how to avoid the races that widen it unexpectedly.

The race that surprises teams is the concurrent read-miss against write, and it is worth tracing once in full so you recognize it. Reader R misses on a key and reads value V1 from the database. Before R writes V1 into the cache, writer W updates the database to V2 and deletes the cache key, which is already empty. Now R completes and writes the stale V1 into the cache. The cache holds V1, the database holds V2, no further event is pending, and only the time-to-live will eventually correct it. This is the interleaving that delete-on-write and source-first ordering minimize but do not fully eliminate, because R’s read and W’s write can still straddle each other in exactly this order. The practical defenses are a modest time-to-live so the window self-heals, and, where the data genuinely cannot tolerate it, a stronger pattern than cache-aside.
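
The interleaving can be replayed deterministically with two dictionaries standing in for the database and the cache. The key name and the function are illustrative only; the four steps are the ones traced above.

```python
def trace_race():
    """Replay the read-miss-against-write interleaving step by step."""
    db = {"price:42": "V1"}
    cache = {}

    r_value = db["price:42"]        # 1. R missed on the cache and reads V1
    db["price:42"] = "V2"           # 2. W updates the source to V2
    cache.pop("price:42", None)     # 3. W deletes the key, which is already empty
    cache["price:42"] = r_value     # 4. R finally populates the cache with V1
    return cache, db
```

After the fourth step nothing is left in flight: the cache holds V1, the source holds V2, and only an expiry on the cached entry would close the gap.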

The stronger patterns are the write strategies, and choosing among them is a consistency-versus-cost decision. Write-through has the application write the source and the cache together in the write path, so a read immediately after a write sees the new value with no window at all. The cost is write latency, because every write now pays for both stores, and complexity in handling a partial failure where one store succeeds and the other does not. Write-through suits data that is read immediately after it is written and cannot tolerate even a brief staleness, at the price of slower writes. Write-behind, also called write-back, inverts the latency trade: the application writes only to the cache and acknowledges immediately, while an asynchronous worker flushes the change to the source. This absorbs write bursts the source could not take synchronously and gives the fastest possible writes, but it moves the durability risk into the cache, because an acknowledged write that has not yet flushed is lost if the cache node fails before the flush. Write-behind is the pattern for high-throughput, loss-tolerant writes such as metrics or activity counters, and it demands a durable queue and a recovery story before it touches anything you cannot afford to lose.

For most systems, the right answer is cache-aside with invalidation for the read-heavy data, a short time-to-live to bound the eventual-consistency window, and a deliberate exception for the small set of data that cannot tolerate any staleness, which either uses write-through or bypasses the cache entirely. The mistake is to pick one strategy for the whole application. Consistency requirements vary by data class, and the design that survives a review states, for each class, how stale it may be and which strategy that tolerance implies. A catalog description is cache-aside with a ten-minute bound. A user’s account balance is either write-through or uncached. A session is a store, not a cache, and gets persistence. Naming the tolerance per class is the discipline; one global setting is the smell.

There is also the matter of what happens under memory pressure, because eviction is a silent form of inconsistency. Redis runs in memory, and when it fills, the eviction policy decides what to drop. For a pure cache, an eviction policy that drops the least-recently-used keys is correct and benign: an evicted key simply misses and repopulates. For a session store or any key that is the only copy of its data, eviction is data loss, and the policy must be set so that keys without an expiry are never evicted, or those keys must live in a separate instance from the disposable cache. Mixing disposable cache entries and irreplaceable state in one instance with one eviction policy is a recurring source of incidents, because the policy that is right for one is catastrophic for the other. Separate the two concerns, by instance or by policy, and the eviction behavior stops being a consistency hazard.

Designing the keyspace and key lifetime

The key scheme is the part of a Redis design that feels arbitrary on day one and turns out to govern almost everything later, so it is worth deciding deliberately rather than letting it grow by accident. A key is a string, and Redis imposes almost no structure on it, which means the structure is yours to impose and yours to regret if you skip it. The convention that scales is a namespaced, colon-delimited scheme that encodes the entity type, the identifier, and any variant that distinguishes one stored representation from another, so that product:42, user:1001:profile, and session:abc123 are immediately legible and never collide across entity types. A flat scheme where identifiers from different domains share a namespace is the source of the bug where clearing one class of entries accidentally removes another, and it is trivially avoided by prefixing.

The variant dimension matters more than teams expect. The same logical record often has several stored representations: a product serialized for an API response, the same product rendered into an HTML fragment, the same product filtered for a specific locale or currency. If all three live under product:42, one of them overwrites the others and you serve the wrong shape. Encoding the variant, product:42:api, product:42:html:en-us, keeps representations distinct and, just as importantly, makes invalidation tractable, because a write to the product can clear every representation under a known prefix rather than leaving orphaned variants behind. The discipline is to ask, for every value you store, what distinguishes this representation from another representation of the same record, and to put that answer in the key.
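
A small key builder makes the convention hard to get wrong. The sketch below is one possible shape; `cache_key` and `is_representation_of` are hypothetical helper names, and the rule that parts may not contain a colon is an assumption added to keep keys unambiguous.

```python
def cache_key(entity, identifier, *variant):
    """Build a namespaced key such as product:42:html:en-us."""
    parts = [str(p) for p in (entity, identifier, *variant)]
    if any(p == "" or ":" in p for p in parts):
        raise ValueError("key parts must be non-empty and must not contain ':'")
    return ":".join(parts)


def is_representation_of(key, entity, identifier):
    """True for the bare record key and for every variant stored under it."""
    base = cache_key(entity, identifier)
    return key == base or key.startswith(base + ":")
```

Matching on the prefix plus a trailing colon matters: a naive prefix test for product:42 would also catch product:420, so an invalidation of one record would clear another.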

Key lifetime is the other half of the keyspace design, and it is where the time-to-live decision becomes concrete. Every populate sets an expiry, and the value of that expiry is, as established, a per-data-class business decision. What the keyspace design adds is the recognition that different key prefixes deserve different default lifetimes, and that encoding the lifetime policy alongside the prefix, even just in documentation and a small mapping in code, prevents the slow drift toward a single global expiry that erases the per-class reasoning. A products prefix carries a ten-minute default, a session prefix carries the session timeout, a negative-result prefix carries a deliberately short lifetime, and a configuration prefix might carry a longer one because configuration changes are rare and invalidated explicitly. Centralizing the populate logic so that the expiry is chosen from the prefix rather than passed ad hoc at each call site is the single most effective guard against the unbounded-entry bug, because it makes it structurally impossible to write a key without the lifetime its class demands.

Jitter belongs in the populate logic too, for the reasons the stampede section laid out. Rather than setting a flat ten-minute expiry on every entry in a prefix, the populate adds a small random spread, so a population of entries that fills in a burst does not expire in a synchronized wave. Implementing jitter once, in the centralized populate, means every key inherits it automatically and no individual call site has to remember. The combination, a namespaced prefix, a per-prefix default lifetime, and built-in jitter, turns the keyspace from a loose collection of strings into a small system with predictable behavior, and it is the foundation that makes invalidation coverage, stampede control, and clustering all tractable rather than each becoming its own retrofitting project.
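
A centralized populate along these lines might look like the following sketch. The prefix table, its lifetimes, the fifteen percent jitter fraction, and the `store_set` callable are all assumptions for illustration; the point is only that the expiry is derived from the prefix and never passed in by the caller.

```python
import random

# Hypothetical per-prefix default lifetimes, in seconds.
DEFAULT_TTL_SECONDS = {
    "product": 600,     # ten-minute bound for catalog data
    "session": 1800,    # stands in for the session timeout
    "negative": 30,     # deliberately short for cached "not found" results
    "config": 3600,     # longer, because changes are rare and invalidated explicitly
}
JITTER_FRACTION = 0.15  # assumed spread: plus or minus 15% of the base


def ttl_for(key, rng=random):
    """Pick the lifetime from the key's prefix and add jitter."""
    prefix = key.split(":", 1)[0]
    if prefix not in DEFAULT_TTL_SECONDS:
        raise ValueError(f"no lifetime policy for prefix {prefix!r}")
    base = DEFAULT_TTL_SECONDS[prefix]
    spread = int(base * JITTER_FRACTION)
    return base + rng.randint(-spread, spread)


def populate(store_set, key, value, rng=random):
    """Write through the one path that always attaches an expiry.

    store_set is any callable taking (key, value, ttl_seconds).
    """
    ttl = ttl_for(key, rng)
    store_set(key, value, ttl)
    return ttl
```

Because an unknown prefix raises instead of falling back to "no expiry", a new data class cannot reach the store until someone has decided how long it may live.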

How should I name and structure my cache keys?

Use a namespaced, colon-delimited scheme that encodes the entity type, the identifier, and any variant that distinguishes one stored representation from another, such as product:42:api or user:1001:profile. Namespacing prevents collisions across entity types, makes prefix-based invalidation possible, and, if you plan to cluster, lets you co-locate keys that must be touched together using hash tags.

One more keyspace concern surfaces only at scale, and it is worth planning for early because retrofitting it is painful: co-location for clustered deployments. When the keyspace shards across nodes, an operation that must touch several keys atomically only works if those keys hash to the same shard, which Redis controls through hash tags, the portion of a key enclosed in braces. If you know in advance that a set of keys will be read or modified together, embedding a shared hash tag in their names, {user:1001}:profile and {user:1001}:settings, forces them onto the same shard so multi-key operations remain possible after clustering. You do not need clustering on day one to benefit from this; you need only to anticipate which keys travel together and tag them now, so that the eventual move to a sharded topology is a configuration change rather than a keyspace rewrite. Designing the keyspace for the cluster you might one day run is far cheaper than reshaping a live keyspace under load.
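
To see why a shared hash tag keeps keys together, it helps to compute the slot the way a cluster does: CRC16 of the key, or of the tag alone when one is present, modulo 16384. The sketch below relies on Python's `binascii.crc_hqx` with an initial value of zero, which is assumed here to match the CRC16 variant Redis Cluster uses; treat the function as an illustration rather than a replacement for asking the server.

```python
import binascii


def hash_slot(key: str) -> int:
    """Approximate the cluster slot for a key, honoring a {hash tag}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # a non-empty tag was found
            data = data[start + 1:end]       # hash only what is inside the braces
    return binascii.crc_hqx(data, 0) % 16384
```

With this rule, {user:1001}:profile and {user:1001}:settings hash only the text user:1001 and therefore land in the same slot, while an empty pair of braces is ignored and the whole key is hashed.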

Warming, eviction, and connection management

Three operational concerns sit between a correct caching design and one that survives production, and each has a failure mode that surprises teams who treated the design as finished once the read and write paths were right. The first is warming, the second is eviction, and the third is connection management, and all three are the kind of detail that does not appear in a tutorial yet shows up in the first incident.

Warming is the question of what the store holds when it starts empty, which it does after every deploy that replaces the instance, every node restart, and every flush. An empty store means every read is a miss until the keyspace refills, and for a system that depends on a high hit rate to shield its source, a cold start can mean the source briefly receives the full unshielded read load, which is the same hazard as a stampede but across the whole keyspace at once rather than a single key. The defenses are persistence, which lets a restart reload the prior contents rather than starting blank, and an explicit pre-warm step that populates the hottest keys before the instance begins serving traffic, so the first real requests hit a warm store. Persistence suits a store that holds the only copy of its data anyway; pre-warming suits a pure read cache where persistence is unnecessary but a cold start is still costly. The choice between them follows the same source-of-truth question that governs the rest of the design: if the data lives elsewhere, pre-warm; if it does not, persist. Either way, the cold-start behavior is something to design and to test, by restarting the instance in a non-production environment and watching whether the source absorbs the warm-up load gracefully.

Eviction is what happens when the instance fills its memory and must make room, and it is a silent correctness hazard for exactly the data that cannot tolerate it. Redis runs a configurable policy that decides which keys to drop under memory pressure, and the policies fall into three families: the allkeys policies, which may evict any key, typically by recency or frequency of use, and suit disposable entries that can repopulate harmlessly; the volatile policies, which evict only keys that carry an expiry and so spare keys without one; and noeviction, which drops nothing and instead rejects writes once memory is full. The hazard appears when a single instance holds both disposable entries and irreplaceable state under one policy, because the policy that correctly drops a stale read entry will, under the same memory pressure, drop a live session or a counter that has no other home, producing the random-logout and lost-state incidents. The resolutions are the ones the consistency section named: separate the disposable from the authoritative by instance, or choose a policy that spares keys without an expiry and ensure the authoritative keys are configured to be spared. Keep in mind that a session key stored with its timeout as an expiry is still eligible under the volatile policies, so sparing it means giving it a separate instance or running that instance with noeviction. Sizing memory with headroom above the working set so eviction rarely fires at all is the complementary defense, and monitoring the evicted-keys counter is how you learn that you are running closer to the ceiling than you intended before it becomes an outage.

Connection management is the least glamorous of the three and a common cause of mysterious latency and connection-exhaustion errors. Redis connections are not free, and the client library that talks to Azure Cache for Redis is built around a connection multiplexer that is designed to be created once and shared across the whole application rather than opened per request. The recurring mistake is to instantiate a new connection for every operation, which under load exhausts the available connections, spikes latency as the client waits to connect, and produces timeout errors that look like a cache fault but are a client misuse. The pattern is to create the multiplexer once at startup, hold it as a singleton, and reuse it for every operation, letting it pipeline requests over the shared connection. A short command timeout on each operation, paired with the fail-open fallback the resilience section described, keeps a transient connection problem from blocking the request indefinitely. Connections also carry the security configuration, the TLS that protects data in transit and the authentication that gates access, both of which should be on for any production instance and verified against current Azure guidance, because a cache reachable without TLS or without authentication is an exposure regardless of how correct the caching logic above it is. Getting connection management right is what separates a design that performs in a tutorial from one that holds up under the concurrency of production traffic.

When caching fits and when it is overkill

Caching is not free, and the cost is not the Azure bill; it is the correctness obligation you take on. Every cached entity is now a thing that can be stale, a write path that must invalidate, an expiry that must be sized, and a hot key that might stampede. Before adopting the pattern, the question to answer is whether the read pressure justifies that obligation, because for some workloads it does not.

Caching fits cleanly when reads vastly outnumber writes on the same data, when the source is expensive to query relative to a cache read, and when the data tolerates at least some staleness. A product catalog, a user profile, a configuration blob, a rendered fragment, a reference dataset that changes rarely: these are textbook fits, because the same value is read thousands of times between writes, the database read it replaces is comparatively costly, and a few minutes of staleness harms no one. In these cases cache-aside with the invalidate-and-bound rule turns a database bottleneck into a memory read and the obligation is well spent.

Caching is overkill, or actively harmful, in several recognizable situations. Data that is written as often as it is read gains little, because the cache is invalidated almost as fast as it fills, and you have added a layer and an obligation for a hit rate that does not justify them. Data that cannot tolerate any staleness, a balance that gates a transaction, a permission that gates access, a kill switch, should either bypass the cache or use write-through, and reaching for cache-aside there imports a staleness window where none is acceptable. Data that is cheap to compute or fetch gains little from caching and adds an obligation for a savings that does not exist; caching a value that takes a microsecond to produce is pure overhead. And a system whose real problem is an inefficient query or a missing database index is one where caching papers over the root cause: the cache hides the slow query behind a hit rate, the slow query is still there on every miss and every cold start, and the team has spent its effort on a layer instead of on the index that would have fixed the source. Before caching, confirm the source is as fast as it reasonably can be, because caching a fixable slowness is a decision you will regret at the next cold start.

The decision rule is therefore a short sequence. Confirm the read-to-write ratio is high on the data in question. Confirm the source read you are replacing is genuinely expensive. Confirm the data tolerates the staleness a cache implies. Confirm the source itself is not the fixable problem. If all four hold, cache-aside with the invalidate-and-bound rule is the right pattern and the obligation is well spent. If any fails, the honest move is to fix the source, choose write-through, or leave the data uncached, rather than to adopt the pattern by reflex because caching is what one does when something is slow.

Estimating the memory budget and the hit rate

Two numbers decide whether a caching layer is doing its job, and neither is the throughput figure that marketing leads with. The first is the hit rate, the fraction of reads served from the in-memory layer rather than the source. The second is the memory headroom, how much of the working set the instance can hold before eviction starts dropping entries that should still be live. Both are measurable, both are tunable, and reasoning about them turns sizing from a guess into an estimate you can defend.

Start with the working set, the set of keys actually requested within a window. A read-heavy system rarely requests its entire dataset uniformly; a small fraction of records absorbs the large majority of reads, which is precisely why caching works at all. The memory you need is not the size of the whole dataset but the size of that hot working set plus headroom, because holding the rarely-read tail in memory buys almost nothing while consuming space the hot keys need. Estimate the working set by sampling the distinct keys requested over a representative window and multiplying by the average serialized value size, then add headroom so that normal growth and the occasional burst do not push the instance into eviction. Sizing to the working set rather than the whole dataset is usually the difference between a modest instance that serves a high hit rate and an oversized one that wastes money holding cold data.
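
The estimate reduces to a few multiplications. In the sketch below the per-key overhead, the average key size, and the thirty percent headroom are placeholder assumptions to be replaced with figures measured on your own instance; only the shape of the calculation comes from the paragraph above.

```python
def estimate_memory_bytes(distinct_keys, avg_value_bytes, avg_key_bytes=40,
                          per_key_overhead_bytes=64, headroom=0.30):
    """Working-set size plus headroom, in bytes.

    distinct_keys: distinct keys requested in a representative window.
    avg_key_bytes and per_key_overhead_bytes are rough placeholder guesses.
    """
    per_entry = avg_value_bytes + avg_key_bytes + per_key_overhead_bytes
    working_set = distinct_keys * per_entry
    return int(working_set * (1.0 + headroom))
```

For example, two million hot keys averaging one kilobyte each give a working set of roughly two gigabytes before overhead and headroom, regardless of how large the full dataset behind them is.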

The hit rate is the number that tells you whether the sizing is right and whether the layer is earning its keep. A high hit rate means most reads avoid the source, which is the whole point; a falling hit rate is an early warning that something changed, an expiry set too short so entries leave before they are re-read, an eviction policy firing because the working set outgrew the memory, or a key scheme so fragmented that logically identical reads land on different keys and never hit. The diagnostic value is in the trend: a hit rate that was high and is now sliding points at one of those three causes, and the fix follows from which. The interaction with the time-to-live is direct and worth holding onto, because it captures the central tension of the whole design. A longer expiry raises the hit rate by keeping entries around longer, but it widens the staleness window and raises the stakes on invalidation coverage. A shorter expiry tightens freshness but lowers the hit rate and raises miss-driven load on the source. The right expiry is the one that satisfies the staleness tolerance first; the hit rate is then whatever that tolerance allows, and if the resulting hit rate is too low to justify the layer, the honest conclusion may be that this data class was not a good caching candidate after all.

There is a useful sanity check hiding in these two numbers together. If you know the read rate, the hit rate, and the cost of a source read, you can estimate the source load the layer is shielding and the load it would face on a cold start or a stampede, which is exactly the figure that tells you whether your source has the headroom to survive a warm-up. A layer running at a very high hit rate is shielding the source from nearly all of its read load, which is reassuring until you realize it also means the source has never been sized for that load and a cold start will expose it. Sizing the source to survive the unshielded load, or ensuring pre-warming and stampede control prevent the source from ever seeing it, is a decision that falls out naturally once you have the hit rate and the read rate in front of you.
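
The arithmetic behind that sanity check is short enough to write down. The two helpers below are illustrative names; they assume a steady read rate and a hit rate expressed as a fraction between zero and one.

```python
def source_reads_per_second(read_rate, hit_rate):
    """Reads that still reach the source while the layer is warm."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return read_rate * (1.0 - hit_rate)


def cold_start_multiplier(hit_rate):
    """How many times the usual source load arrives when every read misses."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be at least 0 and below 1")
    return 1.0 / (1.0 - hit_rate)
```

At ten thousand reads per second and a 95 percent hit rate, the source normally sees about five hundred reads per second, and a cold start asks it for about twenty times that until the layer refills.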

How to evolve the caching layer

A caching layer that is correct on day one drifts over time as the system grows, and the evolutions are predictable enough to plan for. The first is scale: a single Redis instance has a memory ceiling and a throughput ceiling, and a growing dataset or request rate eventually approaches both. The clustered tiers shard the keyspace across multiple nodes so capacity and throughput scale horizontally, and the migration is mostly transparent to cache-aside code because the client routes each key to its shard. It does change a few things you must account for, chiefly that multi-key operations and transactions only work within a single shard, so a design that assumed it could atomically touch several keys must use hash tags to co-locate them or be redesigned. Plan the key scheme with eventual clustering in mind even on a single node, because retrofitting hash tags onto a live keyspace is painful.

The second evolution is observability, and it is the one most teams add too late. Early on you cannot answer the questions that matter: what is the hit rate, which keys are hot, what is the memory headroom, how often is eviction firing, what is the p99 latency of a cache read. Without those, you tune by guess and you discover stampedes only when they cause an outage. Wiring the cache metrics into your monitoring, the hit and miss counts, the evicted-keys counter, the memory usage, the connected-client count, and the server load, turns the cache from a black box into an instrument you can reason about, and it is the prerequisite for every other evolution because you cannot size a time-to-live, identify a hot key, or justify a tier change without it. The deeper treatment of these levers and their measured effects lives in the Azure Redis performance guide (/2024/10/14/azure-redis-performance-guide/), which is where to go when the layer is correct and the next question is how fast and how cheap it can be made.

The third evolution is resilience: deciding what the application does when the cache itself is unavailable. A cache-aside layer should fail open, meaning that if Redis is unreachable the application reads from the source directly and continues serving, slower but correct, rather than failing the request because the cache is down. This sounds obvious and is frequently gotten wrong, because a naive implementation treats a Redis connection error as a fatal error and propagates it to the user, turning a cache outage into an application outage and inverting the entire purpose of the cache. The pattern is a short timeout on the cache call, a catch that falls through to the source on any cache error, and a circuit breaker so a sustained cache outage does not have every request paying the cache timeout before falling through. Failing open keeps the cache as the optimization it was meant to be rather than a new single point of failure, and it is the resilience property to verify before the layer goes to production, ideally by testing it: take the cache offline in a non-production environment and confirm the application slows rather than stops.
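
The fall-through and the circuit breaker can be sketched without any particular client library. In the example below `FailOpenCache`, its thresholds, and the `cache_get` callable are hypothetical; the short timeout itself is assumed to be configured on the real client, and a production version would catch the client's specific connection and timeout errors rather than every exception.

```python
import time


class FailOpenCache:
    """Read path that falls through to the source when the cache misbehaves."""

    def __init__(self, cache_get, failure_threshold=3, cooldown_s=30.0,
                 clock=time.monotonic):
        self._cache_get = cache_get          # callable: key -> value or None
        self._threshold = failure_threshold
        self._cooldown = cooldown_s
        self._clock = clock
        self._failures = 0
        self._open_until = 0.0               # breaker is open while clock < this

    def get(self, key, load_from_source):
        if self._clock() >= self._open_until:        # breaker closed or cooled down
            try:
                value = self._cache_get(key)
                self._failures = 0
                if value is not None:
                    return value
            except Exception:                        # sketch: narrow this in real code
                self._failures += 1
                if self._failures >= self._threshold:
                    # Stop paying the cache timeout on every request for a while.
                    self._open_until = self._clock() + self._cooldown
                    self._failures = 0
        # Miss, cache error, or open breaker: serve from the source.
        # Populating the cache on a miss is omitted to keep the sketch short.
        return load_from_source(key)
```

While the breaker is open, requests skip the cache call entirely, which is what keeps a sustained outage from adding a timeout to every request before it falls through.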

The fourth evolution is the platform migration that is now on the horizon for many teams: moving from the older Azure Cache for Redis tiers to Azure Managed Redis. Because the patterns are platform-agnostic, the application code that implements cache-aside, invalidation, and stampede control should not need to change; what changes is the connection configuration, the tier mapping, and possibly the persistence and clustering setup. Treat it as an infrastructure migration with a parallel-run and cutover plan rather than a rewrite, verify feature parity for anything you depend on against the current official guidance, and confirm the retirement timelines so you migrate on your schedule rather than under a deadline. The correctness you built into the read and write paths is exactly what makes the migration low-risk.

The recurring incidents and the fix each one needs

The patterns above become memorable when you attach them to the incidents engineers actually report, because every one of those incidents is the invalidate-and-bound rule failing in a specific way. Walking the common cases with their confirming check and their fix is the fastest way to make the rule operational, and it gives you a triage sequence the next time a cache-shaped bug appears.

The most common incident is stale data after a write that the team swears should have refreshed. A customer changed a setting, the database shows the new value, and the application keeps showing the old one for a while. The confirming check is two commands: GET to read the key and TTL to inspect its remaining lifetime. If the key holds the old value with a positive expiry, the write path did not invalidate it, and the fix is to find the writer that changed the source without deleting the key. The usual culprit is an out-of-band writer, a job or a second service or an admin action, that never ran through cache-aware code. The fix is to make that writer invalidate, and the backstop, already in place if you followed the rule, is the time-to-live that limits how long the bug lasts while you find the gap.

The second incident is the value that is stale forever, never self-correcting no matter how long you wait. This is the unbounded-entry failure, and the confirming check is decisive: TTL on the key returns -1, meaning the key exists but no expiry was ever set (a key that does not exist returns -2 instead). The fix is to ensure every populate sets an expiry, and the prevention is to centralize the populate so no code path can write a key without a time-to-live. A single SET somewhere in the codebase that omits the expiry argument is enough to produce a key that outlives every reasonable staleness window, and because it only manifests when that particular key is written by that particular path, it is maddening to reproduce until you check the time-to-live and see the -1.

The third incident is the periodic latency spike or outage that lines up with a cache event. The source database shows a burst of identical expensive queries at the moment a key expired or the cache restarted, and latency spikes until the burst clears. This is the stampede, and the confirming evidence is the correlation between a key’s expiry or a cache restart and the query burst on the source. The fix is single-flight on the hot key so one request recomputes while the rest wait, and the prevention for the deploy-time and restart-time variant is jittered expiries so a population of keys does not expire in lockstep. If the spike happens every time the cache node restarts rather than on a single key’s expiry, the issue is the whole cache going cold at once, and the fix combines persistence, so a restart reloads rather than starting empty, with a pre-warm step and jitter.

The fourth incident is users being logged out at random or losing state, which traces to treating a session store like a disposable cache. The confirming check is whether persistence is enabled and what the eviction policy is: if persistence is off, a node restart empties the store and logs everyone out, and if the eviction policy can evict keys without an expiry under memory pressure, a memory spike silently drops live sessions. The fix is to move session state to a tier with persistence enabled, set an eviction policy that never evicts the keys that are the only copy of their data, and ideally separate the session store from the disposable cache so the two never compete for memory under one policy.

The fifth incident is the read-after-write inconsistency where a user updates something and immediately sees the old value on the next page, then the correct value on a refresh. This is the concurrent read-miss-against-write race, or a write-through expectation applied to a cache-aside design. The confirming check is whether the read that showed the old value happened within the invalidation window and whether the populate and the delete could have interleaved. The fix depends on the tolerance: if a brief window is acceptable, the source-first-then-delete ordering and a short time-to-live are enough; if the data cannot tolerate even that window, the data class belongs on write-through or should bypass the cache, and the lesson is that this particular data was misclassified as cache-aside-tolerant when it was not.

Across all five, the diagnostic spine is the same: read the key, check the time-to-live, and ask which half of the invalidate-and-bound rule is failing. That is the payoff of having a named rule rather than a pile of techniques. The rule converts an intermittent, hard-to-reproduce, memory-dependent bug into a short sequence of commands that point at the cause.
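
That triage sequence is mechanical enough to write as a function. The sketch below takes the three observations as plain arguments, so it does not depend on any client; the function name and the returned labels are invented for the example, while the meaning of -1 and -2 follows the Redis TTL command.

```python
def triage(cached_value, ttl_seconds, source_value):
    """Map a key's value and TTL reply to the half of the rule that failed.

    ttl_seconds follows the TTL command: -2 means the key does not exist,
    -1 means it exists with no expiry, otherwise it is the seconds remaining.
    """
    if ttl_seconds == -2:
        return "absent"               # nothing cached: the next read repopulates
    if ttl_seconds == -1:
        return "unbounded"            # a populate skipped the expiry
    if cached_value != source_value:
        return "missed-invalidation"  # bounded, but some writer did not delete
    return "consistent"
```

An "unbounded" result points at the populate path, a "missed-invalidation" result points at a writer, and "absent" or "consistent" mean the stale read came from somewhere other than this key.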

The verdict

Caching with Azure Cache for Redis, or its successor Azure Managed Redis, is one of the highest-impact changes you can make to a read-heavy system, and it is also one of the easiest to get subtly wrong, because the read path that delivers the speed is the easy half and the write path that preserves the correctness is the half teams skip. The position this article defends is that caching is a design pattern with a correctness contract, not a pair of commands, and that the contract reduces to one rule worth memorizing: every write invalidates, every entry is bounded by a time-to-live. Build to that rule and the rest of the design follows: cache-aside for read-heavy data, a write strategy chosen per data class rather than globally, a time-to-live sized to the staleness each class tolerates, single-flight and jitter on the keys hot enough to stampede, persistence and a production tier for anything that is a store rather than a cache, and fail-open resilience so a cache outage degrades performance rather than availability.

The deciding factor in whether a caching layer helps or harms is not the tier you buy or the throughput you can hit; it is whether you treated the cache as a copy that must be kept honest. A team that asks, for each cached data class, how stale it may be and which write paths can change it, builds a cache that speeds the system without ever serving a wrong answer. A team that drops in GET and SET and moves on builds a latent incident that surfaces weeks later as an irreproducible stale-data bug. The patterns, the tiers, and the techniques in this article all serve that single distinction. To implement and test cache-aside, watch an invalidation take effect, and reproduce a stampede so you can see single-flight defuse it, run the hands-on Azure labs and command library on VaultBook, where you can stand up a cache, drive load against a hot key, and confirm each pattern behaves the way the rule predicts.

Frequently Asked Questions

Q: What caching patterns work with Azure Cache for Redis?

The core patterns are cache-aside (also called lazy loading), where the application reads the cache, loads the source on a miss, and populates with an expiry; read-through, which centralizes that load logic in the client library; write-through, where writes update the source and the cache together for immediate read-after-write consistency; write-behind, where writes hit the cache and an asynchronous worker flushes to the source for high write throughput; using Redis as a session and distributed-state store; and stampede controls such as single-flight locking, early recomputation, and jittered expiries layered onto the hot keys. Most production systems combine several: cache-aside for the bulk of reads, a per-class write strategy, a time-to-live on every entry, and stampede protection only where a key is hot enough to need it. The pattern set is the same on Azure Managed Redis, because the patterns are properties of how the application uses Redis rather than of the managed offering that hosts it.

Q: How does the cache-aside pattern handle a write to the underlying data?

On a write, cache-aside writes the new value to the source of truth first and then deletes the affected cache key, rather than updating the key in place. The next read for that key misses and repopulates from the now-current source. Deleting instead of overwriting is deliberate: a delete is idempotent and never places a value of its own in the cache, so a delete that lands at an awkward moment costs at most an extra miss, whereas an in-place overwrite can race with a read-miss populate and leave an old value in the cache with no event left to correct it. The one race a delete does not close is a read that fetched the old value before the write and populates after the delete; that window is narrow, and the time-to-live bounds how long it can last. Writing the source before deleting the key also matters, because deleting first and writing second opens a window in which a concurrent read repopulates the cache from the old source value. The cost of delete-on-write is one extra miss after each write, which is almost always cheaper than the class of races that overwriting invites.
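The read and write paths described above can be sketched in a few lines. This is a minimal illustration, not a production client: `FakeCache` is a hypothetical in-memory stand-in for a Redis client, `database` is a plain dict standing in for the source of truth, and the function names are invented for the example.

```python
import time

class FakeCache:
    """Hypothetical in-memory stand-in for a Redis client (get/set/delete only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        # Every populate carries an expiry; there is no way to set without one.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)  # idempotent: deleting a missing key is fine

database = {"product:42": "old price"}  # stands in for the source of truth
cache = FakeCache()

def read_product(key, ttl_seconds=60):
    """Cache-aside read: try the cache, load the source on a miss, populate with a TTL."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value, ttl_seconds)
    return value

def write_product(key, value):
    """Cache-aside write: update the source first, then delete the cache key."""
    database[key] = value
    cache.delete(key)
```

Calling `read_product`, then `write_product`, then `read_product` again shows the second read missing and repopulating from the updated source, which is the one extra miss the text describes.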

Q: How do I choose a time-to-live for cached data?

Size the time-to-live from how stale the specific data class can be in the worst case before someone is harmed, not from a default. A product description might tolerate ten minutes, a price a minute, an inventory count seconds, and a kill-switch flag essentially none. Setting one global expiry for the whole application is the common mistake; assign expiries by data class and write the reasoning down. Remember the interaction with invalidation coverage: a long expiry makes every gap in invalidation a long-lived bug, so it demands near-perfect coverage, while a short expiry forgives imperfect coverage but raises miss rate and stampede risk on hot keys. The time-to-live is the maximum staleness you are promising users, expressed in one argument to a SET call, so treat it as a promise and not a leftover default. Add random jitter across keys that fill together so they do not all expire at once.
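One way to make per-class expiries and jitter concrete is a small lookup with a jitter helper. The class names and the numbers below are assumptions loosely based on the examples in the answer above, not recommendations; `ttl_with_jitter` is a hypothetical helper. The jitter is subtracted rather than added so the base value stays the maximum staleness promised.

```python
import random

# Hypothetical staleness budgets in seconds, one per data class.
TTL_BY_CLASS = {
    "product_description": 600,  # tolerates about ten minutes
    "price": 60,                 # tolerates about a minute
    "inventory_count": 5,        # tolerates only seconds
}

def ttl_with_jitter(data_class, jitter_fraction=0.1, rng=random):
    """Return the expiry for a data class, shortened by a random amount.

    An unclassified data class raises KeyError on purpose, so nothing is
    cached under a leftover default.
    """
    base = TTL_BY_CLASS[data_class]
    jitter = rng.uniform(0, base * jitter_fraction)
    # Subtracting keeps the base TTL as the upper bound on staleness.
    return max(1, round(base - jitter))
```

With the defaults, a price entry receives an expiry somewhere between 54 and 60 seconds, so keys that filled together drift apart instead of expiring in lockstep.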

Q: Why does my cache keep serving old data after the database changed?

Two causes produce this, and the time-to-live tells them apart. If the key holds the old value with a positive remaining expiry, a write path changed the source without invalidating the cache key, and the culprit is usually an out-of-band writer (a batch job, a database trigger, a second service, or a manual correction) that never ran through cache-aware code; the fix is to make that writer delete the key. If the key returns a time-to-live of negative one, it was populated without an expiry and is unbounded, so it will serve the old value indefinitely; the fix is to ensure every populate sets an expiry, ideally by centralizing the populate so no path can omit it. The diagnostic is always the same two commands: read the key and check its time-to-live, then ask which half of the invalidate-and-bound rule failed.
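The two-command diagnostic can be written down as a small decision function. This is a sketch: `diagnose_stale_key` is a hypothetical helper that takes the results of reading the key, checking its time-to-live, and reading the source. It assumes the conventions of the Redis TTL command, where -1 means the key exists without an expiry and -2 means the key does not exist.

```python
def diagnose_stale_key(cached_value, ttl_seconds, source_value):
    """Map (cached value, TTL reply, source value) to the half of the rule that failed."""
    if ttl_seconds == -2:
        # The key is absent, so the next read repopulates from the source.
        return "absent: nothing cached, look elsewhere for the stale read"
    if ttl_seconds == -1:
        # Populated without an expiry: the 'bound' half of the rule failed.
        return "unbounded: a populate path set the key without an expiry"
    if cached_value != source_value:
        # Bounded but wrong: the 'invalidate' half of the rule failed.
        return "missed invalidation: a writer changed the source without deleting the key"
    return "consistent: the cached value matches the source"
```

For example, `diagnose_stale_key("9.99", 240, "12.99")` points at a missed invalidation, while `diagnose_stale_key("9.99", -1, "12.99")` points at an unbounded populate.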

Q: How do I prevent a cache stampede on a hot key?

A stampede happens when a hot key expires and every concurrent request misses at once and independently recomputes the value, flooding the source. The primary defense is a single-flight lock: the first miss acquires a short-lived lock for the key and recomputes while the other requests briefly wait and then re-read the now-populated cache, or serve the previous stale value, so the source sees one recompute instead of thousands. Give the lock its own short expiry so a crashed holder cannot deadlock everyone. Layer on early or probabilistic recomputation, where a reader refreshes the value shortly before expiry while still serving the valid copy, so the key never actually goes cold. And jitter expiries so keys that filled together do not expire in lockstep. Apply these only to keys hot enough to matter, identified by monitoring, because most keys are cold enough that a plain miss costs nothing.

Q: Do I need data persistence on Azure Cache for Redis?

It depends on whether Redis holds a copy or the only copy. For a pure cache-aside layer over a durable database, persistence is usually unnecessary, because the database is the source of truth and a node restart simply re-warms the cache through misses; you pay latency, not correctness. For a session store, counters, or any data that has no other home, persistence is the difference between an invisible restart and one that loses that data, so enable it and choose a tier that offers it. Azure provides RDB persistence, which snapshots the dataset on a schedule with a recovery point as old as the last snapshot, and append-only-file persistence, which logs writes and flushes roughly once per second for a much tighter recovery point at a higher cost. Confirm the current persistence options, tiers, and recovery behavior against official documentation before relying on a specific guarantee, because these change.

Q: What is the difference between write-through and write-behind caching?

Write-through writes to the source of truth and the cache together within the request, so a read immediately after a write sees the new value with no staleness window; the cost is higher write latency, because each write pays for both stores, and added complexity handling a partial failure where one store succeeds. Write-behind, or write-back, writes only to the cache and acknowledges immediately while an asynchronous worker flushes the change to the source later; this gives the fastest writes and absorbs bursts the source could not take synchronously, but it moves durability risk into the cache, because an acknowledged write that has not yet flushed is lost if the cache node fails first. Use write-through for data read right after it is written that cannot tolerate staleness, and write-behind for high-throughput, loss-tolerant writes such as metrics or activity counters, backed by a durable queue and a recovery plan.
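The difference in where the durability risk sits is easier to see side by side. In this sketch the dicts `source` and `cache` and the in-memory `pending` deque are stand-ins; in particular, `pending` only imitates the durable queue the answer above calls for, and partial-failure handling is left out.

```python
from collections import deque

source = {}          # stand-in for the source of truth
cache = {}           # stand-in for the cache
pending = deque()    # stand-in for a durable queue of unflushed writes

def write_through(key, value):
    """Both stores are updated inside the request; reads see the new value at once."""
    source[key] = value
    cache[key] = value

def write_behind(key, value):
    """Only the cache is updated before acknowledging; the source catches up later."""
    cache[key] = value
    pending.append((key, value))  # acknowledged but not yet durable in the source

def flush_pending():
    """What the asynchronous worker would do; returns how many writes it flushed."""
    flushed = 0
    while pending:
        key, value = pending.popleft()
        source[key] = value
        flushed += 1
    return flushed
```

Between `write_behind` and `flush_pending`, the value exists only in the cache and the queue, which is exactly the window in which a cache failure loses an acknowledged write.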

Q: Should I cache the result of a query that returns nothing?

Often yes, because a lookup for a missing record can hit the source on every request and a hot or adversarial stream of misses can stampede the database as effectively as a hot hit. Caching the absence with a short time-to-live protects the source from that flood. The catch is masking: a negative entry that lives as long as a positive one will hide a record that gets created shortly after the negative entry was cached, so a record created at 12:01 stays invisible until the negative entry expires. The resolution is a deliberately shorter time-to-live on negative entries, often a fraction of the positive expiry, so you get protection against the miss flood while keeping the window in which a new record is masked down to seconds. Decide it explicitly, document both expiries, and you have turned a footgun into a useful lever.

Q: How do I keep a session store from logging users out unexpectedly?

Stop treating it like a disposable cache. A session store often holds the only copy of its data, so three configurations matter. Enable persistence on a production tier so a node restart reloads sessions instead of starting empty, because without it a restart logs everyone out. Set an eviction policy that spares session keys, or keep sessions in a separate instance from the disposable cache, because an aggressive eviction policy that is correct for a cache will silently drop live sessions under memory pressure. Be aware that a policy that evicts only keys with an expiry still treats sessions as candidates once they carry a time-to-live, so on a shared instance the sessions need either no-eviction behavior or a key design the policy demonstrably spares. And version the serialized session format so a deploy that changes the session object’s shape can still read blobs written by the previous version, rather than throwing on read for every user with a live session. Set each session’s time-to-live to the session timeout and refresh it on each request for a sliding window so active users stay logged in while idle sessions expire on schedule.
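The versioned format and the sliding expiry can be combined in the load path. This is a sketch: `store` is a dict standing in for Redis, the timeout is an assumed value, and the version 1 to version 2 change (renaming `name` to `display_name`) is a made-up example of a shape change between deploys.

```python
import json

SESSION_TIMEOUT = 1800   # hypothetical session timeout in seconds
CURRENT_VERSION = 2      # version written by the current deploy

store = {}               # session_id -> (serialized blob, expires_at)

def save_session(session_id, data, now):
    """Serialize with an explicit version and set the expiry to the session timeout."""
    blob = json.dumps({"v": CURRENT_VERSION, "data": data})
    store[session_id] = (blob, now + SESSION_TIMEOUT)

def upgrade(payload):
    """Read blobs written by an older deploy instead of throwing on them."""
    if payload["v"] == 1:
        # Made-up migration: version 1 stored "name", version 2 stores "display_name".
        data = dict(payload["data"])
        data["display_name"] = data.pop("name", "")
        return data
    return payload["data"]

def load_session(session_id, now):
    """Return session data or None; each successful load slides the expiry forward."""
    entry = store.get(session_id)
    if entry is None or entry[1] <= now:
        store.pop(session_id, None)   # idle past the timeout: the session is gone
        return None
    data = upgrade(json.loads(entry[0]))
    save_session(session_id, data, now)  # refresh TTL and rewrite in the current format
    return data
```

Because every load rewrites the blob in the current format, old-version sessions migrate gradually as users come back, and sessions nobody touches expire on schedule.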

Q: Will adding a cache fix a slow database query?

Not really; it hides it. A cache serves the slow query’s result from memory on a hit, but the slow query still runs on every miss and on every cold start, so the underlying slowness is intact and will resurface the moment the cache is cold, evicted, or restarted, often as a stampede that is worse than the original slowness. Before caching, confirm the source is as fast as it reasonably can be: check for a missing index, an inefficient query plan, or an N-plus-one access pattern, because fixing those addresses the root cause while caching only masks it. Caching is the right tool when the source is already efficient and the problem is sheer read volume of the same data; it is the wrong tool when the real problem is a fixable query, where it spends your effort on a layer instead of on the index that would have solved it.

Q: What happens to my application if Azure Cache for Redis goes down?

It depends entirely on whether you designed the layer to fail open. A cache-aside layer should treat a cache outage as a reason to read from the source directly and continue serving, slower but correct, rather than as a fatal error. A naive implementation that propagates a Redis connection error to the user turns a cache outage into an application outage and inverts the cache’s whole purpose. The pattern is a short timeout on each cache call, a catch that falls through to the source on any cache error, and a circuit breaker so that during a sustained outage requests stop paying the cache timeout before falling through. Test this explicitly by taking the cache offline in a non-production environment and confirming the application slows rather than stops. Failing open keeps the cache the optimization it was meant to be instead of a new single point of failure.
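A fail-open read with a simple circuit breaker could be sketched as follows. `CircuitBreaker` and `read_fail_open` are hypothetical names, the thresholds are arbitrary, and `cache_get` and `source_get` are callables supplied by the caller. In real code the cache call would carry a short client-side timeout, the `except` would be narrowed to the client's connection and timeout errors, and a miss would also repopulate the cache.

```python
import time

class CircuitBreaker:
    """Stop calling the cache after repeated failures, then probe again after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed (cache in use)

    def allow(self, now):
        if self.opened_at is None:
            return True
        # After the cooldown, let calls through again to probe for recovery.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now

def read_fail_open(key, cache_get, source_get, breaker, now=None):
    """Serve from the cache when possible; on any cache error, fall through to the source."""
    now = time.monotonic() if now is None else now
    if breaker.allow(now):
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value
        except Exception:
            breaker.record_failure(now)   # a cache outage must not become a user error
    return source_get(key)
```

While the breaker is open, requests skip the cache call entirely, so they stop paying the timeout on every read during a sustained outage.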

Q: How does cache-aside differ from read-through caching?

Both fill the cache on a miss, but they differ in who owns the load logic. In cache-aside the application owns it: the application code checks the cache, and on a miss it explicitly reads the source and populates the cache. In read-through the cache library or provider owns it: the application asks the cache for the value, and the library transparently loads from the source on a miss and populates itself, so the application never sees the miss handling. Read-through centralizes the load logic, which reduces duplication and keeps every read path consistent, but it couples you to a library that supports the pattern and can hide the source access in a way that complicates debugging. Both still require an invalidation discipline on writes, because neither pattern knows when the source changed out of band. Cache-aside is the more common and more explicit choice on Azure Cache for Redis because it keeps the load and invalidation logic visible in application code.

Q: How do I invalidate many cached entries that all depend on one record?

When a single write fans out to many derived entries, such as a product update that affects a category page, a search result, and a homepage block, hand-tracking every dependent key is error-prone. Two approaches scale better than enumeration. Versioned keys embed a version number in the key, such as a per-entity or per-namespace version, and bump the version on write so all old keys are orphaned at once and expire on their own rather than being deleted individually; the cost is that orphaned entries occupy memory until their time-to-live elapses, which is acceptable when memory is ample and expiries are modest. Tag-based invalidation associates keys with tags and clears by tag, if your client or a maintained pattern supports it. Choose key-level deletion when the mapping from a write to its affected keys is simple and known, and reach for versioning when one write touches many derived views you would otherwise have to list out by hand.
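Versioned keys are simple enough to show directly. In this sketch the `versions` dict stands in for a counter that would live in the shared cache itself, and the key layout `namespace:vN:suffix` is an assumed convention, not a required one.

```python
cache = {}       # stand-in for the shared cache
versions = {}    # namespace -> current version (would live in the shared cache too)

def namespace_version(namespace):
    """Current version for a namespace, starting at 1."""
    return versions.setdefault(namespace, 1)

def versioned_key(namespace, suffix):
    """Build a key that embeds the namespace's current version."""
    return f"{namespace}:v{namespace_version(namespace)}:{suffix}"

def invalidate_namespace(namespace):
    """One bump orphans every derived key at once; old entries age out via their TTL."""
    versions[namespace] = namespace_version(namespace) + 1
```

After `invalidate_namespace("product:42")`, readers build keys under `v2` and miss, while the `v1` entries remain in memory until they expire, which is the memory cost the answer above describes.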

Q: Is it safe to run a cache and a session store in the same Redis instance?

It is risky unless you separate the eviction concerns, because the two have opposite requirements. A disposable cache wants an eviction policy that drops keys under memory pressure, since an evicted cache entry simply misses and repopulates harmlessly. A session store holds the only copy of its data, so evicting a session is data loss. Running both under one instance with one eviction policy means the policy that is correct for the cache will silently drop live sessions during a memory spike, producing the random-logout incident. The safer designs are to keep sessions in a separate instance from the cache so each gets the policy it needs, or, if they must share an instance, to use an eviction policy that never evicts keys without an expiry and to ensure session keys are configured so the policy spares them. Stating which keys are the only copy of their data is the prerequisite to choosing safely.

Q: Does choosing a clustered Redis tier change how I write cache code?

Mostly not, with one important exception. Cache-aside reads and writes work the same way, because the client routes each key to the shard that owns it, so single-key operations are transparent across a clustered deployment. The exception is multi-key operations and transactions: in a clustered topology, an operation that touches several keys only works if those keys live on the same shard, so a design that assumed it could atomically read or modify several keys at once will break unless you co-locate those keys using a shared hash tag in the key name. Plan your key scheme with eventual clustering in mind even on a single node, because retrofitting hash tags onto a live keyspace to co-locate keys that must be touched together is painful and error-prone. Beyond that, the patterns, the invalidation discipline, and the time-to-live sizing are identical whether the cache is a single node or sharded across many.
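To see why a shared hash tag co-locates keys, it helps to compute the slot yourself. The sketch below follows the key-to-slot rule from the open-source Redis Cluster specification (CRC16 of the key, or of the text inside the first non-empty pair of braces, modulo 16384). It is for illustration only: a cluster-aware client performs this routing for you, and the key names are made up.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Slot for a key; only the hash tag is hashed when a non-empty one is present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end > start + 1:   # "{}" with nothing inside is ignored
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

Keys such as `{user:1000}:profile` and `{user:1000}:cart` hash only the `user:1000` portion, so they land in the same slot and can take part in one multi-key operation.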

Q: How is using Azure Cache for Redis different from a local in-process cache?

A local in-process cache lives in the memory of one application instance, so it is the fastest possible read but it is not shared: each instance has its own copy, so invalidation must reach every instance, and a value cached on one instance is a miss on another. A shared Redis cache lives outside the instances, so every instance sees the same cached value and a single invalidation clears it for all of them, which is exactly what stateless horizontal scaling needs. The trade-off is a network hop, so a Redis read is slower than an in-process read though far faster than the source query it replaces. Many systems use both in a layered design: a small in-process cache for the hottest values to avoid the network hop, backed by Redis as the shared layer, with the understanding that the in-process layer’s staleness is harder to invalidate and so should carry a very short time-to-live or hold only data that tolerates per-instance divergence.

Q: What should I monitor on a Redis caching layer?

Watch the hit and miss rates first, because the hit rate tells you whether the cache is earning its keep and a falling hit rate is an early warning of an undersized cache, an aggressive eviction policy, or expiries set too short. Watch the evicted-keys counter, because steady eviction means memory pressure that is silently dropping data and, on a mixed instance, possibly dropping sessions. Watch memory usage against the instance ceiling so you can scale or shard before you hit it rather than after. Watch the server load and the count of connected clients, since a connection leak or a load spike often precedes a latency problem. And correlate source-side query bursts with cache events, because a burst that lines up with a key’s expiry or a node restart is the signature of a stampede. These metrics are the prerequisite for sizing time-to-live values, identifying hot keys for stampede protection, and justifying a tier change, which is why wiring them up is an early rather than a late task.

Q: Can I use Redis for more than caching, like queues or rate limiting?

Yes, and many systems do, because Redis offers data structures beyond simple strings that serve adjacent patterns well. Counters with atomic increment back rate limiting, where you increment a per-client key with a short expiry and reject the request when the count crosses a threshold, all without a database round trip. Sorted sets back leaderboards and time-windowed rate limits. Lists and streams back lightweight queues, and the publish-subscribe commands cover fire-and-forget messaging. The same correctness thinking applies: if the data in these structures is the only copy, such as a rate-limit counter you cannot afford to reset, it is a store and wants persistence and careful eviction, whereas if it is reconstructable it is a cache and can be disposable. Using Redis for these patterns is sound, but keep the distinction between disposable and authoritative data clear, because the configuration that is right for a cache is wrong for the data structures that are the source of truth for their own state.
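The counter-based rate limit described above amounts to a fixed-window counter. In this sketch the `counters` dict stands in for a Redis key that would be incremented atomically and given an expiry; the limit, the window, and the key layout are assumptions.

```python
import time

counters = {}   # key -> [count, expires_at], stand-in for an incremented key with a TTL

def allow_request(client_id, limit=5, window_seconds=60, now=None):
    """Fixed-window limiter: count requests per client per window, reject past the limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"rate:{client_id}:{window}"       # one counter per client per window
    entry = counters.get(key)
    if entry is None or entry[1] <= now:
        entry = [0, now + window_seconds]    # the expiry lets old windows disappear
        counters[key] = entry
    entry[0] += 1
    return entry[0] <= limit
```

A fixed window permits a burst across a window boundary; the sorted-set approach mentioned above is the usual way to tighten that into a sliding window.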

Q: How do I migrate a caching layer to Azure Managed Redis without downtime?

Treat it as an infrastructure migration rather than a code rewrite, because the patterns are platform-agnostic and your cache-aside, invalidation, and stampede logic should not need to change. The work is in configuration: mapping the old tier to the new offering, reconfiguring the connection, and re-establishing persistence and clustering settings. Plan a parallel run where both caches exist and the application can be pointed at either, then cut over, ideally during a low-traffic window, accepting that the new cache starts cold and warms through misses, so confirm your stampede controls and source headroom can absorb the warm-up. Verify feature parity for anything you depend on against the current official guidance before cutover, since some features may differ or be in preview, and confirm the retirement timelines for the older tiers so you migrate on your own schedule rather than under a deadline. The correctness you built into the read and write paths is precisely what keeps this migration low-risk.

Q: Why delete the cache key on a write instead of updating it with the new value?

Because a delete is far safer under concurrency than an update. Consider a read that misses and is about to populate the cache with the value it just read from the source, racing with a write that updates the source and the cache. If the two interleave so the read’s populate lands after the write’s update, an in-place update leaves the cache holding the older value the read fetched, with no further event to correct it until the time-to-live expires. A delete removes most of this family of interleavings, because the write path never places a value in the cache, and an empty cache is never wrong, only slower: the next read simply repopulates from the current source. The one interleaving that survives is a populate that fetched the old value before the write and lands after the delete, which is a much narrower window and is bounded by the time-to-live. The cost of delete-on-write is one extra cache miss after each write, which is almost always cheaper than the subtle, hard-to-reproduce stale-data races that updating in place invites. This is why cache-aside specifies delete, not overwrite, on the write path.