How Do Browsers Determine If a File is Cached
Last week someone asked me how browsers determine if a CSS file is returned from cache or not. My response was admittedly rather lame... So I went to go look up the full answer. The first place I went was the W3C web site. For an organization that helps establish standards for the web you would think their site would be a little less hard on the eyes.
Caching is defined by the HTTP specification. The HTTP 1.1 specification is available here. Like everything else on the W3C site it is a dense read that will make your head swim with linked references, obscure statements, and general ambiguities [hmmm sounds like my blog]. The answer to the CSS question is that the browser runs a "Validation" check against the server. If the file has changed, the check fails and the server sends a new copy. Otherwise the response the browser gets is a 304 (Not Modified), and the response body is served from cache.
Let's look at what the W3C has to say about caching.
Section 13 Caching In HTTP
Caching would be useless if it did not significantly improve performance. The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases. The former reduces the number of network round-trips required for many operations; we use an "expiration" mechanism for this purpose (see section 13.2). The latter reduces network bandwidth requirements; we use a "validation" mechanism for this purpose (see section 13.3).
There are two key goals defined:
- Eliminate the need to send requests
- Eliminate the need to send full responses
These two goals are handled separately via expiration and validation.
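To make the split concrete, here is a rough sketch of the decision a cache faces on each request. The CachedEntry class and the next_step function are names I made up for illustration, and a real browser cache tracks far more state than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical record a cache might keep for one URL."""
    body: bytes
    expires_at: float            # Unix time after which the entry is stale
    etag: Optional[str] = None   # validator saved from the earlier response


def next_step(entry: Optional[CachedEntry], now: float) -> str:
    """Pick one of three outcomes for a request to a possibly cached URL."""
    if entry is None:
        return "fetch"        # nothing stored: full request, full response
    if now < entry.expires_at:
        return "use-cache"    # expiration: no request is sent at all
    if entry.etag is not None:
        return "revalidate"   # validation: conditional request, maybe a 304
    return "fetch"            # stale and no validator: start over
```

The "use-cache" branch is the first goal (no request at all) and the "revalidate" branch is the second (a request, but hopefully no full response).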
Expiration
When we set expiration we are saying that the page/content will no longer be valid after some date. Once past that date the browser (or any other cache) must request new content from the server. That doesn't mean it will get new content. It just means it must request it.
The primary mechanism for avoiding requests is for an origin server to provide an explicit expiration time in the future
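In practice that explicit expiration time arrives in either the Expires header or the max-age directive of Cache-Control. Below is a minimal sketch of how a cache might work out the expiration time from those two. The expiration_time function is my own invention, it assumes the header names are spelled exactly as shown, and it skips the Age and Date corrections the spec also describes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def expiration_time(headers: Mapping[str, str],
                    received: datetime) -> Optional[datetime]:
    """Return when a response stops being fresh, or None if it does not say."""
    # Cache-Control: max-age=N takes precedence over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return received + timedelta(seconds=int(value))
    if "Expires" in headers:
        try:
            return parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            return received  # unparseable date: treat as already expired
    return None
```

A cache would then treat the entry as fresh for as long as the current time is earlier than the value returned.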
Simple enough, right? Most of us who have built anything for the web have some experience with caching. Usually it is something that was cached that we didn't want to be cached. Ironically the spec defines a value that was made to address that exact scenario:
If an origin server wishes to force any HTTP/1.1 cache, no matter how it is configured, to validate every request, it SHOULD use the "must-revalidate" cache-control directive (see section 14)
I'm a little embarrassed to say I've never heard of the must-revalidate cache-control directive.
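For anyone else meeting it for the first time, here is a small sketch of how a cache might spot the directive. Both function names are hypothetical, and splitting on commas is a simplification that would break on quoted directive arguments containing commas. As far as I can tell from section 14.9.4, must-revalidate forbids using an entry once it is stale without checking back first, so a server that wants every request validated would pair it with something like max-age=0.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def may_serve_stale(cache_control: str) -> bool:
    """False when must-revalidate forbids using a stale entry unchecked."""
    return "must-revalidate" not in parse_cache_control(cache_control)
```

With this, may_serve_stale("max-age=0, must-revalidate") is False, so the cache has to go back to the origin server before reusing its copy.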
Validation
When a cached entry has expired, the request for it undergoes Validation. The server determines whether its current response is the same as, or similar enough to, the response it sent previously. If it is, the server does not have to send the response body again.
When a cache has a stale entry that it would like to use as a response to a client's request, it first has to check with the origin server (or possibly an intermediate cache with a fresh response) to see if its cached entry is still usable. We call this "validating" the cache entry.
Validation occurs through Last-Modified headers or ETags (Entity Tag Cache Validators). There is the concept of "weak" and "strong" validators. Last-Modified headers are considered weak while ETags are strong by default, though a server can mark an ETag as weak by prefixing it with W/.
Section 13.3.4 contains the rules for when to use Last-Modified vs. ETags.
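Here is a rough sketch of what the server side of validation could look like for the CSS file from the original question. The respond function and its parameters are made up for illustration. It compares a single ETag exactly and ignores the list and "*" forms that If-None-Match also allows.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Tuple


def respond(request_headers: Mapping[str, str], body: bytes, etag: str,
            last_modified: str) -> Tuple[int, Optional[bytes]]:
    """Return (status, body): 304 with no body if the client's copy is current."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # ETag comparison wins when the client sent one.
        if client_etag == etag:
            return 304, None
        return 200, body
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since):
                return 304, None
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return 200, body
```

When the browser gets the 304 back it serves the body it already has. When it gets a 200 it replaces its cached copy with the new file.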
I've read through this document a few times and each time I come away with more that I didn't know.
For a less dense read check out this document.
