After Google’s announcement that site speed affects search rankings, many articles were written about the benefits of setting expires headers to control browser caching. However, after researching cache control extensively, I found no articles that explain how to determine which resources of a site should be cached. So instead of explaining how to implement expires headers (there are plenty of resources for that, such as Apache.org), this article explains what they are and how they benefit SEO, describes the dangers of improper implementation, and offers some insight on preventing issues.
What Are Expires Headers and How Do They Benefit SEO?
Expires headers tell the browser whether a resource on a website needs to be requested from the source or whether it can be fetched from the browser’s cache. When you set an expires header for a resource, such as all JPEG images, the browser will store those resources in its cache. The next time the visitor comes back to the page, it will load faster because the browser already has those images available.
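For illustration, a response for a cacheable image might carry headers like the following (the date and lifetime here are made-up examples, not recommendations):

```http
HTTP/1.1 200 OK
Content-Type: image/jpeg
Expires: Thu, 31 Dec 2026 23:59:59 GMT
Cache-Control: max-age=31536000
```

The older Expires header gives an absolute date, while Cache-Control: max-age gives a lifetime in seconds; when both are present, browsers prefer max-age.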
Improving a site’s loading speed ultimately improves its functionality and has many benefits, including lower bounce rates and higher average time on site (because no one likes slow websites!). If your site is faster than your competitors’ sites, you may also see better rankings in Google’s search results. Better site speed can also reduce the cost of hosting a site that consumes a lot of bandwidth on your server.
There are many tools out there to help measure a site’s speed, such as YSlow, Google PageSpeed, and Pingdom Tools, and they all recommend implementing expires headers, but none of them explain what to consider before doing so. For example, the Yahoo Developer Network states that there are only two aspects to cache control: “For static resources: implement ‘Never expire’ policy by setting far future Expires header,” and “For dynamic resources: use an appropriate Cache-Control header to help the browser with conditional requests.” What exactly is an “appropriate” header, though?
The Dangers
The ultimate purpose of setting expires headers is to avoid unnecessary HTTP requests, but how do you know when a request is or isn’t necessary? Here are some things to think about before implementing expires headers:
Which resources on the site do you expect to update frequently? The obvious disadvantage of expires headers is that if a resource is set to expire too far in the future and you want to make updates to the site (whether planned or not), your visitors will not see the changes until the header expires in their browsers. It is important to think about how long you expect each resource to remain the same in order to determine the most appropriate expiration date. This seems obvious, but it is important to evaluate even the smallest resources of a site before setting any expires headers. That being said, I highly recommend against Apache’s ExpiresDefault directive, which sets a default time span for all resources. In most cases, you will not want a blanket expiry date for everything.
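As a sketch of the per-resource alternative, Apache’s mod_expires lets you set expirations by content type instead of one blanket default (the lifetimes below are illustrative, not recommendations):

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # Long-lived, rarely updated asset types
  ExpiresByType image/jpeg "access plus 1 month"
  ExpiresByType text/css   "access plus 1 week"
  # Deliberately no ExpiresDefault, so anything not listed
  # falls back to normal revalidation
</IfModule>
```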
Is the site an ecommerce site? If so, you need to consider whether expires headers could cause issues with the shopping cart. Ecommerce sites can run into major problems if expires headers are not set appropriately. For example, if a returning visitor tries to add products to their shopping cart but the resources are cached (the HTML, CSS, product image files, etc.), the cart will show products that were added in the past rather than the recently added ones. Of course, simply refreshing the cache with CTRL+F5 will fix the problem, but how many visitors are going to know that? They will become frustrated and end up leaving without buying anything.
Can I cache only specific images, scripts, or HTML files? Sometimes it is not appropriate to cache an entire group of scripts, images, or other static resources across the whole site, but it would be helpful to cache certain ones. For example, if I have an ecommerce site that is updated somewhat frequently, I may want to set expires headers for specific images rather than for every image on the site. Setting expires headers for certain resources, like header images that do not change, will allow the site to load faster, while the product images can be updated without the browser caching them. This ensures that you are able to update product images, that your customers will see the new pictures, and, most importantly, that the product images will not get cached or “stuck” in a user’s shopping cart.
How can I cache only specific resources? Using the example above, if you want to add expires headers to only specific resources on your site, you can do so in a few different ways.
- One way is to create two separate asset folders: one for static resources and another for resources that are frequently updated. Place all of the resources (images, scripts, etc.) that you would like to set far-future expirations for into the static folder, then add an .htaccess file to that folder that includes the expires headers. Place the rest of the resources that you do not want cached into the other folder. That folder should also contain an .htaccess file, but with a no-cache header explicitly stated in it (the opposite of the other folder). This ensures the content won’t be cached.
- If your site is built on a framework like CodeIgniter, simply add static sub-folders within your images, CSS, and scripts folders, and add individual .htaccess files to each with the appropriate expires headers.
- For .NET sites, you can cache specific portions of a page through “fragment caching.” Basically, you create your own user controls that contain the caching directives for the specific resources you want cached. Take a look at Caching Portions of an ASP.NET Page for more information on how to do this.
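To sketch the two-folder approach (the folder names and lifetimes here are hypothetical), each folder gets its own .htaccess:

```apache
# /assets/static/.htaccess -- far-future expiration
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresDefault "access plus 1 year"
</IfModule>

# /assets/dynamic/.htaccess -- explicitly prevent caching
<IfModule mod_headers.c>
  Header set Cache-Control "no-cache, no-store, must-revalidate"
</IfModule>
```

On the .NET side, fragment caching boils down to an OutputCache directive on a user control; a minimal sketch of such a control might look like:

```aspx
<%-- Hypothetical .ascx user control cached for one hour --%>
<%@ Control Language="C#" %>
<%@ OutputCache Duration="3600" VaryByParam="None" %>
```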
Another approach, recommended by Google Developers, is to use “fingerprinting” to cache resources that change occasionally. There is an example of how this is done on the Google Developers Best Practices page, but to sum it up, according to Google Developers:
“You accomplish this by embedding a fingerprint of the resource in its URL (i.e. the file path). When the resource changes, so does its fingerprint, and in turn, so does its URL. As soon as the URL changes, the browser is forced to re-fetch the resource. Fingerprinting allows you to set expiry dates long into the future even for resources that change more frequently than that. Of course, this technique requires that all of the pages that reference the resource know about the fingerprinted URL, which may or may not be feasible, depending on how your pages are coded.”
Every site is going to have different requirements and different needs for functionality. Not all resources on a site absolutely need to be cached, either. For instance, HTML generally loads quickly, so there is usually no real need to cache it regardless of how slow your site may be. If your site is slow (especially your homepage), I would recommend taking the questions above into consideration and focusing on the major resources, like images and scripts, that tend to be the main cause of slow sites. If you can master expires caching and understand when it is and isn’t appropriate, you can greatly improve the functionality, usability, and search engine-friendliness of your site.
So, what do you think? Did I forget anything? Do you know of any other potential risks of setting expires headers? What do you consider best practices for adding expires headers?
Great post! I just had that discussion with a client where they asked me what expiration they should set on the header. There is no true guide out there with best practices and examples, so maybe this conversation will get that started. :)

What happens if you have an expires header in the <head> of your page with a date different from the one set up in .htaccess? Which one would get priority?
Thanks, this is a question that I would like to test out, but I believe the .htaccess has the authority here, because meta data is used by browser caches but not usually by proxy caches. From my research, meta tags are generally considered less reliable, and many believe that meta tags should not be used at all for cache control. I am interested to hear if anyone else has found this out through trial and error, though.
Yeah, I also believe that .htaccess would get priority here, but this would be an interesting experiment. I'm already running one at the moment and running out of time, but I might try that afterwards.
Maybe a nice addition: if you wish to force an update (for cached images/CSS), you can always add a timestamp to image/CSS URLs (image.jpg?s=12495884). Nice down-to-earth entry!
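The commenter's trick can be sketched as a tiny helper (the function name and parameter are hypothetical):

```javascript
// Append a version/timestamp query parameter so browsers treat the
// URL as new and re-fetch it, e.g. "/img/hero.jpg?v=12495884".
// Handles URLs that already carry a query string.
function bustCache(url, version) {
  const sep = url.includes('?') ? '&' : '?';
  return `${url}${sep}v=${version}`;
}
```

One caveat worth knowing: some proxies have historically refused to cache URLs containing query strings at all, which is one reason renaming the file (fingerprinting) is often preferred over query parameters.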
IMO this won't have any effect on SEO at all.
It doesn't matter much, if at all, for SEO whether a user's browser caches content or not. Why? It's the search engine bots that determine the page speed, not visitor browsers. Search engines like Google will most likely not cache any of a site's contents in memory from previous visits; they would get wiped, mainly because of the huge number of sites they check every day.
Considering that the PageSpeed tool exists and that they have a browser simulator (see that Pac-Man analogy blog post from a while back), it would be trivial for them to load the page twice to see how quickly it loads, or to simulate the removal of cached items.
It would be something they check explicitly, not via random caching.
Either way, it should affect your conversion ratios by improving the user experience.
Hello, thanks for reading my post. I can understand the argument that faster page loading speed may not have a huge impact on your site's rankings, but it will have an impact on the user experience, as MyHolidayMarketing pointed out, which may make an impact on your rankings in other ways.
One of the reasons SEO is so interesting to me is because it has so many different aspects that play a role, and user experience is a big one. Visitors do not like to wait around for your site's pages to load - slow websites equal much higher bounce rates and lower conversions. Google wants your site to function efficiently and load quickly because it knows those are important characteristics to users. As Google states, "caching is a double win: you reduce round-trip time by eliminating numerous HTTP requests for the required resources, and you substantially reduce the total payload size of the responses." How Google measures your site's page speed when it determines your ranking is not entirely clear, but I think making the effort to improve your site for user experience is definitely a plus regardless.
@questfore It's really an interesting concept. SEO benefit or not, all sites should be optimized for better page load times. Fast page loading can lead to many great things for user experience as well as business.
When setting global expires headers for all files of a certain filetype, a general best practice is to change the file name every time you modify the file. That way file.v1.js becomes file.v2.js.
If you use a dynamic CSS/JS serving script (for example, Minify: https://code.google.com/p/minify/), you can use a dynamic URL to request the file without the need to rename the actual resource. I think Minify has a feature where it automatically sets the expires header to a long value if you append a numeric value to the request URL.
You should mention that browser caching is good in some cases, but often server-side caching is preferred. You can't control when the cache is invalidated with browser caching; this can be a major issue that server-side caching solves. Server-side caching has the same speed benefit, and the difference is negligible, especially if you are serving up images and includes in a single request.
As far as ecommerce goes, Magento has a really excellent system for managing cache. If you are looking for a more general solution, Alternative PHP Cache (APC) is easy to use and has a slew of great tutorials.
Hi questfore. Probably a dumb question (so apologies if it is): are there elements on a page that can be updated outside of the main HTML cache? For example, can you serve up the bulk of the site's code from cache and then use dynamic JavaScript to load RSS feeds into that static page?

The vast majority of sites on the web don't have fresh news, so they can probably get away with caching the site. It seems to me that you could tie the caching process to your own internal processes for publishing content for 24 hours, and then, as part of your publishing process, get into the habit of pushing updates all at the same time.

Also, can you throttle caching? So you set the site to cache for, say, 23 hours and then set no-cache for the 1 hour while you do the content updates?
Hi Robin,
The primary issue with changing your cache timing is that the browser doesn't know you've changed it. Assume you have a 23-hour cache on a page that updates once per day: a browser comes to the site at 11:30 and is told 'this content won't change for 23 hours'. You make your updates at noon and change the cache timing to 1 hour. That returning browser won't look for a new copy until 23 hours have passed; it has to send a new request to the server in order to realize the cache timing has changed. At least, that's my understanding of the life cycle.
To your point on the dynamic JS: if the page the JS resides on is cached, then the JavaScript won't really play a role here. Something like an RSS feed should never be cached; it's assumed you always want the freshest content available. Pulling the latest story onto your home page could be a different story, though. In a situation like that, you may want to only query the feed every 3-4 hours to cut down on cross-page requests.
Using a blog as an example, I think the reverse of the throttled cache would be most appropriate. The main page of your blog that lists the most recently added articles could have a 1- or 2-hour cache (or higher, depending on traffic). Individual articles referenced within (e.g. a blog post itself) could have their pages cached for a week or a month, as it's assumed they won't update very often. Both pages, however (the main listing page and the blog post), could have the same cache headers for static images such as header/footer images.
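That tiered scheme could be sketched in .htaccess roughly like this (the patterns and lifetimes are illustrative; for a file matching both patterns, the later section takes precedence):

```apache
<IfModule mod_headers.c>
  # Individual posts: content rarely changes after publishing
  <FilesMatch "\.html$">
    Header set Cache-Control "max-age=604800"
  </FilesMatch>
  # Blog index: short cache so new posts show up quickly
  <FilesMatch "^index\.html$">
    Header set Cache-Control "max-age=7200"
  </FilesMatch>
</IfModule>

<IfModule mod_expires.c>
  ExpiresActive On
  # Static header/footer images shared by both page types
  ExpiresByType image/png "access plus 1 month"
</IfModule>
```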
Hope that helps!
Brilliant. Thank-you for such a comprehensive answer!
I'll need to run some of this past my tech team to see what we can and can't do with our CMS solutions. I definitely think caching would help, because it seems to me that dynamic CMSs make lots of (and possibly unnecessary) checks for updates to the database.
Thanks again :-)
I grappled with this so much last year, fighting with my web host to try to implement a lot of these settings. I wish I'd known then how little importance these things have. Thanks for the post nonetheless.
Page speed always matters a lot for the performance of a website, and expires headers make it easy for the browser to load it quickly. Even a single second of delay in loading has a big impact on a website's progress, and each second of delay causes a great decrease in traffic.
When I read about the page speed SEO factor, I tried to find a theme which loads very quickly, and I found one after deep research. I also removed all images and extra plugins from my site so it opens quickly. Now my site loads in between 2 and 3 seconds. I checked it on GTmetrix, the Google PageSpeed test tool, and tools.pingdom, and all the tools show me different statistics, which I mentioned in this image: https://postimg.org/image/ytg9zr6rn/full/ The Google PageSpeed test also shows different statistics if I check it now, again after a few seconds, and again after a few more seconds; the stats are different every time.
My site doesn't have images now, so I want to know: are images necessary to get a high rank in Google search, or can my site rank highly without images in posts or as featured images?
Also, I want to know about this, which GTmetrix reports:
To speed up page load times for visitors of your site, remove as many landing page redirections as possible, and make any required redirections cacheable if possible.
https:// nonwwwsite is a short-cacheable (1 hour) redirect to https:// nonwwwsite
There is also an error about optimizing images. I don't know how to compress images without losing quality, and my images are already very small GIF files.
So what is this, and how can I fix it?
Also, there is a problem with HTTPS, if you can help me. My site is a blog, and now after switching to HTTPS, FeedBurner does not get the feed from my site. Is there any way to exclude the feed page from HTTPS?
Is it better to use an expires header date or the fingerprint method to make a website load fast? Please, someone help me. The expires header date is a recommendation from Google PageSpeed Insights. I want to launch my blog, but I want it to load better. This is my blog: https://www.mugianto.web.id/

Sorry for my bad grammar, and thanks a lot for your attention.
Regards
Mugianto
Is there a relation between expires headers and crawling by Google?

I have a special interest in rich snippet data on a Magento multistore environment using full-page caching.
Great post! It is true that website speed is a factor in SEO, and we need to optimize the code to lower the site's loading time.
Thanks, this gave me a good idea.
Great post! I agree with @rizz0 - it sounds like he's using a fingerprinting method just like we do.
Wow great post! So how does this benefit SEO exactly? Especially SEO of those with multiple product results pages, like eCommerce sites?
Thanks Dmitry, using expires headers effectively can greatly increase your page loading speed because it can eliminate unnecessary browser requests. Page loading speed often gets overlooked, but after trying it out on a few sites myself I realized how powerful a tool expires headers really can be. I use the Pingdom page speed checker, YSlow, and even Google PageSpeed, and I was able to measure how much of a difference they make, and I was impressed! I didn't want to go into too much detail about the benefits of using expires headers for SEO in this post because I have found tons of great articles on that already; let me share a few with you:
From Google: https://developers.google.com/speed/docs/best-practices/caching
From fortheloveofseo.com: https://fortheloveofseo.com/2012/leverage-browser-caching-how-to-add-expires-headers/
Ask Apache: https://www.askapache.com/hacking/speed-site-caching-cache-control.html
Hope that helps!
Thank you!
Did you check this particular page with YSlow? I think you should. I believe it's very difficult to maintain such standards in the real world. And sometimes it does not matter at all if you have great content and followers. Content will bring the followers, and the followers will bring in the fans. That is how it works.
Thanks
Arun
You make a good point, Arun. The quality of information on this site definitely trumps the page speed scores that I see from both Pingdom and YSlow, but of course page speed is only one of the hundreds of factors that play into a website's performance in search. Plus, not all of the websites that we work on are as content-rich as SEOmoz. Sometimes getting enough quality content for clients (especially ones that are heavily regulated for legal reasons) is even more difficult in the real world. Every website is going to have different strengths and weaknesses.
Thanks for your input!
Hi CJ,
Thanks for writing a useful post about expires headers, with a different angle than I've seen before! If implemented wrong, they can really mess up the experience for your website visitors, not only on ecommerce sites but also on forums or other sites that shouldn't cache everything.
You mention that there are many tutorials out there about how to implement browser caching, but I really battled to find easy-to-understand info about it, and eventually ended up writing a tutorial that I hope can help anyone who wants to understand how to do it: https://fortheloveofseo.com/2012/leverage-browser-caching-how-to-add-expires-headers/
Cheers, Tess
Thanks tessneale, I actually just recommended that same link to Dmitry. It is a good one for sure.
Is anyone aware of current ways or reasons to use or avoid expires headers after Penguin 2? As Penguin considers how fast a site loads even more, are there improved methods to accomplish this?