We've talked plenty in the past about methods to control search engine spiders' access to documents on your website, and we've discussed cloaking in some depth as well. But there's an under-utilized and extremely powerful methodology for serving unique content to visitors and search engines, based on the different experiences each seeks, that I consider critical to advanced search engine optimization. So... let's dive in!
What's a Cookie?
A cookie is a small text file that websites can leave on a visitor's hard disk, helping them to track that person over time. Cookies are the reason Amazon.com remembers your username between visits and the reason you don't necessarily need to log in to your Hotmail account every time you open your browser. Cookie data typically contains a short set of information about when you last accessed a site, an ID number, and, potentially, information about your visit.
As a website developer, you can create options to remember your visitors using cookies for tracking purposes or to display different information to users based on their actions or preferences. Common uses include remembering a username, maintaining a shopping cart, or keeping track of previously viewed content. For example, if you've signed up for an account with SEOmoz, we'll give you options in your My Account page about how you want to view the blog and remember that the next time you visit.
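In PHP, for example, dropping and reading a cookie takes only a line or two each (a minimal sketch; the cookie name and value here are just illustrations):

```php
<?php
// Remember a visitor's display preference for 30 days (name and value
// are illustrative). Must be sent before any page output.
setcookie('blog_view', 'full_posts', time() + 60 * 60 * 24 * 30, '/');

// On a later request, read the preference back.
if (isset($_COOKIE['blog_view']) && $_COOKIE['blog_view'] === 'full_posts') {
    // ...show full posts instead of snippets...
}
```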
What are Session IDs?
Session IDs are virtually identical to cookies in functionality, with one big difference: when you close (or restart) your browser, the session ID information is (usually) no longer stored on your hard drive. The website you were interacting with may still remember your data or actions on its end, but it cannot retrieve a session ID from your machine once the session has expired (and by default, session IDs expire when the browser shuts down). In essence, they're temporary cookies (although, as we'll see below, there are options to control this).
While, technically speaking, session IDs are just cookies without an expiration date, it is possible to set them with expiration dates similar to cookies' (going out decades); in that sense, they're virtually identical to cookies. Session IDs do come with an important caveat, though: they are frequently carried in the URL string, which can create serious problems for search engines (every request produces a unique URL with duplicate content). A simple fix, however, uses conditional 301 redirects to show bots a non-sessioned version of the page (described in detail here - search engine friendly cloaking by removing session IDs).
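For instance, a minimal sketch of that conditional redirect in PHP (the bot list and the PHPSESSID parameter name are assumptions; adjust for your own setup):

```php
<?php
// Sketch: 301 known bots to a session-free URL. The user-agent list
// and the PHPSESSID parameter name are illustrative, not definitive.
$agent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_bot = (bool) preg_match('/(Googlebot|Slurp|msnbot)/i', $agent);

if ($is_bot && isset($_GET['PHPSESSID'])) {
    // Rebuild the query string without the session ID.
    $params = $_GET;
    unset($params['PHPSESSID']);
    $clean = strtok($_SERVER['REQUEST_URI'], '?');
    if (!empty($params)) {
        $clean .= '?' . http_build_query($params);
    }
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $clean);
    exit;
}
```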
IMPORTANT NOTE: Any user can turn off session IDs and/or cookies in their browser settings. This often makes web browsing considerably more difficult, and many sites will actually display a page saying that cookies/sessions are required to view their content or interact. Cookies, persistent though they may be, are also deleted by users on a semi-regular basis. This Comscore study from 2007 found that 33% of web users deleted their cookies at least once per month.
How do Search Engines Interpret Cookies & Session IDs?
They don't. Search engine spiders aren't built to accept cookies or session IDs; they act as browsers with this functionality shut off. However, unlike visitors with non-cookie-accepting browsers, the crawlers can sometimes reach sequestered content, by virtue of webmasters who specifically want to let them through. Many sites have pages that require cookies or sessions to be enabled, but carry special rules for search engine bots, permitting them to access the content as well. Although this is technically cloaking, search engines generally allow this type of segmented content delivery.
Despite the occasional access granted to engines for cookie/session-restricted pages, the vast majority of cookie and session ID usage creates content, links, and pages that limit access. As web developers, we can leverage the power of this "accepted cloaking" to build more intelligent sites and pages that function in optimal ways for both humans and engines.
Why Would I Want to Use Cookies or Session IDs to Control Search Engine Access?
There are numerous potential tactics for leveraging cookies and session IDs for search engine control. Below, I've listed many of the major strategies you can implement with these tools, but there are certainly countless other possibilities:
- Show Multiple Navigation Paths While Sculpting the Flow of Link Juice
Visitors to a website often have complex needs for the ways in which they'd like to view or access content. Your site may benefit from offering many paths to content (by date, topic, tag, relationship, ratings, etc.), but those extra paths expend PageRank or link juice that would be better focused on a single, search-engine-friendly navigational structure. By showing one set of navigation to cookied users and another to the engines, you can effectively have your cake and eat it, too.
- Keep Limited Pieces of a Page's Content Out of the Engines' Indices
Many pages contain both content you'd like to show to search engines and pieces you'd prefer appeared only for human visitors. These could include ads, login-restricted information, links, or even rich media. Once again, showing non-cookied users the plain version and cookie-accepting visitors the extended information can be invaluable (see the sketch after this list). Note that this is often used in conjunction with a login, so only registered users can access the full content (think sites like Facebook or LinkedIn).
- Grant Access to "Human-Only" or "Registered User-Only" Pages
As with snippets of content, there are often entire pages or sections of a site to which you'd like to restrict search engine access. This can be easily accomplished with cookies/sessions, and it can even help bring in search traffic that converts to "registered-user" status. For example, if you had desirable content you wished to restrict, you could create a page with a short snippet and an offer to continue reading upon registration, which would then grant access to the full work at the same URL. Registered visitors would keep linking to the same URL the spiders index and rank, yet you wouldn't give away the content for free in a cached version. Be aware that in these instances, the search engines will only be able to "see" the content you've listed on the non-registered-user page, so be careful to target your titles and snippets with keywords to receive the most traffic possible. You can see examples of this at sites like the Economist, the New York Times, and WebmasterWorld.
- Avoid Duplicate Content Issues
One of the most promising areas for cookie/session use is prohibiting spiders from reaching multiple versions of the same content while allowing visitors to get the version they prefer. As an example, here at SEOmoz, logged-in users can see full blog entries on our blog homepage, but search engines and non-registered users will see only the snippets. This prevents our content from being listed on multiple pages (the blog homepage and the specific post pages) and provides a positive user experience for our members. I discussed this specifically in this post on dealing with pagination and duplicate content on blogs.
- Display Content Based on a User's Actions or Patterns of Action
Many sites like to keep track of their users' activities and serve targeted content that is more likely to fit their interests. In the case of many media websites, this means advertising, while for e-commerce sites, it's more likely to be related or recently-viewed products. Bluefly.com is a good example of this - showing visitors the clothing they've most recently browsed.
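To make these tactics concrete, here's a minimal sketch of the cookie check in PHP (the cookie name and content variables are placeholders, not our actual code):

```php
<?php
// Minimal sketch of cookie-based content segmentation. Spiders don't
// accept cookies, so they land in the non-cookied branch along with
// cookieless human visitors.
$full_entry = '...full post, extra navigation paths, member content...';
$snippet    = '...indexable teaser with a single, clean nav structure...';

if (isset($_COOKIE['registered_user'])) {
    echo $full_entry;   // cookied (e.g. logged-in) visitor
} else {
    echo $snippet;      // non-cookied visitor or search engine spider
}
```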
Hopefully, this brief tutorial has given you a chance for that "Eureka!" moment to help inspire some clever cookie & session use for your SEO campaigns. As always, I'd love to hear your feedback about how you use these features on your own sites.
Hi everybody! I've been reading a lot on SEOmoz but never posted a comment, so here is my first post... (fingers are shaking)
I'm currently working on a shopping cart. (Each visit from a bot to the same page would generate a new session ID, and thus a new URL with the same content, so there we are with a great case of duplicate content.)
To avoid duplicate content, we use .htaccess with a rule that removes the session ID and 301s to the page with no session ID; the rule is activated for a list of known bots.
It's a simple solution for a big problem. I'm not sure it's suitable for all cases of duplicate content, but it definitely helps for shopping carts...
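The rule looks roughly like this (the bot list and the PHPSESSID parameter name are examples, not our exact production rule):

```apache
RewriteEngine On
# Only fire for known bots (this list is just an example)
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
# ...and only when the query string carries a session ID
RewriteCond %{QUERY_STRING} PHPSESSID= [NC]
# 301 to the same path with the query string dropped (assumes the
# session ID is the only parameter; merge the rest back otherwise)
RewriteRule ^(.*)$ /$1? [R=301,L]
```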
I'm not sure it's really true that search engines (Google at least) don't accept cookies. I recently (well, 6 months ago) created a site that checks for cookies before allowing customers access to the shopping cart; if cookies are disabled, it sends the user to an info page on the topic. Google indexed the actual shopping cart page perfectly well, totally bypassed the "cookie info" page, and never indexed that at all. Cookie checking was done entirely via PHP code.
The PHP code functions roughly like this (a sketch; the actual function and page names will differ):
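```php
<?php
// Step 1: drop a test cookie and bounce the visitor back to this page
// with a flag in the query string (names here are illustrative).
if (!isset($_GET['cookie_test'])) {
    setcookie('cookie_test', '1');
    header('Location: ' . $_SERVER['PHP_SELF'] . '?cookie_test=1');
    exit;
}

// Step 2: after the round trip, the cookie should come back with the
// request. If it doesn't, cookies are disabled; send the user to the
// info page instead of the cart.
if (!isset($_COOKIE['cookie_test'])) {
    header('Location: /cookie-info.php');
    exit;
}

// Cookies work: continue on to the shopping cart.
```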
Kind of complicated, but it's not too bad since it's all in a few functions.
How is the redirect done?
Google won't follow some types of redirects; it would just ignore them.
King - This sounds like a good test project. I would like to confirm this myself for each search engine.
Accepting cookies is trivial, so I wouldn't be surprised if my tests matched your results.
I would be interested to hear about the results of your test...
I just confirmed search engines do not support cookies. I will write a post shortly with all the details.
I like how Google (well, Googlebot anyway) gets this hugely awesome-looking graphic, while poor MSN is just a blue circle.
That's MSN.com, not Live Search. They have their own bot, and he's pretty cool looking. I'll try to trot him out on the blog more often.
Do eeeeeeit.
Just did - check out the new post on Microsoft buying Yahoo! :)
Both Yahoo and MSN seem ...um... evil (is it only me?)... Google always seems so cute btw... :)
So Rand, what if you wanted a search engine to get past a login form to see all your content, but wanted users to have to log in and/or register? Should you detect the bots and serve up a session to them (without using it in the URL)? If this were done, then all people would have to do is change their user-agent to get access without logging in or registering.
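Something like this is what I mean (the bot list is illustrative, and the weakness is exactly that the user-agent string can be faked):

```php
<?php
// Let bots through the login gate based on user-agent alone; anyone
// who spoofs the user-agent gets the same free pass.
$agent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_bot = (bool) preg_match('/(Googlebot|Slurp|msnbot)/i', $agent);

session_start();

if (!$is_bot && empty($_SESSION['logged_in'])) {
    // Humans without a login go to the registration form...
    header('Location: /login.php');
    exit;
}
// ...while bots (or logged-in users) fall through to the full content.
```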
Any good solutions?
Hi Rand,
Thanks for the link to my article (enarion.net)! Any ideas how this could be improved?
Thx,
Tobias
My first post :) and I have a question:
Our website uses cookies for premium content. It gives 5 free visits, and the cookie's age is 6 months.
After 5 visits, it serves a subscription box even if the visitor is coming from Google or any other search engine.
Does this fall under cloaking? Has anyone seen a warning or penalty for such an issue?
Just like to point out that many of the bots nowadays do accept cookies: Googlebot, MSNBot, and Slurp (Yahoo!) all accept cookies.
Hi!
I enjoyed reading the article. It is very interesting and it made me think of things in another way. I am glad to be a member of SEOMoz.
P.S. Yeah, the graphics are very cool as Lorisa rightly stated before!
I don't know if this is something you see as part of the purpose of the blog, but I bet a lot of people would appreciate some sample code for cookies and explanations of what the code would do.
Cookies are great for those users who are fond of browsing so many sites.
Hey, this is a great idea. I never thought of using cookies for cloaking!
Technically speaking, session IDs are cookies too. The original concept, as you suggest, was avoiding expiration dates: without an expiration date, the browser won't persist the cookie to disk and will only keep it in memory. However, as r_wetzlamayr says, popular programming languages do specify an expiration date by default; they just make it short (a couple of hours, a few minutes, etc.). You can experience this when you are automatically logged out of most dynamic sites after you haven't done anything for a while.
Cookies are usually great for the consumer and the site owner as well, but in the illustration I think the PC would eat the cookie rather than store it for later... :-)
Rand,
Nice post. Like seeing this more advanced level of SEO on SEOmoz.
The company I will most likely be working for in the next couple of weeks (final revisions being made to the offer letter) has significant duplicate content issues created by the content itself, beyond the standard canonicalization and URL structure issues (though they have those too). During the interview process, I was asked a specific question about this problem; I proposed a solution via JavaScript, but doing it with cookies would be even better.
Keep this 'deeper' content flowing please!
Brent D. Payne
Great article, and I too stress the importance of avoiding duplicate content issues with session IDs.
I don't know why, but I just love these technical posts even though I don't always understand (or have use for) the information they contain. Maybe I just feel smarter after reading them?
Anyway, I really appreciate the time you put into posts like this, especially the cute and helpful graphics.
Great post. thanks
Rand: "The website you were interacting with may remember your data or actions, but they cannot retrieve session IDs from your machine." This is not generally true; it depends solely on the session implementation.
For PHP, the session cookie's lifetime is determined by session_set_cookie_params, so a programmer can choose to extend a session's lifetime to any duration that seems fit, up to the year 2037, just like for any cookie.
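For example (a minimal sketch; 30 days is an arbitrary choice, and the call has to come before session_start):

```php
<?php
// Make the session cookie survive browser restarts by giving it an
// explicit lifetime (30 days here; any duration works).
session_set_cookie_params(60 * 60 * 24 * 30, '/');
session_start();

$_SESSION['last_visit'] = time();
```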
Thanks for the catch - I've edited the post and added in that note (along with some extra information about warnings for bots seeing session IDs in the URL). Need to be more careful when publishing late at night...
I love how reading an article about something that every SEO has addressed at some point in their careers inspires new thought! I've been using session IDs for years (and cloaking them), but I just had a eureka moment with a site that's duplicating a snippet of content in the page template...
Cloak the snippet, duh!!
Great post Rand, cheers
richardbaxterseo
How do you turn off session IDs in the browser?
The session ID is in the URL in the link on the site you are viewing.
The session data is stored on the server. The expiry of the session is controlled by the server.
I could be wrong, but I think that by turning off cookies, you also prevent session IDs from being stored on your machine (obviously, this won't prevent sessions that are stored on the server side only, though).
Session IDs are in the URL, in the link you click on.
Sessions IDs can be implemented as either URL parameters or cookies. Most modern implementations today use cookies or a combination of both.
From the web server's perspective, a user session is simply a collection of information about a current visitor that is accessible via an ID. That information is usually recorded in memory, though there are many implementations that record it in a database or on disk. The ID is given to the user in the form of a cookie or URL parameter.
If the user deletes cookies or disables them (and there is no URL parameter for the session ID), the server won't be able to retrieve the information (even though it is still in memory).
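A minimal sketch of what this looks like in PHP (the counter is just an illustration):

```php
<?php
// session_start() reads the session ID from the cookie (or URL) and
// loads the matching data stored on the server. The data itself never
// leaves the server; only the ID travels with the visitor.
session_start();

if (!isset($_SESSION['page_views'])) {
    $_SESSION['page_views'] = 0;
}
$_SESSION['page_views']++;

echo 'Pages viewed this session: ' . $_SESSION['page_views'];
```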
There's a big difference between session IDs in URLs and session cookies. However, I can see why people can become confused about these things.
g1smd,
I am not sure what you mean by "a big difference". Session IDs can be implemented as cookies with short expiration dates, as session cookies, or as URL parameters.
See https://cookies.lcs.mit.edu/seq_sessionid.html
See https://searchsoftwarequality.techtarget.com/sDefinition/0,,sid92_gci1158582,00.html
While doing the search, I found a technique I was not aware of; I need to see if current browsers and servers support it. There is a (new?) HTTP header called Session-Id. See https://www.w3.org/TR/WD-session-id