Introduction from Rand: Guest poster Eric Enge (of Stone Temple Media) was gracious enough to contribute an immense effort on this impressive guide. In related news, he's done a brilliant, not-to-miss interview with Matt Cutts that was released just tonight. Thanks a ton, Eric - we hope to feature many more of your contributions in the future.

Hidden Text is one of the challenges faced by webmasters and search engines. Spammers continue to use hidden text to stuff keywords into their pages for purposes of artificially boosting their rankings. Search engines seek to figure out when spammers are doing this, and then take appropriate action. For the average everyday webmaster, one challenge is that there are many ways to create hidden text unintentionally, and no one wants to be penalized for something they did not intend to do. To start our look at hidden text, let's examine Google's Webmaster Guidelines for hidden text, to see the bottom line:

If your site is perceived to contain hidden text and links that are deceptive in intent, your site may be removed from the Google index, and will not appear in search results pages

Obviously, this is a fate we all want to avoid. Note the use of the word "perceived" in the above snippet. Doesn't sound like a simple black and white problem, does it? In fact, it's not, so let's look at some of the forms of hidden text.

A Few Ways to Create Hidden Text

There are many techniques for creating hidden text. Some of these can be done without the use of CSS, and they are usually fairly easy to detect:

  1. Make your text and background colors identical (or virtually identical) - this is the original method used by spammers for creating hidden text. It's easy to detect, and I am not aware of any legitimate use for this technique.
  2. Set the font size for text to 0, or to a negative number. This is also easy to detect, and I can't think of any legit use for it either.
  3. Use a Noscript tag. Here is some sample code for this:

    <script type="text/javascript">
    <!--
    document.write("This text is not hidden");
    //-->
    </script>
    <noscript>This is hidden text</noscript>

    This is really only "pseudo hidden text". While it's possible to make the text contained within the noscript tags different from what is in the JavaScript, about 3% of users will see it, and that's more than enough to generate spam report complaints to the search engines. In other words, stuffing a lot of keywords within noscript tags comes with a fair amount of risk.
  4. Text way below the fold. This is also a "pseudo hidden text" technique - that of providing content that is really not there for users. So while it is visible, the text is clearly out of the "action oriented" area of a page, and resides well below the fold, so the user needs to scroll down to see it. The text could well be directly related to the site's basic purpose, and the intent in this case would be that of "keyword stuffing". It's hard to detect algorithmically, but under human review I would conjecture that it would be seen as a poor quality signal.
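
For webmasters who want to check their own markup for the first two techniques, a minimal JavaScript sketch might look like the following. It only inspects an inline style string and compares color values as literal text, so it would miss "virtually identical" colors, named colors versus hex codes, and anything set in an external stylesheet. The function names parseInlineStyle and flagNonCssHiding are hypothetical, made up for this illustration, and this is not a description of how any search engine does its detection.

```javascript
// Hypothetical helper: turn "color: red; font-size: 0" into { color: "red", "font-size": "0" }.
function parseInlineStyle(style) {
  const props = {};
  for (const decl of style.split(";")) {
    const idx = decl.indexOf(":");
    if (idx === -1) continue; // skip empty or malformed declarations
    const name = decl.slice(0, idx).trim().toLowerCase();
    const value = decl.slice(idx + 1).trim().toLowerCase();
    if (name) props[name] = value;
  }
  return props;
}

// Hypothetical check for techniques 1 and 2 in the list above.
function flagNonCssHiding(style) {
  const props = parseInlineStyle(style);
  const flags = [];

  // Technique 1: text color written exactly the same as the background color.
  const fg = props["color"];
  const bg = props["background-color"];
  if (fg !== undefined && fg === bg) {
    flags.push("text color matches background color");
  }

  // Technique 2: a font size of zero or a negative number.
  const size = parseFloat(props["font-size"]);
  if (!Number.isNaN(size) && size <= 0) {
    flags.push("font size is zero or negative");
  }

  return flags;
}
```

Calling flagNonCssHiding("color: #fff; background-color: #fff") returns a one-item list naming the color match, while a style with distinct colors and a normal font size returns an empty list.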

CSS-Based Methods for Hiding Text

CSS techniques for creating hidden text are more interesting because they are much harder for search engine crawlers to detect unless they crawl and interpret the CSS. Most crawlers don't do that currently. Here are a few methods for using CSS to hide text:

  1. Specify an attribute of display:none. Here is a sample snippet for that:

    <div class="hiddentext" style="display: none;">This text is hidden </div>

    When you use display:none, the specified text does not display on the screen, and it is as if the element is simply not there (it has no effect on the placement of any other items on the page). One example use for this attribute is in dynamically creating printable versions of your articles. You can take the existing HTML version of a page, and create a print page by replicating the page, but applying the display:none attribute to the navigation and advertising elements of the page. It's a great technique that allows you to algorithmically create print pages for your articles quite easily. This technique is also used legitimately for the creation of menus, such as DHTML menus.
  2. Specify an attribute of visibility: hidden. Here is a sample snippet for that:

    <div class="hiddentext" style="visibility:hidden">This text is hidden </div>

    This technique varies from that of display:none. While it also makes the text invisible, the space that the text would have occupied is still used up in the page layout. The space simply shows up as a blank area.
  3. Use the z-index property to place your text on a layer below the currently viewable layer. z-index is set just like any other CSS property. Here is an example of what this could look like:

    .hiddentext
    {
    position: absolute;
    top: 120px;
    left: 250px;
    z-index: 0;
    }

    .visibletext
    {
    position: absolute;
    top: 120px;
    left: 250px;
    z-index: 1;
    }


    The "visibletext" div is visible simply because it has a greater z-index than the "hiddentext" div. Of course, it does not take too much of a scan of the CSS to detect this technique.
  4. Fahrner Image Replacement. This is usually done using CSS to place the image over HTML text. It works simply because the text does not appear to be invisible when you scan the HTML. However, after the text is drawn, if you place an image over the same spot, the text will be covered up by the image. One potential legitimate use for this is to make the text available in HTML for the visually impaired, and for search engines, while rendering a better looking version of the text in an image. Susan Moskwa at Google commented on a Google Groups thread about this and said "if your intent is purely to improve the visual user experience (e.g. by replacing some text with a fancier image of that same text), you don't need to worry."
  5. Use CSS to position the text off the screen. Sample code would look as follows:

    .hiddentext
    {
    position: absolute;
    top: 0px;
    left: -5000px;
    }


    This is another oldie, but goodie. A revised version of this would be to define a label for a table, so that the table is easier for people using screen readers (with impaired vision) to use:

    .hiddentext
    {
    position:absolute;
    left:0px;
    top:-500px;
    width:1px;
    height:1px;
    overflow:hidden;
    }


    This variant can then be used as a class for label tags within a table. The result is therefore accessible to screen readers, but does not clutter up the screen for users who have normal vision. However, while the intent may be pure here, there is a risk of the search engines misinterpreting your intent.
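
If you want to audit your own stylesheets for the CSS techniques above, a rough JavaScript sketch could scan a block of declarations like this. The function names and the -500px cutoff for "off the screen" are assumptions chosen for illustration (the cutoff simply matches the smaller offset used in the samples above). The check cannot see z-index layering, since that depends on comparing two elements, and a flag here says nothing about intent.

```javascript
// Hypothetical helper: turn "display: none; left: -5000px" into a property map.
function parseDeclarations(block) {
  const props = {};
  for (const decl of block.split(";")) {
    const idx = decl.indexOf(":");
    if (idx === -1) continue; // skip empty or malformed declarations
    const name = decl.slice(0, idx).trim().toLowerCase();
    const value = decl.slice(idx + 1).trim().toLowerCase();
    if (name) props[name] = value;
  }
  return props;
}

// Hypothetical check for display:none, visibility:hidden and off-screen positioning.
// offscreenLimitPx is an assumed threshold, not a value from any search engine.
function flagCssHiding(block, offscreenLimitPx = -500) {
  const props = parseDeclarations(block);
  const flags = [];

  if (props["display"] === "none") flags.push("display: none");
  if (props["visibility"] === "hidden") flags.push("visibility: hidden");

  // Absolutely positioned elements pushed far up or to the left.
  if (props["position"] === "absolute") {
    for (const side of ["left", "top"]) {
      const offset = parseFloat(props[side]);
      if (!Number.isNaN(offset) && offset <= offscreenLimitPx) {
        flags.push(`${side} offset of ${props[side]} is far off screen`);
      }
    }
  }

  return flags;
}
```

Run against the two off-screen samples above, this flags the left offset of -5000px in the first and the top offset of -500px in the second, which is a reminder that the accessible variant looks the same to a simple scanner as the abusive one.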

Flash-Based Methods for Hiding Text

  1. Scalable Inman Flash Replacement (sIFR). sIFR is a technique that uses JavaScript to read in HTML text and render it in Flash instead. The essential fact to focus on here is that the method guarantees that the HTML content and the Flash content are identical. One great use for this is to render the text in an anti-aliased font. This can provide a great improvement in the presentation of your site. At a recent Search Engine Marketing New England (SEMNE) event, Dan Crow, head of Google's crawl team, said that as long as this technique is used in moderation, it is OK. However, extensive use of sIFR could be interpreted as a poor site quality signal.
  2. SWFObject. Unlike sIFR, this method does not guarantee that the HTML and the content in the Flash are the same. SWFObject does not reference the text in the HTML at all. It simply draws a pre-compiled Flash movie in place of the HTML. At the same SEMNE event referenced in the prior point, Dan Crow indicated that this technique was "dangerous". Even though this technique could be used for entirely legitimate reasons (e.g. the same purpose as outlined for sIFR above), there is no way for Google to detect that. Worse still, since an approved technique exists, it just looks bad when you use an unapproved technique.

Unintentionally Creating Hidden Text

There are a few ways that this happens. One of the most common is that your Content Management System (CMS) has some of these techniques built into it. In particular, some of the CSS based methods are used by CMSs. For example, many of them use the display:none technique to implement drop-down menus, or other widgets that the user clicks on that then "expand" to display more text. Tab folders would be a great example of this. Sometimes the display:none technique is used in user generated content systems where the page normally shows the number of comments on a post, but chooses to suppress the text "0 Comments" in the event that no comments have been made yet.
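
To make the CMS case concrete, here is a minimal JavaScript sketch of the two patterns just described. The function names togglePanel and commentLabelStyle are hypothetical and not taken from any particular CMS. The panel argument is assumed to be any object with a style property, such as a DOM element in a browser.

```javascript
// Hypothetical helper: expand or collapse a panel by flipping its display value.
// Returns true when the panel is visible after the call.
function togglePanel(panel) {
  const isHidden = panel.style.display === "none";
  panel.style.display = isHidden ? "" : "none";
  return isHidden;
}

// Hypothetical helper: produce an inline style that suppresses the
// comment-count label when a post has no comments yet.
function commentLabelStyle(count) {
  return count === 0 ? "display: none;" : "";
}
```

In both cases the hidden text is still present in the HTML that a crawler fetches, which is exactly why a well-meaning site can end up with hidden text without anyone intending it.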

Another common way that people create hidden text occurs when they start providing enhancements for the visually impaired. As with the example above of using hidden labels within a table, it comes about because the text you are adding would clutter the page for a user with normal vision. The solution people use to serve both audiences is to hide the text from the sighted users.

Detecting Hidden Text

So how does Google do at detecting all of these types of hidden text, and telling whether the purpose is a legitimate one versus an illegitimate one? A recent post titled "Number One on Google Using Hidden Text" gives you reason to think that it's not as simple as it sounds. That noted, there are some techniques that Google has clearly labelled as bad, or that intuitively just seem bad. These are:

  1. White text on a white background
  2. Setting the font size to 0, or a negative number
  3. SWFObject
  4. Specify an attribute of visibility:hidden
  5. Using the z-index property - someone tell me if I am giving this technique a bad rap, but it smells like trouble to me

Just stay away from these techniques, because by using them you are simply asking to get slapped. There are some methods that could be abused, but may be OK in some cases:

  1. Use CSS to position the text off the screen. This is one of those things that can be abused, or could be used legitimately to improve the experience of users with impaired vision, as we discussed above.
  2. Use a Noscript tag. There is a real application for this to deal with those users who have JavaScript disabled. This is about 3% or so of the web surfing public.
  3. Text way below the fold. As noted before, it is not really hidden text, but its intent is not good, and it's likely to be seen as a poor quality signal.
  4. Specify an attribute of display:none. This technique certainly can be abused, but it is also commonly used for many types of things as a coding technique with legitimate intent.
  5. Fahrner Image Replacement. I have listed this technique here, even though the Google Guidelines identify this as a no-no. However, one cannot overlook the comments by Susan Moskwa above.
  6. sIFR. The beauty of this is that it by definition shows the same text as the HTML, but still, use it in moderation.

How You Get Discovered

  1. Putting keywords unrelated to the rest of your content is a sure flag
  2. Putting too many keywords in your "legitimately" hidden text. Too much text in there in general could inspire someone to take a closer look
  3. Use a legitimate technique, but use it too much, so it raises an "investigate me" flag
  4. Use an edgy amount of hidden text in seemingly legitimate ways, but then also participate in several other edgy techniques. This will also raise an "investigate me" flag.
  5. Have a competitor report you. It is in your competitor's interest to do so, and it happens all the time. Google guarantees that all authenticated spam reports are reviewed.
  6. Have your site reviewed by a human. However this happens, there is no upside to it, only downside.

Google's Position on Hidden Text

It's always good to start with the Google Guidelines for Hidden Text, but you need to look a bit deeper than that. Note the Berghausen, Dan Crow, and Susan Moskwa comments I have referenced above, as well as these statements by Googlers:

In the following Google Groups thread Googler Susan Moskwa had this to say:

Of course, as with many techniques, there are shades of gray between "this is clearly deceptive and wrong" and "this is perfectly acceptable". Matt did say that hiding text moves you a step further towards the gray area. But if you're running a perfectly legitimate site, you don't need to worry about it. If, on the other hand, your site already exhibits a bunch of other semi-shady techniques, hidden text starts to look like one more item on that list. It's like how 1 grain of sand isn't noticeable, but many grains together start to look like a beach.

Related to this is a recent posting by Matt Cutts on Threadwatch:

If you're straight-out using CSS to hide text, don't be surprised if that is called spam. I'm not saying that mouseovers or DHTML text or have-a-logo-but-also-have-text is spam; I answered that last one at a conference when I said "imagine how it would look to a visitor, a competitor, or someone checking out a spam report. If you show your company's name and it's Expo Markers instead of an Expo Markers logo, you should be fine. If the text you decide to show is 'Expo Markers cheap online discount buy online Expo Markers sale ...' then I would be more cautious, because that can look bad.

And, in my most recent interview with Matt Cutts, we spoke about hidden text:

Typically with hidden text, a regular person can look at it and instantly tell that it is hidden text. There are certainly great cases you could conjure up where that is not the case, but the vast majority of the time it's relatively obvious. So, for that it would typically be a removal for 30 days.

Then, if the site removes the hidden text or does a reconsideration request directly after that it could be shorter. But, if they continue to leave up that hidden text then that penalty could get longer.

Summary

All these statements suggest that Google does try to detect intent, and is not going to ban a site solely because of someone using hidden text in a way that appears to be legitimate. This does open the door to those who want to abuse it. If someone stuffs a few words in a bit of legitimate looking text here or there, it's hard to detect algorithmically. However, this is a trap door and an accident waiting to happen. Many webmasters who choose to walk the line on this technique may well be walking the line on other techniques. Google, and the other search engines, rely on this to out real abusers. Also, competitors are eager to expose those sites that are over the line.

Witness the commentary in my recent interview with Matt Cutts. We talked about a post on a relatively little known blog about a competitor ranking for the term "access panel" using hidden text. Matt Cutts had picked up on this quite quickly, and Google was prepared to take action on it. However, as Matt indicated in our interview, it turns out that the site that was "outed" responded and removed the offending hidden text. The point is that your competitor wants to report you for doing bad things. That motivation should be a strong deterrent to abusing these techniques.

Ultimately, intent is one of the most important factors. Don't use these techniques to abuse the system. Too much of a good thing turns into a very bad thing. Also, use them in commonly used ways. This is no time to invent some novel way to apply hidden text to make your site design snazzy or better. For better or worse, doing something unusual, even if your intent is pure, is just asking for trouble. While the search engines want to treat your site appropriately, you make it harder for them by inventing new and unusual coding techniques. Stick to the methods that are commonly in use by others, and you will be better off. In addition, even if your use is completely legitimate, you still need to use any hidden text techniques in moderation. Extensive use of any technique, even in perfectly legitimate ways, exposes you to risk. This may be wrong or unfair in some ways, but it's the world we live in. Being morally right, but banned, does not help anyone at the end of the day.
