You've developed what you consider to be a killer new site. Yet, for reasons unknown to you, nobody is visiting. Your design is amazing and your code is solid. But the search engines seem to think you're invisible, and you don't know why. You might be committing one of the 5 developer snafus that kill your traffic.
1. Bad robots.txt file
When the search engine spiders come to scan your site, they look at the robots.txt file and use it as a guide to the rest of your site. It's simply a file named "robots.txt" sitting on your web server, usually in the root, which tells them which pages on your site to index and which ones to leave alone.

The main issue most developers run into with the robots.txt file is how its path matching works. Imagine your robots.txt file has the directive:

User-agent: googlebot
Disallow: /manage

You put this directive in place to block indexing of the "manage" directory, which theoretically houses the administration functions of your site. But because Disallow rules match by prefix, besides blocking the manage directory and all of its descendants, you also blocked any files starting with the text "manage" -- such as manage.html or management.html -- which you actually wanted indexed!

To fix the problem and block only the manage directory and its descendants, add a trailing slash to the "Disallow" line:

User-agent: googlebot
Disallow: /manage/

The robots.txt file is also very picky about syntax and case. The following are some other common problems many developers run into:
- Adding illegal comments to the file
- Adding extra spaces at the beginning of a line
- Swapping the order of the "User-agent" and "Disallow" lines
- Having more than one directory/file per line
- Incorrect capitalization, such as "User-Agent"
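To see how a given set of rules will actually behave before the spiders do, you can sanity-check them with Python's built-in urllib.robotparser. This is just a sketch, using the hypothetical "manage" paths from the example above:

    from urllib.robotparser import RobotFileParser

    def blocked(rules, path, agent="googlebot"):
        """Return True if the given robots.txt rules block `agent` from `path`."""
        parser = RobotFileParser()
        parser.parse(rules.splitlines())
        return not parser.can_fetch(agent, path)

    too_broad = "User-agent: googlebot\nDisallow: /manage"
    with_slash = "User-agent: googlebot\nDisallow: /manage/"

    # The broad rule blocks anything whose path merely starts with "/manage".
    print(blocked(too_broad, "/manage/index.php"))   # True  -- blocked, as intended
    print(blocked(too_broad, "/management.html"))    # True  -- blocked by accident
    # The trailing slash limits blocking to the directory and its descendants.
    print(blocked(with_slash, "/manage/index.php"))  # True  -- still blocked
    print(blocked(with_slash, "/management.html"))   # False -- indexable again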
2. Hiding content behind user signup
Users of search engines have become accustomed to searching for a term, clicking a listing in the search results, then quickly browsing that site and finding the information they were looking for. Any barrier to that process causes users to high-tail it back to the SERPs for another site that offers an unobstructed path to the content. Surfers have a very low tolerance for anything other than the path of least resistance to their information. Requiring users to create an account to access relevant site content is a huge barrier to traffic, and one that can easily be avoided.

I understand that some webmasters want to collect user information right off the bat, but doing so is a disservice to both you and your users. Requiring registration also keeps the search engines away from your content. The spiders behave just like normal users, except they can't sign up for an account to gain access; if they can't reach the content, they can't index it. And if people do manage to make it to your site in the first place, the registration barrier sends them off to less obtrusive sites.
3. Inaccurate title tags
I've seen far too many developers not bother to set an accurate title tag for the pages on their site. Why not take the time to code up some display logic that outputs the correct title for each page? Mostly it's laziness; that's the excuse I used to give. :) But in reality, the title tag is essential to building traffic. Most search engines use the title tag as the link text for your listing and as a signal of what the page is about. A site full of identical titles hurts on both counts, lowering your rankings and your traffic.
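The display logic doesn't need to be elaborate. Here's a minimal sketch in Python with hypothetical page data; the same idea plugs into whatever templating layer your site already uses:

    SITE_NAME = "Example Widgets"

    # Hypothetical page data; in a real site this would come from your database or CMS.
    PAGES = {
        "/": {"heading": "Home"},
        "/widgets/blue-widget": {"heading": "Blue Widget Specs and Pricing"},
        "/support/returns": {"heading": "Return Policy"},
    }

    def title_for(path):
        """Build a unique, descriptive title for each page instead of reusing one site-wide string."""
        page = PAGES.get(path)
        return SITE_NAME if page is None else f'{page["heading"]} | {SITE_NAME}'

    print(f"<title>{title_for('/widgets/blue-widget')}</title>")
    # <title>Blue Widget Specs and Pricing | Example Widgets</title>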
4. Session IDs in the open
Session IDs in URLs are bad, but I'll let Dan Thies speak for himself: "Feeding session IDs in URLs to spiders is one of the dumbest things you can do to a website." And I don't disagree. Not only are they bad for SEO, they're also horrible for security. With the session ID in the URL, any unsuspecting user who copies and pastes their URL into an online message board unknowingly hands everyone their session ID. A hacker can then use it to perpetrate the session version of identity theft. That is bad, my friend. Very bad.
For SEO, session IDs in the URL can also result in duplicate content being indexed by the search engine spiders: the same page gets indexed hundreds or thousands of times under different URLs, which is never a good thing for your rankings or for the people trying to find you through search engines.
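If your framework insists on putting its session ID in the query string, one mitigation is to strip it from every URL you emit or link to and lean on cookies for session tracking instead. A sketch using only Python's standard library; the parameter names are hypothetical, so substitute whatever your framework actually uses:

    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    # Hypothetical session parameter names -- adjust for your framework.
    SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

    def canonical_url(url):
        """Return the URL with any session-ID query parameters removed, so spiders
        and pasted links only ever see one clean address per page."""
        parts = urlsplit(url)
        clean_query = [(key, value)
                       for key, value in parse_qsl(parts.query, keep_blank_values=True)
                       if key.lower() not in SESSION_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(clean_query)))

    print(canonical_url("https://example.com/products?sid=8f3a9c21&page=2"))
    # https://example.com/products?page=2

On the server side, pair this with a 301 redirect from the ID-bearing URL to the clean one, so the engines consolidate everything onto a single URL per page.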
5. Nasty HTML
To help the search engines read your pages correctly, it's best practice to code your pages semantically and with as few syntax errors as possible. Incorrect code can be unreadable for the search engines, potentially sending their bots into infinite loops or causing other parsing problems.

For semantics, I recommend CSS-driven design. Keeping your markup (HTML) separate from your styling (CSS) is not only beneficial for design and development; using clean, vanilla HTML to mark up your content also makes your pages much easier for the search engines to parse. In addition, using <h1> and other heading tags correctly helps tell the search engines what each page is about, further improving how they index your pages.
To keep syntax errors to a minimum, run your pages through the W3C markup validator (validator.w3.org). It's indispensable for finding markup errors (such as a missing quote mark or end tag) and helping your pages conform to web standards, which is always a good thing.
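You can also script the check so it runs as part of your build or deploy. A small sketch that asks the W3C's Nu HTML Checker (validator.w3.org/nu) for a JSON report on a page; treat the exact response fields as an assumption and confirm them against the checker's documentation:

    import json
    import urllib.parse
    import urllib.request

    def validate(page_url):
        """Ask the W3C checker to fetch and validate page_url, then print its messages."""
        api = ("https://validator.w3.org/nu/?out=json&doc="
               + urllib.parse.quote(page_url, safe=""))
        request = urllib.request.Request(api, headers={"User-Agent": "markup-check-sketch/0.1"})
        with urllib.request.urlopen(request) as response:
            report = json.load(response)
        for message in report.get("messages", []):
            # Each message includes a type ("error", "info", ...) and a description.
            print(f"{message.get('type')}: {message.get('message')}")

    validate("https://example.com/")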