We've got a special guest on Whiteboard Friday this week: Rand! After a few weeks of awesome stuff from our guest stars, Rand was missing the limelight and wanted to get back in front of the camera.
So what's the topic? Indexation issues. What do you do when you have a large (or ginormous) site and thousands of your pages simply aren't getting indexed? Well, a lot of times it means you lose clicks and lose business, but there are ways to solve this problem...watch and learn.
SEOmoz Whiteboard Friday - Solving Indexation Problems from Scott Willoughby on Vimeo.
UPDATE: Here's a link to the post I mentioned about different types of cloaking.
What the heck is "indexation"?
Can we kill that non-word, and speak proper English now please?
* "How to Solve Indexing Issues".
@g1smd - I think Scott was just using a little bit of creatility there.
If we let the real word "indexing" slip to the wrong word "indexation", the next thing you know people will be talking about "indexisation" (some already are!), and that process then eventually gets to be called "indexisationing". I don't want to go there.
@g1smd - Don't you think you're overindexsensationalizing this a bit? :)
I was kidding with my earlier comment, referencing John McCain's recent gaffe of using his own word "creatility" in a speech.
Great, thanks. I like it when you repeat this stuff; it's like a never-ending reminder.
Nice video; it refreshed our knowledge of the basic SEO stuff. But as a non-American, English is my second (or fifth, who knows; yes, I am pentalingual :P ) language, and I noticed I have a bit of a listening problem (don't worry Rand, you speak clearly, it's my problem, I know :) ), so I understood only about 90% of the speech.
I think it would be great if there were a text version of the videos for non-American visitors, at least for PRO members. It would keep you on your way to being the world's best SEO community.
Vusal
Hi Rand,
do you consider 301ing every request that would otherwise result in a 404 to be a good thing? I mean, if the page has really moved and there are some links pointing to it, a 301 surely is the way to go, but let's say a user types in a funny, non-existent URL such as www.yourdomain.com/foobarasdfasdf - then, from my point of view, a 404 would be appropriate.
Or are there any advantages to just 301ing all 404s?
If it has moved - 301.
If it has never existed - 404.
If it no longer exists - a custom 404 with links to "similar" content is probably best.
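If it helps to see those three cases in one place, here is a minimal sketch of that routing logic as a Flask error handler. The moved/retired URL lists and template names are made up purely for illustration, not a real implementation:

    # Minimal sketch: moved -> 301, gone -> custom 404, never existed -> plain 404.
    # The URL map, retired set, and template names are hypothetical placeholders.
    from flask import Flask, redirect, render_template, request

    app = Flask(__name__)

    MOVED = {"/old-category/widgets": "/widgets"}   # pages that genuinely moved
    RETIRED = {"/2006-holiday-specials"}            # pages gone for good

    @app.errorhandler(404)
    def handle_missing(error):
        path = request.path
        if path in MOVED:
            # Moved: a permanent redirect passes the old page's links to the new URL
            return redirect(MOVED[path], code=301)
        if path in RETIRED:
            # No longer exists: custom 404 page pointing at similar content
            return render_template("custom_404_with_suggestions.html"), 404
        # Never existed (e.g. /foobarasdfasdf): a plain 404
        return render_template("plain_404.html"), 404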
Michael,
I have it on good authority that you should avoid 301 redirecting all 404s. If the page used to exist, then you should be fine, but if it never existed, I wouldn't do it.
Google can easily test for this, and they already do when you try to verify a site in Webmaster Tools with an HTML file. So they have the ability, and again, I have it on good authority that it's best to avoid 301ing all 404s.
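You can run that same sanity check on your own site: request a clearly made-up URL and see what status code comes back. A rough sketch using the requests library, with a placeholder domain:

    # Rough soft-404 check: a made-up URL should return 404, not 200 or a redirect.
    # The domain is a placeholder; requires the `requests` package.
    import uuid
    import requests

    def check_soft_404(domain):
        fake_path = "/definitely-not-a-real-page-" + uuid.uuid4().hex
        response = requests.get(domain + fake_path, allow_redirects=False, timeout=10)
        if response.status_code == 404:
            print("OK: nonexistent URLs return 404")
        else:
            print("Warning: nonexistent URL returned", response.status_code)

    check_soft_404("https://www.yourdomain.com")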
Brent
Indexation: it all started when the SQL dudes created "INDEXES" instead of "INDICES"...
Anyway...we're linking to major category pages in our nav with one set of anchor text, and from our bot-visible sitemap with another, in an attempt to have our category pages rank for two vaguely synonymous terms. Example:
"maui vacation specials" and "maui vacation packages"
Of course, both forms are sprinkled about in the body, the description, the title tag, and one of them in H1 etc., but unless I'm smoking crack we're definitely seeing anchor text on INTERNAL links having a big effect on whether a page ranks well or not for the specific phrase.
Remember, these are internal links, not external...thoughts on whether I'm diluting one for the other? My very rough measurements over the past 10 days since we've tried that seem to show fabulous improvements in rankings for the alternate terms, and not really any drop for the main terms.
Internal linking and the ability to get the optimal anchor text are hugely important. External links, of course, are hugely important and are what most people think about when talking about "link building," but never overlook the ability to signal and reinforce a page's topical focus through internal linking.
As for dilution...
I think the only way to be sure is to experiment and measure. Every site and industry is unique, and how these signals are interpreted may vary from one to another. What you are doing might be beneficial, especially since the two terms might be seen as synonymous and therefore self-reinforcing, but another usage, or the same approach on another site, might decrease the net effect. You'll want to continue to monitor over time to see whether this stays the course.
Rand, great video, I swear you have our conference room bugged. 2 weeks ago I started pushing to get some type of "bot navigation" for our 65 million articles on highbeam.com. The content in this video outlines exactly what my team and I put together to get our 64 million pages indexed (currently we only have about 18 million indexed). Great video, and it's good to have some backing to show we are going down the correct path.
I really like that you reminded everyone to at least consider a nofollow on every link. It's a completely opposite, yet often more effective, way to evaluate the situation from a standpoint of priority.
I have had good experiences using only a category-driven navigation.
This is especially prevalent on large ecommerce sites, particularly due to the filtering/sorting functionality, which is often a third-party add-on anyway. It often becomes even more complicated than just providing multiple paths because these systems are often not exactly search friendly, resulting in horrendous URLs or infinite spider traps.
The end result is often navigational excess at the mid-level of a site, leading to URL bloat and potential duplication, or cannibalization at the very least. Making matters worse is that these pages may be of low value to the search engines and appear as nothing more than search results.
So really, there are a multitude of issues often rolled up here.
I agree that minimizing this redundancy and reducing the number of paths may be valuable. This is definitely an area for diminishing returns. However, I think a one-path strategy could be too harsh. For instance, rather than having every filtering option crawlable, in addition to the category path, a brand path might also be extremely valuable. This also assumes that the pages of these paths are strong enough to stand on their own.
While an HTML sitemap can still be an important conduit, I get the feeling that Google (and others, no doubt) are treating this page with less importance than they once did, at least based on PageRank valuation. Sitemaps on more and more sites seem to be carrying less and less PageRank compared to their peer pages. That said, these still may be important pages for flow.
Most sites though could benefit from a little navigational pruning.
Here's a thought: often a nav menu won't have semantically accurate and keyword-focused anchor text links because of design/space constraints.
Would it be an acceptable method of 'white hat cloaking' to create some category nav links purely for search engines and nofollow everything else?
Also, I would be a bit worried about nofollowing my 'sitemap' link - surely this is an important way of getting search engines to index deep pages and also to distribute link juice? Would adding a robots noindex,follow meta tag to the sitemap page be better?
I'm not up to worrying about indexing yet...so you can call it whatever you want.
I do have a question though:
I just read an E-book that said that Google reads a page from the top left to the bottom left of the page and then works from the center down.
If this is true, then everything on my blog pages in the left column, which has nothing to do with my post, is being read first by google.
The e-book says this is no good and I should fix it by putting a blank spot on the top left of the page which would force google to the middle of the page.
Is this true? Is the fix a good one? Does any of this matter?
Googlebot doesn't 'see' pages the way a human does, because it doesn't visually interpret them. Googlebot simply reads the page's source code, line by line, from top to bottom, in the same order it was written.
If you want to know what Googlebot 'sees' first, open your webpage and check the source code. Then try to work out what the code at the top of the page is for, and you will know what Googlebot catches first.
In most cases, if your menu is on the left side of the screen and your content is on the right, the code for your menu will probably come before your content. This means that Googlebot will see your menu before your content.
There are multiple ways to put your content before your menu in the code by positioning DIVs with CSS. You might want to learn HTML and CSS to do this, or simply ask a web developer.
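If you'd rather script that check than squint at raw source, here is a rough sketch (assuming the requests and beautifulsoup4 packages, with an example URL) that prints a page's text in source order, which roughly approximates the order a crawler encounters it:

    # Rough sketch: dump a page's text in source order to approximate what a
    # crawler reads first. Assumes `requests` and `beautifulsoup4` are installed.
    import requests
    from bs4 import BeautifulSoup

    def dump_source_order(url, max_chars=500):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):   # strip code that isn't page copy
            tag.decompose()
        text = " ".join(soup.get_text(separator=" ").split())
        print(text[:max_chars])                 # the first things a bot reads

    dump_source_order("https://www.example.com/")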
Good advice G-Force. I might also suggest a tool like www.seo-browser.com or SEOmoz's own Crawl Test tool to get a better idea of search engine handling. DaveN's mistitled, but spiffy Keyword Tool is also a good option.
I'd also be really worried about the accuracy of that e-book :(
Yes, me too. There are plenty of useless e-books out there today that only serve someone's email marketing game.
To Masked Millionaire:
You could figure it out if you think about it logically.
It's funny that people sometimes just forget that Googlebot is a robot. :)
Rand, I think this situation started after your Googlebot mascot :P
I thought Google DID "see" pages to an extent - in that it attempts to read the content in terms of how a human sees it. Surely they have to do that to negate some of the nastier tactics (hidden text, for example).
Google has routines which can discern that a group of internal links within a div, or presented as individual list items within a list (and many other such common configurations), are the navigation bar.
Likewise, common headers and footers, common to multiple pages, are also very easy for them to pick out.
Banner areas, and run-of-site links, are also pretty easy to find and analyse.
Discounting all of those leaves you with the actual on-page content.
Well Rand, love the math skills. I really appreciate the ideas on nofollow. I have just implemented it on my site and look forward to seeing what difference I get with the link juice.
Thanks again.
Very cool video. I really liked your advice on using nofollow on some nav methods, but still keeping them around for humans.
PS: When we give a thumbs up to the videos, the thumbs go to Scott. Shouldn't they go to the person who presents the video content? Just a detail...
Hey Rand,
When you say "screw these guys" do you mean completely eliminate them as navigation elements or nofollow them?
Is nofollow the new way to say screw you?
I'm almost sure that he was talking about putting a nofollow on these links. As he said in the video, you might want to keep these navigation elements because they're great for humans.
So, keep them for your human visitors and nofollow them so your link juice will only be spread to your category pages instead of multiple similar pages.
Yeah - basically I just mean don't send link juice to alternate forms of navigation, because you're wasting it on pages that don't earn search traffic and draw juice away from areas that potentially need it to stay indexed.
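For anyone who wants to see where their internal juice currently flows, this is a rough sketch (again assuming requests and beautifulsoup4, with a placeholder URL) that lists a page's internal links and whether each one carries rel="nofollow":

    # Rough audit of internal links and their rel="nofollow" status.
    # Assumes `requests` and `beautifulsoup4`; the URL is a placeholder.
    from urllib.parse import urljoin, urlparse
    import requests
    from bs4 import BeautifulSoup

    def audit_internal_links(url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        site = urlparse(url).netloc
        for a in soup.find_all("a", href=True):
            target = urljoin(url, a["href"])
            if urlparse(target).netloc != site:
                continue                        # skip external links
            rel = a.get("rel") or []
            status = "nofollow" if "nofollow" in rel else "followed"
            print(status, target)

    audit_internal_links("https://www.yourdomain.com/")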
I would really like to hear more on how you could integrate your 'nofollow' decisions into the bot 301 nav.
Levi - I'm afraid I don't really understand what you're asking. Maybe you could clarify?
I am experimenting with a site called dating-places.org. We have over a million business profiles. I saw Googlebot reach those pages, but they don't appear in the index.
Is there any SEO expert who can offer some advice?