Late last week, Eric Enge of Stone Temple (and a co-author of mine on The Art of SEO) published a fascinating interview with Google's head of Webspam, Matt Cutts. I think the whole of the SEO community can agree that Matt taking time for these types of interviews is phenomenal and I can only hope he does more of them in the future. Understanding more about Google's positions, their technology and their goals will benefit website creators and marketers dramatically.
The interview itself is certainly worth a read, but as one mozzer noted to me during the email string on the subject, "I'm embarrassed to say I couldn't make it all the way through." Fair enough; and that's why I'm presenting Matt's primary points in graphical, cartoon format. I've also included some ad-libbing, interpretation and fun in these. Only the bits surrounded by quotes were actually taken directly from Matt's words, so please do keep in mind that this is my opinion of what Matt means (along with the occasional editorial).
#1 - There is No Hard Indexation Cap; But Indexation Has Limits
#2 - Duplicate Content Might Hurt Your Indexation
#3 - Lots of Qualifiers on Whether Affiliate Links Count
#4 - 301 Redirects Pass Some, But Not All of a Page's Link Juice
#5 - Low Quality, Non-Unique Pages Might Drop Your Indexation
#6 - Faceted Navigation and PageRank Sculpting are Thorny Issues
Personally, I liked how much Eric pushed Matt with scenarios that would require some advanced methods of showing faceted navigation to users but not search engines. However, I also understand that Matt needs to take a position that's right for 95% of site owners 95% of the time or risk creating a new "PR sculpting" issue.
One other item that really stood out and got me excited was this response:
Matt Cutts: (with regard to links in ads) Our stance has not changed on that, and in fact we might put out a call for people to report more about link spam in the coming months. We have some new tools and technology coming online with ways to tackle that. We might put out a call for some feedback on different types of link spam sometime down the road.
That sounds really good - a huge frustration for the SEO world has been the fact that so many SEOs perceive their competitors to be outranking them with black/gray hat linking techniques and feel they must engage as well in order to stay competitive. Shutting this down, or making SEOs feel that Google is taking consistent action when obvious manipulation is reported, would go a long way toward quelling this thorny problem.
My last recommendation is that you check out Eric's 29 Tidbits from my Interview with Matt Cutts; a post that summarizes a lot of the critical information and takeaways quite neatly.
To end, I thought I'd add the four questions I wish Eric would have asked Matt (maybe next time!):
- With Google's new recognition of internal anchor links and listings of those URLs in the search results, is it still safe to link to internal anchors on pages and trust that the link juice will flow to the page as a whole, or are content blocks inside individual pages now being treated as unique entities?
- With the handling of nofollow changing and Google crawling/executing Javascript, what's the best way to link to a document on the web so human visitors can access it but search engines cannot WITHOUT wasting link juice/PageRank (robots.txt, for example, couldn't do this) or cloaking?
- Does Google now (or will you in the future) consider the sharing/linking activities happening on Twitter, Facebook, etc. to have any impact on the overall link graph of the web (assuming we're talking only about those links that don't make their way onto standard web documents)?
- When people ask the question, "why is my competitor ranking so well with low quality/manipulative links?" you often reply that they should be careful in presuming that Google hasn't already discounted the value of spammy links and the competitor is actually ranking on the basis of quality link sources. This creates an environment where marketers are constantly trying to discern which links pass value and which don't - could you give advice for relatively savvy, experienced SEOs to help them make those determinations so they can pursue the right links and stop paying spammers for the wrong ones?
If you've got thoughts to share, outstanding questions about the interview or my amateur drawings, or things you wish Eric had asked Matt, feel free to post them below.
I think there was another important point that Matt Cutts made.
"What we try to do is merge pages, rather than dropping them completely. If you link to three pages that are duplicates, a search engine might be able to realize that those three pages are duplicates and transfer the incoming link juice to those merged pages."
So Google will choose one page from all the duplicate versions to index, and it may give that page the credit from the other duplicate versions.
*My last comment was cannibalised by your system...
I agree, this is indeed quite revealing, although I think it may not work in every case we would wish it to. For example, Google may recognize exact duplicates caused by something like a ?gclid= parameter, but it cannot reliably recognize duplicate products or listings that differ only in their sorting parameters. So it is still necessary to use canonicals.
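To make the ?gclid= example above concrete, here's a minimal Python sketch of the kind of normalization involved: strip known tracking parameters so the variant URLs collapse onto one canonical form. The parameter list and URLs are just illustrative; a real audit would use whatever parameters your own site appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only track campaigns/clicks and don't change page content.
# This list is illustrative -- adjust it to whatever your own site appends.
TRACKING_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

def normalize_url(url):
    """Collapse tracking-parameter variants of a URL onto one canonical form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

# Both variants below normalize to the same URL, so a simple report can
# flag them as duplicates that need a canonical tag (or a redirect).
print(normalize_url("http://example.com/red-widget?gclid=abc123"))
print(normalize_url("http://example.com/red-widget"))
```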
Yeah, it is absolutely important to prevent duplicate content issues.
It is risky to let Google make these decisions on your behalf, and it won't always be kind enough to transfer the link juice.
Matt Cutts' statement indicates that Google is clearly aware of this issue - most webmasters don't have a clue what a duplication problem is - and that it now has the ability to detect duplication and give a page back the link juice it deserves.
It won't affect most SEOs, since we implement 301 redirects or the canonical tag to prevent this from happening, but it's definitely good news for those who don't know about duplicate content and suffer from it.
Yes, I found this part the most interesting, and one of the best examples of the complexities of SEO...little is truly black or white (not a hat reference), cut and dried, but a complex overlapping of obstacles and impact.
On one hand you have the promise of consolidated PageRank across duplicates (assuming Google correctly identifies and associates them with the desired URL) set against diminished crawl equity due to URL bloat.
Unfortunately, I think Matt really downplayed the diminished-crawl-equity concern by downplaying the crawl limits...there are limits, whether they are hard and fast numbers, algorithmically calculated, or simply a matter of space and time.
From my own research across a number of sites, the amount of "uniquely crawled" content within a month's time may be much less than people would expect. The homepage, top-level pages, and some highly popular pages receive a lot of repeat visits, followed typically by a very small percentage of pages that may receive 1-3 repeat visits a month...often leaving the bulk of a site's pages unseen within 30 days.
For me, eliminating or greatly reducing duplicate content through whatever means will still be priority #1, even if it means giving up on potential PageRank consolidation that Google might do on its own.
More than anything, this interview illustrated the need to test and measure the impact. How Google reacts to tactics may differ from site to site.
Clients however, who often want a clearly defined answer and expectation of the outcomes, won't be thrilled with the reality that "your mileage may vary."
Hi Identity,
About crawl limits, there is indeed a limit - even the universe has a limit, I guess (maybe it is still expanding, who knows :D).
But from my personal experience, this limit is far larger than I expected. (Below is the story; it's a bit long.)
I have a small client. About a month ago, they had around 200-300 pages indexed in Google, and I found out (using Xenu) that they have a "deliver to towns" module that generates tons of links (around 30K). These pages generally duplicate each other except for the town names, and they have very limited content, which means they are not important to Google at all.
I asked them to get rid of this module; they agreed to reduce the level from town to county, and about two weeks ago they redirected all of these town links to an important page (which I didn't expect).
This week, when I checked how many pages were indexed, I saw an unbelievable number - 15,000+.
This is a very small website, and Google was able to index 15,000 more pages almost immediately, even though 98% of them are duplicate versions. So I guess it's quite hard to reach the crawl limit.
Also, Matt Cutts once mentioned (I cannot remember where, maybe his blog) that it doesn't require a huge amount of resources to index pages at all; Google's data centres mainly work on the ranking algorithms.
Take a look at the crawl frequency on the site's pages over time.
Once they are established into the site, does their crawl frequency stay the same? Being low value, probably not, but more importantly, how does it impact the crawl frequency of the more important pages...do you give up there to gain here?
This is also important to understand with regard to site changes and optimizations and giving them enough time to truly be crawled and for their impact, good or bad, to be realized.
Further to this point, if you could identify the crawl frequency of your pages or page types (i.e. blog pages vs product pages, etc.) you can determine which pages are "important" in Google's eyes and make decisions on where to further your SEO efforts.
Just b/c a page is crawled and indexed today doesn't mean it will stay there in perpetuity. Crawl frequency is a good indicator of quality.
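If you want to measure crawl frequency rather than guess at it, a rough sketch along these lines can help. It assumes a combined-format access log at a hypothetical path, and it trusts the user-agent string rather than verifying Googlebot by reverse DNS (which a real audit should do); the output shows which URLs get revisited and which go untouched for weeks.

```python
import re
from collections import Counter

# Minimal combined-log-format parser: request path and user agent only.
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

def googlebot_hits(log_path):
    """Count Googlebot requests per URL path in an access log."""
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            m = LINE.search(line)
            if m and "Googlebot" in m.group("ua"):
                hits[m.group("path")] += 1
    return hits

# Hypothetical log path; sort by hit count to see which pages Google
# revisits most and which it barely touches in the period covered.
for path, count in googlebot_hits("access.log").most_common(20):
    print(count, path)
```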
nice illustrations
Nice illustrations rand! :D
Thanks for the excellent and clear illustrations... :) I hope your additional questions to Matt are answered at some point in the future. Very useful info!
I love the Twitter / Facebook question. From a technical standpoint, I would think that Google would be able to isolate those networks and apply similar authority / spam metrics to their pages as they do to the "traditional" web (whatever that is ;) ).
If they can determine the worth of pages across the Internet, developing trust metrics within Twitter would seem to me quite elementary, and Facebook only slightly more complicated. People share so much in these two places especially: even in the short term, they could be useful for discovery.
One other question I'd like asked regards dynamic parameters. We all still tell people to use static URLs and avoid multiple parameters, but is this still necessary? I'm well aware that AMP.com.au's URL for home loans is horrifying (https://www.amp.com.au/wps/portal/au/AMPAUCategory3C?vigurl=%2Fvgn-ext-templating%2Fv%2Findex.jsp%3Fvgnextoid%3Deb00ae205f711210VgnVCM10000081c0a8c0RCRD)
... and the site resolves with https. And embraces unnecessary virtual subfolders. But how much junk in a URL is too much in 2010?
I believe that, regarding the Twitter / Facebook question, Google will also have some kind of "PR" applied to the links, depending on:
- who posts the link: is it someone spamming Twitter, or posting on a "normal" basis? So some kind of authority will play a role
- the content around the link, even if 140 characters won't make it easy: is the tweet relevant to users?
- reputation: how many followers are reading that tweet? I guess that nobody under 15,000 would be taken into consideration
We all know that Google is figuring out our social media environment, and therefore our social media influence. So Google will be using this SMI to give weight to the links.
From my experience, Google has certainly gotten better over the years with dynamic parameters. I've seen some pretty horrendous URLs getting indexed fine...of course, those are also often the exception.
Working with a lot of ecommerce clients, you get to experience your fair share of URL blech. My general push is to try to get them down to 3 parameters or fewer, and even then, to avoid overly complex or sessionID-looking parameters. When sites get over 3 parameters is when I really see hit-or-miss indexation.
It's often more complex than that, though, since these sites often have huge levels of duplication due to parameter ordering, etc.
Agree it would be nice to get some more direct feedback from the engines. I think Yahoo's, and now Google's, ability to let webmasters inform them of parameters that can be dropped is about helping to reduce duplication, but it also illustrates how complex these URL constructs really are.
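As a rough illustration of both points in this sub-thread (the 3-parameter cutoff is anecdotal observation, not an official threshold), a quick Python audit can flag parameter-heavy URLs and group the ones that differ only in parameter order:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode
from collections import defaultdict

def audit_urls(urls, max_params=3):
    """Flag parameter-heavy URLs and group URLs that differ only in parameter order."""
    heavy = []
    groups = defaultdict(list)
    for url in urls:
        scheme, netloc, path, query, _ = urlsplit(url)
        params = parse_qsl(query, keep_blank_values=True)
        if len(params) > max_params:
            heavy.append(url)
        # Sorting the parameters gives one key per "same page, reordered query".
        key = (netloc, path, urlencode(sorted(params)))
        groups[key].append(url)
    dupes = [group for group in groups.values() if len(group) > 1]
    return heavy, dupes

heavy, dupes = audit_urls([
    "http://example.com/p?color=red&size=m",
    "http://example.com/p?size=m&color=red",
    "http://example.com/p?a=1&b=2&c=3&d=4",
])
print("Over the parameter cutoff:", heavy)
print("Ordering duplicates:", dupes)
```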
Good point about parameters and duplicate content.
On a related note, parameters can cause search reputation management issues due to malicious modification and linking / indexing :D. Does anyone remember in 2008 when we realised that we could change some big ecommerce site's image parameters to show very different images with product descriptions? I don't think it was Wal-Mart (perhaps K-Mart).
We could essentially create the online equivalent of this. Hilarity.
Ha! Nice.
Even keyword-friendly looking sites aren't immune. A number of sites have implemented, either on their own or through 3rd party CMS, keyword-rich URLs with a unique identifier.
So "this-is-my-page-1234" or "this-is-my-page/1234" allows the keywords to be changed to hearts content and the 1234 is really the identifier for the page.
Always fun showing clients the potential issue of this with a screengrab example of a dummied URL, like "this-is-not-the-product-you-are-looking-for-1234".
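A minimal sketch of the usual fix for that spoofed-slug issue: look the canonical slug up by the numeric ID and 301 any request whose slug doesn't match. The lookup table and paths here are hypothetical stand-ins for a real database query.

```python
import re

# Hypothetical lookup: in a real site this would come from the database.
CANONICAL_SLUGS = {1234: "this-is-my-page"}

SLUG_ID = re.compile(r"^/(?P<slug>.+)-(?P<id>\d+)$")

def resolve(path):
    """Return a status and the canonical path; 301 when the slug doesn't match the ID."""
    m = SLUG_ID.match(path)
    if not m:
        return ("404", None)
    page_id = int(m.group("id"))
    canonical = CANONICAL_SLUGS.get(page_id)
    if canonical is None:
        return ("404", None)
    canonical_path = f"/{canonical}-{page_id}"
    if m.group("slug") != canonical:
        return ("301", canonical_path)   # spoofed slug -> redirect to the real one
    return ("200", canonical_path)

print(resolve("/this-is-my-page-1234"))                               # ('200', ...)
print(resolve("/this-is-not-the-product-you-are-looking-for-1234"))   # ('301', ...)
```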
Or the other example where the title and/or heading for the page pulls in cues from the URL too. Which could totally lead to that spoofed but potentially negative viral impact.
Oh, I know :D I am currently looking at a large British site that, at its worst, is creating 9 versions of every page, the majority of which resolve on static URLs.
I also just got done dealing with a site that did just this with unique identifiers. Absolutely anything would pull the file from the database, so long as the number existed in the URL.
Nine versus infinity!
Is it possible to block more than 15 parameters? I am working on a site that has a lot of parameters that should be blocked. Webmaster Tools suggests blocking more than 15 parameters - would blocking more than 15 actually work?
Very interesting summary of the interview!
For me, the 301 not passing all the link juice is a new one, although it always did make some sense to me.
And, as you've mentioned, I'm glad Google will do some work on the link spam front... that will make life a lot easier :)
A whole lot wiser now!
This sort of summary is great when you don't have tons of time each day to read about SEO because you're busy doing it!
Two thumbs up for the brevity and concentration of good/useful information.
Very nice illustrations and excellent follow-up questions for the next round (at SMX perhaps?). There's not much about mobile, video and RTS, all hot topics for the year and all with search ranking criteria we'd all love Google to discuss. Additionally, I think advertisers would be willing to pay hard cash for an ORM/spam representative at Google they could reach out to. Call it Prepaid Google. I know at least 10 brands that would pay $2k+ per month for such a service. Thoughts?
Thanks for the great post and illustrations!
Rand, you should definitely continue to do these. I could not stop laughing, yet acknowledging the points being made.
I would definitely have liked an answer to Rand's second question.
With the handling of nofollow changing and Google crawling/executing Javascript, what's the best way to link to a document on the web so human visitors can access it but search engines cannot WITHOUT wasting link juice/PageRank (robots.txt, for example, couldn't do this) or cloaking?
With nofollow being debunked as a usable solution, what seems to be best practice these days amongst the seomoz readers?
Great Piece Rand. Thanks for this.
I really enjoyed those cartoons! Just great and funny.
Terrific graphical illustration of the interview. When we talk about the canonicalization issue, what percentage of copied content is considered duplicate? Do duplicate titles / meta descriptions impact indexation and juice flow just like the body content?
Thanks for posting this, and illustrating it for the visual learners out there. I read and watched Cutts' post yesterday, but you guys really laid it out there. Thanks!
The illustrations are wonderful, clear, and easy to follow. Thanks for breaking this down.
Thanks for this simplified version of the interview. It was easier for me to follow what Matt had to say!
Note to fellow SEOs: You should definitely get yourself a copy of The Art of SEO! This is the book that got me on the path to success in SEO and SEM, where other books failed!
This is why I visit this site everyday. You guys rock!
Nice work on these funny and clear illustrations. Also very good content about Google SEO - easy to understand and useful!
Thanks!
Balazs
I really appreciate the cartoon slides! Having to learn SEO on the fly and without a budget has been really tough for me, but I really liked reading this post since it feels so beginner-friendly and straightforward. I will have to come back and reread a few times to get all the "meat" from it, but that's mostly because I'm still such a newbie.
A question from someone still learning the lingo... "Affiliate links" are links on a website that point to a different website? And this is not the same as a reciprocal link, correct? I want to make sure I understand this
Please continue the cartoons!
Affiliate links are a type of paid link; that is the key factor in this discussion. For example, you create a blog to discuss a certain niche topic and then put tons of affiliate links on it to drive traffic to your affiliates - does link juice get passed through those links? And should it?
These links generally create revenue for the blog owner if someone clicks on the link or if someone completes an action (purchase) after clicking on the link.
Bottom line is that it is a paid link.
Thank you! I understand that now!
I'd have to add that this is an awesome summary. The illustrations make it easy to understand the key points, although I will read through the interview itself in more depth when I feel like I have a better attention span. :)
That was great and explains a lot.
very informative stuff through clear and understandable illustrations....
thanks for the info
Love the cartoons, makes it far easier for people to understand and digest all of Eric's information. It is good that Google are quite open with their algorithms and changes including semantics and LDA. Really interesting article displayed very effectively!
Google Expert
Older post, but still a lot of relevant content.
Really cool illustration, I LOVE it!
Awesome, very clear and descriptive. One thing is highlighted again ... Pages should be useful for the user (don't create them just for the bots)
Good questions answered, great questions left unanswered at the bottom!
Agree with the comment about Twitter. Does this mean that Google will be looking more towards sites such as Facebook, Twitter, etc.?
The last question is the one that I have been asking for a few years now. I'm almost starting to believe that Google does not discount black hat techniques that were used before it began addressing them.
Lastly, I am glad to see that Google is going to move forward on tackling spam. Personally, I would like to see a forum-type area within SEOmoz or a similar site where you can discuss spam, and then all the users who agree that it is spam can report it in their Webmaster Tools.
I also noticed Twitter was mentioned. I am curious whether it was just random that he mentioned Twitter of all the social media sites, or if that is a hint that Twitter may eventually carry more weight than the others.
I liked that everything is done with beautiful and clear pictures. It's all explained well.
Nice one Rand.
A picture does indeed paint a thousand words!
Thanks for sharing, very nice article. Also, by any chance do you know when Google's next update is?
The canonical tag is great, but listen to what Matt said carefully: first try to fix your site architecture, because the canonical tag might not help your crawl efficiency - the search engine must still visit each URL to see the tag. On the other hand, it might help in the long run because the search engine doesn't need to keep revisiting those URLs.
Parameter handling, on the other hand, would help crawl efficiency and indexing because search engines won't visit the URLs that contain a parameter you are blocking.
Read what Vanessa Fox has to say about the use of the canonical tag and crawl efficiency: https://searchengineland.com/google-lets-you-tell-them-which-url-parameters-to-ignore-25925
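A toy comparison of the two approaches described above (the URL set is made up, and real crawlers are obviously more sophisticated): with only a canonical tag, every parameter variant still has to be fetched before the tag is even seen, while telling the engine to ignore a parameter shrinks the set of distinct URLs it needs to fetch at all.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def crawl_workload(urls, ignored_params=()):
    """Distinct URLs a crawler must fetch once the ignored parameters are dropped."""
    seen = set()
    for url in urls:
        scheme, netloc, path, query, _ = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
                if k not in ignored_params]
        seen.add(urlunsplit((scheme, netloc, path, urlencode(sorted(kept)), "")))
    return len(seen)

# 3 sort orders x 10 pages of the same listing = 30 crawlable variants.
urls = [f"http://example.com/widgets?sort={s}&page={p}"
        for s in ("price", "name", "newest") for p in range(1, 11)]

# Canonical tag alone: all 30 variants still have to be fetched to see the tag.
print("canonical tag only:", crawl_workload(urls))              # 30
# Telling the engine to ignore 'sort': only the 10 paginated URLs remain distinct.
print("ignore 'sort' param:", crawl_workload(urls, ("sort",)))  # 10
```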
Hahaha ...
Rand, this is so funny. Spaminator Matt and spaceboy Rand :D
And btw, it's quite easy to understand the point. Looking forward to more of these slides :-)
Martin
"We might put out a call for some feedback on different types of link spam sometime down the road.
That sounds really good - a huge frustration for the SEO world has been the fact that so many SEOs perceive their competitors to be outranking them with black/gray hat linking techniques and feel they must engage as well is order to stay competitive."
I'm totally on the fence with this one Rand. Part of me likes that there is a way to (potentially) remove competitors from the top SERP positions if they are there due to a bunch of black hat methods.
The other part of me doesn't like that I'd be a Google snitch. I suppose it's the old-school-days code of conduct rearing its ugly head: "never squeal, even if it means you sit through hours of class punishment."
I don't have an answer, it just leaves me feeling...ambivalent.
These are perfect for explaining technical SEO details to non-SEOs. Not naming any names...just sayin'. ;)
Nice one Rand - we are all waiting, looking, reading and semi-guessing at what is around the corner, but with this article we can get 90% of it right in anticipation of their latest update.
What's it called - Caffeine, or is it Iced Tea?!
Chris
SEO Top Page
Great post Rand! Would you consider this Google page https://www.google.com/urchin/usac.html as breaking the rules when it comes to PR sculpting? (turn js off and take a look).
Will need to grab a LARGE cup of coffee and sit down to read this...
Why don't more places utilise video? I'd much rather kick back, headphones in, and just listen to this rather than having to read tiny text on a screen...
Thanks for this.
It is something that is getting a lot of my time at the moment!
Just purchased a copy of your book (I didn't even know it had been released already!). Great illustrations on the topics covered in the interview.
the book is fantastic !
Which book did you purchase? Can you give a link?
As per usual, great post Rand.
Excellent and clear illustrations, thanks...!
I think the behavior in your cartoon #4 - 301 redirects removing some % of link juice - is a mistake on Google's part.
I would wager that most websites are initially built with lots of SEO and webmaster best-practice mistakes. Many sites start small and, as they grow, come back and fix things like their URLs.
This attrition via 301 encourages people to live with their poorly set up URLs in order to preserve the link juice they have worked for years to acquire.
Google says: if you change your PR4 page from www.example.com/shopid3234-cat32-prod1235523 to www.example.com/23-red-widget, we may reward your keywords in the URL, but we will take a portion of your PR in exchange for making us think more.
If Google wants to target people who buy up 10 existing domains just to 301 them to their site to make it stronger, then go for it - but that's a totally different problem than internal URL changes for better usability.
Unfortunately, people abuse this with internal links, too. For example, you could take a very popular article or blog post and 301 it to a less popular one targeting more lucrative keywords (I'm not saying you should, but you could). I think this is Google's control valve on 301s, in a sense. Not a perfect solution, but the best they've got for now.
Then it would be nice if you could tell Google ahead of time that you're restructuring the URLs of your whole site, or a clean section of it, giving you the chance not to get penalized.
The one-off 301s would be the most likely to be abused, but I'm not sure how you would do this for a large portion of your site.
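To see why the 301 dilution discussed above worries people, here's a purely illustrative model. Google has never published how much value a 301 keeps, so the retention factor below is an assumption, not a real figure; the point is only that whatever the loss is, it compounds with every extra hop, which is why a single redirect straight to the final URL is preferred over chains.

```python
# Purely illustrative: Google has never published how much PageRank a 301 keeps.
# The 0.85 here is an assumed per-hop retention factor, not a real figure.
ASSUMED_RETENTION_PER_301 = 0.85

def juice_after_redirects(initial_value, hops):
    """Link value left after chaining `hops` 301 redirects, under the assumption above."""
    return initial_value * (ASSUMED_RETENTION_PER_301 ** hops)

for hops in range(4):
    print(f"{hops} redirect hop(s): {juice_after_redirects(100, hops):.1f}% of the original value")
```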
I'd just like to point out that Rand apparently did all of these illustrations between about 8pm and 1am.
Technically, 10pm and 1am :-)
Your boss should give you a raise or more vacation time!! =)
Neither of those things are likely to happen in the near future, sadly, but I'm a pretty happy guy in general :-)
I'll fess up that I'm another one who had trouble making it through the interview transcript.
Great illustrations - thanks for breaking it down for us ADHD / "Visual Learner" types :)!
My favorite panel is for point #5. So a low quality page has few links and few shares. Does that mean sharing activity is another indicator to Google that "real users like this page"? But can't the tweets and Diggs and the rest be manipulated? And what if you have a lot of blog posts that aren't tweeted, etc.? Does that bring down the overall SEO quality of your site?
Nice visualization rand! You always make things more understandable ;)
This graphical presentation of Matt's primary points has great added value. No problem reading it from start to end. Thanks a lot.
Unfortunately, my level of English is very low, so I take in the pictures through intuition and basic vocabulary. A plain-text translator copes with the post, but the text inside the figures is a mystery to it. It would be good to provide a text list of the graphic materials.
I'm going to assist here and translate - I think our friend Xstroy is saying that those readers who use translation software to read the blog miss out on illustrations that contain blocks of text (because the translators can't read that image-based text). A fair point and food for thought.
Is that Dr. a doctorate in translation? Well done Pete!
Nice post. I did a similar post on my blog, but it's much funnier in terms of the animations.
It's Google, SouthPark style ; )
ImJonTucker(dot)com/google-marketing
it's always distilled down quite a bit, as my audience is small business owners and not pro SEO's, but it may be a good way to explain SEO to the non-SEO.
Thanks for this. I have been trying to figure out a simple way to "show" my management team the key points from this interview. I knew if I just had them read it they would be tuned out rather quickly.
Way to make these items identified by Mr. Cutts and co. much easier to digest :)
Thanks!
Ideally, keep all your pages within 2 hops of the home page where possible. If you have a WordPress blog with 100 categories and each category has 100 posts, you can have a site with 10k pages - and for each category page, keep 100 links to the individual posts. I think this is the cleanest way to build sites of 10k pages or less. You just need to plan the categories well. And yes, do not keep category links on all individual pages - just have the category links on the home page. This will ensure a linear link juice flow. Plus, use plug-ins such as SEO Smart Links to make links (with appropriate anchor text) flow from relevant pages, and you have a killer site architecture.
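A quick check of the arithmetic in that comment (the 100x100 figures are the commenter's example, not a recommendation): with the home page linking to every category and each category linking to all of its posts, the page count within two hops is just the product of the two branching factors, plus the hub pages themselves.

```python
def pages_within_two_hops(categories, posts_per_category):
    """Home page -> category pages -> post pages, as in the comment above."""
    hop_1 = categories                       # category pages linked from the home page
    hop_2 = categories * posts_per_category  # posts linked from each category page
    return 1 + hop_1 + hop_2                 # +1 for the home page itself

# 100 categories x 100 posts each ~= the 10k-page site described above.
print(pages_within_two_hops(100, 100))  # 10101
```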
Rand, I like the last part of your presentation, #6. Amazon employs white hat cloaking and both Amazon and the engines win, so what is the problem if Google comes out and says: get rid of duplicate content this way?
>Matt Cutts: (with regard to links in ads) Our stance has not changed on that, and in fact we might put out a call for people to report more about link spam in the coming months. We have some new tools and technology coming online with ways to tackle that. We might put out a call for some feedback on different types of link spam sometime down the road.
Ha - so is that their new technology? Getting people to report stuff? Nothing new then - as usual :D
> That sounds really good - a huge frustration for the SEO world has been the fact that so many SEOs perceive their competitors to be outranking them with black/gray hat linking techniques and feel they must engage as well in order to stay competitive. Shutting this down or making SEOs feel that Google is taking consistent action when obvious manipulation is reported would go a long way to quelling this thorny problem.
Well, duh - SPAM = Sites Positioned Above Mine. Frustration, what frustration? You do not rank for pharmacy terms without forum spam and comment spam etc., and there are other markets where other types of links are required to rank. You just gotta do what you gotta do. Besides, I've seen Google overlook so many really bad things being reported - I mean things outlawed by their webmaster guidelines, not just things somebody subjectively considers "manipulation".
Thanks, mate, for the wonderful descriptive illustrations. They've been a great help. Now I understand slightly better what should be done to make Googlebot visit my site.