It's that time of the month... There's new data in Linkscape's index, which means Open Site Explorer, Moz's SEO Toolbar, our API and many of our other tools all have new links and metrics. Here are the stats for index 36:
- 41,278,073,331 (41.3 billion) URLs
- 491,446,900 (491 million) Subdomains
- 114,288,802 (114 million) Root Domains
- 416,475,264,279 (416 billion) Links
- Followed vs. Nofollowed
- 2.23% of all links found were nofollowed
- 57.63% of nofollowed links are internal, 42.37% are external
- Rel Canonical - 7.43% of all pages now employ a rel=canonical tag
- The average page has 60.88 links on it (down from 61.69 last index - this has been dropping from index to index for many months now)
- 51.50 internal links on average
- 9.38 external links on average
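For anyone who wants to play with the figures above, here is a quick back-of-the-envelope sketch. The constants are just the numbers quoted in this post; the script is illustrative arithmetic, not anything from Moz's actual pipeline:

```python
# Index 36 figures as quoted above; purely illustrative arithmetic.
TOTAL_LINKS = 416_475_264_279      # total links in the index
NOFOLLOW_SHARE = 0.0223            # 2.23% of all links are nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.5763   # 57.63% of nofollowed links are internal

nofollowed = TOTAL_LINKS * NOFOLLOW_SHARE
internal_nofollow = nofollowed * NOFOLLOW_INTERNAL_SHARE
external_nofollow = nofollowed - internal_nofollow

print(f"nofollowed links total: {nofollowed:,.0f}")
print(f"  internal:             {internal_nofollow:,.0f}")
print(f"  external:             {external_nofollow:,.0f}")
```

So roughly 9.3 billion of the 416 billion links carry a nofollow - a small slice of the overall link graph.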
We're in the process of testing some much larger indices, as well as some new processes for index updates that will allow us to maintain several levels of freshness in our dataset. Look for some exciting news on this front in the next 2-3 months.
In addition to the Linkscape update, I'm excited to share some very early results from SEOmoz's 2011 Search Ranking Factors. The results this year come from two sources - the opinions of SEO professionals via survey AND a broad correlation analysis of Google's results. This morning at SMX Munich, I presented the following slide deck:
This is still early data and not fully vetted, so please give a bit more leeway than normal. That said, here were some of the top takeaways for me, personally.
- Facebook + Twitter are Big. Even if you don't believe that it's having a major impact on the rankings directly, the correlation shows that those sites/pages which perform well in social media outperform in search rankings. If you're in either profession - SEO or social media marketing - you should probably be working to strengthen your skillset on both sides.
- Partial Anchor Text. This was heavily discussed on several panels today, and the opinions of voters suggest that SEOs have noticed too - exact match anchor text seems to be less powerful than partial match anchor text (at least, sometimes). To be honest, I voted and thought that exact was still stronger, but it could be that Google's getting pickier about spammy, overly precise keyword-match anchors (and that's a good thing, IMO).
- Exact Match Domains? The correlation this round is strikingly lower than last June's data. Perhaps Google's taken some action here, or maybe other factors are at play, but either way, the votes from the SEOs suggest that we haven't yet seen the bottom of exact match domain name value.
- Something's Funny About Nofollows. The correlation data here is downright weird. Maybe nofollowed links are simply well-correlated to followed links, maybe Google's actually using some nofollow signals or perhaps, nofollows often come from good places (like social media sites, profiles, blog comments, discussion forums, etc.) that correlate to good, useful, high quality sites/brands.
- Domain-Wide Signals Matter. After the Panda/Farmer update, no one in SEO should be surprised that Google's looking at domains as well as pages, but the correlation and the strength SEOs ascribed in the voting both surprised me.
I'm really excited about the full report and data, which should be released sometime in May.
BTW - If you have questions about the correlation data specifics, feel free to post below, but know that we haven't done all the work or released all the data we intend to in the future. As before, we'll be making the entire dataset available so anyone can replicate our results.
In relation to nofollow links and the use of Twitter/Facebook: I have seen some pretty compelling evidence recently, while developing a new website, that can support these arguments.
I have noticed that a nice, steady stream of nofollow backlinks on quality domains has been the main push behind the domain climbing the SERPs and ranking well. The domain does have some quality, link-juice-passing backlinks, but the movement did not start until the nofollow backlinks did. This corresponded directly with information in my Webmaster Tools. Don't get me wrong, you still need link juice to continue the march up the SERPs, but the power of nofollow links is starting to change (in my opinion).
As for utilizing the Twitter and Facebook world, yes, there is some great strength here. The main problem I have noticed is that those who use this type of media only to link back to their own domain have been devalued by Google. The crawlers see only links going back to your domain and nothing else. No diversity. The accounts that have been valued the most are those that interact, have a diverse range of backlinks (new articles, blogs, etc. on other domains), receive mentions back to their account (@accountname), and remain active (providing a steady stream of links and updates). Matt Cutts has mentioned that Google's crawlers see Twitter no differently than a regular page on a website. Why treat it differently? Once again, I've correlated this data with information in Webmaster Tools.
I hope this made some sense. Btw, I was actually more excited about the Linkscape update, but this post just gave me something else to comment on. Thanks Rand!
This is a great response, thank you for sharing your personal experience.
Thumbs up.
DATA! Om nom nom. Thank you!
My thoughts exactly. There's something nifty about going through a 45 page slideshow and seeing all the colorful graphs and charts. Very informative and very sexy!
Agree with Kris - yummy data & simple. I especially like slide 3, opening disclaimer!
WELL done!
Lovely lovely data. :-)
I'm finding more competitor links through Open Site Explorer with the new Linkscape update. However, I am also seeing that Google has not cached all of these links. Does this mean that Open Site Explorer is crawling deeper than Googlebot in these instances? Would it be safe to say that a link is worthless if it's not cached?
Smart question. If the linking page isn't in Google's index, it most likely won't provide any ranking value. It might in Bing/Yahoo, depending on whether they have the linking page in their index.
That kind of shows the differences in how the two bots work. Open Site Explorer's bots don't have the same goals as Google's: Google is a search engine, while Open Site Explorer has built a very strong backlink graph but isn't one.
Really excited to see the report... so much has happened in the last year!!
It would be very interesting if SEOmoz could analyze sitewide data for sites that took a hit in the Panda update, and which correlations seem most important for keeping sites strong and out of a negative Panda algorithm filter.
Indeed we did! https://www.seomoz.org/blog/googles-farmer-update-analysis-of-winners-vs-losers Unfortunately, we found no link-based factors that seemed to predict the changes. My suspicion, personally, is that user/usage data were the vast majority of this update (and Google used a machine learning model based on the input of their quality raters).
You guys put together some great information in that article.
To follow up on your suspicion: if Google used user/usage data (that's an awkward phrase :)), then it seems like it may be a bit difficult for sites to dig themselves out of any Panda penalties placed on them unless they clean up the areas of their sites that they believe may have tipped the user/usage scale against their favor.
What's unknown is whether making site changes to fix any potential Panda issues will restore rankings automatically over time, as long as the right changes are made to improve negatively affected sites. With a purely algorithmic update, if you fix the right areas of your site, you should improve whatever domain score was dinged by the Panda update.
With a system of user gathered data and a machine learning model based on the input of quality raters, it seems like webmasters would have to contact Google to re-assess the pages that they reviewed to re-evaluate their scores that were attributed into the machine learning model.
I hope that my comment makes sense. Basically put, site owners may need to both make fixes on their sites and THEN contact Google to be re-evaluated.
The latter seems close to impossible to achieve though as there aren't many means for contacting Google for any sort of evaluation.
In regards to some of the new ranking metrics you guys put together, the domain/brand-level metrics seem VERY important for high-search-demand, high-traffic broad keywords. I have seen eBay, Walmart, and Amazon move up recently for all sorts of high-level queries where their pages have little unique content, little to no external links pointing to them, and a not-so-optimized internal linking structure. They are thriving on their strong brands and the internal strength flowing through their sites. Google must want to rank these pages because searchers are probably using the broad phrase plus "amazon" or the broad phrase plus "walmart". Google wants to get users where they want to go, so these must be updates to get them there quicker.
Contacting Google could be as simple as pinging the search engine or publishing a new, detailed article so that the bot comes to find it. Once that happens, if the structure of your site is sound, the bot will travel around your site and look at all the pages. If you've fixed the problems that got the site slammed by Panda, that will probably take effect.
However, if you do use Webmaster Tools to tell Google that you have fixed your site they are generally very responsive. I have seen sites going from being kicked out of the index back to their usual rankings between leaving the office late one day and getting in early the next morning - no waiting around needed.
My own data shows duplicate content being the biggest loser of the update. I have seen my two testbed sites (one running unique content, one running spun content from Unique Article Wizard) drop dramatically in rank. Meanwhile, my unique-content sites were barely affected by the change; many of them even went up.
A huge amount of the scaremongering from the panda update came from article directory communities which allowed duplicate content. My reasoning is that these sites contained duplicate content which lowered the overall domain value, negatively affecting articles which previously gained more benefit from the domain trust of a site.
I always thought that nofollow in fact had some weight, and links are easy to find!
Curious about the Twitter link data.
On Twitter most URLs are shortened, and on Facebook most are left full length. Did your data follow the links through bit.ly, goo.gl, etc. to find the redirected link?
They do, since shortened URLs almost all use 301 redirects which both our index and the search engines will follow.
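For the curious, here's a minimal sketch of how a crawler can walk a shortener's redirect chain and confirm it's 301s all the way down. This isn't Linkscape's actual code - just an illustration using the standard library, and the example URLs in the canned chain are hypothetical:

```python
import http.client
from urllib.parse import urlparse

def redirect_chain(url, fetch, max_hops=10):
    """Follow redirects via fetch(url) -> (status, location_or_None).
    Returns the list of (status, url) hops, ending at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = location
    return hops

def http_head(url):
    """Real fetch using a HEAD request (assumes network access)."""
    parts = urlparse(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https" else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

# A canned chain stands in for the network here (hypothetical URLs):
fake_web = {
    "http://sho.rt/abc": (301, "http://example.com/page"),
    "http://example.com/page": (200, None),
}
print(redirect_chain("http://sho.rt/abc", fake_web.__getitem__))
```

Run `redirect_chain(url, http_head)` against a live shortened link and you should see the 301 (or, for misbehaving services, a 302) before the final 200.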
Avoid ht.ly - increasingly popular, but it bars the juice flow..
j.mp doesn't do 301 either, at least the last time i checked
j.mp is from bit.ly actually and it does do a 301. FYI :)
Thanks for the heads up. I wanted to mention hex.io as one service you wanna avoid but they are redirecting to j.mp. Hex.io did 302 redirects sometime ago...glad bit.ly changed that!
Bit.ly is my personal fave right now, but you could use Google's own shortener too. I believe they'll introduce some nice tracking tools and build on it really well, considering how they want to get in on social media.
Incredible knowledge, data sets, and analysis power you have, Rand. I must say, you are mind-blowing, man.
I wonder how different the results would be if the questions that led to links being such a big piece of the pie had been asked six months ago. We only started noticing links having less effect around about then (or maybe even less than six months ago). Is that the same for anyone else?
Finally got some concrete data to support my views. I was damn sure about the nofollow tag, and after seeing the correlation data here I can say I was right about it from the beginning. Also, other than the social media sites Facebook and Twitter, I found Digg and StumbleUpon have a great impact on rankings.
After this post, I have concrete data to support my opinions regarding the ranking factors.
Call me crazy but for the last two weeks I've seen a lot of nofollow links help in SERPs.
My tests weren't controlled, so I can't be sure, but I have a strong hunch that they are now supporting followed links.
Hello,
I need help understanding "Keyword-agnostic features" in the presentation (slide 14). What does it mean?
Great job!
Thanks
Adrian
For the calculation of correlation, did you use Spearman's correlation coefficient?
My bad - I did not pay attention to the first slides.
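For readers who also skipped the first slides: the study used Spearman's rank correlation, which is just Pearson correlation computed on the ranks of the data rather than the raw values, so only ordering matters, not scale. A pure-Python sketch, only to make the idea concrete (a real analysis would use a library implementation such as `scipy.stats.spearmanr`):

```python
# Spearman's rank correlation, written out by hand for illustration.

def ranks(xs):
    """Average ranks (1-based); ties share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the tied 1-based positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Pearson correlation of the rank vectors of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

With real data you would feed in, say, per-URL tweet counts against SERP positions; a coefficient near +1 means the orderings agree, near -1 means they reverse, near 0 means no monotone relationship.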
Still, it's kind of unfair for websites with really great content but in a niche whose visitors aren't that oriented toward Facebook. For us geeks, Facebook and Twitter are obvious - but far from everyone thinks the same.
Thx for the post Rand. The presentation was great. I wonder how much has changed a year later... New article came out talking about the spam accounts on FB & Twitter...
So now I want a pet dolphin. No, seriously, this has been really helpful, and yes, correlation is not causation, but correlation is a pattern, and patterns help inform our decisions. Great slide show - thanks so much for sharing. This can help explain to some of my clients why SEO is not as straightforward as they think it might be!
thank you very much. it's helpful to me.
It's nice to see our inferences substantiated. Our take is that this is spot on. Our thoughts and observations:
1. Twitter rocks - Can't speak to FB as our data sets are too small to budge the social graph in our favor
2. Anchor text is all about the quality of the description of the image and then info provided (no dupes)
3. Exact matches are being devalued without proper support (a little promotion goes a long way with these still)
4. I think Google has effectively disregarded the nofollow (just an observation)
5. Domain Signals - Fundamentals and attention to detail still win the day. (i.e. structure, navigation, semantic markup, etc.)
Thanks again. I love this site. :)
Terrific presentation. Thanks for sharing it. Google has instructed webmasters to not get carried away with no-follow so it seems natural (to me anyway) that they would start utilizing no-follow more. Too many newspaper sites, for example, use no-follow and Google doesn't want to exclude their votes.
My Santa wish list:
- Lots more social data and analysis. Does a social vote "last" as long as a link? What are the best kinds of social votes/mentions/shares, etc.? What is the impact of a social vote at the page level vs. domain level?
- Cut the data by site type. What are the differences between a shopping site and a content site? What are the strongest factors for shopping sites?
Great information. Now I just want more. :)
I'm curious why some of the respondents weren't sure about Class C IPs and the difference between internal/external links - can we be sure of their other opinions if they're confused about SEO basics?
Hi, what's the difference between a Facebook "share" and a "like"? Same thing?
paul, read here: https://www.seomoz.org/blog/early-ranking-factors-data-an-april-linkscape-update#jtc138163
@SeoMoz Staff <---
Where can i get a link to a small dataset (or last years?) like the one in the slide presentation of awesomeness?
Cuz I'z guessing the full dataset isn't like --tiny ;)
Totally want to run some tests @ home - thnkx!
Thanks for the question! We're still in the preliminary stages of gathering and understanding the data, so we don't have a download available yet. However, when we put out the full version of the Ranking Factors, we will have a portion (if not all) of it available for download. Thanks!
can you plz just send a message to my account when it's available - ty!
You'll definitely want to watch the blog for when the data comes out. We haven't set up a specific list to notify people when it's ready. Thanks!
How specific was the question on document length? I mean, could it have been answered by people perceiving the question as "How much effect for a page if the length is between 50 words and 300?" versus "How much effect for 500 words on a page or 1,000?" Because the former would of course get all votes for longer documents, but not necessarily the latter... right?
I actually had the opposite experience with exact match vs. partial match. After February, the rankings for exact match anchors stayed pretty much the same, while rankings for partial match keywords dropped on a few sites.
As always, great info... and it counts double when it confirms your ideas :)
I love the dolphin analogy in your slides Rand. Perfect explanation. Consider it stolen!
Kevin
Rand,
Some great data. I have always been a fan of the work you have undertaken in analysing potential factors with actual rankings, and it is good to see some much-needed extensions to the data set. I had always felt that 10-20 results per query was too low, as I mentioned after the whole LDA data discussion, so it is nice to see these beefed-up results.
I think the real takeaway is just how much social media is influencing the web and just how much it has grown even since 2009. However, I still look upon this data as dubious when it comes to ranking causation - my opinion is that people just want to share content they like, so good content will be linked to from all types of media. Having said that, I think how a piece of content is being shared across the web can be measured well by looking at social signals. What is your opinion on social directly affecting search engine rankings?
I was on a panel with a Googler and a Bing guy directly after presenting this and both were quite keen to say that they use and will continue to use social data directly. Google did say, however, that, at least for "likes" in Facebook, they're "significantly less powerful than a high quality, trusted link." They weren't as forthcoming about the potential influence of tweeted links or Facebook "shares."
If I had only one SEO post to read per year... it would definitely be this one... GREAT POST, RAND!!! Thanks for your insights!!!
I guess that the document length is to be expected - the longer a document (within reason) the more informative it is. I'm looking forward to finding out more about if / how Google uses user behaviour in its analysis.
More data to analyse in order to improve our rankings and stay ahead of the game.
I'm looking forward to the full report in May
At last, the first data - we have been waiting a long time. After following some tweets about your presentation in Munich today, I hoped to see it soon :-).
I have two questions:
1) Exact match anchor text appears slightly less well correlated than partial match anchor text in external links. Do I understand the meaning correctly: if my keywords are e.g. "Hotel Austria", then a link with the anchor text "Hotel Austria" correlates worse than one with only "Hotel"?
2) On slide 27 - why do you suggest that some voters didn't fully understand the internal/external link anchors choice?
Have a nice time in Germany!
On #1 - The data suggests that those pages with more links that contain partial match anchor text have an even higher correlation with higher rankings than those that contain exact match anchor links. Voters, too, seemed to feel that partial match was quite powerful/influential. That doesn't necessarily mean that in trying to rank for "Hotel Austria" you'll get more value out of links that say "Hotel" - I believe our definition of partial match was more similar to "phrase match" (will double check with Matt), so perhaps "find a great hotel in austria" or "hotel austria rates and locations" would be good choices. It definitely corresponds to a more "natural" link profile, too, which could explain some of that correlation.
On #2 - We meant putting links in anchor text on the page itself, but I'm worried that many voters interpreted that as "external links with anchor text" which was asked in a separate section. We might need to remove that from the final report as I don't want confusion due to our poor survey design improperly influencing the findings.
I love these posts, OSE updates make me want to rush off and do lots of analysis anyway, but an update coupled with lots of data and food for thought is even better! Thanks! - Jenni
Hi Rand
Great post as always, and thanks for the great data. Is there any way (statistically) to filter out or take into account interconnected factors, like nofollowed vs. followed/social signals or domain age vs. PageRank?
Thanks
Mark
I just love this data. My brain freaks out, and I added 6 new things to test to my "SEO To-do" list based on this. The anchor text finding really surprised me. I was certain "perfect" anchor text was the only really strong option and never considered anything else. But it makes sense that partial matches are just as valuable. I learn more from this data than I do from all the SEO blog posts in a year :-) Thanks a bunch, Rand!
Best regards,
Niklas Aronsson
Is facebook share the same as facebook like? I'm a little confused.
A Facebook 'share' is posting it on a friend's wall, as a status update, or on a page/group, whereas a 'like' is effectively a vote for that content or page.
A "share" isn't just posting on a friend's wall, it's posting on your own wall as well.
"share" - when the link is posted as a link or status update on ANY wall (yours, friends, pages)
"like" - when a 'page', comment, video, photo or note on facebook is liked or when a page on another website is 'liked' using the OpenGraph API, generally through a button with a thumbs up that says "Like This" or a link on facebook directly below the content that says "like"
A share requires the user to copy-paste the entire link into their status update. No surprise that Google/Bing would want to count this, if they do. Back in the days of Google's birth everyone had a webpage on a free hosting site, then everyone had a blog, now everyone has a Facebook. It's basically a link, but they are nofollowed by Facebook.
Like requires less savvy and effort and takes less time to do, in most cases a single click.
Question: would Google/Bing weight a link posted by a Facebook profile with 4,000 friends differently from one with 500? What about a 'page' with 100,000 'likes' as opposed to one with 200?
One thing I've been wondering if it's possible to quantify - how much of the Survey is based on hard data vs gut feelings? Both can be accurate, but it would be interesting to get an idea of the ratio :)
I do not pretend to speak in the name of SEOmoz, but the methodology used seems quite clear if you look carefully at the preliminary info in the preso:
1) List the potential ranking factors nowadays;
2) Test them objectively;
3) At the same time, ask a (quite large) panel of "smart guys" (to use Rand's words);
4) Compare the results given by the tests with the answers given by the participants in the survey, in order to see whether the factors in the test data are also perceived as ranking factors by actual SEOs.
If I am wrong, please Rand correct me :)
That's pretty much what we've done. On the survey side, though, it's tough to know, of the 132 respondents, how many are actively testing vs. simply observing and reporting their gut instincts. This, however, is our goal - to compare opinion data alongside correlations (and, in the future, causal, modeled data).
I put a Facebook link on my client's website (in the header) and started getting people to hit the "Like" button. I wrote an email to everyone on the list, and a few people actually linked to my client's website! Free, free, free.
So a no follow link is now better than a follow link. That's what I get from this.
That's a terrible takeaway! As the deck noted, with cool data like this comes a responsibility to share it accurately. We don't know why nofollow links are so well correlated; they could have a lot of overlap with followed links, they could correlate well with sites that participate on the social web heavily (which also means they get lots of other positive signals). An overly simplistic explanation or takeaway can be a dangerous thing with this data, so please use responsibly :-)
Thank you for the info. It is very interesting stuff.
Correct me if I'm wrong here, Rand: lots of large sites on the internet only offer nofollow links, e.g. Wikipedia and Twitter. The correlation could just be showing that these nofollowing sites are listing sites that already have a good number of dofollow links from around the web, and therefore rank well and are considered relevant enough to be mentioned with nofollow links.
However, my take is to continue to pursue those high-level nofollow links; you never know when someone will take those link resources/references and add them somewhere else on the web as a dofollow link.
It was quite interesting to see the results of this survey. The question, in this case, is: is the majority right?