Existential threats to SEO
Rand called "Not Provided" the First Existential Threat to SEO in 2013. While 100% Not Provided was certainly one of the largest and most egregious data grabs by Google, it was part of a long and continued history of Google pulling data sources which benefit search engine optimizers.
A brief history
- Nov 2010 - Search API deprecated
- Oct 2011 - Google begins Not Provided
- Feb 2012 - Sampled data in Google Analytics
- Aug 2013 - Google Keyword Tool closed
- Sep 2013 - Not Provided ramped up
- Feb 2015 - Link operator degraded
- Jan 2016 - Search API killed
- Mar 2016 - Google ends Toolbar PageRank
- Aug 2016 - Keyword Planner restricted to paid accounts
I don't intend to say that Google made any of these decisions specifically to harm SEOs, but that the decisions did harm SEO is inarguable. In our industry, like many others, data is power. Without access to SERP, keyword, and analytics data, our industry's collective judgment is clouded. A recent survey of SEOs showed that data is more important to them than ever, despite these data retractions.
So how do we proceed in a world in which we need data more and more but our access is steadily restricted by the powers that be? Perhaps we have an answer — clickstream data.
What is clickstream data?
First, let's give a quick definition of clickstream data for those who are not yet familiar. The most straightforward definition I've seen is this: clickstream data is the record of the pages a user visits and the sequential stream of clicks they create as they move across the web, hence "clickstream data." It is the raw material of clickstream analysis.
If you've spent any time analyzing your funnel or looking at how users move through your site, you have utilized clickstream data in performing clickstream analysis. However, traditionally, clickstream data is restricted to sites you own. But what if we could see how users behave across the web — not just our own sites? What keywords they search, what pages they visit, and how they navigate the web? With that data, we could begin to fill in the data gaps previously lost to Google.
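To make this concrete, here is a hypothetical, simplified example of what a single anonymized clickstream record might look like. The field names are purely illustrative and do not reflect any particular provider's schema.

```python
# A hypothetical, anonymized clickstream record (illustrative fields only,
# not the actual schema of Jumpshot, Clickstre.am, or any other provider).
clickstream_event = {
    "user_id": "a1f9c3d2",          # hashed, anonymous identifier; no PII
    "timestamp": "2016-10-04T14:22:31Z",
    "referrer": "https://www.google.com/search?q=car+parts",
    "url": "https://www.example.com/products/brake-pads",
    "event": "pageview",
}

# A clickstream session is simply the ordered stream of such events
# for one anonymous user.
session = sorted([clickstream_event], key=lambda e: e["timestamp"])
```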
I think it's worthwhile to point out the concerns presented by clickstream data. As a webmaster, you must be thoughtful about what you do with user data. You have access to the referrers which brought visitors to your site, you know what they click on, you might even have usernames, emails, and passwords. In the same manner, being vigilant about anonymizing data and excluding personally identifiable information (PII) has to be the first priority in using clickstream data. Moz and our partners remain vigilant, including our latest partner Jumpshot, whose algorithms for removing PII are industry-leading.
What can we do?
So let's have some fun, shall we? Let's start to talk about all the great things we can do with clickstream data. Below, I'll outline a half dozen or so insights we've gleaned from clickstream data that are relevant to search marketers and Internet users in general. First, let me give credit where credit is due: the data for these insights comes from two excellent partners, Clickstre.am and Jumpshot.
Popping the filter bubble
It isn't very often that the interests of search engine marketers and social scientists intersect, so this is a rare opportunity for me to blend my career with my formal education. Search engines like Google personalize results in a number of ways. We regularly see personalization of search results in the form of geolocation, previous sites visited, or even SERP features tailored to things Google knows about us as users. One question posed by social scientists is whether this personalization creates a filter bubble, where users only see information relevant to their existing interests. Of particular concern is whether this filter bubble could influence important informational queries, like those related to political candidates. Does Google show uniform results for political candidate queries, or does it show you the results you want to see, based on its personalization models?
Well, with clickstream data we can answer this question quite clearly by looking at the number of unique URLs which users click on from a SERP. Personalized keywords should result in a higher number of unique URLs clicked, as users see different URLs from one another. We randomly selected 50 search-click pairs (a searched keyword and the URL the user clicked on) for the following keywords to get an idea of how personalized the SERPs were.
- Dropbox - 10
- Google - 12
- Donald Trump - 14
- Hillary Clinton - 14
- Facebook - 15
- Note 7 - 16
- Heart Disease - 16
- Banks Near Me - 107
- Landscaping Company - 260
As you can see, highly personalized keywords like "banks near me" or "landscaping company," which are dependent upon location, receive a large number of unique URLs clicked. This is to be expected and validates the model to a degree. However, candidate names like "Hillary Clinton" and "Donald Trump" are personalized no more than major brands like Dropbox, Google, or Facebook, or products like the Samsung Note 7. It appears that the hypothetical filter bubble has burst: most users see the exact same results as one another.
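For readers who want to see the shape of this analysis, here is a minimal sketch, assuming you have access to a simple list of (keyword, clicked URL) pairs. The data below is made up purely for illustration.

```python
import random

def personalization_score(search_click_pairs, keyword, sample_size=50, seed=42):
    """Count the unique clicked URLs in a random sample of search-click pairs
    for one keyword. More unique URLs suggests a more personalized SERP."""
    clicks = [url for kw, url in search_click_pairs if kw == keyword]
    random.seed(seed)
    sample = random.sample(clicks, min(sample_size, len(clicks)))
    return len(set(sample))

# Illustrative usage with made-up data:
pairs = [("banks near me", "https://bank{}.example.com".format(i)) for i in range(200)]
pairs += [("dropbox", "https://www.dropbox.com")] * 200
print(personalization_score(pairs, "banks near me"))  # high count: personalized
print(personalization_score(pairs, "dropbox"))        # low count: uniform results
```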
Biased search behavior
But is that all we need to ask? Can we learn more about the political behavior of users online? It turns out we can. One of the truly interesting features of clickstream data is the ability to do "also-searched" analysis. We can look at clickstream data and determine whether a group of people is more likely to search for one phrase or another after first searching for a particular phrase. We dove into the clickstream data to see if there were any material differences between the subsequent searches of individuals who looked for "donald trump" and "hillary clinton," respectively. While the majority of subsequent searches were the same for both groups, as you would expect (things like "youtube" or "facebook"), there were some very interesting differences.
For example, individuals who searched for "donald trump" were 2x as likely to then go on to search for "Omar Mateen" than individuals who previously searched for "hillary clinton." Omar Mateen was the Orlando shooter. Individuals who searched for "Hillary Clinton" were about 60% more likely to search for "Philando Castile," the victim of one of the more egregious police shootings. So it seems, at least from this early evidence, that people carry their biases to the search engines, rather than the search engines pushing bias back upon them.
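Here is a rough sketch of that "also-searched" comparison, assuming per-user, time-ordered query logs. The data structure, the toy numbers, and the ratio itself are illustrative assumptions, not Jumpshot's actual methodology.

```python
def also_searched_ratio(sessions, seed_a, seed_b, target):
    """Compare how much more likely users who searched seed_a are to later
    search for target, versus users who searched seed_b.

    sessions: dict mapping an anonymous user ID to an ordered list of queries.
    """
    def rate(seed):
        histories = [queries for queries in sessions.values() if seed in queries]
        hits = sum(1 for queries in histories
                   if target in queries[queries.index(seed) + 1:])
        return hits / len(histories) if histories else 0.0

    rate_a, rate_b = rate(seed_a), rate(seed_b)
    return rate_a / rate_b if rate_b else float("inf")

# Toy data; a real clickstream panel contains millions of anonymized sessions.
sessions = {
    "u1": ["donald trump", "omar mateen", "youtube"],
    "u2": ["hillary clinton", "facebook"],
    "u3": ["donald trump", "facebook", "omar mateen"],
    "u4": ["hillary clinton", "philando castile"],
    "u5": ["hillary clinton", "omar mateen"],
}
print(round(also_searched_ratio(sessions, "donald trump", "hillary clinton", "omar mateen"), 2))
# 3.0 -- in this toy data, "donald trump" searchers were 3x as likely to go on
# to search for "omar mateen".
```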
Getting a real click-through rate model
Search marketers have been looking at click-through rate (CTR) models since the beginning of our craft, trying to predict traffic and earnings under a set of assumptions that have all but disappeared since the days of 10 blue links. With the advent of SERP features like answer boxes, the knowledge graph, and Twitter feeds in the search results, it has been hard to garner exactly what level of traffic we would derive from any given position.
With clickstream data, we have a path to uncovering those mysteries. For starters, the click-through rate curve is dead. Sorry folks, but it has been for quite some time and any allegiance to it should be categorized as willful neglect.
We have to begin building somewhere, so at Moz we start with opportunity metrics (like the one introduced by Dr. Pete, which can be found in Keyword Explorer) which discount the potential search traffic available from a keyword based on the presence of SERP features. We can then use clickstream data to learn the relationship between SERP features and CTR, which is non-linear and often counter-intuitive.
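To make the idea concrete, here is a deliberately naive sketch of an opportunity-style discount. This is not Moz's actual model; the feature weights are invented, and the quiz below shows why such a simple linear discount isn't good enough on its own.

```python
# Illustrative only: a naive, linear opportunity metric. The real relationship
# between SERP features and organic CTR is non-linear and must be learned from
# observed click data; these discount weights are invented for the demo.
FEATURE_DISCOUNTS = {
    "top_ads": 0.25,         # assumed share of clicks absorbed by top ads
    "answer_box": 0.20,
    "knowledge_panel": 0.10,
    "news": 0.10,
    "tweets": 0.05,
}

def naive_opportunity(monthly_volume, serp_features):
    """Discount a keyword's search volume by the SERP features present."""
    remaining = 1.0
    for feature in serp_features:
        remaining *= 1.0 - FEATURE_DISCOUNTS.get(feature, 0.0)
    return monthly_volume * remaining

print(naive_opportunity(10000, ["top_ads", "knowledge_panel"]))  # 6750.0
```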
Let's take a quick quiz.
Which SERP has the highest organic click-through rate?
- A SERP with just news
- A SERP with just top ads
- A SERP with sitelinks, knowledge panel, tweets, and ads at the top
Strangely enough, it's the last that has the highest click-through rate to organic. Why? It turns out that the only queries that get that bizarre combination of SERP features are for important brands, like Louis Vuitton or BMW. Subsequently, nearly 100% of the click traffic goes to the #1 sitelink, which is the brand website.
Perhaps even more strangely, SERPs with top ads deliver more organic clicks than SERPs with just news. News results tend to entice users away from the organic listings more than advertisements do.
It would be nearly impossible to come to these revelations without clickstream data, but now we can use the data to find the unique relationships between SERP features and click-through rates.
In production: Better volume data
Perhaps Moz's most well-known usage of clickstream data is our volume metric in Keyword Explorer. There has been a long history of search marketers using Google's keyword volume as a metric to predict traffic and prioritize keywords. While (not provided) hit SEOs the hardest, it seems like the recent Google Keyword Planner ranges are taking a toll as well.
So how do we address this with clickstream data? Unfortunately, it isn't as cut-and-dried as simply replacing Google's data with Jumpshot or another third-party provider. There are several steps involved; here are just a few, with a rough sketch of how they fit together after the list.
- Data ingestion and clean-up
- Bias removal
- Modeling against Google Volume
- Disambiguation corrections
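As a purely hypothetical sketch of how those steps might hang together (placeholder logic only, not Moz's production pipeline):

```python
from collections import Counter

def ingest_and_clean(raw_events):
    """Step 1: drop malformed rows, bot traffic, and anything containing PII."""
    return [e for e in raw_events if e.get("query") and not e.get("is_bot")]

def count_with_bias_removal(events, segment_weights):
    """Step 2: count searches per query, re-weighting each event so the
    clickstream panel better matches the overall search population."""
    counts = Counter()
    for e in events:
        counts[e["query"]] += segment_weights.get(e.get("segment"), 1.0)
    return counts

def scale_to_google(counts, google_volumes):
    """Step 3: map clickstream counts onto Google's volume scale using keywords
    both sources report (a trivial global ratio stands in for a real model)."""
    overlap = [kw for kw in counts if kw in google_volumes]
    ratio = (sum(google_volumes[kw] for kw in overlap) /
             max(sum(counts[kw] for kw in overlap), 1))
    return {kw: counts[kw] * ratio for kw in counts}

# Step 4, disambiguation, is sketched further below alongside the "cars part" example.
```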
I can't stress enough how much attention to detail needs to go into these steps in order to make sure you're adding value with clickstream data rather than simply muddling things further. But I can say with confidence that our complex solutions have had a profoundly positive impact on the data we provide. Let me give you some disambiguation examples that were recently uncovered by our model.
| Keyword | Google Value | Disambiguated |
| --- | --- | --- |
| cars part | 135000 | 2900 |
| chopsuey | 74000 | 4400 |
| treatment for mononucleosis | 4400 | 720 |
| lorton va | 9900 | 8100 |
| definition of customer service | 2400 | 1300 |
| marion county detention center | 5400 | 4400 |
| smoke again lyrics | 1900 | 880 |
| should i get a phd | 480 | 320 |
| oakley crosshair 2.0 | 1000 | 480 |
| barter 6 download | 4400 | 590 |
| how to build a shoe rack | 880 | 720 |
Look at the huge discrepancies here for the keyword "cars part." Most people search for "car parts" or "car part," but Google groups together the keyword "cars part," giving it a ridiculously high search value. We were able to use clickstream data to dramatically lower that number.
The same is true for "chopsuey." Most people search for it, correctly, as two separate words: "chop suey."
These corrections to Google search volume data are essential to make accurate, informed decisions about what content to create and how to properly optimize it. Without clickstream data on our side, we would be grossly misled, especially in aggregate data.
How much does this actually impact Google search volume? Roughly 25% of all keywords we process from Google data are corrected by clickstream data. This means tens of millions of keywords monthly.
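For a sense of the mechanics, a disambiguation correction can be as simple as redistributing a lumped-together volume in proportion to how often each variant is actually searched in the clickstream. The counts below are entirely made up for illustration.

```python
def redistribute_volume(grouped_google_volume, clickstream_counts):
    """Split a volume figure that Google reports for a group of close variants
    across those variants, in proportion to observed clickstream searches."""
    total = sum(clickstream_counts.values())
    return {kw: round(grouped_google_volume * count / total)
            for kw, count in clickstream_counts.items()}

# Made-up counts: Google lumps the variants together, while clickstream shows
# "cars part" is only a tiny fraction of the real searches.
print(redistribute_volume(135000, {
    "car parts": 9000,
    "car part": 2500,
    "cars part": 250,
    "cars parts": 120,
}))
```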
Moving forward
The big question for marketers is now not only how do we respond to losses in data, but how do we prepare for future losses? A quick survey of SEOs revealed some of their future concerns...
Luckily, a blended model of crawled and clickstream data allows Moz to uniquely manage these types of losses. SERP and suggest data are available through clickstream sources, piggybacking on real user searches rather than automated queries. Link data is already available through third-party indexes like MozScape, but can be improved even further with clickstream data that reveals the true popularity of individual links. All that being said, the future looks bright for this new blended data model, and we look forward to delivering upon its promises in the months and years to come.
And finally, a question for you...
As Moz continues to improve upon Keyword Explorer, we want to make that data more easily accessible to you. We hope to soon offer you an API, which will bring this data directly to you and your apps so that you can do more research than ever before. But we need your help in tailoring this API to your needs. If you have a moment, please answer this survey so we can piece together something that provides just what you need.
Good article.
I sincerely hope that Google doesn't push its war on data too far. As they've repeatedly said, they focus everything on making the search experience better for the user. For webmasters to be able to contribute to that, they need access to relevant data.
Take suggest scraping. Whilst it can be used in detrimental ways, it can also allow webmasters to hone in on the needs of web users and then create content which provides solutions to those needs. This contributes to making the internet a better place.
Encouraging people to manage their sites "naturally" (i.e. only think about the user and little else) is clearly Google's intention, but making webmasters blind is not going to be helpful either and could easily be detrimental to the quality of the search. If people can't see the target to shoot it, many will just fire blindly until they hit it.
It will never stop people trying to tailor their sites to grab as much traffic from Google as they can. The intention will always be there. The methods of data collection will just change, and gradually move this further and further away from Google's sphere of influence.
This has long been my concern as well. Being an amazing business is not the same as knowing how my users search for my products or services. This means that there is a disconnect between users and the solutions that could most make them happy. Google would like to force businesses to use its paid platform to bridge this gap, but that ultimately hurts users.
I agree with you. We must also strive to optimize our efforts.
Very detailed post on the current situation regarding clickstream data. With the amount of personalization that Google has introduced in the search results page, results not only vary across Google country domains, they vary across places as well.
Also, the amount of blending (adding video, images, news, etc. to the SERP) differs across queries as well as sectors or niches.
The only reasonably accurate keyword data that we can obtain for free is the webmaster tools query report, which for a new website will not mean anything, as it is unlikely to rank. The Google AdWords tool only shows wide ranges. So new websites will find it even more difficult compared to established sites.
In this scenario, SEOs who have been in the field for a long time and have worked across sectors will make calculated guesses based on old data, currently available estimates, and their experience.
The keyword tool provided by the other search engine, i.e. Bing, is not very accurate, as many keywords with a worldwide Google search volume of 10 to 1,000 do not show up there due to Bing's smaller market share.
This scenario will increase the importance of tools like Moz Keyword Explorer, as they show not only keyword estimates but also keyword difficulty and opportunity.
Thanks for your response! Unfortunately, we are finding that the Search Console query report isn't particularly accurate (especially with respect to rankings). It seems everything must be looked at with a suspicious eye.
A future post about Google Search Console's search analytics insights and learnings would be great. I wonder how they calculate the average ranking position for a certain timeframe. Is it median or average?
I second the idea of a Search Console accuracy post since I rely on that heavily!
The Google Market Insight Finder tool can also be helpful. Thanks for sharing this detailed post; I will review it further.
Quite an informative post on clickstream data! We are definitely headed down a path where Google will share less 'general' data and more 'personal' data. Hence we see a lot more data about our own sites coming through in Analytics and Webmaster Tools. But on the flip side, they are slowly taking away a lot of general data collection tools, as you have already mentioned in the post above.
Russ, it's mildly shocking that others doing research on keyword data are so far behind you and your team. Why? Because many, many tools use data that lives "downstream" from search volume and CTR. Again, downstream. That includes those just doing content strategy and planning: a broader and far larger group than SEOs. For the very large websites, perhaps some of their ranking positions get normalized to better show how they do compared to other websites.
But for most small businesses, the ranking data in tools or Google Keyword Planner is plain stupid. These businesses' (and our own) assumptions about traffic can swing from 2x too low to 20x too low, depending on the CTR and clickstream factors you discussed. I know that from overseeing major accounts at SEMrush and working with the customer success team that fielded calls from smaller websites. Yes, SEMrush does hundreds of other things, but most of them are downstream from clickstream data.
Not sure you've mentioned this much, but local search absolutely destroys results for locally based businesses. Yes, they can do a "custom crawl" with tools, but the results in tools, including yours, are meaningless if the query is, for example, "miami day spa." So I think custom crawls are a key challenge. Moz Keyword Explorer blows away the competition for national searches and for the challenges of working without personalized click data. Not sure how much you can crawl based on a city/geo search, but for so many websites, local/maps skews clickstream as much as anything else. Agree? You're making me feel woefully inadequate with my expertise on keywords (although I do stray into other areas, especially natural language processing and the Google AI robot's abilities there). Keep it up. You're raising the bar for so many of us.
Seeing those words "not provided" is killing me every time I do an analysis of which keywords organically find their way to our site. My remedy is looking at the Search Console queries, but sometimes I find strange patterns in that report. Some keywords are repeated and some differ by only one word.
Example: cat food for cats, cat food for siamese cats
And both of them are different queries based on Google's search console.
Google is really making optimization more and more challenging.
Can we get a new search engine? :)
Perhaps a letter offering a DoS attack on their Keyword Planner if this is not fixed up? Joking! Or perhaps just pay the Russian .gov, since they were so effective in hacking servers to affect the outcome of the US elections and don't like any free speech found in Google's AdWords planning tools. Also just joking!
With the recent algorithm update from Google, how is the use of DNI (dynamic number insertion) affected? Is it still OK to use DNI without being penalized by Google? Thanks!
Hi Russ,
I have heard about the system you mentioned, and this was a great article. Can you give me some ideas about good publishers? I learned some great plans here. I am Gustavo Woltmann from Panama.
Thanks
Improvements in targeting enable advertisers to more accurately identify consumers' interests. Google is restricting some clickstream data because they know it would reduce their revenue, since the auction would become less competitive. If a particular publisher understands exactly what its users want, that will definitely reduce the number of bidders.
That is what brought about Google making the data more private, only showing it to those who are ready to bid. It will get to a point where we will have to pay just to take a look at this clickstream data.
Very nice read, thank you for all the time and effort!
Here's a thought...
Maybe....just maybe...one day....google will realize that their actions are pretty much shaping the internet....
Hey Russ,
Thanks for the great post; I totally agree with you. Google is slowly forcing SEOs to start using AdWords and buy paid traffic. The top 4 paid results have affected the organic click-through rate a lot. Back in 2011 or 2012 our CTR was 27% in 1st or 2nd position; now it's down to 5% to 10% in the same positions, and it's all because of ads.
Google is improving its ads so that advertisers get better CTR, and is not really doing much for SEOs.
Thanks,
Nouman
I think this will be the case going forward. Any opportunity Google has to replace natural results with paid, and where consumers don't seem to mind, they will. We just need better tools to help us uncover what opportunities remain.
Google's taking it to ridiculous extremes. I get hardly *any* organic results in the first page of my SERPs now, what with ads and maps and local and snippets....
You are correct. I can barely see the title of the #2 organic result on some search terms:
---------
Paid
---------
Map
--------
List of Places (Business)
-------
#1 Organic Result Title
#1 Organic Result Description
------
#2 Organic Result Title
------ <-------- Page cuts off and you have to scroll down to see anything.
AdWords can really serve as a layer of marketing to help SEOs retarget in the lead prospecting and lead nurturing phases. For a long time I just wasn't interested in pay per click, but now it's very difficult to convert, and depending on what niche you're in, it's hard to earn the TRUST. I did not pay PPC much attention in the past, but this year I've taken big steps in building buyer personas, mapping out the phases, figuring out PPC, email list building, etc., to know where to dominate in SEO and ramp up retargeting on high-conversion commercial keywords to get the lead into the final nurturing stage. Whew, it's a lot of work, but it helps in the big picture. However, building that big picture without much data... bothersome.
In all fairness to this article, I'm concerned with what's coming: a lack of readily available and trustworthy information to spot trends in content, user intent, and user signals that help guide what types of content we should build. I mean, Click Stream is using Amazon's Redshift (a data warehouse) to give marketers more of an angle to use. This part is hard for me, because SEO is really a huge part of establishing trust in the user's initial research phase and reaching a particular buyer persona, whether on a site, web application, or landing page.
If all the data is going to be taken from us, this could be problematic. No offense to noobs or read-a-book SEOs, but it's going to get more expensive and harder for new/small players to invest with sweat equity and money, as we will need more and more to compete in Google's marketplace.
* I could keep going, but I think you know what I'm trying to say. I'm concerned about the economics of doing marketing for newcomers or organizations that don't realize they might have to spend more. Do we want clickstream data of our own? What would that cost in learning how to put it together, hiring, outsourcing, etc.? :-/
Unfortunately, clickstream data is quite expensive, but we get to split that across our huge customer base, so hopefully even the little guy can stick it to the man, so to speak, by using our tools!
Yeah, it's not surprising; eventually AdWords will again help your organic listings. Back in the day, when your ad landed on an AdSense site, Google would index your ad on like 20k sites. Those were the good old days. Now you have to be a genius to keep up on what works white hat. I have a question about my new site https://vipjourneys.com - is authorship still being used in Google's algorithm to help listings, or are they just using it to enforce that a story is not repurposed?
As far as I can tell, this has been Google's priority from the beginning; their ultimate end goal is to have all "internet marketing" go through their paid advertising stream.
Russ,
As others, I appreciate this analysis. Certainly the disambiguation examples are telling.
I think this is another example of the need for SEO strategy to target topics and people instead of looking for that perfect keyword phrase. Whenever we can view pages as appealing to a particular theme which is represented by a constellation of interrelated terms and appeals to a focused searcher intent I think we'll be more in line with what Google is biased to present in the SERPs.
At the very least, I think anyone pursuing that strategy would not make the mistake of getting too excited about a keyword such as "cars part" to begin with.
Ross
I agree with this "I think this is another example of the need for SEO strategy to target topics and people instead of looking for that perfect keyword phrase" but we also have to watch out for issues caused by aggregate data. Let's say you want to figure out what topic to write on. You find all the keywords related to that topic and see that it has a good bit of searches. But can you trust that aggregate number? If you aren't careful, you could end up with "car part", "car parts", "cars part" "cars parts" all showing huge numbers and being added together, giving you a huge misrepresentation of volume for the aggregate topic. In the end, getting data right always helps. And we hope to do just that.
I need to learn more about this from Moz. I did not realize that clickstream data was being piped into the tools; I was wondering why the keyword tool was so much more extensive than SEMrush!
I'm interested in a follow-up post on the APIs in store, so I can propose this to our in-house developers and executives. Something about data turning into information is like the F-22's integrated systems, and you need that in niche markets!
And our data will continue to get better and better over time as well! We are constantly refining and building models behind Keyword Explorer.
I will be keeping an eye on Moz tools. I found the keyword analyzer pretty useful, so I will invest more time learning how to use the tools. I saw the training offered on keyword research and tried to get my boss to buy it. Any follow-up to this in future training would be greatly appreciated by a lot of marketers. Thanks again for this great post!
Hi Russ
You mentioned that you get the clickstream data for searches (not on your own web properties) from Clickstre.am and Jumpshot. Are you at liberty to tell us how they collect the data? Do users opt in to pass this data to them?