What will you learn from this post?
- How to get lots of Search Console data quickly and easily
- How to run a Python script
And who can do it? Hopefully, it should be accessible to any beginner.
Why do we use the API to get Search Console data?
At Distilled, we often want to use Google Search Console data, but getting it from the interface is incredibly clunky:
- You’re limited to the top 1,000 queries
- You have to apply each filter one at a time
- The interface is slow
- And if you want this data regularly, you have to repeat the whole process each time.
We can get around that by using the API. Now we can get up to 5,000 queries at a time, we can apply multiple filters instantly, and we can run multiple queries quickly and easily.
We do this with Python.
Why is it useful to be able to run Python scripts?
Being able to run scripts is incredibly valuable. There are lots of amazing scripts out there, both on GitHub and shared by other people in the industry; using them, you can pull down data far more quickly than you otherwise could.
We’ll be using Python for this tutorial because it’s a very popular language, particularly when working with large amounts of data.
Crucially, you don’t need to be able to write in Python to use the scripts — you just need to understand some basics about how to use them.
With APIs you can pull data from all sorts of exciting places, far more quickly than through the user interface. You can also often get more data.
How do we run Python?
If you’re on a Mac or a PC, I’d recommend downloading Anaconda. That will get you set up and running with Python 3, and save a lot of fiddling around.
If you don't have administrator permission on your work computer, make sure you install Anaconda for your user only, not all users. Installing for all users is what requires administrator permission.
Then we’re going to need a good shell (a command line interface, the place where you can run the script from). Mac has Terminal installed by default; on Windows, I would recommend Cmder.
Go ahead and install that.
(The rest of this tutorial is shown in Windows, but the same basic steps should be fine for a Mac!)
Double-check that Python has installed correctly
First, open up the shell, type python, and hit enter. If a Python prompt appears, it installed correctly.
Exit Python by typing exit().
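If Anaconda installed correctly, you should see something roughly like this (the version and build details will differ depending on your install):

python
Python 3.x.x |Anaconda, Inc.| (default, ...)
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()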
Download our example script
For this example we'll be using the Search Console script written by one of our consultants, Stephan.
You can download it from his GitHub here. I'm not going to include a full tutorial on Git in this post (although it's a very useful tool for coding), so if you're unsure how to clone a repository, just download the zip file:
Running our example script
Once we've downloaded the example script (and unzipped the folder, if necessary), we need to navigate in our shell to the folder where we just downloaded it.
A command-line shell functions a lot like the Windows File Explorer or Finder you normally use, only everything happens through text: you don't get a mouse or a GUI. Just as File Explorer has a specific folder open, so does the shell, so we need to navigate to the folder where we downloaded the script.
Some command line basics
To change folders you’ll need some command line basics, most notably these two super-important commands:
- cd [path]
- ls -g
The first navigates you to the path given (or, if you use .. as your path, i.e. cd .., takes you up one level).
The second lists all the files and folders in the directory you’re currently in.
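For example, if you unzipped the script into your Downloads folder, getting there might look something like this (the folder names are illustrative; use whatever ls shows you on your machine):

cd Downloads
ls -g
cd Google-Search-Console-bulk-query-master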
That’s all you need, but to make yourself faster there are two other things that are useful to know:
First, hitting tab will cause the shell to try to complete the path you're typing.
Suppose you're in a folder with three files:
- Moz_1990_ranking_data.txt
- Moz_180_rankings.txt
- 180_rankings.txt
If you type:
- 180 and hit tab: It will autocomplete to 180_rankings.txt
- Moz and hit tab: It will autocomplete as far as the shared prefix, Moz_1, and wait for you to type more (some shells instead cycle through the matching files)
Secondly, hitting the up key goes through all the commands you’ve used. The reason a lot of people enjoy using the shell is they find it quicker than using a file explorer — and those two commands are a large part of that.
Congrats — now you’re ready to run the script. Next we need to get permission for the Google Search Console (GSC) API.
Turning on the API
In the same way you have to log in to see Search Console data, you need permission to use the API. Otherwise, anyone could get your data.
We also need to check whether the API is turned on — by default, it isn’t.
All the Google APIs live in the same place; Google Analytics is there, too. You can find them all in the Google API Console at https://console.developers.google.com/.
You'll need to sign in (making sure to use the Gmail account with access to your Search Console data). Then you can search for the Search Console API.
Once it's selected, if it isn't already enabled, you'll need to enable it.
Once that's done we need to download an API key (which is equivalent to our password when signing into Search Console). A single API key gives you access to all of the Google services, in the same way that you use the same Gmail address to sign into Google Analytics and Search Console.
What is an API key? Different APIs have different types of keys. Sometimes it will just be a text string like "AHNSKDSJKS434SDJ"; other times it's a file. The most common Google API key is a file.
So how do we get our Google API key? Once we’ve enabled the API, we select the "Credentials" tab and then create credentials. The three main kinds of API key are a basic text key, user OAuth credentials, and service account keys.
The first is quick and simple, the second is more secure and intended for users who will authenticate with a login, and the third is for automated data pulling.
There are some subtleties around permissions with these that we don't really want to delve into here. The script is set up to use the second, so we’ll use that.
Go ahead and create an OAuth Client ID:
Ignore the pop-up and download the file from the credentials screen:
Move it to the same folder as your script. For ease of use, we'll also rename it "credentials.json," which is what the script expects the API key to be called. (A well-written script will tell you what it expects the API key to be called when you run it, or will say so in its documentation.)
Crucial note: By default, most versions of Windows hide file extensions, so if you type "credentials.json" as the name, you'll accidentally end up with "credentials.json.json."
Because the file is already a JSON file, you can just name it "credentials" and check that the type is JSON. You can also turn on file extensions (instructions here) and then name it "credentials.json."
In the screenshot below, I have file extensions visible. I’m afraid I don’t know if something equivalent exists in Mac — if you do, drop it in the comments!
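You can also sidestep the extensions issue entirely by renaming the file from the shell. The downloaded filename below is just a placeholder (Google typically gives it a long client_secret_... name); swap in whatever your file is actually called:

#Windows (Cmder or Command Prompt)
ren client_secret_XXXX.json credentials.json
#Mac Terminal
mv client_secret_XXXX.json credentials.json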
Running our script
And we’re ready to go!
Hopefully you've now navigated to the folder containing the script using cd:
Now we try and run the script:
We get a module missing error. Normally you can solve this by running:
- pip install missing_module — or, in our case,
- pip install httplib2
And because we'll get several of these errors, we need to install a couple of other modules:
- pip install oauth2client
- pip install --user --upgrade google-api-python-client
Interesting side point: the "--user" flag is pip's command-line equivalent of the choice you often see when installing programs, i.e. install for all users or just for you (we saw this with Anaconda earlier). If you see permissions errors in the command line with pip, try adding --user. And back to our script.
Now that we've installed everything the script needs, we can try again (remember, you can just press up to repeat the previous command). Now we should get the script's help text, which will tell us how to run it. Any well-documented script should return something like this:
First, pay attention to the last line. Which arguments are required?
- property_uri
- start_date
- end_date
Our script needs these three arguments first, in that order, which looks like:
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06
Run that command and remember to change the URL to a property you have access to!
Your browser will open and you'll need to log in and authorize the script (because this is the first time we're running it):
You should be taken to a page that doesn't load. If you look back at the shell, the script is now asking for an authentication code.
The code is in the URL of that page (everything from the = up to the hash), which you'll need to copy, paste back into the script, and hit enter.
Check your folder where you saved the script and it should now look something like this:
The permission we gave the script is now saved in webmaster_credentials.dat. Each day of Search Console data we asked for sits in its own CSV file; the script is designed to pull data for each day individually.
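If you'd rather analyze everything as a single file, here's a minimal Python sketch that stacks the per-day CSVs together. It assumes you run it from the folder containing the CSVs, and the glob pattern below is just an illustration; adjust it to match the filenames the script actually produced for you:

import glob
import pandas as pd

# collect every CSV the script produced in this folder (adjust the pattern if needed)
files = glob.glob("*.csv")

# read each day's file and stack them into one table
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# write out a single combined file for Excel, a database, etc.
combined.to_csv("combined_search_console_data.csv", index=False)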
If we look back at our script options:
We can see some of the other options this script takes. This is where we can filter the results, change the country, the device, and so on.
- "Pages" takes a file of pages to query individually (example file). By default, the script pulls for the entire property.
- "Devices" takes a space-separated list. By default, it queries mobile, desktop, and tablet.
- "Countries" takes a space-separated list of country codes. By default, it queries worldwide.
By default, the script pulls 100 rows of data per day; the API allows up to 5,000.
Here are some example queries using those options and what they do:
#get top queries for the search console property
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06
#get top queries for multiple pages stored in file_of_pages and aggregate together
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --pages file_of_pages
#get top queries for the property from desktop and mobile
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --devices desktop mobile
#get the top queries for the property from the US & the UK
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --countries USA GBR
#get the 5000 top queries for the property
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --max-rows-per-day 5000
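These options can also be combined in a single command. Assuming the script accepts the flags together (as a standard argparse script does), something like this would pull the top 5,000 US desktop queries for a list of pages:

#get the 5000 top US desktop queries for a list of pages
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --pages file_of_pages --devices desktop --countries USA --max-rows-per-day 5000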
Wow, new possibilities that I could not imagine. Python always seemed hard to learn, but this way you can get many benefits in a short period of time. Great post, thank you Dominic for the info.
I totally agree, I didn't think I would be able to do this then just tested it on a site and it works!
Cheers Dominic
For dumping the data to a SQL database, check out my script: https://searchwilderness.com/gwmt-data-python/
Thoroughly enjoyed all the script sharing! I put all mine in BigQuery!
5,000 rows is not "All data".
If you want all the data you can get it by combining Analytics landing page data with Search Console. I wrote about it 1.5 years ago - check here: https://moz.com/ugc/how-to-get-the-data-you-need-f... (process is similar to this one - but you also need to authenticate the Google Analytics API)
Thanks, just skimmed the post. Seems crazy slow? I'm downloading sites triple the size within the hour, and that's with expensive DB table scans.
@Craig: The main reason why it's slow is because the initial data you can download in batch (first 1000 kw/landingpages) - however - the other data is requested one by one. If you have 6000 landingpages in Analytics it will need to request the 5000 remaining landingpages one by one to check the keywords - which takes time. I run the script on a separate server during the night - so I don't really mind it is slow to run. Could be that the script could be further optimised but for me it's not really an issue.
@Craig - If you don't have time or coding skills, or need more links, use Clusteric Search Auditor to get up to 100,000 rows from Google Search Console.
True, the API has a limit of 5,000; you have to use the GSC filters creatively to get all of it.
So for example, you could provide it with a list of all the pages of your site, or filter for the pages beginning with aa, ab, ac, etc. all the way down to zz.
I have this script built out internally to take two lists of pages, a static list and a dynamic list (from GA), which it then goes through to pull down all the data.
You can get it all, unless you're truly huge (then you'll bump into rate limits), you've just gotta be a little creative with how you use it!
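If you want to generate that two-letter prefix list programmatically, here's a quick sketch (just an illustration, not part of the script itself):

from string import ascii_lowercase

# every two-letter prefix from aa to zz: 26 x 26 = 676 filters
prefixes = [a + b for a in ascii_lowercase for b in ascii_lowercase]
print(prefixes[:5])  # ['aa', 'ab', 'ac', 'ad', 'ae']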
Pretty awesome blog post. But I got tripped up at a few points while learning this.
To get to the right folder (on Mac) I used: cd downloads then cd Google-Search-Console-bulk-query-master
Using "python search_console_query.py" with the a href markup pasted in (https://www.timdorrian.com 2017-02-05 2017-02-06) didn't work. Instead, use just python search_console_query.py URL 2017-02-05 2017-02-06. Copying and pasting the command with the href included was throwing an error; get rid of it for it to work.
Hope this helps anyone getting tripped up on the same points!
Great post! Really enjoyed it.
I'm just struggling to understand what else you can achieve with this data. Yes you get more data quicker and avoid the bad interface of GSC. But generally most pages aren't getting more than 1000 queries or at least I wouldn't be looking at all of the queries at that depth (unless I wanted to do some real complex research for a page).
Two ideas did cross my mind:
Do you have any examples of when it really is worth your time to use this rather than doing it through Google Search Console?
Hi Sally,
We have included a 100k export, as our customers mostly need this data for internal tools/workflows and large audits. I agree that for a blog owner with a few posts, what GSC offers is enough. Breaking it down by page is also a solution, just a painful one for getting an overview (like scoring at the top level).
Hey Sally, so here are a couple that come to mind:
If I'm looking at the whole keyword landscape a site can play in, I'll pull as much of their Search Console data as possible.
The speed thing is big: if I'm running a split test and want ranking data for different pages, I'll set up tracking in something like STAT, and I'll also pull down Search Console terms for, say, 500 pages in each split test and track that over time.
Being able to schedule this regularly is a big deal. It means I can track more granularly over time, and if you then store it in a database you can get some great long-term performance stats. Check out Paul's script above, which he's generously shared!
The title can be misunderstood. You are showing how to get all search queries from GSC, not all data (which would include link data, errors, parameters, crawling, etc.).
I clicked the article because I hoped I would get more data
You can use the API to get crawl errors as well: https://developers.google.com/webmaster-tools/sear...
Great catch, Julian! We just edited the title to "More of" for clarity. Thank you for pointing that out! :)
Great job on this Dom!
If you want to take it to another level, check out Pipulate by NY based SEO Mike Levin : https://github.com/miklevin/pipulate
Thanks dominic woodman & stephan-solomonidis! this tool deserves some love as well https://searchanalyticsforsheets.com/
Thanks. This is the type of stuff I always think I can't do, until I try. And now thanks to your guide I will try it. :)
For reference, when setting this up - python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06
Don't put the a href in, just put the domain and it will pop up the white screen
Hello Dominic, thank you for sharing this. Could you help me with the following command:
python search_console_query.py https://www.distilled.net/ 2017-02-05 2017-02-06 --pages file_of_pages.
I am getting "search_console_query.py: error: unrecognized arguments: --pages file_of_pages" error.
I need to download all the queries data from a particular month (rather than a day), any suggestions?
Hello Dominic !!
Thank you for your contribution. I'm a little new to this and I did not even know that this tool existed, I'll keep it in mind and devote a few hours to learning how to use it correctly.
Thanks again and best regards.
Thanks to you I have discovered a new world to get good results in a row
You will never go to sleep without knowing something new :)
This tactic could be very helpful because lots of SEOs are having trouble with Analytics' "not provided" keyword data. Now we can see which search queries are most likely to show our website.
Great share. Google is so protective of the how and why, but Search Console is actually one place where they proactively reach out to you and make you aware of errors. Invaluable for SEO. A combination of understanding Webmaster Tools and Google Analytics can give you great insight into your website.
I had everything running yesterday, and ran a few reports successfully. I came back to it today and tried running the same query and am getting the following error:
WARNING:googleapiclient.http:Encountered 403 Forbidden with reason "forbidden"
ERROR:root:Request failed
I tried deleting my webmaster credentials and re-entered the code, which was accepted. However, I'm still getting the same error.
I'm in the correct folder with the credentials and search_console_query files.
Any suggestions?
Hi, did this issue get solved at all? I am seeing the same error. When I run the script, the CSV files it returns are empty. Thanks
I am glad that all my Python training can be used at home. Installing Anaconda on a Windows box sounds painful. I just want to turn my old XP machine into a pure Linux box that much more after reading this article. I hate dealing with Windows' "clever" ideas about what to do with file extensions.
Great info, man! Yeah, we were a little hesitant about using Python as it seemed complicated, but we feel better about it now. Great breakdown of how to use it, and helpful examples!
It's a new technique, which should be studied. Thank you for this information