Well-researched personas can be a useful tool for marketers, but doing the research properly takes time. What if you don’t have that time to spare? Using a mix of Followerwonk, Twitter, and the Alchemy language API, it’s possible to do top-level persona research very quickly. I’ve built a Python script that can help you answer two important questions about your target audience:
- What are the most common domains that my audience visits and spends time on? (Where should I be trying to get mentions/links/PR?)
- What topics are they interested in or reading about on those sites? (What content should I potentially create for these people?)
You can get the script on Github: Twitter persona research
Once the script runs, the output is two CSV files. One is a list of the domains most commonly shared by the group; the other is a list of the topics the audience is interested in.
A quick introduction to Watson and the Alchemy API
The Alchemy API has been around for a while, and it was recently acquired by the IBM Watson group. The language tool has 15 functions; I've used it in the past for language detection, sentiment analysis, and topic analysis. For this personas tool, I’ve used the Concepts feature. You can upload a block of text or ask it to fetch a URL for analysis. The output is a list of concepts that are relevant to the page. For example, if I put the Distilled homepage into the tool, the concepts are:
Notice that there are some strange entries, like Arianna Huffington, but running this tool over thousands of URLs and counting the occurrences takes care of any strange results. This highlights one of the interesting features of the tool: Alchemy isn’t just doing keyword extraction. Arianna Huffington isn’t mentioned anywhere on the Distilled homepage.
Alchemy has found the mention of Huffington Post and expanded on that concept. Notice that neither search engine optimization nor Internet marketing is mentioned on the homepage, yet they have been listed as the two most relevant concepts. Pretty clever. The Alchemy site sums it up nicely:
"AlchemyAPI employs sophisticated text analysis techniques to concept tag documents in a manner similar to how humans would identify concepts. The concept tagging API is capable of making high-level abstractions by understanding how concepts relate, and can identify concepts that aren't necessarily directly referenced in the text.”
My thinking for this script is simple: If I get a list of all the links that certain people share and pass the URLs through the Alchemy tool, I should be able to extract the main concepts that the audience is interested in.
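As a rough illustration, here’s what a single Concepts request looks like. This is only a sketch: the endpoint URL, parameter names, and response fields below are based on the AlchemyAPI documentation of the time and may not match exactly what the script uses.

    import requests

    # Endpoint and parameter names are assumptions based on the AlchemyAPI docs of the time
    ALCHEMY_ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedConcepts"

    def get_concepts(api_key, page_url):
        # Ask Alchemy to fetch the page itself and return its ranked concepts as JSON
        params = {"apikey": api_key, "url": page_url, "outputMode": "json"}
        response = requests.get(ALCHEMY_ENDPOINT, params=params)
        data = response.json()
        # Each concept comes back with a human-readable label and a relevance score
        return [(c["text"], c["relevance"]) for c in data.get("concepts", [])]

    for concept, relevance in get_concepts("YOUR_ALCHEMY_KEY", "https://www.distilled.net/"):
        print("%s (%s)" % (concept, relevance))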
To use an example, let’s assume I want to know what topics the SEO community is interested in and what sites are most important in that community. My process is this:
- Find people that mention “SEO” in their Twitter bio using Followerwonk
- Get a sample of their most recent tweets using the Twitter API
- Pull out the most common domains that those people share
- Use the Alchemy Concepts API to summarize what the pages they share are about
- Output all of the above to a spreadsheet
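For reference, the sketch below shows the rough shape of that loop. It assumes the tweepy library as the Twitter client and reuses the get_concepts function from the earlier snippet; the actual get_tweets.py script may be organised differently, so treat this as an outline rather than the real implementation.

    from collections import Counter
    from urlparse import urlparse  # on Python 3: from urllib.parse import urlparse

    import tweepy

    from api_keys import (twitter_ckey, twitter_csecret,
                          twitter_atoken, twitter_asecret)

    def recent_urls(api, username, count=200):
        # Collect the expanded URLs shared in a user's most recent tweets
        urls = []
        for tweet in api.user_timeline(screen_name=username, count=count):
            for entity in tweet.entities.get("urls", []):
                urls.append(entity["expanded_url"])
        return urls

    auth = tweepy.OAuthHandler(twitter_ckey, twitter_csecret)
    auth.set_access_token(twitter_atoken, twitter_asecret)
    api = tweepy.API(auth)

    domain_counts = Counter()
    shared_urls = []
    for username in open("usernames.txt").read().split():
        urls = recent_urls(api, username)
        shared_urls.extend(urls)
        domain_counts.update(urlparse(u).netloc for u in urls)

    # domain_counts feeds the domains CSV; each URL in shared_urls is then
    # passed to the Concepts call to build the concepts CSV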
Follow the steps below. Sorry, but the instructions are for Mac only; the script will work on PCs, but I’m not sure of the terminal setup.
How to use the script
Step 1 – Finding people interested in SEO
Searching Followerwonk is the only manual part of the process. I might build it into the script in the future, but honestly, it’s too easy to just download the usernames from the interface.
Go into the "Search Bios" tab and enter the job title in quotes. In this case, that's "SEO." More common jobs will return a lot of results; I recommend setting some filters to avoid bots. For example, you might want to only include accounts with a certain number of followers, or accounts with less than a reasonable number of tweets. You can download these users in a CSV as shown in the bottom-right of the image below:
Everything else can be done automatically using the script.
Step 2 – Downloading the script from GitHub
Download the script from Github here: Twitter API using Python. Use the Download Zip link on the right hand side as shown below:
Step 3 – Sign up for Twitter and Alchemy API keys
It’s easy to sign up using the links below:
Once you have the API keys, you need to install a couple of extra requirements for the script to work.
The easiest way to do that is to download pip from https://bootstrap.pypa.io/get-pip.py and save the page as “get-pip.py”. Create a folder on your desktop and save the Git download and the “get-pip.py” file in it. You then need to open your terminal and navigate into that folder. You can read my previous post on how to use the command line here: The Beginner's Guide to the Command Line.
The steps below should get you there:
Open up the terminal and type:
“cd Desktop/”
“cd [foldername]”
You should now be in the folder with the get-pip.py file and the folder you downloaded from Github. Go back to the terminal and type:
“sudo python get-pip.py”
“sudo pip install -r requirements.txt”
Create two more files:
- usernames.txt – This is where you will add all of the Twitter handles you want to research
- api_keys.py – The file with your API keys for Alchemy and Twitter
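The usernames.txt file is just the plain-text list of handles you exported from Followerwonk. The handles below are made up for illustration; check the README in the repo for whether the script expects the leading "@":

    randmarketer
    seo_example_account
    another_handle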
In the api_keys file, paste the following and add the respective details:
watson_api_key = "[INSERT ALCHEMY KEY]"
twitter_ckey = "[INSERT TWITTER CKEY]"
twitter_csecret = "[INSERT CSECRET]"
twitter_atoken = "[INSERT TOKEN]"
twitter_asecret = "[INSERT ASECRET]"
Save and close the file.
Step 4 – Run the script
At this stage you should:
- Have a usernames.txt file with the Twitter handles you want to research
- Have downloaded the script from Github
- Have a file named api_keys.py with your details for Alchemy and Twitter
- Have installed pip and the packages from the requirements file
The main code of the script can be found in the “get_tweets.py” file.
To run the script, go into your terminal and navigate to the folder you saved the script to (you should still be in the correct directory if you followed the steps above; use “pwd” to print the directory you’re in). Once you’re in the folder, run the script by typing: “python get_tweets.py”. Depending on the number of usernames you entered, it should take a couple of minutes to run. I recommend starting with one or two usernames to check that everything is working.
Once the script finishes running, it will have created two csv files in the folder you created:
- “domain + timestamp” – This includes all the domains that people tweeted and the count of each
- “concepts + timestamp” – This includes all the concepts that were extracted from the links that were shared
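If you want to poke at the domains file outside of a spreadsheet, a few lines of Python will do it. The filename and column order here are assumptions (domain in the first column, count in the second); check the header the script actually writes.

    import csv

    # Hypothetical filename; the script appends a timestamp to "domain"
    with open("domain_2016-01-01.csv") as f:
        rows = [row for row in csv.reader(f)
                if len(row) == 2 and row[1].isdigit()]

    # Sort by share count, highest first, and print the top 30 domains
    for domain, count in sorted(rows, key=lambda r: int(r[1]), reverse=True)[:30]:
        print("%s\t%s" % (domain, count))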
I did this process using “SEO” as the search term in Followerwonk. I used 50 or so profiles, which created the following results:
Top 30 domains shared:
Top 40 concepts
For the most part, I think the domains and topics are representative of the SEO community. The output above seems obvious to us, but try it for a topic that you’re not familiar with and it’s really helpful. The bigger the sample size, the better the results should be, but this is restricted by the API limitations.
Although it looks like a lot of steps, once you have this set up, it’s very easy to repeat — all you need to change is the usernames file. Using this tool can get you some top-level persona information in a very short amount of time.
Give it a try and let me know what you think.
Too technical for most but great nonetheless. You should bung this up on a page for people to play with :)
Interesting and informative blog. Thanks for sharing, Craig!
Great concept!
Could you please provide further instructions for Windows users?
An interesting tool you have there. Even though I practically lost focus every two minutes because it was so abstract to me (I haven't used it, so it's pretty far beyond my comprehension), I can see that the insights you have collected might be useful.
I sometimes get scared of the amount of data the search engines collect when we browse them. What about you?
Thanks Craig,
I'm trying this out now, but am getting the below error after typing "sudo pip install -r requirements.txt"
The directory '/Users/NAME/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/NAME/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Any insights for a non-technical person on what this means and what I need to do?
Yes Shailesh, that's quite a simple explanation, but it needs effort. Craig's article is good; these techniques will take a general user some time to understand, but once there is a clear understanding, this API will make life easy for many SEOs. I have used Followerwonk, but now I can make better use of it.
However, I feel that we can get a lot of this data without using any automated tools. It just needs focus and a thorough analysis of the audience and the channels they are using. The best thing I liked about this API is that it gives a list of industry-specific websites where the audience is constantly going. I am not sure how accurate that data is, but I would like to run a test.
Thanks Craig, I always get some valuable info from Moz. This is a perfect destination for me to learn new things. I am loving it.
Nice write-up, Craig! There is the aylien.com API that I find interesting too.
Do you use it? I found it very useful, but a little bit technical.
Yes, I gave it a try; it is not bad.
Coolio! I was researching the same subject previously and hadn't caught this article of yours. It would have made the struggle bearable. But still, good stuff. Thanks!
Will definitely try this out. Thanks, Craig.
I'm assuming a Moz Pro account is needed to download the CSV file....
Interesting post! Great advice and thanks for sharing the script.
Nice Sharing!!!
Hey Craig,
I'm getting the same error as TechGal above:
The directory '/Users/NAME/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/NAME/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Any thoughts on how to fix this?
The "sudo python get-pip.py" command doesn't work for me. It says sudo is not a recognized internal or external command, operable program or batch file.
Any ideas?
You're using Windows; the instructions above are for Mac only. If you don't have Python installed, you can get a Windows installer for it from this page on the Python website. The `sudo` command doesn't exist on Windows, so just leave it out and run `python get-pip.py`.
Great article, Craig. Trying to duplicate your results but all profiles are coming back as private even though they are public. The script steps through the @names, but thinks they are all private. Any suggestions?
Try putting your own handle in first and checking that works. What is the error you're getting?
My username is first. I am not sure if it is a web service issue or not :) Going to try to pipe a few things out, just wasn't sure if you had seen something like this before.
Starting tweet collection...
3 usernames.
private profile
private profile
private profile
Starting Alchemy analysis of links...
0 links.
Done. :)
The access token secret had apparently expired :/ I saw the error after I printed the exception. I added the following to the except block (with import traceback at the top of the script) to print out what is going on:

    except:
        print "Exception"
        traceback.print_exc()

Thanks again for the script, it is excellent.
I get this issue as well; each account is marked private. I only signed up for my token today, though. Any other suggestions? I don't get any errors even after including your code.
Hi all,
Hope this helps someone, but I took the steps very literally and pasted my authentication details in between the '[ ]' brackets in the api_keys file :)
Thanks Craig! I really like the thinking behind this analysis. I am curious, have you (or anyone else) run into the issue of having zero links in the output? I am running this from a PC so had to do a little extra work on the setup but have everything in order and am successfully running the query. Only issue is the output returns zero links. I feel this may have something to do with the urls command but not sure. Any suggestions?
This is a nice way to start a persona for your intended target market, but it's too simplistic. When I create a persona, I like to go out to people in the target audience and survey them, meet them, have a coffee with them, and do some workshops with them to get a genuine understanding of what they're all about. It's only through a really deep understanding of their requirements that we're able to say we really know our customers. On the other hand, I think some of the best insights come from the smallest nuggets of information, so I'm sure the script will sometimes deliver some interesting results.
Great Work Craig, Its So Simple.. Easy.. Quick !!
Now it's so simple to reach the market. Just create a niche-based Twitter profile, join the people, tweet about the niche, and after some time download the CSV and send messages to interested people.
I guess more than 70% lead generation if done effectively.
Thanks for this Great Post Craig.