Hi, I'm Chas, a developer at SEOmoz. I've been working here since last summer, and though I've written many lines of code, this is my very first blog post. Now, I'm happy to have the chance to reveal the super-secret project I've been working on: Blogscape!
Blogscape
What are the hottest topics on the web right now? How much does your web presence change day-to-day, or in response to an advertising campaign? How many links did your site receive from the blogosphere this week?
Blogscape is a data source built to answer these questions and more. It’s an ‘Information Feed Aggregator’ and has been monitoring about 10 million feeds since December 2007. These feeds come from any website that offers syndication, but most of them are from blogs and news sites.
While Linkscape crawls the web-at-large, Blogscape is focused on the 'fast-moving-web'. It stores and makes searchable the full content of syndication feeds (including link data!), and the newest data is made available several times every day.
Data from Blogscape will appear in various forms in upcoming SEOmoz products, but we’ve decided to release a version of our internal testing tool for it in...
SEOmoz Labs!
SEOmoz Labs is a place where our more adventurous PRO users can check out the ‘bleeding edge’ of SEOmoz technology and product design. From now on, Labs will be our showcase for new ideas and features, ranging from simple proof-of-concept tools to early releases of upcoming products. We’re hoping Labs will be a fun and exciting ‘sneak peek’ for our users – and a source of early feedback for our product development team.
There are some caveats associated with Labs – because these projects are works-in-progress, data from them could be inaccurate, and the projects themselves could become unavailable temporarily or permanently at any time. Beyond this, we are explicitly not offering user support for labs projects, so as to not take any time away from our ‘official projects’.
However, that doesn’t mean we don’t want to hear from you! In the upper right hand corner of each Labs project page, there is a bright green button labeled “send feedback” (a link that opens an email to our feedback address). Even though we won’t always be able to respond, we’d love to receive your comments, compliments, complaints, and bug reports. Feedback will be incorporated into our product design process, and Labs projects that receive a significant number of positive responses may find their way into formal projects sooner than others.
If you’re PRO and want to press on immediately, click here to check out SEOmoz Labs now! Otherwise, read on below for more details about Blogscape.
Queries and Graphs
The main Labs page for Blogscape has a box for queries, and a graph below to show how many posts match each query over time (the last 30 days, by default).
For example, the query
"Sean Penn", "Brad Pitt", "Richard Jenkins", "Frank Langella", "Mickey Rourke"
displays the number of mentions of each Academy Award nominee for ‘Performance by an actor in a leading role’. The (somewhat cramped) snapshot below shows the results for this query (taken on Feb 25th, 2009; you can view the live version by clicking here).
So, this graph shows the number of posts which mention each actor’s name for every day in February. The spike in mentions for all queries on Feb 23rd corresponds with the actual date of the Academy Awards. As I’m sure you know, Sean Penn took the Oscar for this one, and this is clearly reflected in the fact that he received twice as many hits as any other query on that day.
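To make the counting concrete, here is a minimal Python sketch of how a per-day mention count like the one in this graph could be computed. It is not Blogscape's code: the (day, text) post records, the daily_mentions helper, and the sample posts are all made up for illustration, and it assumes a post counts once per query no matter how often it repeats the phrase.

```python
import re
from collections import Counter
from datetime import date

def daily_mentions(posts, phrases):
    """Count, for each phrase, how many posts mention it on each day.

    posts: iterable of (day, text) pairs (a hypothetical, simplified post record).
    Returns {phrase: Counter({day: number_of_matching_posts})}.
    """
    # Case-insensitive whole-phrase match; \b keeps "Sean Penn" from matching "Sean Penny".
    patterns = {p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases}
    counts = {p: Counter() for p in phrases}
    for day, text in posts:
        for phrase, pattern in patterns.items():
            if pattern.search(text):
                counts[phrase][day] += 1  # one hit per post, however many times it mentions the phrase
    return counts

posts = [
    (date(2009, 2, 23), "Sean Penn wins Best Actor"),
    (date(2009, 2, 23), "Mickey Rourke and Sean Penn on the red carpet"),
    (date(2009, 2, 22), "Oscar predictions: Mickey Rourke?"),
]
counts = daily_mentions(posts, ["Sean Penn", "Mickey Rourke"])
print(counts["Sean Penn"][date(2009, 2, 23)])  # 2
```

Each phrase becomes one line on the graph, and each day's count becomes one data point on that line.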
Viewing Posts
Blogscape stores the snippets of text found in each feed – you can click any data point to view this information. For example, if you’d clicked on the line for Sean Penn on Feb 23rd, you’d see something like this:
This view shows snippets from each post satisfying that query on that day. Posts are ordered by Blogrank, an internally calculated ranking metric. (Any feed can 'vote' for any other feed by linking to the website the feed comes from. Feeds with more votes have higher Blogrank.)
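This post doesn't spell out how Blogrank is calculated beyond the voting idea, so the following is only a rough Python sketch of that idea. It assumes each feed casts at most one vote for another feed by linking to the host of that feed's website, ignores self-links, and uses the raw vote count as the rank. The feed IDs, hosts, and the blogrank_votes helper are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def blogrank_votes(feed_sites, outlinks):
    """Rank feeds by the number of distinct other feeds that link to their site.

    feed_sites: {feed_id: host of the website the feed comes from}
    outlinks:   {feed_id: URLs linked from that feed's posts}
    """
    feeds_by_host = defaultdict(set)
    for feed, host in feed_sites.items():
        feeds_by_host[host.lower()].add(feed)

    voters = defaultdict(set)  # target feed -> set of feeds that voted for it
    for feed, urls in outlinks.items():
        for url in urls:
            host = urlsplit(url).hostname  # lower-cased by urlsplit; None if the URL has no host
            for target in feeds_by_host.get(host, ()):
                if target != feed:  # a feed can't vote for itself
                    voters[target].add(feed)
    return {feed: len(voters[feed]) for feed in feed_sites}

feed_sites = {"a": "a.example.com", "b": "b.example.org", "c": "c.example.net"}
outlinks = {
    "a": ["http://b.example.org/post-1", "http://b.example.org/post-2", "http://c.example.net/"],
    "b": ["http://c.example.net/about"],
    "c": ["http://c.example.net/archive"],  # self-link, ignored
}
ranks = blogrank_votes(feed_sites, outlinks)
print(ranks)  # {'a': 0, 'b': 1, 'c': 2}

# Order a list of (snippet, feed_id) posts the way the Posts view does: highest Blogrank first.
posts = [("post from a", "a"), ("post from b", "b"), ("post from c", "c")]
posts.sort(key=lambda p: ranks[p[1]], reverse=True)
```

Note that feed "a" links to "b" twice but still counts as a single vote; counting distinct voters keeps one chatty feed from inflating another's rank.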
Each post has the following information:
• The original title of the post (clicking here takes you to the actual post)
• A snippet of the description of the post
• The title of the feed the post came from (clicking here takes you to the main page of the feed's source)
• The feed’s Blogrank
• The URL for the feed itself
Advanced Queries
Beyond single terms or phrases in quotes, there are advanced query operators available. For example, you could search for posts containing the word ‘oscar’ or ‘oscars’ with the query
oscar | oscars
There are also query operators for finding posts that link to specific URLs, root domains, or subdomains. For example, you could search for posts that link to any URL at the root domain ‘oscar.com’ with the query
rd:oscar.com
A list of all available query operators can be found at the Blogscape help page.
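As a rough illustration of how the two operators above might be evaluated against a single post, here is a Python sketch. It only covers "|" and "rd:" (the full operator set lives on the help page), and it assumes that "|" means "match any alternative", that "rd:" matches a link whose host is the domain itself or any subdomain of it, and that plain terms match whole words case-insensitively. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

def term_matches(term, text, links):
    """Check one query term against a post's text and outgoing links."""
    term = term.strip()
    if term.startswith("rd:"):
        # Root-domain operator: any link whose host is the domain or a subdomain of it.
        domain = term[3:].lower()
        for url in links:
            host = urlsplit(url).hostname or ""
            if host == domain or host.endswith("." + domain):
                return True
        return False
    # Plain word or "quoted phrase": whole-word, case-insensitive match.
    phrase = term.strip('"')
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None

def query_matches(query, text, links=()):
    """'|' separates alternatives; the post matches if any alternative does.
    (Naive split: a quoted phrase that itself contains '|' is not supported.)"""
    return any(term_matches(t, text, links) for t in query.split("|"))

print(query_matches("oscar | oscars", "Who will win at the Oscars?"))        # True
print(query_matches("rd:oscar.com", "", ["http://www.oscar.com/nominees"]))  # True
print(query_matches("rd:oscar.com", "", ["http://notoscar.com/"]))           # False
```

The whole-word match is why both alternatives are needed in the ‘oscar | oscars’ example: "oscar" alone would not match the word "Oscars".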
Finally, each graph has the option of being weighted by Blogrank (see the checkbox on the right of the Labs page). This makes the graph more of a measure of the ‘popularity’ of a query on any given day, instead of the raw number of matches for it. (Feeds with high Blogrank have many incoming links from other feeds, and tend to come from sources that are viewed by lots of people.)
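One plausible reading of the Blogrank checkbox, sketched in Python below, is that each matching post contributes its feed's Blogrank to the day's total instead of contributing 1. That weighting rule is an assumption, not something this post specifies, and the feed names and ranks are made up.

```python
from collections import defaultdict

def daily_scores(matching_posts, blogrank, weighted=False):
    """matching_posts: (day, feed_id) pairs for posts that match one query.
    Raw mode counts each post as 1; weighted mode adds the feed's Blogrank instead."""
    scores = defaultdict(float)
    for day, feed in matching_posts:
        scores[day] += blogrank.get(feed, 0) if weighted else 1
    return dict(scores)

blogrank = {"big-news-site": 40, "tiny-blog-1": 1, "tiny-blog-2": 1, "tiny-blog-3": 1}
posts = [
    ("Feb 22", "tiny-blog-1"), ("Feb 22", "tiny-blog-2"), ("Feb 22", "tiny-blog-3"),
    ("Feb 23", "big-news-site"),
]
print(daily_scores(posts, blogrank))                 # {'Feb 22': 3.0, 'Feb 23': 1.0}
print(daily_scores(posts, blogrank, weighted=True))  # {'Feb 22': 3.0, 'Feb 23': 40.0}
```

In raw mode Feb 22 looks like the bigger day; with weighting, the single post from the widely linked feed outweighs three posts from tiny blogs.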
Data Duplication
You may notice a message at the bottom of the ‘Posts’ view stating that “Posts very similar to these have been filtered from this list.” We’ve worked hard to battle data duplication in Blogscape by carefully canonicalizing feeds (many sites have several URLs for the same data) and posts within a feed. Nonetheless, there are situations where duplicate data is almost impossible to eliminate in advance (for example, some large sites have many feeds with content that occasionally overlaps).
To battle this problem, Blogscape does additional filtering of posts at query time. When a post occurs multiple times in Blogscape’s data stores, this filtering ensures that you see only its most relevant version. For this reason, some queries will have higher post counts on the frequency graphs than when viewing the posts themselves. If you really want to view every post Blogscape has, you can click on the link at the bottom of the page to turn this feature off.
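This post doesn't say how Blogscape decides that two posts are "very similar", so the sketch below swaps in a common, generic technique: compare sets of three-word shingles with Jaccard similarity, and keep only the highest-Blogrank copy of each near-duplicate group. Treating Blogrank as a stand-in for "most relevant" is an assumption, as are the 0.7 threshold and the helper names.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") from lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def filter_near_duplicates(posts, threshold=0.7):
    """posts: (blogrank, text) pairs. Visit posts from highest to lowest Blogrank and
    drop any post that is too similar to one already kept."""
    kept = []  # (blogrank, text, shingle set)
    for rank, text in sorted(posts, key=lambda p: p[0], reverse=True):
        sig = shingles(text)
        if all(jaccard(sig, other) < threshold for _, _, other in kept):
            kept.append((rank, text, sig))
    return [(rank, text) for rank, text, _ in kept]

posts = [
    (5, "Sean Penn wins best actor at the Oscars tonight"),
    (9, "Sean Penn wins best actor at the Oscars tonight via Reuters"),
    (3, "Mickey Rourke misses out on best actor"),
]
shown = filter_near_duplicates(posts)
print(len(posts), len(shown))  # 3 2
```

The first two posts share 7 of 9 distinct shingles (similarity about 0.78), so only the higher-ranked copy is listed. All three would still count toward the graph, which is the kind of mismatch between graph counts and listed posts described above.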
Data Quality
As I mentioned before, Blogscape has been monitoring a sizable portion of the blogosphere for over a year. Nonetheless, we are striving to improve the quality of data within Blogscape, and we’re very excited about two major upcoming improvements:
1. Monitoring of more high-quality feeds
We’ve added Feed Auto-Discovery logic to our processing of Linkscape crawl data, and will be using the results to make sure Blogscape always monitors the most important blogs from across the web.
2. Crawling of source pages
Based on our research, about half of syndication feeds don’t publish the entire content of their posts in their feeds – instead, they publish a truncated section of their content (or occasionally a hand-written summary of it). Most sites that do this also strip HTML from their feeds.
It’s important for data quality to ensure that queries for a term return all posts mentioning that term, and it’s important for SEO that all link information is present. For these reasons, we’re adding functionality to Blogscape that will follow links from syndication feeds and store the actual source content for future search. (Of course, the upcoming crawler will politely ignore sites that block it using robots.txt – details on this will be released when the crawler goes live.)
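Feed auto-discovery on the web usually relies on the convention that a page advertises its feed with a <link rel="alternate"> tag whose type is application/rss+xml or application/atom+xml. The Python sketch below pulls those links out of a page using only the standard library, and also shows the kind of robots.txt check (via urllib.robotparser) a polite crawler would make before fetching a source page. It is an illustration under those assumptions, not the Linkscape or Blogscape pipeline; the "ExampleBot" user-agent string, the helper names, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs such as "/feed/" against the page URL.
            self.found.append(urljoin(self.base_url, a["href"]))

def discover_feeds(html, page_url):
    parser = FeedLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.found

def allowed_to_crawl(robots_txt, url, user_agent="ExampleBot"):
    """Check robots.txt rules (already downloaded as text) before fetching a source page."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Posts" href="/feed/">
</head><body>...</body></html>"""
print(discover_feeds(page, "http://blog.example.com/some-post"))
# ['http://blog.example.com/feed/']

robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_to_crawl(robots, "http://blog.example.com/2009/02/post.html"))  # True
print(allowed_to_crawl(robots, "http://blog.example.com/private/draft.html"))  # False
```

Passing robots.txt in as text keeps the sketch offline; a real crawler would download it first (RobotFileParser.set_url and read can do that) and cache it per host.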
Movers and Shakers
Finally, an interesting use of the mountains of data stored by Blogscape is the search for hot trends, or ‘Movers and Shakers’. You can see the results of this process for several categories by clicking on the ‘Movers and Shakers’ links in the upper right hand corner of the Blogscape Labs page.
They tend to be most interesting (and stable) in weekly increments – you can view the top ‘mover and shaker’ phrases for this week here. On the day of writing this post, the top phrase is “Safari 4 beta,” which rose 26,632.4% this week (percent change over rank-weighted graphs). Right behind it is “Gary Locke,” which rose 25,101.7% over last week. On the labs page for this feature, you can click through and view the graph for each individual ‘mover and shaker’.
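For reference, percent change here is (this week's score - last week's score) / last week's score × 100, so a rise of 26,632.4% means this week's score was roughly 267 times last week's. The Python sketch below ranks phrases that way. The weekly scores are made up, and skipping phrases with a zero score last week (where percent change is undefined), along with the optional minimum-score cutoff, are design assumptions rather than a description of how Blogscape does it.

```python
def percent_change(last_week, this_week):
    """Percent change between two weekly scores; None when last week's score was 0."""
    if last_week == 0:
        return None  # undefined for phrases that are brand new this week
    return (this_week - last_week) / last_week * 100

def movers_and_shakers(last, this, top=10, min_score=0.0):
    """Rank phrases by week-over-week percent change of their (e.g. rank-weighted) score."""
    changes = []
    for phrase, now in this.items():
        pct = percent_change(last.get(phrase, 0), now)
        # min_score filters out phrases whose current score is too small to be meaningful
        if pct is not None and now >= min_score:
            changes.append((phrase, pct))
    changes.sort(key=lambda item: item[1], reverse=True)
    return changes[:top]

last_week = {"safari 4 beta": 2.0, "gary locke": 1.0, "oscars": 400.0}
this_week = {"safari 4 beta": 300.0, "gary locke": 100.0, "oscars": 300.0, "new phrase": 5.0}
print(movers_and_shakers(last_week, this_week, top=2))
# [('safari 4 beta', 14900.0), ('gary locke', 9900.0)]
```

Because small base scores produce huge percentages, a minimum-score cutoff is one simple way to keep the list from being dominated by phrases that went from almost nothing to slightly more.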
Conclusion
We’re excited to launch this feature, and even more excited about the data quality improvements we’ll be making on it in the next few months. If you’re PRO, check it out, and send your comments our way!
Comments
Fantastic to see the Moz pumping out new products like Blogscape. The addition of a Labs section is a great idea for the SEOmoz devs, feedback and recognition = 2 birds with 1 stone.
While some people in the past have whinged about the SEOmoz product suite, at least the Moz are out there getting things done, testing things and trying new products.
There's no better way than through empirical and experimental methods to learn about the web...
Bravo Chas.
I love the movers and shakers in root domains and subdomains. It's fascinating to see what sites are attracting lots of blogosphere attention (although I think we might need to raise the threshold limit for the future, as the Alpha version shows a few sites that got 20 links but have never been linked-to before).
Love this service, Chas - I honestly think this is as good and in some ways better than GG Blogsearch, Technorati, etc. and the graphing is clearly way more advanced. I'm particularly pumped to see all the metrics once those are ready - things like how many different blogs wrote about me, how many linked, which blogs write about me (or my competitors) the most, etc.
This tool's value extends way outside of SEO, but just for SEO, I see some link opportunities and linkbait inspiration that already make me love this thing :-)
Congrats again! It's great to have your work for the past 12 months finally exposed in a public way.
Damn you, does that mean I need the PRO account? Hhhhhmmm, might be time to get my wallet out.
Getting a PRO account isn't an expense, it's an investment, and one investment I will never ever regret making. The tools offered for the relatively low fee are mind-blowing.
Not even sure I would survive without my Pro account.
It's great to see announcements like this. I signed up for my Pro Account a few days ago and this adds even more value to an already great package.
Couldn't your link say "Blogscape! (PRO only)" to show the rabble that we're not welcome. Saves a bit of clicking around.
Also, and this has happened before, if I click the links - eg "you can view the live version by clicking here" - well I just get taken to the "my" page with no information as to why .. as before you should link to the "buy a PRO account" page with a splash at the top for "labs content is only viewable by PRO account holders, get access now!" (etc., etc.).
Bitter, can you tell.
PS: Every time I edit I get an extra line spacing.
Great suggestions pbhj - we'll get to work on those. Sorry for the tease.
No worries, I'm sure it's just the rush to market but I've noticed a few SEOs do this - they optimise the heck out of everything to get users where they want them and then forget to sell them anything. It's the line between the SEO and the business being promoted, just that line is still there (for some) when they're promoting their own services.
Just surprised to see that here.
Awesome. Can't wait to see how this affects executives' opinions on the blogosphere.
Nice and wonderful tool, I will use it to follow my site!
This is FANTASTIC news. Congrats to the team.
I'm glad to be a PRO member to get a hands-on look at this.
Online Reputation Management is going to be huge and this is a fantastic way to test the water.
I'm just putting together an SEO/SEM process and I believe this is going to be a great asset.
Thank you very much guys and gals.
Wow! Very unique stuff you come up with. I guess that's why you're my favorite site for SEO.
sucks to be me.
This comes out 2 days after my PRO membership ends.
Wish I could have afforded to keep it going
Nice to see the progressive movement forward on tool development, especially tools that will hopefully help reverse engineer data to extrapolate meaning for SEOs like myself.
This is definitely inspired by Google, but hopefully, because SEOmoz has a big SEO following, it should stay true to the advancement of useful and up-to-date tool sets...
Thanks guys, looks great!
This is so cool, really cool. It's inspired me to create labs on my own website :P.
If I may suggest, could the SEOmoz team write docs for every tool on how to use it for the most SEO benefit?
Another great tool that is going to keep me a Pro.
Somehow I kind of knew SEOmoz would provide a tool like this one, as I heard Rand F. at 'Measuring Success in a 2.0 World'. Nobody can follow everything on all the blogs and forums out there nowadays.
Well I guess now Rand will 'lose' less time every day checking out what people are telling about him. ;-)
Thanks guys (SEO MOZ) !!
Is there a better use for this tool than to find new sites to secure new links from?
p.s. I'm using Firefox and (trying to) hide my referral so that it looks like I came to that site naturally, instead of from an SEO tool link.
p.p.s. To Rebecca (I think that's your name)
Screw Flanders !!!
I like it a lot for reputation monitoring, brand tracking & competitive analysis/comparison as well :-)
I guess you must be happy about this tool, and wonder if you were not the first person who had the idea about it... ;-)
It's been killing me keeping all this cool stuff secret. We're looking forward to seeing more and more stuff in labs...
Bond and share Will, don't be shy, bond and share ...
Brilliant coworkers are my favorite kind. How lucky am I to be part of such an incredible team?!
Yes, it indeed looks very interesting! And as Thomas M. Schmitz writes, a movers and shakers RSS feed would be great too.
Well, I look forward to testing it in depth - keep 'em coming :)
I love this tool!
I am really looking forward to integrating it with my already obsessive daily reputation management routine. Thanks Chas!
And the hits keep a comin'. Congratulations.
Psst... can we get a Movers & Shakers RSS feed?
That could end up being very meta.
I dig, oh yes, I so dig.
Congrats on the new launches! This looks like a great addition to SEOmoz and even better for the PRO members...
Awesomeness. I'm off to play now! Congrats on getting this all live and good work with Labs too - I'm looking forward to testing ideas which aren't neatly polished and finished.
PS - wow the data is fast!
I am super excited about Blogscape and the launch of SEOmoz Labs. Blogscape is HUGE, and we're all very proud to see this out in Labs. Nice work Chas!
Nice one guys - I'm really chuffed you've got this out finally. Can't wait to have a play around with it! (The MozLabs stuff, that is..) :-)
Wow. A private RSS feed would be awesome.
But seriously this is kick ass (pardon my french).
Skittles is all over the blogosphere (according to BlogScape). Obviously it's pretty relevant.
Great job Mozzers. It just gets better and better with you guys.
One of the few SEO companies that are actually producing public tools. Keep them coming. :)
I can't wait to see Linkscape grow to more URLs and start to get a solid chunk of Google's approximate reach (1 trillion, no duplicates). Right now, at 3.5%, it's definitely the most cost-effective and powerful tool that I've seen yet.
Top notch. BlogScape, FTW!
Well done on this one, guys, it looks like a great tool... I rushed to start using it on several terms and it looks promising.
Chas - keep up the good work