Having access to data, and large datasets in particular, is something any SEO worth their salt craves. Sure, managing a massive dataset or database can be a bit of a hassle, but good information is key, and there are a handful of uses for other sites' datasets that are readily available to buy online. Big-budget link building isn't the only way to spend your SEO budget these days!

Let's take a look at five examples of datasets you can readily buy, and how you might go about using them.

Geocities

The true motivation for this article was a chat with Tom... and the fact that you can now get Geocities (yes, seriously, the whole thing) in the form of a 1 TB torrent (thanks, Hacker News). How much is it going to cost you? Just your email address.

What would you do with Geocities, you ask? The sky's the limit with this one, really! I'm not suggesting you reuse any of the great tickers, beautiful layouts or seizure-inducing colours for which Geocities is now famous.

However, for a start, you may very well want to use the huge volume of content, which could quite easily be respun for your own purposes. Or use the epic designs as inspiration for mapping out your new site. It's up to you!

Keyword Datasets

Why pay for keywords? Well, for starters, because you may find you have a client that has exhausted the entire set of data available through the AdWords API (yes, this has seriously happened before). If the site is strong enough and you find you're still able to rank reasonably for long-tail terms post-MayDay, there's no harm in creating some new content to target the long tail. This isn't to suggest that you should buy keyphrases instead of doing the research yourself but, as mentioned above, more data is almost always better than less.

And, most importantly, just because the data isn't in the API doesn't mean there's no search volume for it!
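If you do buy a keyword list, one simple first pass is to strip out everything you already pulled from the AdWords API and keep only the longer phrases. The sketch below assumes both lists have already been loaded as plain Python lists of strings (for example, from CSV exports) and treats four or more words as "long-tail"; that threshold and the function name `long_tail_gaps` are illustrative assumptions, not anything a particular vendor provides.

```python
import re


def normalise(phrase):
    """Lower-case, trim and collapse internal whitespace so near-duplicates match."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def long_tail_gaps(purchased, from_api, min_words=4):
    """Return purchased phrases with at least `min_words` words that are not
    already in the API-derived list, de-duplicated and in original order.

    The default of 4 words is an assumed cut-off for "long-tail"."""
    known = {normalise(p) for p in from_api}
    gaps, seen = [], set()
    for phrase in purchased:
        key = normalise(phrase)
        if key in known or key in seen:
            continue  # already covered by the API data, or a duplicate row
        if len(key.split()) >= min_words:
            gaps.append(key)
            seen.add(key)
    return gaps


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    purchased = [
        "Cheap red running shoes UK",
        "running shoes",
        "cheap red running shoes uk",
        "best trail running shoes for wide feet",
    ]
    from_api = ["running shoes", "cheap running shoes"]
    print(long_tail_gaps(purchased, from_api))
    # -> ['cheap red running shoes uk', 'best trail running shoes for wide feet']
```

The normalised form is returned so that near-duplicate rows collapse into one; each surviving phrase would still need a human sanity check before you write content for it.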

Some of the outfits out there selling keywords and keyphrases are:

  • SEM Rush (AdWords, Google Words, "hidden" keywords)
  • Hitwise also offers a range of products as well as one-off reports that will include a great deal of this information
  • WordStream can hook you up with access to a few trillion keyphrases as well; access to their API runs from $300 to $2,500 per month, depending on how many units you need (see their pricing details).
  • It's not yet in its final form, but Rich Baxter is beta-testing a pretty darned good keyword tool you may want to look at.
  • Finally, KeywordSpy is something I'd be interested in checking out.

This sort of thing won't come cheap, but it can be extremely valuable to the larger sites.

80 Legs Crawl Packages

Some of you may be familiar with 80 Legs as a tool for crawling and scraping your way through the interwebs. I've not spent nearly enough time with it, as I didn't find it quite as intuitive as Mozenda. However, the nice thing about 80 Legs is that they have compensated for this a bit by offering packaged-up crawls.

The vast majority of the packages cost $350 per month (the exception being the eBay Motors crawl at $150/month), but the data you could pull from them is extremely valuable and saves you the trouble of doing any of the crawling yourself (handy if your IP has been banned, you naughty SEOs).

Again, these sets could be used for anything from price comparison to market analysis, right on down to content creation and keyphrase research. If you're one of the fortunate few working in a niche these packages cover, you should definitely have a look.

Twitter Census

The Twitter Census dataset is just one example of the variety of datasets you can buy from InfoChimps, but owning a year's worth of URLs, hashtags and smiley usage could be put to use in a number of ways. For one, you could create an infographic worthy of a link from the likes of Mashable or TechCrunch.

Or you could use the data to monitor keyphrase usage, common abbreviations, or any other sort of trend in social interaction (this could also be a great source of keyphrases as the search engines begin to take social signals into account and include social results directly in the SERPs). This set is currently priced at $300.
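As a rough illustration of the trend-monitoring idea, the sketch below tallies hashtags, URLs and a few common smileys across a batch of tweet texts. It assumes the data has already been loaded as plain strings; the actual InfoChimps file format isn't covered here, so the loading step, the regular expressions and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns; tune these to whatever the real dataset contains.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")
SMILEY = re.compile(r"[:;]-?[)(DP]")


def tally(tweets):
    """Count hashtags (case-insensitive), URLs and smileys across tweet texts."""
    hashtags, urls, smileys = Counter(), Counter(), Counter()
    for text in tweets:
        urls.update(URL.findall(text))
        # Remove URLs first so fragments like "page#top" aren't counted as hashtags.
        rest = URL.sub(" ", text)
        hashtags.update(tag.lower() for tag in HASHTAG.findall(rest))
        smileys.update(SMILEY.findall(rest))
    return hashtags, urls, smileys


# Made-up sample data standing in for the real dataset.
SAMPLE_TWEETS = [
    "Loving the new #SEO tools :) http://example.com/a",
    "#seo tips for #linkbuilding ;-)",
    "No hashtags here :(",
]

if __name__ == "__main__":
    hashtags, urls, smileys = tally(SAMPLE_TWEETS)
    print(hashtags.most_common(5))  # [('seo', 2), ('linkbuilding', 1)]
    print(urls.most_common(5))
    print(smileys.most_common(5))
```

Running the same tally per month (or per week) over a year of data is one way to turn raw counts into the kind of trend lines that work well in an infographic.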

Linkscape

Rand was being a bit coy about this one, and at the time of writing I wasn't able to get a serious price out of them, but there's a price for everything, right? Any serious bidders should probably get in touch with the SEOmoz team directly...

Along the same lines, there are a number of other datasets without a set price that I'm sure you could get your hands on with enough money, such as Backtype API data, Wordtracker data, or Amazon's entire product catalogue. It all comes down to asking the right people: ultimately, anyone with a head for business and a load of data would sell you their info if you know how to ask for it.

Bonus SWAG

Don't you just love it when you can get your hands on some awesome free stuff that you never knew you wanted in the first place? Well, there are a few datasets I came across that I thought were worth sharing, and that could give you some value for free.

Feel free to take a gander at these datasets and try to make use of the data! Can you say "infographic ammunition"?

The entire dataset from the New York Stock Exchange from 1970 to the present (open, close, high, low, volume).
Massive sets of US Census Data.
And for those of us based over in the UK - huge volumes of UK Government data right at your fingertips.

Other Huge Datasets to get stuck into:

Project Gutenberg, with over 6,000 full books available online. At the very least, its book lists could be of interest.
Any number of the Google Labs datasets; my personal favourite is "Broadband Penetration in Europe".
The Freebase data dump, which includes 26 GB of the world's information from the likes of Wikipedia, Freebase and a handful of other datasets.
Any number of the epic datasets in Elastic Web's Public Datasets, including Wikipedia, IMDB and Stack Overflow. Not to be missed!
 

Final Pro-Tip

One thing you may have noticed is that large datasets, once you make them available to people, tend to be solid gold for linkbait. We could focus an entire post on this, but if you've got access to great data and you're not offering it to your users and curious SEOs, what are you thinking?! Publish the data, make it free to download, and require a link back for attribution from anyone who wants to use it. Simples!