The search engines are considerably smarter than we often give them credit for, and one of the ways they've become so "intelligent" is through the data provided to them on the billions of web pages they crawl. Today, I'd like to walk through a visual tour of the search engines' process and abilities in the field of semantic analysis and understanding.

Google's Spider Crawling Billions of Web Pages

Googlebot crawls billions of pages across the web, indexing text content equivalent to thousands of times the size of all the world's libraries combined. With this massive amount of data in the index, Google can start to form some assumptions about the incidence and frequency of particular terms & phrases.

Googlebot Finding Spain & Iberia on Many Pages

One of Google's simplest powers is the ability to calculate the relationships between two or more terms/phrases. In the example above, Google has recognized that Spain & Iberia might be connected semantically. If we recall Dr. Garcia's lessons on term co-occurrence, we can see a simple way that this might be happening.

Co-Occurrence Calculation
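
As a rough sketch of the calculation pictured above: one commonly cited formulation of term co-occurrence is a Jaccard-style ratio, the number of pages containing both terms divided by the number of pages containing either. I'm assuming that's close to what the figure shows. The function name and the sample counts below are made up for illustration and aren't real Google numbers.

```python
def co_occurrence_index(n1: int, n2: int, n12: int) -> float:
    """Jaccard-style co-occurrence ratio.

    n1  -- pages containing term 1 (hypothetical count)
    n2  -- pages containing term 2 (hypothetical count)
    n12 -- pages containing both terms
    """
    if min(n1, n2, n12) < 0 or n12 > min(n1, n2):
        raise ValueError("counts must be non-negative and n12 <= min(n1, n2)")
    either = n1 + n2 - n12  # pages containing at least one of the terms
    return n12 / either if either else 0.0


# Invented counts, purely for illustration.
spain_iberia = co_occurrence_index(n1=100, n2=50, n12=25)
```

A score near 1 would mean the two terms almost always show up together, while a score near 0 would mean they rarely share a page.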

Obviously, Google has even more sophisticated ways of breaking down and analyzing an individual page or even sections of a page. They could, for example, form tighter connections between words/phrases that frequently appear very close to each other in sentences or paragraphs. As these techniques get more refined and more advanced, Google could approach something close to artificial intelligence with regard to semantic connections.
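
To illustrate the "appear very close to each other" idea, here's a minimal sketch that counts word pairs falling within a small window of each other in a piece of text. This is my own simplification, not a description of how Google actually does it. The function name, the tokenizing rule, and the default window size are all assumptions.

```python
import re
from collections import Counter


def windowed_cooccurrence(text: str, window: int = 5) -> Counter:
    """Count unordered word pairs that occur within `window` words of each other."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look only at the next `window` tokens after position i.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                # Sort so ("spain", "iberian") and ("iberian", "spain") share a key.
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


sample = "Spain shares the Iberian peninsula with Portugal"
near_pairs = windowed_cooccurrence(sample, window=3)
```

Shrinking the window makes the connection "tighter": only words that sit right next to each other get counted, which is one plausible way to weight sentence-level proximity more heavily than page-level proximity.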

Google's Powers of Semantic Analysis

Pretty impressive for a robot & a scary, mechanized spider, eh? So how does this apply to the practice of SEO, content authorship, or website building?

There are several hypotheses I've formed about how to optimize based on this knowledge:

  • Build a Semantically Intelligent Site Architecture
    Since the search engines have some data about what terms are relevant and related to one another, it can't hurt to use the most logical system possible to create an organization chart for your site's content. Usually, common sense works best, but you could always fall back on the co-occurrence calculations if you need to find out if that chicken stock recipe belongs under "french cooking" or "american classics."
  • Create Documents that Use Relevant Terms/Phrases
    If you're targeting the term "mortgages" but most of your content is about rental properties, you might find it valuable to rework that content so it uses the terms and phrases most closely related to mortgages.
  • Get Links from Semantically Relevant Pages
    Term co-occurrence can be a great way to find out if a link from a page about surfing will be semantically beneficial to your page about snowboarding.
  • Understand Why that Page Might be Ranking
    Sometimes when we see a page ranking and run a few checks on the strength of the domain and the links pointing in, we might scratch our heads thinking, "how the heck is that ranking above my page?" I've experienced this queasy feeling plenty of times and found that after some careful analysis, it looked like many of the pages pointing to my domain and page weren't nearly as "connected" as the pages linking to my competitor. While links in number and authority are very powerful, there's little doubt that semantic connections and topical relationships play their part, too.
  • Get a Sense of What the Future Holds
    In a few years, will Google be smart enough to identify link "intent"? Could they have the semantic processing capability to extract psychological cues from the sentences and paragraphs surrounding a link? Would this help them to determine link weighting and link trust? Very possibly.
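
To make the first hypothesis a bit more concrete, here's a small sketch of how you might fall back on co-occurrence scores to decide where the chicken stock recipe belongs. It uses the same Jaccard-style ratio assumed earlier in the post's co-occurrence discussion. Every count below is an invented placeholder; in practice you'd substitute whatever page counts you collect yourself, and the helper names are hypothetical.

```python
def co_occurrence_index(n1: int, n2: int, n12: int) -> float:
    """Pages with both terms divided by pages with either term."""
    either = n1 + n2 - n12
    return n12 / either if either else 0.0


def best_category(term_count: int, candidates: dict) -> tuple:
    """Pick the category whose co-occurrence score with the term is highest.

    candidates maps category name -> (category_count, joint_count).
    Returns (winning_category, all_scores).
    """
    scores = {
        name: co_occurrence_index(term_count, cat_count, joint)
        for name, (cat_count, joint) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Placeholder numbers only, NOT real search result counts.
winner, scores = best_category(
    term_count=1000,  # pages mentioning "chicken stock recipe"
    candidates={
        "french cooking": (5000, 400),
        "american classics": (8000, 200),
    },
)
```

The same comparison works for the link question: score your page's topic against each prospective linking page's topic and see which pairing comes out stronger.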

I don't use a heavy dose of co-occurrence calculation in most of my work, and it's actually rare that the topic comes up in a consulting contract, but I do believe that the more we know about search engines and the more we can see the machine at work when we look at the results, the better we'll be as SEOs.

I'd love to hear if anyone has additional uses for this kind of data or other relevant semantic analysis. One thing I've been wondering about on this topic is how search engines might use the statistical probability of word/phrase occurrence in rankings. Dr. Garcia touches on that more specifically here.