Web technologies and their adoption are advancing at a frenetic pace. Content is a game that every type of team and agency plays, so we're all competing for a piece of that pie. Meanwhile, technical SEO is more complicated and more important than ever before, yet much of the SEO discussion has shied away from its growing technical components in favor of content marketing.
As a result, SEO is going through a renaissance wherein the technical components are coming back to the forefront and we need to be prepared. At the same time, a number of thought leaders have made statements that modern SEO is not technical. These statements misrepresent the opportunities and problems that have sprouted on the backs of newer technologies. They also contribute to an ever-growing technical knowledge gap within SEO as a marketing field and make it difficult for many SEOs to solve our new problems.
That resulting knowledge gap that's been growing for the past couple of years influenced me to, for the first time, "tour" a presentation. I'd been giving my Technical SEO Renaissance talk in one form or another since January because I thought it was important to stoke a conversation around the fact that things have shifted and many organizations and websites may be behind the curve if they don't account for these shifts. A number of things have happened that prove I've been on the right track since I began giving this presentation, so I figured it's worth bringing the discussion here to continue it. Shall we?
An abridged history of SEO (according to me)
It’s interesting to think that the technical SEO has become a dying breed in recent years. There was a time when it was a prerequisite.
Personally, I started working on the web in 1995 as a high school intern at Microsoft. My title, like everyone else who worked on the web then, was "webmaster." This was well before the web profession splintered into myriad disciplines. There was no Front End vs. Backend. There was no DevOps or UX person. You were just a Webmaster.
Back then, before Yahoo, AltaVista, Lycos, Excite, and WebCrawler entered their heyday, we discovered the web by clicking through linkrolls, using Gopher, Usenet, and IRC, reading magazines, and via email. Around the same time, IE and Netscape were engaged in the Browser Wars and you had more than one client-side scripting language to choose from. Frames were all the rage.
Then the search engines showed up. Truthfully, at this time, I didn’t really think about how search engines worked. I just knew Lycos gave me what I believed to be the most trustworthy results to my queries. At that point, I had no idea that there was this underworld of people manipulating these portals into doing their bidding.
Enter SEO.
SEO was born of a cross-section of these webmasters, the subset of computer scientists that understood the otherwise esoteric field of information retrieval and those “Get Rich Quick on the Internet” folks. These Internet puppeteers were essentially magicians who traded tips and tricks in the almost dark corners of the web. They were basically nerds wringing dollars out of search engines through keyword stuffing, content spinning, and cloaking.
Then Google showed up to the party.
Early Google updates started the cat-and-mouse game that would shorten some perpetual vacations. To condense the last 15 years of search engine history into a short paragraph, Google changed the game from being about content pollution and link manipulation through a series of updates starting with Florida and more recently Panda and Penguin. After subsequent refinements of Panda and Penguin, the face of the SEO industry changed pretty dramatically. Many of the most arrogant “I can rank anything” SEOs turned white hat, started software companies, or cut their losses and did something else. That’s not to say that hacks and spam links don’t still work, because they certainly often do. Rather, Google’s sophistication finally discouraged a lot of people who no longer have the stomach for the roller coaster.
Simultaneously, people started to come into SEO from different disciplines. Well, people have always come into SEO from very different professional histories, but it started to attract a lot more actual "marketing" people. This makes a lot of sense because SEO as an industry has shifted heavily into a content marketing focus. After all, we've got to get those links somehow, right?
Naturally, this begat a lot of marketers marketing to marketers about marketing who made statements like “Modern SEO Requires Almost No Technical Expertise.”
Or one of my favorites, that may have attracted even more ire: “SEO is Makeup.”
While I, naturally, disagree with these statements, I understand why these folks would contribute these ideas in their thought leadership. Irrespective of the fact that I’ve worked with both gentlemen in the past in some capacity and know their predispositions towards content, the core point they're making is that many modern Content Management Systems do account for many of our time-honored SEO best practices. Google is pretty good at understanding what you’re talking about in your content. Ultimately, your organization’s focus needs to be on making something meaningful for your user base so you can deliver competitive marketing.
If you remember the last time I tried to make the case for a paradigm shift in the SEO space, you’d be right in thinking that I agree with that idea fundamentally. However, not at the cost of ignoring the fact that the technical landscape has changed. Technical SEO is the price of admission. Or, to quote Adam Audette, “SEO should be invisible,” not makeup.
Changes in web technology are causing a technical renaissance
In SEO, we often criticize developers for always wanting to deploy the new shiny thing. Moving forward, it's important that we understand the new shiny things so we can be more effective in optimizing them.
SEO has always had a healthy fear of JavaScript, and with good reason. Despite the fact that search engines have had the technology to crawl the web the same way we see it in a browser for at least 10 years, it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed.
When we’d initially examined the idea of headless browsing in 2011, the collective response was that the computational expense prohibited it at scale. But it seems that even if that is the case, Google believes enough of the web is rendered using JavaScript that it’s a worthy investment.
Over time more and more folks would examine this idea; ultimately, a comment from this ex-Googler on Hacker News would indicate that this has long been something Google understood needed conquering:
This was actually my primary role at Google from 2006 to 2010.
One of my first test cases was a certain date range of the Wall Street Journal's archives of their Chinese language pages, where all of the actual text was in a JavaScript string literal, and before my changes, Google thought all of these pages had identical content... just the navigation boilerplate. Since the WSJ didn't do this for its English language pages, my best guess is that they weren't trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug.
The really interesting parts were (1) trying to make sure that rendering was deterministic (so that identical pages always looked identical to Google for duplicate elimination purposes), (2) detecting when we deviated significantly from real browser behavior (so we didn't generate too many nonsense URLs for the crawler or too many bogus redirects), and (3) making the emulated browser look a bit like IE and Firefox (and later Chrome) at the same time, so we didn't get tons of pages that said "come back using IE" or "please download Firefox".
I ended up modifying SpiderMonkey's bytecode dispatch to help detect when the simulated browser had gone off into the weeds and was likely generating nonsense.
I went through a lot of trouble figuring out the order that different JavaScript events were fired off in IE, FireFox, and Chrome. It turns out that some pages actually fire off events in different orders between a freshly loaded page and a page if you hit the refresh button. (This is when I learned about holding down shift while hitting the browser's reload button to make it act like it was a fresh page fetch.)
At some point, some SEO figured out that random() was always returning 0.5. I'm not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it's deterministic but very difficult to game. (You can make the date deterministic for a month and dates of different pages jump forward at different times by adding an HMAC of page content (mod number of seconds in a month) to the current time, rounding down that time to a month boundary, and then subtracting back the value you added earlier. This prevents excessive index churn from switching all dates at once, and yet gives each page a unique date.)
Now, consider these JavaScript usage statistics across the web from BuiltWith:
JavaScript is obviously here to stay. Most of the web is using it to render content in some form or another. This means there's potential for search quality to plummet over time if Google can't make sense of what content is on pages rendered with JavaScript.
Additionally, Google’s own JavaScript MVW framework, AngularJS, has seen pretty strong adoption as of late. When I attended Google’s I/O conference a few months ago, the recent advancements of Progressive Web Apps and Firebase were being harped upon due to the speed and flexibility they bring to the web. You can only expect that developers will make a stronger push.
Sadly, despite BuiltVisible’s fantastic contributions to the subject, there hasn’t been enough discussion around Progressive Web Apps, Single-Page Applications, and JavaScript frameworks in the SEO space. Instead, there are arguments about 301s vs 302s. Perhaps the latest spike in adoption and the proliferation of PWAs, SPAs, and JS frameworks across different verticals will change that. At iPullRank, we’ve worked with a number of companies who have made the switch to Angular; there's a lot worth discussing on this specific topic.
Additionally, Facebook's contribution to the JavaScript MVW frameworks, React, is being adopted for very similar benefits of speed and flexibility in the development process.
However, regarding SEO, the key difference between Angular and React is that, from the beginning, React had a renderToString function built in which allows the content to render properly from the server side. This makes the question of indexation of React pages rather trivial.
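For context, a server-rendered React setup can be as simple as the rough sketch below. Express and the App component here are placeholder assumptions rather than any particular stack; the point is only that renderToString returns fully formed HTML, so the response a crawler downloads already contains the content.

// Minimal server-side rendering sketch (Express and ./App are assumed placeholders).
const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');
const App = require('./App'); // a hypothetical React component

const server = express();

server.get('*', (req, res) => {
  // renderToString produces the markup on the server,
  // so the HTML response already contains the content.
  const html = renderToString(React.createElement(App, { url: req.url }));
  res.send('<!DOCTYPE html><html><body><div id="root">' + html + '</div></body></html>');
});

server.listen(3000);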
AngularJS 1.x, on the other hand, has birthed an SEO best practice wherein you pre-render pages using a headless-browser-driven snapshot appliance such as Prerender.io, Brombone, etc. This is somewhat ironic, given that AngularJS is Google's own product. More on that later.
View Source is dead
As a result of the adoption of these JavaScript frameworks, using View Source to examine the code of a website is an obsolete practice. What you're seeing in View Source is not the computed Document Object Model (DOM). Rather, you're seeing the code before it's processed by the browser. The confusion around why you might need to view a page's code differently is another instance where a more detailed understanding of the technical components of how the web works makes you more effective.
Depending on how the page is coded, you may see variables in the place of actual content, or you may not see the completed DOM tree that's there once the page has loaded completely. This is the fundamental reason why, as soon as an SEO hears that there’s JavaScript on the page, the recommendation is to make sure all content is visible without JavaScript.
To illustrate the point further, consider this View Source view of Seamless.com. If you look for the meta description or the rel-canonical on this page, you’ll find variables in the place of the actual copy:
If instead you look at the code in the Elements section of Chrome DevTools or Inspect Element in other browsers, you’ll find the fully executed DOM. You’ll see the variables are now filled in with copy. The URL for the rel-canonical is on the page, as is the meta description:
Since search engines are crawling this way, you may be missing out on the complete story of what's going on if you default to just using View Source to examine the code of the site.
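If you want a quick way to compare the two views, the snippet below, run from the DevTools console on a loaded page, copies the rendered DOM to your clipboard so you can diff it against what View Source shows. It's just a convenience trick, not a crawling methodology.

// Paste into the DevTools console on the loaded page.
// copy() is a DevTools console utility; outerHTML reflects the computed DOM,
// including anything JavaScript has injected since the initial HTML response.
copy(document.documentElement.outerHTML);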
HTTP/2 is on the way
One of Google’s largest points of emphasis is page speed. An understanding of how networking impacts page speed is definitely a must-have to be an effective SEO.
Before HTTP/2 was announced, the HyperText Transfer Protocol specification had not been updated in a very long time. In fact, we’ve been using HTTP/1.1 since 1999. HTTP/2 is a large departure from HTTP/1.1, and I encourage you to read up on it, as it will make a dramatic contribution to the speed of the web.
Quickly though, one of the biggest differences is that HTTP/2 will make use of one TCP (Transmission Control Protocol) connection per origin and "multiplex" the stream. If you've ever taken a look at the issues that Google PageSpeed Insights highlights, you'll notice that one of the primary things that always comes up is limiting the number of HTTP requests. This is what multiplexing helps eliminate; HTTP/2 opens up one connection to each server, pushing assets across it at the same time, often making determinations of required resources based on the initial resource. With browsers requiring Transport Layer Security (TLS) to leverage HTTP/2, it's very likely that Google will make some sort of push in the near future to get websites to adopt it. After all, speed and security have been common threads throughout everything in the past five years.
As of late, more hosting providers have been highlighting the fact that they are making HTTP/2 available, which is probably why there’s been a significant jump in its usage this year. The beauty of HTTP/2 is that most browsers already support it and you don’t have to do much to enable it unless your site is not secure.
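To give a sense of how little is involved once TLS is in place, here's a minimal, hypothetical nginx server block; the domain, certificate paths, and web root are placeholders, and your host or server software may handle this differently.

server {
    # "http2" on the listen directive enables HTTP/2 for TLS connections
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    root /var/www/example.com;
}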
Definitely keep HTTP/2 on your radar, as it may be the culmination of what Google has been pushing for.
SEO tools are lagging behind search engines
When I think critically about this, SEO tools have always lagged behind the capabilities of search engines. That's to be expected, though, because SEO tools are built by smaller teams and the most important things must be prioritized. A lack of technical understanding may lead you to believe the information from the tools you use even when it's inaccurate.
When you review some of Google’s own documentation, you’ll find that some of my favorite tools are not in line with Google’s specifications. For instance, Google allows you to specify hreflang, rel-canonical, and x-robots in HTTP headers. There's a huge lack of consistency in SEO tools’ ability to check for those directives.
It's possible that you've performed an audit of a site and found it difficult to determine why a page has fallen out of the index. It very well could be because a developer was following Google’s documentation and specifying a directive in an HTTP header, but your SEO tool did not surface it. In fact, it’s generally better to set these at the HTTP header level than to add bytes to your download time by filling up every page’s <head> with them.
Google is crawling headless, despite the computational expense, because they recognize that so much of the web is being transformed by JavaScript. Recently, Screaming Frog made the shift to render the entire page using JS:
To my knowledge, none of the other crawling tools are doing this yet. I do recognize the fact that it would be considerably more expensive for all SEO tools to make this shift because cloud server usage is time-based and it takes significantly more time to render a page in a browser than to just download the main HTML file. How much time?
A ton more time, actually. I wrote a simple script that loads a page using both cURL and HorsemanJS. cURL took an average of 5.25 milliseconds to download the HTML of the Yahoo homepage. HorsemanJS, on the other hand, took an average of 25,839.25 milliseconds, or roughly 26 seconds, to render the page. It's the difference between crawling 686,000 URLs an hour and 138.
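Here's a rough sketch of how such a comparison might look, assuming curl is on your PATH and node-horseman (with PhantomJS behind it) is installed; it's illustrative rather than the exact script I ran, and a real test would sample many runs.

// Rough timing comparison: raw HTML download vs. full headless render.
const { execSync } = require('child_process');
const Horseman = require('node-horseman');

const url = 'https://www.yahoo.com/';

// 1. Just download the HTML, the way most SEO crawlers do.
let start = Date.now();
execSync('curl -s -o /dev/null ' + url);
console.log('cURL: ' + (Date.now() - start) + 'ms');

// 2. Render the page in PhantomJS via HorsemanJS.
start = Date.now();
const horseman = new Horseman();
horseman
  .open(url)
  .html()
  .then(function () {
    console.log('HorsemanJS: ' + (Date.now() - start) + 'ms');
    return horseman.close();
  });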
Ideally, SEO tools would extract the technologies in use on the site or perform some sort of DIFF operation on a few pages and then offer the option to crawl headless if it’s deemed worthwhile.
Finally, Google's specs on mobile also say that you can use client-side redirects. I'm not aware of a tool that tracks this. Now, I'm not saying leveraging JavaScript redirects for mobile is the way you should do it. Rather, Google allows it, so we should be able to inspect it easily.
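For clarity, a JavaScript redirect of the kind Google's mobile documentation permits looks something like the sketch below; the user-agent test and the m. subdomain are illustrative assumptions, not a recommendation. The relevant point is that a tool only sees it if it actually executes the script.

// Illustrative client-side redirect to a separate mobile site.
if (/Mobi|Android/i.test(navigator.userAgent)) {
  window.location.replace('https://m.example.com' + window.location.pathname);
}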
Luckily, until SEO tools catch up, Chrome DevTools does handle a lot of these things. For instance, the HTTP Request and Response headers section will show you x-robots, hreflang, and rel-canonical HTTP headers.
You can also use DevTools' GeoLocation Emulator to view the web as though you are in a different location. For those of you who have fond memories of the nearEquals query parameter, this is another way you can get a sense of where you rank in precise locations.
Chrome DevTools also allows you to plug in your Android device and control it from your browser. There’s any number of use cases for this from an SEO perspective, but Simo Ahava wrote a great instructional post on how you can use it to debug your mobile analytics setup. You can do the same on iOS devices in Safari if you have a Mac.
What truly are rankings in 2016?
Rankings are a funny thing and, truthfully, have been for some time now. I, myself, was resistant to the idea of averaged rankings when Google rolled them out in Webmaster Tools/Search Console, but average rankings actually make a lot more sense than what we look at in standard ranking tools. Let me explain.
SEO tools pull rankings based on a situation that doesn't actually exist in the real world. The machines that scrape Google are meant to be clean and otherwise agnostic unless you explicitly specify a location. Effectively, these tools look to understand how rankings would look to users searching for the first time with no context or history with Google. Ranking software emulates a user who is logging onto the web for the first time ever and the first thing they think to do is search for "4ft fishing rod." Then they continually search for a series of other related and/or unrelated queries without ever actually clicking on a result. Granted, some software may do other things to try and emulate that user, but either way they collect data that is not necessarily reflective of what real users see. And finally, with so many people tracking many of the same keywords so frequently, you have to wonder how much these tools inflate search volume.
The bottom line is that we are ignoring true user context, especially in the mobile arena.
Rankings tools that allow you to track mobile rankings usually let you define one context or they will simply specify “mobile phone” as an option. Cindy Krum’s research indicates that SERP features and rankings will be different based on the combination of user agent, phone make and model, browser, and even the content on their phone.
Rankings tools also ignore the user's reality of choice. We're in an era where so many elements comprise the SERP that #1 is simply NOT #1. In some cases, #1 is the 8th choice on the page and far below the fold.
With AdWords having a 4th ad slot, organic being pushed far below the fold, and users not being sure of the difference between organic and paid, being #1 in organic doesn’t mean what it used to. So when we look at rankings reports that tell us we’re number one, we're often deluding ourselves as to what outcome that will drive. When we report that to clients, we're not focusing on actionability or user context. Rather, we are focusing entirely on vanity.
Of course, rankings are not a business goal; they're a measure of potential or opportunity. No matter how much we talk about how they shouldn’t be the main KPI, rankings are still something that SEOs point at to show they’re moving the needle. Therefore we should consider thinking of organic rankings as being relative to the SERP features that surround them.
In other words, I’d like to see rankings include both the standard organic 1–10 ranking as well as the absolute position with regard to Paid, local packs, and featured snippets. Anything else is ignoring the impact of the choices that are overwhelmingly available to the user.
Recently, we’ve seen some upgrades to this effect with Moz making a big change to how they are surfacing features of rankings and I know a number of other tools have highlighted the organic features as well. Who will be the first to highlight the Integrated Search context? After all, many users don’t know the difference.
What is cloaking in 2016?
Cloaking is officially defined as showing search engines something different from the user. What does that mean when Google allows adaptive and responsive sites and crawls both headless and text-based? What does that mean when Googlebot respects 304 response codes?
Under adaptive and responsive models, it's often the case that more or less content is shown for different contexts. This is rare for responsive, as it's meant to reposition and size content by definition, but some implementations may instead reduce content components to make the viewing context work.
In the case when a site responds to screen resolution by changing what content is shown and more content is shown beyond the resolution that Googlebot renders, how do they distinguish that from cloaking?
Similarly, the 304 response code is a way to indicate to the client that the content has not been modified since the last time it visited; therefore, there's no reason to download it again.
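As a quick illustration (the host and dates are placeholders), the client echoes back the validator it received on its first visit, and the server answers with an empty 304 instead of the full page if nothing has changed:

GET /page.html HTTP/1.1
Host: example.com
If-Modified-Since: Tue, 01 Mar 2016 12:00:00 GMT

HTTP/1.1 304 Not Modified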
Googlebot adheres to this response code to keep from being a bandwidth hog. So what’s to stop a webmaster from getting one version of the page indexed, changing it, and then returning a 304?
I don’t know that there are definitive answers to those questions at this point. However, based on what I’m seeing in the wild, these have proven to be opportunities for technical SEOs that are still dedicated to testing and learning.
Crawling
Accessibility of content as a fundamental component that SEOs must examine has not changed. What has changed is the type of analytical effort that needs to go into it. It's been established that Google's crawling capabilities have improved dramatically, and people like Eric Wu have done a great job of surfacing the granular detail of those capabilities with experiments like JSCrawlability.com.
Similarly, I wanted to try an experiment to see how Googlebot behaves once it loads a page. Using LuckyOrange, I attempted to capture a video of Googlebot once it gets to the page:
I installed the LuckyOrange script on a page that hadn't been indexed yet and set it up so that it only fires if the user agent contains "googlebot." Once I was set up, I then invoked Fetch and Render from Search Console. I'd hoped to see mouse scrolling or an attempt at a form fill. Instead, the cursor never moved and Googlebot was only on the page for a few seconds. Later on, I saw another hit from Googlebot to that URL and then the page appeared in the index shortly thereafter. There was no record of the second visit in LuckyOrange.
While I’d like to do more extensive testing on a bigger site to validate this finding, my hypothesis from this anecdotal experience is that Googlebot will come to the site and make a determination of whether a page/site needs to be crawled using the headless crawler. Based on that, they’ll come back to the site using the right crawler for the job.
I encourage you to give it a try as well. You don’t have to use LuckyOrange — you could use HotJar or anything else like it — but here’s my code for LuckyOrange:
jQuery(function() {
  window.__lo_site_id = XXXX; // your LuckyOrange site ID
  // Only load LuckyOrange and tag the session when the visitor is Googlebot
  if (navigator.userAgent.toLowerCase().indexOf('googlebot') > -1) {
    var wa = document.createElement('script');
    wa.type = 'text/javascript';
    wa.async = true;
    wa.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://cdn') + '.luckyorange.com/w.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(wa, s);
    // Tag it with Googlebot
    window._loq = window._loq || [];
    window._loq.push(["tag", "Googlebot"]);
  }
});
The moral of the story, however, is that what Google sees, how often they see it, and so on are still primary questions that we need to answer as SEOs. While it’s not sexy, log file analysis is an absolutely necessary exercise, especially for large-site SEO projects — perhaps now more than ever, due to the complexities of sites. I’d encourage you to listen to everything Marshall Simmonds says in general, but especially on this subject.
To that end, Google’s Crawl Stats in Search Console are utterly useless. These charts tell me what, exactly? Great, thanks Google, you crawled a bunch of pages at some point in February. Cool!
There are any number of log file analysis tools out there, from Kibana in the ELK stack to other tools such as Logz.io. However, the Screaming Frog team has made leaps and bounds in this arena with the recent release of their Log File Analyzer.
Of note with this tool is how easily it handles millions of records, which I hope is an indication of things to come with their Spider tool as well. Irrespective of who makes the tool, the insights that it helps you unlock are incredibly valuable in terms of what’s actually happening.
We had a client last year that was adamant that their losses in organic were not the result of the Penguin update. They believed that it might be due to turning off other traditional and digital campaigns that may have contributed to search volume, or perhaps seasonality or some other factor. Pulling the log files, I was able to layer all of the data from when all of their campaigns were running and show that it was none of those things; rather, Googlebot activity dropped tremendously right after the Penguin update and at the same time as their organic search traffic. The log files made it definitively obvious.
It follows conventionally held SEO wisdom that Googlebot crawls based on the pages that have the highest quality and/or quantity of links pointing to them. In layering the number of social shares, links, and Googlebot visits for our latest clients, we're finding that there's more correlation between social shares and crawl activity than links. In the data below, the section of the site with the most links actually gets crawled the least!
These are important insights that you may just be guessing at without taking the time to dig into your log files.
How log files help you understand AngularJS
Like any other web page or application, every request results in a record in the logs. But depending on how the server is set up, there are a ton of lessons that can come out of them with regard to AngularJS setups, especially if you're pre-rendering using one of the snapshot technologies.
For one of our clients, we found that oftentimes when the snapshot system needed to refresh its cache, it took too long and timed out. Googlebot understands these as 5XX errors.
This behavior leads to those pages falling out of the index, and over time we saw pages jump back and forth between ranking very highly and disappearing altogether, or another page on the site taking its place.
Additionally, we found that there were many instances wherein Googlebot was being misidentified as a human user. In turn, Googlebot was served the AngularJS live page rather than the HTML snapshot. However, despite the fact that Googlebot was not seeing the HTML snapshots for these pages, these pages were still making it into the index and ranking just fine. So we ended up working with the client on a test to remove the snapshot system on sections of the site, and organic search traffic actually improved.
This is directly in line with what Google is saying in their deprecation announcement of the AJAX Crawling scheme. They are able to access content that is rendered using JavaScript and will index anything that is shown at load.
That's not to say that HTML snapshot systems are not worth using. The Googlebot behavior for pre-rendered pages is that they tend to be crawled more quickly and more frequently. My best guess is that this is due to the crawl being less computationally expensive for them to execute. All in all, I’d say using HTML snapshots is still the best practice, but definitely not the only way for Google see these types of sites.
According to Google, you shouldn’t serve snapshots just for them, but for the speed enhancements that the user gets as well.
In general, websites shouldn't pre-render pages only for Google — we expect that you might pre-render pages for performance benefits for users and that you would follow progressive enhancement guidelines. If you pre-render pages, make sure that the content served to Googlebot matches the user's experience, both how it looks and how it interacts. Serving Googlebot different content than a normal user would see is considered cloaking, and would be against our Webmaster Guidelines.
These are highly technical decisions that have a direct influence on organic search visibility. From my experience in interviewing SEOs to join our team at iPullRank over the last year, very few of them understand these concepts or are capable of diagnosing issues with HTML snapshots. These issues are now commonplace and will only continue to grow as these technologies continue to be adopted.
However, if we’re to serve snapshots to the user too, it begs the question: Why would we use the framework in the first place? Naturally, tech stack decisions are ones that are beyond the scope of just SEO, but you might consider a framework that doesn’t require such an appliance, like MeteorJS.
Alternatively, if you definitely want to stick with Angular, consider Angular 2, which supports the new Angular Universal. Angular Universal serves “isomorphic” JavaScript, which is another way to say that it pre-renders its content on the server side.
Angular 2 has a whole host of improvements over Angular 1.x, but I’ll let these Googlers tell you about them.
Before all of the crazy frameworks reared their confusing heads, Google has had one line of thought about emerging technologies — and that is “progressive enhancement.” With many new IoT devices on the horizon, we should be building websites to serve content for the lowest common denominator of functionality and save the bells and whistles for the devices that can render them.
If you're starting from scratch, a good approach is to build your site's structure and navigation using only HTML. Then, once you have the site's pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses.
In other words, make sure your content is accessible to everyone. Shoutout to Fili Wiese for reminding me of that.
Scraping is the fundamentally flawed core of SEO analysis
Scraping is fundamental to everything that our SEO tools do. cURL is a library for making and handling HTTP requests. Most popular programming languages have bindings for the library and, as such, most SEO tools leverage the library or something similar to download web pages.
Think of cURL as working similarly to downloading a single file from an FTP server; in terms of web pages, it doesn't mean that the page can be viewed in its entirety, because you're not downloading all of the required files.
This is a fundamental flaw of most SEO software for the very same reason View Source is not a valuable way to view a page’s code anymore. Because there are a number of JavaScript and/or CSS transformations that happen at load, and Google is crawling with headless browsers, you need to look at the Inspect (element) view of the code to get a sense of what Google can actually see.
This is where headless browsing comes into play.
One of the more popular headless browsing libraries is PhantomJS. Many tools outside of the SEO world are written using this library for browser automation. Netflix even has one for scraping and taking screenshots called Sketchy. PhantomJS is built from a rendering engine called QtWebkit, which is to say it’s forked from the same code that Safari (and Chrome before Google forked it into Blink) is based on. While PhantomJS is missing the features of the latest browsers, it has enough features to support most things we need for SEO analysis.
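If you've never touched it, a basic PhantomJS render script is only a few lines. This sketch prints the post-execution DOM for whatever URL you pass it (run it as phantomjs render.js https://example.com); the filename and error handling are just illustrative.

// render.js: dump the rendered DOM rather than the raw HTML response.
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];

page.open(url, function (status) {
  if (status === 'success') {
    console.log(page.content); // the DOM after JavaScript has run
  } else {
    console.log('Failed to load ' + url);
  }
  phantom.exit();
});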
As you can see from the GitHub repository, HTML snapshot software such as Prerender.io is written using this library as well.
PhantomJS has a series of wrapper libraries that make it quite easy to use in a variety of different languages. For those of you interested in using it with NodeJS, check out HorsemanJS.
For those of you that are more familiar with PHP, check out PHP PhantomJS.
A more recent and better-qualified addition to the headless browser party is Headless Chromium. As you might have guessed, this is a headless version of the Chrome browser. If I were a betting man, I'd say what we're looking at here is some sort of toned-down fork of Googlebot.
To that end, this is probably something that SEO companies should consider when rethinking their own crawling infrastructure in the future, if only for a premium tier of users. If you want to know more about Headless Chrome, check out what Sami Kyostila and Alex Clarke (both Googlers) had to say at BlinkOn 6:
Using in-browser scraping to do what your tools can’t
Although many SEO tools cannot examine the fully rendered DOM, that doesn't mean that you, as an individual SEO, have to miss out. Even without leveraging a headless browser, Chrome can be turned into a scraping machine with just a little bit of JavaScript. I've talked about this at length in my "How to Scrape Every Single Page on the Web" post. Using a little bit of jQuery, you can effectively select and print anything from a page to the JavaScript Console and then export it to a file in whatever structure you prefer.
Scraping this way allows you to skip a lot of the coding that's required to make sites believe you’re a real user, like authentication and cookie management that has to happen on the server side. Of course, this way of scraping is good for one-offs rather than building software around.
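As a trivial example of the approach, the snippet below, run from the console on a page that already loads jQuery, collects every link on the page and copies the result as JSON; the selector and fields are whatever you need them to be.

var results = [];
jQuery('a').each(function () {
  results.push({
    text: jQuery(this).text().trim(),
    href: this.href
  });
});
// copy() is a DevTools console utility; paste the JSON wherever you like.
copy(JSON.stringify(results, null, 2));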
ArtooJS is a bookmarklet made to support in-browser scraping and automating scraping across a series of pages and saving the results to a file as JSON.
A more fully featured solution for this is the Chrome Extension, WebScraper.io. It requires no code and makes the whole process point-and-click.
How to approach content and linking from the technical context
Much of what SEO has been doing for the past few years has devolved into the creation of more content for more links. I don’t know that adding anything to the discussion around how to scale content or build more links is of value at this point, but I suspect there are some opportunities for existing links and content that are not top-of-mind for many people.
Google Looks at Entities First
Googlers announced recently that they look at entities first when reviewing a query. An entity is Google’s representation of proper nouns in their system to distinguish persons, places, and things, and inform their understanding of natural language. At this point in the talk, I ask people to put their hands up if they have an entity strategy. I’ve given the talk a dozen times at this point and there have only been two people to raise their hands.
Bill Slawski is the foremost thought leader on this topic, so I’m going to defer to his wisdom and encourage you to read:
- How Google May Perform Entity Recognition
- SEO and the New Search Results
- Entity Associations With Websites And Related Entities
I would also encourage you to use a natural language processing tool like AlchemyAPI or MonkeyLearn. Better still, use Google’s own Natural Language Processing API to extract entities. The difference between your standard keyword research and entity strategies is that your entity strategy needs to be built from your existing content. So in identifying entities, you’ll want to do your keyword research first and then run those landing pages through an entity extraction tool to see how they line up. You’ll also want to run your competitor landing pages through those same entity extraction APIs to identify what entities are being targeted for those keywords.
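If you want to script that last step, here's a hedged sketch against Google's Natural Language API. The API key placeholder, the node-fetch dependency, and the choice to keep only name, type, and salience are my assumptions, so check the current API documentation before leaning on it.

const fetch = require('node-fetch');

// Extract entities from a block of landing page copy.
function extractEntities(text) {
  return fetch('https://language.googleapis.com/v1/documents:analyzeEntities?key=YOUR_API_KEY', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      document: { type: 'PLAIN_TEXT', content: text },
      encodingType: 'UTF8'
    })
  })
    .then(function (res) { return res.json(); })
    .then(function (data) {
      // Keep just the fields useful for an entity gap analysis.
      return data.entities.map(function (e) {
        return { name: e.name, type: e.type, salience: e.salience };
      });
    });
}

extractEntities('Your landing page copy goes here.').then(console.log);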
TF*IDF
Similarly, Term Frequency/Inverse Document Frequency, or TF*IDF, is a natural language processing technique that doesn't get much discussion on this side of the pond. In fact, topic modeling algorithms have been the subject of much heated debate in the SEO community in the past. The issue of concern is that topic modeling tools have the tendency to push us back towards the Dark Ages of keyword density, rather than considering the idea of creating content that has utility for users. However, in many European countries they swear by TF*IDF (or WDF*IDF — Within Document Frequency/Inverse Document Frequency) as a key technique that drives up organic visibility even without links.
After hanging out in Germany a bit last year, some folks were able to convince me that taking another look at TF*IDF was worth it. So, we did and then we started working it into our content optimization process.
In Searchmetrics’ 2014 study of ranking factors they found that while TF*IDF specifically actually had a negative correlation with visibility, relevant and proof terms have strong positive correlations.
Based on their examination of these factors, Searchmetrics made the call to drop TF*IDF from their analysis altogether in 2015 in favor of the proof terms and relevant terms. Year over year the positive correlation holds for those types of terms, albeit not as high.
In Moz's own 2015 ranking factors, we find that LDA and TF*IDF-related items remain among the highest on-page content factors.
In effect, no matter what model you look at, the general idea is to use related keywords in your copy in order to rank better for your primary target keyword, because it works.
Now, I can’t say we’ve examined the tactic in isolation, but I can say that the pages that we’ve optimized using TF*IDF have seen bigger jumps in rankings than those without it. While we leverage OnPage.org’s TF*IDF tool, we don’t follow it using hard and fast numerical rules. Instead, we allow the related keywords to influence ideation and then use them as they make sense.
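If you want to see the math without a tool in the way, here's a bare-bones sketch of the scoring; the documents are placeholders, and real implementations tokenize, normalize, and smooth far more carefully than this.

// TF*IDF: how frequent a term is in one document, discounted by how common
// it is across the whole set of documents being compared.
function tfidf(term, doc, corpus) {
  var words = doc.toLowerCase().split(/\W+/);
  var tf = words.filter(function (w) { return w === term; }).length / words.length;
  var docsWithTerm = corpus.filter(function (d) {
    return d.toLowerCase().indexOf(term) !== -1;
  }).length;
  var idf = Math.log(corpus.length / (1 + docsWithTerm));
  return tf * idf;
}

var corpus = [
  'copy from your landing page ...',
  'copy from competitor page one ...',
  'copy from competitor page two ...'
];
console.log(tfidf('fishing', corpus[0], corpus));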
At the very least, this sort of technical optimization of content needs to be revisited. While you're at it, you should consider the other tactics that Cyrus Shepard called out as well in order to get more mileage out of your content marketing efforts.
302s vs 301s — seriously?
As of late, a reexamination of the 301 vs. 302 redirect has come back up in the SEO echo chamber. I get the sense that Webmaster Trends Analysts in the public eye either like attention or are just bored, so they’ll issue vague tweets just to see what happens.
For those of you who prefer to do work rather than wait for Gary Illyes to tweet, all I’ve got is some data to share.
Once upon a time, we worked with a large media organization. As is par for the course with these types of organizations, their tech team was resistant to implementing many of our recommendations. Yet they had millions of links, both internally and externally, pointing to URLs that returned 302 response codes.
After many meetings, and a more compelling business case, the one substantial thing that we were able to convince them to do was switch those 302s into 301s. Nearly overnight there was an increase in rankings in the 1–3 rank zone.
Despite seasonality, there was a jump in organic Search traffic as well.
To reiterate, the only substantial change at this point was the 302 to 301 switch. It resulted in a few million more organic search visits month over month. Granted, this was a year ago, but until someone can show me the same happening or no traffic loss when you switch from 301s to 302s, there’s no discussion for us to have.
Internal linking, the technical approach
Under the PageRank model, it’s an axiom that the flow of link equity through the site is an incredibly important component to examine. Unfortunately, so much of the discussion with clients is only on the external links and not about how to better maximize the link equity that a site already has.
There are a number of tools out there that bring this concept to the forefront. For instance, Searchmetrics calculates and visualizes the flow of link equity throughout the site. This gives you a sense of where you can build internal links to make other pages stronger.
Additionally, Paul Shapiro put together a compelling post on how you can calculate a version of internal PageRank for free using the statistical computing software R.
Either of these approaches is incredibly valuable for giving more visibility to content and very much falls in the bucket of what technical SEO can offer.
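If R isn't your thing, the underlying calculation is simple enough to sketch in a few lines of JavaScript; the link graph below is a made-up example, and you'd feed in crawl data from your own site instead.

// Power-iteration internal PageRank over a tiny, hypothetical link graph.
var links = {
  '/': ['/products', '/blog'],
  '/products': ['/', '/blog'],
  '/blog': ['/']
};

var pages = Object.keys(links);
var damping = 0.85;
var ranks = {};
pages.forEach(function (p) { ranks[p] = 1 / pages.length; });

for (var i = 0; i < 50; i++) {
  var next = {};
  pages.forEach(function (p) { next[p] = (1 - damping) / pages.length; });
  pages.forEach(function (p) {
    links[p].forEach(function (target) {
      // Each page passes its equity evenly across its outbound internal links.
      next[target] += damping * ranks[p] / links[p].length;
    });
  });
  ranks = next;
}

console.log(ranks); // higher values = pages accumulating more internal link equity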
Structured data is the future of organic search
The popular one-liner is that Google is looking to become the presentation layer of the web. I say, help them do it!
There has been much discussion about how Google is taking our content and attempting to cut our own websites out of the picture. With the traffic boon that the industry has seen from sites making it into the featured snippet, it’s pretty obvious that, in many cases, there's more value for you in Google taking your content than in them not.
With voice search appliances on mobile devices and the forthcoming Google Home, there's only one answer that the user receives. That is to say that the Star Trek computer Google is building is not going to read every result — just one. These answers are fueled by rich cards and featured snippets, which are in turn fueled by structured data.
Google has actually done us a huge favor regarding structured data in updating the specifications to allow JSON-LD. Before this, Schema.org was a matter of making very tedious and specific changes to code with little ROI. Now structured data powers a number of components of the SERP and can simply be placed in the <head> of a document. Now is the time to revisit implementing the extra markup. Builtvisible's guide to Structured Data remains the gold standard.
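As a small illustration of how lightweight JSON-LD is, an Article marked up this way looks like the block below; every value is a placeholder, and you'd pick the Schema.org type and properties that actually match your content.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article Title",
  "author": {
    "@type": "Person",
    "name": "Author Name"
  },
  "datePublished": "2016-01-01",
  "publisher": {
    "@type": "Organization",
    "name": "Example Publisher"
  }
}
</script>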
Page speed is still Google’s obsession
Google has very aggressive expectations around page speed, especially for the mobile context. They want the above-the-fold content to load within one second. However, 800 milliseconds of that time is pretty much out of your control.
Based on what you can directly affect, as an SEO, you have 200 milliseconds to make content appear on the screen. A lot of what can be done on-page to influence the speed at which things load is optimizing the page for critical rendering path.
To understand this concept, first we have to take a bit of a step back to get a sense of how browsers construct a web page.
- The browser takes the uniform resource locator (URL) that you specify in your address bar and performs a DNS lookup on the domain name.
- Once a socket is open and a connection is negotiated, it then asks the server for the HTML of the page you’ve requested.
- The browser begins to parse the HTML into the Document Object Model until it encounters CSS, then it starts to parse the CSS into the CSS Object Model.
- If at any point it runs into JavaScript, it will pause the DOM and/or CSSOM construction until the JavaScript completes execution, unless it is asynchronous.
- Once all of this is complete, the browser constructs the Render Tree, which then builds the layout of the page and finally the elements of the page are painted.
In the Timeline section of Chrome DevTools, you can see the individual operations as they happen and how they contribute to load time. In the timeline at the top, you'll always see the visualization as mostly yellow because JavaScript execution takes the most time out of any part of page construction. JavaScript causes page construction to halt until script execution is complete. This is called "render-blocking" JavaScript.
That term may sound familiar to you because you’ve poked around in PageSpeed Insights looking for answers on how to make improvements and “Eliminate Render-blocking JavaScript” is a common one. The tool is primarily built to support optimization for the Critical Rendering Path. A lot of the recommendations involve issues like sizing resources statically, using asynchronous scripts, and specifying image dimensions.
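To make the asynchronous-script recommendation concrete, the difference is a single attribute; which scripts can safely be loaded this way is obviously site-specific.

<!-- Blocks DOM construction until it downloads and executes -->
<script src="/js/app.js"></script>

<!-- async: downloads in parallel and executes as soon as it's ready -->
<script async src="/js/analytics.js"></script>

<!-- defer: downloads in parallel and executes after the document is parsed -->
<script defer src="/js/app.js"></script>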
Additionally, external resources contribute significantly to page load time. For instance, I always see Chartbeat’s library taking 3 or more seconds just to resolve the DNS. These are all things that need to be reviewed when considering how to make a page load faster.
If you know much about the Accelerated Mobile Pages (AMP) specification, a lot of what I just highlighted might sound very familiar to you.
Essentially, AMP exists because Google believes the general public is bad at coding. So they made a subset of HTML and threw a global CDN behind it to make your pages hit the 1 second mark. Personally, I have a strong aversion to AMP, but as many of us predicted at the top of the year, Google has rolled AMP out beyond just the media vertical and into all types of pages in the SERP. The roadmap indicates that there is a lot more coming, so it’s definitely something we should dig into and look to capitalize on.
Using pre-browsing directives to speed things up
To support site speed improvements, most browsers have pre-browsing resource hints. These hints allow you to indicate to the browser that a file will be needed later in the page, so while the components of the browser are idle, it can download or connect to those resources now. Chrome specifically looks to do these things automatically when it can, and may ignore your specification altogether. However, these directives operate much like the rel-canonical tag — you're more likely to get value out of them than not.
- Rel-preconnect – This directive allows you to resolve the DNS, initiate the TCP handshake, and negotiate the TLS tunnel between the client and server before you need to. When you don’t do this, these things happen one after another for each resource rather than simultaneously. As the diagram below indicates, in some cases you can shave nearly half a second off just by doing this. Alternatively, if you just want to resolve the DNS in advance, you could use rel-dns-prefetch.
If you see a lot of idle time in your Timeline in Chrome DevTools, rel-preconnect can help you shave some of that off.
You can specify rel-preconnect with <link rel="preconnect" href="https://domain.com">
or rel-dns-prefetch with <link rel="dns-prefetch" href="//domain.com">
- Rel-prefetch – This directive allows you to download a resource for a page that will be needed in the future. For instance, if you want to pull the stylesheet of the next page or download the HTML for the next page, you can do so by specifying it as
<link rel="prefetch" href="nextpage.html">
- Rel-prerender – Not to be confused with the aforementioned Prerender.io, rel-prerender is a directive that allows you to load an entire page and all of its resources in an invisible tab. Once the user clicks a link to go to that URL, the page appears instantly. If the user instead clicks on a link that you did not specify as the rel-prerender, the prerendered page is deleted from memory. You specify the rel-prerender as follows:
<link rel="prerender" href="nextpage.html">
I’ve talked about rel-prerender in the past in my post about how I improved our site’s speed 68.35% with one line of code.
There are a number of caveats that come with rel-prerender, but the most important one is that you can only specify one page at a time and only one rel-prerender can be specified across all Chrome threads. In my post I talk about how to leverage the Google Analytics API to make the best guess at the URL the user is likely going to visit next.
If you’re using an analytics package that isn’t Google Analytics, or if you have ads on your pages, it will falsely count prerender hits as actual views to the page. What you’ll want to do is wrap any JavaScript that you don’t want to fire until the page is actually in view in the Page Visibility API. Effectively, you’ll only fire analytics or show ads when the page is actually visible.
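A minimal sketch of that gating looks like the following; fireAnalytics() stands in for whatever tracking or ad call you actually need to delay.

function fireAnalytics() {
  // send your pageview hit or render ads here
}

if (document.visibilityState === 'visible') {
  // The page is genuinely in front of the user (not a hidden prerender tab).
  fireAnalytics();
} else {
  document.addEventListener('visibilitychange', function onVisible() {
    if (document.visibilityState === 'visible') {
      document.removeEventListener('visibilitychange', onVisible);
      fireAnalytics();
    }
  });
}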
Finally, keep in mind that rel-prerender does not work with Firefox, iOS Safari, Opera Mini, or Android's browser. Not sure why they didn't get invited to the pre-party, but I wouldn't recommend using it on a mobile device anyway.
- Rel-preload and rel-subresource – Following the same pattern as above, rel-preload and rel-subresource allow you to load things within the same page before they are needed. Rel-subresource is Chrome-specific, while rel-preload works for Chrome, Android, and Opera.
Finally, keep in mind that Chrome is sophisticated enough to make attempts at all of these things. Your resource hints help them develop the 100% confidence level to act on them. Chrome is making a series of predictions based on everything you type into the address bar and it keeps track of whether or not it’s making the right predictions to determine what to preconnect and prerender for you. Check out chrome://predictors to see what Chrome has been predicting based on your behavior.
Where does SEO go from here?
Being a strong SEO requires a series of skills that's difficult for a single person to be great at. For instance, an SEO with strong technical skills may find it difficult to perform effective outreach or vice-versa. Naturally, SEO is already stratified between on- and off-page in that way. However, the technical skill requirement has continued to grow dramatically in the past few years.
There are a number of skills that have always given technical SEOs an unfair advantage, such as web and software development skills or even statistical modeling skills. Perhaps it's time to officially further stratify technical SEO from traditional content-driven on-page optimizations, since much of the skillset required is more that of a web developer and network administrator than that of what is typically thought of as SEO (at least at this stage in the game). As an industry, we should consider a role of an SEO Engineer, as some organizations already have.
At the very least, the SEO Engineer will need to have a grasp of all of the following to truly capitalize on these technical opportunities:
- Document Object Model – An understanding of the building blocks of web browsers is fundamental to understanding how front-end developers manipulate the web as they build it.
- Critical Rendering Path – An understanding of how a browser constructs a page and what goes into the rendering of the page will help with the speed enhancements that Google is more aggressively requiring.
- Structured Data and Markup – An understanding of how metadata can be specified to influence how Google understands the information being presented.
- Page Speed – An understanding of the rest of the coding and networking components that impact page load times is the natural next step to getting page speed up. Of course, this is a much bigger deal than SEO, as it impacts the general user experience.
- Log File Analysis – An understanding of how search engines traverse websites and what they deem as important and accessible is a requirement, especially with the advent of new front-end technologies.
- SEO for JavaScript Frameworks – An understanding of the implications of leveraging one of the popular frameworks for front-end development, as well as a detailed understanding of how, why, and when an HTML snapshot appliance may be required and what it takes to implement them is critical. Just the other day, Justin Briggs collected most of the knowledge on this topic in one place and broke it down to its components. I encourage you to check it out.
- Chrome DevTools – An understanding of one of the most powerful tools in the SEO toolkit, the Chrome web browser itself. Chrome DevTools' features coupled with a few third-party plugins close the gaps for many things that SEO tools cannot currently analyze. The SEO Engineer needs to be able to build something quick to get the answers to questions that were previously unasked by our industry.
- Accelerated Mobile Pages & Facebook Instant Articles – If the AMP roadmap is any indication, Facebook Instant Articles is a similar specification, and I suspect it will be difficult for them to continue to exist exclusively.
- HTTP/2 – An understanding of how this protocol will dramatically change the speed of the web and the SEO implications of migrating from HTTP/1.1.
Let’s Make SEO Great Again
One of the things that always made SEO interesting and its thought leaders so compelling was that we tested, learned, and shared that knowledge so heavily. It seems that that culture of testing and learning was drowned in the content deluge. Perhaps many of those types of folks disappeared as the tactics they knew and loved were swallowed by Google’s zoo animals. Perhaps our continually eroding data makes it more and more difficult to draw strong conclusions.
Whatever the case, right now, there are far fewer people publicly testing and discovering opportunities. We need to demand more from our industry, our tools, our clients, our agencies, and ourselves.
Let’s stop chasing the content train and get back to making experiences that perform.
This is exactly the kind of article we should see more of. Too often I get the impression that many SEOs prefer to stay in their comfort zone, and have endless discussions on the nitty-gritty details (like the 301/302 discussion), rather than seeing the bigger picture.
A post like this is a reminder that technology is evolving fast, and that SEOs should adapt to the changing environment. It's probably impossible to cover all these topics in detail in one article, but the links you mention provide excellent starting points / reference guides.
Hey Dirk,
Thanks for reading. I think it's human nature to want to stay in your comfort zone, but when the rate of change outside of your organization is much faster than the rate of change inside it you are in trouble.
Glad you got some value out of this. I'm going to attempt to blog more regularly on the more technical things because there is so much more to talk about.
-Mike
But but but... wasn't technical SEO only make-up???????!!!!! :-P
Ok... I must go feeding my unicorns.
LOL. Shots fired!
Lol.
You can only put so much make-up on a pig... and the poor thing is still a pig when you're done.
Technical SEO FTW!
I will probably have to read this at least 10 times to comprehend everything you are talking about, and that doesn't count all the great resources you linked to. I am not complaining, I will just say thank you and ask for more. Articles like the above are a great source of learning. Unfortunately we don't spend the necessary time these days diving deep into subjects and instead look for the dumbed down or Cliffsnotes version.
No problem, Justin. Happy to help. Feel free to reach out if you come across anything that needs further clarification. I can likely use it to determine what to write about next.
-Mike
You know you've read something that's so incredibly valuable when you have opened up 10+ links in new tabs to research further, haha!
Literally, the best post I've read all year..
I love your proposition of a role of SEO Engineer. I feel this role is inevitable and there will be many developers with an interest in SEO looking to fulfill those jobs.
Truth. My browser window is jam-packed with new tabs. Time for some light reading!
Hey Steffan,
Thank you, sir. Glad you got some value out of this.
I've seen this role here and there. When I was at Razorfish it was a title that some of the more senior SEO folks had. I've seen it pop up recently at Conde Nast, but I don't know that it's a widely adopted idea. Generally speaking though, I think that for what I'm describing it's easier to get a front-end developer and teach them SEO than it is to go the other direction. Although, I would love to see that change as people put more time into building their technical skills.
-Mike
Great write up! Like you, I started in 1995 as well, and held the rank of "Webmaster" before expanding into other areas of digital marketing (paid and organic), but SEO work was always part of the mix.
The technical side of SEO cannot be undervalued, even in this day and age, and that's one of the reasons why we always include a section on "Site Architecture" in our audits, alongside reviews of Content and Inbound Links. It is all three of these areas working together that are the focus of the search engines, and a misstep in one or more of them causes most of the issues that companies suffer when it comes to organic search traffic.
Where we disagree is probably more a semantic issue than anything else. Frankly, I think that set of folks during the early days of search engines that were keyword stuffing and doing their best to trick the search engines shouldn't even be included in the ranks of SEOs, because what they were doing was "cheating." These days, when I see an article that starts, "SEO has changed a lot over the years," I cringe because SEO really hasn't changed - the search engines have adapted to make life difficult for the cheaters. The true SEOs of the world have always focused on the real issues surrounding Content, Site Architecture, and Inbound Links while watching the black hats complain incessantly about how Google is picking on them, like a speeder blaming the cop for getting a ticket.
I think stewards of the faith like me, you, and Rand, will always have a place in the world, but I see the next evolution of SEO being less about "dying" and more about becoming part of the everyday tasks of multiple people throughout the organization, to the point where it is no longer considered a "thing" in and of itself, but more just a way of doing business in a time where search engines exist.
Hey Jeff,
An OG! #respect.
As far as our disagreement, it's kinda like the Jedi vs. the Sith. They both use the Force. Whether or not they use it the way that you like, it's still a remarkable display of power.
I have respect for a lot of the SEOs that came before me both white and black hat. I appreciate what they were able to accomplish. While I'd never do that type of stuff for my clients, I respect that the black hat curiosity yielded some cool hacks and lighter versions of those made it to the other side as well. I'm pretty sure that even Rand bought links back in the day before he decided to take a different approach.
I believe that SEO has matured, but so has the web in general and more and more people understand their responsibility as a marketer. So SEO has certainly changed, but it's certainly not dying. SEO as it was originally known is more vibrant than ever.
Thanks for weighing in Jeff!
-Mike
Monster of a post, Mike.
Thanks for the kind mentions, too. Trying to help close that gap a bit!
Dan
Hey Dan,
On behalf of the SEO industry, thanks for what you guys do over there. You have changed the game the world over with the software you build. We all appreciate you for it.
-Mike
Awesome post with lots of great info - Though I must admit to an initial skim-read only as it's one of those "Go get a pot of coffee and some paper & come back to digest properly" posts!
Glad to see Screaming Frog mentioned, I love that tool and use the paid version all the time, I've only used a trial of their logfile analyser so far though, as I tend to stick log files into a MySQL database to enable me to run specific queries. Though I'll probably buy the SF analyser soon, as their products are always awesome, especially when large volumes are concerned.
I'll be back to comment after reading fully, but felt compelled to comment as on an initial skim, this looks like an amazing post :)
Hey Mike,
Agreed. I used to do the same thing with log files, and in some cases I still do when the logs don't fit a standard setup. Often site admins add some custom stuff, and it's difficult for any tool to auto-detect. That said, Screaming Frog's tool does a great job, and I've used it for most of our log file analysis as of late.
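If anyone wants a starting point for rolling their own, here's a minimal sketch. It assumes logs in the standard combined format and treats the user-agent string as good enough to identify Googlebot (a stricter check would verify by reverse DNS); the file name is just a placeholder.

```python
# Minimal sketch: count Googlebot requests per URL path from a combined-format access log.
import re
from collections import Counter

LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

def googlebot_hits(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LINE_RE.match(line)
            # User-agent check only; reverse-DNS verification is left out of this sketch.
            if match and "Googlebot" in match.group("ua"):
                counts[match.group("path")] += 1
    return counts

# Usage: print the 20 paths Googlebot requests most often.
# for path, hits in googlebot_hits("access.log").most_common(20):  # hypothetical file name
#     print(hits, path)
```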
Thanks for reading.
-Mike
Hi Mike,
I must admit I was a little disappointed by this... I gave a talk earlier this week at a conference around the power of technical SEO & how it has been brushed under the rug w/ all of the other exciting things we can do as marketers & SEOs. However, if I would have seen this post prior to my presentation, I could have simply walked on stage, put up a slide w/ a link to your post, dropped the mic, and walked off as the best presenter of the week.
//301302complimentredirectincoming
So, on a serious note, industry post of the year.
Outside of the insane technical knowledge drop (i.e. - the View Source section was on-point and important for us to understand how to fully process a page as a search engine would rather than "I can't see it in the HTML, it doesn't exist!"), I think the most valuable point tying everything that we do together, came near the end: "It seems that that culture of testing and learning was drowned in the content deluge."
I think what makes our industry great is the willingness of brilliant people to share their findings (good or bad) with complete transparency. There isn't a sense of secrecy or a feeling that we need to hoard information to "stay on top". In fact, sharing not only helps elevate one's own position, but helps earn respect for the industry as a whole.
This post helps not only motivate, but reinforce the idea that everyone should be constantly testing, growing, learning, trying, doing...not waiting for the next tweet about what to do and how to do it. I feel like many of us have told developers how to do something but have no actual clue what that type of work entails (I remember when I first started SEO, I went on about header tags and urged clients to fix theirs - it wasn't until I used Firebug to get the appropriate CSS to help a client revamp their header structure while keeping the same design that I truly understood the whole picture -- it was a great feeling). I'm not saying that every SEO or digital marketer needs to be able to write their own python program, but we should be able to understand (and where applicable, apply) the core concepts that come with technical SEO.
Thank you for this wake up call. As a result, I am going to revive my terrible golf blog to once again serve as my technical SEO sandbox.
Congrats on the stellar post.
-AK
I really love all of the technical details in this post. Who else had their Chrome DevTools open while they read this?
Hey Cynthia,
Glad you enjoyed this. Here are a few other things that you may not have known that it does: https://tutorialzine.com/2015/03/15-must-know-chrom...
-Mike
Thanks for the mention, Mike, and for the many thoughtful notions you shared in this post. I will be revisiting your post here a few times; you've provided a lot for us to feast upon.
Hey Bill,
No problem, man. I appreciate you for all that you have parsed out and shared with the community. I will continue to be an avid reader and supporter.
-Mike
p.s. That was me cheering super loud for you in Vegas. =)
Mike, I cannot put into adequate words how much I absolutely love this post!
In the past, we've always divided SEO into " technical / on page" and "off page," but as Google has become smarter, I've personally always thought that the best "off page" SEO is just PR and publicity by another name. As a result, I think we're increasingly going to need to focus on all of the things that Mike has discussed here. Yes, it's technical and complicated -- but it's very important.
After all, from a business standpoint, technical SEO is the one thing that we can do that no one else can do. Most developers, system administrators, and DevOps engineers don't even know that stuff. It's our "unique product quality," so to speak.
Of course, I'm a little biased. I spoke on server log analysis at MozCon in September. For those who would like to learn more about it, here is a link to a post on my personal blog with my deck and accompanying notes on my presentation and what technical SEO things we need to examine in server logs. (My post also contains links to my company's informational material on the open source ELK Stack that Mike mentioned in this post on how people can deploy it themselves for server log analysis. I'd appreciate any feedback!)
Hey Samuel,
Don't worry about the adequate words, I think I put enough on the screen as it is. =)
I agree that off-page is just PR, but I'd say it's a more focused PR. Nevertheless, the people who tend to be best at it are the Lexi Millses of the world who can pick up the phone and convince someone to give them coverage, rather than the email spammers. That is not to say that there isn't an art to email outreach, but as an industry we approach it as a numbers game.
Thanks for sharing your post. Log file analysis doesn't get enough love for how powerful it still is in this day and age.
Congrats on speaking at MozCon.
-Mike
Thanks for the shouts Mike!!
Thanks for being the agency with the tech chops that everyone should aspire to have.
-Mike
Mike I've been wondering when you were going to turn this into a blog post after seeing this presentation at Confluence in Oklahoma last year (just after you gave it at Inbound) and then reviewing the slides on Slideshare several more times.
Thank you for contributing this. It will be required reading for our entire team.
Hey Everett,
That was actually a different deck at Confluence and Inbound last year. That one was called "Technical Marketing is the Price of Admission." https://www.slideshare.net/ipullrank/technical-mark... That one talks more about the T-shaped skillset that I believe all marketers should have.
This one was "The Technical SEO Renaissance." I gave it for the first time this year at SearchFest in Portland.
There's definitely a lot of overlap, but I'd say that folks should check the first one out before they dig into this one.
Either way, thanks for reading Everett and if anyone on your team has questions as they're digging in, have them reach out. I'm happy to help!
-Mike
WOW! A truly epic article dealing with on-page SEO. This should be a must-read for anyone doing SEO. I don't gush often, but this really has to be one of the best posts / articles I have read this year. Kudos to you, Michael King.
Thanks Brad. I appreciate you reading and taking the time to comment.
-Mike
Wow, what a great (but long) read!
I'm relatively new to the SEO game in comparison to you and I have to agree that more than ever, technical knowledge is a very important aspect of modern SEO.
I would particularly suggest that Schema.org markup for Google rich snippets is an increasingly important part of how Google showcases webpages in its SERPs and hence (most likely) increases CTR.
Only a few weeks ago Google introduced its fact-checking label to differentiate trustworthy news from the garbage. To have your web article labeled as a trustworthy news item, a knowledge of schema.org markup is needed.
For example, I did a search for "banana bread recipes" using google.com.au today and all the first page results were of pages that were marked up for rich snippets (showcasing cooking times, reviews, ratings etc...)
For me, I think we are entering a more developed era of the semantic web and thus technical knowledge is definitely a requirement.
In addition, while I agree that CMSes such as WordPress have fantastic support for search engines, I feel that I'm constantly manipulating the PHP of many themes to get the on-page stuff "just right".
Anyone else?
Hey JWHope,
Thanks for taking the time to read.
I agree that structured data is the future of many things. Cindy Krum called it a couple years ago when she predicted that Google was going to go after the card format for a lot of things. I think we're just seeing the beginning of that and Rich Cards is a perfect example of that being powered directly by structured data. In other words, people that get the jump on using Structured Data are going to win in the long run. The difficulty is that it's hard to see direct value from a lot of the vocabularies so it's not easy to get clients to implement it.
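To make the structured data point concrete, here's a minimal sketch of generating Recipe markup as JSON-LD. The values are placeholders chosen for illustration; the property names come from the schema.org Recipe vocabulary, and the output would be embedded in the page head inside a `<script type="application/ld+json">` tag.

```python
# Minimal sketch: build schema.org Recipe structured data as JSON-LD.
import json

recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Banana Bread",                       # placeholder values throughout
    "author": {"@type": "Person", "name": "Example Author"},
    "cookTime": "PT1H",                           # ISO 8601 duration
    "recipeYield": "1 loaf",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.7",
        "reviewCount": "213",
    },
}

# Print the JSON-LD payload that would go inside the script tag.
print(json.dumps(recipe, indent=2))
```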
To your point of always manipulating code to get things just right...that is the story of my life.
-Mike
Thank you for getting back to me, Mike. I have to agree with others on here that this is one of the most informed and interesting reads I've had all year.
I'm still learning structured data markup, especially making sure that the correct category is used for the right reasons. I can only see the schema.org directory of categories expanding to accommodate more niche businesses in the future.
Also, it's good to hear that I'm not alone in making changes to pre-defined code. Sometimes I wish I was a good enough coder to create a CMS myself!
Thanks again
P.S. I've just followed Cindy Krum on Twitter - cheers for the heads up!
Mike! This post is pure justice. Great to see you writing in the space again, I had noticed you'd gone a lot more quiet in the last year or so.
I have to agree mostly with the concept that tools for SEO really do lag. I remember 4 years ago trying to find a tool that nailed local SEO rank tracking. A lot claimed they did, but in actual fact they didn't. Most would let you set a location but didn't actually track the snack pack as a separate entity (if at all). In fact, the only rank tracking tool I found back then that nailed local was Advanced Web Ranking, and still to this day it's the only tool doing so from what I've seen. That's pretty poor seeing how long local results have been around now.
I regularly work on international campaigns now and I completely agree there are limitations in this area. I've tested a few tools that audit hreflang, for example, and I've yet to discover anything that will go off at the click of a button, crawl all your rules, and return a simple list stating which rules are broken and why. Furthermore, I don't think any rank tracking tool exists that checks hreflang rules alongside rankings and flags when an incorrect URL is showing up in any given region. The agency I work for had to build this ourselves for a client, initially using Excel before shifting over to the awesome Klipfolio. Still, life would have been easier and quicker if we could have just tracked such a thing from the outset.
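For what it's worth, here's the kind of minimal check I'd love tools to run automatically: a rough sketch, assuming the third-party requests and beautifulsoup4 packages, that flags hreflang alternates missing their return links (the example.com URL is just a placeholder, and it only compares exact URLs).

```python
# Rough sketch: find hreflang alternates that do not link back to the original page.
import requests
from bs4 import BeautifulSoup

def hreflang_map(url: str) -> dict:
    """Return {hreflang: href} declared on the page via <link rel="alternate">."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    alternates = {}
    for link in soup.find_all("link"):
        rel = link.get("rel") or []
        if "alternate" in rel and link.get("hreflang") and link.get("href"):
            alternates[link["hreflang"]] = link["href"]
    return alternates

def missing_return_links(url: str) -> list:
    """List (hreflang, alternate URL) pairs whose alternate does not reference url back."""
    problems = []
    for lang, alt_url in hreflang_map(url).items():
        if url not in hreflang_map(alt_url).values():
            problems.append((lang, alt_url))
    return problems

# Usage:
# for lang, alt in missing_return_links("https://www.example.com/"):
#     print(f"{alt} ({lang}) does not link back")
```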
I don't want to discredit anyone building these tools of course. Most SEO software developers out there have their own unique strong points, continually strive to improve and are very open to user feedback (particularly Screaming Frog, I don't think they've ever carried out an update that wasn't amazing). It does often feel by the time something really useful is added to any given tool, something else in the SEO industry has changed and needs attention, which is sadly something no one can change unless Google one day (unlikely) says "Yeah, we've nailed search nothing will ever change ever again".
Hey Dan,
Yep, I've been focusing more on building iPullRank, so I haven't been making the time to blog enough. When I have, it's mostly been on our site. Moving into 2017, it's my goal to change that, though. So hopefully I'll be able to share more stuff!
Yes, it's difficult dealing with the limitations of tools because of the speed at which things change. I never really thought too much about it before, because I always roll my own when I come up against something that my favorite tool doesn't do.
Love that you're using Klipfolio. I'm a big fan of that product and that team. All of our reporting is going through them. I wish more people knew about them.
I also don't want to discredit anyone on the software side. I understand that it's difficult to build software that tens of thousands of people use. There are a lot of competing priorities and then just the general issues that come with running a business. However, I do think that if it's something in Google's specs, all tools should make it a priority to universally support it.
Thanks for reading and weighing in!
-Mike
Damn, THIS is how you write a blog post.
It'd be generous to say I fully understood even half of this. Looks like I've got some learning to do.
Thanks for reading Ben. If you get stuck as you're going through things, feel free to reach out. I'm happy to help!
-Mike
Yo! I would have commented sooner but my computer started on FIREE!!! -Thanks to all your brilliant links, resources and crawling concepts. :) This could have been 6 home run posts, but you've instead gifted us with a perfectly wrapped gem. Thank you, thank you, thank you!
The Lucky Orange Gbot test is genius!!! A little salty that I didn't think of that first...love Lucky Orange!
Have been talking to our Pro dev team about integrating a header call for sites. -Thank you for the positive reinforcement! :)
I started clapping like a baby seal at "It resulted in a few million more organic search visits month over month. Granted, this was a year ago, but until someone can show me the same happening or no traffic loss when you switch from 301s to 302s, there’s no discussion for us to have." -BOOM!
+R internal PageRank crawl works like a charm, brilliant!
+That Angular2 vid is so sick!
+Had no idea about Headless Chrome?!
Quick Questions:
Thanks Britney! Glad I can help. Super hype that you're already putting things into play or working out how to.
Yep. I personally don't do Google Sheets scraping and most of the Excel-based scraping is annoying to me because you have to do all this manipulation within Excel to get one value. All of my scraping these days is either PHP scripts or NodeJS scripts.
I feel like Google believes they are in a good place with links and content, so they will continue to push for speed and mobile-friendliness. So the best technical SEO tactic right now is making your site faster. After that, improving your internal linking structure.
I have not, but there are frankly not that many sites that are on my radar that have implemented it and yeah, the IETF and W3C sites take me back to my days of using a 30 day trial account on Prodigy. Good grief.
The hosting providers that are rolling it out are making it easy. In fact, if you use WPEngine, they've just made it so your SSL cert is free so you can leverage HTTP/2. Based on this AWS doc, it sounds like it's pretty simple if you're managing a server as well. It's a little harder if you have to config from scratch though. I've only done it the easy way. =)
-Mike
Thank you Michael. I was pleasantly surprised to see this in-depth article on technical SEO. To me, this is a critical part of your site architecture, which forms a cornerstone of any SEO strategy. Of course there are basic checklists of things to include (sitemap, robots, tags). But the way this article delves into relatively new technologies is certainly appreciated.
The patterns of SEO are always changing, day by day. It's true that much of SEO is now invisible, and we have to stay positive and keep adapting to it.
Brilliant post. Time to learn some new Tech SEO skills :)
Hey Andrew,
Thanks for reading. Let me know if you run into any trouble in your learning process. Happy to help out.
-Mike
This is the true story of SEO!
Hey Moz editors -- a suggestion for making Mike's post more effective: Instruct readers to open it in a new browser window before diving in.
As others have commented, a byproduct of this epicness is a dozen+ open browser tabs and a ream of knowledge. In my case, said tabs have been saved to a new bookmarks folder labeled 'Technical SEO Tornado' that contains my morning reading material for days to come.
Amazing. Awesome. Kudos. Thank you.
Great work Mike....there's enough great stuff here for half a dozen posts! Love the 301/302 jab :-)
Thanks MC!
-Mike
Great post outlining the importance of technical SEO and its role in helping a website rank. Without a solid foundation of technical and on-page SEO, it is extremely difficult for a website to rank.
WordPress, the most popular blogging platform, has a tendency to produce thousands of thin content pages through its use of tags. While these are good for helping users find the list of posts on a topic, they have to be noindexed or the website may be hit by the Panda algorithm.
Thanks for reading and taking the time to comment Joseph.
-Mike
Really great post! Thanks for sharing,
I agree with the idea that technical knowledge is definitely a requirement.
Amazing post mate!
Straight to my bookmarks, and I'm probably going to read it 20 times. Thanks a lot for this ;)
This post is a monster! Erudite, unbelievably well researched, spectacularly executed and displayed, with incredibly useful, relevant links to authority sites, experts and tools, with easy to understand examples and fascinating screenshots.
Superb post!
Excellent post and one which I've already read a number of times. It makes me realise how little I know, and that I can't be complacent and need to keep learning! It's not an easy read as there is so much in it. I have now read more about TF*IDF thanks to this post and aim to investigate it further. Cheers.
It's a complete guide to SEO in a single, wonderful post. I've never come across such a helpful way to learn SEO.
Instant classic, must read. Thanks for saying so many things that needed to be said and sharing so much insight and tips.
WOW!
Thanks for the shouts!!!
Wow, super long and detailed post! I'll probably have to read this another 5 times over the coming months to fully comprehend it and put it into practice in my daily role as a webmaster. Thanks for sharing.
Very nice roundup and exactly the technical depth I prefer. Thanks bro!
Stellar article Mike! Definitely a must-read for every savvy SEO + everyone thinking about getting into the SEO game. A worthy contender for SEO article of the year - you got my vote :)
Thanks for the kind mentions of OnPage.org! #YouRock
Obligatory, thanks for the shout out comment.
This is one of the most awesome, thoroughly researched, deep insight, comprehensive articles I've ever read on the web .... on *any* subject. I've been doing SEO now since before Google and there's still tons to learn on this page. Bookmarked for multiple re-reads.
iPullRank, kudos to you, sir.
Ah, the old days, man. I had all of the adult words wrapped up, including the single three-letter word "sex," on the first page of G. That was a really nice article; thanks for writing it. Your writing definitely shows the little nuances in the world we call technical SEO. The things that true SEO artists care about.
Funny, on one hand you can say nothing has really changed about SEO since before there was a name or the acronym of SEO, and on the other hand you can say everything has changed in SEO. I kind of enjoy the change in the complexity, breadth, and depth of SEO because it was a little boring in the past. I used to be able to guarantee first-page results for any keyword, and you (well, maybe just I) can't really do that to the same degree anymore.
This post is gold! Finally an in-depth POV on SEO that describes what we need to consider. Starting off as a coder in '99 and morphing into SEO, I was very keen on the technical aspect with regard to the science of SEO. My creative nature helped with the art aspect of SEO. Great post, Michael. Bravo Zulu.
sure thing! anytime.
Within the 302 vs. 301 paragraph, you mention the culture of testing. What do you say about the recent tests done by LRT? They found that 302s were the most effective in the sense that there were no hiccups and the redirect (+ link juice, anchor text) was completely transferred.
Here is the link to that study: https://www.linkresearchtools.com/case-studies/11-t...
Just a disclosure: I am by no means associated with LRT or trying to promote them beyond the data they provided.
Hi Mike, what a great post! So refreshing to read something that goes through so many relevant things and goes deep into each one of them, instead of all the more-of-the-same short articles we tend to see lately.
I completely agree that technical SEO was and still is a crucial part of our strategy. While there are a lot of other things that SEO encompasses today, the technical elements are the foundation of everything we do. They're the base of our strategy, and no SEO should neglect them.
Unfortunately, when working as a consultant at an agency, those are exactly the things that are hardest to implement, or should I say the hardest to convince the client's developers to do :) More and more I realize that an SEO MUST have a technical approach and understanding, and on the client side there must be someone who understands both SEO and the technical side.
Once again, great article! Looking forward to reading more from you!
Hi, fantastic post.
I'm really glad you mentioned internal linking, an area I was (stupidly) skeptical about a year ago.
Shapiro's internal PageRank approach is quite interesting, though it's based on the assumption that most internal pages don't get external links, and it doesn't take into consideration the traffic potential or user engagement metrics of those pages. I found that Ahrefs does a good job of telling you which pages are the most powerful in terms of search. Another interesting idea is the one Rand Fishkin gave to Unbounce https://unbounce.com/conversion-rate-optimization/r... : do a site: search plus the keyword, see which pages Google is already associating with that particular keyword, and get internal links from those pages specifically.
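For anyone who wants to try the internal PageRank idea on their own crawl data, here's a rough sketch of the same computation in Python rather than R. It assumes a hypothetical CSV export of internal links with "source,target" columns (most crawlers can produce something like it) and the third-party networkx package.

```python
# Rough sketch: compute PageRank over a site's internal link graph from a crawl export.
import csv
import networkx as nx

def internal_pagerank(edges_csv: str) -> dict:
    graph = nx.DiGraph()
    with open(edges_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            graph.add_edge(row["source"], row["target"])  # one directed edge per internal link
    return nx.pagerank(graph, alpha=0.85)

# Usage: print the ten strongest pages by internal PageRank.
# scores = internal_pagerank("internal_links.csv")  # hypothetical export file
# for url, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]:
#     print(round(score, 5), url)
```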
Thanks again.
Amazing read with a lot of useful resources! Forwarding this to my partner who is doing all the technical work on all of our projects.
Though I never understood technical SEO past the basic understanding of these concepts and practices, I strongly understood the gap that exists between the technical and the marketing part. This gap humbles me beyond words, and helps me truly appreciate the SEO industry. The more complex it becomes, the more humble I get, and I love it.
Not accepting this reality is what brings a bad rep to the entire industry, and it allows overnight SEO gurus to get away with nonsense and a false sense of confidence while repeating the mantra I-can-rank-everything.
What a fantastic post! Yes, the points on technical SEO you mentioned in this epic post are worth following, and I am bookmarking it for further use. There is always room for improvement; I am constantly altering and updating my site.
A very well-researched article. Thanks for sharing!
Great post, thanks for it!
Don't forget the 307 status code as the 302's official successor in HTTP/1.1.
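If you want to see which status codes a redirect chain is actually returning, here's a minimal sketch assuming the third-party requests package (the URL is a placeholder); it makes it easy to spot a 302 or 307 where a permanent 301 was intended.

```python
# Minimal sketch: follow a redirect and list each hop's status code and URL.
import requests

def redirect_chain(url: str) -> list:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = [(r.status_code, r.url) for r in response.history]  # intermediate redirects
    hops.append((response.status_code, response.url))          # final destination
    return hops

# Usage:
# for status, hop_url in redirect_chain("http://example.com/old-page"):
#     print(status, hop_url)
```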
I'm new to the SEO space, but this article opened my eyes to the challenges I will face.
> Technical SEO is the price of admission.
I said as much. My point was not to dismiss technical SEO, but rather that everyone is doing it already, so it's no game changer.
I have yet to work with any client, large or small, who has ever done technical SEO to the extent that Mike detailed. I see poor implementations of Angular sites that will *never* be found in a search result without SEOs pointing out what they're doing wrong and how to code going forward to improve it. Try adding 500 words of content to each "page" on a one-page Angular app with no pre-rendered version and no unique meta information if you want to see how far you can get on "what everyone is doing." Link building and content can't get you out of a crappy site structure - especially at a large scale.
Digging into log files, multiple databases and tying site traffic and revenue metrics together beyond rankings or the sampling of data you get in Search Console is neither a content or link play, and again, something that everyone is definitely not doing.
Content and links still are and will likely remain important. True technical SEO - not just recommending that a meta title be added to the page, or that something go in an H1 and something else in an H2 - is not by any stretch something that "everyone" is doing. Digging in and doing it right can absolutely be a game changer for small sites trying to compete against larger ones, and for very large sites where 1 or 2% lifts can easily mean millions of dollars.
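As a quick way to demonstrate that rendering gap to a client, here's a rough sketch, assuming the third-party requests and selenium packages plus a local headless Chrome/chromedriver install, that compares the raw HTML response with the DOM after JavaScript runs (the URL is a placeholder):

```python
# Rough sketch: compare raw "view source" HTML with the JavaScript-rendered DOM.
import requests
from selenium import webdriver

def raw_vs_rendered(url: str) -> tuple:
    # Raw HTML as a crawler would see it before executing JavaScript.
    raw_html = requests.get(url, timeout=10).text

    # Rendered DOM via headless Chrome.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        rendered_html = driver.page_source
    finally:
        driver.quit()

    return len(raw_html), len(rendered_html)

# A large gap between the two lengths suggests content that only exists after rendering
# and may need prerendering or server-side rendering to be reliably crawled.
# print(raw_vs_rendered("https://www.example.com/"))
```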
Well said YM.
Really good post and extremely well researched, thought out and argued.
I'm glad you did this, as far too much emphasis has been placed on churning out thousand-word articles with little or no consideration of how they look to search engines. We've been heavily focused on technical SEO for some time and find that, even without "killer content," this alone can make a significant difference to rankings.
The sweet spot is, of course, making sure both end users and search engines find your site equally as appealing.
Thanks for the hard work!
Great article. Thanks for the share.
Thanks for the post. One of the more interesting posts on the blog; I'm sharing it. Thanks to the whole community for the tips.
The Simpsons invade us.
Good post, along with excellent content.
Okay that was a long read. Thanks, made a lot of notes and will come back tomorrow ;)
Wow! This one is a keeper. Fantastic post.
Thank you.
Interesting post!!! I will try to apply all this knowledge on my website.
Thank you Michael. I was pleasantly surprised to see this in-depth article on technical SEO. A little overwhelming and lengthy, but I realized that keeping up with technical SEO is a critical part of your site architecture, which forms a cornerstone of any SEO strategy.
Very impressive article on SEO.
This is one of those posts that I will be reviewing and sharing with friends for sure. Fantastic breakdown of the evolution, too.
A great article, one to read more than once so it's clear and understood, and we shouldn't skip over anything it explains. I especially liked the point about the importance of HTTP/2 for improving speed.
Thanks for the great resource!
I work in Hong Kong and many companies here are still abusing TF*IDF, yet it's working for them. Somehow even without relevant and proof terms, they are still ranking very well. You would think that they would get penalized for keyword stuffing, but many times it seems this isn't the case.
Also, as an aside, many companies here are making spin off companies to link back to themselves. While these spinoffs don't have the DA of larger sites, they nevertheless provide some link juice and flow back to each other. These tactics seem to work as they are ranking first page on relevant searches. While we're discouraged to use black hat tactics, when it's done so blatantly, how do we combat that? How do you explain to a client that a black hat is hijacking Google to make their competitor rank higher?
Hey Kevin,
Thanks for reading. Very interesting to hear that TF*IDF is being heavily abused out in Hong Kong as well.
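For anyone who wants to see what TF*IDF actually measures (and why it isn't the same thing as raw keyword stuffing), here's a minimal sketch assuming a recent version of the third-party scikit-learn package; the tiny corpus is made up for illustration.

```python
# Minimal sketch: TF-IDF weights on a toy corpus. Terms that appear in every document
# ("recipe") get down-weighted relative to terms that distinguish a document.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "banana bread recipe with walnuts",
    "easy banana bread recipe",
    "chocolate chip cookie recipe",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)

# Print each term's weight in the first document.
terms = vectorizer.get_feature_names_out()
for term, weight in zip(terms, matrix.toarray()[0]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```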
Your link farm question is definitely a common one. I think this post does a good job of highlighting your concerns and helping you figure out what to do. The other thing to do to drive it home is show them examples of sites in their vertical that are tanking and clarify that longer-term success comes on the back of staying the course.
-Mike
Thanks for the link Mike! It really resonated with how I feel about the current SERPs pretty well.
I had time this weekend and was intrigued by blackhat SEO, so I jumped into the dark side to research what they're up to. What's interesting is that they seem to originate many of the ideas that eventually leak into whitehat SEO, albeit a bit toned down. Perhaps we can learn and adopt some strategies from blackhats?
One very interesting thing I found on a forum is here:
Neil Patel's blackhat landing page
This is from one of Neil Patel's landing pages, and I've checked around his site -- even if you don't put in any website, it returns 9 errors every time... Now if a thought leader like Patel is using snake oil to sell his services, sometimes I wonder what chance us smaller guys have. I regularly read his articles, but seeing this -- well, it just shatters everything he talks about. Is this really the state of marketing now?
A thing of beauty. That is all!
Technical knowledge must be a requirement, especially if you're an SEO beginner.
Great article. Thanks for the share.
Thank you very much, bilge açıkgöz.
I feel like I just spotted a fellow Turk :D
way too much on tech SEO... Too many resources spent on on-site tech from SEOs that are just DEVs and not focused on off-site. Way too long a post making this seem as if on-site tech is more important. Off-site content and links....
Thanks for the feedback, Todd!
-Mike