As the web gets more complex, with JavaScript framework and library front ends on websites, progressive web apps, single-page apps, JSON-LD, and so on, the surface area for things to go wrong keeps growing. When all you've got is HTML, CSS, and links, there's only so much you can mess up. However, in today's world of dynamically generated websites with universal JS interfaces, there's a lot of room for errors to creep in.
The second problem we face with much of this is that it's hard to know when something's gone wrong, or when Google's changed how they're handling something. This is only compounded when you account for situations like site migrations or redesigns, where you might suddenly archive a lot of old content, or re-map a URL structure. How do we address these challenges then?
The old way
Historically, the way you'd analyze things like this is by looking at your log files with Excel or, if you're hardcore, Log Parser. Those are great, but they require you to already know you've got an issue, or to happen to be looking at a section of logs that contains the issue you need to address. Not impossible, and we've written about doing this fairly extensively both on our blog and in our log file analysis guide.
The problem with this, though, is fairly obvious. It requires that you look, rather than making you aware that there's something to look for. With that in mind, I thought I'd spend some time investigating whether there's something that could be done to make the whole process take less time and act as an early warning system.
A helping hand
The first thing we need to do is to set our server to send log files somewhere. My standard solution to this has become using log rotation. Depending on your server, you'll use different methods to achieve this, but on Nginx it looks like this:
# time_iso8601 looks like this: 2016-08-10T14:53:00+01:00
if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})") {
    set $year $1;
    set $month $2;
    set $day $3;
}

access_log /var/log/nginx/$year-$month-$day-access.log;
This allows you to view logs for any specific date or set of dates by simply pulling the data from files relating to that period. Having set up log rotation, we can then set up a script, which we'll run at midnight using Cron, to pull the log file that relates to yesterday's data and analyze it. Should you want to, you can look several times a day, or once a week, or at whatever interval best suits your level of data volume.
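Here's a minimal sketch of what that nightly script could start with, assuming the dated filename pattern from the Nginx block above and the standard combined log format; the directory and field names are placeholders rather than anything prescribed:

#!/usr/bin/env python3
# Minimal sketch: parse yesterday's access log (combined log format) into dicts.
# The path pattern matches the dated access_log directive above; adjust as needed.
import re
from datetime import date, timedelta

LOG_DIR = "/var/log/nginx"  # assumption: same directory as the config above

# Combined format: ip - user [time] "METHOD /path HTTP/x" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def load_entries(day=None):
    """Return a list of parsed request dicts for the given date (default: yesterday)."""
    day = day or (date.today() - timedelta(days=1))
    path = f"{LOG_DIR}/{day:%Y-%m-%d}-access.log"
    entries = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LINE_RE.match(line)
            if match:
                entry = match.groupdict()
                entry["status"] = int(entry["status"])
                entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
                entries.append(entry)
    return entries

Scheduled from cron (e.g. 0 0 * * * /usr/bin/python3 /path/to/log_report.py, with the script path being whatever you choose), the checks below can all work from the list this returns.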
The next question is: What would we want to look for? Well, once we've got the logs for the day, this is what I get my system to report on:
30* status codes
Generate a list of all pages hit by users that resulted in a redirection. If the page linking to that resource is on your own site, update the link to point at the actual endpoint. Otherwise, get in touch with whoever is linking to you and ask them to point the link where it should go.
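As a rough illustration of this check, working off the parsed entries from the sketch above (whose field names are assumptions, not anything from the post):

# Sketch: list redirected URLs, with the pages that link to them, so the links can be fixed.
from collections import Counter

def redirect_report(entries):
    """Count 30* requests per path and note where they were linked from."""
    hits = Counter()
    referrers = {}
    for e in entries:
        if 300 <= e["status"] < 400:
            hits[e["path"]] += 1
            referrers.setdefault(e["path"], set()).add(e["referrer"])
    for path, count in hits.most_common():
        print(f"{count:>6}  {path}  linked from: {', '.join(sorted(referrers[path]))}")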
404 status codes
Similar story. Any 404ing resources should be checked to make sure they're supposed to be missing. Anything that should exist can be investigated to find out why it isn't resolving, and links to anything genuinely missing can be treated the same way as the 301/302s above.
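The 404 check is the same idea, filtered to missing resources; again, a sketch against the parsed entries from earlier:

# Sketch: report 404ing paths and the referrers sending traffic to them.
from collections import Counter

def not_found_report(entries):
    misses = Counter(e["path"] for e in entries if e["status"] == 404)
    for path, count in misses.most_common():
        refs = {e["referrer"] for e in entries if e["path"] == path and e["referrer"] != "-"}
        print(f"{count:>6}  {path}  referrers: {', '.join(sorted(refs)) or 'none'}")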
50* status codes
Something bad has happened and you're not going to have a good day if you're seeing many 50* codes. Your server is dying on requests to specific resources, or possibly your entire site, depending on exactly how bad this is.
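A sketch of the 50* check, with an arbitrary threshold you'd tune to your own traffic levels:

# Sketch: flag any path returning server errors, and whether the overall volume looks alarming.
from collections import Counter

def server_error_report(entries, alert_threshold=20):  # threshold is a placeholder value
    errors = Counter(e["path"] for e in entries if 500 <= e["status"] < 600)
    for path, count in errors.most_common():
        print(f"{count:>6}  {path}")
    return sum(errors.values()) >= alert_threshold  # True means "send the alert email"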
Crawl budget
A list of every resource Google crawled, how many times each was requested, how many bytes were transferred, and how long those requests took to resolve. Compare this with your sitemap to find pages that Google won't crawl, or that it's hammering, and fix as needed.
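A sketch of that comparison, assuming you've pulled a flat set of paths out of your XML sitemap (loading the sitemap is left out here, and time-to-resolve is omitted because it needs $request_time added to the Nginx log format):

# Sketch: what Googlebot actually crawled, versus what the sitemap says should be crawled.
# (For production use you'd verify Googlebot by reverse DNS, not just the user-agent string.)
from collections import Counter

def crawl_budget_report(entries, sitemap_paths):
    """sitemap_paths: a set of paths taken from your XML sitemap."""
    crawled = Counter()
    transferred = Counter()
    for e in entries:
        if "Googlebot" in e["agent"]:
            crawled[e["path"]] += 1
            transferred[e["path"]] += e["bytes"]
    for path, count in crawled.most_common():
        print(f"{count:>6} hits  {transferred[path]:>10} bytes  {path}")
    never_crawled = sitemap_paths - set(crawled)
    print("In sitemap but not crawled today:", ", ".join(sorted(never_crawled)) or "none")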
Top/least-requested resources
Similar to the above, but detailing the most and least requested things by search engines.
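Roughly, reusing the same parsed entries, with the caveat that the bot tokens listed are just illustrative user-agent substrings:

# Sketch: most and least requested paths by (self-identified) search engine bots.
from collections import Counter

BOT_TOKENS = ("Googlebot", "bingbot", "YandexBot", "DuckDuckBot")  # illustrative list

def bot_request_extremes(entries, n=20):
    counts = Counter(
        e["path"] for e in entries if any(bot in e["agent"] for bot in BOT_TOKENS)
    )
    return counts.most_common(n), counts.most_common()[:-n - 1:-1]  # top n, bottom n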
Bad actors
Many bots looking for vulnerabilities will make requests to things like wp-admin, wp-login.php, config.php, and other similarly common probe URLs, usually resulting in a string of 404s. Any IP address that makes repeated requests to these sorts of URLs can be added automatically to an IP blacklist.
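A sketch of that check; the probe substrings and threshold are examples only, and what you feed the result into depends on your setup:

# Sketch: collect IPs that repeatedly probe for known-vulnerable paths.
from collections import Counter

PROBE_PATTERNS = ("wp-login", "wp-admin", "config.php")  # illustrative list

def suspicious_ips(entries, min_hits=5):  # placeholder threshold
    probes = Counter(
        e["ip"] for e in entries
        if any(token in e["path"] for token in PROBE_PATTERNS)
    )
    return [ip for ip, hits in probes.items() if hits >= min_hits]

The resulting list could then be fed into whatever blocking mechanism you use (an Nginx deny list, a firewall rule, fail2ban, and so on).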
Pattern-matched URL reporting
It's simple to use regex to match requested URLs against predefined patterns and report on specific areas of your site or types of pages. For example, you could report on image requests, JavaScript files being called, pagination, form submissions (by looking for POST requests), escaped fragments, query parameters, or virtually anything else. Provided it's in a URL or HTTP request, you can set it up as a segment to be reported on.
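A sketch of how those segments might be defined; the patterns here are only examples of the idea:

# Sketch: bucket requests into named segments by regex, then count hits per segment.
import re
from collections import Counter

SEGMENTS = {  # example patterns only; define whatever matters on your site
    "images": re.compile(r"\.(?:png|jpe?g|gif|webp)$", re.I),
    "javascript": re.compile(r"\.js$", re.I),
    "pagination": re.compile(r"[?&]page=\d+"),
    "escaped_fragments": re.compile(r"_escaped_fragment_"),
}

def segment_report(entries):
    counts = Counter()
    for e in entries:
        for name, pattern in SEGMENTS.items():
            if pattern.search(e["path"]):
                counts[name] += 1
        if e["method"] == "POST":  # form submissions, per the idea above
            counts["form_posts"] += 1
    return counts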
Spiky search crawl behavior
Log the number of requests made by Googlebot every day. If it increases by more than x%, that's something of interest. As a side note, with most number series, a calculation to spot extreme outliers isn't hard to create, and is probably worth your time.
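One simple outlier check, sketched here against a running history of daily Googlebot counts (how you persist that history is up to you):

# Sketch: flag today's Googlebot request count if it sits far outside the recent average.
from statistics import mean, stdev

def crawl_spike(daily_counts, todays_count, sigmas=3):
    """daily_counts: list of Googlebot request totals for previous days."""
    if len(daily_counts) < 2:
        return False
    avg, spread = mean(daily_counts), stdev(daily_counts)
    return spread > 0 and abs(todays_count - avg) > sigmas * spread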
Outputting data
Depending on the importance of any particular section, you can then set the data up to be reported in a couple of ways. Firstly, large numbers of 40* and 50* status codes or bad-actor requests are worth triggering an email for. This lets you know in a hurry if something's happening that potentially indicates a large issue, so you can get on top of whatever it is and resolve it as a matter of priority.
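Sending the alert itself can be as plain as Python's standard smtplib; the addresses and mail host below are placeholders:

# Sketch: fire a plain-text alert email when a check trips its threshold.
import smtplib
from email.message import EmailMessage

def send_alert(subject, body, host="localhost"):  # assumes a local mail relay
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "logwatch@example.com"   # placeholder
    msg["To"] = "you@example.com"          # placeholder
    msg.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)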
The data as a whole can also be reported on via a dashboard. If you don't have that much data in your logs on a daily basis, you may simply want to query the files at runtime and generate the report fresh each time you view it. On the other hand, sites with a lot of traffic and thus larger log files may want to cache each day's output to a separate file, so the data doesn't have to be recomputed every time. Obviously, the approach you take depends a lot on the scale you'll be operating at and how powerful your server hardware is.
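For the cached-output route, one simple option, sketched here with a placeholder directory, is to write each day's computed summary to a dated JSON file that the dashboard reads instead of re-crunching the raw log:

# Sketch: persist a day's computed summary so the dashboard never re-parses the raw log.
import json
from datetime import date, timedelta

REPORT_DIR = "/var/log/nginx/reports"  # placeholder location

def cache_summary(summary, day=None):
    day = day or (date.today() - timedelta(days=1))
    with open(f"{REPORT_DIR}/{day:%Y-%m-%d}.json", "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2, default=str)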
Conclusion
Thanks to server logs and basic scripting, there's no reason you should ever have a situation where something's amiss on your site and you don't know about it. Proactive notification of technical issues is a necessity in a world where Google crawls at an ever-faster rate, meaning it could start pulling your rankings down thanks to site downtime or errors within a matter of hours.
Set up proper monitoring and make sure you're not caught short!
This article will improve our insights analysis. Thanks! Actually, we use simpler practices. For example, we have the server notify us when the domain is down and for how long. Many times that ends with us suggesting the client change provider. Sometimes it has helped improve rankings for some keywords.
Hi there, nice tips! That happens with one of my clients. I'll suggest changing the server and see if that improves the technical SEO.
Good article, Pete, and very timely.
It's not just the increased complexity with fonts, JS libraries, etc. all over the place. It's also the additional risk. Each time you rely on a third party to serve something to your visitors, you're taking on more risk. True, it's unlikely Google's going to fail to serve the fonts your page needs, or that the path to the JS libraries will fail because of a DNS problem somewhere... but these things could happen; they are not impossibilities.
Such problems won't, of course, be captured in the logs. What's your suggestion for monitoring these and/or getting some early warning?
BTW, installing something like Piwik goes a long way to collecting and analysing the log data and it saves some of the steps you describe above.
Pretty good post.
Thank you, it helped me learn SEO.
Hi Pete, beautiful article.
But to be honest, I'm not that techy a guy, per se. I do these things only if I must and can't delegate to someone else at the moment. What do you think about using a service like Pingdom, for example? Wouldn't that be useful for someone like me? :D
Thanks for an informative article that will benefit all SEO specialists; a beautiful post.
Sounds like a good read to me. We've seen huge SEO impacts from little things; ultimately it all adds up. Thanks for sharing your post.
Thank you Pete, it's a really good post and a good help for all. :)
Automation is one of my favorite subjects, and linking it with SEO is like adding another dimension to automation. I liked the pointers mentioned in the post: the different status codes to trigger notifications, and parsing URL patterns to identify probable attacks or broken links. It reads like a set of requirement specs that even a WordPress developer would want to look at. Anyway, logs are a great tool for revealing the real health of any web application, something many of us may not give due credit. For many of my cloud-hosted web apps, I began exploring Papertrail, which is a nice tool for automated log analysis.
Thanks for posting such an informed article.
Great post! You have described SEO beautifully. I think this article is useful for everyone.
Thanks for sharing this technical post. It will be very helpful for SEO professionals.
Thanks a lot Pete, it's a really well-written article and a good help for all. We can save valuable time using this automated method.
Valuable post; it will be greatly helpful for SEO professionals looking to save time. Thanks for sharing.
Splendid article, Pete! Could I ask which tools/software you use for this, especially the dashboard creation tool you prefer?
Thanks
Thanks... really it is very useful.
Great advice Pete, thanks for sharing. This is something that SEO specialists will need to ensure they're up to speed with, as it does seem to be getting ever-more technical.
Great to see a technical article like this. As technology grows to help us, with so many amazing tools out there, I think it's still important to understand the basics of how to do things manually, and that is exactly what I took from this article. So many new opportunities to develop amazing stuff online mean an increase in the variety of technologies we have to crawl and analyze as marketers.
There have been many articles lately about how React, for example, is good or bad for SEO, and with Angular they've also focused on addressing some SEO issues in the new release.
Wonderful technical contribution to the community thanks!
Thanks Pete, good post. I concur with the above comment. There are a lot of tools on the market that help you pull data and summarise it pretty well. Then, as an SEO professional, your job is just to evaluate the logs and gain relevant insights.
Excellent information, thanks for the post.
Regards
Hi Pete,
Thanks... really, it is very useful. With some technical details, these kinds of records help individuals gain a better understanding of search engine optimization and of how search engines operate.
Very interesting... How can I get these log files? Can I do it from cPanel?
Cheers,
David
Good article and very interesting
You need to have control of all this, and it is clear that problem notifications are necessary.
Great and informative post. I'm an SEO expert and I think this post is a must-read for SEO professionals.
Thanks!!
There are days when I read the Moz blog and the only thing I know is that I know nothing, and that I still have a lot to learn on this subject. Many thanks, Pete.
Every blog post makes me feel like I know nothing...
I loved the post; I'm going to start visiting this site more often.
eadesign.art.br