If you've experienced the joys of doing SEO on an exceedingly large site, you know that keeping your content in check isn't easy. Continued iterations of the Panda algorithm have made this fact brutally obvious for anyone who's responsible for more than a few hundred thousand pages.
As an SEO with a programming background and a few large sites to babysit, I was forced to fight the various Panda updates throughout this year with some creative server-side scripting. I'd like to share some of it with you now, and in case you're not well-versed in nerdspeak (data formats, programming, and Klingon), I'll start each item with a conceptual problem, then the solution (so at least you can tell your developer what to do), and a few code examples for implementation (assuming they didn't understand you when you told them what to do). My links to the actual code are in PHP/MySQL, but realize that these methods translate pretty simply into just about any scenario.
OBLIGATORY DISCLAIMER: Although I've been successful at implementing each of these tricks, be careful. Keep current backups, log everything you do so that you can roll back, and if necessary, ask an adult for help.
1.) Fix Duplicate Content between Your Own Articles
The Problem
Sure, you know not to copy someone else's content. But what happens when over time, your users load your database full of duplicate articles (jerks)? You can write some code that checks if articles are an exact match, but no two are going to be completely identical. You need something that's smart enough to analyze similarity, and you need to be about as clever as Google is at it.
The Solution
There's a sophisticated measure of how similar two bodies of text are using something called Levenshtein distance analysis. It measures how many edits would be necessary to transform one string into another, and can be translated into a related percentage/ratio of how similar one string is to another. When running this maintenance script on 1 million+ articles that were 50-400 words, deleting only duplicate articles with a 90% similarity in Levenshtein ratio, the margin of error was 0 in each of my trials (and the list of deletions was a little scary, to say the least).
The Technical
Levenshtein comparison functions are available in basically every programming language and are pretty simple to use. Running comparisons on 10,000 individual articles against one another all at once is definitely going to make your web/database server angry, however, so it takes a bit of creativity to finish this process while we're all still alive to see your ugly database.
What follows may not be ideal practice, or something you want to experiment with heavily on a live server, but it gets this tough job done in my experience.
i.) Create a new database table where you can store a single INT value (or, if this is your own application and you're comfortable doing it, just add a row somewhere for now). Then create one row with a default value of 0.
ii.) Have your script connect to the database and get the value from the table above. That value represents the primary key of the last article you've checked (since there's no way you're getting through all articles in one run).
iii.) Select that article and check it against all other articles by comparing Levenshtein distance. Doing this in the application layer will be far faster than running comparisons as a database stored procedure (I found the best results occurred when using levenshteinDistance2(), available in the comments section of levenshtein() on php.net). If your database size makes this run like poop through a funnel (checking just 1 article against all others at once), consider only comparing articles by the same author, of similar length, posted in a similar date range, or other factors that might help reduce your data set to likely duplicates.
iv.) Handle the duplicates as you see fit. In my case, I deleted the newer entry and stored a log in a new table with the full text of both, so individual mistakes could later be reverted (there were none, however). If your database isn't so messy, or you still fear mistakes after testing a bit, it may very well be good enough just to store a log and review it by hand later.
v.) After you're done, store the primary key of the last article you checked in the database entry from i.). You can loop through ii.) - iv.) a few more times on this run if it didn't take too long to execute. Run this script as many times as necessary on a one-minute cronjob or with the Windows Task Scheduler until complete, and keep a close eye on your system load. A rough sketch of one pass follows below.
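Here's what one pass of that loop might look like in PHP. The table names (dup_check_state, articles, dup_log) and the levenshtein_ratio() helper are just placeholders standing in for your own schema and whichever ratio function you settle on:

<?php
// Minimal sketch of the cron-friendly sweep from steps i.) - v.) above.
// Assumed names: a state table dup_check_state(last_id INT) with one row, an
// articles table with (id, author_id, body), a dup_log table for reverts, and
// a helper levenshtein_ratio($a, $b) that returns 0-100 similarity (e.g. the
// levenshteinDistance2() mentioned above; PHP's built-in levenshtein() is
// capped at 255 characters, so it won't work on full articles).

$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

// ii.) Get the primary key of the last article we checked.
$lastId = (int) $db->query('SELECT last_id FROM dup_check_state LIMIT 1')->fetchColumn();

// Grab the next article to audit.
$stmt = $db->prepare('SELECT id, author_id, body FROM articles WHERE id > ? ORDER BY id LIMIT 1');
$stmt->execute(array($lastId));
$article = $stmt->fetch(PDO::FETCH_ASSOC);
if (!$article) {
    exit("All articles checked.\n");
}

// iii.) Narrow the candidate set (same author, similar length) before comparing.
$len = strlen($article['body']);
$candidates = $db->prepare(
    'SELECT id, body FROM articles
     WHERE id <> ? AND author_id = ? AND CHAR_LENGTH(body) BETWEEN ? AND ?'
);
$candidates->execute(array($article['id'], $article['author_id'], $len * 0.7, $len * 1.3));

foreach ($candidates as $other) {
    if (levenshtein_ratio($article['body'], $other['body']) >= 90) {
        // iv.) Log the full text of both so the deletion can be reverted later,
        // then drop whichever entry is newer (higher id).
        $db->prepare('INSERT INTO dup_log (kept_id, deleted_id, kept_body, deleted_body) VALUES (?, ?, ?, ?)')
           ->execute(array($article['id'], $other['id'], $article['body'], $other['body']));
        $db->prepare('DELETE FROM articles WHERE id = ?')
           ->execute(array(max($article['id'], $other['id'])));
    }
}

// v.) Remember where we left off for the next cron run.
$db->prepare('UPDATE dup_check_state SET last_id = ?')->execute(array($article['id']));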
2.) Spell-Check Your Database
The Problem
Sure, it would be best if your users were all above a third grade reading level, but we know that's not the case. You could have a professional editor run through content before it went live on your site, but now it's too late. Your content is now a jumbled mess of broken English, and in dire need of a really mean English teacher to set it all straight.
The Solution
Since you don't have an English teacher, we'll need automation. In PHP, for example, we have fun built-in tools like soundex(), or even levenshtein(), but when analyzing individual words, these just don't cut it. You could grab a list of the most common misspelled English words, but that's going to be hugely incomplete. The best solution that I've found is an open source (free) spell checking tool called the Portable Spell Checker Interface Library (Pspell), which uses the Aspell library and works very well.
The Technical
Once you get it set up, working with Pspell is really simple. After you've installed it using the link above, include the libraries in your code and use this function to return an array of suggestions for each word, with the word at array key 0 being the closest match found. Consider the basic logic from 1.) if it looks like it's going to be too much to tackle at once: increment your place as you step through the database, log all actions in a new table, and (carefully) choose whether you like the results well enough to automate the fixes or if you'd prefer to chase them by hand.
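For reference, a minimal sketch of that suggestion function (it assumes the Pspell extension and an English Aspell dictionary are installed):

<?php
// Minimal sketch: returns Pspell's suggestions for a word, with key 0 being
// the closest match (returns an empty array if the word is already correct).
function spell_suggestions($word)
{
    static $dict = null;
    if ($dict === null) {
        $dict = pspell_new('en');        // load the English dictionary once
    }
    if (pspell_check($dict, $word)) {
        return array();                  // spelled correctly, nothing to do
    }
    return pspell_suggest($dict, $word); // suggestions, best match first
}

// Example usage: grab the top suggestion for an obvious typo.
$suggestions = spell_suggestions('recieve');
echo isset($suggestions[0]) ? $suggestions[0] : 'no suggestion';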
3.) Implement rel="canonical" in Bulk
The Problem
link rel="canonical" is very useful tag for eliminating confusion when two URLs might potentially return the same content, such as when Googlebot makes its way to your site using an affiliate ID. In fact, the SEOmoz automated site analysis will yell at you on every page that doesn't have one. Unfortunately since this tag is page-specific, you can't just paste some HTML in the static header of your site.
The Solution
This assumes that you have a custom application, i.e. you can't simply install All in One SEO or a similar SEO plugin on your WordPress site (because if you can, don't reinvent the wheel). If that's the case, we can tailor a function to serve your unique purposes.
The Technical
I've quickly crafted this PHP function with the intent of being as flexible as possible. Note that desired URL structures differ between sites and scripts, so think about everything that's installed under a given umbrella. Use the flags mentioned in the description section so that it can best mesh with the needs of your site.
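As a rough sketch of the general shape (the flag names here are illustrative placeholders, not necessarily the ones in the full function):

<?php
// Rough sketch of a bulk canonical helper. The flags ($keepQueryKeys,
// $forceHttps, $stripWww) are illustrative placeholders for the kinds of
// options you might want; adapt them to your own URL structure.
function canonical_tag($keepQueryKeys = array(), $forceHttps = false, $stripWww = false)
{
    $host = strtolower($_SERVER['HTTP_HOST']);
    if ($stripWww && strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    $https  = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';
    $scheme = ($forceHttps || $https) ? 'https' : 'http';

    // Split the requested URL into its path and query string.
    $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $query = parse_url($_SERVER['REQUEST_URI'], PHP_URL_QUERY);
    parse_str($query ? $query : '', $params);

    // Keep only whitelisted parameters; affiliate IDs, session IDs and other
    // tracking junk disappear from the canonical URL.
    $params = array_intersect_key($params, array_flip($keepQueryKeys));
    $qs     = $params ? '?' . http_build_query($params) : '';

    return '<link rel="canonical" href="'
         . htmlspecialchars($scheme . '://' . $host . $path . $qs) . '" />';
}

// In your shared header template:
// echo canonical_tag(array('page')); // keep only the pagination parameter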
4.) Remove Microsoft Word's "Smart Quote" Characters
The Problem
In what could be Microsoft's greatest crime against humanity, MS Word was shipped with a genius feature that automatically "tilts" double and single quotes towards a word (called "smart quotes"), in a style that's sort of like handwriting. You can turn this off, but most don't, and unfortunately, these characters are not a part of the ASCII set. This means that various character sets used on the web and in databases that store them will often fail to present them, and instead, return unusable junk that users (and very likely, search engines) will hate.
The Solution
This one's easy: use find/replace on the database table that stores your articles.
The Technical
Here's an example of how to fix this using MySQL database queries. Place a script on an occasional cron in Linux or using the Task Scheduler in Windows, and say goodbye to these ever appearing on your site again.
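For example, a set of queries along these lines; the articles table and body column are placeholders for your own schema, and as always, back up the table first:

-- Hedged example of the kind of cleanup queries described above.
-- These are the UTF-8 curly quote characters as they appear when stored
-- correctly; if yours were mangled into mojibake (e.g. â€œ), run the same
-- REPLACE() calls against those byte sequences instead.
UPDATE articles SET body = REPLACE(body, '‘', '\'');  -- left single quote
UPDATE articles SET body = REPLACE(body, '’', '\'');  -- right single quote / apostrophe
UPDATE articles SET body = REPLACE(body, '“', '"');   -- left double quote
UPDATE articles SET body = REPLACE(body, '”', '"');   -- right double quote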
5.) Fix Failed Contractions
The Problem
Your contributors are probably going to make basic grammar mistakes like this all over the map, and Google definitely cares. While it's important never to make too many assumptions, I've generally found that fixing common contractions is very sensible.
The Solution
You can use find/replace here, but it's not as simple as the smart quotes fix, so you need to be careful. For example, "wed" might need to be "we'd", or it might not. Other contractions might make sense standing on their own, but find/replace by itself will also match pieces of other words. So we need to account for this as well.
The Technical
Note that there are two versions of each word. This is because in my automated proofreading trials, I've found it's common not only for an apostrophe to be omitted, but also for a simple typo to put the apostrophe after the last letter when Word's automated fix isn't on hand. Words have also been surrounded by a space to eliminate a margin of error (this is key: just look at how many other words include 'dont' on one of those sites people use to cheat at word games). Here's an example of how this works. This list is a bit incomplete and probably leaves the most room for improvement; feel free to generate your own using this list of English contractions.
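Here's a minimal sketch of how that replacement pass might look in PHP. The word list is deliberately tiny and illustrative, ambiguous cases like "wed" are left out on purpose, and fix_log is just an assumed name for a logging table:

<?php
// Minimal sketch of the contraction fix. Each entry appears twice: once with
// the apostrophe missing and once with it typo'd after the last letter, and
// every search term is padded with spaces so we never touch the middle of
// another word.
$fixes = array(
    ' dont '  => " don't ",
    " dont' " => " don't ",
    ' cant '  => " can't ",
    " cant' " => " can't ",
    ' wont '  => " won't ",
    " wont' " => " won't ",
    ' isnt '  => " isn't ",
    " isnt' " => " isn't ",
);

$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
foreach ($db->query('SELECT id, body FROM articles') as $row) {
    $fixed = str_replace(array_keys($fixes), array_values($fixes), $row['body']);
    if ($fixed !== $row['body']) {
        // Log the before/after so any individual fix can be reverted by hand.
        $db->prepare('INSERT INTO fix_log (article_id, old_body, new_body) VALUES (?, ?, ?)')
           ->execute(array($row['id'], $row['body'], $fixed));
        $db->prepare('UPDATE articles SET body = ? WHERE id = ?')
           ->execute(array($fixed, $row['id']));
    }
}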
That should about do it. I hope everyone enjoyed my first post here on SEOmoz, and hopefully this stirs some ideas on how to clean up some large sites!
Thanks Corey for the idea of using the Levenshtein distance equation to rectify duplicate content problems. I applied the above solutions to my site running on PHP, but I have one more site that runs on ASP, so what would be the smart way to solve the duplicate content problem for sites running on IIS?
Thanks!!
Sure Ajay, the approach will work just as well in other languages. Here's an ASP Levenshtein distance function that Google shot back at me:
https://snipplr.com/view/9094/levenshtein-distance/
I haven't tested this one yet, so you may want to experiment with a few others (i.e. like above, PHP's default levenshtein() does not give the best results). Worst case scenario, PHP runs surprisingly well on IIS as well, or you can even create a database stored procedure with one (note the Levenshtein function, and the second "helper" function that produces a 0-10 ratio), though this one is much harder on your system resources:
https://www.artfulsoftware.com/infotree/queries.php#552
And here's another gorgeous-looking option that uses C# (in case that was ASP.NET :) )
https://www.java2s.com/Open-Source/ASP.NET/Validation/nvigorate/Nvigorate/Common/Levenshtein.cs.htm
Thanks Corey for the immediate and helpful response, thumbs up for you! :)
IIS7 has some great features to deal with this: IIS extensions for URL rewriting, plus lots more.
Wow!
This was a great post. Lots of tips and implementation details.
Nice! Thanks!
When I read this kind of article, I always regret not having a developer background. This is the kind of article I'll never be able to write, but knowing there are such generous people in the SEOmoz community who share their knowledge makes me feel less disappointed.
Thanks, Corey, really a great post, even for a non-code-geek like me.
Like you, I too regret not having a developer background :(
Yes, we can't write such articles, but I'm very thankful to Corey and others who post about coding on SEOmoz, which we can discuss with our internal developers who can sort out the issues on our sites.
Thanks Corey :)
Glad to help, guys! I'd read so many articles on analysis that it seemed like something tackling a few big implementation issues was overdue. :)
I agree. Sometimes I feel like I need to not only be a writer and SEO but also a designer, developer, etc. With all the changes in the world of Google and Social Networking it seems that more and more is expected of SEOs all the time.
Really interesting to get such a technical perspective sometimes. These little tidbits add to our arsenal as SEOs, and I can see these items helping efficiency in getting stuff done. Keep them coming! Great post!
As someone who is more technical developer than SEO, it's great to read a more technical post on here. There are lots of things you can do at a technical level to improve SEO, from canonicalisation to schema.org implementations, etc. I might have to write some articles on it myself in the new year.
Will look forward to reading them!
Really nice article from a programmer's point of view. I've just forwarded this blog post link to my PHP developer; he might be interested in it.
Glad you liked it!
Great post, a lot of practical advice backed up with code and detailed explanations! I'm sure you have probably done this already - how about a script that does all this in one shot? Maybe you need to specify a few parameters or have a config file to go with it, but you could have quite an effective fixmydatabase.php-type script there.
It's a good idea. I do have some maintenance scripts that use each of these methods (though #1 and #2 are intense on a level to where they seem to need to be run on their own). The big issue is that everyone's situation is a bit different, so the ideal application of this stuff could be as well.
Yep, for sure it would be difficult to make something that worked across the board, but maybe some generic checking script or parameterised function that would allow easy customization or integration.
Maybe someone that is really proficient at PHP could even create a class that allowed API type access.
Just throwing it out there, I like PHP and automation. I am using automation in my SEO process more and more; having historically been quite against it, now I am seeing some areas where it makes sense.
Levenshtein is IMHO not meant to be an algorithm for detecting duplicates. I would rather take a look into shingles. There's a fantastic paper by Andrei Z. Broder covering that topic: https://clair.si.umich.edu/si767/papers/Week03/Similarity/nearduplicate_broder.pdf
Though not as easy as using Levenshtein (there's no built-in function AFAIK), it is much more accurate.
This looks really interesting, thanks for the share. While Levenshtein worked pretty much perfectly when I needed it, it's not without limitations (ie. in a scenario where we might want to look for smaller strings of duplicate content within much larger bodies of text). Have you implemented Shingles for this purpose in the past?
I'm using it to measure the degree of uniqueness for the articles generated by my article spinner (e.g. re-generate an article if it's too close to one that was generated beforehand). The algorithm itself is pretty straightforward:
1. normalize the text (lowercase, remove special chars, etc. - see https://www.miislita.com/information-retrieval-tutorial/indexing.html for some further ideas ;))
2. take all 3-word-shingles
3. get a fingerprint of each shingle (I'm using Rabin: https://en.wikipedia.org/wiki/Rabin_fingerprint )
4. pairwise comparison of each article you want to check
Levenshtein fails pretty hard at near-duplicates (real-world example: a huge portion of quoted text but only a small fraction of unique text on the same page). I haven't used it for duplicate detection on "the same website" so far, but I have some ideas for a neat little application that can do something like this :)
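For illustration, here's a minimal sketch of that flow in PHP; crc32() just stands in for a real Rabin fingerprint, and the comparison is a plain Jaccard overlap of the two fingerprint sets:

<?php
// Minimal sketch of the shingling approach described above.
function shingle_fingerprints($text, $size = 3)
{
    // 1. normalize: lowercase, strip everything but letters, digits and spaces
    $text  = strtolower(preg_replace('/[^a-z0-9\s]+/i', ' ', $text));
    $words = preg_split('/\s+/', trim($text));

    // 2./3. take every 3-word shingle and fingerprint it
    $prints = array();
    for ($i = 0; $i <= count($words) - $size; $i++) {
        $prints[] = crc32(implode(' ', array_slice($words, $i, $size)));
    }
    return array_unique($prints);
}

function shingle_similarity($a, $b)
{
    // 4. pairwise comparison: shared shingles / total distinct shingles
    $fa = shingle_fingerprints($a);
    $fb = shingle_fingerprints($b);
    $shared = count(array_intersect($fa, $fb));
    $total  = count(array_unique(array_merge($fa, $fb)));
    return $total ? $shared / $total : 0;
}

// e.g. treat anything above roughly 0.5 as a near-duplicate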
Looks simple and sound enough. Would love to see that application if you get to creating it.
Hey Guys,
Great info! I have a question that I wanted to ask before I say something stupid ;)....
1) What exactly is considered a 3-word-shingle and is it worth investing the time to learn if I just want to compare two articles for uniqueness?
I could not really figure out how Corey is using Levenshtein. I assume you are passing in 2 full articles and using the distance returned for those 2 articles as a decision point on uniqueness; that seems extremely simple! Assuming this is the case, on a 500-word article, what would you assume to be unique enough to get around Panda ;)?
I have been building a very advanced article spinner for years. I can put articles into Copyscape and I never go above 10%. I am talking about spinning 1000+ articles which are well written, with less than 10% similarity when using Copyscape. The problem with this is that I am going to go broke doing article comparisons to determine when I need to re-spin ;).....
I would love to share ideas and strategies at any time. Feel free to contact me directly or to go through the forum. Either way works for me. I really enjoy this stuff. It is like a challenge from Google! So far, everything has worked great for me but I have to kill this copyscape cost for checking duplicate content. So, any open source or free comparison checks are obviously well received over here....
One comment: I noticed that no one seems to be doing any grammar checks. Is there a reason that you are skipping grammar? Unless this is achieved through the spell checker above...
At the last PubCon I attended in Vegas, there was a clear message that Google hired 1,000 human reviewers to sniff out garbage. While article spinners can sometimes be very good, a good human reviewer can probably spot that. I realize this is fairly far-fetched and probably worth the risk; I am just curious what you think.
For some interesting info.... I have a flow on my site that sells products. It simply takes a feed from Commission Junction and dumps out the products. The website itself was so strong, I ranked for many products and was making some nice money. One day, I lost 5K visitors. BOOM. GONE! I had no idea what happened. I looked at everything on my site (a national directory of vendors) and I could not find any dip in my traffic. Finally, I found that the specific product pages where I show the products from the feed had killed me. All traffic from those pages DISAPPEARED overnight. I realize this is probably a Panda update. I am very excited because now I get to use my article spinner to see how good it is ;). If I can get my traffic back, I am in business probably for a few years until they come up with something else....
Would love to hear your insight and feedback into this.
I hope I did not ask too much. I am really into this stuff, and the challenge of outsmarting Google seems to be an addiction now. I hope my questions don't make me sound stupid. I know more than meets the eye. I just have never heard of shingles other than as a disease that someone gets!
Thanks for your post. Amazing info!
Hey thanks.
I'll leave the shingles question to Hirnhamster as I haven't implemented it.
Regarding Levenshtein, you're basically right on. No doubt you can spin an article into having a higher Levenshtein ratio than my pretty conservative 10% (even at 20% it was pretty dead-on, but since I was also automating deletions, I didn't want to push my luck). It gets thrown off further when the amount of one piece of content that you're comparing is significantly increased/decreased (what appears to be the greatest advantage of shingles). It's worked great for me, but definitely may not for a number of scenarios. In all, however, I bet you could come up with more accurate metrics on similarity using Levenshtein than with tools like Copyscape.
Regarding grammar checking, you'd definitely be right: it's important and well worth consideration. I have a few more functions beyond the contraction fix above... this was originally going to be a series, but I backed off a bit when I got to writing and thought I'd just put some of my best utilities forward... the best approach that I have found is to just think about one grammar-related fix at a time. So far as I'm aware, there is no all-in-one open source grammar correction utility out there (though if there is, someone please chime in!).
Regarding your situation in general, I have read stories on the black hat forums that a number of autoblogs/spinners are still absolutely thriving after all the Panda updates, in spite of the punishment dealt to a number of more white hat sites (maybe Google should have set one person loose to download the dozen or so mainstream article spinners and reverse-engineer them instead :) ). Sooner or later, you'd think that they have to catch on; however, black hat/spinning still strikes me as a very short-term "take the money and run" art form.
Corey,
What did you mean here:
"(pretty convservative 10%... even at 20% it was pretty dead-on, but since I was also automating deletions, I didn't want to push my luck). "
Are you referring to copyscape results?
I am mostly trying to understand distances and where you can assume two articles are unique. I ran two 500-word articles through Levenshtein and it returned 1450 changes required to transform one into the other. I realize that the smaller the number the better, but is there a threshold that you look for in terms of changes required to make two things equal?
Unless you are implying the following.
Assume a 500-word article.
10% would be 50. If Levenshtein suggests 50 or fewer changes to make the two articles equal, they are too similar?????
Just looking to understand how to utilize the values.
I don't like to think of my article spinner as black hat. I consider it a very sophisticated bulk writer ;).
Thanks
EDIT: I somehow wrote out a big response and got only a quotation? Noooo!
"Are you referring to copyscape results?"
I was actually referring to my own trials with that bit. I don't use Copyscape... it seems like more of a marketing tool for article spinners than anything; I've never seen a spun article fail to pass it.
"I don't like to think of my article spinner as black hat. I consider it a very sophisticated bulk writer ;)."
Hey whatever works for you. I actually spend about as much time reading the black hat boards as I do snowy white hat resources like SEOMoz, SEL, and SEW; lots of fascinating things people are experimenting with that most white hats would never dare attempt. I just gravitate towards white hat in professional practice. :)
Any chance to get an idea on this?
I am mostly trying to understand distances and where you can assume two articles are unique. I ran two 500-word articles through Levenshtein and it returned 1450 changes required to transform one into the other. I realize that the smaller the number the better, but is there a threshold that you look for in terms of changes required to make two things equal?
I have looked everywhere and cannot understand at what point I can consider two articles to be unique given the results from Levenshtein!
Can you help?
That seems wrong... are you using the PHP levenshteinDistance2()? That should never be greater than 100. Some functions give you a number that then needs to be converted to ratios/percentages by a second function, like the MySQL example. I would just work off of those numbers.
Truly unique articles actually haven't seemed to return greater than 50 / .5 for me, but I keep it kind of conservative because smaller data sets seem to throw it off (i.e. the few that might slip by with just 5 words).
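If you're working from a raw edit distance instead, the conversion is just arithmetic; a rough sketch:

<?php
// Rough sketch: convert a raw Levenshtein edit distance into a 0-100
// similarity percentage so numbers from different implementations are
// comparable. (PHP's built-in levenshtein() is capped at 255 characters, so
// for full articles feed this the distance from a long-string variant such as
// the levenshteinDistance2() mentioned in the post.)
function distance_to_similarity($distance, $lenA, $lenB)
{
    $maxLen = max($lenA, $lenB, 1);              // avoid division by zero
    return max(0, 1 - $distance / $maxLen) * 100;
}

// The 1450-edit example above: assuming two articles of roughly 3,000
// characters each, that comes out to about 52% similar.
echo distance_to_similarity(1450, 3000, 3000);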
Does anyone know of any WordPress plugin that can spell check all previous posts? I tried looking for one but I can only find plugins that proofread your post before you publish it.
I'm not aware of one (this was all made to work with a custom application), but provided that a good one doesn't exist, a killer tool could be polished in about an hour:
i.) Select all articles from the database
ii.) Trim out punctuation and use explode() to break out the individual words with a space character as the delimiter
iii.) Check using the pspell function and echo what didn't return exact matches
Done.
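In rough PHP, with placeholder table and column names, that might look something like:

<?php
// Rough sketch of the three steps above; posts, ID and post_content are
// placeholder names for your own table and columns.
$db   = new PDO('mysql:host=localhost;dbname=blog', 'user', 'pass');
$dict = pspell_new('en');

foreach ($db->query('SELECT ID, post_content FROM posts') as $post) {
    // i.) / ii.) strip tags and punctuation, then break out the individual words
    $clean = preg_replace("/[^a-zA-Z' ]+/", ' ', strip_tags($post['post_content']));
    foreach (explode(' ', $clean) as $word) {
        // iii.) report anything Pspell doesn't recognize, with its best suggestion
        if ($word !== '' && !pspell_check($dict, $word)) {
            $suggestions = pspell_suggest($dict, $word);
            echo "Post {$post['ID']}: '$word' -> "
               . (isset($suggestions[0]) ? $suggestions[0] : 'no suggestion') . "\n";
        }
    }
}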
Corey - this is ace! I passed your above comment to our Dev and he's fixed tonnes of typos and spelling errors on our site, Cheers :)
Awesome!
First, congratulations on your promotion. Thank you for the detail you explain in your article. Some of it is applicable for me, but the problem is I run my blog on Blogger, so for number 3 I need more guidance from you. Hope you can help me, thanks.
Thanks. Blogger should take care of #3 for you (if you 'View Source' on one of your pages and CTRL+F for 'canonical' you can make sure).
I actually didn't understand some of the technical things in here but looking at the facts they do make sense. And I am sending the link to my programmer in a short while.
I didn't know that was possible, Levenshtein Distance sounds like a clever trick to clean up duplicate content reasonably quickly.
This stuff looks cool, definitely some good scripts to add to the arsenal.
Thanks for the post. Already liked it. It can save valuable time. For the spell check, is there any .NET version of the script?
Glad you enjoyed it Usef. Google returned this .NET variation for me:
https://aspell-net.sourceforge.net/
As in my response to Ajay above re: Levenshtein functions, I haven't actually tested Aspell.Net. It's worth a shot, though, and worst case scenario, PHP does run quite well on IIS platforms. It could still be used with Pspell for the sole purpose of auditing your database.
Hi Rand,
Have been reading your posts on a regular basis. They are really very helpful.
I have one query that I haven't been able to solve despite trying. I have my own automotive websites, which went down when Google Panda was updated.
I will be very thankful if you can help me out to get those sites back in search engine
Waiting for your response
Actually Rand didn't write this one. I hope you don't mind my commenting instead. :)
Unfortunately, search engines are complex on a level to where there is no one instant piece of advice that I think anyone can give you. But depending on when your site dropped (ie. if it was with Panda, and which iteration of Panda), you can learn a lot.
Panda on the whole is geared towards on-page site quality (content/code you control). Though keep in mind that it's likely affecting Google's perception of the quality of sites that link to you as well, so in that sense, your off-page SEO could somewhat be fair game as well. Cleaning up duplicate content and overall quality, and nipping out content that's too "thin" and brief has been big. There's been a lot of discussion of other factors as well, everything from bounce rates to grammar (Office 95 could identify good English, you can be sure that Google in 2012 can too).
With Panda 2.5 especially, it seemed that sites that lacked rich media seemed to lose out, based on my interpretation of the "panda 2.5 biggest winners and losers" articles that were going around in October, and the trends that I saw in all of our clients' sites (although that's just my interpretation). Your best bet is to evaluate your site based on every theory that's white hat, seems supported by solid evidence, and most importantly, written by someone that seems to have a clue. There are also some nice tools out there to help you along with the basics if you're new at this (SEOmoz's toolset could definitely be a nice start).
Like the tip on Levenshtein distance analysis - and not just because I like saying the name Levenshtein ;)
Thanks for the article. I don't understand all of it but I appreciate the breakdown and the solutions. I hadn't thought about some of these concerns in the past. Thanks for your help.
Great, so we have to be very confident in our content. What Google wants from us is unique, excellently written content that users love. Now things like duplicate content matter, because Panda is running alongside Google to filter out all this waste and make search results higher quality!
Corey, would it be possible to substitute the faux smart quotes with their HTML character code equivalents (e.g., &ldquo;)?
Sure, the HTML character code could be a slightly better option (although I personally prefer the non-tilted version). What I really try to avoid is situations where the raw character is converted on the page and you just see those "square" icons (which most webmasters have undoubtedly seen before), or □. I've also seen them rendered in a wide variety of other weird ways in different scenarios. Depending on your database and page character settings, you may be immune from this particular issue, but I like to trim them out all the same for a variety of reasons.
I've also heard a few SEOs suggest using real quotation marks any time you quote in order to avoid possible duplicate content issues. Logically, the idea seems like a pretty sound way to identify a third party, and could even fit in the "if I were Google" column, but I don't know if that one is confirmed beyond a theory. So in this way, it could just be one of my unconfirmed SEO superstitions for the time being; I still personally prefer to keep my sites this way.
Excellent scripting examples, thanks for the insight!
I was really surprised to read this article. Obviously you mentioned great points about SEO, and this article should spread to other SEOs as well as programmer fellows.
Thanks for sharing this article with this community.
Really, really nice blog post. Every one of the 5 points will be useful to me. I can only say THANKS!
Thanks for sharing useful information. I have a duplicate content issue with one client website and keep suggesting the company sort this out, but unfortunately I'm still waiting; it's been just two and a half years. Wish me luck :)
Awesome first post, Corey. Especially love the use of Levenshtein for duplicate articles. :)
Great read Corey. SEO combined with clearing up technical stuff is the best way to optimize our websites; they correlate with each other. Canonical has a big impact in many ways, like if you have implemented pagination or dynamic URLs. The Levenshtein code is quite a good one for sure.
Recently I have seen many wrong URLs generated through various parameters in my Webmaster Tools account. I have mainly removed them, but some are still there, and the parameters are session_id, p_id, etc. It would be great if you could drop your feedback on this.
These points are really useful for me as well as many techy guys here.
In an ideal world, we'd get those session IDs out of your URLs entirely, but if the content that Google should see isn't dependent on those GET variables, I'd absolutely play with the function above. It should wipe away what sounds like a pretty significant duplicate content issue for you.
You can also do a "site:yourdomain.com" search to see how many duplicate versions of your pages are really getting indexed. This can be more important than one might assume... as much as Google always seems to claim that they're getting better at recognizing these sorts of things on their own, I still have clients approaching me regularly who seem shocked when we show them 50+ versions of their homepage in Google's index.
Thanks Corey for the quick response. I have done the exercise you suggested. I got a few URLs and am just 301-redirecting them or removing them through Webmaster Tools.
Good points there - I'll get that over to the development team to run through those 5 steps; let's see what we can come up with! Hopefully, if I'm doing my job properly, not many issues :$
Thanks for the explanations on multiple "levels", it's really useful for when you're working with multiple people with various areas of expertise.
Thanks, was hoping that would result in a clearer message.
Those are some great ideas, thanks for sharing!
Thanks for some practical solution tips. I could surely use at least a couple of these.
Oh Panda, how I love you; thanks for the post. Duplicate content is never going away, and that's the problem.
The spell check script link doesn't work.
Great guide though!
Woops, looks like it's just the wrong link. This is the one you want: https://pastebin.com/aVPnJSK3
That's really just to show you how it works, though; you'll want to download Pspell before it will actually do anything.
Link is fixed!
Thanks Keri!
Spell check the database with this tiny little smart shit... this is super amazing, mate! I have played a bit with bulk canonical, but this spell check thing seriously deserves some time to play with!
Great work!
tiny little smart shit = a much better title for this post
Thanks for the kind words, hope something here helps!
Thanks Corey for this stupendous post.
Like I said, I'm not into scripting and all that, but yes, I'm very thankful to you for the post. I love the way you showed the canonical issue. Passed this post to my developer and looking to implement the same on my client's site.
Thanks a lot.
Great post Corey! Some very interesting programming for taking care of those pesky duplicate content articles! Thank you for sharing your developer knowledge, as it gets the wheels turning for other tools and methods for SEO!
Oh! Congrats! I hadn't heard about this stuff before. To me it's completely new! I will need some high-level technical help to properly test it.
Those programming tactics are useful for dealing with Panda updates. Thanks Corey!
Brilliant :) Appreciate the high-level tips. I also think author tags in the new microdata format will help sites in the long run.
Good technical advice, though not really geared towards fighting web content spam or Panda. Will definitely implement a few of these.
Excellent article, explaining multiple utilities at different levels from a programming point of view.