A lot of things can go wrong when you change most of the URLs on a website with thousands or millions of pages. But this is the story of how something went a little too "right", and how it was fixed by doing something a little bit "wrong".
The Relaunch Timeline
On February 28, 2012, FreeShipping.org relaunched with a new design and updated site architecture. The site's management and developers were well-versed in on-site SEO issues and handled the relaunch in what many SEOs might consider "textbook" fashion. This included simultaneous 301 redirects from all previous URLs to their specific counterparts in the new URL structure. All internal links were updated immediately, as were the sitemap files, rel canonical tags and all other markup.
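For illustration, redirects like these typically live in the web server config. Here is a minimal .htaccess-style sketch - the URL patterns are hypothetical stand-ins, not FreeShipping.org's actual paths:

```
# Hypothetical sketch: 301 each old URL pattern to its counterpart in the new structure
RewriteEngine On

# e.g. /stores/example-store.html  ->  /store/example-store/
RewriteRule ^stores/([a-z0-9-]+)\.html$ /store/$1/ [R=301,L]

# One-off pages that don't fit a pattern get explicit rules
RewriteRule ^free-shipping-day\.html$ /free-shipping-day/ [R=301,L]
```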
They had expected some lag time and a temporary loss in rankings, but traffic began a dramatic decline immediately after the relaunch, and a week later it was still falling.
On March 7, FreeShipping.org contacted seOverflow to make sure they had done the redirects properly. Everything seemed to check out. A scan of the site revealed only a few 404 errors from internal links, limited to a few outlying blog entries. All of the old URLs were returning a 301 response code pointing to the new URLs, which returned a 200 response code. The XML sitemap was using the new URLs, as was all internal navigation, rel canonical tags and other on-site links. By all indications, the developers had implemented a major site redevelopment flawlessly...
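As an aside, that kind of spot check is easy to script. Something like the following - the URLs are made up for illustration - confirms that an old URL answers with a 301 pointing at a new URL that itself returns a 200:

```
# Old URL: expect "301 -> http://www.freeshipping.org/store/example-store/"
curl -s -o /dev/null -w "%{http_code} -> %{redirect_url}\n" http://www.freeshipping.org/stores/example-store.html

# New URL: expect a plain "200"
curl -s -o /dev/null -w "%{http_code}\n" http://www.freeshipping.org/store/example-store/
```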
Too flawlessly. A site:domain.com search revealed that many of the old URLs were still indexed alongside the new ones, and had not been re-cached since the relaunch of the site a week earlier. Log files revealed that Google had not been back to visit most of the old URLs. Crawlers had no link path available to reach most of them, so any page whose previous version had not been recrawled yet (i.e. any page without prominent external links) was seen as a duplicate.
Knowing how fast and accurate their developers are, I proposed they turn the old linking structure back on for a while so the internal links on category pages would send crawlers through the redirects first. This ensures they see the 301 status code and can update the index accordingly, rather than assuming that the old page is still active alongside the new page for weeks or months. This is slightly different from what I used to prescribe, which involved resubmitting an old sitemap (more on that later). It is important to note that only the navigation links changed back - all other markup still reflected the new URLs. Changing the rel canonical, Open Graph or Schema markup, for instance, would not be recommended. All they needed was an easy crawl path to the now-redirected URLs.
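To make that distinction concrete, here is a rough sketch of what a category page's markup might have looked like during that window (the URLs are hypothetical examples, not FreeShipping.org's actual paths):

```
<head>
  <!-- Canonical (and all other markup) keeps pointing at the NEW URL -->
  <link rel="canonical" href="http://www.freeshipping.org/store/example-store/" />
</head>
<body>
  <!-- Navigation temporarily points back at the OLD URL, which 301s to the new one,
       so crawlers following the nav pass through the redirect and update the index -->
  <a href="http://www.freeshipping.org/stores/example-store.html">Example Store</a>
</body>
```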
On March 8, about halfway through the day, they flipped the switch to turn the old internal link URLs back on, and traffic from search more than doubled that same day. It maintained a steady climb until traffic from search stabilized above pre-relaunch levels.
On March 12 the internal links were switched back over to the new URLs, and traffic from search has remained at or above pre-relaunch levels.
Rethinking Overthinking Sitewide Redirect Best Practices
I'd seen this situation before and had always advised resubmitting the old XML sitemap to ensure the legacy URLs got recrawled faster than the weeks or months it could take search engines to revisit a page without a link from somewhere. But recent statements from Bing caused me to think twice about that recommendation. And this great post by John Doherty had me wondering the same about submitting a "dirty" sitemap to Google.
What Bing Says...
"Only end state URL. That's the only thing I want in a sitemap.xml. We have a very tight threshold on how clean your sitemap needs to be... if you start showing me 301s in here, rel=canonicals, 404 errors, all of that, I'm going to start distrusting your sitemap and I'm just not going to bother with it anymore... It's very important that people take that seriously." - Duane Forrester, Senior Product Manager, Bing Webmaster Tools
“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. Examples of dirt are if we click on a URL and we see a redirect... If we see more than a 1% level of dirt, we begin losing trust in the Sitemap”. - Duane Forrester, Senior Product Manager, Bing Webmaster Tools
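"Clean" here simply means the sitemap lists only end-state URLs - each entry returns a 200, is not redirected, and is its own canonical. A minimal example with a hypothetical URL:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- End-state URLs only: each returns 200, with no redirect or canonical pointing elsewhere -->
  <url>
    <loc>http://www.freeshipping.org/store/example-store/</loc>
    <lastmod>2012-02-28</lastmod>
  </url>
</urlset>
```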
In preparation for this post I asked for some clarification. I'm not sure how "clear" this makes it, as the seriousness of the statements above seems to be at odds with the following advice:
What I Took Away From All of This...
#1 Despite what I've heard during several interviews and straight from him at conferences, it seems like Bing will let you get away with more than 1% of "dirt" on your sitemap, at least if it isn't an ongoing thing. Sometimes I get the feeling Duane Forrester makes some stuff up as he goes along, which is fine. Sometimes it is better to be decisive and give an actionable answer than to hedge your bets by talking on and on without actually saying anything (*Ahem).
#2 As long as your old URLs redirect to the new ones it is OK, perhaps even preferable, to leave the old internal links up for a while. Best practice for redirects has always been to update all of the links you have control over. This is for several reasons. First, it helps you avoid multiple redirect hops if/when it comes time to change all of the URLs again. It is also good htaccess housekeeping, since old redirect rules can often get broken without being noticed during the QA process. Last but not least, according to Matt Cutts a 301 redirect does not pass 100% of PageRank on to the destination page. However, losing out on a tiny percentage of inherited PageRank for a few days and having a good excuse to procrastinate on housekeeping is better than having your traffic drop off a cliff for weeks or months at a time.
#3 The old adage about "Knowing enough to get yourself into trouble" is as true as ever.
#4 Leaving the old links up for a few days seems to work equally well across major search engines. The Google Analytics screenshot above shows traffic from all search engines, but looking at just Yahoo, Bing or Google individually tells pretty much the same story.
#5 You can do it either way. Since every site is different it is good to have more than one option. One could stick with the XML sitemap resubmission to each of their webmaster tools accounts as a best practice, and that "should" work just fine. Given the results of this case study I'm going to recommend that most clients leave up the old internal links (especially nav and category links) for about one week after re-launching a website with new URLs on the same domain (a new domain is slightly different, and you can use the change of address tools).
#6 Domain Authority doesn't necessarily mean squat for weak internal page crawling. Free Shipping Day was the third largest online shopping day of the year in 2010 and 2011. FreeShipping.org is the only official sponsor, and benefits from massive amounts of press coverage. The site has about 12,700 links from about 1,110 domains, including the New York Times, CNN, MSN, TIME, Huffington Post, Mashable, USA Today, Forbes... Not bad for a coupon affiliate. Yet it was a week after the relaunch, and both Google and Bing were uninterested in revisiting any of the FreeShipping.org pages in their indexes that didn't have their own strong external links.
I really enjoy these case studies - every day, I'm seeing how what works in theory and what works in practice (not to mention what the engines tell you to do) can be very different for any given site. With something like a mass 301 redirect, I always warn people that there's risk. Even if you do it 100% "right", it doesn't always go as planned. Even the best SEOs can't guarantee a trouble-free transition. The best SEOs just know how to fix things when they go wrong and learn when to break the rules.
Hi Everett,
This is an interesting post with some more interesting findings. Making sure that new URLs get indexed as quickly as possible is always one of the biggest challenges in any site migration.
From my experience, there are quite a few ways to help spiders crawl and index the new pages:
I find that combining them works really well in most cases.
I would also like to comment on the Analytics graph you've shared. Typically, on a PR5 website like freeshipping.org, it shouldn't take more than 3-4 days for the top-level pages to get indexed, and 1-2 weeks for the deeper ones. I'm just wondering whether the fact that traffic went up on the 8th is pure coincidence and has nothing to do with turning the old URLs back on. What if that was the day that Google had actually found those deeper URLs?
Furthermore, I must admit that I haven't really understood this statement:
"They had no link path available to reach most of them, so any page with a preivous version that had not been recraweled yet (i.e. any page without prominent external links) was seen as a duplicate."
I just can't see any duplication issues given that 301s have been in place, maybe something isn't very clear?
Once again, thanks for sharing your thoughts and findings!
Modesto,
Thanks for sharing your strategy for this sort of situation.
I don't think it was coincidence, because Google wasn't revisiting the old pages at all until the day this change went live; then suddenly they revisited most of the old pages. Log files tell an important part of the story, and if this had been my personal site I would have shared them.
The duplication issue comes because the search engine hasn't seen the 301 yet. They don't know that there is a 301 in place until they visit the old URL, and if they have no path to get to the old URL they may not revisit it for quite some time. In my experience, the old URLs with prominent external links tend to get recrawled more often, along with those that regularly get fresh content. But when you have hundreds or thousands (or more) pages without any internal OR external links this can become a problem. How do they know about your 301 until they try to access the page and see it?
I hope that clears things up. I'm sure there are plenty of ways to skin this cat, but I wanted to share one that worked for me recently.
Thanks for the detailed response. It all makes perfect sense now. Luckily I've never had to deal with a similar situation where there were no paths at all to the old URLs.
Many thanks for the great insights!
Very informative and helpful addition to the post, Modesto!
Thanks Everett! This post explains a lot of things I was wondering about... and it is going to be of great help for me :)
Very interesting finding! Indexing of the new URLs is very important for a website, especially if it has completely changed its URL structure. Here are the practices that I follow to indicate to search engines that the change has been made.
> Update the .xml sitemap and use the old URLs of the main pages in the footer
> Tweet or +1 the major pages that contain a good amount of internal links pointing to different pages of the website
> Build some easy content-based links (even if nofollow) pointing to different areas of the website. Dropping relevant links in forums will help.
> Build some internal links to different pages (using the new URLs)
> Set canonical tags across the website
I do believe that having a clean sitemap is worthwhile, but as far as the 1% statement is concerned, I have my doubts about that. I mean, there are scenarios where you don't have much choice but to play with dirt!
Great Read as a whole!
Moosahemani,
Thanks for sharing your strategy here. Like Modesto's strategy above though, I have to wonder if just one bullet point wouldn't be simpler than all of those...
- Leave your old internal linking turned on for a week, or until you feel the redirects have been found.
I think that is a simple and effective solution that, in most cases, probably doesn't need to be more complicated.
PS: I love your comment about playing with dirt!
Internal links are usually not my concern. My website will get reindexed. However, I hate leaving all those backlinks going through 301s... I think they're worth less than non-301 backlinks...
Etienne,
If Matt Cutts is to be believed then you are absolutely correct. 301 redirects do not pass 100% of the PageRank that a direct link would pass. Though the loss is small, every bit counts. Best practice is to change any link you can to point it at the final destination. That might involve sending out some emails and making some phone calls, but it is worth it in my opinion. On top of that - what happens when you change the URL again down the line? Now that external link would be going through two or more redirects. Change it if you can. Good point.
Having just managed this process at www.findaproperty.com the trick is to delete the contents of the old XML sitemap and wait.
This allows SEs the chance to decide which URL to use (old or new), after following 301s and crawling the site.
By putting in place a sitemap with all the new URLs, you are jumping the gun.
Wait until the old URLs drop first, then put the XML in place, otherwise you risk duplication from indexed URLs and sitemap URLs.
SEOeditors,
Very good points. I wonder, though, how you handle getting Google to recrawl the old URLs if you've updated the internal linking on the site to point to the new URLs and you don't have an XML sitemap up. Maybe I just misunderstood the strategy.
Congrats on a successful relaunch of the site you mentioned! Those are always fun. ;-)
Why the time pressure? There's no need to force new URLs down Google's throat. Google was set to crawl at the max rate set in GWMT (10 per sec) and all old URLs were 301'd. I left it to G to work its magic. No rank drop; if anything, we had a one-week period where both old and new URLs ranked alongside each other for the same destination page. A double win rather than a penalty! SERPs have since returned to normal and show the new URLs. If you have 301s in place, I'd advocate patience.
I was bitten by this problem last year when I migrated a medium-sized site (less than 10,000 pages) to WordPress. In my case the fix was recommended by Dan Thies, who recommended that I put the old pages back and point a rel canonical from each one to its new, shiny version. Within a couple of days all was well, and a month later I removed the old pages entirely.
Excellent post Everett! I really like your point "#3 The old adage about "Knowing enough to get yourself into trouble" is as true as ever."
When one of my blogs got hit by Panda I decided to block all of the tag pages, since they appeared very thin to search engines. So I added a meta noindex, removed them from navigation, and blocked them via robots.txt - and it took months to deindex... why? Because I was too thorough in trying to get them removed: blocking via robots.txt and removing links to them meant search engines didn't crawl them for months to see the meta noindex...
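In other words, the robots.txt block was what kept the noindex from ever being seen. A sketch of the conflicting setup (hypothetical path):

```
# robots.txt - this Disallow stops crawlers from fetching the tag pages at all,
# so they never get to see the meta noindex on those pages
User-agent: *
Disallow: /tag/
```

The fix is to leave the tag pages crawlable (drop the Disallow) and let the meta noindex tag do the work; once the pages have dropped out of the index, the robots.txt block can go back in if you still want it.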
This is great advice for anyone doing a domain migration, I'll definitely be sharing it with a few folks. Thanks again!
This is a really interesting post, and it explains well why I sometimes get a duplicate title and description warning in my GWT when I set redirects for a small set of pages.
And as iThinkMedia said, it couldn't have come at a better time, since I'll soon relaunch a whole key section of one of my best websites :)
Thanks :D
Very interesting stuff, Everett! I've been dealing with more migrations/redirects recently, and had been thinking through this and came to the same sort of conclusion. I like it a lot more than the "temporarily dirty sitemap" strategy.
Well done and thanks for sharing the knowledge!
This post couldn't have come at a better time for me as I am right in the middle of managing my first site migration for a huge site! Some great tips here, most of which I will make the most of. Thanks Everett!
Please guide me, as I don't want to hurt my existing indexing in the search engines.
I am working on a website having thousands of pages indexed in the search engines.
I want to rework the website a bit and move the pages to the website's subdomain.
After a few months (around 3 to 5 months) we want to revert from the subdomain back to the main domain.
At this point, is a 301 redirect recommended, or a 302 redirect? I don't want to lose existing rankings and PR.
Please guide me, as it is a matter of thousands of indexed pages, and also the redirect will not be within the same domain but to its subdomain, from www.site.com/project to project.site.com.
Please guide.
If you strongly recommend against playing with 301s or 302s, then I will have to think of something else for the 3 to 5 months of work we are planning to do.
I don't want to take risks.
Thank you so much.
please reply me at premrishi at gmail dot com
What about this: Removing old URL types from the SERPS + cache via GWT?
Everett,
Thanks for this awesome (and timely, for me!) post!
I have a question:
Won't the old pages get crawled eventually (even though there are no links to them on the site), since they are indexed by Google? Don't they crawl all pages that are indexed? If so, the 301s will be reached and these pages will be de-indexed - correct or not?
Sometimes they'll get recrawled right away (on small sites with fresh content and lots of authority), but usually it takes several weeks for an average site, or even several months for large enterprise sites with hundreds of thousands of pages that don't have any external links. When a site that used to make $10,000 a week has been pinned down for several weeks after site-wide redirects, you can imagine that any tactic to get things back on track sooner is going to be preferable to the wait-and-see approach. I've seen this work now on several sites. I read a lot of comments from people saying maybe it was just coincidence, or it would have happened anyway if you just wait. That's fine, but for me the proof is in the pudding, and this has brought several sites back nearly overnight, so I'll keep doing it.
You can lead a horse to water but you can't make him drink.
Very good article on 301 redirects. Sounds like you have the team there, but apparently the search engines can't keep up in this case. No perfect solutions. I'm sure this strategy will change too, but great to know for now.
I did my 301s on the 22nd of May from the old site to the new one, and 3 weeks on I'm still struggling. I set up my 301s and within an hour the bots were all over the site indexing the new URLs, so I rubbed my hands and was impressed with how smooth it seemed to go; within 24 hours nearly every single URL was crawled.
Three weeks on, my old site still has 7,600 URLs in the index, the new one has nearly all of its 10,600, and traffic is at an all-time low. We have confirmed that all of the 301s are working correctly.
I just need to clarify one thing: what do you mean by "On March 8, about halfway through the day, they flipped the switch to turn the old internal link URLs back on, and traffic from search more than doubled that same day"?
Sorry to sound a bit thick, but I have to pass this info on to my developer and want to make sure I get it right. What exactly is the switch to turn the old internal link URLs back on?
Thanks
Hey,
Maybe I just read over what you said, but what exactly was different the second time around? Why did traffic stay stable when you changed the URL structure back to the new structure?
I am going through a similar situation. We have launched a new site with a new URL structure. All 301s are in place, etc. We decided to relaunch the old site, but we are going to put the new site on a new domain, and once the old site regains its traffic (hopefully it will) point everything at the new site.
Thanks for sharing... a great post.
Thanks, these are very important things we should know if we are going to redesign and relaunch our website... relaunching definitely affects the SEO side of a website.
This post was written in 2012. Google has gotten much faster at recognizing large-scale site changes, which triggers a quick, heavy crawl according to: https://webmasters.googleblog.com/2017/01/what-cra.... For this and other reasons I no longer recommend keeping navigation and other internal links pointing to the old site for any period of time. However, leaving the old XML Sitemap up, and requesting a crawl of it should still be done.
To be clear: At this point, five years after the post, I think leaving old links up sends mixed signals, and would be detrimental to the progress of a site migration. However, you still want the old URLs to be crawled: https://support.google.com/webmasters/answer/60658....
Thank u SO much
Two thumbs up for this post Everett! One of the best articles I have read so far. I believe that since every site is different it is just proper to have more than one option. We could stick with the XML sitemap resubmission to each of webmaster tools accounts as a best practice, and that "should" work just fine. After reading this post, I think I should follow your advice to leave up the old internal links for about one week after re-launching a website with new URLs on the same domain.
Just wanted to say we're going through this right now and this article has been immensely helpful. Thank you for the detailed writeup.
As it's a situation you don't want to experiment with, and one about which little information is available, I'm posting a follow-up to my post from March 29th.
Within 12 hours of my post, or 48 hours after my URLs were wrongly changed, I was able to put all of the URLs back as they were. I made some exceptions for very young pages where the original one was deleted and the new one was already indexed.
So far the results are very good. Pages which had been online for 7 years, which gained lots of authority/backlinks and which had been deleted from the index, popped up again in Google's index at the same positions as before. So far so good.
However, I do notice that many of the pages with the wrong URLs are still indexed, which might trigger a duplicate content penalty. At this point only the wrong-URL pages are somewhere down the rankings, and of course we don't care about those.
I'm just keeping an eye on things, and if it goes wrong I can still set up 301 redirects for all of the wrong-URL pages, since I was able to list them in Google - 102 in total - which is a lot of work, but it can be done.
I'll post another update later on.
Hello,
Thank you for the excellent article.
What do you do, however, if you fear that 301s could harm your new URLs? We have switched from one gear provider to another because our old provider took some new approaches with the SERPs that we were not happy with. We suggested that they stop, but they felt they were still being balanced with keywords, etc. However, we noticed that our e-commerce pages started moving down. So we left to work with a new company, and we would like to start fresh.
We have about 80k pages of content built over the last 15 years and about 30k pages in an e-commerce store. If the old 30k pages are at all poisoned, it does not seem logical for us to 301 the old pages to the new URLs.
How do we start fresh? 404's for that many pages seems extreme.
If you have any thoughts, I'd appreciate it.
Thanks
Great post Everett, this has become more and more important to our more mature client domains.
Thanks for the article. I've a similar job to tackle soon and am nervous about the links going funny!
I really liked the idea, and I agree that whenever such a situation arises we need to keep our old XML sitemap for a while, with proper 301 redirects to the new URLs. Tracking analytics data in such situations is of utmost importance, especially when we are dealing with a website with thousands or millions of pages.
Thanks for sharing this case study of yours.
Thanks for sharing this.
Wondering whether this would work:
- new sitemap with new URLs
- change the links in the website structure to the new URLs
- keep the old URLs in the HTML sitemap
What do you think?
Gabandrei,
I think that would work, but it might take a little longer. I see Google crawling the primary navigation and category level navigation way more often than an html sitemap these days.
Does anyone else see that or is it just me? I really think your 301s will be found faster if the path to them is in the main navigation rather than a sitemap. But if that is a problem for developers or for other reasons, your solution sounds great to me. I'd update the URLs in the old sitemap after a couple of weeks, or when you notice they've all been recrawled and removed from the index.
This is a very interesting study about 301s and how useful they are for maintaining traffic. I'm working on an eCommerce website and I have done similar stuff on my website. I have a lot of confusion about managing 301 redirects.
My website generates new URLs due to the following actions.
I'm managing my 301 redirects the old way: exporting data from Google Webmaster Tools to an Excel sheet and setting a specific new URL for each redirect. Whew... now I have 8.5K redirects in my htaccess... and I'm starting to think that's too much.
Can we remove old 301 redirects from the htaccess or not? This is a big question for me, because not all pages are linked from external websites. Google has simply de-indexed the old URLs and indexed the new URLs. So, is it necessary to maintain the 301 redirects after Google has processed them?
Look for patterns in your redirects and use regex for the different groups of URLs. Then break them into mapfiles and use the .htaccess to 'farm out' the redirects to the appropriate mapfile. Rebuild your server so the mapfiles work quickly.
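For example - a rough sketch only, with assumed file paths, and note that the RewriteMap directive itself has to live in the server/vhost config rather than in the .htaccess:

```
# httpd.conf / vhost: declare the map once (convert the txt file to dbm with
# httxt2dbm for faster lookups on large maps)
RewriteMap productmoves "txt:/etc/apache2/maps/product-redirects.txt"

# .htaccess: farm matching requests out to the map
RewriteEngine On
RewriteCond ${productmoves:$1} !=""
RewriteRule ^products/(.+)$ ${productmoves:$1} [R=301,L]
```

Each line of the map file is just the old path fragment followed by the new URL it should 301 to, so thousands of one-to-one redirects stay out of the .htaccess itself.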
Can you give me any reference article which I can forward to development team?
This is a good starting point for your dev team: https://www.helicontech.com/ape/doc/mod_rewrite.htm Hope that helps you break things into manageable maps!
CommercePundit,
You are asking some really good questions here. I remember hearing Matt Cutts say once that he would "leave them up" as long as you can, but that - generally speaking - after about a year or so and several attempts to access the old page and getting redirected, you could take them down. This would be especially true on any page that didn't have links pointing into it. However, I've seen way more than 8.5k redirects in an htaccess file without any noticeable performance issues in page load time.
Good luck!
That will be a good starting point for me to re-evaluate the 301 redirects for all URLs which are not linked from external websites. Thanks for your reply.
Hi Everett Sizemore,
Thanks for sharing this great information. We have suffered a lot with 301-redirect-related pages.
Thanks again.
Some really interesting findings there. Maybe it would be a good idea to keep a copy of the old sitemaps with the old links submitted to Google Webmaster Tools, and it might even be worthwhile having an HTML sitemap page (or pages) buried somewhere on the site with links to the old pages - for a while at least?
Great post. Could also have tried "fetch as Googlebot" on a few of the old URLs, pinged the old URLs with pingler, etc...
I typically recommend an HTML sitemap be made of all the 301's and to use the Fetch as Googlebot with submit + all linked URLs on that page along with a healthy dose of linking. The XML sitemap was a great touch too.
JoeYoungBlood,
That is a great recommendation for smaller sites or when you only have a few redirects, but doesn't that tool have a quota limit of 50 URLs per week? On a site with thousands or more URLs being redirected all at once, it may be necessary to try something else. Every site is different, so it's good that we have several tools to work with and a community to tap into when things don't go perfectly.
You can only fetch 50 per week. My understanding is the submit "URL and all Linked Pages" feature has no hard limit on indexing. From webmaster tools:
"Select if your site has changed significantly. Google will use this URL as a starting point in indexing your site content. Google doesn't guarantee to index all pages on your site."
edit: fetch limit is at 500 per week, per webmaster tools account
Nice post, Everett. I'm going to add your suggestions to my best practices.
Hi Everett,
Thanks for this information. We are planning to revamp our website soon, so this will be very useful for us.
Everett,
Man, this article came at a great time. I just got a call last night from a large ecommerce site that we have consulted for in the past; they just moved to a new site, got it all wrong, and are losing $100,000 a week due to a tremendous loss of organic traffic. Quick question though. Would it make sense to use the Fetch as Googlebot tool in Google's webmaster tools to manually fetch some of the pages you want them to quickly crawl again? You know how you can fetch as Googlebot, then click the "submit to index" link, then choose "submit URL and all linked pages". You can do up to 500 of those a month. It isn't a quick solution but it may quickly get them to crawl and see the 301s. We are going to try it but wondering if you have any experience with that. Thanks again for the great post.
Mind blowing! Never heard of the concept before.
But I have a question. Google indexed thousands of my tag pages. After a website update, the tag page links became broken. How do I get these pages de-indexed by Google? The situation is that:
1. I cannot 301 redirect the old pages to new ones, because most of the old tag pages don't make any sense and will be deleted. So most of the old tag pages don't have corresponding new pages.
2. I'm not going to link to tag pages on the new website, so I don't have any pages where I can leave the old links up.
3. And obviously the tag pages don't have external links.
Given all of the above, how do I do that? Do I submit a "very dirty" sitemap to Google?
Greatzqy,
If you are using WordPress you can install Yoast's SEO plugin, or any other that allows the tag URLs to be noindexed, and just check the box to remove them. Next time they are crawled, Google should see the noindex meta tag and begin to remove these pages from their index. We have seen it take over a month for Google to crawl all of them and remove them.
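Outside of WordPress, the plugin is essentially just outputting the standard robots meta tag in the head of each tag page, which you can add by hand:

```
<!-- On each tag page you want dropped from the index; the page must stay
     crawlable (not blocked in robots.txt) so engines can actually see this tag -->
<meta name="robots" content="noindex,follow">
```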
Thanks Josh! I'm not using WordPress. But good information though!
Very interesting post - am dealing with a few duplicate indexing issues myself at the moment
Hi Everett, thanks for the detailed post. It's an issue not deeply discussed at other places.
To be honest, I arrived at your post because we are suffering a similar situation: yesterday our new website went online, but every single URL was changed during the update. Most of the changes are as small as replacing a '_' with a '-'. That's why I only noticed it today, and no redirects were set up. Many pages have already been deleted from the Google index and some new ones are already indexed. It's about 50-50. I hope to have the old URLs back within the next 24 hours, but if the pages which are now deleted get indexed again, will they be considered fresh new pages or will they keep their authority? If anyone has any advice, please shoot. This is my work of the last 7 years and I hope to save as much of it as possible.
Thanks
I haven't worked on ecommerce sites of that scale for just about a year, but this all rings true. Looks like things haven't changed.
I learned quickly then that the XML sitemap for a dynamic enterprise ecomm site has a lot of trouble staying "clean." Search engines really didn't seem to take to them anyway (and perhaps the "dirt" in them is why... it was interesting to see that addressed). In the end, I stopped fully relying on XML, while pushing developers for a way to keep the old waterways open (at least) subtly on the page. Rarely ever got it.
It's great that FreeShipping.org was able to flash back up the old internal linking structure, which makes perfect sense. I think that's rare for many ecommerce platforms to be that flexible. Sounds like a good client!