I have a large site with a sitemap of over a million links, most of which had already been indexed by Google.
We changed the URL path for most URLs such that it went from old_path to new_path. The old_path URLs are properly using a 301 redirect to new_path. In the HTML, I followed the instructions in this MOZ post, which suggests keeping the old_path links in the HTML pages for a week to give Google a chance to follow all the 301 redirects and re-index the old pages, so that we don't end up with many "Duplicate content" errors from both new and old paths getting indexed.
My question is: how should I handle the sitemap in GWT (Google Webmaster Tools)? I currently have both my old sitemap with all the old_path URLs, which was mostly indexed, and my new sitemap with new_path URLs up at the same time. I was thinking it might be best to delete the old_path sitemap since those URLs all result in a 301 redirect, but I am afraid that my rankings will quickly drop if deleting the old sitemap drops those URLs from the index.
Should I delete my old sitemap?
Simple answer. Your sitemap should reflect the structure you want and not the structure you do not want. It should only contain your new URLs.
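As a rough illustration of a sitemap that lists only the new URLs, here is a minimal Python sketch. The function names and the example.com URLs are hypothetical and are not taken from the asker's site. The 50,000-URL cap per file comes from the sitemaps.org protocol, so a site with over a million URLs would need several sitemap files plus a sitemap index that points to them.

```python
from xml.sax.saxutils import escape

# The sitemaps.org protocol allows at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000


def build_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Return a list of sitemap XML strings, each holding at most max_urls URLs."""
    urls = list(urls)
    sitemaps = []
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        entries = "\n".join(
            f"  <url><loc>{escape(u)}</loc></url>" for u in chunk
        )
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</urlset>\n"
        )
    return sitemaps


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index XML string that points at each sitemap file."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )


if __name__ == "__main__":
    # Hypothetical new-structure URLs; only these go into the sitemap.
    new_urls = [f"https://example.com/new_path/page-{i}" for i in range(3)]
    for xml in build_sitemaps(new_urls, max_urls=2):
        print(xml)
```

Each string returned by build_sitemaps would be written to its own file, and the index would list the public URLs of those files.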
As well, while the advice from MOZ sounds good, if Google has all of your pages indexed, it does not care about links. It cares about URLs: not URLs from links, but URLs of pages. This is one of the two keys for your site within the index and the most important in this case. When Google refetches pages, it does not use the link index; it uses the URL index. The link index is primarily used to calculate PageRank and discover new pages.
That being said, there is no harm in keeping old links out there for a period. However, I would suggest using the new links and dropping the old links as soon as you can so that the entries in the link index are updated as quickly as possible. Why prolong the old URLs' PageRank advantage and confuse things?
One other thing I would recommend is using the canonical tag on each new page to point to itself. Of course, you must use a full (absolute) URL for this. This is your primary insurance against duplicate content.
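For illustration only, a self-referencing canonical tag with an absolute URL could be produced by a small helper like the one below. The helper name, the example.com base URL, and the path are assumptions made for the sketch; how the tag reaches the page head depends on your templating setup.

```python
from html import escape
from urllib.parse import urljoin


def canonical_tag(base_url, path):
    """Build a <link rel="canonical"> tag with an absolute URL for a page."""
    # urljoin turns a site-relative path into a full URL on the given base.
    absolute = urljoin(base_url, path)
    return f'<link rel="canonical" href="{escape(absolute, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical page in the new structure pointing at itself.
    print(canonical_tag("https://example.com", "/new_path/widget"))
```

The resulting tag belongs in the head of the new page it describes.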
Duplicate content issues often resolve themselves quickly. I have never heard of duplicate content issues arising from using 301 redirects to restructure a site. A few years ago, I went through a similar restructuring with 287,000 301 redirects and had not one issue with duplicate content. I left my 301 redirects in place for 6 months, which was probably far too long. I did this for Bing and others more than for Google. Once Google started to crawl my site and see the 301 redirects, it massively reindexed my site within just a week or so.
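If you want evidence that each old URL really answers with a 301 to its new counterpart, a spot check along these lines may help. This is a sketch, not a crawler: the function names and URLs are hypothetical, it sends HEAD requests (which a few servers handle differently from GET), and it relies on the fact that Python's http.client does not follow redirects on its own.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_status_and_location(url, timeout=10):
    """Request url without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_clean_301(old_url, expected_new_url, fetch=fetch_status_and_location):
    """True if old_url answers 301 and its Location resolves to expected_new_url."""
    status, location = fetch(old_url)
    if status != 301 or not location:
        return False
    # The Location header may be relative, so resolve it against the old URL.
    return urljoin(old_url, location) == expected_new_url


if __name__ == "__main__":
    # Hypothetical old/new pair; replace with a sample from your own mapping.
    old = "https://example.com/old_path/widget"
    new = "https://example.com/new_path/widget"
    print(old, "->", is_clean_301(old, new))
```

The fetch parameter is injectable so the check can be exercised without network access, and running it over a sample of the redirect mapping is usually enough to catch 302s or redirect chains.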
With such a massive link profile, you may never want to remove the 301 redirects. However, I do warn you that at some point you should. Compare the link profile of your new structure against that of your old structure. At some point, you will be far better off removing the 301 redirects and coping with the loss, which should be much easier with a strong link profile for the new structure. Granted, it could take a while for this to happen. Do not be in a hurry. With such a massive link profile, you have value that you will not want to let go of easily, and certainly not without a lot of consideration and making sure that the link profile of the new structure is stronger than that of the old.
Sometimes it is best to pull the bandage off quickly. I am not suggesting that you be careless. However, there is a balance that you need to decide on for yourself. Considering the size of your site, I suggest being more careful than I was. I personally did not care one whit. I suggest that you do. Be careful, whip out the spreadsheet, and gather as much evidence as you can before making large changes at each step of the process.