Last year I had a project that required me to redirect over 50,000 URLs from an old website to a new one. It was one heck of a project, but having all those links disappear from Google simply was not an option (the company relies pretty heavily on Google for its revenue).
One thing I noticed was that on the old website, a page could have multiple URLs. Some pages had 2 URLs, but some had up to 7 or 8. This meant that I was going to have to create well over 100,000 redirects in my .htaccess file. Luckily, while the URLs were different, they followed a matching pattern.
The URLs looked something like this:
http://mydomain.com/products/23/46/11023/the-first-product-name.html
http://mydomain.com/products/22/46/11023/the-first-product-renamed.html
http://mydomain.com/products/22/56/11023/i-changed-this-product-name-yet-again.html
You will notice that these URLs all have the number 11023 in the exact same spot, while just about everything else about the URL is different. Usually I would have written three 301 redirects for these, which would have looked something like this:
redirect 301 /products/23/46/11023/the-first-product-name.html http://mynewdomain.com/my-new-url.html
redirect 301 /products/22/46/11023/the-first-product-renamed.html http://mynewdomain.com/my-new-url.html
redirect 301 /products/22/56/11023/i-changed-this-product-name-yet-again.html http://mynewdomain.com/my-new-url.html
This isn’t bad when you have only a few URLs, but remember we have 50,000 more of these. We also want to cover as many URLs as possible (we are trying to make sure the company does not fall out of favor with Google), so we really should handle ALL possible situations. What we can use is something called RedirectMatch, a directive that lets you use regular expressions in your 301 redirect entries. So now we can create a single line that looks like this:
RedirectMatch 301 /products/(.*)/(.*)/11023/(.*)\.html http://mynewdomain.com/my-new-url.html
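If you want to sanity-check a pattern like this before deploying it, a quick script can help. The following is only a rough sketch in Python: Apache uses PCRE rather than Python's re module, but for an expression this simple I would expect the two to behave the same. The STRICT pattern and the EXTRA test path are my own hypothetical additions, not something the redirect above requires. They illustrate that an unanchored (.*) will also match URLs with extra directory levels, which may or may not be what you want.

```python
import re

# Pattern copied from the RedirectMatch line above.
LOOSE = re.compile(r"/products/(.*)/(.*)/11023/(.*)\.html")

# Hypothetical stricter variant (an assumption, not from the original rule):
# anchored at both ends, and no segment is allowed to contain a slash.
STRICT = re.compile(r"^/products/[^/]+/[^/]+/11023/[^/]+\.html$")

# The three old URL paths from the example.
OLD_PATHS = [
    "/products/23/46/11023/the-first-product-name.html",
    "/products/22/46/11023/the-first-product-renamed.html",
    "/products/22/56/11023/i-changed-this-product-name-yet-again.html",
]

# Made-up path with one extra directory level, for comparison only.
EXTRA = "/products/a/b/c/11023/name.html"


def matches(pattern, path):
    """Return True if the pattern matches anywhere in the path.

    search() is used because an unanchored pattern may match at any
    position in the URL path.
    """
    return pattern.search(path) is not None


for path in OLD_PATHS + [EXTRA]:
    print(path, "loose:", matches(LOOSE, path), "strict:", matches(STRICT, path))
```

All three old URLs match both patterns. The made-up path with the extra directory matches only the loose one, so whether to tighten the expression depends on how messy your old URLs really are.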
Now all three listed old URLs will redirect to the proper new page, as will any other URL that comes to the website and matches this pattern. One downside to this method is that Apache needs more time to check each incoming URL against the regular expressions to find a redirect. You will notice that all these URLs were in a products directory. So what we can do is put all these redirects inside an .htaccess file in the products directory on the web server, so only requests for products incur this small lag.
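With tens of thousands of products, nobody wants to type these lines by hand. Here is a minimal sketch of how the rules could be generated, assuming you can export a mapping from the number in the old URL to the new URL. The mapping below, the 11024 entry, and the function name build_rules are all hypothetical, made up for illustration.

```python
# Template that mirrors the RedirectMatch line shown earlier, with the
# number in the URL and the target URL left as placeholders.
RULE_TEMPLATE = r"RedirectMatch 301 /products/(.*)/(.*)/{id}/(.*)\.html {url}"


def build_rules(id_to_url):
    """Return the text of the redirect rules, one RedirectMatch line per ID."""
    lines = []
    for product_id in sorted(id_to_url):
        lines.append(RULE_TEMPLATE.format(id=product_id, url=id_to_url[product_id]))
    return "\n".join(lines) + "\n"


# Hypothetical sample data: 11023 comes from the example above,
# 11024 and its target URL are invented for this sketch.
SAMPLE = {
    11023: "http://mynewdomain.com/my-new-url.html",
    11024: "http://mynewdomain.com/another-new-url.html",
}

print(build_rules(SAMPLE), end="")
```

The output could then be saved as the .htaccess file in the products directory. As far as I know, RedirectMatch still matches against the full URL path even when it sits in a per-directory .htaccess file, which is why the /products/ prefix stays in the pattern.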