Apache Regex: 301 Redirects to Eliminate Duplicates in URL Path

We are using a CMS that produces URLs of the format www.domain.com/home/help/contact/contact. Here the first occurrence of contact is the directory and the second occurrence is the HTML page itself. These URLs are causing SEO problems.

We have implemented canonical tags, but the business wants to make sure these duplicates do not show up in either the search engines or Google Analytics, and has asked us to implement a 301 redirect on our web server.

We have a regex that finds these matches, but I also need the part of the URL before the match. The regex is .*?([\w]+)\/\1+ and it returns contact for /home/help/contact/contact. How can I capture the /home/help/ path as well, so I can redirect to the right page? I am a beginner with regex, so any help is appreciated.

Answer

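Since you're already able to get contact using a capturing group, enclose everything before it in a capturing group as well. The leading path then becomes group 1 and the repeated segment becomes group 2, so the backreference changes from \1 to \2: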

(.*?)(/[\w]+)\2+

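I have put the / inside the capturing group too, so that you don't get false positives for paths such as:

    /home/some/app/page
                 ^ ^

With the original pattern, the first ^ marks what \1 would capture (the last p of app) and the second ^ marks the repetition it would find (the first p of page), so a single character p would be treated as a duplicated segment.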
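
To check the pattern before putting it on the web server, here is a minimal Python sketch. The helper name redirect_target is hypothetical, and using fullmatch (so the repeated segment has to end the path) is my own assumption rather than part of the pattern above. The question's original pattern is included only to show the false positive.

```python
import re

# Pattern from the answer: group 1 is the leading path, group 2 is the
# repeated segment together with its leading slash.
DUPLICATE_SEGMENT = re.compile(r"(.*?)(/[\w]+)\2+")

# The question's original pattern, kept only for comparison.
ORIGINAL = re.compile(r".*?([\w]+)\/\1+")


def redirect_target(path):
    """Return the de-duplicated path, or None if no segment repeats.

    Hypothetical helper for illustration. fullmatch is an added assumption:
    it requires the repeated segment to be the end of the path.
    """
    match = DUPLICATE_SEGMENT.fullmatch(path)
    if match is None:
        return None
    return match.group(1) + match.group(2)


print(redirect_target("/home/help/contact/contact"))   # /home/help/contact
print(redirect_target("/home/some/app/page"))          # None
print(ORIGINAL.match("/home/some/app/page").group(1))  # p
```

Group 1 plus group 2 gives the path to redirect to. The last line shows the original pattern capturing the single character p in /home/some/app/page, which is the false positive described above.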
source: stackoverflow.com