Tags: canonical rewrite, htaccess canonicalization, url canonicalization, url normalization, wordpress canonicalization
In this post, we will look at a search engine optimization technique called URL canonicalization, also known as URL normalization. We will see why it matters and how understanding it helps you reduce duplicate content issues.
Do you have the right content on the web, but four different ways to access it? If you do, it could affect your search engine rankings. If you're still confused, let me explain. The web is composed of billions of URLs, or Uniform Resource Locators. A URL is a kind of Uniform Resource Identifier (URI) that specifies where an identified resource (such as a web page) is available and the mechanism for retrieving it (http://, https://, ftp://, etc.).
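Every URL has a structure that the web server reads. For example, on a typical site, all of the following addresses might point at the home page:
http://domain.tld/
http://www.domain.tld/
http://domain.tld/index.php
http://www.domain.tld/index.php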
Each of these locations will go to the same page. This is due to improper canonicalization. Canonicalization is the process of converting data that has more than one possible representation into a “standard”, “normal”, or canonical form. Improper canonicalization can cause duplicate content and hurt your search engine rankings: search engines index multiple versions of the same page and effectively divide the attributed link equity among the duplicates. You consolidate the worth of your pages by serving only a single instance of each of them. Proper canonicalization is beneficial to you, your visitors, and your search engine rankings.
Google, Yahoo, and Bing all support canonicalization, so why not provide it to them? There are several ways to do this, but the simplest I have found is using .htaccess files. However, these only work on Apache web servers. If you are not sure which web server you are using, ask your web hosting provider. To start using .htaccess, create a file in the root of your website directory and name it exactly .htaccess. Do not forget the period (.) at the beginning, and do not add any extension to the end.
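To make the idea concrete, here is a minimal Python sketch of what "converting to a canonical form" means for a URL. It is only an illustration of the concept, not the .htaccess technique itself. The function name `canonicalize`, the `INDEX_FILES` tuple, and the sample URLs are assumptions made up for this example, and the choice of the non-www form simply mirrors the preference used in this post.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of index documents; adjust it for your own site.
INDEX_FILES = ("index.php", "index.html")


def canonicalize(url: str) -> str:
    """Map equivalent URL variants to a single non-www form."""
    parts = urlsplit(url)

    # Host names are case-insensitive, so lowercase and drop a leading "www.".
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # An empty path and "/" mean the same page.
    path = parts.path or "/"

    # Drop a trailing index document: /blog/index.php becomes /blog/
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"

    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://domain.tld",
    "http://www.domain.tld/",
    "http://domain.tld/index.php",
    "http://WWW.domain.tld/index.php",
]
canonical = {canonicalize(u) for u in variants}
print(canonical)
```

All four variants collapse to one entry in the set, which is the outcome you want a search engine to see: one page, one address.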
Below is a generalized, copy-and-paste version of this .htaccess canonicalization solution.
Using your favorite text editor, open the file and add the following code:
RewriteEngine On
# Remove index.php from root URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php [NC]
RewriteRule ^index\.php$ http://domain.tld/ [R=301,L]
# Permanently redirect from www domain to non-www domain
RewriteCond %{HTTP_HOST} ^www\.domain\.tld$ [NC]
RewriteRule ^(.*)$ http://domain.tld/$1 [R=301,L]
Of course, you will need to replace “domain.tld” in the above example with your website's domain. This will prevent search engines and visitors from reaching duplicate URLs, since requests are automatically redirected to the proper structure. If you are using WordPress, there is a more comprehensive way to get the same result while still delivering dynamic content from your WordPress blog. Note that WordPress 2.3+ provides a built-in URL canonicalization technique via PHP. Thus, if you are using WP 2.3+, you technically don’t need this method, though if you prefer to handle URL rewriting at the server level, you may still benefit from it.
# Comprehensive URL Canonicalization for WordPress in Root
RedirectMatch permanent index\.php/(.*) http://domain.tld/$1
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.tld$ [NC]
RewriteRule ^(.*)$ http://domain.tld%{REQUEST_URI} [R=301,L]
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html|php)$ http://domain.tld/$1 [R=301,L]
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
If your WordPress blog is set up in the root directory, use the above example. If your blog is in a subdirectory (for example, www.domain.com/blog/), see the code below.
# Comprehensive URL Canonicalization for WordPress in Subdirectory
RedirectMatch permanent index\.php/(.*) http://domain.tld/subdirectory/$1
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.tld$ [NC]
RewriteRule ^(.*)$ http://domain.tld%{REQUEST_URI} [R=301,L]
RewriteBase /subdirectory/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html|php)$ http://domain.tld/subdirectory/$1 [R=301,L]
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /subdirectory/index.php [L]
Now save and upload the .htaccess file. It will help prevent duplicate content and give search engines better URL normalization. The canonicalization solution presented in this article is both comprehensive and effective. I have been using this technique on client websites since 2008 with great success. Since employing this method, I have virtually eliminated all opportunity for duplicate content to be served from my site and my clients' websites. Of course, there are other duplicate content issues not associated with canonical URLs, but we will save those for another article.
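If you want to see what the index-file and trailing-slash patterns in the root WordPress rules actually match, the following Python sketch replays them against a few request lines. It is a rough model under stated assumptions, not a description of how Apache works internally: it ignores the www rule, the RedirectMatch line, and the file and directory checks, and the function name `redirect_target` and the sample requests are made up for illustration. The regular expressions themselves are copied from the rules above, with Apache's escaped space written as a plain space.

```python
import re

# Patterns copied from the root-directory WordPress rules in this post.
INDEX_REQUEST = re.compile(r"^[A-Z]{3,9} /([^/]+/)*index\.(html|php) HTTP/")
INDEX_PATH = re.compile(r"^(([^/]+/)*)index\.(html|php)$")
NEEDS_SLASH = re.compile(r"/+[^.]+$")
NO_TRAILING_SLASH = re.compile(r"^(.+[^/])$")


def redirect_target(request_line, base="http://domain.tld/"):
    """Return the 301 target for a request line, or None if no rule fires."""
    _, uri, _ = request_line.split(" ", 2)
    path = uri.split("?", 1)[0]  # REQUEST_URI carries no query string
    rel = path.lstrip("/")       # a root .htaccess rule sees the path without the leading slash

    # Rule 1: the client literally asked for an index file, so redirect to its directory.
    index_match = INDEX_PATH.match(rel)
    if INDEX_REQUEST.search(request_line) and index_match:
        return base + index_match.group(1)

    # Rule 2: a path whose last segment has no dot and no trailing slash gets one appended.
    if NEEDS_SLASH.search(path) and NO_TRAILING_SLASH.match(rel):
        return path + "/"

    return None


for line in [
    "GET /index.php HTTP/1.1",
    "GET /blog/index.html HTTP/1.1",
    "GET /about HTTP/1.1",
    "GET /about/ HTTP/1.1",
    "GET /style.css HTTP/1.1",
]:
    print(line, "->", redirect_target(line))
```

The first three requests produce a redirect, while /about/ and /style.css are left alone. That is why the rules do not loop: once a URL is in its canonical form, none of the patterns match it again.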
2 Responses to Clarify Your Website Content In Search Engines with URL Normalization
March 14th, 2010 at 10:30 pm
Hi there, thanks for this useful information! The subdirectory code in the post is for redirecting a www URL to the non-www URL, right? What if I wanted to redirect everything from the non-www version to the www version? Can you please let me know what kind of code to use?
Thanks a lot!
March 16th, 2010 at 7:40 pm
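I think something like this would get you started (it assumes a .com domain):
RewriteEngine On
# Skip hosts that already start with www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Skip requests that arrive without a Host header
RewriteCond %{HTTP_HOST} !^$
# Capture the domain name and the .com extension
RewriteCond %{HTTP_HOST} ^([^.]+)\.(com)$ [NC]
# The target needs the scheme, otherwise Apache treats it as a local path
RewriteRule ^.*$ http://www.%1.%2%{REQUEST_URI} [R=301,L]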