Program The rantings of a lunatic Scientist

New URL schema using URL Rewrite

L2Program Web Dev

I've just finished implementing URL Rewrite for the site. Effectively, this means that normal URLs such as

l2program.co.uk/?id=42

Can be disguised as

l2program.co.uk/post/42/Friends_dont_let_friends_use_Internet_Explorer

Why is this good? I mean, one is considerably longer than the other. But who on earth really types in a whole URL manually to access a specific page of a website? I personally can only remember doing so while typing long and ugly URLs off a lab sheet in High School Chemistry class.

Edit: It should also be mentioned that because the post title is optional in the url…

l2program.co.uk/post/42

l2program.co.uk/post/42/

l2program.co.uk/post/42/any_random_string_of_chars_digits_and_underscores

… will all work. As long as the /post/ prefix is followed by a number (and a slash if anything else follows), you’re good to go! The title string is only there so that it is visible in an anchor tag. A user hovering over the link, or a search engine crawling it, will see that string and get a clearer idea of where the link leads.

While the new “fake” URLs may be longer, they benefit from being friendlier. If someone hovers over the link and sees the address preview, they will have a much better idea of where their browser is about to be directed. Along with this ‘human’ friendliness, the new URLs are also friendlier to search engines. By appending a stripped and simplified version of the article title to the end of the URL, we have packed it full of keywords relevant to the page’s content.

The simplest way of implementing URL Rewrite is to modify or create a .htaccess file in the root folder of your site. Here is the .htaccess file I have written.
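As a rough guide, a file along these lines would produce the behaviour described in this post. It is a minimal sketch assuming Apache with mod_rewrite enabled, not necessarily the exact file in use on the site; in particular, the pattern for the tag rule is an assumption inferred from the /post/tag/5/Ray_Tracing example.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the site root.
RewriteEngine On

# Assumed tag rule: /post/tag/5/Ray_Tracing -> /index.php?filter=tag&value=5
RewriteRule ^post/tag/([0-9]+)(?:/.*)?$ /index.php?filter=tag&value=$1 [L]

# Post rule: /post/42/Any_title -> /index.php?id=42
RewriteRule ^post/([0-9]+)(?:/[a-zA-Z0-9_]*)?(?:/.*)?$ /index.php?id=$1 [L]
```

In a .htaccess file the leading slash is stripped before matching, which is why both patterns start with post/ rather than /post/.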

There are two Rewrite rules in the .htaccess file. Let’s have a look at what they are doing by breaking one apart:

  • RewriteRule // This keyword tells Apache we want to define a rule
  • ^post/([0-9]+)(?:\/[a-zA-Z0-9_]*)?(?:\/.*)?$ // If the URL passed in matches this regex, Apache will swap it for the next parameter…
  • /index.php?id=$1 // The replacement string. $1 refers to whatever the regex’s first capture group matched; in this case, the post id

Long story short, that rule matches URLs of the form /post/42/Friends_dont_let_friends_use_Internet_Explorer and turns them into index.php?id=42 . Similarly, thanks to the second Rewrite rule, a URL matching /post/tag/5/Ray_Tracing will be converted to index.php?filter=tag&value=5 . Notice how the titles in the URL, whether of a post or a tag, do not matter when the URL is converted. This is because they are purely there for show, for the benefit of the user or search engine looking at the URL. All that matters is the post id, or the filter and value.
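To see the capture group at work outside Apache, the same matching can be mimicked in plain PHP. This is only an illustrative sketch: rewrite_path is a hypothetical helper, the tag pattern is an assumption based on the example URL, and Apache itself does the real work on the site.

```php
<?php
// Hypothetical helper: mimics the two rewrite rules in plain PHP for illustration.
function rewrite_path(string $path): ?string
{
    // Assumed tag rule: post/tag/<id>, optionally followed by /<anything>
    if (preg_match('#^post/tag/([0-9]+)(?:/.*)?$#', $path, $m)) {
        return "index.php?filter=tag&value={$m[1]}";
    }
    // Post rule: post/<id>, optionally followed by a slash and a title
    if (preg_match('#^post/([0-9]+)(?:/[a-zA-Z0-9_]*)?(?:/.*)?$#', $path, $m)) {
        return "index.php?id={$m[1]}";
    }
    return null; // no rule matched, so the URL would be left alone
}

echo rewrite_path('post/42/Friends_dont_let_friends_use_Internet_Explorer'), "\n"; // index.php?id=42
echo rewrite_path('post/tag/5/Ray_Tracing'), "\n"; // index.php?filter=tag&value=5
```

Whatever follows the number is matched but never captured, which is why the title can be anything at all.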

All this happens behind the scenes though. Your users will never know the truth, and you get to sit there smugly while your code works with incredibly simple urls while displaying beautiful friendly ones.

To help give a complete picture, here is the PHP function I use to turn an article’s title into a string that can be used in a URL. It runs two very simple regexes on the input string and then returns the result. The first regex takes every character, or group of characters, that is not a letter, number, space or underscore and removes it by replacing it with an empty string. The second regex finishes the job by replacing any space or group of spaces with a single underscore.
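A minimal sketch of such a function, written from that description, might look like the following. The name title_to_slug and the exact character classes are assumptions for illustration rather than the site's actual code.

```php
<?php
// Hypothetical name; the two regexes follow the two steps described above.
function title_to_slug(string $title): string
{
    // 1) Remove every run of characters that is not a letter, digit, space or underscore.
    $stripped = preg_replace('/[^a-zA-Z0-9 _]+/', '', $title);
    // 2) Replace each space, or run of spaces, with a single underscore.
    return preg_replace('/ +/', '_', $stripped);
}

echo title_to_slug("Friends don't let friends use Internet Explorer..."), "\n";
```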

Thus Friends don’t let friends use Internet Explorer… becomes Friends_dont_let_friends_use_Internet_Explorer