URL Validation in Ruby/Rails

A project I’m working on has a need to validate URLs that users enter. Thinking that this would be just a straightforward exercise in regular expressions, I hit Google to find out who’d already done the hard work for me. The true spirit of reuse 🙂

A couple of false starts later and I’d found url_validation_improved. This seemed to be just the ticket: it has a regex for checking the URL format and even tests the connection.

To get started with my own validation I just wanted the regular expression part, as my project currently only needs to validate the format of the URL. Here’s the regex from url_validation_improved:

/^(http|https)://[a-z0-9]+([-.]{1}[a-z0-9]+)*

.[a-z]{2,5}(([0-9]{1,5})?/.*)?$/ix

Looking good. I’m not overly familiar with regular expressions, so I plugged it into my model with validates_format_of and whipped up a unit test to throw a bunch of URLs at it. Everything was going fine until I added to the unit test the URL for a test server the application will be interfacing with. As it’s a locally hosted Rails app, the base URL is http://127.0.0.1:3000. Suddenly, my tests imploded. It turns out that this regex doesn’t allow IP-specified URLs or port numbers. Back to the drawing board Google I went.
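To make the failure concrete, here is a minimal sketch. It assumes the backslashes that seem to have gone missing when the regex was published are put back; the constant name PLUGIN_URL_REGEX and the example.com addresses are mine, purely for illustration.

```ruby
# Assumed reconstruction of the plugin's regex with the escaping restored.
PLUGIN_URL_REGEX = /^(http|https):\/\/[a-z0-9]+([-.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

samples = [
  'http://www.example.com',        # ordinary host name
  'http://www.example.com/about',  # host name with a path
  'http://127.0.0.1:3000',         # IP address with a port
  'http://www.example.com:3000/'   # host name with a port
]

# Map each sample URL to true (accepted) or false (rejected).
results = samples.map { |url| [url, !!(url =~ PLUGIN_URL_REGEX)] }.to_h
results.each { |url, ok| puts "#{ok ? 'accepted' : 'rejected'}  #{url}" }
```

The two host-name URLs are accepted, while both URLs containing a port are rejected. The IP address fails for a second reason as well: the pattern insists that the last dotted segment consists of letters.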

Not finding anything much within Google, I started to wonder whether any of the built-in Ruby classes or libraries could help. It wasn’t long before URI caught my eye, flirting with me and giggling as it showed off its parse method, which takes a URI string and returns an appropriate URI subclass representing the URI. The hussy. Not only does it do that, but it raises a URI::InvalidURIError if the URI given is, well, invalid.

Ripping the disappointing validates_format_of from my model, in went a shiny new validate method. All it has to do is check whether a URI::InvalidURIError has been raised, and also ensure that the returned URI subclass is for a protocol that’s acceptable. Here’s the whole thing:
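A quick sketch of that behaviour, runnable from irb or a plain script. The URLs are illustrative, and the variable names are my own.

```ruby
require 'uri'

# A well-formed HTTP URL comes back as a URI::HTTP instance,
# with the host and port already picked apart.
uri = URI.parse('http://127.0.0.1:3000')
puts uri.class
puts uri.host
puts uri.port

# A string that cannot be a URI (here, one with a raw space in the path)
# raises URI::InvalidURIError instead of returning an object.
begin
  URI.parse('http://example.com/a b')
  raised = false
rescue URI::InvalidURIError => e
  raised = true
  puts "rejected: #{e.message}"
end
```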

def validate
  begin
    uri = URI.parse(url)
    if uri.class != URI::HTTP
      errors.add(:url, 'Only HTTP protocol addresses can be used')
    end
  rescue URI::InvalidURIError
    errors.add(:url, 'The format of the url is not valid.')
  end
end
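
For trying the same logic outside a Rails model, here is a minimal sketch as a plain Ruby method. The name http_url_error is hypothetical (it is not a Rails or URI API); it returns nil for an acceptable URL and otherwise the message the validate method would have added.

```ruby
require 'uri'

# Hypothetical helper, not part of Rails: nil means the URL is acceptable,
# otherwise the return value is the error message for the :url attribute.
def http_url_error(url)
  uri = URI.parse(url)
  # Same strict class comparison as the model: only plain HTTP passes.
  uri.class == URI::HTTP ? nil : 'Only HTTP protocol addresses can be used'
rescue URI::InvalidURIError
  'The format of the url is not valid.'
end

ok_result     = http_url_error('http://127.0.0.1:3000')   # acceptable
scheme_result = http_url_error('ftp://example.com/file')  # wrong protocol
format_result = http_url_error('http://example.com/a b')  # not parseable
```

Keeping the check in a small method like this makes it easy to throw a list of URLs at it from a unit test without loading the model.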

As there was now a possibility of two different error messages appearing on my model, I had to update my unit test. Once that was done, everything passed. The balance of the universe is restored.

There are just a couple of caveats. Sometimes, URI.parse returns a URI::Generic, which is the parent of the other URI types. I’ve not looked deeply into why this is, but it seems to happen when URI is sure enough that the string you’ve passed really does represent a URI, but can’t actually identify a protocol. Since I know I only want to deal with valid HTTP addresses, I restrict my code to only accept those as valid.
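A small sketch of when URI::Generic turns up, using made-up example.com addresses and a made-up foo scheme. It also shows one thing worth knowing about the strict class comparison: HTTPS URLs come back as URI::HTTPS, a subclass of URI::HTTP, so comparing classes with != rejects them, whereas is_a? would accept both.

```ruby
require 'uri'

# No scheme at all: the whole string is parsed as a relative path.
bare = URI.parse('www.example.com')
puts bare.class          # URI::Generic
puts bare.scheme.inspect # nil
puts bare.path

# A scheme that URI has no dedicated class for also yields URI::Generic.
custom = URI.parse('foo://example.com')
puts custom.class        # URI::Generic

# HTTPS gets its own subclass of URI::HTTP.
secure = URI.parse('https://example.com')
puts secure.class == URI::HTTP   # false
puts secure.is_a?(URI::HTTP)     # true
```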

It should also be noted that there are some subtle differences between URIs and URLs (URLs are a subset of URIs) but finding out what that means in practical terms seems to be tricky. I’m making the assumption that if a string passes this validation, I can use it as what I would think of as a URL.

Interestingly, the url_validation_improved code calls URI.parse about 5 lines after it checks URLs against the regex. I wonder why it doesn’t use that as the test of URL validity…?

61 Responses to URL Validation in Ruby/Rails

  1. gregt says:

    very helpful, thanks!

  2. vijitha says:

    yeh it’s really helpfull thanx..

  3. Draconid says:

    I’ve just used this and it’s mostly great. However, I’ve found that it does parse URLs without a domain at the end (I tried my domain as “http://www.draigwen” and it passes).

  4. Julie says:

    I think some of the backslashes went missing when you tried to put it online

    In my humble opinion, the “.” character at the beginning of the second line needs to be escaped with a backslash. It represents the dot in “.com” (no pun intended).
    the slashes also need to be escaped.

    Dunno if this will show properly on the webpage but here’s the escaped string:
    /^(https?:\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

  5. riki says:

    I think any expression like this is potentially flawed. A case of too much software.

    Basically because it’s impossible to anticipate all the possible exceptions, such as international domain names, or addresses entered without http://, for example www.bbc.co.uk. They may not be correct URLs, but from a usability standpoint the code should be able to accommodate lazy users.

    Also strange subdomains like wwwwwwww.domain.com, or .info domains, or links to files that non-tech-savvy users have uploaded, i.e. http://www.domain.com/My%20Document.pdf

  7. shyl says:

    Hi …Thanks,
    I have decided to go with URI 🙂

    shyl –

  8. Ryan says:

    I’m not sure what this catches… it allows: http://blah

  9. Tom Harrison says:

    Ah Google. It led me here, but I have found that URI is very, very picky about URLs.

    For example, this one from target.com cannot be parsed:

    http://www.target.com/gp/detail.html/602-4045909-4263801?ASIN=B000NPCK3W&AFID=Froogle&LNM=B000NPCK3W|Lexmark_AllInOne_Printer_with_Scanner_and_Copier__X1240&ci_src=14110944&ci_sku=B000NPCK3W&ref=tgt_adv_XSG10001

    I think it is the vertical bar in the URL, but we have found numerous other characters (e.g. caret) that URI won’t accept.

    Seeking alternatives…

  10. Tom Harrison says:

    Didn’t find any alternatives: the problem in the URL above is the vertical bar. If I escape it with %7C, URI is happy as a clam.

    So reading the URI specification, it does appear that this character falls into a group that while specifically excluded are in a class that should not be used in URLs (or escaped if they are). So I would be fine with Ruby’s picky URI class, except that in Rails, URLs are sometimes generated with ids in [123] square brackets, which are also not allowed by the spec, but which URI seems fine with.

    So anyway, if anyone runs into this, just gsub out any characters like caret, backtick, tilde and possibly others, replacing them with their CGI-encoded variants.

  11. here it is says:

    This is the best regex i’ve found so far

    /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix

    It handles both http, https, ip addresses, domain names, port numbers, domain names up to 5 characters, and even domains like the one Tom Harrison said could not be matched, were matched correctly by the above regex.

    If you want to try it yourself

    go into the console mode:

    ruby ./script/console

    url = "http://www.target.com/gp/detail.html/602-4045909-4263801?ASIN=B000NPCK3W&AFID=Froogle&LNM=B000NPCK3W|Lexmark_AllInOne_Printer_with_Scanner_and_Copier__X1240&ci_src=14110944&ci_sku=B000NPCK3W&ref=tgt_adv_XSG10001"

    reg = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix

    reg.match(url) ? true : false

    you’ll see it’ll return true

  12. here it is says:

    When I said domain names up to 5 characters I meant domain extensions, like .info, .com, .org, .tv etc

  13. jacques says:

    in a description text i have to find and replace urls…
    hi how can i search for reg in a string? How should the pattern look like?

    thank you

  14. alan says:

    Check this model, towards the bottom there is a pretty good regex. It only needs to recognise urls with http|https, but that is very easy to do.

    http://sample.caboo.se/weed2/app/models/domain.rb

  15. alan says:

    in case its hard to see…

    PORT = /(([:]\d+)?)/
    DOMAIN = /([a-z0-9\-]+\.?)*([a-z0-9]{2,})\.[a-z]{2,}/
    NUMERIC_IP = /(?>(?:1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?:1?\d?\d|2[0-4]\d|25[0-5])(?:\/(?:[12]?\d|3[012])|-(?>(?:1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?:1?\d?\d|2[0-4]\d|25[0-5]))?/

    validates_format_of :name, :with => /^((localhost)|#{DOMAIN}|#{NUMERIC_IP})#{PORT}$/

  16. JR says:

    Very informative.

  18. Dean says:

    Excellent. Thanks for the info!

  19. Soinype says:

    Could you recommend a template for WordPress 2.6.2 that would look like your actsasblog.wordpress.com?

    Thanks in advance)

  20. Darren says:

    What about www.domain.com? None of these expressions will handle that. I know most young people now don’t do the ‘oldschool’ www, but the old generation still do all the time. One way around that would be to use before_validation to gsub the www. out and add in an http://.

  21. elokcomputer says:

    yes, .. thank’s for share … ^_^

  27. gnrfan says:

    This one ended up working for me:

    /^(http|https):\/\/([a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}|(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}|localhost)(:[0-9]{1,5})?(\/.*)?$/ix

    This matches, for example:

    localhost
    127.0.0.1:4567
    http://twitter.com/gnrfan

    It is not perfect but hope it helps someone 🙂
