-----

# Custom
isHidden: false
menupriority: 1
kind: article
created_at: 2010-02-23T10:09:52+02:00
title: When regexp is not the best solution
multiTitle:
    fr: When regexp is not the best solution
    en: When regexp is not the best solution
multiDescription:
    fr: pas de description.
    en: no description.
tags:
    - programming
    - regexp
    - regular expression
    - extension
    - file
-----

Regular expressions are really useful. Unfortunately, they are not always the best way of doing things, particularly when the transformation you want to make is simple.

I wanted to know the fastest possible way to get the file extension from a filename. There are three natural ways of doing this:
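
    # regexp: leaves the extension, without the dot, in $&
    str.match(/[^.]*$/); ext=$&
    # split: also gives the extension without the dot
    ext=str.split('.')[-1]
    # File module: gives the extension with its leading dot
    ext=File.extname(str)
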
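The three expressions are not strictly interchangeable, which is worth keeping in mind when reading the timings. Here is a minimal sketch of what each one returns; the sample paths are hypothetical and are not part of the benchmark, and `str[/[^.]*$/]` is used as a shorthand for the `match` plus `$&` pair.

```ruby
# Hypothetical sample paths, chosen to show where the three approaches differ.
samples = ['/accounts/user.json', 'archive.tar.gz', '/etc/hosts', '.bashrc']

results = samples.map do |str|
  {
    path:   str,
    regexp: str[/[^.]*$/],       # same pattern as above, returned directly
    split:  str.split('.')[-1],  # last dot-separated chunk
    file:   File.extname(str)    # dedicated function, keeps the leading dot
  }
end

results.each { |r| puts r.inspect }
```

With a path that has no dot, such as `/etc/hosts`, the regexp and the split both hand back the whole string, while `File.extname` returns an empty string. The dedicated function is the only one of the three that can tell "no extension" apart from "some extension".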
At first sight I believed the regexp would be faster than the split, because there can be many `.` in a filename. But in practice there is usually only one dot, and I realized the split would be faster. It is still not the fastest way: the `File` module has a function dedicated to this job.

Here is the Ruby benchmark code:

    #!/usr/bin/env ruby
    require 'benchmark'
    n=80000
    tab=[ '/accounts/user.json',
          '/accounts/user.xml',
          '/user/titi/blog/toto.json',
          '/user/titi/blog/toto.xml' ]

    puts "Get extname"
    Benchmark.bm do |x|
        x.report("regexp:") { n.times do
            str=tab[rand(4)];
            str.match(/[^.]*$/);
            ext=$&;
        end }
        x.report(" split:") { n.times do
            str=tab[rand(4)];
            ext=str.split('.')[-1] ;
        end }
        x.report("  File:") { n.times do
            str=tab[rand(4)];
            ext=File.extname(str);
        end }
    end

And here is the result:

Get extname
            user     system      total        real
regexp:  2.550000   0.020000   2.570000 (  2.693407)
 split:  1.080000   0.050000   1.130000 (  1.190408)
  File:  0.640000   0.030000   0.670000 (  0.717748)
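
To put these numbers in perspective, here is a small sketch that turns the "real" column into ratios against the fastest entry. The figures are copied from the output above; the variable names are only illustrative.

```ruby
# "real" column copied from the benchmark output above (seconds for 80000 calls).
real = { 'regexp' => 2.693407, 'split' => 1.190408, 'File' => 0.717748 }

fastest = real.values.min

# How many times slower each approach is than the fastest one.
ratios = real.map { |name, t| [name, (t / fastest).round(2)] }.to_h

ratios.each { |name, r| puts format('%-7s %.2fx', name, r) }
```

On this run the regexp takes roughly 3.75 times as long as `File.extname`, and the split roughly 1.66 times as long.
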

The conclusion of this benchmark: a dedicated function is better than your own way of doing things (most of the time).

## File path without the extension


    #!/usr/bin/env ruby
    require 'benchmark'
    n=80000
    tab=[ '/accounts/user.json',
          '/accounts/user.xml',
          '/user/titi/blog/toto.json',
          '/user/titi/blog/toto.xml' ]

    puts "remove extension"
    Benchmark.bm do |x|
        # NOTE: every str here is an absolute path, so File.expand_path ignores
        # its second argument and the result still carries the extension.
        x.report(" File:") { n.times do
            str=tab[rand(4)];
            path=File.expand_path(str,File.basename(str,File.extname(str)));
        end }
        x.report("chomp:") { n.times do
            str=tab[rand(4)];
            ext=File.extname(str);
            path=str.chomp(ext);
        end }
    end

And here is the result:

remove extension
          user     system      total        real
 File:  0.970000   0.060000   1.030000 (  1.081398)
chomp:  0.820000   0.040000   0.860000 (  0.947432)

The conclusion of the second benchmark: one simple function (`chomp`, fed by `File.extname`) is better than three dedicated functions. No surprise, but it is good to know.
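
For reference, here is a minimal sketch of two ways to strip the extension that do return the same path. The helper names are hypothetical, and the `File.join` plus `File.dirname` combination is a substitute for the `File.expand_path` call in the benchmark, which ignores its second argument when the first one is already absolute.

```ruby
# Hypothetical helper names, used only for this illustration.

# Strip the extension with a plain string operation.
def strip_ext_chomp(path)
  path.chomp(File.extname(path))
end

# Strip the extension using only File functions:
# rebuild the path from its directory and its extension-less basename.
def strip_ext_file(path)
  File.join(File.dirname(path), File.basename(path, File.extname(path)))
end

paths = ['/accounts/user.json', '/user/titi/blog/toto.xml']
paths.each do |p|
  puts "#{p} -> #{strip_ext_chomp(p)} | #{strip_ext_file(p)}"
end
```

Both helpers agree on absolute paths like the ones in the benchmark. On a bare relative name such as `user.json`, the `File` version would prepend `./`, so `chomp` is also the less surprising of the two.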