2011-04-20 12:29:01 +00:00
<?xml version="1.0" encoding="utf-8"?>
< !DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
< html xmlns = "http://www.w3.org/1999/xhtml" lang = "fr" xml:lang = "fr" >
< head >
< meta http-equiv = "Content-Type" content = "text/html; charset=UTF-8" / >
< meta name = "keywords" content = "programming, regexp, regular expression, extension, file" >
2011-04-20 13:56:52 +00:00
< link rel = "shortcut icon" type = "image/x-icon" href = "/Scratch/img/favicon.ico" / >
< link rel = "stylesheet" type = "text/css" href = "/Scratch/assets/css/main.css" / >
2012-04-02 21:43:39 +00:00
< link rel = "stylesheet" type = "text/css" href = "/Scratch/css/solarized.css" / >
2011-04-20 13:56:52 +00:00
< link rel = "stylesheet" type = "text/css" href = "/Scratch/css/idc.css" / >
2012-05-02 15:43:56 +00:00
< link href = 'http://fonts.googleapis.com/css?family=Inconsolata' rel = 'stylesheet' type = 'text/css' >
2011-04-20 12:29:01 +00:00
< link rel = "alternate" type = "application/rss+xml" title = "RSS" href = "http://feeds.feedburner.com/yannespositocomen" / >
2011-04-20 13:56:52 +00:00
< link rel = "alternate" lang = "fr" xml:lang = "fr" title = "Quand se passer des expressions régulières ?" type = "text/html" hreflang = "fr" href = "/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" / >
< link rel = "alternate" lang = "en" xml:lang = "en" title = "When regexp is not the best solution" type = "text/html" hreflang = "en" href = "/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/" / >
< script type = "text/javascript" src = "/Scratch/js/jquery-1.3.1.min.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/jquery.cookie.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/index.js" > < / script >
2012-05-02 15:43:56 +00:00
< script type = "text/javascript" src = "/Scratch/js/highlight/highlight.pack.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/article.js" > < / script >
2011-04-20 12:29:01 +00:00
<!-- [if lt IE 9]>
< script src = "http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js" > < / script >
<![endif]-->
< title > When regexp is not the best solution< / title >
< / head >
2011-10-18 22:30:00 +00:00
< body lang = "en" class = "article" >
2011-04-20 12:29:01 +00:00
< script type = "text/javascript" > / / < ! [ C D A T A [
2011-04-20 13:56:52 +00:00
document.write('< div id = "blackpage" > < img src = "/Scratch/img/loading.gif" alt = "loading..." / > < / div > ');
2011-04-20 12:29:01 +00:00
// ]]>
< / script >
< div id = "content" >
< div id = "choix" >
< div class = "return" > < a href = "#entete" > ↓ Menu ↓ < / a > < / div >
< div id = "choixlang" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" onclick = "setLanguage('fr')" > en Français< / a >
2011-04-20 12:29:01 +00:00
< / div >
2011-09-28 16:05:55 +00:00
< div class = "flush" > < / div >
2011-04-20 12:29:01 +00:00
< / div >
< div id = "titre" >
< h1 >
When regexp is not the best solution
< / h1 >
< / div >
< div class = "flush" > < / div >
< div class = "flush" > < / div >
< div id = "afterheader" >
< div class = "corps" >
< p > Regular expression are really useful. Unfortunately, they are not always the best way of doing things.
Particularly when transformations you want to make are easy.< / p >
< p > I wanted to know how to get file extension from filename the fastest way possible. There is 3 natural way of doing this:< / p >
2012-05-02 15:43:56 +00:00
< div >
2011-04-20 12:29:01 +00:00
2012-05-02 15:43:56 +00:00
< pre > < code class = "ruby" > # regexp
str.match(/[^.]*$/);
ext=$&
2011-04-20 12:29:01 +00:00
2012-05-02 15:43:56 +00:00
# split
ext=str.split('.')[-1]
# File module
ext=File.extname(str)
< / code > < / pre >
< / div >
2011-04-20 12:29:01 +00:00
< p > At first sight I believed that the regexp should be faster than the split because it could be many < code > .< / code > in a filename. But in reality, most of time there is only one dot and I realized the split will be faster. But not the fastest way. There is a function dedicated to this work in the < code > File< / code > module.< / p >
< p > Here is the Benchmark ruby code:< / p >
2012-05-02 15:43:56 +00:00
< div > < div class = "codefile" > < a href = "/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_ext.rb" > ➥ regex_benchmark_ext.rb< / a > < / div >
< pre > < code class = "ruby" > #!/usr/bin/env ruby
require 'benchmark'
n=80000
tab=[ '/accounts/user.json',
'/accounts/user.xml',
'/user/titi/blog/toto.json',
'/user/titi/blog/toto.xml' ]
puts "Get extname"
Benchmark.bm do |x|
x.report("regexp:") { n.times do
str=tab[rand(4)];
str.match(/[^.]*$/);
ext=$& ;
end }
x.report(" split:") { n.times do
str=tab[rand(4)];
ext=str.split('.')[-1] ;
end }
x.report(" File:") { n.times do
str=tab[rand(4)];
ext=File.extname(str);
end }
end
< / code > < / pre >
< / div >
2011-04-20 12:29:01 +00:00
< p > And here is the result< / p >
< pre class = "twilight" >
Get extname
user system total real
regexp: 2.550000 0.020000 2.570000 ( 2.693407)
split: 1.080000 0.050000 1.130000 ( 1.190408)
File: 0.640000 0.030000 0.670000 ( 0.717748)
< / pre >
< p > Conclusion of this benchmark, dedicated function are better than your way of doing stuff (most of time).< / p >
< h2 id = "file-path-without-the-extension" > file path without the extension.< / h2 >
2012-05-02 15:43:56 +00:00
< div > < div class = "codefile" > < a href = "/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_strip.rb" > ➥ regex_benchmark_strip.rb< / a > < / div >
< pre > < code class = "ruby" > #!/usr/bin/env ruby
require 'benchmark'
n=80000
tab=[ '/accounts/user.json',
'/accounts/user.xml',
'/user/titi/blog/toto.json',
'/user/titi/blog/toto.xml' ]
puts "remove extension"
Benchmark.bm do |x|
x.report(" File:") { n.times do
str=tab[rand(4)];
path=File.expand_path(str,File.basename(str,File.extname(str)));
end }
x.report("chomp:") { n.times do
str=tab[rand(4)];
ext=File.extname(str);
path=str.chomp(ext);
end }
end
< / code > < / pre >
< / div >
2011-04-20 12:29:01 +00:00
< p > and here is the result:< / p >
< pre class = "twilight" >
remove extension
user system total real
File: 0.970000 0.060000 1.030000 ( 1.081398)
chomp: 0.820000 0.040000 0.860000 ( 0.947432)
< / pre >
< p > Conclusion of the second benchmark. One simple function is better than three dedicated functions. No surprise, but it is good to know.< / p >
< / div >
2012-04-10 13:56:34 +00:00
< div id = "social" >
< div class = "left" > < a href = "https://twitter.com/share" class = "twitter-share-button" data-via = "yogsototh" > Tweet< / a >
< script > ! function ( d , s , id ) { var js , fjs = d . getElementsByTagName ( s ) [ 0 ] ; if ( ! d . getElementById ( id ) ) { js = d . createElement ( s ) ; js . id = id ; js . src = "//platform.twitter.com/widgets.js" ; fjs . parentNode . insertBefore ( js , fjs ) ; } } ( document , "script" , "twitter-wjs" ) ; < / script >
< / div >
< div class = "left" > < div class = "g-plusone" data-size = "medium" data-annotation = "inline" data-width = "106" > < / div >
< script type = "text/javascript" >
(function() {
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true;
po.src = 'https://apis.google.com/js/plusone.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
})();
< / script >
< / div >
< div class = "flush" > < / div >
< / div >
2011-04-20 12:29:01 +00:00
< div id = "choixrss" >
< a id = "rss" href = "http://feeds.feedburner.com/yannespositocomen" >
Subscribe
< / a >
< / div >
< script type = "text/javascript" >
$(document).ready(function(){
$('#comment').hide();
$('#clickcomment').click(showComments);
});
function showComments() {
$('#comment').show();
$('#clickcomment').fadeOut();
}
2012-04-10 13:56:34 +00:00
document.write('< div id = "clickcomment" > Comments & Share< / div > ');
2011-04-20 12:29:01 +00:00
< / script >
< div class = "flush" > < / div >
2012-04-10 13:56:34 +00:00
2011-04-20 12:29:01 +00:00
< div class = "corps" id = "comment" >
< h2 class = "first" > comments< / h2 >
< noscript >
You must enable javascript to comment.
< / noscript >
< script type = "text/javascript" >
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
2011-04-20 13:56:52 +00:00
var idcomments_post_id = '/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
2011-04-20 12:29:01 +00:00
< / script >
< span id = "IDCommentsPostTitle" style = "display:none" > < / span >
2011-04-20 13:56:52 +00:00
< script type = 'text/javascript' src = '/Scratch/js/genericCommentWrapperV2.js' > < / script >
2011-04-20 12:29:01 +00:00
< / div >
< div id = "entete" class = "corps_spaced" >
< div id = "liens" >
2011-04-20 13:56:52 +00:00
< ul > < li > < a href = "/Scratch/en/" > Home< / a > < / li >
< li > < a href = "/Scratch/en/blog/" > Blog< / a > < / li >
< li > < a href = "/Scratch/en/softwares/" > Softwares< / a > < / li >
< li > < a href = "/Scratch/en/about/" > About< / a > < / li > < / ul >
2011-04-20 12:29:01 +00:00
< / div >
< div class = "flush" > < / div >
< hr / >
< div id = "next_before_articles" >
< div id = "previous_articles" >
previous entries
< div class = "previous_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/" > < span class = "nicer" > «< / span > split a file by keyword< / a >
2011-04-20 12:29:01 +00:00
< / div >
< div class = "previous_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/" > < span class = "nicer" > «< / span > Pragmatic Regular Expression Exclude (2)< / a >
2011-04-20 12:29:01 +00:00
< / div >
< div class = "previous_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-02-15-All-but-something-regexp/" > < span class = "nicer" > «< / span > Pragmatic Regular Expression Exclude< / a >
2011-04-20 12:29:01 +00:00
< / div >
< / div >
< div id = "next_articles" >
next entries
< div class = "next_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-03-22-Git-Tips/" > Git Tips < span class = "nicer" > »< / span > < / a >
2011-04-20 12:29:01 +00:00
< / div >
< div class = "next_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-03-23-Encapsulate-git/" > Encapsulate git < span class = "nicer" > »< / span > < / a >
2011-04-20 12:29:01 +00:00
< / div >
< div class = "next_article" >
2011-04-20 13:56:52 +00:00
< a href = "/Scratch/en/blog/2010-05-17-at-least-this-blog-revive/" > I live again! < span class = "nicer" > »< / span > < / a >
2011-04-20 12:29:01 +00:00
< / div >
< / div >
< div class = "flush" > < / div >
< / div >
< / div >
< div id = "bottom" >
2012-04-02 21:43:39 +00:00
< div >
2012-04-10 13:56:34 +00:00
< a href = "https://twitter.com/yogsototh" > Follow @yogsototh< / a >
2012-04-02 21:43:39 +00:00
< / div >
2011-04-20 12:29:01 +00:00
< div >
< a rel = "license" href = "http://creativecommons.org/licenses/by-sa/3.0/" > Copyright ©, Yann Esposito< / a >
< / div >
< div id = "lastmod" >
Created: 02/23/2010
Modified: 05/09/2010
< / div >
< div >
Entirely done with
< a href = "http://www.vim.org" > Vim< / a >
and
< a href = "http://nanoc.stoneship.org" > nanoc< / a >
< / div >
< / div >
< div class = "clear" > < / div >
< / div >
< / body >
< / html >