<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
|
|
|
|
<meta name="keywords" content="programming, regexp, regular expression, extension, file" />
|
|
|
|
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/twilight.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
|
|
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
|
|
|
|
<link rel="alternate" lang="fr" xml:lang="fr" title="Quand se passer des expressions régulières ?" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" />
|
|
<link rel="alternate" lang="en" xml:lang="en" title="When regexp is not the best solution" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/" />
|
|
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/index.js"></script>
|
|
<!-- < % if containMaths %>
|
|
<script type="text/javascript" src="/Scratch/js/MathJax/MathJax.js"></script>
|
|
< % end %>
|
|
-->
|
|
<title>When regexp is not the best solution</title>
|
|
</head>
|
|
<body lang="en">
|
|
<script type="text/javascript">// <![CDATA[
|
|
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
|
|
// ]]>
|
|
</script>
|
|
|
|
<div id="content">
|
|
|
|
<div id="choix">
|
|
<div class="return"><a href="#entete">↓ Menu ↓</a></div>
|
|
<div id="choixlang">
|
|
<a href="/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" onclick="setLanguage('fr')">en Français</a>
|
|
</div>
|
|
</div>
|
|
<img src="/Scratch/img/presentation.png" alt="Presentation drawing"/>
|
|
<div id="titre">
|
|
<h1>
|
|
When regexp is not the best solution
|
|
</h1>
|
|
|
|
</div>
|
|
|
|
<div class="flush"></div>
|
|
|
|
|
|
|
|
|
|
|
|
<div class="flush"></div>
|
|
<div id="afterheader">
|
|
<div class="corps">
|
|
<p>Regular expressions are really useful. Unfortunately, they are not always the best way of doing things, particularly when the transformation you want to make is simple.</p>
|
|
|
|
<p>I wanted to find the fastest way to get the file extension from a filename. There are three natural ways of doing this:</p>
|
|
|
|
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> regexp</span>
str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp"><span class="StringRegexp"><span class="StringRegexp">[</span>^.<span class="StringRegexp">]</span></span>*$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>);
ext<span class="Keyword">=</span><span class="Variable"><span class="Variable">$</span>&amp;</span>

<span class="Comment"><span class="Comment">#</span> split</span>
ext<span class="Keyword">=</span>str.<span class="Entity">split</span>(<span class="String"><span class="String">'</span>.<span class="String">'</span></span>)[<span class="Keyword">-</span><span class="Constant">1</span>]

<span class="Comment"><span class="Comment">#</span> File module</span>
ext<span class="Keyword">=</span><span class="Support">File</span>.<span class="Entity">extname</span>(str)
</pre></div>
|
|
|
|
<p>At first sight, I believed the regexp would be faster than the split, because a filename can contain many <code>.</code> characters. But in practice there is usually only one dot, and I realized the split would be faster. It is still not the fastest way: the <code>File</code> module has a function dedicated to this job.</p>
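<p>The three approaches do not return exactly the same thing, which matters if you swap one for another. The following is a minimal sketch, separate from the benchmark; the method names <code>ext_regexp</code>, <code>ext_split</code> and <code>ext_file</code> and the two extra sample filenames are assumptions made for illustration:</p>

```ruby
# Hypothetical helper names; each wraps one of the three approaches above.

# regexp: longest run of non-dot characters anchored at the end of the string
def ext_regexp(str)
  str.match(/[^.]*$/)
  $&
end

# split: last piece after cutting the string on every dot
def ext_split(str)
  str.split('.')[-1]
end

# File module: dedicated function, keeps the leading dot
def ext_file(str)
  File.extname(str)
end

['/accounts/user.json', 'archive.tar.gz', 'README'].each do |name|
  puts "#{name}: regexp=#{ext_regexp(name).inspect} " \
       "split=#{ext_split(name).inspect} File=#{ext_file(name).inspect}"
end
```

<p>Both string-based versions return the whole name when there is no dot (<code>"README"</code>), while <code>File.extname</code> returns an empty string in that case and otherwise includes the leading dot (<code>".json"</code>).</p>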
|
|
|
|
<p>Here is the Ruby benchmark code:</p>
|
|
|
|
<div><div class="code"><div class="file"><a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_ext.rb"> ➥ regex_benchmark_ext.rb </a></div><div class="withfile">
|
|
<pre class="twilight">
<span class="Comment"><span class="Comment">#</span>!/usr/bin/env ruby</span>
<span class="Keyword">require</span> <span class="String"><span class="String">'</span>benchmark<span class="String">'</span></span>
n<span class="Keyword">=</span><span class="Constant">80000</span>
tab<span class="Keyword">=</span>[ <span class="String"><span class="String">'</span>/accounts/user.json<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/accounts/user.xml<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/user/titi/blog/toto.json<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/user/titi/blog/toto.xml<span class="String">'</span></span> ]

puts <span class="String"><span class="String">"</span>Get extname<span class="String">"</span></span>
<span class="Support">Benchmark</span>.<span class="Entity">bm</span> <span class="Keyword">do </span>|<span class="Variable">x</span>|
    x.<span class="Entity">report</span>(<span class="String"><span class="String">"</span>regexp:<span class="String">"</span></span>) { n.<span class="Entity">times</span> <span class="Keyword">do </span>
        str<span class="Keyword">=</span>tab[<span class="Entity">rand</span>(<span class="Constant">4</span>)];
        str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp"><span class="StringRegexp"><span class="StringRegexp">[</span>^.<span class="StringRegexp">]</span></span>*$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>);
        ext<span class="Keyword">=</span><span class="Variable"><span class="Variable">$</span>&amp;</span>;
    <span class="Keyword">end</span> }
    x.<span class="Entity">report</span>(<span class="String"><span class="String">"</span> split:<span class="String">"</span></span>) { n.<span class="Entity">times</span> <span class="Keyword">do </span>
        str<span class="Keyword">=</span>tab[<span class="Entity">rand</span>(<span class="Constant">4</span>)];
        ext<span class="Keyword">=</span>str.<span class="Entity">split</span>(<span class="String"><span class="String">'</span>.<span class="String">'</span></span>)[<span class="Keyword">-</span><span class="Constant">1</span>] ;
    <span class="Keyword">end</span> }
    x.<span class="Entity">report</span>(<span class="String"><span class="String">"</span> File:<span class="String">"</span></span>) { n.<span class="Entity">times</span> <span class="Keyword">do </span>
        str<span class="Keyword">=</span>tab[<span class="Entity">rand</span>(<span class="Constant">4</span>)];
        ext<span class="Keyword">=</span><span class="Support">File</span>.<span class="Entity">extname</span>(str);
    <span class="Keyword">end</span> }
<span class="Keyword">end</span>
</pre>
|
|
</div></div></div>
|
|
|
|
<p>And here is the result:</p>
|
|
|
|
<pre class="twilight">
Get extname
             user     system      total        real
regexp:  2.550000   0.020000   2.570000 (  2.693407)
 split:  1.080000   0.050000   1.130000 (  1.190408)
  File:  0.640000   0.030000   0.670000 (  0.717748)
</pre>
|
|
|
|
<p>The conclusion of this benchmark: a dedicated function is better than your own way of doing things (most of the time).</p>
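<p>To put numbers on that conclusion, the "real" column of the result can be turned into ratios. This is a small sketch that uses only the timings reported above; the constant and method names are made up for this example:</p>

```ruby
# Wall-clock ("real") timings copied from the benchmark output above, in seconds.
REAL_SECONDS = {
  'regexp' => 2.693407,
  'split'  => 1.190408,
  'File'   => 0.717748
}

# How many times slower each approach is than the fastest one.
def slowdown_vs_fastest(timings)
  fastest = timings.values.min
  ratios = {}
  timings.each { |label, seconds| ratios[label] = (seconds / fastest).round(2) }
  ratios
end

slowdown_vs_fastest(REAL_SECONDS).each do |label, ratio|
  puts "#{label}: #{ratio}x"
end
```

<p>On this run the regexp is roughly 3.75 times slower than <code>File.extname</code>, and the split about 1.66 times slower.</p>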
|
|
|
|
<h2 id="file-path-without-the-extension">File path without the extension</h2>
|
|
|
|
<div><div class="code"><div class="file"><a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_strip.rb"> ➥ regex_benchmark_strip.rb </a></div><div class="withfile">
|
|
<pre class="twilight">
<span class="Comment"><span class="Comment">#</span>!/usr/bin/env ruby</span>
<span class="Keyword">require</span> <span class="String"><span class="String">'</span>benchmark<span class="String">'</span></span>
n<span class="Keyword">=</span><span class="Constant">80000</span>
tab<span class="Keyword">=</span>[ <span class="String"><span class="String">'</span>/accounts/user.json<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/accounts/user.xml<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/user/titi/blog/toto.json<span class="String">'</span></span>,
      <span class="String"><span class="String">'</span>/user/titi/blog/toto.xml<span class="String">'</span></span> ]

puts <span class="String"><span class="String">"</span>remove extension<span class="String">"</span></span>
<span class="Support">Benchmark</span>.<span class="Entity">bm</span> <span class="Keyword">do </span>|<span class="Variable">x</span>|
    x.<span class="Entity">report</span>(<span class="String"><span class="String">"</span> File:<span class="String">"</span></span>) { n.<span class="Entity">times</span> <span class="Keyword">do </span>
        str<span class="Keyword">=</span>tab[<span class="Entity">rand</span>(<span class="Constant">4</span>)];
        path<span class="Keyword">=</span><span class="Support">File</span>.<span class="Entity">expand_path</span>(str,<span class="Support">File</span>.<span class="Entity">basename</span>(str,<span class="Support">File</span>.<span class="Entity">extname</span>(str)));
    <span class="Keyword">end</span> }
    x.<span class="Entity">report</span>(<span class="String"><span class="String">"</span>chomp:<span class="String">"</span></span>) { n.<span class="Entity">times</span> <span class="Keyword">do </span>
        str<span class="Keyword">=</span>tab[<span class="Entity">rand</span>(<span class="Constant">4</span>)];
        ext<span class="Keyword">=</span><span class="Support">File</span>.<span class="Entity">extname</span>(str);
        path<span class="Keyword">=</span>str.<span class="Entity">chomp</span>(ext);
    <span class="Keyword">end</span> }
<span class="Keyword">end</span>
</pre>
|
|
</div></div></div>
|
|
|
|
<p>And here is the result:</p>
|
|
|
|
<pre class="twilight">
remove extension
            user     system      total        real
 File:  0.970000   0.060000   1.030000 (  1.081398)
chomp:  0.820000   0.040000   0.860000 (  0.947432)
</pre>
|
|
|
|
<p>The conclusion of the second benchmark: one simple function is better than three dedicated functions. No surprise, but it is good to know.</p>
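<p>As a sanity check on what is being computed, here is a minimal sketch of the <code>chomp</code> approach next to a variant built only from <code>File</code> functions. The method names are hypothetical, and the <code>File.join</code> / <code>File.dirname</code> combination is an assumption chosen for illustration, not the exact call that was benchmarked:</p>

```ruby
# Hypothetical helper names, for illustration only.

# chomp approach: drop the extension reported by File.extname from the end
def strip_ext_chomp(path)
  path.chomp(File.extname(path))
end

# File-only variant (assumed): rebuild the path from directory + bare basename.
# Note: for a bare relative name such as 'user.json' this yields './user',
# so the two methods are only compared here on simple absolute paths.
def strip_ext_file(path)
  File.join(File.dirname(path), File.basename(path, File.extname(path)))
end

['/accounts/user.json', '/user/titi/blog/toto.xml'].each do |path|
  puts "#{path} -> #{strip_ext_chomp(path)} / #{strip_ext_file(path)}"
end
```

<p>On the sample paths used in the benchmark, both methods give the same string, for example <code>/accounts/user</code> for <code>/accounts/user.json</code>.</p>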
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div id="choixrss">
|
|
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
|
|
Subscribe
|
|
</a>
|
|
</div>
|
|
<script type="text/javascript">
|
|
$(document).ready(function(){
|
|
$('#comment').hide();
|
|
$('#clickcomment').click(showComments);
|
|
});
|
|
function showComments() {
|
|
$('#comment').show();
|
|
$('#clickcomment').fadeOut();
|
|
}
|
|
document.write('<div id="clickcomment">Comments</div>');
|
|
</script>
|
|
<div class="flush"></div>
|
|
<div class="corps" id="comment">
|
|
<h2 class="first">comments</h2>
|
|
<noscript>
|
|
You must enable javascript to comment.
|
|
</noscript>
|
|
|
|
<script type="text/javascript">
|
|
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
|
|
var idcomments_post_id = '/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
|
|
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
|
|
</script>
|
|
<span id="IDCommentsPostTitle" style="display:none"></span>
|
|
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
|
|
|
|
</div>
|
|
|
|
<div id="entete" class="corps_spaced">
|
|
<div id="liens">
|
|
<ul><li><a href="/Scratch/en/">Homepage</a></li>
|
|
<li><a href="/Scratch/en/blog/">Blog</a></li>
|
|
<li><a href="/Scratch/en/softwares/">Softwares</a></li>
|
|
<li><a href="/Scratch/en/about/">About</a></li></ul>
|
|
</div>
|
|
<div class="flush"></div>
|
|
<hr/>
|
|
<div id="next_before_articles">
|
|
<div id="previous_articles">
|
|
previous entries
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/"><span class="nicer">«</span> split a file by keyword</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/"><span class="nicer">«</span> Pragmatic Regular Expression Exclude (2)</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/"><span class="nicer">«</span> Pragmatic Regular Expression Exclude</a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div id="next_articles">
|
|
next entries
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-03-22-Git-Tips/">Git Tips <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-03-23-Encapsulate-git/">Encapsulate git <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-05-17-at-least-this-blog-revive/">I live again! <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
<div id="bottom">
|
|
<div>
|
|
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
|
|
</div>
|
|
<div id="lastmod">
|
|
Created: 02/23/2010
|
|
Modified: 05/09/2010
|
|
</div>
|
|
<div>
|
|
Entirely done with
|
|
<a href="http://www.vim.org">Vim</a>
|
|
and
|
|
<a href="http://nanoc.stoneship.org">nanoc</a>
|
|
</div>
|
|
<div>
|
|
<a href="/Scratch/en/validation/">Validation</a>
|
|
<a href="http://validator.w3.org/check?uri=referer"> [xhtml] </a>
|
|
.
|
|
<a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3"> [css] </a>
|
|
.
|
|
<a href="http://validator.w3.org/feed/check.cgi?url=http%3A//yannesposito.com/Scratch/en/blog/feed/feed.xml">[rss]</a>
|
|
</div>
|
|
</div>
|
|
<div class="clear"></div>
|
|
</div>
|
|
</body>
|
|
</html> |