294 lines
No EOL
12 KiB
HTML
294 lines
No EOL
12 KiB
HTML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
|
|
|
|
<meta name="keywords" content="programming, regexp, regular expression, extension, file">
|
|
|
|
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/solarized.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
|
|
<link href='http://fonts.googleapis.com/css?family=Inconsolata' rel='stylesheet' type='text/css'>
|
|
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
|
|
|
|
<link rel="alternate" lang="fr" xml:lang="fr" title="Quand se passer des expressions régulières ?" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" />
|
|
<link rel="alternate" lang="en" xml:lang="en" title="When regexp is not the best solution" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/" />
|
|
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/index.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/highlight/highlight.pack.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/article.js"></script>
|
|
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
|
|
<!--[if lt IE 9]>
|
|
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
|
|
<![endif]-->
|
|
<title>When regexp is not the best solution</title>
|
|
</head>
|
|
<body lang="en" class="article">
|
|
<script type="text/javascript">// <![CDATA[
|
|
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
|
|
// ]]>
|
|
</script>
|
|
|
|
<div id="content">
|
|
<div id="choix">
|
|
<div class="return"><a href="#entete">↓ Menu ↓</a></div>
|
|
<div id="choixlang"><a href="/Scratch/fr/blog/2010-02-23-When-regexp-is-not-the-best-solution/" onclick="setLanguage('fr')">en Français</a>
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
<div id="titre">
|
|
<h1>
|
|
When regexp is not the best solution
|
|
</h1>
|
|
|
|
</div>
|
|
<div class="flush"></div>
|
|
|
|
|
|
|
|
|
|
|
|
<div class="flush"></div>
|
|
<div id="afterheader">
|
|
<div class="corps">
|
|
<p>Regular expression are really useful. Unfortunately, they are not always the best way of doing things.
|
|
Particularly when transformations you want to make are easy.</p>
|
|
|
|
<p>I wanted to know how to get file extension from filename the fastest way possible. There is 3 natural way of doing this:</p>
|
|
|
|
<div>
|
|
|
|
<pre><code class="ruby"># regexp
|
|
str.match(/[^.]*$/);
|
|
ext=$&
|
|
|
|
# split
|
|
ext=str.split('.')[-1]
|
|
|
|
# File module
|
|
ext=File.extname(str)
|
|
</code></pre>
|
|
|
|
</div>
|
|
|
|
<p>At first sight I believed that the regexp should be faster than the split because it could be many <code>.</code> in a filename. But in reality, most of time there is only one dot and I realized the split will be faster. But not the fastest way. There is a function dedicated to this work in the <code>File</code> module.</p>
|
|
|
|
<p>Here is the Benchmark ruby code:</p>
|
|
|
|
<div><div class="codefile"><a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_ext.rb">➥ regex_benchmark_ext.rb</a></div>
|
|
|
|
|
|
<pre><code class="ruby">#!/usr/bin/env ruby
|
|
require 'benchmark'
|
|
n=80000
|
|
tab=[ '/accounts/user.json',
|
|
'/accounts/user.xml',
|
|
'/user/titi/blog/toto.json',
|
|
'/user/titi/blog/toto.xml' ]
|
|
|
|
puts "Get extname"
|
|
Benchmark.bm do |x|
|
|
x.report("regexp:") { n.times do
|
|
str=tab[rand(4)];
|
|
str.match(/[^.]*$/);
|
|
ext=$&;
|
|
end }
|
|
x.report(" split:") { n.times do
|
|
str=tab[rand(4)];
|
|
ext=str.split('.')[-1] ;
|
|
end }
|
|
x.report(" File:") { n.times do
|
|
str=tab[rand(4)];
|
|
ext=File.extname(str);
|
|
end }
|
|
end
|
|
</code></pre>
|
|
|
|
</div>
|
|
|
|
<p>And here is the result</p>
|
|
|
|
<pre class="twilight">
|
|
Get extname
|
|
user system total real
|
|
regexp: 2.550000 0.020000 2.570000 ( 2.693407)
|
|
split: 1.080000 0.050000 1.130000 ( 1.190408)
|
|
File: 0.640000 0.030000 0.670000 ( 0.717748)
|
|
</pre>
|
|
|
|
<p>Conclusion of this benchmark, dedicated function are better than your way of doing stuff (most of time).</p>
|
|
|
|
<h2 id="file-path-without-the-extension">file path without the extension.</h2>
|
|
|
|
<div><div class="codefile"><a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/code/regex_benchmark_strip.rb">➥ regex_benchmark_strip.rb</a></div>
|
|
|
|
|
|
<pre><code class="ruby">#!/usr/bin/env ruby
|
|
require 'benchmark'
|
|
n=80000
|
|
tab=[ '/accounts/user.json',
|
|
'/accounts/user.xml',
|
|
'/user/titi/blog/toto.json',
|
|
'/user/titi/blog/toto.xml' ]
|
|
|
|
puts "remove extension"
|
|
Benchmark.bm do |x|
|
|
x.report(" File:") { n.times do
|
|
str=tab[rand(4)];
|
|
path=File.expand_path(str,File.basename(str,File.extname(str)));
|
|
end }
|
|
x.report("chomp:") { n.times do
|
|
str=tab[rand(4)];
|
|
ext=File.extname(str);
|
|
path=str.chomp(ext);
|
|
end }
|
|
end
|
|
</code></pre>
|
|
|
|
</div>
|
|
|
|
<p>and here is the result:</p>
|
|
|
|
<pre class="twilight">
|
|
remove extension
|
|
user system total real
|
|
File: 0.970000 0.060000 1.030000 ( 1.081398)
|
|
chomp: 0.820000 0.040000 0.860000 ( 0.947432)
|
|
</pre>
|
|
|
|
<p>Conclusion of the second benchmark. One simple function is better than three dedicated functions. No surprise, but it is good to know.</p>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div id="social">
|
|
<div class="left"> <a href="https://twitter.com/share" class="twitter-share-button" data-via="yogsototh">Tweet</a>
|
|
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
|
|
</div>
|
|
<div class="left"> <div class="g-plusone" data-size="medium" data-annotation="inline" data-width="106"></div>
|
|
<script type="text/javascript">
|
|
(function() {
|
|
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true;
|
|
po.src = 'https://apis.google.com/js/plusone.js';
|
|
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
|
|
})();
|
|
</script>
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
<div id="choixrss">
|
|
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
|
|
Subscribe
|
|
</a>
|
|
</div>
|
|
<script type="text/javascript">
|
|
$(document).ready(function(){
|
|
$('#comment').hide();
|
|
$('#clickcomment').click(showComments);
|
|
});
|
|
function showComments() {
|
|
$('#comment').show();
|
|
$('#clickcomment').fadeOut();
|
|
}
|
|
document.write('<div id="clickcomment">Comments & Share</div>');
|
|
</script>
|
|
<div class="flush"></div>
|
|
|
|
<div class="corps" id="comment">
|
|
<h2 class="first">comments</h2>
|
|
<noscript>
|
|
You must enable javascript to comment.
|
|
</noscript>
|
|
|
|
<script type="text/javascript">
|
|
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
|
|
var idcomments_post_id = '/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
|
|
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/';
|
|
</script>
|
|
<span id="IDCommentsPostTitle" style="display:none"></span>
|
|
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
|
|
|
|
</div>
|
|
|
|
<div id="entete" class="corps_spaced">
|
|
<div id="liens">
|
|
<ul><li><a href="/Scratch/en/">Home</a></li>
|
|
<li><a href="/Scratch/en/blog/">Blog</a></li>
|
|
<li><a href="/Scratch/en/softwares/">Softwares</a></li>
|
|
<li><a href="/Scratch/en/about/">About</a></li></ul>
|
|
</div>
|
|
<div class="flush"></div>
|
|
<hr/>
|
|
<div id="next_before_articles">
|
|
<div id="previous_articles">
|
|
previous entries
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/"><span class="nicer">«</span> split a file by keyword</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/"><span class="nicer">«</span> Pragmatic Regular Expression Exclude (2)</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/"><span class="nicer">«</span> Pragmatic Regular Expression Exclude</a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div id="next_articles">
|
|
next entries
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-03-22-Git-Tips/">Git Tips <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-03-23-Encapsulate-git/">Encapsulate git <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-05-17-at-least-this-blog-revive/">I live again! <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
<div id="bottom">
|
|
<div>
|
|
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
|
|
</div>
|
|
<div>
|
|
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
|
|
</div>
|
|
<div id="lastmod">
|
|
Created: 02/23/2010
|
|
Modified: 05/09/2010
|
|
</div>
|
|
<div>
|
|
Entirely done with
|
|
<a href="http://www.vim.org">Vim</a>
|
|
and
|
|
<a href="http://nanoc.stoneship.org">nanoc</a>
|
|
</div>
|
|
</div>
|
|
<div class="clear"></div>
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html> |