<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="regexp, regular expression" />
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/twilight.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
<link rel="alternate" lang="fr" xml:lang="fr" title="Expression régulière pour tout sauf quelquechose" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" />
<link rel="alternate" lang="en" xml:lang="en" title="Pragmatic Regular Expression Exclude" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/" />
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
<script type="text/javascript" src="/Scratch/js/index.js"></script>
<title>Pragmatic Regular Expression Exclude</title>
</head>
<body lang="en">
<script type="text/javascript">// <![CDATA[
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
// ]]>
</script>
<div id="choix">
<div class="return"><a href="#entete">&darr; Menu &darr;</a></div>
<div id="choixlang">
<a href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" onclick="setLanguage('fr')">en Français</a>
</div>
</div>
<div id="content">
<div id="titre">
<h1>
Pragmatic Regular Expression Exclude
</h1>
</div>
<div class="flush"></div>
<div class="flush"></div>
<div id="afterheader">
<div class="corps">
<p>Sometimes you cannot simply write:</p>
<div><pre class="twilight">
<span class="Keyword">if</span> str.<span class="Entity">match</span>(regexp) <span class="Keyword">and</span>
   <span class="Keyword">not</span> str.<span class="Entity">match</span>(other_regexp)
  do_something
<span class="Keyword">end</span>
</pre></div>
<p>and you have to get this behaviour with a single regular expression. Regular languages are closed under complement, so such an expression always exists in theory. The problem is that classic regular expression syntax has no complement operator, so the equivalent expression can be long and painful to write by hand.</p>
<p>But for some simple regular expressions it is quite doable<sup><a href="#note1">&dagger;</a></sup>. Say you want to match every string containing some word, say <code>bull</code>, but you don&rsquo;t want <code>bullshit</code> to count as a match. Here is a nice way to do that:</p>
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> match every string containing 'bull' ('bullshit' included)</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
<span class="Comment"><span class="Comment">#</span> match every string containing a 'bull' that does not begin 'bullshit'</span>
<span class="Comment"><span class="Comment">#</span> (the x flag lets the expression span several lines)</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bulls<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullsh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullshi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span>x</span>
<span class="Comment"><span class="Comment">#</span> another way to write it would be</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$|s<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|sh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|shi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span><span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
</pre></div>
<p>Let&rsquo;s look closer. The first line of the expression is
<code>bull([^s]|$)</code>. Why is the <code>$</code> needed?
Because without it, a string ending with <code>bull</code> would no longer match. This expression means:</p>
<blockquote>
<p>The string ends with <code>bull</code> <br />
or, <br />
contains <code>bull</code> followed by a character other than <code>s</code>. </p>
</blockquote>
<p>And that&rsquo;s it. I hope it helps.</p>
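<p>To check the expressions above, here is a minimal sketch that runs both hand-written forms on a few sample strings. The constant names, the <code>matches?</code> helper and the sample list are illustrative assumptions, not part of the original code. As a cross-check it also uses a negative lookahead, <code>(?!shit)</code>, which is a different technique and only works if the engine supports it (Ruby does).</p>

```ruby
# Hand-written exclusion pattern from the article.
# The x flag makes the engine ignore whitespace and newlines in the pattern.
BULL_NOT_BULLSHIT = /bull([^s]|$)|
                     bulls([^h]|$)|
                     bullsh([^i]|$)|
                     bullshi([^t]|$)/x

# Factored form from the article.
BULL_FACTORED = /bull([^s]|$|s([^h]|$)|sh([^i]|$)|shi([^t]|$))/

# Cross-check only: assumes the engine supports negative lookahead.
BULL_LOOKAHEAD = /bull(?!shit)/

# Illustrative sample strings (hypothetical test data).
SAMPLES = ["bull", "bulldog", "bullseye", "bullshot", "bullshi",
           "bullshit", "no bulshit", "bullshit and bull"]

# True when the regular expression matches somewhere in the string.
def matches?(regexp, str)
  !regexp.match(str).nil?
end

# Print, for each sample, the verdict of the three expressions.
SAMPLES.each do |s|
  verdicts = [BULL_NOT_BULLSHIT, BULL_FACTORED, BULL_LOOKAHEAD].map { |r| matches?(r, s) }
  puts "#{s.inspect}: #{verdicts.inspect}"
end
```

<p>The three expressions agree on every sample. Note that <code>bullshit and bull</code> is accepted, because its last <code>bull</code> is not the start of <code>bullshit</code>.</p>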
<p>Note that this hand-written method is not always the best. For example, try to write a regular expression equivalent to the following conditional expression:</p>
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> Begin with 'a': ^a</span>
<span class="Comment"><span class="Comment">#</span> End with 'c': c$</span>
<span class="Comment"><span class="Comment">#</span> Contain 'b': .*b.*</span>
<span class="Comment"><span class="Comment">#</span> But isn't 'axbxc'</span>
<span class="Keyword">if</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^a.*b.*c$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>) <span class="Keyword">and</span>
<span class="Keyword">not</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^axbxc$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>)
do_something
<span class="Keyword">end</span>
</pre></div>
<p>A nice solution is:</p>
<div><pre class="twilight">
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^(abc| <span class="Comment"><span class="Comment">#</span> length 3</span></span>
<span class="StringRegexp">a.bc| <span class="Comment"><span class="Comment">#</span> length 4</span></span>
<span class="StringRegexp">ab.c|</span>
<span class="StringRegexp">a<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>b.c| <span class="Comment"><span class="Comment">#</span> length 5, anything but axbxc</span></span>
<span class="StringRegexp">a.b<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>c|</span>
<span class="StringRegexp">a...*b.*c| <span class="Comment"><span class="Comment">#</span> two or more characters before the b</span></span>
<span class="StringRegexp">a.*b...*c)$ <span class="Comment"><span class="Comment">#</span> two or more characters after the b</span></span>
<span class="StringRegexp"><span class="StringRegexp">/</span>x</span>
</pre></div>
<p>This solution relies on the maximal length of the strings to exclude.
There are certainly many other methods. But the important lesson is
that excluding something from a regular expression is not straightforward.</p>
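<p>Because such expressions are easy to get wrong, it helps to compare a single-expression candidate with the two-expression test by brute force. The sketch below is only an illustration: the names <code>SINGLE</code>, <code>wanted?</code>, <code>disagreements</code> and the five-letter alphabet are assumptions. The candidate is anchored with <code>^</code> and <code>$</code> and handles the length 5 case with two alternatives, so that only <code>axbxc</code> is rejected.</p>

```ruby
# Single-expression candidate (x flag: whitespace and comments are ignored).
SINGLE = /^(abc|        # length 3
            a.bc|       # length 4
            ab.c|
            a[^x]b.c|   # length 5, anything but axbxc
            a.b[^x]c|
            a...*b.*c|  # two or more characters before the b
            a.*b...*c   # two or more characters after the b
          )$/x

# Reference behaviour: the two-expression test from the article.
def wanted?(str)
  return false if str.match?(/^axbxc$/)
  str.match?(/^a.*b.*c$/)
end

# Small illustrative alphabet (an assumption): enough to exercise every branch.
ALPHABET = %w[a b c x y]

# Every string over ALPHABET, up to max_length characters,
# on which SINGLE and the reference test disagree.
def disagreements(max_length)
  bad = []
  (0..max_length).each do |n|
    ALPHABET.repeated_permutation(n) do |chars|
      s = chars.join
      bad.push(s) if SINGLE.match?(s) != wanted?(s)
    end
  end
  bad
end

puts disagreements(6).inspect
```

<p>An empty list means the candidate and the conditional agree on every string tried. This is a check on a bounded set of strings, not a proof.</p>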
<hr />
<p><small><a name="note1">&dagger;</a>
It can be proved that any regular set minus a finite set is also regular.
</small></p>
</div>
<div id="choixrss">
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
Subscribe
</a>
</div>
<script type="text/javascript">
$(document).ready(function(){
$('#comment').hide();
$('#clickcomment').click(showComments);
});
function showComments() {
$('#comment').show();
$('#clickcomment').fadeOut();
}
document.write('<div id="clickcomment">Comments</div>');
</script>
<div class="flush"></div>
<div class="corps" id="comment">
<h2 class="first">comments</h2>
<noscript>
You must enable JavaScript to comment.
</noscript>
<script type="text/javascript">
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
var idcomments_post_id = '/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
</script>
<span id="IDCommentsPostTitle" style="display:none"></span>
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
</div>
<div id="entete" class="corps_spaced">
<div id="liens">
<ul><li><a href="/Scratch/en/">Homepage</a></li>
<li><a href="/Scratch/en/blog/">Blog</a></li>
<li><a href="/Scratch/en/about/">About</a></li>
<li><a href="/Scratch/en/contact/">Contact</a></li></ul>
</div>
<div class="flush"></div>
<hr/>
<div id="next_before_articles">
<div id="previous_articles">
previous entries
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-12-antialias-font-in-Firefox-under-Ubuntu/">&larr; antialias font in Firefox under Ubuntu</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-04-Change-default-shell-on-Mac-OS-X/">&larr; Change default shell on Mac OS X</a>
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2009-12-14-Git-vs--Bzr/">&larr; Git vs. Bzr</a>
</div>
</div>
<div id="next_articles">
next entries
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/">Pragmatic Regular Expression Exclude (2)&rarr; </a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/">split a file by keyword&rarr; </a>
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/">When regexp is not the best solution&rarr; </a>
</div>
</div>
<div class="flush"></div>
</div>
</div>
<div id="bottom">
<div>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
</div>
<div id="lastmod">
Created: 02/15/2010
Modified: 05/09/2010
</div>
<div>
Entirely done with
<a href="http://www.vim.org">Vim</a>
and
<a href="http://nanoc.stoneship.org">nanoc</a>
</div>
<div>
<a href="/Scratch/en/validation/">Validation</a>
<a href="http://validator.w3.org/check?uri=referer"> [xhtml] </a>
.
<a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3"> [css] </a>
.
<a href="http://validator.w3.org/feed/check.cgi?url=http%3A//yannesposito.com/Scratch/en/blog/feed/feed.xml">[rss]</a>
</div>
</div>
<div class="clear"></div>
</div>
</body>
</html>