scratch/output/Scratch/en/blog/2010-02-15-All-but-something-regexp/index.html

256 lines
15 KiB
HTML
Raw Normal View History

2011-04-20 12:29:01 +00:00
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
2012-01-11 20:40:22 +00:00
<meta name="keywords" content="regex, regexp, regular expression, negate">
2011-04-20 12:29:01 +00:00
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
2012-04-02 21:43:39 +00:00
<link rel="stylesheet" type="text/css" href="/Scratch/css/solarized.css" />
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
2011-04-20 12:29:01 +00:00
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
<link rel="alternate" lang="fr" xml:lang="fr" title="Expression régulière pour tout sauf quelquechose" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" />
<link rel="alternate" lang="en" xml:lang="en" title="Pragmatic Regular Expression Exclude" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/" />
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
<script type="text/javascript" src="/Scratch/js/index.js"></script>
2011-04-20 12:29:01 +00:00
<!--[if lt IE 9]>
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
<![endif]-->
<title>Pragmatic Regular Expression Exclude</title>
</head>
2011-10-18 22:30:00 +00:00
<body lang="en" class="article">
2011-04-20 12:29:01 +00:00
<script type="text/javascript">// <![CDATA[
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
2011-04-20 12:29:01 +00:00
// ]]>
</script>
<div id="content">
<div id="choix">
<div class="return"><a href="#entete">&darr; Menu &darr;</a></div>
<div id="choixlang">
<a href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" onclick="setLanguage('fr')">en Français</a>
2011-04-20 12:29:01 +00:00
</div>
2011-09-28 16:05:55 +00:00
<div class="flush"></div>
2011-04-20 12:29:01 +00:00
</div>
<div id="titre">
<h1>
Pragmatic Regular Expression Exclude
</h1>
</div>
<div class="flush"></div>
<div class="flush"></div>
<div id="afterheader">
<div class="corps">
<p>Sometimes you cannot simply write:</p>
<div><pre class="twilight">
<span class="Keyword">if</span> str.<span class="Entity">match</span>(regexp) <span class="Keyword">and</span>
<span class="Keyword">not</span> str.<span class="Entity">match</span>(other_regexp)
do_something
</pre></div>
2012-01-11 20:40:22 +00:00
<p>and you have to make this behaviour with only one regular expression.
But, there exists a major problem: the complementary of a regular language might not be regular.
Then, for some expression it is absolutely impossible to negate a regular expression.</p>
2011-04-20 12:29:01 +00:00
2012-02-20 14:41:09 +00:00
<p>But sometimes with some simple regular expression it should be possible<sup><a href="#note1">&dagger;</a></sup>. Say you want to match everything containing the some word say <code>bull</code> but don&rsquo;t want to match <code>bullshit</code>. Here is a nice way to do that:</p>
2011-04-20 12:29:01 +00:00
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> match all string containing 'bull' (bullshit comprised)</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
<span class="Comment"><span class="Comment">#</span> match all string containing 'bull' except 'bullshit'</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bulls<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullsh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullshi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
<span class="Comment"><span class="Comment">#</span> another way to write it would be</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$|s<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|sh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|shi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span><span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
</pre></div>
<p>Let look closer. In the first line the expression is:
<code>bull([^s]|$)</code>, why does the <code>$</code> is needed?
Because, without it the word <code>bull</code> would be no more matched. This expression means:</p>
<blockquote>
<p>The string finish by <code>bull</code> <br />
or, <br />
contains <code>bull</code> followed by a letter different from <code>s</code>. </p>
</blockquote>
<p>And this is it. I hope it could help you.</p>
<p>Notice this method is not always the best. For example try to write a regular expression equivalent to the following conditional expression:</p>
<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> Begin with 'a': ^a</span>
<span class="Comment"><span class="Comment">#</span> End with 'a': c$</span>
<span class="Comment"><span class="Comment">#</span> Contain 'b': .*b.*</span>
<span class="Comment"><span class="Comment">#</span> But isn't 'axbxc'</span>
<span class="Keyword">if</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^a.*b.*c$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>) <span class="Keyword">and</span>
<span class="Keyword">not</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^axbxc$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>)
do_something
<span class="Keyword">end</span>
</pre></div>
<p>A nice solution is:</p>
<div><pre class="twilight">
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">abc| <span class="Comment"><span class="Comment">#</span> length 3</span></span>
<span class="StringRegexp">a.bc| <span class="Comment"><span class="Comment">#</span> length 4</span></span>
<span class="StringRegexp">ab.c|</span>
<span class="StringRegexp">a<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>b<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>c| <span class="Comment"><span class="Comment">#</span> length 5</span></span>
<span class="StringRegexp">a...*b.*c| # length &gt;5</span>
<span class="StringRegexp">a.*b...*c</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
</pre></div>
<p>This solution uses the maximal length of the string not to be matched.
There certainly exists many other methods. But the important lesson is
it is not straightforward to exclude something of a regular expression.</p>
<hr />
2012-02-20 14:41:09 +00:00
<p><small><a name="note1">&dagger;</a>
2011-04-20 12:29:01 +00:00
It can be proved that any regular set minus a finite set is also regular.
</small></p>
</div>
2012-04-10 13:56:34 +00:00
<div id="social">
<div class="left"> <a href="https://twitter.com/share" class="twitter-share-button" data-via="yogsototh">Tweet</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
</div>
<div class="left"> <div class="g-plusone" data-size="medium" data-annotation="inline" data-width="106"></div>
<script type="text/javascript">
(function() {
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true;
po.src = 'https://apis.google.com/js/plusone.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
})();
</script>
</div>
<div class="flush"></div>
</div>
2011-04-20 12:29:01 +00:00
<div id="choixrss">
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
Subscribe
</a>
</div>
<script type="text/javascript">
$(document).ready(function(){
$('#comment').hide();
$('#clickcomment').click(showComments);
});
function showComments() {
$('#comment').show();
$('#clickcomment').fadeOut();
}
2012-04-10 13:56:34 +00:00
document.write('<div id="clickcomment">Comments &amp; Share</div>');
2011-04-20 12:29:01 +00:00
</script>
<div class="flush"></div>
2012-04-10 13:56:34 +00:00
2011-04-20 12:29:01 +00:00
<div class="corps" id="comment">
<h2 class="first">comments</h2>
<noscript>
You must enable javascript to comment.
</noscript>
<script type="text/javascript">
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
var idcomments_post_id = '/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
2011-04-20 12:29:01 +00:00
</script>
<span id="IDCommentsPostTitle" style="display:none"></span>
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
2011-04-20 12:29:01 +00:00
</div>
<div id="entete" class="corps_spaced">
<div id="liens">
<ul><li><a href="/Scratch/en/">Home</a></li>
<li><a href="/Scratch/en/blog/">Blog</a></li>
<li><a href="/Scratch/en/softwares/">Softwares</a></li>
<li><a href="/Scratch/en/about/">About</a></li></ul>
2011-04-20 12:29:01 +00:00
</div>
<div class="flush"></div>
<hr/>
<div id="next_before_articles">
<div id="previous_articles">
previous entries
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-12-antialias-font-in-Firefox-under-Ubuntu/"><span class="nicer">«</span>&nbsp;antialias font in Firefox under Ubuntu</a>
2011-04-20 12:29:01 +00:00
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2010-01-04-Change-default-shell-on-Mac-OS-X/"><span class="nicer">«</span>&nbsp;Change default shell on Mac OS X</a>
2011-04-20 12:29:01 +00:00
</div>
<div class="previous_article">
<a href="/Scratch/en/blog/2009-12-14-Git-vs--Bzr/"><span class="nicer">«</span>&nbsp;Git vs. Bzr</a>
2011-04-20 12:29:01 +00:00
</div>
</div>
<div id="next_articles">
next entries
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/">Pragmatic Regular Expression Exclude (2)&nbsp;<span class="nicer">»</span></a>
2011-04-20 12:29:01 +00:00
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/">split a file by keyword&nbsp;<span class="nicer">»</span></a>
2011-04-20 12:29:01 +00:00
</div>
<div class="next_article">
<a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/">When regexp is not the best solution&nbsp;<span class="nicer">»</span></a>
2011-04-20 12:29:01 +00:00
</div>
</div>
<div class="flush"></div>
</div>
</div>
<div id="bottom">
2012-04-02 21:43:39 +00:00
<div>
2012-04-10 13:56:34 +00:00
<a href="https://twitter.com/yogsototh">Follow @yogsototh</a>
2012-04-02 21:43:39 +00:00
</div>
2011-04-20 12:29:01 +00:00
<div>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
</div>
<div id="lastmod">
Created: 02/15/2010
2012-01-18 13:28:01 +00:00
Modified: 01/11/2012
2011-04-20 12:29:01 +00:00
</div>
<div>
Entirely done with
<a href="http://www.vim.org">Vim</a>
and
<a href="http://nanoc.stoneship.org">nanoc</a>
</div>
</div>
<div class="clear"></div>
</div>
</body>
</html>