<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
|
|
|
|
<meta name="keywords" content="regex, regexp, regular expression, negate" />
|
|
|
|
<link rel="shortcut icon" type="image/x-icon" href="/Scratch/img/favicon.ico" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/assets/css/main.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/twilight.css" />
|
|
<link rel="stylesheet" type="text/css" href="/Scratch/css/idc.css" />
|
|
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feeds.feedburner.com/yannespositocomen"/>
|
|
|
|
<link rel="alternate" lang="fr" xml:lang="fr" title="Expression régulière pour tout sauf quelquechose" type="text/html" hreflang="fr" href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" />
|
|
<link rel="alternate" lang="en" xml:lang="en" title="Pragmatic Regular Expression Exclude" type="text/html" hreflang="en" href="/Scratch/en/blog/2010-02-15-All-but-something-regexp/" />
|
|
<script type="text/javascript" src="/Scratch/js/jquery-1.3.1.min.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/jquery.cookie.js"></script>
|
|
<script type="text/javascript" src="/Scratch/js/index.js"></script>
|
|
<!--[if lt IE 9]>
|
|
<script src="http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js"></script>
|
|
<![endif]-->
|
|
<title>Pragmatic Regular Expression Exclude</title>
|
|
</head>
|
|
<body lang="en" class="article">
|
|
<script type="text/javascript">// <![CDATA[
|
|
document.write('<div id="blackpage"><img src="/Scratch/img/loading.gif" alt="loading..."/></div>');
|
|
// ]]>
|
|
</script>
|
|
|
|
<div id="content">
|
|
|
|
<div id="choix">
|
|
<div class="return"><a href="#entete">↓ Menu ↓</a></div>
|
|
<div id="choixlang">
|
|
<a href="/Scratch/fr/blog/2010-02-15-All-but-something-regexp/" onclick="setLanguage('fr')">en Français</a>
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
<div id="titre">
|
|
<h1>
|
|
Pragmatic Regular Expression Exclude
|
|
</h1>
|
|
|
|
</div>
|
|
<div class="flush"></div>
|
|
|
|
|
|
|
|
|
|
|
|
<div class="flush"></div>
|
|
<div id="afterheader">
|
|
<div class="corps">
<p>Sometimes you cannot simply write:</p>

<div><pre class="twilight">
<span class="Keyword">if</span> str.<span class="Entity">match</span>(regexp) <span class="Keyword">and</span>
   <span class="Keyword">not</span> str.<span class="Entity">match</span>(other_regexp)
    do_something
<span class="Keyword">end</span>
</pre></div>

<p>and you have to get this behaviour with a single regular expression.
There is a major difficulty: regular expression syntax has no general "not" operator.
The complement of a regular language is still regular, but the expression describing it can be much longer than the original and is rarely obvious to write by hand.</p>

<p>With a simple regular expression it is often manageable<sup><a href="#note1">†</a></sup>. Say you want to match every string containing the word <code>bull</code>, but you don’t want <code>bullshit</code> to count. Here is a nice way to do that:</p>

<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> match every string containing 'bull' ('bullshit' included)</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>

<span class="Comment"><span class="Comment">#</span> match every string containing 'bull', unless it is followed by 'shit'</span>
<span class="Comment"><span class="Comment">#</span> (the x flag makes the line breaks inside the expression insignificant)</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bulls<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullsh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|</span>
<span class="StringRegexp">bullshi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span>x</span>

<span class="Comment"><span class="Comment">#</span> another way to write it would be</span>
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">bull<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^s<span class="StringRegexp">]</span></span>|$|s<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^h<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|sh<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^i<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span>|shi<span class="StringRegexp"><span class="StringRegexp">(</span><span class="StringRegexp"><span class="StringRegexp">[</span>^t<span class="StringRegexp">]</span></span>|$<span class="StringRegexp">)</span></span><span class="StringRegexp">)</span></span></span><span class="StringRegexp"><span class="StringRegexp">/</span></span>
</pre></div>

<p>Let’s look closer. The first line of the expression is
<code>bull([^s]|$)</code>. Why is the <code>$</code> needed?
Because without it, a string ending with <code>bull</code> would no longer match. This expression means:</p>

<blockquote>
<p>The string ends with <code>bull</code> <br />
or <br />
contains <code>bull</code> followed by a character other than <code>s</code>. </p>
</blockquote>
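
<p>To experiment with this, here is a small Ruby sketch. The helper name <code>all_but</code> is hypothetical (it does not come from any library). It builds the same kind of alternation for any word and any suffix to reject, and it assumes the suffix is not empty.</p>

<div><pre class="twilight">
```ruby
# Hypothetical helper, written for this illustration only:
# build a pattern matching `word` whenever it is not directly followed by `suffix`.
# Assumes `suffix` is not empty.
def all_but(word, suffix)
  branches = (0...suffix.length).map do |i|
    seen = Regexp.escape(suffix[0, i])  # part of the suffix already read
    bad  = Regexp.escape(suffix[i])     # next character that must not follow
    "#{seen}([^#{bad}]|$)"              # a different character, or the end
  end
  Regexp.new("#{Regexp.escape(word)}(#{branches.join('|')})")
end

BULL = all_but('bull', 'shit')
puts BULL.source

['bull', 'bulldozer', 'bullshit', 'bullshit and bull'].each do |s|
  puts "#{s.inspect}: #{BULL.match?(s)}"
end
```
</pre></div>

<p>In this sketch only <code>bullshit</code> on its own is rejected. The last sample still matches because it contains a second <code>bull</code> that is not followed by <code>shit</code>.</p>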

<p>And that is it. I hope it helps.</p>

<p>Note that this method is not always the best. For example, try to write a single regular expression equivalent to the following conditional expression:</p>

<div><pre class="twilight">
<span class="Comment"><span class="Comment">#</span> Begin with 'a': ^a</span>
<span class="Comment"><span class="Comment">#</span> End with 'c': c$</span>
<span class="Comment"><span class="Comment">#</span> Contain 'b': .*b.*</span>
<span class="Comment"><span class="Comment">#</span> But isn't 'axbxc'</span>
<span class="Keyword">if</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^a.*b.*c$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>) <span class="Keyword">and</span>
   <span class="Keyword">not</span> str.<span class="Entity">match</span>(<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^axbxc$</span><span class="StringRegexp"><span class="StringRegexp">/</span></span>)
    do_something
<span class="Keyword">end</span>
</pre></div>

<p>A nice solution is:</p>

<div><pre class="twilight">
<span class="StringRegexp"><span class="StringRegexp">/</span></span><span class="StringRegexp">^(abc|        <span class="Comment"><span class="Comment">#</span> length 3</span></span>
<span class="StringRegexp">a.bc|         <span class="Comment"><span class="Comment">#</span> length 4</span></span>
<span class="StringRegexp">ab.c|</span>
<span class="StringRegexp">a<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>b.c|     <span class="Comment"><span class="Comment">#</span> length 5, but not 'axbxc'</span></span>
<span class="StringRegexp">a.b<span class="StringRegexp"><span class="StringRegexp">[</span>^x<span class="StringRegexp">]</span></span>c|</span>
<span class="StringRegexp">a...*b.*c|    <span class="Comment"><span class="Comment">#</span> two or more characters on one side of 'b'</span></span>
<span class="StringRegexp">a.*b...*c)$</span><span class="StringRegexp"><span class="StringRegexp">/</span>x</span>
</pre></div>

<p>This solution relies on the fact that the string to exclude has a known, fixed length.
There are certainly many other methods. But the important lesson is
that excluding something from a regular expression is not straightforward.</p>
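
<p>One way to gain confidence in such a hand-written expression is to compare it with the two-regexp condition on every short string. The Ruby sketch below does that over a small alphabet. The names <code>wanted?</code>, <code>SINGLE</code>, <code>ALPHABET</code> and <code>all_strings</code> are made up for this illustration, and <code>SINGLE</code> is an anchored variant of the single-expression solution in which the length 5 case is split into two branches so that only <code>axbxc</code> is rejected.</p>

<div><pre class="twilight">
```ruby
# The condition written with two regular expressions.
def wanted?(str)
  str.match?(/^a.*b.*c$/) and not str.match?(/^axbxc$/)
end

# Candidate single expression (an assumption of this sketch, see the text above).
SINGLE = /^(abc|a.bc|ab.c|a[^x]b.c|a.b[^x]c|a...*b.*c|a.*b...*c)$/

# Small test alphabet; 'y' stands for "any character other than a, b, c, x".
ALPHABET = %w[a b c x y]

# Every string over ALPHABET whose length is between 0 and max_len.
def all_strings(max_len)
  (0..max_len).flat_map do |n|
    ALPHABET.repeated_permutation(n).map { |chars| chars.join }
  end
end

# Keep only the strings on which the two formulations disagree.
mismatches = all_strings(6).reject { |s| wanted?(s) == SINGLE.match?(s) }
puts "mismatches: #{mismatches.length}"
```
</pre></div>

<p>If the two formulations agree, the list of mismatches is empty. Such a check only covers the strings it enumerates, so it is a sanity test rather than a proof.</p>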
<hr />
<p><small><a name="note1">†</a>
It can be proved that any regular set minus a finite set is also regular.
</small></p>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<div id="choixrss">
|
|
<a id="rss" href="http://feeds.feedburner.com/yannespositocomen">
|
|
Subscribe
|
|
</a>
|
|
</div>
|
|
<script type="text/javascript">
|
|
$(document).ready(function(){
|
|
$('#comment').hide();
|
|
$('#clickcomment').click(showComments);
|
|
});
|
|
function showComments() {
|
|
$('#comment').show();
|
|
$('#clickcomment').fadeOut();
|
|
}
|
|
document.write('<div id="clickcomment">Comments</div>');
|
|
</script>
|
|
<div class="flush"></div>
|
|
<div class="corps" id="comment">
|
|
<h2 class="first">comments</h2>
|
|
<noscript>
|
|
You must enable javascript to comment.
|
|
</noscript>
|
|
|
|
<script type="text/javascript">
|
|
var idcomments_acct = 'a307f0044511ff1b5cfca573fc0a52e7';
|
|
var idcomments_post_id = '/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
|
|
var idcomments_post_url = 'http://yannesposito.com/Scratch/en/blog/2010-02-15-All-but-something-regexp/';
|
|
</script>
|
|
<span id="IDCommentsPostTitle" style="display:none"></span>
|
|
<script type='text/javascript' src='/Scratch/js/genericCommentWrapperV2.js'></script>
|
|
|
|
</div>
|
|
|
|
<div id="entete" class="corps_spaced">
|
|
<div id="liens">
|
|
<ul><li><a href="/Scratch/en/">Home</a></li>
|
|
<li><a href="/Scratch/en/blog/">Blog</a></li>
|
|
<li><a href="/Scratch/en/softwares/">Softwares</a></li>
|
|
<li><a href="/Scratch/en/about/">About</a></li></ul>
|
|
</div>
|
|
<div class="flush"></div>
|
|
<hr/>
|
|
<div id="next_before_articles">
|
|
<div id="previous_articles">
|
|
previous entries
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-01-12-antialias-font-in-Firefox-under-Ubuntu/"><span class="nicer">«</span> antialias font in Firefox under Ubuntu</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2010-01-04-Change-default-shell-on-Mac-OS-X/"><span class="nicer">«</span> Change default shell on Mac OS X</a>
|
|
</div>
|
|
|
|
|
|
<div class="previous_article">
|
|
<a href="/Scratch/en/blog/2009-12-14-Git-vs--Bzr/"><span class="nicer">«</span> Git vs. Bzr</a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div id="next_articles">
|
|
next entries
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-02-16-All-but-something-regexp--2-/">Pragmatic Regular Expression Exclude (2) <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-02-18-split-a-file-by-keyword/">split a file by keyword <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
<div class="next_article">
|
|
<a href="/Scratch/en/blog/2010-02-23-When-regexp-is-not-the-best-solution/">When regexp is not the best solution <span class="nicer">»</span></a>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div class="flush"></div>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
<div id="bottom">
|
|
<div>
|
|
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Copyright ©, Yann Esposito</a>
|
|
</div>
|
|
<div id="lastmod">
|
|
Created: 02/15/2010
|
|
Modified: 01/11/2012
|
|
</div>
|
|
<div>
|
|
Entirely done with
|
|
<a href="http://www.vim.org">Vim</a>
|
|
and
|
|
<a href="http://nanoc.stoneship.org">nanoc</a>
|
|
</div>
|
|
<div>
|
|
<a href="/Scratch/en/validation/">Validation</a>
|
|
<a href="http://validator.w3.org/check?uri=referer"> [xhtml] </a>
|
|
.
|
|
<a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3"> [css] </a>
|
|
.
|
|
<a href="http://validator.w3.org/feed/check.cgi?url=http%3A//yannesposito.com/Scratch/en/blog/feed/feed.xml">[rss]</a>
|
|
</div>
|
|
</div>
|
|
<div class="clear"></div>
|
|
</div>
|
|
</body>
|
|
</html> |