<?xml version="1.0" encoding="utf-8"?>
< !DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
< html xmlns = "http://www.w3.org/1999/xhtml" lang = "fr" xml:lang = "fr" >
< head >
< meta http-equiv = "Content-Type" content = "text/html; charset=UTF-8" / >
< meta name = "keywords" content = "XML, Perl, programming, tree, theory, mathematics, regexp, script" / >
< link rel = "shortcut icon" type = "image/x-icon" href = "/Scratch/img/favicon.ico" / >
< link rel = "stylesheet" type = "text/css" href = "/Scratch/assets/css/main.css" / >
< link rel = "stylesheet" type = "text/css" href = "/Scratch/css/solarized.css" / >
< link rel = "stylesheet" type = "text/css" href = "/Scratch/css/idc.css" / >
< link href = 'http://fonts.googleapis.com/css?family=Inconsolata' rel = 'stylesheet' type = 'text/css' / >
< link rel = "alternate" type = "application/rss+xml" title = "RSS" href = "http://feeds.feedburner.com/yannespositocomen" / >
< link rel = "alternate" lang = "fr" xml:lang = "fr" title = "Arbres ; Pragmatisme et Formalisme" type = "text/html" hreflang = "fr" href = "/Scratch/fr/blog/2010-05-24-Trees--Pragmatism-and-Formalism/" / >
< link rel = "alternate" lang = "en" xml:lang = "en" title = "Trees; Pragmatism and Formalism" type = "text/html" hreflang = "en" href = "/Scratch/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism/" / >
< script type = "text/javascript" src = "/Scratch/js/jquery-1.3.1.min.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/jquery.cookie.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/index.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/highlight/highlight.pack.js" > < / script >
< script type = "text/javascript" src = "/Scratch/js/article.js" > < / script >
<!-- [if lt IE 9]>
< script src = "http://ie7-js.googlecode.com/svn/version/2.1(beta4)/IE9.js" > < / script >
<![endif]-->
< title > Trees; Pragmatism and Formalism< / title >
< / head >
< body lang = "en" class = "article" >
< div id = "content" >
< div id = "choix" >
< div class = "return" > < a href = "#entete" > ↓ Menu ↓ < / a > < / div >
< div id = "choixlang" >
< a href = "/Scratch/fr/blog/2010-05-24-Trees--Pragmatism-and-Formalism/" onclick = "setLanguage('fr')" > en Français< / a >
< / div >
< div class = "flush" > < / div >
< / div >
< div id = "titre" >
< h1 >
Trees; Pragmatism and Formalism
< / h1 >
< h2 >
When theory is more efficient than practice
< / h2 >
< / div >
< div class = "flush" > < / div >
< div class = "flush" > < / div >
< div id = "afterheader" >
< div class = "corps" >
< div class = "intro" >
< p > < abbr title = "Too Long; Didn't Read" > < span class = "sc" > tl;dr< / span > < / abbr > : < / p >
< ul >
< li > I tried to program a simple filter< / li >
< li > Was stuck for 2 days< / li >
< li > Then stopped working like an engineer monkey< / li >
< li > Used a pen and a sheet of paper< / li >
< li > Did some math< / li >
< li > Crushed the problem in 10 minutes< / li >
< li > Conclusion: pragmatism shouldn’t mean “never use theory”.< / li >
< / ul >
< / div >
< h2 id = "abstract-longer-than-abbr-titletoo-long-dont-readsctldrscabbr" > Abstract (longer than < abbr title = "Too Long; Didn't Read" > < span class = "sc" > tl;dr< / span > < / abbr > )< / h2 >
< p > For my job, I needed to solve a problem. At first it did not seem too hard,
so I started working directly on my program.
I entered the < em > infernal< / em > < em > try &amp; repair loop< / em > .
Each step was like:< / p >
< blockquote >
< p > – Just this one thing to repair and it should be done.< br / >
– OK, now that should just work.< br / >
– Yeah!!!< br / >
– Oops! I forgot that…< br / >
< code > repeat until death< / code > < / p >
< / blockquote >
< p > After two days of this < a href = "http://fr.wikipedia.org/wiki/Sisyphe" > Sisyphean< / a > work, I finally stopped to rethink the problem.
I took a pen and a sheet of paper. I simplified the problem and recalled what I had learned about trees during my Ph.D.
In the end, the problem was crushed in less than 20 minutes.< / p >
< p > I believe the important lesson is that the most efficient method for solving this < em > pragmatic< / em > problem was the < em > theoretical< / em > one.
Therefore, arguments that oppose science and theory to pragmatism and efficiency are fallacies.< / p >
< / div >
< div class = "corps" >
< h1 class = "first" id = "first-my-experience" > First: my experience< / h1 >
< p > Apparently 90% of programmers are unable to write a binary search without bugs.
The algorithm is well known and easy to understand.
However, it is difficult to program without any flaw.
I participated in < a href = "http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/" > this contest< / a > .
You can see the < a href = "http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/" > results here< / a > < sup id = "fnref:1" > < a href = "#fn:1" rel = "footnote" > 1< / a > < / sup > .
I had to face a problem of the same kind at my job. The problem looked simple at the start: transform an < span class = "sc" > xml< / span > document from one format to another.< / p >
< p > The source < span class = "sc" > xml< / span > was in the following general format:< / p >
< pre > < code class = "xml" > &lt;rubrique&gt;
    &lt;contenu&gt;
        &lt;tag1&gt;value1&lt;/tag1&gt;
        &lt;tag2&gt;value2&lt;/tag2&gt;
        ...
    &lt;/contenu&gt;
    &lt;enfant&gt;
        &lt;rubrique&gt;
            ...
        &lt;/rubrique&gt;
        ...
        &lt;rubrique&gt;
            ...
        &lt;/rubrique&gt;
    &lt;/enfant&gt;
&lt;/rubrique&gt;
< / code > < / pre >
< p > and the destination had the following general format:< / p >
< pre > < code class = "xml" > &lt;item name="Menu0"&gt;
    &lt;value&gt;
        &lt;item name="menu"&gt;
            &lt;value&gt;
                &lt;item name="tag1"&gt;
                    &lt;value&gt;value1&lt;/value&gt;
                &lt;/item&gt;
                &lt;item name="tag2"&gt;
                    &lt;value&gt;value2&lt;/value&gt;
                &lt;/item&gt;
                ...
                &lt;item name="menu"&gt;
                    &lt;value&gt;
                        ...
                    &lt;/value&gt;
                    &lt;value&gt;
                        ...
                    &lt;/value&gt;
                &lt;/item&gt;
            &lt;/value&gt;
        &lt;/item&gt;
    &lt;/value&gt;
&lt;/item&gt;
< / code > < / pre >
< p > At first sight I believed it would be easy. I was so certain of it that I set myself the following rules:< / p >
< ol >
< li > do not use < span class = "sc" > xslt< / span > < / li >
< li > avoid the use of an < span class = "sc" > xml< / span > parser< / li >
< li > solve the problem using a simple Perl script< / li >
< / ol >
< p > You can try if you want. If you attack the problem by directly opening an editor, I assure you it will not be so simple.
I can say that because it is what I did. I must say I lost almost a complete day at work trying to solve this. There were also many small problems around it, which made me lose more than two days on this problem.< / p >
< p > Why, after two days, was I still unable to solve a problem that seemed so simple?< / p >
< p > What was my behaviour (workflow)?< / p >
< ol >
< li > Think< / li >
< li > Write the program< / li >
< li > Try the program < / li >
< li > Verify the result< / li >
< li > Find a bug< / li >
< li > Fix the bug< / li >
< li > Go to step 3.< / li >
< / ol >
< p > This is a < em > standard< / em > workflow for a computer engineer. The flaw came from the first step.
I thought about how to solve the problem, but with the eyes of a < em > pragmatic engineer< / em > . I was saying:< / p >
< blockquote >
< p > That should be a simple Perl search and replace program.< br / >
Let’s begin to write code< / p >
< / blockquote >
< p > It was the second sentence that was plainly wrong. I started in the wrong direction, and the workflow could not work from that entry point.< / p >
< h2 id = "think" > Think< / h2 >
< p > After some time, I just stopped working and told myself: < em > “That is enough. Now I must finish it!”< / em >
I took a sheet of paper and a pen, and began to draw some trees.< / p >
< p > I began by removing most of the verbosity.
For example, I first renamed < code > &lt;item name="Menu"&gt;< / code > to the simpler name < code > M< / code > .
I obtained something like:< / p >
< p > < img alt = "The source tree" src = "/Scratch/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism/graph/The_source_tree.png" / > < / p >
< p > and< / p >
< p > < img alt = "The destination tree" src = "/Scratch/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism/graph/The_destination_tree.png" / > < / p >
< p > Then I made the following observation:< / p >
< p > In terms of Tree Edit Distance, each unitary transformation of a tree corresponds to a simple search and replace on my < span class = "sc" > xml< / span > source< sup id = "fnref:nb" > < a href = "#fn:nb" rel = "footnote" > 2< / a > < / sup > .
We consider three atomic transformations on trees:< / p >
< ul >
< li > < em > substitution< / em > : renaming a node< / li >
< li > < em > insertion< / em > : adding a node< / li >
< li > < em > deletion< / em > : removing a node< / li >
< / ul >
< p > One particularity of atomic transformations on trees is that if you remove a node, all of its children become children of its parent.< / p >
< p > An example:< / p >
< pre class = "twilight" >
r - x - a
  \   \
   \    b
    y - c
< / pre >
< p > If you delete the < code > x< / code > node, you obtain< / p >
< pre class = "twilight" >
    a
   /
r - b
  \
   y - c
< / pre >
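< p > The deletion rule can be sketched in code. This is only an illustration in Python, not the Perl script from this article; the < code > Node< / code > class and the < code > delete_label< / code > and < code > show< / code > helpers are hypothetical names introduced here.< / p >

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A labelled tree node (hypothetical helper type for this sketch)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def delete_label(node: Node, label: str) -> List[Node]:
    """Return the nodes that replace `node` once every `label` node is deleted.

    A deleted node is replaced by its own (already processed) children,
    so they end up attached to the deleted node's parent.
    """
    new_children: List[Node] = []
    for child in node.children:
        new_children.extend(delete_label(child, label))
    if node.label == label:
        return new_children
    return [Node(node.label, new_children)]


def show(node: Node) -> str:
    """Render a tree as label(child child ...)."""
    if not node.children:
        return node.label
    return node.label + "(" + " ".join(show(c) for c in node.children) + ")"


# The example tree: r has children x and y; x has a and b; y has c.
tree = Node("r", [Node("x", [Node("a"), Node("b")]), Node("y", [Node("c")])])
(result,) = delete_label(tree, "x")
print(show(result))  # r(a b y(c))
```

< p > After the deletion, < code > a< / code > and < code > b< / code > hang directly under < code > r< / code > , which is the second drawing.< / p >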
< p > And look at what it implies when you write it in < span class = "sc" > xml< / span > :< / p >
< pre > < code class = "xml" > &lt;r&gt;
  &lt;x&gt;
    &lt;a&gt;value for a&lt;/a&gt;
    &lt;b&gt;value for b&lt;/b&gt;
  &lt;/x&gt;
  &lt;y&gt;
    &lt;c&gt;value for c&lt;/c&gt;
  &lt;/y&gt;
&lt;/r&gt;
< / code > < / pre >
< p > Deleting all < code > x< / code > nodes is then equivalent to passing the < span class = "sc" > xml< / span > through the following search and replace script:< / p >
< pre > < code class = "perl" > s/&lt;\/?x&gt;//g
< / code > < / pre >
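< p > To check the claim, here is a minimal sketch of the same substitution written with Python's < code > re.sub< / code > instead of Perl. The < span class = "sc" > xml< / span > parser is used only to verify the result, not to perform the transformation; the variable names are assumptions made for this example.< / p >

```python
import re
import xml.etree.ElementTree as ET

xml = """<r>
  <x>
    <a>value for a</a>
    <b>value for b</b>
  </x>
  <y>
    <c>value for c</c>
  </y>
</r>"""

# Same pattern as the Perl directive: match an opening or closing <x> tag
# and replace it with nothing.
without_x = re.sub(r"</?x>", "", xml)

# Verification only: parse the result and list the children of the root.
root = ET.fromstring(without_x)
print([child.tag for child in root])  # ['a', 'b', 'y']
```

< p > The text substitution removes the < code > x< / code > tags, and the parsed result shows < code > a< / code > and < code > b< / code > as direct children of < code > r< / code > .< / p >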
< p > Therefore, if there exists a one-state deterministic transducer which transforms my trees,
I can transform the < span class = "sc" > xml< / span > from one format to another with just a simple list of search and replace directives.< / p >
< h1 id = "solution" > Solution< / h1 >
< p > Transform this tree:< / p >
< pre class = "twilight" >
R - C - tag1
  \   \
   \    tag2
    E -- R - C - tag1
      \    \   \
       \    \    tag2
        \    E ...
         R - C - tag1
           \   \
            \    tag2
             E ...
< / pre >
< p > to this tree:< / p >
< pre class = "twilight" >
               tag1
              /
M - V - M - V - tag2    tag1
              \        /
               M --- V - tag2
                 \     \
                  \      M
                   \     tag1
                    \   /
                     V - tag2
                       \
                        M
< / pre >
< p > can be done using the following one state deterministic tree transducer:< / p >
< blockquote >
< p > C → ε< br / >
E → M< br / >
R → V < / p >
< / blockquote >
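< p > As a rough sketch, these three rules can be applied to a tree of labels as follows. The example is written in Python for illustration; the nested-tuple representation, the < code > RULES< / code > table and the < code > transduce< / code > function are assumptions of this sketch, and < code > None< / code > stands for ε (the node is deleted and its children move up).< / p >

```python
from typing import Dict, List, Optional, Tuple

# A tree is a (label, children) pair; this representation is an assumption.
Tree = Tuple[str, List["Tree"]]

# One rule per label. None plays the role of epsilon: delete the node.
RULES: Dict[str, Optional[str]] = {"C": None, "E": "M", "R": "V"}


def transduce(tree: Tree, rules: Dict[str, Optional[str]]) -> List[Tree]:
    """Apply the relabelling rules to every node, with a single state."""
    label, children = tree
    new_children = [t for child in children for t in transduce(child, rules)]
    if label not in rules:
        return [(label, new_children)]   # labels without a rule are kept
    target = rules[label]
    if target is None:
        return new_children              # deleted: children move up
    return [(target, new_children)]      # renamed


def leaf(name: str) -> Tree:
    return (name, [])


def rubric() -> Tree:
    """A small R node with one C child holding tag1 and tag2."""
    return ("R", [("C", [leaf("tag1"), leaf("tag2")])])


source: Tree = ("R", [("C", [leaf("tag1"), leaf("tag2")]),
                      ("E", [rubric(), rubric()])])
(result,) = transduce(source, RULES)
print(result)
```

< p > Every < code > R< / code > becomes < code > V< / code > , every < code > E< / code > becomes < code > M< / code > , and the tags that were under < code > C< / code > end up directly under < code > V< / code > .< / p >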
< p > Which can be translated into the following simple search and replace directives:< / p >
< pre > < code class = "perl" > s/C//g;
s/E/M/g;
s/R/V/g;
< / code > < / pre >
< p > Once adapted to < span class = "sc" > xml< / span > it becomes:< / p >
< pre > < code class = "perl" > s%&lt;/?contenu&gt;%%g;
s%&lt;enfant&gt;%&lt;item name="menu"&gt;%g;
s%&lt;/enfant&gt;%&lt;/item&gt;%g;
s%&lt;rubrique&gt;%&lt;value&gt;%g;
s%&lt;/rubrique&gt;%&lt;/value&gt;%g;
< / code > < / pre >
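< p > For readers who want to experiment, here is a minimal sketch of the same five directives using Python's < code > re.sub< / code > . It mirrors only the directives listed above; the < code > DIRECTIVES< / code > table, the < code > convert< / code > function and the sample input are assumptions made for this example.< / p >

```python
import re

# (pattern, replacement) pairs mirroring the five Perl directives, in order.
DIRECTIVES = [
    (r"</?contenu>", ""),
    (r"<enfant>", '<item name="menu">'),
    (r"</enfant>", "</item>"),
    (r"<rubrique>", "<value>"),
    (r"</rubrique>", "</value>"),
]


def convert(source: str) -> str:
    """Apply each search and replace directive to the whole text, in order."""
    for pattern, replacement in DIRECTIVES:
        source = re.sub(pattern, replacement, source)
    return source


# A tiny hypothetical input: one rubrique containing one child rubrique.
sample = (
    "<rubrique><contenu><tag1>value1</tag1></contenu>"
    "<enfant><rubrique><contenu><tag1>value2</tag1></contenu></rubrique></enfant>"
    "</rubrique>"
)
print(convert(sample))
```

< p > Each directive is a plain text substitution, so the order of the pairs does not matter here: no replacement produces text that another pattern matches.< / p >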
< p > That is all.< / p >
< h1 id = "conclusion" > Conclusion< / h1 >
< p > It may seem a bit paradoxical, but sometimes the most efficient approach to a pragmatic problem is to use the theoretical methodology.< / p >
< hr / > < div class = "footnotes" >
< ol >
< li id = "fn:1" >
< p > Fortunately, I was among the 10% who gave a bug-free implementation.< a href = "#fnref:1" rel = "reference" > ↩ < / a > < / p >
< / li >
< li id = "fn:nb" >
< p > I wrote a program which automatically generates, from data, the matrix of weights for each edit distance.< a href = "#fnref:nb" rel = "reference" > ↩ < / a > < / p >
< / li >
< / ol >
< / div >
< / div >
< div id = "choixrss" >
< a id = "rss" href = "http://feeds.feedburner.com/yannespositocomen" >
Subscribe
< / a >
< / div >
< div id = "entete" class = "corps_spaced" >
< div id = "liens" >
< ul > < li > < a href = "/Scratch/en/" > Home< / a > < / li >
< li > < a href = "/Scratch/en/blog/" > Blog< / a > < / li >
< li > < a href = "/Scratch/en/softwares/" > Softwares< / a > < / li >
< li > < a href = "/Scratch/en/about/" > About< / a > < / li > < / ul >
< / div >
< div class = "flush" > < / div >
< / div >
< div id = "bottom" >
< div >
< a href = "https://twitter.com/yogsototh" > Follow @yogsototh< / a >
< / div >
< div >
< a rel = "license" href = "http://creativecommons.org/licenses/by-sa/3.0/" > Copyright ©, Yann Esposito< / a >
< / div >
< div id = "lastmod" >
Created: 05/24/2010
Modified: 12/07/2011
< / div >
< div >
Entirely done with
< a href = "http://www.vim.org" > Vim< / a >
and
< a href = "http://nanoc.stoneship.org" > nanoc< / a >
< / div >
< / div >
< div class = "clear" > < / div >
< / div >
< / body >
< / html >