scratch/content/html/en/blog/2010-05-24-Trees--Pragmatism-and-Formalism.md

-----
isHidden:       true
menupriority:   1
kind:           article
created_at:     2010-05-24T20:05:14+02:00
title: Trees; Pragmatism and Formalism
subtitle: When theory is more efficient than practice
author_name: Yann Esposito
author_uri: yannesposito.com
tags:
    - tree
    - theory
    - mathematics
    - regexp
    - script
-----

begindiv(intro)
<% tldr=%{<abbr title="Too Long; Don't Read"><sc>tl;dr</sc></abbr>} %>

<%=tldr%>: 

- I tried to program a simple filter
- Was blocked 2 days
- Then stopped working like an engineer monkey
- Used a pen and a sheet of paper
- Made some math.
- Crushed the problem in 10 minutes
- Conclusion: The pragmatism shouldn't mean "never use theory".

enddiv

## Abstract (longer than <%=tldr%>)

For my job, I needed to resolve a problem. It first seems not too hard. 
Then I started working directly on my program. 
I entered in the *infernal*: *try &amp; repair loop*.
Each step was like:


>   -- Just this thing to repair and that should be done.  
>   -- OK, now that should just work.  
>   -- Yeah!!!  
>   -- Oops! I forgotten that...  
>   `repeat until death`


After two days of this [Sisyphus](http://fr.wikipedia.org/wiki/Sisyphe) work, I finally just stopped to rethink the problem.
I took a pen, a sheet of paper. I simplified the problem, reminded what I learned during my Ph.D. about trees.
Finally, the problem was crushed in less than 20 minutes.

I believe the important lesson is to remember that the most efficient methodology to resolve this *pragmatic* problem was the *theoretical* one. 
And therefore, argues opposing science, theory to pragmatism and efficiency are fallacies.

newcorps

# First: my experience

Apparently 90% of programmer are unable to program a binary search without bug. 
The algorithm is well known and easy to understand. 
However it is difficult to program it without any flaw. 
I participated to [this contest](http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/).
And you can see the [results here](http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/)[^1].
I had to face a problem of the same kind at my job. The problem was simple to the start. Simply transform an <sc>xml</sc> from one format to another.


[^1]: Hopefully I am in the 10% who had given a bug free implementation.

The source <sc>xml</sc> was in the following general format:

<code class="xml">
<rubrique>
    <contenu>
        <tag1>value1</tag1>
        <tag2>value2</tag2>
        ...
    </contenu>
    <enfant>
        <rubrique>
            ...
        </rubrique>
        ...
        <rubrique>
            ...
        </rubrique>
    </enfant>
</menu>
</code>

and the destination format was in the following general format:

<code class="xml">
<item name="Menu0">
    <value>
        <item name="menu">
            <value>
                <item name="tag1">
                    <value>value1</value>
                </item>
                <item name="tag2">
                    <value>value2</value>
                </item>
                ...
                <item name="menu">
                    <value>
                        ...
                    </value>
                    <value>
                        ...
                    </value>
                </item>
            </value>
        </item>
    </value>
</item>
</code>

At first sight I believed it will be easy. I was so certain it will be easy that I fixed to myself the following rules:

1. do not use <sc>xslt</sc>
2. avoid the use of an <sc>xml</sc> parser
3. resolve the problem using a simple perl script[^2]

You can try if you want. If you attack the problem directly opening an editor, I assure you, it will certainly be not so simple.
I can tell that, because it's what I've done. And I must say I lost almost a complete day at work trying to resolve this. There was also, many small problems around that make me lose more than two days for this problem.

Why after two days did I was unable to resolve this problem which seems so simple?

What was my behaviour (workflow)?

1. Think
2. Write the program
3. Try the program 
4. Verify the result
5. Found a bug
6. Resolve the bug
7. Go to step 3.


This was a *standard* workflow for computer engineer. The flaw came from the first step. 
I thought about how to resolve the problem but with the eyes of a *pragmatic engineer*. I was saying:

> That should be a simple perl search and replace program.  
> Let's begin to write code

This is the second sentence that was plainly wrong. I started in the wrong direction. And the workflow did not work from this entry point.

## Think

After some times, I just stopped to work. Tell myself *"it is enough, now, I must finish it!"*.
I took a sheet of paper, a pen and began to draw some trees.

I began by make by removing most of the verbosity.
I first renamed `<item name="Menu">` by simpler name `M` for example.
I obtained something like:

<%= blogimage('formal_DCR_tree.png', 'The source tree') %>

and

<%= blogimage('formal_Menu_tree.png', 'The destination tree') %>


Then I made myself the following reflexion:

Considering Tree Edit Distance, each unitary transformation of tree correspond to a simple search and replace on my <sc>xml</sc> source[^nb].
We consider three atomic transformations on trees:

 - *substitution*: renaming a node
 - *insertion*: adding a node
 - *deletion*: remove a node


[^nb]: I did a program which generate automatically the weight in a matrix of each edit distance from data.

One of the particularity of atomic transformations on trees, is ; if you remove a node, all children of this node, became children of its father.

An example:

<pre class="twilight">
r - x - a
  \   \
   \    b
    y - c   
</pre>

If you delete the `x` node, you obtain

<pre class="twilight">
    a
  /
r - b
  \
    y - c   
</pre>

Et regardez ce que ça implique quand on l'écrit en <sc>xml</sc> :

<code class="xml">
<r>
  <x>
    <a>value for a</a>
    <b>vblue for b</b>
  </x>
  <y>
    <c>value for c</c>
  </y>
</r>
</code>

Then deleting all `x` nodes is equivalent to pass the <sc>xml</sc> via the following search and replace script:

<code class="perl">
s/<\/?x>//g
</code>

Therefore, if there exists a one state deterministic transducer which transform my trees ;
I can transform the <sc>xml</sc> from one format to another with just a simple list of search and replace directives.

# Solution

Transform this tree:

<pre class="twilight">
R - C - tag1
  \   \
   \    tag2
    E -- R - C - tag1
      \   \    \
       \   \     tag2
        \    E ...
         R - C - tag1 
           \    \
            \     tag2
             E ...
</pre>

to this tree:

<pre class="twilight">
                tag1
              /
M - V - M - V - tag2      tag1
              \         / 
                M --- V - tag2
                  \     \ 
                   \      M
                    \     tag1
                     \  / 
                      V - tag2
                        \ 
                          M
</pre>


can be done using the following one state deterministic tree transducer:


>    C -> &epsilon;  
>    E -> R  
>    R -> V  

Wich can be traduced by the following simple search and replace directives: 

<code class="perl">
s/C//g
s/E/M/g
s/R/V/g
</code>

Once adapted to <sc>xml</sc> it becomes:

<code class="perl">
s%</?contenu>%%g
s%<enfant>%<item name="menu">%g
s%</enfant>%<item>%g
s%</?rubrique>%<value>%g
s%</rubrique>%</value>%g
</code>

That is all.

# Conclusion

It should seems a bit paradoxal, but sometimes the most efficient approach to a pragmatic problem is to use the theoretical methodology.
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`-----`
			`isHidden: true`
			`menupriority: 1`
			`kind: article`
			`created_at: 2010-05-24T20:05:14+02:00`
			`title: Trees; Pragmatism and Formalism`
step in english 2010-05-26 14:46:40 +00:00			`subtitle: When theory is more efficient than practice`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`author_name: Yann Esposito`
			`author_uri: yannesposito.com`
			`tags:`
French + some more 2010-05-28 10:12:46 +00:00			`- tree`
			`- theory`
			`- mathematics`
			`- regexp`
			`- script`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`-----`
Ameliorations 2010-05-26 21:49:51 +00:00
			`begindiv(intro)`
French + some more 2010-05-28 10:12:46 +00:00			`<% tldr=%{<abbr title="Too Long; Don't Read"><sc>tl;dr</sc></abbr>} %>`
Ameliorations 2010-05-26 21:49:51 +00:00
French + some more 2010-05-28 10:12:46 +00:00			`<%=tldr%>:`
Ameliorations 2010-05-26 21:49:51 +00:00
			`- I tried to program a simple filter`
			`- Was blocked 2 days`
			`- Then stopped working like an engineer monkey`
French + some more 2010-05-28 10:12:46 +00:00			`- Used a pen and a sheet of paper`
Ameliorations 2010-05-26 21:49:51 +00:00			`- Made some math.`
			`- Crushed the problem in 10 minutes`
			`- Conclusion: The pragmatism shouldn't mean "never use theory".`

			`enddiv`

French + some more 2010-05-28 10:12:46 +00:00			`## Abstract (longer than <%=tldr%>)`
Ameliorations 2010-05-26 21:49:51 +00:00
			`For my job, I needed to resolve a problem. It first seems not too hard.`
			`Then I started working directly on my program.`
French + some more 2010-05-28 10:12:46 +00:00			`I entered in the infernal: try & repair loop.`
			`Each step was like:`
Ameliorations 2010-05-26 21:49:51 +00:00

French + some more 2010-05-28 10:12:46 +00:00			`> -- Just this thing to repair and that should be done.`
			`> -- OK, now that should just work.`
			`> -- Yeah!!!`
			`> -- Oops! I forgotten that...`
			> `repeat until death`


			`After two days of this [Sisyphus](http://fr.wikipedia.org/wiki/Sisyphe) work, I finally just stopped to rethink the problem.`
			`I took a pen, a sheet of paper. I simplified the problem, reminded what I learned during my Ph.D. about trees.`
			`Finally, the problem was crushed in less than 20 minutes.`

			`I believe the important lesson is to remember that the most efficient methodology to resolve this pragmatic problem was the theoretical one.`
			`And therefore, argues opposing science, theory to pragmatism and efficiency are fallacies.`
Ameliorations 2010-05-26 21:49:51 +00:00
			`newcorps`

tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`# First: my experience`

step in english 2010-05-26 14:46:40 +00:00			`Apparently 90% of programmer are unable to program a binary search without bug.`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`The algorithm is well known and easy to understand.`
			`However it is difficult to program it without any flaw.`
			`I participated to [this contest](http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/).`
			`And you can see the [results here](http://reprog.wordpress.com/2010/04/21/binary-search-redux-part-1/)[^1].`
step in english 2010-05-26 14:46:40 +00:00			`I had to face a problem of the same kind at my job. The problem was simple to the start. Simply transform an <sc>xml</sc> from one format to another.`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
French + some more 2010-05-28 10:12:46 +00:00
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`[^1]: Hopefully I am in the 10% who had given a bug free implementation.`

			`The source <sc>xml</sc> was in the following general format:`

			`<code class="xml">`
Good trees 2010-05-27 14:57:29 +00:00			`<rubrique>`
			`<contenu>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`<tag1>value1</tag1>`
			`<tag2>value2</tag2>`
			`...`
Good trees 2010-05-27 14:57:29 +00:00			`</contenu>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`<enfant>`
Good trees 2010-05-27 14:57:29 +00:00			`<rubrique>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`...`
Good trees 2010-05-27 14:57:29 +00:00			`</rubrique>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`...`
Good trees 2010-05-27 14:57:29 +00:00			`<rubrique>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`...`
Good trees 2010-05-27 14:57:29 +00:00			`</rubrique>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`</enfant>`
			`</menu>`
			`</code>`

French + some more 2010-05-28 10:12:46 +00:00			`and the destination format was in the following general format:`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
			`<code class="xml">`
Almost done english version 2010-05-28 12:33:04 +00:00			`<item name="Menu0">`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`<value>`
			`<item name="menu">`
Ameliorations 2010-05-26 21:49:51 +00:00			`<value>`
Good trees 2010-05-27 14:57:29 +00:00			`<item name="tag1">`
			`<value>value1</value>`
			`</item>`
			`<item name="tag2">`
			`<value>value2</value>`
			`</item>`
			`...`
			`<item name="menu">`
			`<value>`
			`...`
			`</value>`
			`<value>`
			`...`
			`</value>`
			`</item>`
Ameliorations 2010-05-26 21:49:51 +00:00			`</value>`
Good trees 2010-05-27 14:57:29 +00:00			`</item>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`</value>`
Good trees 2010-05-27 14:57:29 +00:00			`</item>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00			`</code>`

step in english 2010-05-26 14:46:40 +00:00			`At first sight I believed it will be easy. I was so certain it will be easy that I fixed to myself the following rules:`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
step in english 2010-05-26 14:46:40 +00:00			`1. do not use <sc>xslt</sc>`
			`2. avoid the use of an <sc>xml</sc> parser`
			`3. resolve the problem using a simple perl script[^2]`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
step in english 2010-05-26 14:46:40 +00:00			`You can try if you want. If you attack the problem directly opening an editor, I assure you, it will certainly be not so simple.`
French + some more 2010-05-28 10:12:46 +00:00			`I can tell that, because it's what I've done. And I must say I lost almost a complete day at work trying to resolve this. There was also, many small problems around that make me lose more than two days for this problem.`
step in english 2010-05-26 14:46:40 +00:00
			`Why after two days did I was unable to resolve this problem which seems so simple?`

			`What was my behaviour (workflow)?`

			`1. Think`
			`2. Write the program`
			`3. Try the program`
			`4. Verify the result`
			`5. Found a bug`
			`6. Resolve the bug`
French + some more 2010-05-28 10:12:46 +00:00			`7. Go to step 3.`
step in english 2010-05-26 14:46:40 +00:00
French + some more 2010-05-28 10:12:46 +00:00
			`This was a standard workflow for computer engineer. The flaw came from the first step.`
step in english 2010-05-26 14:46:40 +00:00			`I thought about how to resolve the problem but with the eyes of a pragmatic engineer. I was saying:`

			`> That should be a simple perl search and replace program.`
			`> Let's begin to write code`

French + some more 2010-05-28 10:12:46 +00:00			`This is the second sentence that was plainly wrong. I started in the wrong direction. And the workflow did not work from this entry point.`
step in english 2010-05-26 14:46:40 +00:00
Almost done english version 2010-05-28 12:33:04 +00:00			`## Think`
step in english 2010-05-26 14:46:40 +00:00
Almost done english version 2010-05-28 12:33:04 +00:00			`After some times, I just stopped to work. Tell myself "it is enough, now, I must finish it!".`
Finished before reread 2010-05-28 13:02:52 +00:00			`I took a sheet of paper, a pen and began to draw some trees.`
Almost done english version 2010-05-28 12:33:04 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00			`I began by make by removing most of the verbosity.`
Almost done english version 2010-05-28 12:33:04 +00:00			I first renamed `<item name="Menu">` by simpler name `M` for example.
			`I obtained something like:`

			`<%= blogimage('formal_DCR_tree.png', 'The source tree') %>`

			`and`

			`<%= blogimage('formal_Menu_tree.png', 'The destination tree') %>`


Finished before reread 2010-05-28 13:02:52 +00:00			`Then I made myself the following reflexion:`
Almost done english version 2010-05-28 12:33:04 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00			`Considering Tree Edit Distance, each unitary transformation of tree correspond to a simple search and replace on my <sc>xml</sc> source[^nb].`
Almost done english version 2010-05-28 12:33:04 +00:00			`We consider three atomic transformations on trees:`

Finished before reread 2010-05-28 13:02:52 +00:00			`- substitution: renaming a node`
			`- insertion: adding a node`
			`- deletion: remove a node`
Almost done english version 2010-05-28 12:33:04 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00
			`[^nb]: I did a program which generate automatically the weight in a matrix of each edit distance from data.`

			`One of the particularity of atomic transformations on trees, is ; if you remove a node, all children of this node, became children of its father.`
Almost done english version 2010-05-28 12:33:04 +00:00
			`An example:`

			`<pre class="twilight">`
			`r - x - a`
			`\ \`
			`\ b`
			`y - c`
			`</pre>`

			If you delete the `x` node, you obtain

			`<pre class="twilight">`
			`a`
			`/`
			`r - b`
			`\`
			`y - c`
			`</pre>`

Finished before reread 2010-05-28 13:02:52 +00:00			`Et regardez ce que ça implique quand on l'écrit en <sc>xml</sc> :`
Almost done english version 2010-05-28 12:33:04 +00:00
			`<code class="xml">`
			`<r>`
			`<x>`
			`<a>value for a</a>`
			`<b>vblue for b</b>`
			`</x>`
			`<y>`
			`<c>value for c</c>`
			`</y>`
			`</r>`
			`</code>`

Finished before reread 2010-05-28 13:02:52 +00:00			Then deleting all `x` nodes is equivalent to pass the <sc>xml</sc> via the following search and replace script:
step in english 2010-05-26 14:46:40 +00:00
Good trees 2010-05-27 14:57:29 +00:00			`<code class="perl">`
Almost done english version 2010-05-28 12:33:04 +00:00			`s/<\/?x>//g`
step in english 2010-05-26 14:46:40 +00:00			`</code>`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00			`Therefore, if there exists a one state deterministic transducer which transform my trees ;`
Almost done english version 2010-05-28 12:33:04 +00:00			`I can transform the <sc>xml</sc> from one format to another with just a simple list of search and replace directives.`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
Good trees 2010-05-27 14:57:29 +00:00			`# Solution`

			`Transform this tree:`

French + some more 2010-05-28 10:12:46 +00:00			`<pre class="twilight">`
Good trees 2010-05-27 14:57:29 +00:00			`R - C - tag1`
French + some more 2010-05-28 10:12:46 +00:00			`\ \`
			`\ tag2`
			`E -- R - C - tag1`
			`\ \ \`
			`\ \ tag2`
			`\ E ...`
			`R - C - tag1`
			`\ \`
			`\ tag2`
			`E ...`
Good trees 2010-05-27 14:57:29 +00:00			`</pre>`

			`to this tree:`

French + some more 2010-05-28 10:12:46 +00:00			`<pre class="twilight">`
Good trees 2010-05-27 14:57:29 +00:00			`tag1`
			`/`
French + some more 2010-05-28 10:12:46 +00:00			`M - V - M - V - tag2 tag1`
			`\ /`
			`M --- V - tag2`
			`\ \`
			`\ M`
Good trees 2010-05-27 14:57:29 +00:00			`\ tag1`
			`\ /`
			`V - tag2`
			`\`
			`M`
French + some more 2010-05-28 10:12:46 +00:00			`</pre>`
Good trees 2010-05-27 14:57:29 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00
			`can be done using the following one state deterministic tree transducer:`
Good trees 2010-05-27 14:57:29 +00:00

French + some more 2010-05-28 10:12:46 +00:00			`> C -> ε`
			`> E -> R`
			`> R -> V`

Finished before reread 2010-05-28 13:02:52 +00:00			`Wich can be traduced by the following simple search and replace directives:`
French + some more 2010-05-28 10:12:46 +00:00
			`<code class="perl">`
			`s/C//g`
			`s/E/M/g`
			`s/R/V/g`
			`</code>`

Finished before reread 2010-05-28 13:02:52 +00:00			`Once adapted to <sc>xml</sc> it becomes:`
French + some more 2010-05-28 10:12:46 +00:00
			`<code class="perl">`
Almost done english version 2010-05-28 12:33:04 +00:00			`s%</?contenu>%%g`
			`s%<enfant>%<item name="menu">%g`
			`s%</enfant>%<item>%g`
			`s%</?rubrique>%<value>%g`
			`s%</rubrique>%</value>%g`
French + some more 2010-05-28 10:12:46 +00:00			`</code>`
Good trees 2010-05-27 14:57:29 +00:00
			`That is all.`

Finished before reread 2010-05-28 13:02:52 +00:00			`# Conclusion`
tree article with small technical ameliorations 2010-05-25 06:13:27 +00:00
Finished before reread 2010-05-28 13:02:52 +00:00			`It should seems a bit paradoxal, but sometimes the most efficient approach to a pragmatic problem is to use the theoretical methodology.`