-----
isHidden: false
menupriority: 1
kind: article
created_at: 2010-05-19T22:20:34+02:00
title: How to repair a cut XML?
subtitle: and how to do it without any parser?
author_name: Yann Esposito
author_uri: yannesposito.com
tags:
  - tree
  - HTML
  - script
  - ruby
-----

On my main page you can see a list of my latest blog entries, each showing the first part of the article. To accomplish that, I needed to take the beginning of each entry and cut it somewhere. But then I had to repair this cut HTML. Here is an example:

    <div>
        <div>
            <p>Introduction</p>
        </div>
        <p>The first paragraph</p>
        <img src="image.png" alt="an image"/>
        <p>Another long paragraph</p>
    </div>

After the cut, I obtain:

    <div>
        <div>
            <p>Introduction</p>
        </div>
        <p>The first paragraph</p>
        <img src="ima

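To repair it, there is no need to remember the whole XML tree: the stack of parents that are still open is enough. Reading the complete first example, the stack goes through the following states, in order:

    []
    [div]             <div>
    [div, div]            <div>
    [div, div, p]             <p>
                                  Introduction
    [div, div]                </p>
    [div]                 </div>
    [div, p]              <p>
                              The first paragraph
    [div]                 </p>
    [div]                 <img src="image.png" alt="an image"/>
    [div, p]              <p>
                              Another long paragraph
    [div]                 </p>
    []                </div>
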
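The stack states listed above can be reproduced with a few lines of Ruby. This is only a sketch: `stack_states` is a hypothetical helper name, the tag regex is a simplification that ignores comments, CDATA and `>` inside attribute values, and the `src` value in the sample is a placeholder.

```ruby
# Simplified tag matcher (an assumption, not a real parser):
#   group 1: "/" for a closing tag
#   group 2: the tag name
#   group 3: "/" for a self-closing tag such as <img .../>
TAG = %r{<(/?)(\w+)[^>]*?(/?)>}

# Returns the stack of open tags as it looks after each tag met in xml.
def stack_states(xml)
  stack = []
  states = [stack.dup]            # initial state: nothing is open
  xml.scan(TAG) do |closing, name, self_closing|
    if closing == "/"
      stack.pop                   # closing tag: drop the innermost parent
    elsif self_closing != "/"
      stack.push(name)            # opening tag: remember it
    end                           # self-closing tag: stack unchanged
    states << stack.dup
  end
  states
end

html = '<div><div><p>Introduction</p></div>' \
       '<p>The first paragraph</p>' \
       '<img src="image.png" alt="an image"/>' \
       '<p>Another long paragraph</p></div>'

stack_states(html).each { |s| puts "[#{s.join(', ')}]" }
```

The stack never holds more than the chain of parents of the current position, which is all that is needed to close a document cut at that position.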
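The algorithm is then really simple:

    let res be the XML as a string
    read res and each time you encounter a tag:
        if it is an opening one: push it onto the stack
        else if it is a closing one: pop the stack
    remove any malformed/cut tag at the end of res
    for each tag in the stack, pop it, and write:
        res = res + closing tag
    return res

And `res` contains the repaired XML.

Finally, this is the Ruby code I use. The `xml` variable contains the cut XML.

    # Repair cut XML by closing the tags left open.
    # Works even if the XML is cut inside a tag.
    # example:
    #   transform '<div> <span> toto </span> <p> hello <a hre'
    #   into      '<div> <span> toto </span> <p> hello </p></div>'
    # NOTE: the function name and the scanning loop are reconstructed
    # from the algorithm above; the regex is a simplified tag matcher,
    # not a parser.
    def repair_xml(xml)
      parents = []   # tags still open, outermost first
      depth = 0      # number of tags currently open
      xml.scan(%r{<(/?)(\w+)[^>]*?(/?)>}) do |closing, name, self_closing|
        next if self_closing == "/"   # self-closing tag: nothing to track
        if closing.empty?
          parents[depth] = name       # opening tag: push
          depth += 1
        else
          depth -= 1                  # closing tag: pop
        end
      end
      # remove a tag cut in the middle; \z anchors at the very end of the
      # string, whereas $ would also match at the end of any line
      res = xml.sub(/<[^>]*\z/, '')
      depth -= 1
      depth.downto(0).each { |x| res << %{</#{parents[x]}>} }
      res
    end

I don't know if the code will help you, but the reasoning should definitely be known.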
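As a complement, the pseudocode can also be written with an explicit stack (`push`/`pop`) instead of a depth counter. The sketch below rests on a few assumptions: `repair_cut_xml` is a hypothetical name, the regex is a simplified tag matcher rather than a parser, the input is assumed to be well formed up to the cut, and the `src` value is a placeholder.

```ruby
# Sketch: close the tags left open in a truncated XML/HTML string.
def repair_cut_xml(xml)
  stack = []
  xml.scan(%r{<(/?)(\w+)[^>]*?(/?)>}) do |closing, name, self_closing|
    next if self_closing == "/"     # <img .../> opens nothing
    closing == "/" ? stack.pop : stack.push(name)
  end
  res = xml.sub(/<[^>]*\z/, "")     # drop a tag cut in the middle, if any
  res << "</#{stack.pop}>" until stack.empty?   # close innermost first
  res
end

cut = '<div><div><p>Introduction</p></div><p>The first paragraph</p><img src="ima'
puts repair_cut_xml(cut)
# prints: <div><div><p>Introduction</p></div><p>The first paragraph</p></div>
```

The incomplete `<img` tag is never seen by `scan` because it has no closing `>`, so it only has to be trimmed from the end before the remaining parents are closed.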