test please delete

**NickFitz** · 12 April 2009, 21:03

Originally posted by NickFitz View Post

I'll have to work out just where the problem lies first - it could be in WP, or it could be in TinyMCE, which WP uses as an editor.

In fact it could be both... it looks like something is resolving entity references when it shouldn't, so things like &lt;xsl:template&gt; get turned into <xsl:template>, and something else then tries to treat it as XHTML, but without being namespace-aware, so that gets turned into <xsl :template> (with a space between the namespace prefix and the colon) and then the closing </xsl:template> gets turned into </xsl>.
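For anyone following along, here is a rough Python sketch of the first half of that failure. The sample post body and the use of `html.unescape` are assumptions made for illustration; this is not the actual WP or TinyMCE code. It shows that escaped text survives a proper parse and re-serialise, while resolving entities on the raw string turns text into apparent markup.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical post body: the literal text "<xsl:template>" should be
# displayed, so it is stored with the angle brackets escaped.
stored = "<p>Use &lt;xsl:template&gt; here</p>"

# Correct handling: parse first. Entities are resolved only in the data
# model, and escaped again when the element is serialised.
p = ET.fromstring(stored)
round_trip = ET.tostring(p, encoding="unicode")

# Faulty handling (assumed): resolve entities on the raw string.
premature = html.unescape(stored)

# The result is no longer well-formed XML, so a real parser rejects it.
try:
    ET.fromstring(premature)
    still_parses = True
except ET.ParseError:
    still_parses = False

print(p.text)
print(round_trip == stored)
print(premature)
print(still_parses)
```

Anything downstream that then tries to "repair" the unescaped string is guessing at markup the author never wrote.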

On which note, I'm off to the pub, rather later than planned.

**NickFitz** · 13 April 2009, 02:51

Originally posted by NickFitz View Post

I'll have to work out just where the problem lies first - it could be in WP, or it could be in TinyMCE, which WP uses as an editor.

In fact it could be both... it looks like something is resolving entity references when it shouldn't, so things like &lt;xsl:template&gt; get turned into <xsl:template>, and something else then tries to treat it as XHTML, but without being namespace-aware, so that gets turned into <xsl :template> (with a space between the namespace prefix and the colon) and then the closing </xsl:template> gets turned into </xsl>.

Of course the fundamental problem is immediately apparent: somebody is labouring under the delusion that it's possible to parse XML (of which both XHTML and XSLT are dialects) using regular expressions.

It isn't.

You may be able to get away with it when the XML is definitely constrained to one or several pre-specified dialects. For the general case you can't. Parsing XML requires a parser that can track arbitrarily deep nesting, which takes at least a stack, and regular expressions in the formal sense can't do that: they are equivalent to finite automata.
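A minimal Python sketch of the general-case failure, using a made-up document with one element nested inside another (the sample and the pattern are illustrative assumptions, not anything taken from WP or TinyMCE). The non-greedy pattern pairs the outer start tag with the first end tag it meets, while a real parser pairs tags by nesting.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: one <div> nested inside another.
doc = "<div>outer <div>inner</div> tail</div>"

# Naive regex: match from a start tag to the nearest end tag.
naive = re.search(r"<div>(.*?)</div>", doc).group(1)

# A real parser tracks which elements are open, so tags pair up correctly.
root = ET.fromstring(doc)
inner = root.find("div")

print(repr(naive))       # what the regex thinks the outer <div> contains
print(repr(root.text))   # text before the nested element
print(repr(inner.text))  # text of the nested element
print(repr(inner.tail))  # text after it, still inside the outer element
```

Making the pattern greedy only moves the problem: it then breaks on two sibling elements instead of two nested ones.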

When the use of namespaces is thrown into the mix, the failure of regular expressions as a means for parsing XML becomes even more apparent.
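To make the namespace point concrete, here is a small Python sketch. The two sample documents are invented for illustration; the namespace URI is the standard XSLT one. Both spell the same element with different prefixes, and only the namespace-aware parser sees them as the same thing.

```python
import re
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Two serialisations of the same element: only the prefix differs.
doc_a = f'<xsl:template xmlns:xsl="{XSLT_NS}" match="/"/>'
doc_b = f'<t:template xmlns:t="{XSLT_NS}" match="/"/>'

# A namespace-aware parser resolves each prefix to its namespace URI.
tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

# A pattern keyed on the literal prefix recognises only one of them.
pattern = re.compile(r"<xsl:template\b")
regex_hits = [bool(pattern.search(d)) for d in (doc_a, doc_b)]

print(tag_a)
print(tag_a == tag_b)
print(regex_hits)
```

The prefix is just a local abbreviation chosen by whoever wrote the document, so code that treats "xsl:" as meaningful in itself is matching on an accident of serialisation.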

It all comes down to the common misunderstanding that XML is a text-based format for data representation, leading to the conclusion that as regular expressions are good for dealing with strings they must be good for parsing XML.

Of course XML isn't a text-based format for data representation. It is merely a data representation model that may be easily serialised into a textual format, which is not the same thing at all.
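A quick way to see the difference, sketched in Python with invented sample data: three strings that differ in quoting, attribute order, whitespace and character references all parse to the same data. The `model` helper is a hypothetical name introduced just for this example.

```python
import xml.etree.ElementTree as ET

# Three textually different serialisations (hypothetical sample data).
texts = [
    '<item id="1" lang="en">caf\u00e9</item>',
    "<item lang='en' id='1'>caf&#233;</item>",
    '<item   id="1"\n      lang="en">caf&#xE9;</item>',
]

def model(text):
    """Reduce a parsed element to the data it carries."""
    el = ET.fromstring(text)
    return (el.tag, dict(el.attrib), el.text)

models = [model(t) for t in texts]

print(len(set(texts)))                      # number of distinct strings
print(models[0] == models[1] == models[2])  # whether the parsed data agrees
```

String-level tools see three different inputs here; anything working on the parsed model sees one.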

In fact, this is the very first point made in Henri Sivonen's seminal paper "HOWTO Avoid Being Called a Bozo When Producing XML".

The other points in that paper should also have been read and understood by the people responsible for the fail I am currently enduring.

(BTW, if you're ever invited to "bug bash" somebody else's code, it's always worth entering a few "astral plane" Unicode characters into a form - the fail is usually epic. The last time I did this, the entire database had to be rolled back to an earlier backup before we could continue looking for bugs.)
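For anyone wondering why astral plane characters are such good test input, here is a hedged Python sketch of one common failure mode. It is an assumption about the kind of bug involved, not a description of what went wrong in the incident above. A character outside the Basic Multilingual Plane is one code point but two UTF-16 code units, so code that assumes one character per 16-bit unit can cut it in half.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

code_points = len(clef)                           # Python counts code points
utf16_units = len(clef.encode("utf-16-le")) // 2  # UTF-16 uses a surrogate pair
utf8_bytes = len(clef.encode("utf-8"))            # UTF-8 uses four bytes

# Truncating after the first 16-bit unit leaves a lone surrogate,
# which is not valid text on its own.
first_half = clef.encode("utf-16-le")[:2]
try:
    first_half.decode("utf-16-le")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(code_points, utf16_units, utf8_bytes)
print(split_ok)
```

Length limits, truncation and storage layers that budget fewer than four bytes per character are the usual places to look first.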

**BrilloPad** · 13 April 2009, 07:49

Morning all

**BrilloPad** · 13 April 2009, 07:50

Originally posted by NickFitz View Post

Of course the fundamental problem is immediately apparent: somebody is labouring under the delusion that it's possible to parse XML (of which both XHTML and XSLT are dialects) using regular expressions.

It isn't.

You may be able to get away with it when the XML is definitely constrained to one or several pre-specified dialects. For the general case you can't. Parsing XML requires a parser that can track arbitrarily deep nesting, which takes at least a stack, and regular expressions in the formal sense can't do that: they are equivalent to finite automata.

When the use of namespaces is thrown into the mix, the failure of regular expressions as a means for parsing XML becomes even more apparent.

It all comes down to the common misunderstanding that XML is a text-based format for data representation, leading to the conclusion that as regular expressions are good for dealing with strings they must be good for parsing XML.

Of course XML isn't a text-based format for data representation. It is merely a data representation model that may be easily serialised into a textual format, which is not the same thing at all.

In fact, this is the very first point made in Henri Sivonen's seminal paper "HOWTO Avoid Being Called a Bozo When Producing XML".

The other points in that paper should also have been read and understood by the people responsible for the fail I am currently enduring.

(BTW, if you're ever invited to "bug bash" somebody else's code, it's always worth entering a few "astral plane" Unicode characters into a form - the fail is usually epic. The last time I did this, the entire database had to be rolled back to an earlier backup before we could continue looking for bugs.)
