I'll have to work out just where the problem lies first - it could be in WP, or it could be in TinyMCE, which WP uses as an editor.
In fact it could be both... it looks as if something is resolving entity references when it shouldn't, so things like &lt;xsl:template&gt; get turned into <xsl:template>, and something else then tries to treat the result as XHTML, but without being namespace-aware, so that gets turned into <xsl :template> (with a space between the namespace prefix and the colon) and then the closing </xsl:template> gets turned into </xsl>.
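A minimal sketch of the first half of that failure, written in Python purely for illustration. It is not what WP or TinyMCE actually run, and the sample tag is made up. It shows how a single unwanted round of entity resolution turns displayed text back into live markup:

```python
from html import escape, unescape

# What gets typed into the editor (a hypothetical sample line).
typed = '<xsl:template match="/">'

# What ought to be stored so that a browser displays it as text.
stored = escape(typed, quote=False)

# One extra, unwanted round of entity resolution undoes the escaping,
# and the result is markup again rather than text about markup.
resolved = unescape(stored)

print(stored)    # &lt;xsl:template match="/"&gt;
print(resolved)  # <xsl:template match="/">
```

Anything downstream that then parses `resolved` as XHTML sees an unknown element instead of a code sample.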
On which note, I'm off to the pub, rather later than planned.
Of course the fundamental problem is immediately apparent: somebody is labouring under the delusion that it's possible to parse XML (of which both XHTML and XSLT are dialects) using regular expressions.
It isn't.
You may be able to get away with it when the XML is definitely constrained to one or several pre-specified dialects. For the general case you can't. Elements can nest to arbitrary depth, so matching each closing tag to its opening tag needs a parser with a stack, and regular expressions in the strict sense describe finite automata, which have no stack.
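The nesting problem shows up in a few lines. This is only a sketch, using Python's standard `re` and `xml.etree.ElementTree` modules and a made-up two-level document. The helper `depth` is hypothetical, there just to show that the parser sees the real structure:

```python
import re
import xml.etree.ElementTree as ET

doc = "<a><a>x</a></a>"

# A lazy match stops at the first closing tag it meets, so it pairs the
# OUTER opening tag with the INNER closing tag.
regex_inner = re.search(r"<a>(.*?)</a>", doc).group(1)

def depth(elem):
    """Nesting depth of an element tree, counting the root as 1."""
    return 1 + max((depth(child) for child in elem), default=0)

root = ET.fromstring(doc)

print(regex_inner)   # <a>x
print(root[0].text)  # x
print(depth(root))   # 2
```

A greedy pattern happens to work on this input and then breaks on two sibling elements instead. Either way the pattern is guessing at structure that the parser actually tracks.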
When the use of namespaces is thrown into the mix, the failure of regular expressions as a means for parsing XML becomes even more apparent.
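To illustrate the namespace point, here is a small sketch, again assuming Python's standard library rather than whatever the offending code uses. The two sample documents are invented. They declare the same XSLT namespace under different prefixes, which a namespace-aware parser treats as the same element and a pattern keyed on the prefix does not:

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Same element in the same namespace, bound to two different prefixes.
doc_a = f'<xsl:template xmlns:xsl="{XSL_NS}" match="/"/>'
doc_b = f'<t:template xmlns:t="{XSL_NS}" match="/"/>'

# A pattern that assumes the conventional prefix.
pattern = re.compile(r"<xsl:template\b")
found_a = bool(pattern.search(doc_a))
found_b = bool(pattern.search(doc_b))

# The parser resolves each prefix to its namespace URI.
tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

print(found_a, found_b)  # True False
print(tag_a == tag_b)    # True
print(tag_a)             # {http://www.w3.org/1999/XSL/Transform}template
```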
It all comes down to the common misunderstanding that XML is a text-based format for data representation, leading to the conclusion that as regular expressions are good for dealing with strings they must be good for parsing XML.
Of course XML isn't a text-based format for data representation. It is merely a data representation model that may be easily serialised into a textual format, which is not the same thing at all.
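One way to see the difference between the model and its serialisation is the sketch below. It assumes Python's `xml.etree.ElementTree`, and the hypothetical helper `model` flattens an element into plain values. The two strings differ in quoting, attribute order, whitespace and empty-element syntax, yet they parse to the same thing:

```python
import xml.etree.ElementTree as ET

# Two different pieces of text...
text_1 = '<item id="1" lang="en"/>'
text_2 = "<item  lang='en'\n      id='1'></item>"

def model(elem):
    """Reduce an element to (tag, attributes, text, children) for comparison."""
    return (
        elem.tag,
        dict(elem.attrib),
        elem.text or "",
        [model(child) for child in elem],
    )

# ...that serialise one and the same piece of data.
same_text = text_1 == text_2
same_model = model(ET.fromstring(text_1)) == model(ET.fromstring(text_2))

print(same_text)   # False
print(same_model)  # True
```

String tools compare the first thing and XML tools compare the second.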
The other points in that paper should also have been read and understood by the people responsible for the fail I am currently enduring.
(BTW, if you're ever invited to "bug bash" somebody else's code, it's always worth entering a few "astral plane" Unicode characters into a form - the fail is usually epic. The last time I did this, the entire database had to be rolled back to an earlier backup before we could continue looking for bugs.)
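For anyone wondering why those characters cause trouble, here is a rough sketch in Python. The helper names and the sample string are invented, and this says nothing about what went wrong in that particular database. Characters above U+FFFF need two UTF-16 code units and four UTF-8 bytes, so any code that assumes one character is one unit, or at most three bytes, miscounts them:

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def astral_chars(text: str) -> list:
    """Characters outside the Basic Multilingual Plane (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

# 'café', an emoji (U+1F600) and a CJK Extension B ideograph (U+20000).
sample = "caf\u00e9 \U0001F600 \U00020000"

print(len(sample))                   # 8 code points
print(utf16_units(sample))           # 10 UTF-16 code units
print(len(sample.encode("utf-8")))   # 15 UTF-8 bytes
print(len(astral_chars(sample)))     # 2
```

Pasting a string like `sample` into every text field is a cheap way to find out which of those assumptions the code under test has made.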