mickeyl commented Sep 13, 2013

To ease CDATA processing, TBXML used a trick: after moving the
actual content 'over' the start of the CDATA section, it overwrote
the remaining characters to the right with whitespace.
Later on, TBXML strips all whitespace within(!) the text content,
thereby removing the whitespace it added in the first step.

However, removing significant whitespace (enclosed within the text
portion of a tag) is against the spec. Unfortunately, it is not enough
to remove the whitespace-stripping code; the CDATA trick has to be
adjusted as well. My approach is to mark the end of the text with a \0
terminator and to set elementStart appropriately to continue searching.

As a neat side effect, this also slightly improves parsing speed.
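
To illustrate the idea, here is a minimal sketch in plain C. It is not the actual TBXML code: the helper name `collapse_cdata` and its signature are made up for this example, and it assumes the whole document sits in one mutable, NUL-terminated buffer. The payload is moved over the `<![CDATA[` opener, terminated with `\0` instead of being padded with whitespace, and the returned pointer plays the role of the new elementStart.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not part of TBXML).
 * cdata_start must point at the '<' of "<![CDATA[" inside a mutable,
 * NUL-terminated buffer. The payload is moved to cdata_start and
 * terminated with '\0', so no padding whitespace is needed and any
 * whitespace inside the payload is preserved.
 * Returns the position just past "]]>" where scanning should resume,
 * or NULL if the section is malformed or unterminated. */
static char *collapse_cdata(char *cdata_start)
{
    static const char open_tag[] = "<![CDATA[";
    static const char close_tag[] = "]]>";
    const size_t open_len = sizeof open_tag - 1;
    const size_t close_len = sizeof close_tag - 1;

    if (strncmp(cdata_start, open_tag, open_len) != 0)
        return NULL;

    char *payload = cdata_start + open_len;
    char *cdata_end = strstr(payload, close_tag);
    if (cdata_end == NULL)
        return NULL;

    size_t payload_len = (size_t)(cdata_end - payload);

    /* Source and destination overlap, so memmove rather than memcpy. */
    memmove(cdata_start, payload, payload_len);

    /* The terminator lands inside the old opener or payload, never
     * past "]]>", so the rest of the document stays intact. */
    cdata_start[payload_len] = '\0';

    return cdata_end + close_len;
}
```

Calling this with a pointer to the opener in `<t><![CDATA[ a  b ]]></t>` leaves the text ` a  b ` (whitespace intact) at that position and returns a pointer to `</t>`.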

mickeyl commented Mar 12, 2014

Thanks for your comment. To fix another oddity, I had to simplify the computation of elementStart. I believe this also fixes your issue with multiple back-to-back CDATA sections. I will run a test as soon as possible.

Unfortunately, some servers still send ASCII, MacRoman, or similar encodings;
in such cases, TBXML yields null instead of the correct strings.