
Implement case-sensitive parsing for SVG tags and attributes #102

Closed
AhnafCodes wants to merge 0 commits into t-strings:main from AhnafCodes:dev


@AhnafCodes

AhnafCodes commented Feb 14, 2026

  • Add SVG_TAG_FIX and SVG_CASE_FIX mappings to tdom/nodes.py to restore camelCase for SVG elements and attributes.

  • Update TemplateParser in tdom/parser.py to track SVG nesting depth and apply case corrections when inside an <svg> context.

  • Add SVG rendering tests to tdom/nodes_test.py.

  • Add SVG parsing tests to tdom/parser_test.py covering tag/attribute casing, nesting, and non-SVG contexts.

  • Bump version to 0.1.13 in uv.lock.

  • Ignore the commit count; check the "Files changed" tab instead.
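A minimal sketch of the depth-tracking approach described in the bullets above, using only the standard-library html.parser.HTMLParser. SvgAwareParser is a hypothetical stand-in for tdom's TemplateParser, and the two tables are small illustrative subsets, not the PR's actual SVG_TAG_FIX / SVG_CASE_FIX contents.

```python
from html.parser import HTMLParser

# Illustrative subsets only (assumption); the PR's tables are larger and
# mirror the WHATWG foreign-content adjustment tables.
SVG_TAG_FIX = {"clippath": "clipPath", "lineargradient": "linearGradient", "foreignobject": "foreignObject"}
SVG_CASE_FIX = {"viewbox": "viewBox", "preserveaspectratio": "preserveAspectRatio", "gradientunits": "gradientUnits"}


class SvgAwareParser(HTMLParser):
    """Hypothetical stand-in for TemplateParser: records case-fixed start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.svg_depth = 0  # number of <svg> elements we are currently inside
        self.events: list[tuple[str, list[tuple[str, str | None]]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names here.
        if tag == "svg":
            self.svg_depth += 1
        if self.svg_depth > 0:
            tag = SVG_TAG_FIX.get(tag, tag)
            attrs = [(SVG_CASE_FIX.get(name, name), value) for name, value in attrs]
        self.events.append((tag, attrs))

    def handle_endtag(self, tag):
        # Self-closing tags also arrive here via the default handle_startendtag.
        if tag == "svg" and self.svg_depth > 0:
            self.svg_depth -= 1
```

Feeding it `<div viewbox="x"><svg viewBox="0 0 10 10"><clipPath id="c"></clipPath></svg><clippath></clippath></div>` restores `viewBox` and `clipPath` inside the `<svg>` but leaves the `div` attribute and the trailing `clippath` (outside SVG) lowercase.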

AhnafCodes changed the title "Implement case-sensitive parsing for SVG tags and attributes" to "Implement case-sensitive parsing for SVG tags and attributes and allowing Sequence list and tuple(as default)" Feb 14, 2026
AhnafCodes changed the title "Implement case-sensitive parsing for SVG tags and attributes and allowing Sequence list and tuple(as default)" back to "Implement case-sensitive parsing for SVG tags and attributes" Feb 15, 2026
@davepeck
Contributor

Why would we do this? Why not just get the case right in your t-string to begin with?

@ianjosephwilson
Contributor

This appears to be a significant drawback of using an HTML parser, but this seems like a workaround that would probably work most of the time?

I did have a few concerns though:

    1. Is MathML "core" going to have similar issues? I have not used it, but it seems like another loose end out there.
    2. With this implementation, is there a reason not to use self.stack to look up whether "svg" is a parent, i.e., instead of using a depth integer? Also, does it matter how many times an svg element is nested in another svg element?
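To make the second question concrete, here is a hedged sketch of deriving "inside SVG" from an open-element stack rather than a counter. StackTrackingParser and its attributes are hypothetical (tdom's actual self.stack may hold nodes rather than tag names). Because only membership matters, an <svg> nested inside another <svg> gives the same answer as a single one.

```python
from html.parser import HTMLParser


class StackTrackingParser(HTMLParser):
    """Hypothetical sketch: answer 'are we inside <svg>?' from the open-element stack."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []                    # open elements, outermost first
        self.svg_flags: list[tuple[str, bool]] = []   # (tag, in_svg) per start tag

    def in_svg(self) -> bool:
        # Membership only: how many <svg> ancestors there are does not matter.
        return "svg" in self.stack

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.svg_flags.append((tag, self.in_svg()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children;
        # ignore stray end tags that were never opened.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass
```

One design note: a depth counter only ever equals the number of "svg" entries on such a stack, so the stack lookup gives the same answer without a separate piece of state to keep in sync.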

We might need to do a little more to make dynamic combinations of templates work. Maybe store some information about the "context" of the root nodes in the cache key, or maybe use a completely separate cache for that? The goal is to handle rewrite situations like t"<div>{t'<clippath />'}</div>" vs. t"<svg>{t'<clippath />'}</svg>": in one case the clippath would be rewritten and in the other it would not. I guess the concept would be an implied default namespace plus an explicit, non-default namespace, like key_html = ('', t'<clippath />'.strings) vs. key_svg = ('svg', t'<clippath />'.strings).
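A toy sketch of the namespace-aware cache key idea, assuming the cached unit is keyed on (namespace, template strings). parse_fragment and its string handling are hypothetical placeholders for real parsing; the only point is that the same strings under '' and 'svg' produce two separate cache entries instead of one wrong shared result.

```python
from functools import lru_cache

SVG_TAG_FIX = {"clippath": "clipPath"}  # illustrative subset (assumption)
parse_calls: list[tuple[str, tuple[str, ...]]] = []  # records real (uncached) parses


@lru_cache(maxsize=None)
def parse_fragment(ns: str, strings: tuple[str, ...]) -> str:
    """Toy stand-in for parsing; ns is '' for the default HTML context or 'svg'."""
    parse_calls.append((ns, strings))
    # Toy "parse": pull the single tag name out and lowercase it, as HTMLParser would.
    tag = "".join(strings).strip("</> ").lower()  # '<clippath />' -> 'clippath'
    if ns == "svg":
        tag = SVG_TAG_FIX.get(tag, tag)
    return f"<{tag} />"
```

With strings = ('<clippath />',) (what t'<clippath />'.strings would be), parse_fragment('', strings) yields '<clippath />' and parse_fragment('svg', strings) yields '<clipPath />'; repeating either call is served from the cache.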

@davepeck
Contributor

davepeck commented Feb 16, 2026

Quoting @ianjosephwilson: "This appears to be a significant drawback of using an html parser"

Wait, what's the drawback as things currently stand? Is the parser base class altering the incoming case?

@ianjosephwilson
Contributor

@davepeck I think HTMLParser lowercases tag and attribute names by default: https://docs.python.org/3/library/html.parser.html#html.parser.HTMLParser.handle_starttag
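A quick way to see this with the standard library alone: the Recorder class below is just a throwaway subclass that stores whatever handle_starttag receives. Tag and attribute names arrive lowercased, while attribute values keep their original case.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Stores exactly what handle_starttag receives."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: list[tuple[str, list[tuple[str, str | None]]]] = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


r = Recorder()
r.feed('<svg viewBox="0 0 10 10"><clipPath id="c"></clipPath></svg>')
r.close()
print(r.seen)
# [('svg', [('viewbox', '0 0 10 10')]), ('clippath', [('id', 'c')])]
```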

I'm not super familiar with SVG, but it appears to be XML-based: its tags and attributes are case-sensitive, and many of them are camelCase. I guess that's a surprise, but maybe that was just the only way to get the standards through?

@davepeck
Contributor

davepeck commented Feb 16, 2026

Quoting @ianjosephwilson: "I think the HTMLParser lower cases the tags and attributes by default"

Ugh. Okay, makes sense then. And yes, it looks like MathML may have similar issues.

Hrm. I agree, then, that this PR's approach may be generally good, although maybe we should consider adopting a different parser.

@ianjosephwilson
Contributor

My Chrome and Firefox don't seem to care, so maybe it is just the spec? Still, I find it kind of annoying that tdom overrides what is in the template and that the resulting output is not compliant. I was looking around yesterday, and there is a way to get the whole open-tag string with the current parser; maybe we could dig the original tag case out of that, but doing the attributes would have tdom implementing ... a parser in a parser. Maybe we should think on it?
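A hedged sketch of the "dig out the original tag case" idea using HTMLParser.get_starttag_text(), which returns the raw text of the most recently opened start tag. OriginalCaseTags and the regex are hypothetical illustrations; note that this only recovers the tag name, and recovering attribute names would mean re-tokenizing the raw text, which is the "parser in a parser" concern.

```python
import re
from html.parser import HTMLParser

# Captures the tag name immediately after '<' in the raw start-tag text.
_TAG_NAME = re.compile(r"<([^\s/>]+)")


class OriginalCaseTags(HTMLParser):
    """Sketch: recover the author's tag-name case from the raw start-tag text."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text() or ""  # e.g. '<clipPath id="c">'
        match = _TAG_NAME.match(raw)
        # Fall back to the lowercased name if the raw text cannot be matched.
        self.tags.append(match.group(1) if match else tag)
```

Fed `<svg viewBox="0 0 1 1"><clipPath id="c"><rect/></clipPath></svg>`, it records `svg`, `clipPath`, `rect`, while the attrs list passed alongside still contains the lowercased `viewbox`.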

The simplicity of the current parser is kind of nice because it isn't trying to enforce anything that might break the placeholder trick. If we tried to use an XML parser for SVG, for example, the placeholders might not work. It's sort of the "HTML but not HTML (but definitely not XML!)" problem.

@AhnafCodes
Author

AhnafCodes commented Feb 16, 2026

This is primarily for SVG support. HTML-parser-based libraries that support SVG (e.g., html5lib, BeautifulSoup) do a similar thing. The lookup tables (SVG_TAG_FIX, SVG_CASE_FIX) are the same ones defined in the WHATWG HTML spec: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign
tdom's parser extends HTMLParser, which normalizes all HTML (case-insensitive) tag and attribute names to lowercase before handing them to handle_starttag: clipPath becomes clippath and viewBox becomes viewbox. SVG, however, is case-sensitive. Browsers require:

  • Tags like clipPath, foreignObject, linearGradient
  • Attributes like viewBox, attributeName, baseFrequency (camelCase, not lowercase)
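One way to picture the shape of these tables, as a sketch rather than the PR's code: each maps the lowercased name HTMLParser produces back to its canonical camelCase spelling, so it can be built from a list of canonical names. The lists below are a small illustrative subset of the WHATWG SVG adjustment tables, and fix_svg_name is a hypothetical helper.

```python
# Small illustrative subset of canonical camelCase names from the WHATWG
# SVG adjustment tables (not the full lists).
SVG_TAG_NAMES = ["clipPath", "foreignObject", "linearGradient", "radialGradient", "textPath"]
SVG_ATTR_NAMES = ["viewBox", "attributeName", "baseFrequency", "preserveAspectRatio"]

# Key each table by the lowercased form, since that is what HTMLParser hands us.
SVG_TAG_FIX = {name.lower(): name for name in SVG_TAG_NAMES}
SVG_CASE_FIX = {name.lower(): name for name in SVG_ATTR_NAMES}


def fix_svg_name(name: str, table: dict[str, str]) -> str:
    """Return the camelCase spelling if the name is in the table; otherwise leave it alone."""
    return table.get(name, name)
```

Names that are already all-lowercase in SVG (circle, rect, fill) are simply absent from the tables and pass through unchanged.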

@davepeck
Contributor

Thanks again, @AhnafCodes, first for bringing this to our attention and then for providing a potential solution. @ianjosephwilson, after mulling it over: I don't know whether, long term, we'll adopt an entirely different (or even custom) parser base, but for now this seems like a roughly reasonable set of changes to me.

As for the question about caching, etc.: maybe we'd like an explicit svg(template: Template) -> ... function as well, or html(template: Template, *, svg: bool = False) -> ...? If you've got an <svg> in the markup you're sending in, html(...) is fine; if not (say, it's an SVG fragment), then you've got to be explicit about it. Hrm.
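A hedged sketch of what an explicit svg=False flag could do: seed the parser as if it were already inside an <svg>, so a bare fragment like <clippath /> gets fixed. start_tags and _FragmentParser are hypothetical; they take a plain markup string rather than a Template so the example runs without Python 3.14, and the table is an illustrative subset.

```python
from html.parser import HTMLParser

SVG_TAG_FIX = {"clippath": "clipPath", "lineargradient": "linearGradient"}  # illustrative subset


class _FragmentParser(HTMLParser):
    def __init__(self, svg: bool) -> None:
        super().__init__()
        # Start "inside" SVG when the caller declares the fragment to be SVG.
        self.svg_depth = 1 if svg else 0
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self.svg_depth += 1
        self.tags.append(SVG_TAG_FIX.get(tag, tag) if self.svg_depth else tag)

    def handle_endtag(self, tag):
        if tag == "svg" and self.svg_depth:
            self.svg_depth -= 1


def start_tags(markup: str, *, svg: bool = False) -> list[str]:
    """Hypothetical analogue of html(template, *, svg=False): returns case-fixed start-tag names."""
    parser = _FragmentParser(svg)
    parser.feed(markup)
    parser.close()
    return parser.tags
```

Markup that already contains an <svg> works with the default; only bare SVG fragments need svg=True.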

@ianjosephwilson
Contributor

@davepeck It does now seem that the HTMLParser is probably doing the right thing here.

My feeling so far is that this "processing context" situation has so many edge cases that it will become a common scenario, so we'd better start embracing it. I was thinking of something like this:

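from dataclasses import dataclass
from string.templatelib import Template  # Python 3.14+ (PEP 750)

@dataclass
class ProcessContext:
    parent_tag: str | None = None  # nearest enclosing tag, if known
    ns: str | None = "html"        # namespace the content is processed in

def html(template: Template, assume_ctx: ProcessContext | None = None):
    ...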

where we can set a process context either implicitly or explicitly, and it could then be carried recursively through the processing pipeline as something like last_ctx.

This is going to (temporarily?) block my intermediate compiled-Template implementation. I just can't imagine a "not crazy" way of making it work with that, especially since attributes must be "fixed".

That being said, until we learn of a reason to do otherwise, I think we should try to keep the tnodes as normalized as possible, i.e., all lowercase as parsed. The semantic information about whether or not an SVG is in there is already captured. Fixing the case (or not) seems like another stage in the pipeline. I think it would be another example of a "speed-up" we could get with a two-stage processor, but it's just too complicated to do that right now.
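A minimal sketch of keeping nodes normalized and fixing case as a separate render stage, with the namespace context carried recursively (similar in spirit to the ProcessContext / last_ctx idea above). Element, render, and the tables are hypothetical and are not tdom's node types or processor API.

```python
from dataclasses import dataclass, field
from html import escape

SVG_TAG_FIX = {"clippath": "clipPath"}   # illustrative subsets (assumption)
SVG_CASE_FIX = {"viewbox": "viewBox"}


@dataclass
class Element:
    """Hypothetical normalized node: names stay lowercase, exactly as parsed."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Element"] = field(default_factory=list)


def render(node: Element, ns: str = "html") -> str:
    # Entering <svg> switches the context, which is then passed down to children.
    if node.tag == "svg":
        ns = "svg"
    tag = SVG_TAG_FIX.get(node.tag, node.tag) if ns == "svg" else node.tag
    attrs = "".join(
        f' {SVG_CASE_FIX.get(k, k) if ns == "svg" else k}="{escape(v)}"'
        for k, v in node.attrs.items()
    )
    inner = "".join(render(child, ns) for child in node.children)
    return f"<{tag}{attrs}>{inner}</{tag}>"
```

The nodes themselves never change; only the serialized output is re-cased, and a caller rendering a bare SVG fragment can pass ns="svg" explicitly.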

Thanks for all the information, @AhnafCodes. I think this could work, but for now it would be easier to put the "fixing" into the processor until we have more information about MathML and the larger tdom changes have settled.
