Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 44 additions & 3 deletions lib/hologram/template/parser.ex
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,23 @@ defmodule Hologram.Template.Parser do
|> parse_tokens(:end_tag, rest)
end

def parse_tokens(context, :end_tag, [
{:symbol, "-"} = token1 | [{:string, str} = token2 | rest]
]) do
context
|> set_tag_name(context.tag_name <> "-" <> str)
|> add_processed_token(token1)
|> add_processed_token(token2)
|> parse_tokens(:end_tag, rest)
end

def parse_tokens(context, :end_tag, [{:symbol, "-"} = token | rest]) do
context
|> set_tag_name(context.tag_name <> "-")
|> add_processed_token(token)
|> parse_tokens(:end_tag, rest)
end
Comment on lines +269 to +284
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Symmetric script-mode issue in the end-tag hyphen handlers

maybe_disable_script_mode("script") is invoked in :end_tag_name (line 256) before the hyphen continuation runs. This means </script-element> prematurely exits script mode — which would produce incorrect parsing if encountered inside a real <script> block (unlikely in practice, but the same root cause applies).

The mirror fix is to call maybe_enable_script_mode(context.tag_name) in both new :end_tag clauses to undo the premature deactivation when the prefix was "script".

🐛 Proposed fix
  def parse_tokens(context, :end_tag, [
        {:symbol, "-"} = token1 | [{:string, str} = token2 | rest]
      ]) do
    context
+   |> maybe_enable_script_mode(context.tag_name)
    |> set_tag_name(context.tag_name <> "-" <> str)
    |> add_processed_token(token1)
    |> add_processed_token(token2)
    |> parse_tokens(:end_tag, rest)
  end

  def parse_tokens(context, :end_tag, [{:symbol, "-"} = token | rest]) do
    context
+   |> maybe_enable_script_mode(context.tag_name)
    |> set_tag_name(context.tag_name <> "-")
    |> add_processed_token(token)
    |> parse_tokens(:end_tag, rest)
  end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def parse_tokens(context, :end_tag, [
{:symbol, "-"} = token1 | [{:string, str} = token2 | rest]
]) do
context
|> set_tag_name(context.tag_name <> "-" <> str)
|> add_processed_token(token1)
|> add_processed_token(token2)
|> parse_tokens(:end_tag, rest)
end
def parse_tokens(context, :end_tag, [{:symbol, "-"} = token | rest]) do
context
|> set_tag_name(context.tag_name <> "-")
|> add_processed_token(token)
|> parse_tokens(:end_tag, rest)
end
def parse_tokens(context, :end_tag, [
{:symbol, "-"} = token1 | [{:string, str} = token2 | rest]
]) do
context
|> maybe_enable_script_mode(context.tag_name)
|> set_tag_name(context.tag_name <> "-" <> str)
|> add_processed_token(token1)
|> add_processed_token(token2)
|> parse_tokens(:end_tag, rest)
end
def parse_tokens(context, :end_tag, [{:symbol, "-"} = token | rest]) do
context
|> maybe_enable_script_mode(context.tag_name)
|> set_tag_name(context.tag_name <> "-")
|> add_processed_token(token)
|> parse_tokens(:end_tag, rest)
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/hologram/template/parser.ex` around lines 269 - 284, The end-tag hyphen
continuation clauses in parse_tokens (the two def parse_tokens(..., :end_tag,
...) clauses) currently run after maybe_disable_script_mode("script") was called
in :end_tag_name and thus can leave script mode disabled incorrectly; update
both of these :end_tag handlers to call
maybe_enable_script_mode(context.tag_name) after you extend context.tag_name
(i.e., after set_tag_name(...) completes) so that if the accumulated prefix is
"script" script mode is re-enabled before continuing parsing, then proceed with
add_processed_token(...) and recursive parse_tokens(:end_tag, rest).


def parse_tokens(context, :end_tag, [{:symbol, ">"} = token | rest]) do
context
|> add_end_tag()
Expand Down Expand Up @@ -435,7 +452,6 @@ defmodule Hologram.Template.Parser do
context
|> reset_attributes()
|> set_tag_name(tag_name)
|> maybe_enable_script_mode(tag_name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving maybe_enable_script_mode from :start_tag_name to the :start_tag ">" / "/>" handlers misses another path that closes start tags:

def parse_tokens(context, :attribute_name, [{:symbol, ">"} = token | rest]) do
...

<script defer> triggers this path (boolean attribute immediately before >), so script mode is never enabled. <script defer > (with a space) still works because whitespace returns to :start_tag first.

How would a solution look that keeps setting script mode in :start_tag_name (the original placement)? That way there's a single place that handles it, instead of needing to patch every clause that can close a start tag - which is fragile if more constructs are introduced in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking something like this.

  def parse_tokens(context, :start_tag_name, [{:symbol, "-"} = token | rest]) do
    context
    |> set_tag_name(context.tag_name <> "-")
    |> add_processed_token(token)
    |> parse_tokens(:start_tag_name, rest)
  end

  def parse_tokens(context, :start_tag_name, [{:string, tag_name} = token | rest]) do
    context
    |> set_tag_name(context.tag_name <> tag_name)
    |> add_processed_token(token)
    |> parse_tokens(:start_tag_name, rest)
  end

  def parse_tokens(context, :start_tag_name, tokens) do
    context
    |> reset_attributes()
    |> maybe_enable_script_mode(context.tag_name)
    |> set_prev_status(:start_tag_name)
    |> parse_tokens(:start_tag, tokens)
  end

This will require a slight change to the case when matching on < in order to start parsing the :start_tag_name. We would need to set_tag_name to "". There may be a better place to do this as well.

The reason for this is that the original implementation (current dev) expected the context.tag_name to have been overwritten by the single :string that makes up the name. With this change, multiple :strings can create the tag name, meaning we potentially need to concatenate multiple values to create the tag name. The fact that context.tag_name is never cleared/removed meant that something like

Hologram.Template.Parser.parse_markup("<script></script><script></script>")

would end up being parsed as

[
  start_tag: {"script", []},
  end_tag: "script",
  start_tag: {"scriptscript", []},
  end_tag: "scriptscript"
]

Notice how the second set of tags is scriptscript instead of the expected script.

The above implementation should allow parsing only the :strings and symbol: "-" for a tag name. After it reaches another token, such as :whitespace or a different :symbol, it will check if it should enable script mode and continue parsing as expected.

This does have the downside that there is an extra function call for each :start_tag_name because of the third function head that acts as a catch-all because I thought it would be worse to explicitly check for either :whitespace to start (potentially) checking for attributes or check for a symbol: ">" in order to close the tag. With this catch-all in place parsing will continue as it currently does.

Let me know if this approach is in line with what you were thinking or if there is another approach that you would like me to explore as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankhers Thanks for the detailed write-up! I like the idea of resetting the tag name to avoid the concatenation issue with consecutive tags. However, I'd like to avoid the catch-all and the extra dispatch it introduces for every tag.

What if instead we call maybe_disable_script_mode when {:symbol, "-"} is encountered in :start_tag_name? That would undo script mode if it was prematurely enabled for something like <script-element>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we can do that. The main reason for the catch-all is to move from :start_tag_name to :start_tag. Otherwise we won't really be able to make that transition. I just also threw the script mode check in there so that there is a single place for it instead of doing it in multiple places.

Previously there was only one valid case for :start_tag_name. That being the whole name of the element. This meant that as soon as you have this, you could move on to :start_tag.

Unfortunately now, there are multiple valid cases for :start_tag_name. The minimum of a string as the first part of the name, then optionally some number of - and other strings. Ultimately it will be one of > or /> in order to close the tag, or some kind of whitespace which is likely to start the attributes. Though it could also be whitespace and then closing the tag as well.

If we wanted to avoid the extra dispatch per tag, we would require three additional function heads,

  1. Check for > - which would allow us to skip checking for attributes and such, but there is now 2 places where we are checking for this
  2. Check for `/> - the same issue as above
  3. Check for whitespace - which could allow us to move on to checking for attributes

I think for complexity purposes, I would prefer the additional function call per element, but I will admit I do not know how much overhead that would add to the process, especially for larger documents.

|> add_processed_token(token)
|> set_prev_status(:start_tag_name)
|> parse_tokens(:start_tag, rest)
Expand Down Expand Up @@ -476,11 +492,36 @@ defmodule Hologram.Template.Parser do
end

def parse_tokens(context, :start_tag, [{:symbol, "/>"} = token | rest]) do
parse_start_tag_end(context, token, rest, true)
context
|> maybe_enable_script_mode(context.tag_name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't maybe_enable_script_mode/2 call redundant for self-closed cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right.

|> parse_start_tag_end(token, rest, true)
end

def parse_tokens(context, :start_tag, [{:symbol, ">"} = token | rest]) do
parse_start_tag_end(context, token, rest, false)
context
|> maybe_enable_script_mode(context.tag_name)
|> parse_start_tag_end(token, rest, false)
end

def parse_tokens(%{prev_status: :start_tag_name} = context, :start_tag, [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this 2 to # --- START TAG NAME --- section?

{:symbol, "-"} = token1 | [{:string, str} = token2 | rest]
]) do
context
|> set_tag_name(context.tag_name <> "-" <> str)
|> add_processed_token(token1)
|> add_processed_token(token2)
|> set_prev_status(:start_tag_name)
|> parse_tokens(:start_tag, rest)
end

def parse_tokens(%{prev_status: :start_tag_name} = context, :start_tag, [
{:symbol, "-"} = token | rest
]) do
context
|> set_tag_name(context.tag_name <> "-")
|> add_processed_token(token)
|> set_prev_status(:start_tag_name)
|> parse_tokens(:start_tag, rest)
end

def parse_tokens(context, :start_tag, [{:string, _str} = token | rest]) do
Expand Down
60 changes: 60 additions & 0 deletions test/elixir/hologram/template/parser_test.exs
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,66 @@ defmodule Hologram.Template.ParserTest do
test "whitespaces after end tag name" do
assert parse_markup("</div \n\r\t>") == [end_tag: "div"]
end

test "hyphenated element start tag" do
assert parse_markup("<foo-bar>") == [start_tag: {"foo-bar", []}]
end

test "multi-hyphenated element start tag" do
assert parse_markup("<foo-bar-baz>") == [start_tag: {"foo-bar-baz", []}]
end

test "hyphenated ending element start tag" do
assert parse_markup("<foo-bar->") == [start_tag: {"foo-bar-", []}]
end

test "element name cannot start with a hyphen" do
assert_raise TemplateSyntaxError, fn ->
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test the error message?

parse_markup("<-foo>")
end
end

test "multiple simultaneous hyphenated element start tag" do
assert parse_markup("<foo--bar>") == [start_tag: {"foo--bar", []}]
end

test "hyphenated element start tag with attributes" do
assert parse_markup(~s'<foo-bar class="x">') == [
start_tag: {"foo-bar", [{"class", [text: "x"]}]}
]
end

test "hyphenated element self-closed start tag" do
assert parse_markup("<foo-bar />") == [self_closing_tag: {"foo-bar", []}]
end

test "multi-hyphenated element self-closed start tag" do
assert parse_markup("<foo-bar-baz />") == [self_closing_tag: {"foo-bar-baz", []}]
end

test "hyphenated ending element self-closed start tag" do
assert parse_markup("<foo-bar- />") == [self_closing_tag: {"foo-bar-", []}]
end

test "multiple consecutive hyphenated element self-closed start tag" do
assert parse_markup("<foo--bar />") == [self_closing_tag: {"foo--bar", []}]
end

test "hyphenated element end tag" do
assert parse_markup("</foo-bar>") == [end_tag: "foo-bar"]
end

test "multi-hyphenated element end tag" do
assert parse_markup("</foo-bar-baz>") == [end_tag: "foo-bar-baz"]
end

test "hyphenated ending element end tag" do
assert parse_markup("</foo-bar->") == [end_tag: "foo-bar-"]
end

test "multiple simultaneous hyphenated element end tag" do
assert parse_markup("</foo--bar>") == [end_tag: "foo--bar"]
end
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are quite a few hyphenated tag tests now, it might be worth extracting them into a separate describe "custom element tags" block (and renaming the current one to "standard element tags"). This would keep things organized as more cases are added.

Also a few tests I think are missing:

  • Leading hyphen in end tags (</-foo>) should raise
  • Round-trip: <foo-bar>text</foo-bar>
  • Script mode is NOT enabled for <script-widget>...</script-widget>
  • Script mode IS still enabled for <script> with attributes, e.g. <script defer>var x = "<div>";</script>
  • etc.


describe "component tags" do
Expand Down
Loading