I really like how this library isn't too bothered about the order of different XML elements. For example, diffing the following:
<root>
<example1 id="example1" />
<example2 id="example2" />
<example3 id="example3" />
<example4 id="example4" />
</root>
and
<root>
<example3 id="example3" />
<example1 id="example1" />
<example4 id="example4" />
</root>
results in:
= Element "root"
...- Element "example2" "id"="example2"
However, I've noticed that the diff shown when removing an element that has siblings with the same name isn't as "clean". For example, taking the sample above and just removing the numbers from the element names:
<root>
<example id="example1" />
<example id="example2" />
<example id="example3" />
<example id="example4" />
</root>
vs
<root>
<example id="example3" />
<example id="example1" />
<example id="example4" />
</root>
which results in:
= Element "root"
...= Element "example" "id"="example3"
......- Attribute: "id" with value: "example1"
......+ Attribute: "id" with value: "example3"
...= Element "example" "id"="example1"
......- Attribute: "id" with value: "example2"
......+ Attribute: "id" with value: "example1"
...= Element "example" "id"="example4"
......- Attribute: "id" with value: "example3"
......+ Attribute: "id" with value: "example4"
...- Element "example" "id"="example4"
It'd be great if we could add the ability to produce a diff like the following instead:
= Element "root"
...- Element "example" "id"="example2"
In my opinion it makes it much clearer to see what actually changed. I imagine that if we want to keep the old behavior, we could add a parameter to the XmlComparer constructor which would define which comparison algorithm to use, and it could have a default value so as not to affect any existing applications using this library.
I think it's this code and/or this code that would need to change.
I realize that it would probably be useful if we can define what makes an element "the same". When the elements have no children, it is fairly simple - an element with the same name and exact same attributes can be considered the same.
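To make the childless case concrete, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. It is not this library's code; the helper name `is_same_leaf` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_same_leaf(a, b):
    """Hypothetical helper: two childless elements are "the same" when
    they share a name and have exactly the same attributes."""
    return (
        len(a) == 0          # len() of an Element is its number of children
        and len(b) == 0
        and a.tag == b.tag
        and a.attrib == b.attrib
    )


first = ET.fromstring('<example id="example1" />')
second = ET.fromstring('<example id="example1" />')
third = ET.fromstring('<example id="example2" />')

print(is_same_leaf(first, second))  # True
print(is_same_leaf(first, third))   # False
```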
In fact, rather than deciding on complicated logic for the other cases (elements with children, more or fewer attributes, different values etc.), maybe the simplest method to produce the "cleanest" diffs would be to find the element with the same name that has the fewest differences. That is, for each element in the destination document, find all elements with the same name in the source document, order them by the count of differences between the two, and take the one with the fewest differences. Once every destination element has found its match, any source elements left unmatched can be considered removed. Hopefully this won't affect performance too much, as I like how fast this library currently is.
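A rough sketch of that matching idea, written in Python with `xml.etree.ElementTree` rather than against this library's code. The function names `difference_count` and `match_children` are hypothetical, and the difference count is deliberately simplistic (it only compares attributes, text and the number of children).

```python
import xml.etree.ElementTree as ET


def difference_count(a, b):
    """Hypothetical score: number of differences between two same-named elements."""
    count = 0
    # A missing attribute and a changed value each count as one difference.
    for key in set(a.attrib) | set(b.attrib):
        if a.attrib.get(key) != b.attrib.get(key):
            count += 1
    if (a.text or "").strip() != (b.text or "").strip():
        count += 1
    # Crude stand-in for a recursive comparison of children.
    count += abs(len(a) - len(b))
    return count


def match_children(source_parent, dest_parent):
    """Greedily pair each destination child with its closest unmatched source child."""
    unmatched = list(source_parent)
    pairs, added = [], []
    for dest in dest_parent:
        candidates = [s for s in unmatched if s.tag == dest.tag]
        if not candidates:
            added.append(dest)  # no same-named source element left
            continue
        best = min(candidates, key=lambda s: difference_count(s, dest))
        unmatched.remove(best)
        pairs.append((best, dest))
    # Whatever is still unmatched in the source is treated as removed.
    return pairs, unmatched, added


source = ET.fromstring(
    '<root><example id="example1" /><example id="example2" />'
    '<example id="example3" /><example id="example4" /></root>'
)
dest = ET.fromstring(
    '<root><example id="example3" /><example id="example1" />'
    '<example id="example4" /></root>'
)

pairs, removed, added = match_children(source, dest)
print([e.get("id") for e in removed])  # ['example2']
```

On the sample from this issue, every destination element finds an exact match, so the only reported change is the removed `example2` element. This greedy version compares each destination element against every remaining same-named source element, so the cost grows with the product of the two sibling counts; whether that is acceptable for large documents would need measuring.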
Maybe I'm over-engineering this in my head, but it may even make sense to add weighting so that changes to attributes would score differently to changes to children (or text content), so that something like:
<root>
<example>Hello World!</example>
<example attr="value" />
</root>
vs
<root>
<example attr="foobar" />
<example>hello world.</example>
</root>
would produce a more "natural" diff: the Hello World! text becomes hello world. and the attr value becomes foobar, with each change counting as one "operation". The alternatives are either (1) removing the attribute and (2) adding the text (or the opposite), or (1) removing the attribute and (2) adding the same attribute with a different value, and each of those is two operations.
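To illustrate the weighting idea, here is a small Python sketch (again using `xml.etree.ElementTree`, not this library's code). The weight constants and the `weighted_cost` function are hypothetical and the weight values are arbitrary. A changed attribute value or changed text is scored as a single operation.

```python
import xml.etree.ElementTree as ET

# Hypothetical weights; the values are arbitrary and only for illustration.
ATTRIBUTE_WEIGHT = 1.0
TEXT_WEIGHT = 2.0
CHILD_WEIGHT = 3.0


def weighted_cost(a, b):
    """Score the differences between two same-named elements."""
    cost = 0.0
    # Added, removed or changed attribute: one operation each.
    for key in set(a.attrib) | set(b.attrib):
        if a.attrib.get(key) != b.attrib.get(key):
            cost += ATTRIBUTE_WEIGHT
    # Added, removed or changed text: one operation.
    if (a.text or "").strip() != (b.text or "").strip():
        cost += TEXT_WEIGHT
    # Crude stand-in for a recursive comparison of children.
    cost += CHILD_WEIGHT * abs(len(a) - len(b))
    return cost


source = ET.fromstring(
    '<root><example>Hello World!</example><example attr="value" /></root>'
)
dest = ET.fromstring(
    '<root><example attr="foobar" /><example>hello world.</example></root>'
)

src_text, src_attr = list(source)
dst_attr, dst_text = list(dest)

# Pair text with text and attribute with attribute.
natural = weighted_cost(src_attr, dst_attr) + weighted_cost(src_text, dst_text)
# Pair elements purely by their position in the document.
positional = weighted_cost(src_text, dst_attr) + weighted_cost(src_attr, dst_text)

print(natural, positional)  # 3.0 6.0
```

With these weights the "natural" pairing scores lower than the positional one, so a matcher that minimises the score would report one text change and one attribute change.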