Conversation
…ault value In Weblate, the maxlinelength is set to 77. If we want to use this package to touch PO files managed by Weblate, it kept being updated by weblate again, then updated by Karambolo because they don't use the same line length
There was a problem hiding this comment.
Fundamentally, your changes look good.
But could you add some tests?
I'd like to see what happens in edge cases where MaxLineLength is 3 or less. In principle, the minimum value that makes sense is 3 4 because the starting and ending quotes take up two characters and escape sequences must be kept together (see below). A MaxLineLength value of 3 can send the generator into an infinite loop currently, we definitely need to do something about that.
We need to decide what to do with nonsensical values. Probably, it's the easiest way to treat those values as
- the minimal possible line length (4)
- or unlimited line length.
What do you think?
UPDATE: Sorry, I was wrong. I did some tests and it turned out that the theoretical minimum of MaxLineLength is larger than 3:
-
On the one hand, escape sequences take up two characters and they must be kept together. E.g.
"\n"is 4 chars long. -
On the other hand, it also depends on character encoding.
Currently, the line break algorithm interprets max. line length in terms of .NET string length. Strings in .NET use UTF16 encoding but define length as the number of code units, not as the number of code points. There is a difference as soon as code points beyond the Basic Multilingual Plane are included, since those are represented using two code units (so called surrogate pairs) in UTF16.
The line break algo doesn't consider surrogate pairs at the moment, which is a bug:
Consider a situation in which the line break happens in the middle of a surrogate pair. This will result in invalid code points at the end of the line and at the start of the next one since .NET usually replaces lone surrogates with a placeholder. That is, the string content will be corrupted.
And then we didn't even consider UTF8, which is usually used as the PO file encoding and uses 1-4 bytes to represent code points...
It seems that the only reasonable solution that could offer a way out of this mess is interpreting max. line length in terms of Unicode code points. But this will need some more research and the rework of the line break algorithm. So let's handle this separately from this PR.
| else | ||
| #endif | ||
| orderedHeaders = headers.OrderBy(kvp => kvp.Key, POHeaderDefaultOrderComparer.Instance); | ||
| orderedHeaders = headers.OrderBy(kvp => kvp.Key, POHeaderDefaultOrderComparer.Instance); |
There was a problem hiding this comment.
Please don't make code formatting changes in a feature PR.
| orderedHeaders = headers.OrderBy(kvp => kvp.Key, POHeaderDefaultOrderComparer.Instance); | |
| orderedHeaders = headers.OrderBy(kvp => kvp.Key, POHeaderDefaultOrderComparer.Instance); |
In Weblate, the maxlinelength is set to 77. If we want to use this package to touch PO files managed by Weblate, it kept being updated by weblate again, then updated by Karambolo because they don't use the same line length