Add UTF-16 and UTF-32 support

Currently, UTF-8 is the only supported encoding for guardonce. UTF-16 and UTF-32 files cannot be processed. I began to improve the handling of those files in #22 by ensuring they were flagged as a problem or ignored. However, it's possible to do better.

There's are a few heuristics that can be used to guess that a file is UTF-16 or UTF-32. The simplest is to check if the first few bytes of the file match a BOM. It's extremely unlikely that any header file in another encoding would start with the same bytes as a UTF BOM, so this seems sufficient for our case.

The encoding used for reading should also be used for writing when processing the file in place. However, I'm less sure about the correct behaviour for printing the new file to stdout. I suspect that I should just use the same encoding there too, though perhaps I should use the output stream's desired encoding if it is known.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add UTF-16 and UTF-32 support #28

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Add UTF-16 and UTF-32 support #28

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions