@mike-clark-8192

  • Three files had corrupted single-byte CRLF (0d 0a) mixed with UTF-16LE content, breaking git's working-tree-encoding conversion
  • Corrected to proper UTF-16LE CRLF (0d 00 0a 00) so files round-trip correctly through git
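
The byte-level difference described above can be seen with a short hex dump. This is only a sketch, assuming `iconv` and `od` are available; the `hexbytes` helper and the sample text `A` are illustrative and not taken from the repository:

```shell
#!/bin/bash
set -euo pipefail

# Print stdin as space-separated hex bytes on a single line.
hexbytes() { od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

# Proper UTF-16LE: every code unit is two bytes, including CR and LF.
good=$(printf 'A\r\n' | iconv -f UTF-8 -t UTF-16LE | hexbytes)

# Mixed content: UTF-16LE text followed by a single-byte CRLF.
bad=$( { printf 'A' | iconv -f UTF-8 -t UTF-16LE; printf '\r\n'; } | hexbytes)

echo "proper UTF-16LE CRLF:   $good"
echo "mixed single-byte CRLF: $bad"
```

In the second case a UTF-16LE decoder would read `0d 0a` as one 16-bit code unit rather than as a line break.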

@RamonUnch
Owner

I cannot find the single word 0D0A in the AltSnap.dni file or in fr_FR.ini. However, ko_KR.ini seems to be UTF-8 encoded.

It seems you are actually converting some files to UTF-8?

I do not understand what GitHub is doing with encoding; maybe I should remove the .gitattributes flags and treat those files like binary blobs.
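
One way to check how a file or a committed blob is actually encoded is to look at its leading byte-order mark. The helper below is a hypothetical sketch (the name `sniff_bom` is made up here) and it deliberately ignores UTF-32, whose little-endian BOM also starts with `ff fe`:

```shell
#!/bin/bash
set -euo pipefail

# Classify a byte stream on stdin by its leading byte-order mark (BOM).
# Hypothetical helper, not part of the AltSnap repository.
sniff_bom() {
    local head_hex
    # Read the first 3 bytes as hex, then drain the rest of the input so an
    # upstream command is not killed by SIGPIPE under `set -o pipefail`.
    head_hex=$( { head -c 3; cat > /dev/null; } | od -An -v -tx1 | tr -d ' \n')
    case "$head_hex" in
        fffe*)   echo "UTF-16LE BOM" ;;
        feff*)   echo "UTF-16BE BOM" ;;
        efbbbf*) echo "UTF-8 BOM" ;;
        *)       echo "no BOM" ;;
    esac
}

# Self-contained examples using synthetic input.
printf '\377\376A\0' | sniff_bom
printf 'plain text'  | sniff_bom
```

Inside a clone, `sniff_bom < Lang/ko_KR.ini` would inspect the working-tree file, while `git cat-file -p HEAD:AltSnap.dni | sniff_bom` would inspect the stored blob without any .gitattributes conversion applied.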

@mike-clark-8192
Author

mike-clark-8192 commented Jan 27, 2026

I think I ran into issues with EOL conversions interacting badly with the character encoding confusion, so the content of my pull request is probably not very helpful.

However, I did some more research, which led me to this recipe. I tested it and it appears to work. It may seem overly complex, but I wasn't able to simplify it further. I hope this helps.

#!/bin/bash
set -euo pipefail

# Fix for AltSnap repo encoding issues.
#
# We want to use a .gitattributes with working-tree-encoding=utf-16le-bom, as
# this enables git diffs on UTF-16 files, which is nice. Also it explicitly
# declares the required encoding for these files, also nice. However
# working-tree-encoding always requires that the blobs be stored as UTF-8 in
# the git repository, regardless of what encoding they will ultimately be
# checked out to. In this repo though, most of the blobs were committed as raw
# UTF-16LE and never converted to UTF-8. Adding the .gitattributes later
# confused git: it emits warnings and working-tree-encoding does not function
# properly. A broken .gitattributes makes it difficult to correct
# the encoding of the blobs to UTF-8, so we have to do it in multiple steps,
# the first of which is to temporarily remove .gitattributes (we will restore it
# later).
#
# This script:
#   1. Removes .gitattributes so git treats files as raw bytes
#   2. Converts the UTF-16LE blobs to UTF-8 (the format git expects) and
#      normalizes line endings to DOS (CRLF)
#   3. Restores .gitattributes (with eol=crlf added)
#   4. Forces a re-checkout so git applies the encoding rules, producing
#      proper UTF-16LE-BOM working tree files from the now-correct UTF-8 blobs
#   5. Commits the re-encoded working tree

REPO_URL="https://github.com/RamonUnch/AltSnap"
CLONE_DIR="AltSnap"

# Fresh clone (expect encoding errors — that's the bug we're fixing)
git clone "$REPO_URL" "$CLONE_DIR"
cd "$CLONE_DIR"

# Step 1: Remove .gitattributes and commit
git rm .gitattributes
git commit -m "remove .gitattributes"

# Re-checkout files as raw bytes (without .gitattributes filtering)
git rm --cached -r . > /dev/null
git reset HEAD > /dev/null 2>&1
git checkout -- .

# Step 2: Convert UTF-16 to UTF-8 and normalize line endings
# Using -f utf-16 (not utf-16le) so iconv reads and consumes the BOM.
# ko_KR.ini is already UTF-8 so skip the iconv step for it.
for f in AltSnap.dni Lang/*; do
    if [ "$f" != "Lang/ko_KR.ini" ]; then
        iconv -f utf-16 -t utf-8 "$f" > "$f.tmp"
        mv "$f.tmp" "$f"
    fi
    unix2dos -e "$f"
done

git add .
git commit -m "normalize git-internal encoding to UTF-8 and line endings to DOS"

# Step 3: Restore .gitattributes with an additional specifier of eol=crlf to 
# enforce Windows compatibility and avoid EOL conversion warnings.
cat > .gitattributes << 'EOF'
*.dni diff merge text eol=crlf linguist-language=ini working-tree-encoding=utf-16le-bom
Lang/* diff merge text eol=crlf linguist-language=ini working-tree-encoding=utf-16le-bom
EOF

git add .gitattributes
git commit -m "add back .gitattributes with eol=crlf"

# Step 4: Force git to re-checkout with the new .gitattributes rules
# Clearing the index is necessary — git won't re-apply working-tree-encoding
# if it thinks the checkout is already current.
git rm --cached -r . > /dev/null 2>&1
git reset --hard HEAD

# Step 5: Commit the re-encoded working tree
git add .
git commit -m "allow .gitattributes to take effect"

# Verify
echo ""
echo "=== Verification ==="
unix2dos -idumbteph AltSnap.dni Lang/*
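
The header comment's point that working-tree-encoding keeps blobs as UTF-8 can also be checked in isolation. The following is a minimal sketch, assuming `iconv` and a git version that recognizes UTF-16LE-BOM are installed; the throwaway repository, `test.ini`, the demo identity, and the `hexbytes` helper are placeholders and not part of AltSnap:

```shell
#!/bin/bash
set -euo pipefail

# Print stdin as space-separated hex bytes on a single line.
hexbytes() { od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

# Throwaway repository with a placeholder identity.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.invalid"
git config core.autocrlf false

# Declare the encoding before the file is ever added.
printf '*.ini text eol=crlf working-tree-encoding=UTF-16LE-BOM\n' > .gitattributes

# Working-tree file: BOM (ff fe) followed by "[a]" and CRLF in UTF-16LE.
{ printf '\377\376'; printf '[a]\r\n' | iconv -f UTF-8 -t UTF-16LE; } > test.ini

git add .gitattributes test.ini
git commit -q -m "demo"

# What git stored: expected to be UTF-8 with LF line endings.
blob_hex=$(git cat-file -p HEAD:test.ini | hexbytes)

# What a fresh checkout writes: expected to be UTF-16LE with BOM and CRLF.
rm test.ini
git checkout -- test.ini
tree_hex=$(hexbytes < test.ini)

echo "blob:         $blob_hex"
echo "working tree: $tree_hex"
```

If the blob dump shows `ff fe` or interleaved `00` bytes instead of plain UTF-8, the file was committed as raw UTF-16 before the attribute existed, which is the situation the recipe above repairs.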
