
Fails on filenames that use a character encoding different from the system's #28

@StyXman

Description


I have a friend who has an audio collection that predates the general availability of UTF-8 on OSes. He also has a lot of music with band, album and song names that include non-ASCII characters. Combine those two and you get:

Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 6, in <module>
    collectiongain()
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 341, in collectiongain
    do_collectiongain(args[0], opts.ref_level, opts.force, opts.dry_run,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 274, in do_collectiongain
    collect_files(music_dir, files, visited_cache,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 117, in collect_files
    print("  [%i] %s |" % (i, filepath), end='')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 49: surrogates not allowed
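The mechanism can be reproduced in a few lines of plain Python, without rgain3. This is a minimal sketch: the filename `b"Mar\xeda.mp3"` is a made-up example, assuming the collection uses Latin-1, in which byte 0xED is `í` (that byte is what shows up as `\udced` in the traceback).

```python
# A made-up Latin-1 filename: "María.mp3" where 'í' is the single byte 0xED,
# which is not valid UTF-8.
raw_name = b"Mar\xeda.mp3"

# On POSIX, Python decodes filenames with the 'surrogateescape' error handler
# (PEP 383), so each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # 'Mar\udceda.mp3'

# A strict UTF-8 encode, which is presumably what print() does when writing to
# a UTF-8 stdout, rejects lone surrogates.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc)  # 'utf-8' codec can't encode character '\udced' in position 3: surrogates not allowed
```

The decoded string still round-trips: `name.encode("utf-8", "surrogateescape")` gives back the original bytes, which is why opening such a file works while printing its name does not.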

Notice that these are valid filenames from the OS's point of view (on Unix, any byte except NUL (\x00) and / can be part of a filename); they just aren't valid UTF-8. Yes, he could sit down and rename all those files and directories, but I guess he won't be the only one.
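Since the files themselves are fine, one way to keep the scan going is to escape such names only when displaying them. A minimal sketch, assuming a UTF-8 filesystem encoding as in the traceback; `display_path` is a hypothetical helper, not part of rgain3:

```python
def display_path(path: str) -> str:
    # Hypothetical helper. Undo the 'surrogateescape' decoding to get the raw
    # on-disk bytes back, then decode them again, turning every byte that is
    # not valid UTF-8 into a visible \xNN escape instead of raising.
    raw = path.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8", "backslashreplace")


# A made-up Latin-1 filename, decoded the way os.listdir() does on POSIX
# with a UTF-8 filesystem encoding.
name = b"Mar\xeda.mp3".decode("utf-8", "surrogateescape")
print("  [%i] %s |" % (1, display_path(name)))
```

File operations should keep using the original string, which maps back to the exact on-disk bytes; only the copy passed to `print()` is escaped. A blunter alternative is `sys.stdout.reconfigure(errors="backslashreplace")` (Python 3.7+), which makes every write to stdout escape such characters (as `\udced`) instead of raising.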

OTOH, you could say 'go fix your filenames' and we would understand. Cheers!
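If renaming is the route taken, the existing `convmv` utility is built for exactly this. For illustration only, here is a minimal dry-run sketch in Python, assuming every non-UTF-8 name is Latin-1 (a guess: cp1252 or another legacy code page may be the real one). `propose_utf8_name` and `dry_run` are hypothetical helpers, and nothing is actually renamed:

```python
import os
from typing import Optional


def propose_utf8_name(raw: bytes, legacy: str = "latin-1") -> Optional[bytes]:
    # Hypothetical helper: if `raw` is not valid UTF-8, reinterpret it as the
    # assumed legacy encoding and return the UTF-8 bytes for the same text.
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        # Latin-1 maps all 256 byte values, so this decode cannot fail.
        return raw.decode(legacy).encode("utf-8")


def dry_run(top: bytes = b".") -> None:
    # Walking with a bytes path makes os.walk yield raw, undecoded names.
    # Bottom-up order lists a directory's contents before the directory itself.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for raw in dirnames + filenames:
            new = propose_utf8_name(raw)
            if new is not None:
                print(repr(os.path.join(dirpath, raw)), "->", repr(new))
```

Calling `dry_run(b"/path/to/music")` only prints the proposed renames so they can be reviewed first. The bottom-up walk means that, if the renames were later applied in the printed order, renaming a directory would not invalidate the paths of the entries inside it.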

Metadata


Labels: bug (Something isn't working)
