Fails on filenames that use a character encoding different from the system #28

I have a friend who has an audio collection that predates the general availability of UTF-8 on OSs. He also has a lot of music with band, album and song names that include non-ASCII chars. Combine those two and you get:

```
Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 6, in <module>
    collectiongain()
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 341, in collectiongain
    do_collectiongain(args[0], opts.ref_level, opts.force, opts.dry_run,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 274, in do_collectiongain
    collect_files(music_dir, files, visited_cache,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 117, in collect_files
    print("  [%i] %s |" % (i, filepath), end='')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 49: surrogates not allowed
```

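Notice that these are valid filenames from the OS point of view (on Unix, any byte except NUL, `\0`, and `/` can be part of a filename), just not valid UTF-8. Python represents the undecodable bytes as lone surrogates such as `\udced`, and those cannot be encoded back to UTF-8 when printing. Yes, he could sit down and rename all those files and directories, but I guess he won't be the only one with this problem.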
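For illustration, here is a minimal sketch of one way the `print` call could be made tolerant of such names. It is not taken from rgain3: `printable_path` is a hypothetical helper, and it assumes the filesystem encoding is UTF-8 with the `surrogateescape` error handler, which is what Python 3 typically uses on Linux.

```python
def printable_path(path, encoding="utf-8"):
    """Hypothetical helper: return a str that can always be encoded for output."""
    # Recover the original filename bytes; surrogateescape turns lone
    # surrogates such as '\udced' back into the raw byte 0xED.
    raw = path.encode(encoding, errors="surrogateescape")
    # Decode again, but show undecodable bytes as visible escapes like \xed.
    return raw.decode(encoding, errors="backslashreplace")


# A Latin-1 encoded name, as os.listdir() would hand it over on a UTF-8 system
# (assumed example, not one of the actual files).
filepath = b"Mart\xedn.mp3".decode("utf-8", errors="surrogateescape")

# Same format string as in the traceback, but it no longer raises.
print("  [%i] %s |" % (1, printable_path(filepath)), end='')
```

This only changes what is displayed; the original `filepath` string should still be the one passed to file operations, since it maps back to the real bytes on disk.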

OTOH, you could say 'go fix your filenames' and we will understand. Cheers!
