A question about handling c-quoted filenames (from git diff) #9

The `path()` getter of the `PatchedFile` class contains code that tries to handle quoted filenames:

```python
        quoted = filepath.startswith('"') and filepath.endswith('"')
        if quoted:
            filepath = filepath[1:-1]

        if RE_PATCH_FILE_PREFIX.match(filepath):
            filepath = filepath[2:]

        if quoted:
            filepath = '"{}"'.format(filepath)

        return filepath
```

However, if `path()` is meant to return the file name as it exists in the repository's filesystem, this is not enough. The code merely strips the file prefix and re-wraps the result in quotes, leaving any C-style escapes in place (e.g. it extracts `"name with \"quotes\""` from `"a/name with \"quotes\""`, and `file` from `a/file`).
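To make the problem concrete, here is a standalone re-implementation of the excerpt above. It is only a sketch: `current_path_logic` is a hypothetical name, and `RE_PATCH_FILE_PREFIX` is replaced by an assumed `^[ab]/` pattern rather than unidiff's actual constant. Note that the backslash escapes survive, so the result is still not a usable filesystem path.

```python
import re

# Assumption: a stand-in for unidiff's RE_PATCH_FILE_PREFIX that matches
# git's default "a/" and "b/" prefixes; the real pattern may differ.
RE_PATCH_FILE_PREFIX = re.compile(r'^[ab]/')


def current_path_logic(filepath: str) -> str:
    """Hypothetical copy of the quoted-path handling in PatchedFile.path."""
    quoted = filepath.startswith('"') and filepath.endswith('"')
    if quoted:
        filepath = filepath[1:-1]  # drop the surrounding quotes

    if RE_PATCH_FILE_PREFIX.match(filepath):
        filepath = filepath[2:]  # drop the "a/" or "b/" prefix

    if quoted:
        filepath = '"{}"'.format(filepath)  # re-wrap; escapes are untouched

    return filepath


print(current_path_logic('a/file'))                     # file
print(current_path_logic(r'"a/name with \"quotes\""'))  # "name with \"quotes\""
print(current_path_logic(r'"a/tab\there"'))             # "tab\there"
```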

Should unidiff decode the quoted path itself, or provide a separate mechanism for decoding the C-quoted paths that `git diff` produces?
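One possible shape for such a separate mechanism, sketched under assumptions: a standalone helper that decodes the C-quoted header path first and only then strips the prefix, so callers get the name as it exists on disk. Both `unquote_c_style` (named after git's C function, but a hypothetical Python helper) and `path_on_disk` are illustrative and not part of unidiff, and the `^[ab]/` prefix pattern is again an assumption. Unlike a hand-written state machine, this regex-based version silently leaves unrecognised escapes as they are instead of raising an error.

```python
import re

# A backslash followed by either a 3-digit octal byte (first digit 0-3,
# as in git's unquote_c_style) or one of the single-character escapes.
_ESCAPE_RE = re.compile(rb'\\(?:([0-3][0-7]{2})|([abtnvfr"\\]))')
_SIMPLE = {b'a': b'\a', b'b': b'\b', b't': b'\t', b'n': b'\n',
           b'v': b'\v', b'f': b'\f', b'r': b'\r', b'"': b'"', b'\\': b'\\'}
_PREFIX_RE = re.compile(r'^[ab]/')  # assumed prefix pattern


def unquote_c_style(text: str) -> str:
    """Hypothetical helper: decode a git C-quoted name; pass others through."""
    if not (len(text) >= 2 and text.startswith('"') and text.endswith('"')):
        return text
    raw = text[1:-1].encode('utf-8', errors='surrogateescape')

    def replace(m):
        if m.group(1) is not None:
            return bytes([int(m.group(1), 8)])  # octal escape -> one byte
        return _SIMPLE[m.group(2)]

    # Escaped bytes are joined first, then decoded as UTF-8 as a whole
    return _ESCAPE_RE.sub(replace, raw).decode('utf-8', errors='surrogateescape')


def path_on_disk(header_path: str) -> str:
    """Hypothetical: decode first, then strip the a/ or b/ prefix."""
    path = unquote_c_style(header_path)
    if _PREFIX_RE.match(path):
        path = path[2:]
    return path


print(path_on_disk(r'"a/name with \"quotes\""'))  # name with "quotes"
```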

Here is my code that actually decodes a C-quoted filename:

```python
# Assumption: ENCODING_ERRORS is defined elsewhere in the original code;
# 'surrogateescape' is one reasonable choice, keeping invalid UTF-8 bytes
# round-trippable.
ENCODING_ERRORS = 'surrogateescape'


def decode_c_quoted_str(text: str) -> str:
    """C-style name unquoting

    See unquote_c_style() function in 'quote.c' file in git/git source code
    https://github.com/git/git/blob/master/quote.c#L401

    This is a subset of escape sequences supported by C and C++
    https://learn.microsoft.com/en-us/cpp/c-language/escape-sequences
    """
    escape_dict = {
        'a': '\a',  # Bell (alert)
        'b': '\b',  # Backspace
        'f': '\f',  # Form feed
        'n': '\n',  # New line
        'r': '\r',  # Carriage return
        't': '\t',  # Horizontal tab
        'v': '\v',  # Vertical tab
    }

    # len() check: a lone '"' both starts and ends with a quote
    quoted = len(text) >= 2 and text.startswith('"') and text.endswith('"')
    if quoted:
        text = text[1:-1]  # remove quotes

        buf = bytearray()
        escaped = False
        oct_str = ''

        for ch in text:
            if not escaped:
                if ch != '\\':
                    # encode() instead of ord(): unescaped non-ASCII characters
                    # (e.g. with core.quotePath=false) may not fit in one byte
                    buf.extend(ch.encode('utf-8', errors=ENCODING_ERRORS))
                else:
                    escaped = True
                    oct_str = ''
            elif oct_str and not '0' <= ch <= '7':
                # git always writes exactly three octal digits
                raise ValueError(f'Incomplete octal escape sequence \\{oct_str} in "{text}"')
            else:
                if ch in ('"', '\\'):
                    buf.append(ord(ch))
                    escaped = False
                elif ch in escape_dict:
                    buf.append(ord(escape_dict[ch]))
                    escaped = False
                elif '0' <= ch <= '7':  # octal values with first digit over 3 overflow a byte
                    oct_str += ch
                    if len(oct_str) == 3:
                        byte = int(oct_str, base=8)  # byte in octal notation
                        if byte > 255:
                            raise ValueError(f'Invalid octal escape sequence \\{oct_str} in "{text}"')
                        buf.append(byte)
                        escaped = False
                        oct_str = ''
                else:
                    raise ValueError(f'Unexpected character \'{ch}\' in escape sequence when parsing "{text}"')

        if escaped:
            raise ValueError(f'Unfinished escape sequence when parsing "{text}"')

        text = buf.decode(errors=ENCODING_ERRORS)

    return text
```
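To test a decoder like this, it can help to have the inverse operation and check round-trips. The sketch below is modelled on my reading of git's `quote_c_style()` in `quote.c`: bytes 0x07 to 0x0d plus `"` and `\` get short escapes, other control bytes and DEL become three-digit octal, and with `core.quotePath` enabled (git's default) so do all bytes at or above 0x80; a name needing no escapes is left unquoted. The name `encode_c_quoted_str` and the exact byte classification are assumptions worth checking against git's source.

```python
def encode_c_quoted_str(name: str, quote_path_fully: bool = True) -> str:
    """Hypothetical inverse of decode_c_quoted_str(), modelled on git's
    quote_c_style(); the set of quoted bytes is an assumption."""
    # Bytes that have a short C escape in git's output
    short_escapes = {0x07: b'a', 0x08: b'b', 0x09: b't', 0x0a: b'n',
                     0x0b: b'v', 0x0c: b'f', 0x0d: b'r',
                     0x22: b'"', 0x5c: b'\\'}
    raw = name.encode('utf-8', errors='surrogateescape')
    out = bytearray()
    needs_quotes = False
    for byte in raw:
        if byte in short_escapes:
            out += b'\\' + short_escapes[byte]
            needs_quotes = True
        elif byte < 0x20 or byte == 0x7f or (quote_path_fully and byte >= 0x80):
            out += b'\\%03o' % byte  # remaining control bytes (and non-ASCII) as octal
            needs_quotes = True
        else:
            out.append(byte)  # passed through unchanged
    body = out.decode('utf-8', errors='surrogateescape')
    return f'"{body}"' if needs_quotes else body


print(encode_c_quoted_str('a/name with "quotes"'))  # "a/name with \"quotes\""
print(encode_c_quoted_str('żółw'))                  # "\305\274\303\263\305\202w"
```

With both functions available, a round-trip check such as `decode_c_quoted_str(encode_c_quoted_str(name)) == name` over a list of awkward filenames exercises every escape branch of the decoder.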



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A question about handling c-quoted filenames (from git diff) #9

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

A question about handling c-quoted filenames (from git diff) #9

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions