A question about handling c-quoted filenames (from git diff) #9

@jnareb

The path() method (getter) in the PatchedFile class has code that tries to handle quoted filenames:

        quoted = filepath.startswith('"') and filepath.endswith('"')
        if quoted:
            filepath = filepath[1:-1]

        if RE_PATCH_FILE_PREFIX.match(filepath):
            filepath = filepath[2:]

        if quoted:
            filepath = '"{}"'.format(filepath)

        return filepath

However, if path() is meant to return the file name as it exists in the repository's filesystem, this is not enough. The code simply strips the a/ or b/ prefix and re-wraps the result in quotes, so the escape sequences stay undecoded (e.g. it extracts "name with \"quotes\"" from "a/name with \"quotes\"", and file from a/file).
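To make the behavior concrete, here is a minimal standalone sketch of the logic quoted above. The function name `strip_prefix_keep_quotes` is hypothetical, and the regex is only an assumed stand-in for the library's `RE_PATCH_FILE_PREFIX` (taken here to match a leading `a/` or `b/`).

```python
import re

# Assumption: stand-in for the library's RE_PATCH_FILE_PREFIX,
# taken here to match a leading "a/" or "b/".
RE_PATCH_FILE_PREFIX = re.compile(r'^[ab]/')


def strip_prefix_keep_quotes(filepath: str) -> str:
    """Hypothetical standalone copy of the path() logic quoted above."""
    quoted = filepath.startswith('"') and filepath.endswith('"')
    if quoted:
        filepath = filepath[1:-1]  # drop the surrounding quotes

    if RE_PATCH_FILE_PREFIX.match(filepath):
        filepath = filepath[2:]  # drop the "a/" or "b/" prefix

    if quoted:
        filepath = '"{}"'.format(filepath)  # re-wrap, escapes untouched

    return filepath


plain = strip_prefix_keep_quotes('a/file')
quoted = strip_prefix_keep_quotes(r'"a/name with \"quotes\""')

print(plain)   # file
print(quoted)  # "name with \"quotes\""
```

The second result still carries the surrounding quotes and the backslash escapes, so it is not the on-disk name `name with "quotes"`.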

Should unidiff decode the quoted path itself, or provide a separate mechanism for decoding the c-quoted paths that git diff emits?

Here is my code that actually tries to decode a c-quoted filename:

def decode_c_quoted_str(text: str) -> str:
    """C-style name unquoting

    See unquote_c_style() function in 'quote.c' file in git/git source code
    https://github.com/git/git/blob/master/quote.c#L401

    This is a subset of the escape sequences supported by C and C++
    https://learn.microsoft.com/en-us/cpp/c-language/escape-sequences
    """
    escape_dict = {
        'a': '\a',  # Bell (alert)
        'b': '\b',  # Backspace
        'f': '\f',  # Form feed
        'n': '\n',  # New line
        'r': '\r',  # Carriage return
        't': '\t',  # Horizontal tab
        'v': '\v',  # Vertical tab
    }

    quoted = text.startswith('"') and text.endswith('"')
    if quoted:
        text = text[1:-1]  # remove quotes

        buf = bytearray()
        escaped = False
        oct_str = ''

        for ch in text:
            if not escaped:
                if ch != '\\':
                    buf.extend(ch.encode())  # ch may be non-ASCII (several bytes in UTF-8)
                else:
                    escaped = True
                    oct_str = ''
            else:
                if ch in ('"', '\\'):
                    buf.append(ord(ch))
                    escaped = False
                elif ch in escape_dict:
                    buf.append(ord(escape_dict[ch]))
                    escaped = False
                elif '0' <= ch <= '7':  # octal digit; a first digit over 3 overflows a byte
                    oct_str += ch
                    if len(oct_str) == 3:
                        byte = int(oct_str, base=8)  # byte in octal notation
                        if byte > 255:  # a byte holds values 0..255
                            raise ValueError(f'Invalid octal escape sequence \\{oct_str} in "{text}"')
                        buf.append(byte)
                        escaped = False
                        oct_str = ''
                else:
                    raise ValueError(f'Unexpected character \'{ch}\' in escape sequence when parsing "{text}"')

        if escaped:
            raise ValueError(f'Unfinished escape sequence when parsing "{text}"')

        # ENCODING_ERRORS is a module-level constant defined elsewhere (not shown here)
        text = buf.decode(errors=ENCODING_ERRORS)

    return text
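For comparison, here is a compact, self-contained sketch of the "decode first, then strip the prefix" order that the question above is about. It is an alternative regex-based formulation, not code from unidiff or git. The names `unquote_git_path` and `decoded_path` are hypothetical, and it assumes UTF-8 paths, with `errors='replace'` picked only for illustration.

```python
import re

# Single-character escapes, as byte values (same table as in the function above,
# plus the escaped double quote and backslash).
_SIMPLE = {
    b'a': b'\a', b'b': b'\b', b'f': b'\f', b'n': b'\n',
    b'r': b'\r', b't': b'\t', b'v': b'\v', b'"': b'"', b'\\': b'\\',
}

# A backslash followed by three octal digits (at most \377),
# or by one arbitrary byte, or by nothing (a dangling backslash).
_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|.?)', re.DOTALL)


def _replace(match):
    body = match.group(1)
    if len(body) == 3:  # octal escape: one raw byte
        return bytes([int(body.decode('ascii'), 8)])
    if body in _SIMPLE:
        return _SIMPLE[body]
    raise ValueError('Invalid escape sequence in c-quoted path')


def unquote_git_path(path: str) -> str:
    """Decode a c-quoted path; unquoted input is returned unchanged."""
    if len(path) < 2 or not (path.startswith('"') and path.endswith('"')):
        return path
    raw = path[1:-1].encode('utf-8')
    # Assumption: paths are UTF-8; undecodable bytes are replaced.
    return _ESCAPE.sub(_replace, raw).decode('utf-8', errors='replace')


def decoded_path(path: str) -> str:
    """Decode first, then strip an assumed "a/" or "b/" prefix."""
    name = unquote_git_path(path)
    return name[2:] if re.match(r'[ab]/', name) else name


with_quotes = decoded_path(r'"a/name with \"quotes\""')
non_ascii = decoded_path(r'"b/\305\274\303\263\305\202w"')
unquoted = decoded_path('a/file')
```

Decoding before stripping means the prefix check runs on the real name, and the octal escapes are reassembled into bytes before being decoded as text, so a multi-byte UTF-8 character survives intact.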
