windows_path __repr__ does not work if it contains surrogate characters #181

Merged
yunzheng merged 9 commits into fox-it:main from respondersGY:fix/surrogates_not_allowed
Aug 21, 2025

Conversation

@respondersGY
Contributor Author

@yunzheng and @JSCU-CNI are there ways to make this more robust? See fox-it/dissect.target#1276 for more information about the issue.

Member

@yunzheng yunzheng left a comment

Can you also add a unit test?

I think this should be good enough, as this only happens when printing to the terminal for viewing.

What it should print is a different matter, though. If one expects "\udce4" to be printed as "ä" (I'm not sure whether that is the case in your issue as well), then this needs to be fixed by using the correct encoding when storing it as a UTF-8 string. I'm not sure whether shellbag data always uses a fixed Windows encoding or depends on the language setting of the OS.
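
To make the encoding question concrete, here is a minimal sketch using only the standard library. It assumes the stored name contains the single byte 0xe4 (the byte reported in the error on this PR); the sample bytes and the choice of cp1252 are illustrative assumptions, not something confirmed about shellbag data.

```python
# Hypothetical raw name bytes: "März" written in a single-byte Windows code page.
raw = b"M\xe4rz"

# Decoding as UTF-8 with errors="surrogateescape" carries the undecodable
# byte through as the lone surrogate U+DCE4.
escaped = raw.decode("utf-8", errors="surrogateescape")
print(ascii(escaped))  # 'M\udce4rz'

# The round trip back to the original bytes is lossless.
assert escaped.encode("utf-8", errors="surrogateescape") == raw

# Decoding with the code page the data was written in (cp1252 is an
# assumption here) yields the character a user would expect to see.
decoded = raw.decode("cp1252")
assert decoded == "M\u00e4rz"  # "März"
```

Which of the two results is wanted depends on whether the original code page is known at parse time, which is the open question in the comment above.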

@DissectBot

@respondersGY thank you for your contribution! As this is your first code contribution, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information:

@DissectBot agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement ("Agreement") governs your Contribution(s) (as defined below) and conveys certain license rights to Fox-IT B.V. ("Fox-IT") for your Contribution(s) to Fox-IT's open source Dissect project. This Agreement covers any and all Contributions that you ("You" or "Your"), now or in the future, Submit (as defined below) to this project. This Agreement is between Fox-IT B.V. and You and takes effect when you click an “I Accept” button, check box presented with these terms, otherwise accept these terms or, if earlier, when You Submit a Contribution.

  1. Definitions.
    "Contribution" means any original work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to Fox-IT for inclusion in, or documentation of, any of the software products owned or managed by, or on behalf of, Fox-IT as part of the Project (the "Work").
    "Project" means any of the projects owned or managed by Fox-IT and offered under a license approved by the Open Source Initiative (www.opensource.org).
    "Submit" means any form of electronic, verbal, or written communication sent to Fox-IT or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Fox-IT for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."

  2. Grant of Copyright License. Subject to the terms and conditions of this Agreement, You hereby grant to Fox-IT and to recipients of software distributed by Fox-IT a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.

  3. Grant of Patent License. Subject to the terms and conditions of this Agreement, You hereby grant to Fox-IT and to recipients of software distributed by Fox-IT a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, maintain, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that your Contribution, or the Work to which you have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.

  4. Representations. You represent that:

    • You are legally entitled to grant the above license.
    • each of Your Contributions is Your original creation (see section 8 for submissions on behalf of others).
    • Your Contribution submissions include complete details of any third-party license or other restriction (including, but not limited to, related patents and trademarks) of which you are personally aware and which are associated with any part of Your Contributions.
  5. Employer. If Your Contribution is made in the course of Your work for an employer or Your employer has intellectual property rights in Your Submission by contract or applicable law, You must secure permission from Your employer to make the Contribution before signing this Agreement. In that case, the term "You" in this Agreement will refer to You and the employer collectively. If You change employers in the future and desire to Submit additional Contribution for the new employer, then You agree to sign a new Agreement and secure permission from the new employer before Submitting those Contributions.

  6. Support. You are not expected to provide support for Your Contribution, unless You choose to do so. Any such support provided to the Project is provided free of charge.

  7. Warranty. Unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.

  8. Third party material. Should You wish to submit work that is not Your original creation, You may only submit it to Fox-IT separately from any Contribution, identifying the complete details of its source and of any license or other restriction (including, but not limited to, related patents, trademarks, and license agreements) of which You are personally aware, and conspicuously marking the work as "Submitted on behalf of a third-party: [named here]".

  9. Notify. You agree to notify Fox-IT of any facts or circumstances of which You become aware that would make the above representations inaccurate in any respect.

  10. Governing law / competent court. This Agreement is governed by the laws of the Netherlands. Any disputes that may arise are resolved by arbitration in accordance with the Arbitration Regulations of the Foundation for the Settlement of Automation Disputes (Stichting Geschillenoplossing Automatisering – SGOA – (www.sgoa.eu)), this without prejudice to either party's right to request preliminary relief in preliminary relief proceedings or arbitral preliminary relief proceedings. Arbitration proceedings take place in Amsterdam, or in any other place designated in the Arbitration Regulations. Arbitration shall take place in English.

@respondersGY
Contributor Author

respondersGY commented Aug 19, 2025

@DissectBot agree [company="Responders B.V."]

@respondersGY
Contributor Author

I added a test that triggers `UnicodeEncodeError: [...] surrogates not allowed` with the old code, but the test itself does not work: it fails with `UnicodeDecodeError: [...] invalid continuation byte`. Is there a different way to test this?

`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 38: invalid continuation byte`
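
The two errors mentioned above come from opposite directions (decoding bytes versus encoding a str), and both can be reproduced with the standard library alone. The sketch below is illustrative only; the sample bytes are hypothetical and are not taken from the test in this PR.

```python
# Hypothetical bytes: 0xe4 followed by an ASCII byte is not valid UTF-8.
raw = b"M\xe4rz"

# Strict decoding fails: 0xe4 announces a three-byte sequence, but "r" is
# not a continuation byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_reason = exc.reason

# A lone surrogate in a str cannot be encoded strictly as UTF-8.
try:
    "M\udce4rz".encode("utf-8")
except UnicodeEncodeError as exc:
    encode_reason = exc.reason

print(decode_reason)  # invalid continuation byte
print(encode_reason)  # surrogates not allowed
```

A str literal that contains the escape `\udce4` carries the surrogate directly, without any decode step.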
@yunzheng yunzheng force-pushed the fix/surrogates_not_allowed branch from be7cef5 to b410276 August 20, 2025 13:30
@yunzheng
Member

This looks more like an issue with windows_path's __repr__:

def __repr__(self) -> str:

While the normal path does repr(str(self)), the windows path works on str(self), which I think causes the issue with the surrogate characters.
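
The difference between the two approaches can be sketched on a plain string, without flow.record. The path value is a hypothetical example, and the f-string only stands in for "quoting str(self) directly"; it is not the library's actual code.

```python
s = "C:\\Users\\M\udce4rz"  # hypothetical path containing a lone surrogate

# Quoting str(self) directly keeps the raw surrogate, so the result cannot
# be encoded as UTF-8 (for example when written to a terminal).
plain = f"'{s}'"
try:
    plain.encode("utf-8")
    plain_ok = True
except UnicodeEncodeError:
    plain_ok = False

# repr() escapes the surrogate (and doubles the backslashes), so the result
# is plain ASCII.
escaped = repr(s)

print(plain_ok)  # False
print(escaped)   # 'C:\\Users\\M\udce4rz'
```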

@yunzheng
Member

So it does indeed look like an issue with windows_path() not being able to handle surrogate characters in its repr. I modified its __repr__ to this:

class windows_path(pathlib.PureWindowsPath, path):
    def __repr__(self) -> str:
        s = str(self)
        # Only use repr() if we have surrogates that need escaping
        try:
            s.encode('utf-8')
        except UnicodeEncodeError:
            # Has surrogates - use repr but fix the over-escaping
            s = repr(s)[1:-1]  # This escapes surrogates as \udcXX
            s = s.replace('\\\\', '\\')  # Fix double backslashes
            s = s.replace("\\'", "'")   # Fix over-escaped quotes
            s = s.replace('\\"', '"')   # Fix over-escaped double quotes        
    
        quote = "'"
        if "'" in s:
            if '"' in s:
                s = s.replace("'", "\\'")
            else:
                quote = '"'
        return f"{quote}{s}{quote}"

With that change it seems to work (without the change in the RecordPrinter).
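
For experimenting outside of flow.record, the same logic can be lifted into a free function. This is a standalone adaptation of the snippet above, not necessarily the code that was merged; the name `format_windows_path` is hypothetical.

```python
def format_windows_path(s: str) -> str:
    """Quote a Windows path string, escaping surrogates only when present."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        # Has surrogates: let repr() escape them, then undo the extra escaping.
        s = repr(s)[1:-1]            # strip the quotes repr() added
        s = s.replace("\\\\", "\\")  # undo doubled backslashes
        s = s.replace("\\'", "'")    # undo escaped single quotes
        s = s.replace('\\"', '"')    # undo escaped double quotes

    # Pick a quote character the same way the snippet above does.
    quote = "'"
    if "'" in s:
        if '"' in s:
            s = s.replace("'", "\\'")
        else:
            quote = '"'
    return f"{quote}{s}{quote}"


# A path without surrogates is only quoted.
print(format_windows_path("C:\\Users\\test"))      # 'C:\Users\test'
# A lone surrogate is shown as a literal \udce4 escape, so the result is
# safe to encode.
print(format_windows_path("C:\\Users\\M\udce4rz"))  # 'C:\Users\M\udce4rz'
```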

@respondersGY
Contributor Author

Added in 9324f50 (#181).

I added \udce4 to the test because that triggers the UnicodeEncodeError with the previous implementation of __repr__.

@respondersGY respondersGY requested a review from yunzheng August 21, 2025 10:55
Member

@yunzheng yunzheng left a comment

Added a small remark; looks good otherwise.

@respondersGY respondersGY requested a review from yunzheng August 21, 2025 12:16
@yunzheng yunzheng changed the title from "Handle surrogates with errors=surrogateescape" to "windows_path __repr__ does not work if it contains surrogate characters" Aug 21, 2025
@codecov

codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (1dce701) to head (c2edb71).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   83.03%   83.07%   +0.03%     
==========================================
  Files          34       34              
  Lines        3602     3609       +7     
==========================================
+ Hits         2991     2998       +7     
  Misses        611      611              
Flag Coverage Δ
unittests 83.07% <100.00%> (+0.03%) ⬆️

@yunzheng yunzheng merged commit c3447bd into fox-it:main Aug 21, 2025
25 checks passed
@respondersGY respondersGY deleted the fix/surrogates_not_allowed branch August 21, 2025 13:12
@respondersGY
Contributor Author

Thank you for the help and the review!

Development

Successfully merging this pull request may close these issues.

Shellbag fails due to UnicodeEncodeError

3 participants