Skip to content

fix: Add support for symbolic links in file compression#349

Merged
deepin-bot[bot] merged 1 commit intolinuxdeepin:develop/snipefrom
dengzhongyuan365-dev:pzip-cpp
Jan 16, 2026
Merged

fix: Add support for symbolic links in file compression#349
deepin-bot[bot] merged 1 commit intolinuxdeepin:develop/snipefrom
dengzhongyuan365-dev:pzip-cpp

Conversation

@dengzhongyuan365-dev
Copy link
Contributor

@dengzhongyuan365-dev dengzhongyuan365-dev commented Jan 16, 2026

  • Introduced isSymlink and symlinkTarget attributes in FileTask to handle symbolic links.
  • Updated Archiver to compress symlink targets and adjust ZIP header properties accordingly.
  • Modified FileTask reset method to detect and read symlink targets, ensuring proper file size handling.

Log: Enhance file compression capabilities by supporting symbolic links

Summary by Sourcery

Add support for correctly archiving symbolic links by detecting them during file task setup and treating their targets as stored data in ZIP entries.

New Features:

  • Support compressing symbolic links by storing their targets in the ZIP archive.

Enhancements:

  • Adjust ZIP header population to mark symbolic links with appropriate method, sizes, and Unix file mode attributes.
  • Update relative path handling to avoid resolving symbolic links when computing archive entry names.

@sourcery-ai
Copy link

sourcery-ai bot commented Jan 16, 2026

Reviewer's Guide

Adds first-class support for symbolic links to the pzip archiver by tracking symlink metadata in FileTask, reading symlink targets during reset, and updating Archiver’s compression and ZIP header logic to store link targets correctly with appropriate sizes, CRC, and Unix attributes.

Sequence diagram for Archiver::compress with symlink handling

sequenceDiagram
    actor User
    participant Archiver
    participant FileTask
    participant FileSystem

    User->>Archiver: compress(task)
    Archiver->>FileTask: check isSymlink
    alt task isSymlink
        Archiver->>FileTask: read symlinkTarget
        Archiver->>FileTask: write(targetData, targetSize)
        Archiver->>FileTask: set header.crc32 for target
        Archiver-->>User: Error()
    else regular file
        Archiver->>FileSystem: open file(task.path)
        FileSystem-->>Archiver: file stream
        loop read and compress
            Archiver->>FileSystem: read chunk
            Archiver->>FileTask: write(compressedChunk)
        end
        Archiver-->>User: Error()
    end
Loading

Class diagram for updated FileTask and Archiver symlink support

classDiagram
    class FileTask {
        fs_path path
        fs_file_status status
        uintmax_t fileSize
        ZipFileHeader header
        bool isSymlink
        string symlinkTarget
        z_stream* compressor
        Error reset(fs_path filePath, fs_path relativeTo)
        Error write(uint8_t* buffer, size_t size)
    }

    class ZipFileHeader {
        string name
        uint32_t crc32
        uint32_t compressedSize
        uint32_t uncompressedSize
        uint16_t method
        uint16_t flags
        uint32_t externalAttr
    }

    class Archiver {
        Error compress(FileTask* task)
        void populateHeader(FileTask* task)
    }

    Archiver --> FileTask : uses
    FileTask o-- ZipFileHeader : has
Loading

File-Level Changes

Change Details Files
Track symbolic link state and target path in FileTask and initialize them on reset.
  • Add isSymlink and symlinkTarget members to FileTask to represent symlink metadata
  • Initialize isSymlink to false and clear symlinkTarget at the start of reset
  • Use fs::symlink_status instead of fs::status so symlinks are not followed when determining file status
  • When a path is a symlink, read the symlink target and set fileSize to the target string length; otherwise, fall back to file_size for regular files
3rdparty/pzip/src/file_task.cpp
3rdparty/pzip/include/pzip/file_task.h
Adjust relative path and header name computation to avoid resolving symlinks while still producing ZIP-style paths.
  • Replace fs::relative(...) with manual string-based relative path calculation for cases where a base directory is provided, to avoid symlink resolution
  • Ensure the computed relative name is normalized via utils::toZipPath and fall back to using the filename when the path does not share the expected prefix
3rdparty/pzip/src/file_task.cpp
Update Archiver to write symlink targets as stored entries and set appropriate ZIP header fields and Unix attributes.
  • Short-circuit Archiver::compress for symlinks by writing the symlink target string as the entry data and computing CRC over that string
  • In Archiver::populateHeader, add a dedicated symlink branch that forces STORE method, disables data descriptors, sets compressed/uncompressed sizes to the target string length, and encodes Unix symlink permissions in externalAttr
3rdparty/pzip/src/archiver.cpp

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The relative-path computation for symlinks in reset does manual std::string prefix checks with '/', which will break on Windows-style \ separators; consider normalizing separators or using fs::relative where possible to avoid platform-specific bugs.
  • Storing the symlink target as fs::read_symlink(path).string() and writing that verbatim into the ZIP may embed platform-specific path formats; consider normalizing to a consistent ZIP-style path (e.g., forward slashes) or storing it as fs::path and converting via utils::toZipPath when writing.
  • When setting externalAttr for symlinks you currently hardcode 0120777 permissions; if you care about preserving original permission bits, you may want to derive the mode from status.permissions() and OR it with S_IFLNK instead of using a fixed value.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The relative-path computation for symlinks in `reset` does manual `std::string` prefix checks with `'/'`, which will break on Windows-style `\` separators; consider normalizing separators or using `fs::relative` where possible to avoid platform-specific bugs.
- Storing the symlink target as `fs::read_symlink(path).string()` and writing that verbatim into the ZIP may embed platform-specific path formats; consider normalizing to a consistent ZIP-style path (e.g., forward slashes) or storing it as `fs::path` and converting via `utils::toZipPath` when writing.
- When setting `externalAttr` for symlinks you currently hardcode `0120777` permissions; if you care about preserving original permission bits, you may want to derive the mode from `status.permissions()` and OR it with `S_IFLNK` instead of using a fixed value.

## Individual Comments

### Comment 1
<location> `3rdparty/pzip/src/file_task.cpp:122-128` </location>
<code_context>
-        if (ec) {
-            // 如果无法计算相对路径,使用文件名
-            header.name = path.filename().string();
+        std::string pathStr = path.string();
+        std::string relToStr = relativeTo.string();
+        if (!relToStr.empty() && relToStr.back() != '/') {
+            relToStr += '/';
+        }
+        
+        if (pathStr.find(relToStr) == 0) {
+            // path 以 relativeTo 开头,直接截取
+            header.name = utils::toZipPath(pathStr.substr(relToStr.size()));
</code_context>

<issue_to_address>
**suggestion (bug_risk):** String-prefix check for relative path can misbehave on edge cases and is platform-sensitive.

`pathStr.find(relToStr) == 0` can match unintended prefixes (e.g., `/foo` vs `/foobar`) and breaks when separators differ (`/` vs `\` on Windows). To avoid `fs::relative` while still handling symlinks carefully, consider normalizing both paths (including separators and ensuring a path-boundary after `relToStr`) or using `fs::path` APIs up to the point of constructing the final relative string, so the prefix check is based on canonicalized components rather than raw strings.

Suggested implementation:

```cpp
    // 设置相对路径名
    // 注意:fs::relative() 会解析符号链接,这里使用路径组件比较避免解析符号链接
    if (!relativeTo.empty()) {
        // 使用 path 组件而不是字符串前缀,避免 /foo 与 /foobar 等前缀问题,
        // 同时不依赖平台分隔符
        auto relBegin = relativeTo.begin();
        auto relEnd   = relativeTo.end();
        auto pathBegin = path.begin();
        auto pathEnd   = path.end();

        // relativeTo 的组件数不能多于 path
        if (std::distance(relBegin, relEnd) <= std::distance(pathBegin, pathEnd) &&
            std::equal(relBegin, relEnd, pathBegin)) {
            // 构造相对路径:path 去掉 relativeTo 前缀后的部分
            fs::path relPath;
            auto it = pathBegin;
            std::advance(it, std::distance(relBegin, relEnd));
            for (; it != pathEnd; ++it) {
                relPath /= *it;
            }

            // 如果恰好相等,则仍然退回到使用文件名,以避免空名
            if (!relPath.empty()) {
                header.name = utils::toZipPath(relPath);
            } else {
                header.name = utils::toZipPath(path.filename());
            }
        } else {
            // 回退到使用文件名
            header.name = utils::toZipPath(path.filename());
        }
    } else {
        header.name = utils::toZipPath(path.filename());

```

If this file does not already include `<algorithm>` and `<iterator>`, you should add them at the top of `3rdparty/pzip/src/file_task.cpp` so that `std::equal` and `std::distance` are available:

- `#include <algorithm>`
- `#include <iterator>`
</issue_to_address>

### Comment 2
<location> `3rdparty/pzip/src/file_task.cpp:85` </location>
<code_context>

     bufferUsed_ = 0;
     written_ = 0;
+    isSymlink = false;
+    symlinkTarget.clear();

</code_context>

<issue_to_address>
**issue (complexity):** Consider replacing the custom relative-path logic with a helper using `lexically_relative` and extracting symlink/file-type handling into a dedicated helper to simplify `reset`.

The manual relative-path handling and the extra branching in `reset` do increase complexity; you can simplify without changing behavior.

### 1. Replace manual string-based relative path with `lexically_relative`

You can avoid mixing `std::filesystem::path` with ad‑hoc string manipulation and avoid `fs::relative()` (which resolves symlinks) by using `lexically_relative` and a small helper:

```cpp
// helper (e.g. in an anonymous namespace or a util header)
fs::path makeZipRelativePath(const fs::path& path, const fs::path& base) {
    fs::path rel = path.lexically_relative(base);
    // If not under base, fall back to filename (current behavior)
    if (rel.empty() || rel.native().rfind("..", 0) == 0 || rel.is_absolute()) {
        return path.filename();
    }
    return rel;
}
```

Usage in `reset`:

```cpp
// 设置相对路径名
if (!relativeTo.empty()) {
    fs::path relPath = makeZipRelativePath(path, relativeTo);
    header.name = utils::toZipPath(relPath);
} else {
    header.name = utils::toZipPath(path.filename());
}
```

This keeps symlinks un-resolved (purely lexical arithmetic), removes assumptions about `/`, and avoids repeated `string()` calls.

### 2. Extract symlink handling out of `reset`

`reset` is doing many things now (state reset, status detection, symlink handling, size determination, relative path). You can factor out the symlink/file-type logic into a small helper to reduce branching and make `reset` easier to scan:

```cpp
// helper
Error initFileTypeAndSize(const fs::path& path,
                          fs::file_status& status,
                          bool& isSymlink,
                          std::string& symlinkTarget,
                          uint64_t& fileSize) {
    std::error_code ec;
    status = fs::symlink_status(path, ec);
    if (ec) {
        return Error(ErrorCode::FILE_NOT_FOUND, "Cannot stat file: " + path.string());
    }

    isSymlink = fs::is_symlink(status);
    if (isSymlink) {
        symlinkTarget = fs::read_symlink(path, ec).string();
        if (ec) {
            return Error(ErrorCode::FILE_READ_ERROR, "Cannot read symlink target: " + path.string());
        }
        fileSize = symlinkTarget.size();
    } else if (fs::is_regular_file(status)) {
        fileSize = fs::file_size(path, ec);
        if (ec) {
            return Error(ErrorCode::FILE_READ_ERROR, "Cannot get file size: " + path.string());
        }
    } else {
        fileSize = 0;
    }
    return Error::success();
}
```

Then in `reset`:

```cpp
isSymlink = false;
symlinkTarget.clear();
path = filePath;

Error err = initFileTypeAndSize(path, status, isSymlink, symlinkTarget, fileSize);
if (!err.ok()) {
    return err;
}
```

This preserves all new functionality (symlink size and target handling) but reduces complexity and concentrates path and symlink logic in focused helpers.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

- Introduced isSymlink and symlinkTarget attributes in FileTask to handle symbolic links.
- Updated Archiver to compress symlink targets and adjust ZIP header properties accordingly.
- Modified FileTask reset method to detect and read symlink targets, ensuring proper file size handling.

Log: Enhance file compression capabilities by supporting symbolic links
@deepin-ci-robot
Copy link

deepin pr auto review

这段代码主要实现了在 ZIP 压缩工具中添加对符号链接(Symlink)的支持。整体逻辑是正确的,能够识别符号链接、读取其目标路径,并在 ZIP 文件中正确设置 Unix 文件属性以表示这是一个符号链接。

以下是从语法逻辑、代码质量、代码性能和代码安全四个方面的详细审查意见:

1. 语法逻辑

  • 符号链接属性设置
    archiver.cpppopulateHeader 中:
    h.externalAttr = static_cast<uint32_t>(S_IFLNK | 0777) << 16;
    逻辑是正确的。ZIP 格式规范中,externalAttr 的高 16 位用于存储 MS-DOS/Unix 文件属性。S_IFLNK 表示这是一个符号链接,0777 是权限掩码。左移 16 位将其放入正确位置。
  • 相对路径计算
    file_task.cppreset 中,为了避免 fs::relative 解析符号链接,代码使用了手动迭代的方式来计算相对路径。
    // 使用 path 迭代器跳过共同前缀
    auto pathIt = path.begin();
    auto relIt = relativeTo.begin();
    // ...
    潜在逻辑缺陷:这段代码假设 pathrelativeTo 的子路径。如果 pathrelativeTo 指向不同的驱动器(Windows)或者根本没有共同祖先,这段代码生成的路径可能不符合预期(例如,它不会生成 ../ 来向上回溯)。虽然 fs::relative 在某些情况下会跟随符号链接,但手动实现一个健壮的相对路径计算器比较复杂。
    建议:如果 fs::relative 解析符号链接的问题仅限于 path 本身是符号链接的情况,可以考虑先对 path 调用 fs::canonical 或类似处理(但这会改变路径指向),或者接受当前手动实现的局限性,前提是调用者能保证 path 始终在 relativeTo 之下。

2. 代码质量

  • 资源管理
    archiver.cpp 中处理符号链接时:
    if (task->isSymlink) {
        const auto& target = task->symlinkTarget;
        task->write(reinterpret_cast<const uint8_t*>(target.data()), target.size());
        // ...
        return Error();
    }
    这里直接返回了,跳过了后续的文件流操作,逻辑清晰,符合预期。
  • 代码可读性
    手动计算相对路径的代码块稍显冗长,且注释说明了原因,这是好的做法。
  • 类型转换
    使用了 reinterpret_cast,这在处理字节流时是必要的,但需要确保调用方(task->write)确实期望 uint8_t*

3. 代码性能

  • 路径计算
    手动迭代路径组件(begin()/end())通常比 fs::relative 更快,因为它避免了文件系统的访问(stat 等),这在处理大量文件时是一个性能优势。
  • 内存分配
    symlinkTarget = fs::read_symlink(path, ec).string(); 会产生一次字符串拷贝。考虑到符号链接通常很短,这可以忽略不计。

4. 代码安全

  • 符号链接攻击 (Symlink Attacks / Zip Slip)
    这是最需要关注的安全点。当解压这个 ZIP 文件时,如果解压工具不检查符号链接的目标路径,攻击者可以创建一个指向系统敏感文件(如 /etc/passwd)的符号链接,或者指向压缩包外的目录(Zip Slip 变种)。
    • 当前代码:作为压缩端,代码忠实地记录了符号链接的目标。这是正确的行为。
    • 建议:虽然这是压缩代码,但建议在项目文档或代码注释中明确指出:生成的 ZIP 包包含符号链接,解压时必须验证符号链接目标,防止目录遍历攻击。
  • TOCTOU (Time-of-check to time-of-use)
    file_task.cpp 中:
    status = fs::symlink_status(path, ec);
    // ...
    isSymlink = fs::is_symlink(status);
    if (isSymlink) {
        symlinkTarget = fs::read_symlink(path, ec);
    }
    获取状态和读取目标之间有一个时间窗口。如果文件在这个窗口内被删除或替换,read_symlink 可能会失败。代码通过检查 ec 处理了这种情况,返回了错误,这是安全的。
  • 缓冲区安全
    task->write(...) 接收指针和大小。由于 targetstd::stringtarget.data()target.size() 是安全的,不会越界。

改进建议总结

  1. 关于相对路径计算(重要)
    当前手动计算相对路径的逻辑在 path 不是 relativeTo 的严格子路径时可能失效(例如跨盘符或需要 .. 的情况)。
    改进方案:如果必须避免 fs::relative 的符号链接解析行为,建议完善手动计算逻辑,或者确认业务场景中 path 始终是子路径。
    如果只是为了防止 path 本身被解析,可以考虑对 relativeTopath 的父级目录使用 fs::relative,但这依然可能涉及解析。
    更简单的替代方案:如果 fs::relative 仅在 path 是符号链接时解析它,而你希望保留符号链接作为路径名,可以尝试使用 fs::weakly_canonical(仅解析存在的部分)或直接使用原始路径字符串操作(如果路径格式可控)。
    针对当前代码的修改建议:如果无法保证路径关系,建议保留 fs::relative 的回退逻辑,或者至少在手动计算失败时回退到绝对路径或文件名。

  2. 安全性注释
    populateHeader 设置符号链接属性的地方,建议添加注释,提醒解压方需要验证符号链接目标的安全性。

  3. 错误处理
    read_symlink 读取失败时,当前返回 FILE_READ_ERROR。这是合理的,但可以更具体一些,例如 ErrorCode::SYMLINK_READ_ERROR(如果有定义的话),以便调试。

  4. 代码风格
    h.externalAttr = static_cast<uint32_t>(S_IFLNK | 0777) << 16;
    建议使用位或运算符的宏或常量来增强可读性,例如:

    // 假设有这些定义
    const uint32_t UNIX_FILE_TYPE_MASK = 0xF0000000;
    const uint32_t UNIX_SYMLINK_TYPE = 0xA0000000; // S_IFLNK << 16
    const uint32_t UNIX_PERMISSION_MASK = 0x0000FFFF;
    const uint32_t DEFAULT_SYMLINK_PERMS = 0777;
    
    h.externalAttr = UNIX_SYMLINK_TYPE | (DEFAULT_SYMLINK_PERMS & UNIX_PERMISSION_MASK);

    这样更清晰地表达了意图。

修改后的代码片段建议 (针对相对路径计算的健壮性)

由于手动计算相对路径比较复杂且容易出错(特别是处理 .. 和不同根目录的情况),如果 fs::relative 的主要问题是它解析了 path(如果它是符号链接),我们可以尝试一种折中方案:只对 relativeTo 进行规范化处理,然后利用字符串特性。

但考虑到 C++17 fs::relative 的行为是标准化的,如果是为了防止解析符号链接,当前的手动迭代是常见做法。为了修复潜在的逻辑缺陷(如跨盘或非子目录),建议在手动计算逻辑中增加对根目录的检查:

// 在 file_task.cpp 中
if (!relativeTo.empty()) {
    // 检查是否有共同的根目录(例如 Windows 下的 C: 和 D: 没有共同根)
    if (path.root_path() != relativeTo.root_path()) {
        // 如果根目录不同,无法生成相对路径,使用绝对路径或文件名
        // 这里选择使用文件名作为回退,或者根据需求使用完整路径
        header.name = utils::toZipPath(path.filename());
    } else {
        // 原有的手动迭代逻辑
        auto pathIt = path.begin();
        auto relIt = relativeTo.begin();
        
        // 跳过共同前缀
        while (pathIt != path.end() && relIt != relativeTo.end() && *pathIt == *relIt) {
            ++pathIt;
            ++relIt;
        }
        
        // 处理 relativeTo 剩余的部分(需要添加 ../)
        // 注意:原代码缺少这部分,如果 relativeTo 比 path 深,生成的路径就是错的
        fs::path relPath;
        for (; relIt != relativeTo.end(); ++relIt) {
            relPath /= "..";
        }
        
        // 添加 path 剩余的部分
        for (; pathIt != path.end(); ++pathIt) {
            relPath /= *pathIt;
        }
        
        if (!relPath.empty()) {
            header.name = utils::toZipPath(relPath);
        } else {
            // 理论上如果完全相同,这里应该是 ".",但 ZIP 通常不需要 "."
            // 或者指向文件本身
            header.name = utils::toZipPath(path.filename());
        }
    }
}

注意:上述修改增加了对 .. 的处理,使得相对路径计算更加健壮。但请确保 utils::toZipPath 能够正确处理包含 .. 的路径,或者在解压时处理好路径规范化,防止 Zip Slip。

总结

代码整体实现了功能,但在相对路径计算的健壮性上存在改进空间,特别是在处理非子目录路径时。同时,作为生成包含符号链接 ZIP 的工具,应明确安全责任边界。

@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dengzhongyuan365-dev, lzwind

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dengzhongyuan365-dev
Copy link
Contributor Author

/forecemerge

@dengzhongyuan365-dev
Copy link
Contributor Author

/forcemerge

@deepin-bot deepin-bot bot merged commit 275f4c1 into linuxdeepin:develop/snipe Jan 16, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants