Skip to content

ByteArrayRawHasher改进建议 #32

@ZikangYuan

Description

@ZikangYuan

目前的开源版本代码没有实现移动构造函数和移动赋值运算符。std::unordered_map 在内部(例如 rehash 时)会大量移动元素。没有移动语义,它会回退到(更慢的)拷贝操作,导致不必要的内存分配和复制。在最坏的情况下,不正确的隐式移动可能导致双重释放(double-free)错误。

建议修改成下述表达以避免隐患:

struct ByteArrayRaw {
    std::vector<uint8_t> data;
    static size_t size; 

    ByteArrayRaw() : data(size) {}

    bool operator==(const ByteArrayRaw &other) const {
        return data == other.data;
    }
};

struct ByteArrayRawHasher {
  size_t operator()(const ByteArrayRaw &arr) const {
    const size_t dataSize = arr.data.size();
    const uint8_t* raw_data = arr.data.data();

    const uint64_t seed = 0xc3a5c85c97cb3127ULL;
    const uint64_t m = 0xc6a4a7935bd1e995ULL;
    const int r = 47;

    size_t h = seed ^ (dataSize * m);

    const uint64_t *data = reinterpret_cast<const uint64_t *>(raw_data);
    const uint64_t *end = data + (dataSize / 8);

    while (data != end) {
        uint64_t k = *data++;
        k *= m; k ^= k >> r; k *= m;
        h ^= k; h *= m;
    }

    const unsigned char *tail = reinterpret_cast<const unsigned char *>(data);

    switch (dataSize & 7) {
    case 7: h ^= uint64_t(tail[6]) << 48;
    case 6: h ^= uint64_t(tail[5]) << 40;
    case 5: h ^= uint64_t(tail[4]) << 32;
    case 4: h ^= uint64_t(tail[3]) << 24;
    case 3: h ^= uint64_t(tail[2]) << 16;
    case 2: h ^= uint64_t(tail[1]) << 8;
    case 1: h ^= uint64_t(tail[0]); h *= m;
    }

    h ^= h >> r; h *= m; h ^= h >> r;
    return h;
  }
};

因为std::vector实现了移动语义

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions