diff --git a/README.md b/README.md index 84da0b2..648b4cd 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,8 @@ # he4 + Fast fixed-memory hash table. -``` +```text |_| _ |_| | |(/_ | He4 @@ -11,49 +12,48 @@ He4 [![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/453/badge)](https://bestpractices.coreinfrastructure.org/projects/453) -|Branch |Status | -|------------|---------| -|master | [![Build Status](https://travis-ci.org/sprowell/he4.svg?branch=master)](https://travis-ci.org/sprowell/he4?branch=master) | -|dev | [![Build Status](https://travis-ci.org/sprowell/he4.svg?branch=dev)](https://travis-ci.org/sprowell/he4) [![Coverage Status](https://coveralls.io/repos/sprowell/he4/badge.svg?branch=dev&service=github)](https://coveralls.io/github/sprowell/he4?branch=dev)| +| Branch | Status | +| ------ | ------ | +| master | [![Build Status](https://travis-ci.org/sprowell/he4.svg?branch=master)](https://travis-ci.org/sprowell/he4?branch=master) | +| dev | [![Build Status](https://travis-ci.org/sprowell/he4.svg?branch=dev)](https://travis-ci.org/sprowell/he4) [![Coverage Status](https://coveralls.io/repos/sprowell/he4/badge.svg?branch=dev&service=github)](https://coveralls.io/github/sprowell/he4?branch=dev) | **Branch Policy:** - - The "master" branch is intended to be stable at all times. - - The "dev" branch is the one where all contributions must be merged before - being promoted to master. - + If you plan to propose a patch, please commit into the "dev" branch or a - separate feature branch. Direct commit to "master" will not be accepted. +- The `master` branch is intended to be stable at all times. +- The `dev` branch is the one where all contributions must be merged before + being promoted to master. + - If you plan to propose a patch, please commit into the `dev` branch or a + separate feature branch. Direct commit to `master` will not be accepted. 
## Overview The idea of this library is to create a fast in-memory hash table implementation in C99. The following are the design criteria. - * This is a fixed-size hash table; new memory is never allocated by this - library once the initial table is created, but you *can* force a rehash. - * Because the table is fixed-size, it can fill up. If it fills up, then - there are two possible outcomes: new entries are dropped, or older - entries are dropped. - * A default hash algorithm is supplied; you can use your own. The default - is the [xxHash algorithm][xxhash]. - -Why? Well, embedded systems with fixed memory may need hash tables, too. - -This library uses [semantic versioning][semver]. The version number is set +- This is a fixed-size hash table; new memory is never allocated by this + library once the initial table is created, but you _can_ force a rehash. +- Because the table is fixed-size, it can fill up. If it fills up, then + there are two possible outcomes: new entries are dropped, or older + entries are dropped. +- A default hash algorithm is supplied; you can use your own. The default + is the [xxHash algorithm][xxhash]. + +Why? Well, embedded systems with fixed memory may need hash tables, too. + +This library uses [semantic versioning][semver]. The version number is set in the `CMakeLists.txt` file and can be obtained at runtime with the `he4_version` function. You may **contribute** to this project by submitting issues or feature requests, forking the project and submitting pull requests, and by contributing to the -meagre documentation. All help is welcome! +meagre documentation. All help is welcome! Please report vulnerabilities on the [issue tracker](https://github.com/sprowell/he4/issues) with highest priority. -No specific style guide is adopted for this project. The closest is +No specific style guide is adopted for this project. The closest is [this guide][c-style]. 
- ## Building This project is built with [CMake][cmake] and the API documentation is built @@ -81,14 +81,14 @@ make examples ``` The `count` example counts the occurrences of lines in a file and write the -result. It provides an example of a table that maps `char *` keys to concrete -values (that is, values that are not pointers). An example input file +result. It provides an example of a table that maps `char *` keys to concrete +values (that is, values that are not pointers). An example input file `wordlist.txt` is provided. ## CMake Generators To instead create an Eclipse project, a Ninja file, or something else, use -a generator. The command `cmake -h` will list the generators supported by your +a generator. The command `cmake -h` will list the generators supported by your version of CMake. ## Windows @@ -99,7 +99,7 @@ You can get a Windows standalone SDK (for Windows 10 you can find it [here](https://dev.windows.com/en-US/downloads/windows-10-sdk)). Be sure to open Visual Studio and create at least one minimal C project so -that any "on demand" components are installed. Then do the following from +that any "on demand" components are installed. Then do the following from the command prompt. ```bash @@ -117,26 +117,26 @@ Studio. This table implementation uses open addressing with linear probing (because linear probing is simply faster to compute, even though it can lead to -clustering). As such, you should allocate enough space that your fixed-size +clustering). As such, you should allocate enough space that your fixed-size table will never have a load factor above 0.7. This implementation uses lazy deletion, which makes deletion fast, but can -increase search time. To improve on this, any successful search that has +increase search time. To improve on this, any successful search that has passed a lazily deleted entry will relocate the found entry to the first open -slot. Thus found items "move to the front of the line" to improve search time. +slot. 
Thus found items "move to the front of the line" to improve search time. ## Performance -**Be aware!** If the table becomes full your performance is going to be -*horrible*. Again, a full table will have *horrible* performance, since -it cannot automagically rehash. Paradoxically this is worse as the table +**Be aware!** If the table becomes full, your performance is going to be +_horrible_. Again, a full table will have _horrible_ performance, since +it cannot automagically rehash. Paradoxically this is worse as the table becomes larger, because every search may have to go through the entire table to find an entry. ## Basic Usage To use the library `#include <he4.h>` and then make use of the functions -found therein. If you are happy with your platform's `malloc` and are not +found therein. If you are happy with your platform's `malloc` and are not constrained by library restrictions, then there is little else you need to think about (but do read the following two sections on handling full tables). @@ -160,16 +160,16 @@ he4_delete(table); Insert items with `he4_insert` and remove them with `he4_remove` (to get the value back) or `he4_discard` (to let the library free it). -Find an entry in the table with `he4_get`. Alternately use `he4_find`, which +Find an entry in the table with `he4_get`. Alternately use `he4_find`, which returns a pointer to the item found in the table, as opposed to the item -itself. This allows modification of the item in place in the table, and can -improve performance significantly by avoiding multiple searches. See the +itself. This allows modification of the item in place in the table, and can +improve performance significantly by avoiding multiple searches. See the example `basics.c`. ## Rehashing -Of course you can *force* a rehash. Code like the following is recommended, -if you have memory to spare. Be aware that the rehash function returns `NULL` +Of course you can _force_ a rehash. 
Code like the following is recommended, +if you have memory to spare. Be aware that the rehash function returns `NULL` to signal that it cannot get memory for the new table. ```c @@ -182,27 +182,27 @@ if (he4_load(table) > 0.7) { The second argument (`0`) uses the default new capacity, which is double the old capacity. -The original table is freed, but *only* if the new table is successfully -constructed. See the next section for how you might use this strategy. +The original table is freed, but _only_ if the new table is successfully +constructed. See the next section for how you might use this strategy. ## Least-Recently-Used -By default the library adds a field to each entry called the *touch index*. -The table has a stored maximum value. Whenever an entry is found or inserted, +By default the library adds a field to each entry called the _touch index_. +The table has a stored maximum value. Whenever an entry is found or inserted, its touch index is set to the table maximum and the maximum is then incremented. The item with the lowest touch index at any time is the least-recently-used item. -You can use this to clean up the table. Set a threshold and trim the table by -discarding everything with touch index below the threshold. See -`he4_trim`. The new table's touch indices are debited by the threshold (so +You can use this to clean up the table. Set a threshold and trim the table by +discarding everything with touch index below the threshold. See +`he4_trim`. The new table's touch indices are debited by the threshold (so least-recently-used information is preserved). -You can also use this when rehashing. Set a threshold and rehash to the same -size table. The items with touch index below the threshold will be removed. +You can also use this when rehashing. Set a threshold and rehash to the same +size table. The items with touch index below the threshold will be removed. See `he4_trim_and_rehash`. 
-Maintaining the touch index comes with a slight cost in time and space. If you +Maintaining the touch index comes with a slight cost in time and space. If you do not want this, then uncomment the appropriate line in `CMakeLists.txt`. ```c @@ -215,17 +215,17 @@ while (he4_load(table) > 0.5) { ## Include Files and Dependencies -The library includes [Doug Lea's][dlmalloc] `malloc` implementation. If you +The library includes [Doug Lea's][dlmalloc] `malloc` implementation. If you want to use it, then uncomment the correct line in `CMakeLists.txt`. Alternately you can just `#define` the macros `HE4MALLOC` and `HE4FREE` as -indicated in `he4.h`. In all these cases, `stdlib.h` is no longer included and +indicated in `he4.h`. In all these cases, `stdlib.h` is no longer included and no longer required. -The debugging facility uses `stdio.h`. You can `#define NODEBUG` to eliminate +The debugging facility uses `stdio.h`. You can `#define NODEBUG` to eliminate this dependency. Both `stdint.h` and `stdbool.h` are required, but are also easily replaced. -Building requires `string.h`, and the string library is needed to run. It +Building requires `string.h`, and the string library is needed to run. It should not be hard to replace it if you need to. If you want debugging but can't tolerate `stdio.h`, have a look at @@ -233,34 +233,45 @@ If you want debugging but can't tolerate `stdio.h`, have a look at ## nostdlib -Working on this. For now you're stuck. Right now you can uncomment +Working on this. For now you're stuck. Right now you can uncomment `set(NO_STD_LIB 1)` in the `CMakeLists.txt` and compile, but you will have -a few undefined symbols. On my Mac I get a few undefined symbols. - - * `___error` - * `___memcpy_chk` - * `_abort` - * `_memcmp` - * `_memcpy` - * `_memset` - * `_mmap` - * `_munmap` - * `_sysconf` +a few undefined symbols. On my Mac I get a few undefined symbols. 
+ +- `___error` +- `___memcpy_chk` +- `_abort` +- `_memcmp` +- `_memcpy` +- `_memset` +- `_mmap` +- `_munmap` +- `_sysconf` + +## Memory Testing + +This library is tested for memory leaks using `valgrind`. For example, you can try the following. + +```bash +valgrind --leak-check=full -s ./table-test +valgrind --leak-check=full -s ./insert-test +valgrind --leak-check=full -s ./insert-2-test +``` + +These should all complete with **no errors** reported. ## License -This is licensed under the two-clause "simplified" BSD license. See the +This is licensed under the two-clause "simplified" BSD license. See the [LICENSE](LICENSE) file in the distribution. -[xxHash][xxhash] is BSD licensed. See the [LICENSE-xxHash](LICENSE-xxHash) -file in the distribution. The **branch policy** was taken from xxHash. +[xxHash][xxhash] is BSD licensed. See the [LICENSE-xxHash](LICENSE-xxHash) +file in the distribution. The **branch policy** was taken from xxHash. [Doug Lea's malloc][dlmalloc] is in the public domain. The Helium creep image used in the generated documentation is from -[Wikipedia][superfluid]. They determine our reality, and the image is -licensed under the [Creative Commons Attribution Share Alike 3.0 License][cc-by-sa-3]. - +[Wikipedia][superfluid]. The image is +licensed under the [Creative Commons Attribution Share Alike 3.0 License][cc-by-sa-3]. 
[xxhash]: https://github.com/Cyan4973/xxHash [dlmalloc]: https://github.com/sailfish009/malloc @@ -272,4 +283,3 @@ licensed under the [Creative Commons Attribution Share Alike 3.0 License][cc-by- [superfluid]: https://en.wikipedia.org/wiki/Superfluidity#/media/File:Helium-II-creep.svg [cc-by-sa-3]: http://creativecommons.org/licenses/by-sa/3.0/ [c-style]: https://users.ece.cmu.edu/~eno/coding/CCodingStandard.html - diff --git a/src/xxhash.h b/src/xxhash.h index c2344b8..1ad2ddb 100644 --- a/src/xxhash.h +++ b/src/xxhash.h @@ -241,7 +241,7 @@ * xxHash prototypes and implementation */ -#if defined (__cplusplus) +#if defined(__cplusplus) && !defined(XXH_NO_EXTERNC_GUARD) extern "C" { #endif @@ -1110,6 +1110,24 @@ XXH_PUBLIC_API XXH_PUREF XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const * * The API supports one-shot hashing, streaming mode, and custom secrets. */ + +/*! + * @ingroup tuning + * @brief Possible values for @ref XXH_VECTOR. + * + * Unless set explicitly, determined automatically. + */ +# define XXH_SCALAR 0 /*!< Portable scalar version */ +# define XXH_SSE2 1 /*!< SSE2 for Pentium 4, Opteron, all x86_64. 
*/ +# define XXH_AVX2 2 /*!< AVX2 for Haswell and Bulldozer */ +# define XXH_AVX512 3 /*!< AVX512 for Skylake and Icelake */ +# define XXH_NEON 4 /*!< NEON for most ARMv7-A, all AArch64, and WASM SIMD128 */ +# define XXH_VSX 5 /*!< VSX and ZVector for POWER8/z13 (64-bit) */ +# define XXH_SVE 6 /*!< SVE for some ARMv8-A and ARMv9-A */ +# define XXH_LSX 7 /*!< LSX (128-bit SIMD) for LoongArch64 */ +# define XXH_LASX 8 /*!< LASX (256-bit SIMD) for LoongArch64 */ + + /*-********************************************************************** * XXH3 64-bit variant ************************************************************************/ @@ -1345,7 +1363,7 @@ XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH_NOESCAPE XXH3_state_t* stat * * @see @ref streaming_example "Streaming Example" */ -XXH_PUBLIC_API XXH_PUREF XXH64_hash_t XXH3_64bits_digest (XXH_NOESCAPE const XXH3_state_t* statePtr); +XXH_PUBLIC_API XXH_PUREF XXH64_hash_t XXH3_64bits_digest (XXH_NOESCAPE const XXH3_state_t* statePtr); #endif /* !XXH_NO_STREAM */ /* note : canonical representation of XXH3 is the same as XXH64 @@ -1655,9 +1673,9 @@ XXH_PUBLIC_API XXH_PUREF XXH128_hash_t XXH128_hashFromCanonical(XXH_NOESCAPE con struct XXH32_state_s { XXH32_hash_t total_len_32; /*!< Total length hashed, modulo 2^32 */ XXH32_hash_t large_len; /*!< Whether the hash is >= 16 (handles @ref total_len_32 overflow) */ - XXH32_hash_t v[4]; /*!< Accumulator lanes */ - XXH32_hash_t mem32[4]; /*!< Internal buffer for partial reads. Treated as unsigned char[16]. */ - XXH32_hash_t memsize; /*!< Amount of data in @ref mem32 */ + XXH32_hash_t acc[4]; /*!< Accumulator lanes */ + unsigned char buffer[16]; /*!< Internal buffer for partial reads. */ + XXH32_hash_t bufferedSize; /*!< Amount of data in @ref buffer */ XXH32_hash_t reserved; /*!< Reserved field. Do not read nor write to it. 
*/ }; /* typedef'd to XXH32_state_t */ @@ -1678,9 +1696,9 @@ struct XXH32_state_s { */ struct XXH64_state_s { XXH64_hash_t total_len; /*!< Total length hashed. This is always 64-bit. */ - XXH64_hash_t v[4]; /*!< Accumulator lanes */ - XXH64_hash_t mem64[4]; /*!< Internal buffer for partial reads. Treated as unsigned char[32]. */ - XXH32_hash_t memsize; /*!< Amount of data in @ref mem64 */ + XXH64_hash_t acc[4]; /*!< Accumulator lanes */ + unsigned char buffer[32]; /*!< Internal buffer for partial reads.. */ + XXH32_hash_t bufferedSize; /*!< Amount of data in @ref buffer */ XXH32_hash_t reserved32; /*!< Reserved field, needed for padding anyways*/ XXH64_hash_t reserved64; /*!< Reserved field. Do not read or write to it. */ }; /* typedef'd to XXH64_state_t */ @@ -1710,6 +1728,7 @@ struct XXH64_state_s { #endif /*! + * @internal * @brief The size of the internal XXH3 buffer. * * This is the optimal update size for incremental hashing. @@ -1719,10 +1738,11 @@ struct XXH64_state_s { #define XXH3_INTERNALBUFFER_SIZE 256 /*! - * @internal - * @brief Default size of the secret buffer (and @ref XXH3_kSecret). + * @def XXH3_SECRET_DEFAULT_SIZE + * @brief Default Secret's size * - * This is the size used in @ref XXH3_kSecret and the seeded functions. + * This is the size of internal XXH3_kSecret + * and is needed by XXH3_generateSecret_fromSeed(). * * Not to be confused with @ref XXH3_SECRET_SIZE_MIN. */ @@ -1752,7 +1772,7 @@ struct XXH64_state_s { */ struct XXH3_state_s { XXH_ALIGN_MEMBER(64, XXH64_hash_t acc[8]); - /*!< The 8 accumulators. See @ref XXH32_state_s::v and @ref XXH64_state_s::v */ + /*!< The 8 accumulators. See @ref XXH32_state_s::acc and @ref XXH64_state_s::acc */ XXH_ALIGN_MEMBER(64, unsigned char customSecret[XXH3_SECRET_DEFAULT_SIZE]); /*!< Used to store a custom secret generated from a seed. 
*/ XXH_ALIGN_MEMBER(64, unsigned char buffer[XXH3_INTERNALBUFFER_SIZE]); @@ -1969,7 +1989,7 @@ XXH3_64bits_withSecretandSeed(XXH_NOESCAPE const void* data, size_t len, /*! * @brief Calculates 128-bit seeded variant of XXH3 hash of @p data. * - * @param data The memory segment to be hashed, at least @p len bytes in size. + * @param input The memory segment to be hashed, at least @p length bytes in size. * @param length The length of @p data, in bytes. * @param secret The secret used to alter hash result predictably. * @param secretSize The length of @p secret, in bytes (must be >= XXH3_SECRET_SIZE_MIN) @@ -2374,16 +2394,35 @@ static void XXH_free(void* p) { free(p); } #endif /* XXH_NO_STDLIB */ -#include <string.h> +#ifndef XXH_memcpy +/*! + * @internal + * @brief XXH_memcpy() macro can be redirected at compile time + */ +# include <string.h> +# define XXH_memcpy memcpy +#endif +#ifndef XXH_memset /*! * @internal - * @brief Modify this function to use a different routine than memcpy(). + * @brief XXH_memset() macro can be redirected at compile time */ -static void* XXH_memcpy(void* dest, const void* src, size_t size) -{ - return memcpy(dest,src,size); -} +# include <string.h> +# define XXH_memset memset +#endif + +#ifndef XXH_memcmp +/*! + * @internal + * @brief XXH_memcmp() macro can be redirected at compile time + * Note: only needed by XXH128. + */ +# include <string.h> +# define XXH_memcmp memcmp +#endif + + #include <limits.h> /* ULLONG_MAX */ @@ -2418,12 +2457,34 @@ static void* XXH_memcpy(void* dest, const void* src, size_t size) # define XXH_NO_INLINE static #endif +#if defined(XXH_INLINE_ALL) +# define XXH_STATIC XXH_FORCE_INLINE +#else +# define XXH_STATIC static +#endif + #if XXH3_INLINE_SECRET # define XXH3_WITH_SECRET_INLINE XXH_FORCE_INLINE #else # define XXH3_WITH_SECRET_INLINE XXH_NO_INLINE #endif +#if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. 
Tested with GCC 5.5 */ +# define XXH_RESTRICT /* disable */ +#elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* >= C99 */ +# define XXH_RESTRICT restrict +#elif (defined (__GNUC__) && ((__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))) \ + || (defined (__clang__)) \ + || (defined (_MSC_VER) && (_MSC_VER >= 1400)) \ + || (defined (__INTEL_COMPILER) && (__INTEL_COMPILER >= 1300)) +/* + * There are a LOT more compilers that recognize __restrict but this + * covers the major ones. + */ +# define XXH_RESTRICT __restrict +#else +# define XXH_RESTRICT /* disable */ +#endif /* ************************************* * Debug @@ -2955,6 +3016,61 @@ static xxh_u32 XXH32_avalanche(xxh_u32 hash) #define XXH_get32bits(p) XXH_readLE32_align(p, align) +/*! + * @internal + * @brief Sets up the initial accumulator state for XXH32(). + */ +XXH_FORCE_INLINE void +XXH32_initAccs(xxh_u32 *acc, xxh_u32 seed) +{ + XXH_ASSERT(acc != NULL); + acc[0] = seed + XXH_PRIME32_1 + XXH_PRIME32_2; + acc[1] = seed + XXH_PRIME32_2; + acc[2] = seed + 0; + acc[3] = seed - XXH_PRIME32_1; +} + +/*! + * @internal + * @brief Consumes a block of data for XXH32(). + * + * @return the end input pointer. + */ +XXH_FORCE_INLINE const xxh_u8 * +XXH32_consumeLong( + xxh_u32 *XXH_RESTRICT acc, + xxh_u8 const *XXH_RESTRICT input, + size_t len, + XXH_alignment align +) +{ + const xxh_u8* const bEnd = input + len; + const xxh_u8* const limit = bEnd - 15; + XXH_ASSERT(acc != NULL); + XXH_ASSERT(input != NULL); + XXH_ASSERT(len >= 16); + do { + acc[0] = XXH32_round(acc[0], XXH_get32bits(input)); input += 4; + acc[1] = XXH32_round(acc[1], XXH_get32bits(input)); input += 4; + acc[2] = XXH32_round(acc[2], XXH_get32bits(input)); input += 4; + acc[3] = XXH32_round(acc[3], XXH_get32bits(input)); input += 4; + } while (input < limit); + + return input; +} + +/*! 
+ * @internal + * @brief Merges the accumulator lanes together for XXH32() + */ +XXH_FORCE_INLINE XXH_PUREF xxh_u32 +XXH32_mergeAccs(const xxh_u32 *acc) +{ + XXH_ASSERT(acc != NULL); + return XXH_rotl32(acc[0], 1) + XXH_rotl32(acc[1], 7) + + XXH_rotl32(acc[2], 12) + XXH_rotl32(acc[3], 18); +} + /*! * @internal * @brief Processes the last 0-15 bytes of @p ptr. @@ -3067,22 +3183,12 @@ XXH32_endian_align(const xxh_u8* input, size_t len, xxh_u32 seed, XXH_alignment if (input==NULL) XXH_ASSERT(len == 0); if (len>=16) { - const xxh_u8* const bEnd = input + len; - const xxh_u8* const limit = bEnd - 15; - xxh_u32 v1 = seed + XXH_PRIME32_1 + XXH_PRIME32_2; - xxh_u32 v2 = seed + XXH_PRIME32_2; - xxh_u32 v3 = seed + 0; - xxh_u32 v4 = seed - XXH_PRIME32_1; + xxh_u32 acc[4]; + XXH32_initAccs(acc, seed); - do { - v1 = XXH32_round(v1, XXH_get32bits(input)); input += 4; - v2 = XXH32_round(v2, XXH_get32bits(input)); input += 4; - v3 = XXH32_round(v3, XXH_get32bits(input)); input += 4; - v4 = XXH32_round(v4, XXH_get32bits(input)); input += 4; - } while (input < limit); - - h32 = XXH_rotl32(v1, 1) + XXH_rotl32(v2, 7) - + XXH_rotl32(v3, 12) + XXH_rotl32(v4, 18); + input = XXH32_consumeLong(acc, input, len, align); + + h32 = XXH32_mergeAccs(acc); } else { h32 = seed + XXH_PRIME32_5; } @@ -3137,11 +3243,8 @@ XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dstState, const XXH32_state_t XXH_PUBLIC_API XXH_errorcode XXH32_reset(XXH32_state_t* statePtr, XXH32_hash_t seed) { XXH_ASSERT(statePtr != NULL); - memset(statePtr, 0, sizeof(*statePtr)); - statePtr->v[0] = seed + XXH_PRIME32_1 + XXH_PRIME32_2; - statePtr->v[1] = seed + XXH_PRIME32_2; - statePtr->v[2] = seed + 0; - statePtr->v[3] = seed - XXH_PRIME32_1; + XXH_memset(statePtr, 0, sizeof(*statePtr)); + XXH32_initAccs(statePtr->acc, seed); return XXH_OK; } @@ -3155,45 +3258,37 @@ XXH32_update(XXH32_state_t* state, const void* input, size_t len) return XXH_OK; } - { const xxh_u8* p = (const xxh_u8*)input; - const xxh_u8* const bEnd = p 
+ len; + state->total_len_32 += (XXH32_hash_t)len; + state->large_len |= (XXH32_hash_t)((len>=16) | (state->total_len_32>=16)); - state->total_len_32 += (XXH32_hash_t)len; - state->large_len |= (XXH32_hash_t)((len>=16) | (state->total_len_32>=16)); + XXH_ASSERT(state->bufferedSize < sizeof(state->buffer)); + if (len < sizeof(state->buffer) - state->bufferedSize) { /* fill in tmp buffer */ + XXH_memcpy(state->buffer + state->bufferedSize, input, len); + state->bufferedSize += (XXH32_hash_t)len; + return XXH_OK; + } - if (state->memsize + len < 16) { /* fill in tmp buffer */ - XXH_memcpy((xxh_u8*)(state->mem32) + state->memsize, input, len); - state->memsize += (XXH32_hash_t)len; - return XXH_OK; - } + { const xxh_u8* xinput = (const xxh_u8*)input; + const xxh_u8* const bEnd = xinput + len; - if (state->memsize) { /* some data left from previous update */ - XXH_memcpy((xxh_u8*)(state->mem32) + state->memsize, input, 16-state->memsize); - { const xxh_u32* p32 = state->mem32; - state->v[0] = XXH32_round(state->v[0], XXH_readLE32(p32)); p32++; - state->v[1] = XXH32_round(state->v[1], XXH_readLE32(p32)); p32++; - state->v[2] = XXH32_round(state->v[2], XXH_readLE32(p32)); p32++; - state->v[3] = XXH32_round(state->v[3], XXH_readLE32(p32)); - } - p += 16-state->memsize; - state->memsize = 0; + if (state->bufferedSize) { /* non-empty buffer: complete first */ + XXH_memcpy(state->buffer + state->bufferedSize, xinput, sizeof(state->buffer) - state->bufferedSize); + xinput += sizeof(state->buffer) - state->bufferedSize; + /* then process one round */ + (void)XXH32_consumeLong(state->acc, state->buffer, sizeof(state->buffer), XXH_aligned); + state->bufferedSize = 0; } - if (p <= bEnd-16) { - const xxh_u8* const limit = bEnd - 16; - - do { - state->v[0] = XXH32_round(state->v[0], XXH_readLE32(p)); p+=4; - state->v[1] = XXH32_round(state->v[1], XXH_readLE32(p)); p+=4; - state->v[2] = XXH32_round(state->v[2], XXH_readLE32(p)); p+=4; - state->v[3] = XXH32_round(state->v[3], 
XXH_readLE32(p)); p+=4; - } while (p<=limit); - + XXH_ASSERT(xinput <= bEnd); + if ((size_t)(bEnd - xinput) >= sizeof(state->buffer)) { + /* Process the remaining data */ + xinput = XXH32_consumeLong(state->acc, xinput, (size_t)(bEnd - xinput), XXH_unaligned); } - if (p < bEnd) { - XXH_memcpy(state->mem32, p, (size_t)(bEnd-p)); - state->memsize = (unsigned)(bEnd-p); + if (xinput < bEnd) { + /* Copy the leftover to the tmp buffer */ + XXH_memcpy(state->buffer, xinput, (size_t)(bEnd-xinput)); + state->bufferedSize = (unsigned)(bEnd-xinput); } } @@ -3207,17 +3302,14 @@ XXH_PUBLIC_API XXH32_hash_t XXH32_digest(const XXH32_state_t* state) xxh_u32 h32; if (state->large_len) { - h32 = XXH_rotl32(state->v[0], 1) - + XXH_rotl32(state->v[1], 7) - + XXH_rotl32(state->v[2], 12) - + XXH_rotl32(state->v[3], 18); + h32 = XXH32_mergeAccs(state->acc); } else { - h32 = state->v[2] /* == seed */ + XXH_PRIME32_5; + h32 = state->acc[2] /* == seed */ + XXH_PRIME32_5; } h32 += state->total_len_32; - return XXH32_finalize(h32, (const xxh_u8*)state->mem32, state->memsize, XXH_aligned); + return XXH32_finalize(h32, state->buffer, state->bufferedSize, XXH_aligned); } #endif /* !XXH_NO_STREAM */ @@ -3443,6 +3535,85 @@ static xxh_u64 XXH64_avalanche(xxh_u64 hash) #define XXH_get64bits(p) XXH_readLE64_align(p, align) +/*! + * @internal + * @brief Sets up the initial accumulator state for XXH64(). + */ +XXH_FORCE_INLINE void +XXH64_initAccs(xxh_u64 *acc, xxh_u64 seed) +{ + XXH_ASSERT(acc != NULL); + acc[0] = seed + XXH_PRIME64_1 + XXH_PRIME64_2; + acc[1] = seed + XXH_PRIME64_2; + acc[2] = seed + 0; + acc[3] = seed - XXH_PRIME64_1; +} + +/*! + * @internal + * @brief Consumes a block of data for XXH64(). + * + * @return the end input pointer. 
+ */ +XXH_FORCE_INLINE const xxh_u8 * +XXH64_consumeLong( + xxh_u64 *XXH_RESTRICT acc, + xxh_u8 const *XXH_RESTRICT input, + size_t len, + XXH_alignment align +) +{ + const xxh_u8* const bEnd = input + len; + const xxh_u8* const limit = bEnd - 31; + XXH_ASSERT(acc != NULL); + XXH_ASSERT(input != NULL); + XXH_ASSERT(len >= 32); + do { + /* reroll on 32-bit */ + if (sizeof(void *) < sizeof(xxh_u64)) { + size_t i; + for (i = 0; i < 4; i++) { + acc[i] = XXH64_round(acc[i], XXH_get64bits(input)); + input += 8; + } + } else { + acc[0] = XXH64_round(acc[0], XXH_get64bits(input)); input += 8; + acc[1] = XXH64_round(acc[1], XXH_get64bits(input)); input += 8; + acc[2] = XXH64_round(acc[2], XXH_get64bits(input)); input += 8; + acc[3] = XXH64_round(acc[3], XXH_get64bits(input)); input += 8; + } + } while (input < limit); + + return input; +} + +/*! + * @internal + * @brief Merges the accumulator lanes together for XXH64() + */ +XXH_FORCE_INLINE XXH_PUREF xxh_u64 +XXH64_mergeAccs(const xxh_u64 *acc) +{ + XXH_ASSERT(acc != NULL); + { + xxh_u64 h64 = XXH_rotl64(acc[0], 1) + XXH_rotl64(acc[1], 7) + + XXH_rotl64(acc[2], 12) + XXH_rotl64(acc[3], 18); + /* reroll on 32-bit */ + if (sizeof(void *) < sizeof(xxh_u64)) { + size_t i; + for (i = 0; i < 4; i++) { + h64 = XXH64_mergeRound(h64, acc[i]); + } + } else { + h64 = XXH64_mergeRound(h64, acc[0]); + h64 = XXH64_mergeRound(h64, acc[1]); + h64 = XXH64_mergeRound(h64, acc[2]); + h64 = XXH64_mergeRound(h64, acc[3]); + } + return h64; + } +} + /*! * @internal * @brief Processes the last 0-31 bytes of @p ptr. @@ -3458,7 +3629,7 @@ static xxh_u64 XXH64_avalanche(xxh_u64 hash) * @return The finalized hash * @see XXH32_finalize(). 
*/ -static XXH_PUREF xxh_u64 +XXH_STATIC XXH_PUREF xxh_u64 XXH64_finalize(xxh_u64 hash, const xxh_u8* ptr, size_t len, XXH_alignment align) { if (ptr==NULL) XXH_ASSERT(len == 0); @@ -3508,27 +3679,13 @@ XXH64_endian_align(const xxh_u8* input, size_t len, xxh_u64 seed, XXH_alignment xxh_u64 h64; if (input==NULL) XXH_ASSERT(len == 0); - if (len>=32) { - const xxh_u8* const bEnd = input + len; - const xxh_u8* const limit = bEnd - 31; - xxh_u64 v1 = seed + XXH_PRIME64_1 + XXH_PRIME64_2; - xxh_u64 v2 = seed + XXH_PRIME64_2; - xxh_u64 v3 = seed + 0; - xxh_u64 v4 = seed - XXH_PRIME64_1; + if (len>=32) { /* Process a large block of data */ + xxh_u64 acc[4]; + XXH64_initAccs(acc, seed); - do { - v1 = XXH64_round(v1, XXH_get64bits(input)); input+=8; - v2 = XXH64_round(v2, XXH_get64bits(input)); input+=8; - v3 = XXH64_round(v3, XXH_get64bits(input)); input+=8; - v4 = XXH64_round(v4, XXH_get64bits(input)); input+=8; - } while (input<limit); - - h64 = XXH_rotl64(v1, 1) + XXH_rotl64(v2, 7) + XXH_rotl64(v3, 12) + XXH_rotl64(v4, 18); - h64 = XXH64_mergeRound(h64, v1); - h64 = XXH64_mergeRound(h64, v2); - h64 = XXH64_mergeRound(h64, v3); - h64 = XXH64_mergeRound(h64, v4); + input = XXH64_consumeLong(acc, input, len, align); + + h64 = XXH64_mergeAccs(acc); } else { h64 = seed + XXH_PRIME64_5; } @@ ... @@ XXH_PUBLIC_API XXH_errorcode XXH64_reset(XXH_NOESCAPE XXH64_state_t* statePtr, XXH64_hash_t seed) { XXH_ASSERT(statePtr != NULL); - memset(statePtr, 0, sizeof(*statePtr)); - statePtr->v[0] = seed + XXH_PRIME64_1 + XXH_PRIME64_2; - statePtr->v[1] = seed + XXH_PRIME64_2; - statePtr->v[2] = seed + 0; - statePtr->v[3] = seed - XXH_PRIME64_1; + XXH_memset(statePtr, 0, sizeof(*statePtr)); + XXH64_initAccs(statePtr->acc, seed); return XXH_OK; } @@ -3600,42 +3754,36 @@ XXH64_update (XXH_NOESCAPE XXH64_state_t* state, XXH_NOESCAPE const void* input, return XXH_OK; } - { const xxh_u8* p = (const xxh_u8*)input; - const xxh_u8* const bEnd = p + len; + state->total_len += len; - state->total_len += len; + XXH_ASSERT(state->bufferedSize <= sizeof(state->buffer)); + if (len < sizeof(state->buffer) - state->bufferedSize) { /* fill in tmp buffer */ + XXH_memcpy(state->buffer + state->bufferedSize, input, len); + state->bufferedSize += (XXH32_hash_t)len; + return XXH_OK; + } - if (state->memsize + len < 32) { /* fill in tmp buffer */ - XXH_memcpy(((xxh_u8*)state->mem64) + state->memsize, input, len); - state->memsize += (xxh_u32)len; - return XXH_OK; - } + { const xxh_u8* xinput = (const xxh_u8*)input; + const xxh_u8* const bEnd = xinput + len; - 
if (state->memsize) { /* tmp buffer is full */ - XXH_memcpy(((xxh_u8*)state->mem64) + state->memsize, input, 32-state->memsize); - state->v[0] = XXH64_round(state->v[0], XXH_readLE64(state->mem64+0)); - state->v[1] = XXH64_round(state->v[1], XXH_readLE64(state->mem64+1)); - state->v[2] = XXH64_round(state->v[2], XXH_readLE64(state->mem64+2)); - state->v[3] = XXH64_round(state->v[3], XXH_readLE64(state->mem64+3)); - p += 32 - state->memsize; - state->memsize = 0; + if (state->bufferedSize) { /* non-empty buffer => complete first */ + XXH_memcpy(state->buffer + state->bufferedSize, xinput, sizeof(state->buffer) - state->bufferedSize); + xinput += sizeof(state->buffer) - state->bufferedSize; + /* and process one round */ + (void)XXH64_consumeLong(state->acc, state->buffer, sizeof(state->buffer), XXH_aligned); + state->bufferedSize = 0; } - if (p+32 <= bEnd) { - const xxh_u8* const limit = bEnd - 32; - - do { - state->v[0] = XXH64_round(state->v[0], XXH_readLE64(p)); p+=8; - state->v[1] = XXH64_round(state->v[1], XXH_readLE64(p)); p+=8; - state->v[2] = XXH64_round(state->v[2], XXH_readLE64(p)); p+=8; - state->v[3] = XXH64_round(state->v[3], XXH_readLE64(p)); p+=8; - } while (p<=limit); - + XXH_ASSERT(xinput <= bEnd); + if ((size_t)(bEnd - xinput) >= sizeof(state->buffer)) { + /* Process the remaining data */ + xinput = XXH64_consumeLong(state->acc, xinput, (size_t)(bEnd - xinput), XXH_unaligned); } - if (p < bEnd) { - XXH_memcpy(state->mem64, p, (size_t)(bEnd-p)); - state->memsize = (unsigned)(bEnd-p); + if (xinput < bEnd) { + /* Copy the leftover to the tmp buffer */ + XXH_memcpy(state->buffer, xinput, (size_t)(bEnd-xinput)); + state->bufferedSize = (unsigned)(bEnd-xinput); } } @@ -3649,18 +3797,14 @@ XXH_PUBLIC_API XXH64_hash_t XXH64_digest(XXH_NOESCAPE const XXH64_state_t* state xxh_u64 h64; if (state->total_len >= 32) { - h64 = XXH_rotl64(state->v[0], 1) + XXH_rotl64(state->v[1], 7) + XXH_rotl64(state->v[2], 12) + XXH_rotl64(state->v[3], 18); - h64 = 
XXH64_mergeRound(h64, state->v[0]); - h64 = XXH64_mergeRound(h64, state->v[1]); - h64 = XXH64_mergeRound(h64, state->v[2]); - h64 = XXH64_mergeRound(h64, state->v[3]); + h64 = XXH64_mergeAccs(state->acc); } else { - h64 = state->v[2] /*seed*/ + XXH_PRIME64_5; + h64 = state->acc[2] /*seed*/ + XXH_PRIME64_5; } h64 += (xxh_u64) state->total_len; - return XXH64_finalize(h64, (const xxh_u8*)state->mem64, (size_t)state->total_len, XXH_aligned); + return XXH64_finalize(h64, state->buffer, (size_t)state->total_len, XXH_aligned); } #endif /* !XXH_NO_STREAM */ @@ -3695,22 +3839,6 @@ XXH_PUBLIC_API XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const XXH64_can /* === Compiler specifics === */ -#if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. Tested with GCC 5.5 */ -# define XXH_RESTRICT /* disable */ -#elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* >= C99 */ -# define XXH_RESTRICT restrict -#elif (defined (__GNUC__) && ((__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))) \ - || (defined (__clang__)) \ - || (defined (_MSC_VER) && (_MSC_VER >= 1400)) \ - || (defined (__INTEL_COMPILER) && (__INTEL_COMPILER >= 1300)) -/* - * There are a LOT more compilers that recognize __restrict but this - * covers the major ones. - */ -# define XXH_RESTRICT __restrict -#else -# define XXH_RESTRICT /* disable */ -#endif #if (defined(__GNUC__) && (__GNUC__ >= 3)) \ || (defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 800)) \ @@ -3749,6 +3877,11 @@ XXH_PUBLIC_API XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const XXH64_can # include <immintrin.h> # elif defined(__SSE2__) # include <emmintrin.h> +# elif defined(__loongarch_asx) +# include <lasxintrin.h> +# include <lsxintrin.h> +# elif defined(__loongarch_sx) +# include <lsxintrin.h> # endif #endif @@ -3838,40 +3971,13 @@ XXH_PUBLIC_API XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const XXH64_can * @ingroup tuning * @brief Overrides the vectorization implementation chosen for XXH3. 
* - * Can be defined to 0 to disable SIMD or any of the values mentioned in - * @ref XXH_VECTOR_TYPE. + * Can be defined to 0 to disable SIMD, + * or any other authorized value of @ref XXH_VECTOR. * * If this is not defined, it uses predefined macros to determine the best * implementation. */ # define XXH_VECTOR XXH_SCALAR -/*! - * @ingroup tuning - * @brief Possible values for @ref XXH_VECTOR. - * - * Note that these are actually implemented as macros. - * - * If this is not defined, it is detected automatically. - * internal macro XXH_X86DISPATCH overrides this. - */ -enum XXH_VECTOR_TYPE /* fake enum */ { - XXH_SCALAR = 0, /*!< Portable scalar version */ - XXH_SSE2 = 1, /*!< - * SSE2 for Pentium 4, Opteron, all x86_64. - * - * @note SSE2 is also guaranteed on Windows 10, macOS, and - * Android x86. - */ - XXH_AVX2 = 2, /*!< AVX2 for Haswell and Bulldozer */ - XXH_AVX512 = 3, /*!< AVX512 for Skylake and Icelake */ - XXH_NEON = 4, /*!< - * NEON for most ARMv7-A, all AArch64, and WASM SIMD128 - * via the SIMDeverywhere polyfill provided with the - * Emscripten SDK. - */ - XXH_VSX = 5, /*!< VSX and ZVector for POWER8/z13 (64-bit) */ - XXH_SVE = 6, /*!< SVE for some ARMv8-A and ARMv9-A */ -}; /*! * @ingroup tuning * @brief Selects the minimum alignment for XXH3's accumulators. 
@@ -3886,13 +3992,6 @@ enum XXH_VECTOR_TYPE /* fake enum */ { /* Actual definition */ #ifndef XXH_DOXYGEN -# define XXH_SCALAR 0 -# define XXH_SSE2 1 -# define XXH_AVX2 2 -# define XXH_AVX512 3 -# define XXH_NEON 4 -# define XXH_VSX 5 -# define XXH_SVE 6 #endif #ifndef XXH_VECTOR /* can be defined on command line */ @@ -3911,12 +4010,16 @@ enum XXH_VECTOR_TYPE /* fake enum */ { # define XXH_VECTOR XXH_AVX512 # elif defined(__AVX2__) # define XXH_VECTOR XXH_AVX2 -# elif defined(__SSE2__) || defined(_M_AMD64) || defined(_M_X64) || (defined(_M_IX86_FP) && (_M_IX86_FP == 2)) +# elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && (_M_IX86_FP == 2)) # define XXH_VECTOR XXH_SSE2 # elif (defined(__PPC64__) && defined(__POWER8_VECTOR__)) \ || (defined(__s390x__) && defined(__VEC__)) \ && defined(__GNUC__) /* TODO: IBM XL */ # define XXH_VECTOR XXH_VSX +# elif defined(__loongarch_asx) +# define XXH_VECTOR XXH_LASX +# elif defined(__loongarch_sx) +# define XXH_VECTOR XXH_LSX # else # define XXH_VECTOR XXH_SCALAR # endif @@ -3954,6 +4057,10 @@ enum XXH_VECTOR_TYPE /* fake enum */ { # define XXH_ACC_ALIGN 64 # elif XXH_VECTOR == XXH_SVE /* sve */ # define XXH_ACC_ALIGN 64 +# elif XXH_VECTOR == XXH_LASX /* lasx */ +# define XXH_ACC_ALIGN 64 +# elif XXH_VECTOR == XXH_LSX /* lsx */ +# define XXH_ACC_ALIGN 64 # endif #endif @@ -4283,7 +4390,10 @@ do { \ # error "default keyset is not large enough" #endif -/*! Pseudorandom secret taken directly from FARSH. */ +/*! + * @internal + * @def XXH3_kSecret + * @brief Pseudorandom secret taken directly from FARSH. 
*/ XXH_ALIGN(64) static const xxh_u8 XXH3_kSecret[XXH_SECRET_DEFAULT_SIZE] = { 0xb8, 0xfe, 0x6c, 0x39, 0x23, 0xa4, 0x4b, 0xbe, 0x7c, 0x01, 0x81, 0x2c, 0xf7, 0x21, 0xad, 0x1c, 0xde, 0xd4, 0x6d, 0xe9, 0x83, 0x90, 0x97, 0xdb, 0x72, 0x40, 0xa4, 0xa4, 0xb7, 0xb3, 0x67, 0x1f, @@ -5591,6 +5701,130 @@ XXH3_accumulate_sve(xxh_u64* XXH_RESTRICT acc, #endif +#if (XXH_VECTOR == XXH_LSX) +#define _LSX_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w)) + +XXH_FORCE_INLINE void +XXH3_accumulate_512_lsx( void* XXH_RESTRICT acc, + const void* XXH_RESTRICT input, + const void* XXH_RESTRICT secret) +{ + XXH_ASSERT((((size_t)acc) & 15) == 0); + { + __m128i* const xacc = (__m128i *) acc; + const __m128i* const xinput = (const __m128i *) input; + const __m128i* const xsecret = (const __m128i *) secret; + + for (size_t i = 0; i < XXH_STRIPE_LEN / sizeof(__m128i); i++) { + /* data_vec = xinput[i]; */ + __m128i const data_vec = __lsx_vld(xinput + i, 0); + /* key_vec = xsecret[i]; */ + __m128i const key_vec = __lsx_vld(xsecret + i, 0); + /* data_key = data_vec ^ key_vec; */ + __m128i const data_key = __lsx_vxor_v(data_vec, key_vec); + /* data_key_lo = data_key >> 32; */ + __m128i const data_key_lo = __lsx_vsrli_d(data_key, 32); + /* product = (data_key & 0xffffffff) * (data_key_lo & 0xffffffff); */ + __m128i const product = __lsx_vmulwev_d_wu(data_key, data_key_lo); + /* xacc[i] += swap(data_vec); */ + __m128i const data_swap = __lsx_vshuf4i_w(data_vec, _LSX_SHUFFLE(1, 0, 3, 2)); + __m128i const sum = __lsx_vadd_d(xacc[i], data_swap); + /* xacc[i] += product; */ + xacc[i] = __lsx_vadd_d(product, sum); + } + } +} +XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(lsx) + +XXH_FORCE_INLINE void +XXH3_scrambleAcc_lsx(void* XXH_RESTRICT acc, const void* XXH_RESTRICT secret) +{ + XXH_ASSERT((((size_t)acc) & 15) == 0); + { + __m128i* const xacc = (__m128i*) acc; + const __m128i* const xsecret = (const __m128i *) secret; + const 
__m128i prime32 = __lsx_vreplgr2vr_d(XXH_PRIME32_1); + + for (size_t i = 0; i < XXH_STRIPE_LEN / sizeof(__m128i); i++) { + /* xacc[i] ^= (xacc[i] >> 47) */ + __m128i const acc_vec = xacc[i]; + __m128i const shifted = __lsx_vsrli_d(acc_vec, 47); + __m128i const data_vec = __lsx_vxor_v(acc_vec, shifted); + /* xacc[i] ^= xsecret[i]; */ + __m128i const key_vec = __lsx_vld(xsecret + i, 0); + __m128i const data_key = __lsx_vxor_v(data_vec, key_vec); + + /* xacc[i] *= XXH_PRIME32_1; */ + xacc[i] = __lsx_vmul_d(data_key, prime32); + } + } +} + +#endif + +#if (XXH_VECTOR == XXH_LASX) +#define _LASX_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w)) + +XXH_FORCE_INLINE void +XXH3_accumulate_512_lasx( void* XXH_RESTRICT acc, + const void* XXH_RESTRICT input, + const void* XXH_RESTRICT secret) +{ + XXH_ASSERT((((size_t)acc) & 31) == 0); + { + __m256i* const xacc = (__m256i *) acc; + const __m256i* const xinput = (const __m256i *) input; + const __m256i* const xsecret = (const __m256i *) secret; + + for (size_t i = 0; i < XXH_STRIPE_LEN / sizeof(__m256i); i++) { + /* data_vec = xinput[i]; */ + __m256i const data_vec = __lasx_xvld(xinput + i, 0); + /* key_vec = xsecret[i]; */ + __m256i const key_vec = __lasx_xvld(xsecret + i, 0); + /* data_key = data_vec ^ key_vec; */ + __m256i const data_key = __lasx_xvxor_v(data_vec, key_vec); + /* data_key_lo = data_key >> 32; */ + __m256i const data_key_lo = __lasx_xvsrli_d(data_key, 32); + /* product = (data_key & 0xffffffff) * (data_key_lo & 0xffffffff); */ + __m256i const product = __lasx_xvmulwev_d_wu(data_key, data_key_lo); + /* xacc[i] += swap(data_vec); */ + __m256i const data_swap = __lasx_xvshuf4i_w(data_vec, _LASX_SHUFFLE(1, 0, 3, 2)); + __m256i const sum = __lasx_xvadd_d(xacc[i], data_swap); + /* xacc[i] += product; */ + xacc[i] = __lasx_xvadd_d(product, sum); + } + } +} +XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(lasx) + +XXH_FORCE_INLINE void 
+XXH3_scrambleAcc_lasx(void* XXH_RESTRICT acc, const void* XXH_RESTRICT secret) +{ + XXH_ASSERT((((size_t)acc) & 31) == 0); + { + __m256i* const xacc = (__m256i*) acc; + const __m256i* const xsecret = (const __m256i *) secret; + const __m256i prime32 = __lasx_xvreplgr2vr_d(XXH_PRIME32_1); + + for (size_t i = 0; i < XXH_STRIPE_LEN / sizeof(__m256i); i++) { + /* xacc[i] ^= (xacc[i] >> 47) */ + __m256i const acc_vec = xacc[i]; + __m256i const shifted = __lasx_xvsrli_d(acc_vec, 47); + __m256i const data_vec = __lasx_xvxor_v(acc_vec, shifted); + /* xacc[i] ^= xsecret[i]; */ + __m256i const key_vec = __lasx_xvld(xsecret + i, 0); + __m256i const data_key = __lasx_xvxor_v(data_vec, key_vec); + + /* xacc[i] *= XXH_PRIME32_1; */ + xacc[i] = __lasx_xvmul_d(data_key, prime32); + } + } +} + +#endif + /* scalar variants - universal */ #if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__)) @@ -5821,6 +6055,18 @@ typedef void (*XXH3_f_initCustomSecret)(void* XXH_RESTRICT, xxh_u64); #define XXH3_scrambleAcc XXH3_scrambleAcc_scalar #define XXH3_initCustomSecret XXH3_initCustomSecret_scalar +#elif (XXH_VECTOR == XXH_LASX) +#define XXH3_accumulate_512 XXH3_accumulate_512_lasx +#define XXH3_accumulate XXH3_accumulate_lasx +#define XXH3_scrambleAcc XXH3_scrambleAcc_lasx +#define XXH3_initCustomSecret XXH3_initCustomSecret_scalar + +#elif (XXH_VECTOR == XXH_LSX) +#define XXH3_accumulate_512 XXH3_accumulate_512_lsx +#define XXH3_accumulate XXH3_accumulate_lsx +#define XXH3_scrambleAcc XXH3_scrambleAcc_lsx +#define XXH3_initCustomSecret XXH3_initCustomSecret_scalar + #else /* scalar */ #define XXH3_accumulate_512 XXH3_accumulate_512_scalar @@ -5876,7 +6122,7 @@ XXH3_mix2Accs(const xxh_u64* XXH_RESTRICT acc, const xxh_u8* XXH_RESTRICT secret acc[1] ^ XXH_readLE64(secret+8) ); } -static XXH64_hash_t +static XXH_PUREF XXH64_hash_t XXH3_mergeAccs(const xxh_u64* XXH_RESTRICT acc, const xxh_u8* XXH_RESTRICT secret, xxh_u64 start) { xxh_u64 result64 = start; @@ -5903,6 +6149,15 @@ 
XXH3_mergeAccs(const xxh_u64* XXH_RESTRICT acc, const xxh_u8* XXH_RESTRICT secre return XXH3_avalanche(result64); } +/* do not align on 8, so that the secret is different from the accumulator */ +#define XXH_SECRET_MERGEACCS_START 11 + +static XXH_PUREF XXH64_hash_t +XXH3_finalizeLong_64b(const xxh_u64* XXH_RESTRICT acc, const xxh_u8* XXH_RESTRICT secret, xxh_u64 len) +{ + return XXH3_mergeAccs(acc, secret + XXH_SECRET_MERGEACCS_START, len * XXH_PRIME64_1); +} + #define XXH3_INIT_ACC { XXH_PRIME32_3, XXH_PRIME64_1, XXH_PRIME64_2, XXH_PRIME64_3, \ XXH_PRIME64_4, XXH_PRIME32_2, XXH_PRIME64_5, XXH_PRIME32_1 } @@ -5918,10 +6173,8 @@ XXH3_hashLong_64b_internal(const void* XXH_RESTRICT input, size_t len, /* converge into final hash */ XXH_STATIC_ASSERT(sizeof(acc) == 64); - /* do not align on 8, so that the secret is different from the accumulator */ -#define XXH_SECRET_MERGEACCS_START 11 XXH_ASSERT(secretSize >= sizeof(acc) + XXH_SECRET_MERGEACCS_START); - return XXH3_mergeAccs(acc, (const xxh_u8*)secret + XXH_SECRET_MERGEACCS_START, (xxh_u64)len * XXH_PRIME64_1); + return XXH3_finalizeLong_64b(acc, (const xxh_u8*)secret, (xxh_u64)len); } /* @@ -6175,7 +6428,7 @@ XXH3_reset_internal(XXH3_state_t* statePtr, XXH_ASSERT(offsetof(XXH3_state_t, nbStripesPerBlock) > initStart); XXH_ASSERT(statePtr != NULL); /* set members from bufferedSize to nbStripesPerBlock (excluded) to 0 */ - memset((char*)statePtr + initStart, 0, initLength); + XXH_memset((char*)statePtr + initStart, 0, initLength); statePtr->acc[0] = XXH_PRIME32_3; statePtr->acc[1] = XXH_PRIME64_1; statePtr->acc[2] = XXH_PRIME64_2; @@ -6294,8 +6547,9 @@ XXH3_consumeStripes(xxh_u64* XXH_RESTRICT acc, # define XXH3_STREAM_USE_STACK 1 # endif #endif -/* - * Both XXH3_64bits_update and XXH3_128bits_update use this routine. +/* This function accepts f_acc and f_scramble as function pointers, + * making it possible to implement multiple variants with different acc & scramble stages. 
+ * This is notably useful to implement multiple vector variants with different intrinsics. */ XXH_FORCE_INLINE XXH_errorcode XXH3_update(XXH3_state_t* XXH_RESTRICT const state, @@ -6376,12 +6630,21 @@ XXH3_update(XXH3_state_t* XXH_RESTRICT const state, return XXH_OK; } +/* + * Both XXH3_64bits_update and XXH3_128bits_update use this routine. + */ +XXH_NO_INLINE XXH_errorcode +XXH3_update_regular(XXH_NOESCAPE XXH3_state_t* state, XXH_NOESCAPE const void* input, size_t len) +{ + return XXH3_update(state, (const xxh_u8*)input, len, + XXH3_accumulate, XXH3_scrambleAcc); +} + /*! @ingroup XXH3_family */ XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update(XXH_NOESCAPE XXH3_state_t* state, XXH_NOESCAPE const void* input, size_t len) { - return XXH3_update(state, (const xxh_u8*)input, len, - XXH3_accumulate, XXH3_scrambleAcc); + return XXH3_update_regular(state, input, len); } @@ -6429,9 +6692,7 @@ XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_digest (XXH_NOESCAPE const XXH3_state_t* if (state->totalLen > XXH3_MIDSIZE_MAX) { XXH_ALIGN(XXH_ACC_ALIGN) XXH64_hash_t acc[XXH_ACC_NB]; XXH3_digest_long(acc, state, secret); - return XXH3_mergeAccs(acc, - secret + XXH_SECRET_MERGEACCS_START, - (xxh_u64)state->totalLen * XXH_PRIME64_1); + return XXH3_finalizeLong_64b(acc, secret, (xxh_u64)state->totalLen); } /* totalLen <= XXH3_MIDSIZE_MAX: digesting a short input */ if (state->useSeed) @@ -6723,6 +6984,17 @@ XXH3_len_129to240_128b(const xxh_u8* XXH_RESTRICT input, size_t len, } } +static XXH_PUREF XXH128_hash_t +XXH3_finalizeLong_128b(const xxh_u64* XXH_RESTRICT acc, const xxh_u8* XXH_RESTRICT secret, size_t secretSize, xxh_u64 len) +{ + XXH128_hash_t h128; + h128.low64 = XXH3_finalizeLong_64b(acc, secret, len); + h128.high64 = XXH3_mergeAccs(acc, secret + secretSize + - XXH_STRIPE_LEN - XXH_SECRET_MERGEACCS_START, + ~(len * XXH_PRIME64_2)); + return h128; +} + XXH_FORCE_INLINE XXH128_hash_t XXH3_hashLong_128b_internal(const void* XXH_RESTRICT input, size_t len, const xxh_u8* XXH_RESTRICT 
secret, size_t secretSize, @@ -6736,16 +7008,7 @@ XXH3_hashLong_128b_internal(const void* XXH_RESTRICT input, size_t len, /* converge into final hash */ XXH_STATIC_ASSERT(sizeof(acc) == 64); XXH_ASSERT(secretSize >= sizeof(acc) + XXH_SECRET_MERGEACCS_START); - { XXH128_hash_t h128; - h128.low64 = XXH3_mergeAccs(acc, - secret + XXH_SECRET_MERGEACCS_START, - (xxh_u64)len * XXH_PRIME64_1); - h128.high64 = XXH3_mergeAccs(acc, - secret + secretSize - - sizeof(acc) - XXH_SECRET_MERGEACCS_START, - ~((xxh_u64)len * XXH_PRIME64_2)); - return h128; - } + return XXH3_finalizeLong_128b(acc, secret, secretSize, (xxh_u64)len); } /* @@ -6917,7 +7180,7 @@ XXH3_128bits_reset_withSecretandSeed(XXH_NOESCAPE XXH3_state_t* statePtr, XXH_NO XXH_PUBLIC_API XXH_errorcode XXH3_128bits_update(XXH_NOESCAPE XXH3_state_t* state, XXH_NOESCAPE const void* input, size_t len) { - return XXH3_64bits_update(state, input, len); + return XXH3_update_regular(state, input, len); } /*! @ingroup XXH3_family */ @@ -6928,16 +7191,7 @@ XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_digest (XXH_NOESCAPE const XXH3_state_ XXH_ALIGN(XXH_ACC_ALIGN) XXH64_hash_t acc[XXH_ACC_NB]; XXH3_digest_long(acc, state, secret); XXH_ASSERT(state->secretLimit + XXH_STRIPE_LEN >= sizeof(acc) + XXH_SECRET_MERGEACCS_START); - { XXH128_hash_t h128; - h128.low64 = XXH3_mergeAccs(acc, - secret + XXH_SECRET_MERGEACCS_START, - (xxh_u64)state->totalLen * XXH_PRIME64_1); - h128.high64 = XXH3_mergeAccs(acc, - secret + state->secretLimit + XXH_STRIPE_LEN - - sizeof(acc) - XXH_SECRET_MERGEACCS_START, - ~((xxh_u64)state->totalLen * XXH_PRIME64_2)); - return h128; - } + return XXH3_finalizeLong_128b(acc, secret, state->secretLimit + XXH_STRIPE_LEN, (xxh_u64)state->totalLen); } /* len <= XXH3_MIDSIZE_MAX : short code */ if (state->useSeed) @@ -6948,14 +7202,12 @@ XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_digest (XXH_NOESCAPE const XXH3_state_ #endif /* !XXH_NO_STREAM */ /* 128-bit utility functions */ -#include <string.h> /* memcmp, memcpy */ - /* return : 1 
if equal, 0 if different */ /*! @ingroup XXH3_family */ XXH_PUBLIC_API int XXH128_isEqual(XXH128_hash_t h1, XXH128_hash_t h2) { /* note : XXH128_hash_t is compact, it has no padding byte */ - return !(memcmp(&h1, &h2, sizeof(h1))); + return !(XXH_memcmp(&h1, &h2, sizeof(h1))); } /* This prototype is compatible with stdlib's qsort(). @@ -7039,7 +7291,7 @@ XXH3_generateSecret(XXH_NOESCAPE void* secretBuffer, size_t secretSize, XXH_NOES { size_t pos = 0; while (pos < secretSize) { size_t const toCopy = XXH_MIN((secretSize - pos), customSeedSize); - memcpy((char*)secretBuffer + pos, customSeed, toCopy); + XXH_memcpy((char*)secretBuffer + pos, customSeed, toCopy); pos += toCopy; } } @@ -7064,7 +7316,7 @@ XXH3_generateSecret_fromSeed(XXH_NOESCAPE void* secretBuffer, XXH64_hash_t seed) XXH_ALIGN(XXH_SEC_ALIGN) xxh_u8 secret[XXH_SECRET_DEFAULT_SIZE]; XXH3_initCustomSecret(secret, seed); XXH_ASSERT(secretBuffer != NULL); - memcpy(secretBuffer, secret, XXH_SECRET_DEFAULT_SIZE); + XXH_memcpy(secretBuffer, secret, XXH_SECRET_DEFAULT_SIZE); } @@ -7086,6 +7338,6 @@ XXH3_generateSecret_fromSeed(XXH_NOESCAPE void* secretBuffer, XXH64_hash_t seed) #endif /* XXH_IMPLEMENTATION */ -#if defined (__cplusplus) +#if defined (__cplusplus) && !defined(XXH_NO_EXTERNC_GUARD) } /* extern "C" */ #endif diff --git a/tests/insert-2-test.c b/tests/insert-2-test.c index 8a089dd..7fa5314 100644 --- a/tests/insert-2-test.c +++ b/tests/insert-2-test.c @@ -24,6 +24,15 @@ char* strdup(const char*); #endif +// Maximum word length is 20 characters. +#define MAX_WORD_LENGTH 20 + +static inline size_t strlen_n(const char * pstr) { + size_t len = strlen(pstr); + if (len > MAX_WORD_LENGTH) len = MAX_WORD_LENGTH; + return len; +} + char * words[] = { // Each row is ten words. 
"the", "of", "and", "a", "to", "in", "is", "you", "that", "it", @@ -44,7 +53,7 @@ char * num_to_word(size_t num) { while (num > 0) { size_t next = num % num_words; num = num / num_words; - size_t newlen = strlen(retval) + strlen(words[next]) + 2; + size_t newlen = strlen_n(retval) + strlen_n(words[next]) + 2; char * newretval = (char *)malloc(sizeof(char) * newlen); strcpy(newretval, retval); strcat(newretval, " "); @@ -78,7 +87,7 @@ START_ITEM(fill) // These go in the table, so we do not deallocate them. he4_key_t key = (he4_key_t) num_to_word(index); he4_entry_t entry = (he4_entry_t) num_to_word(index + 7); - if (he4_insert(table, key, strlen(key), entry)) { + if (he4_insert(table, key, strlen_n(key), entry)) { FAIL_TEST("insertion at key: %s", key); } ASSERT(he4_capacity(table) == 1024); IF_FAIL_STOP; @@ -96,7 +105,7 @@ START_ITEM(verify_full) he4_entry_t entry = num_to_word(index +7); // This is a pointer to the entry in the table, so it must not be // deallocated. - he4_entry_t * pentry = he4_find(table, key, strlen(key)); + he4_entry_t * pentry = he4_find(table, key, strlen_n(key)); if (pentry == NULL) { FAIL("missing entry for key: %s", key); free(key); @@ -104,7 +113,7 @@ START_ITEM(verify_full) IF_FAIL_END_ITEM; } ASSERT(strcmp(*pentry, entry) == 0); - he4_entry_t entry2 = he4_get(table, key, strlen(key)); + he4_entry_t entry2 = he4_get(table, key, strlen_n(key)); ASSERT(strcmp(entry, entry2) == 0); free(key); free(entry); @@ -118,7 +127,7 @@ START_ITEM(insert) // These do not go in the table, so they must be deallocated. he4_key_t key = num_to_word(index); he4_entry_t entry = num_to_word(index +7); - if (!he4_insert(table, key, strlen(key), entry)) { + if (!he4_insert(table, key, strlen_n(key), entry)) { // Do not deallocate, since this made it into the table. FAIL("inserted at index: %zu", index); free(key); @@ -140,14 +149,14 @@ START_ITEM(force) // These go in the table, so we do not deallocate them. 
he4_key_t key = (he4_key_t) num_to_word(index); he4_entry_t entry = (he4_entry_t) num_to_word(index + 7); - if (!he4_force_insert(table, key, strlen(key), entry)) { + if (!he4_force_insert(table, key, strlen_n(key), entry)) { // Since these did not go in the table, deallocate them. FAIL("insertion at index: %zu", index); free(key); free(entry); IF_FAIL_END_ITEM; } - he4_entry_t entry2 = he4_get(table, key, strlen(key)); + he4_entry_t entry2 = he4_get(table, key, strlen_n(key)); if (strcmp(entry2, entry) != 0) { FAIL_ITEM("missing forced entry for key: %s", key); }