153 changes: 13 additions & 140 deletions README.md
@@ -1,58 +1,21 @@
Changes made by Andrew Rossman and Co.

To build the modified Kuku library on your machine, run:

```PowerShell
cmake -S . -B build -DKUKU_BUILD_EXAMPLES=ON
cmake --build build
```

Once the build completes, run our example program with:

```PowerShell
./build/bin/kukuexamples
```

The example program runs the experiment for Table 13 and reports wall-clock times and fill rates to stdout. It can take a long time to complete.

The instructions below were written by the Microsoft Research team for installing Kuku; you may find some of them useful.

# Kuku

Kuku is a simple open-source ([MIT licensed](LICENSE)) cuckoo hashing library developed by the Cryptography and Privacy Research Group at Microsoft.
Kuku is written in modern standard C++ and has no external dependencies, making it easy to compile and run in many different environments.

## Contents
- [Getting Started](#getting-started)
- [Cuckoo Hashing](#cuckoo-hashing)
- [Kuku](#kuku-1)
- [Installing from NuGet Package](#installing-from-nuget-package-windows-linux-macos)
- [Building Kuku Manually](#building-kuku-manually)
- [Building C++ Components](#building-c-components)
- [Building Kuku](#building-kuku)
- [Installing Kuku](#installing-kuku)
- [Building and Installing on Windows](#building-and-installing-on-windows)
- [CMake Options](#cmake-options)
- [Linking with Kuku through CMake](#linking-with-kuku-through-cmake)
- [Building .NET Components](#building-net-components)
- [Windows, Linux, and macOS](#windows-linux-and-macos)
- [Using Kuku for .NET](#using-kuku-for-net)
- [Building Your Own NuGet Package](#building-your-own-nuget-package)
- [Using Kuku](#using-kuku)
- [Contributing](#contributing)

## Getting Started

### Cuckoo Hashing

[Cuckoo hashing](https://en.wikipedia.org/wiki/Cuckoo_hashing) is a hashing technique that can achieve very high fill rates, and in particular create efficient hash tables with a single item per bin.
This is achieved by using multiple (often 2, 3, or 4) different hash functions as follows:
1. Denote the hash functions `H_1`, `H_2`, ..., `H_k`.
1. When an item `X` is to be inserted, choose one of the hash functions, `H_j`, and
check whether the corresponding bin is empty.
If it is empty, insert `X` in the bin denoted by `H_j(X)`, and return `true`.
Otherwise, remove the existing value, `Y`, from the bin denoted by `H_j(X)`, and insert `X` in its place.
Repeat the process for the item `Y`.
1. If the process fails to terminate after a pre-determined number of attempts,
place the leftover item in a stash of a pre-determined maximum size, and return `true`.
1. If the stash had already reached its maximum size, store the leftover item into
a known location and return `false`.

To check whether an item `Z` is in the hash table, it is necessary to check all possible locations, i.e., `H_1(Z)`, `H_2(Z)`, ..., `H_k(Z)` for its presence, as well as the stash.
It is not necessary to use a stash at all, in which case the stash would have size zero and obviously would not need to be checked.
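The procedure above can be sketched as a small, self-contained C++ class. This is an illustration under stated assumptions, not Kuku's implementation: `CuckooSketch` and its members are hypothetical names, items are 64-bit keys rather than Kuku's 128-bit items, the value 0 is reserved to mark an empty bin, and the location functions `H_j` are built from a splitmix64-style mixer instead of tabulation hashing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical illustration only; 0 is reserved to mark an empty bin.
constexpr std::uint64_t kEmptyBin = 0;

class CuckooSketch {
public:
    CuckooSketch(std::size_t table_size, std::size_t stash_size, unsigned hash_count,
                 std::uint64_t max_probe, std::uint64_t seed = 1)
        : bins_(table_size, kEmptyBin), stash_max_(stash_size), k_(hash_count),
          max_probe_(max_probe), seed_(seed), rng_(seed)
    {
        assert(table_size > 0 && hash_count > 0);
    }

    // Returns true if x ends up in the table or the stash. Returns false for a
    // duplicate (or the reserved empty value), or when both the eviction budget
    // and the stash are exhausted; the unplaced item is then kept in leftover().
    bool insert(std::uint64_t x)
    {
        if (x == kEmptyBin || contains(x)) return false;
        std::uniform_int_distribution<unsigned> pick(0, k_ - 1);
        for (std::uint64_t probe = 0; probe < max_probe_; ++probe)
        {
            // Put x into bin H_j(x) for a random j; whatever was there
            // (possibly nothing) becomes the item we still have to place.
            std::swap(x, bins_[location(pick(rng_), x)]);
            if (x == kEmptyBin) return true;
        }
        if (stash_.size() < stash_max_)
        {
            stash_.push_back(x);
            return true;
        }
        leftover_ = x;
        return false;
    }

    // A lookup has to check all k candidate bins and the stash.
    bool contains(std::uint64_t x) const
    {
        for (unsigned j = 0; j < k_; ++j)
            if (bins_[location(j, x)] == x) return true;
        return std::find(stash_.begin(), stash_.end(), x) != stash_.end();
    }

    std::uint64_t leftover() const { return leftover_; }

private:
    // splitmix64 finalizer, used here as a stand-in mixing function.
    static std::uint64_t mix(std::uint64_t z)
    {
        z += 0x9e3779b97f4a7c15ULL;
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    // H_j(x): the j-th location function, reduced to a bin index.
    std::size_t location(unsigned j, std::uint64_t x) const
    {
        return static_cast<std::size_t>(mix(x ^ mix(seed_ + j)) % bins_.size());
    }

    std::vector<std::uint64_t> bins_;
    std::size_t stash_max_;
    unsigned k_;
    std::uint64_t max_probe_;
    std::uint64_t seed_;
    std::mt19937_64 rng_;
    std::vector<std::uint64_t> stash_;
    std::uint64_t leftover_ = kEmptyBin;
};
```

After a failed insertion the leftover item is not necessarily the one passed to `insert`: it is whichever item was displaced last, which is why the steps above refer to a *leftover* item rather than the item being inserted.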

### Kuku

Kuku is a minimalistic library that enables a certain variant of cuckoo hashing, as described above.
It uses [tabulation hashing](https://en.wikipedia.org/wiki/Tabulation_hashing) for the hash functions.
The item length in Kuku is exactly 128 bits and cannot be increased; however, longer items can always be hashed to 128 bits using some other hash function that accepts arbitrary length inputs, and the outputs can subsequently be used in Kuku.
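As an illustration of simple tabulation hashing applied to a 128-bit item stored as two 64-bit words, the following sketch splits the item into 16 bytes and XORs together one random 64-bit table entry per byte. It is a hypothetical example, not Kuku's internal code: `TabulationHash128` and its members are made-up names, and the tables are filled from `std::mt19937_64`. Seeding the tables differently gives different location functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical sketch of simple tabulation hashing over a 128-bit item.
class TabulationHash128 {
public:
    explicit TabulationHash128(std::uint64_t seed)
    {
        std::mt19937_64 rng(seed);
        for (auto &table : tables_)
            for (auto &entry : table) entry = rng();
    }

    // XOR together one random table entry per byte of the item.
    std::uint64_t operator()(std::uint64_t low_word, std::uint64_t high_word) const
    {
        std::uint64_t h = 0;
        for (std::size_t i = 0; i < 8; ++i)
        {
            h ^= tables_[i][(low_word >> (8 * i)) & 0xff];
            h ^= tables_[8 + i][(high_word >> (8 * i)) & 0xff];
        }
        return h;
    }

    // Map the hash to a bin index in [0, table_size).
    std::size_t location(std::uint64_t low_word, std::uint64_t high_word,
                         std::size_t table_size) const
    {
        assert(table_size > 0);
        return static_cast<std::size_t>((*this)(low_word, high_word) % table_size);
    }

private:
    // 16 byte positions x 256 possible byte values.
    std::array<std::array<std::uint64_t, 256>, 16> tables_{};
};
```

Simple tabulation needs only 16 table lookups per item, but it is only 3-independent: XOR-ing the hashes of four items that differ only in two byte positions, in all four combinations, always gives zero.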

### Installing from NuGet Package (Windows, Linux, macOS)

For .NET developers the easiest way of installing Kuku is by using the multiplatform NuGet package available at [NuGet.org](https://www.nuget.org/packages/Microsoft.Research.Kuku).
Simply add this package into your .NET project as a dependency and you are ready to go.

## Building Kuku Manually

### Building C++ Components
@@ -162,93 +125,3 @@ cmake . -DCMAKE_PREFIX_PATH=~/mylibs
```

If Kuku was installed using a package manager like vcpkg or Homebrew, please refer to their documentation for how to link with the installed library. For example, vcpkg requires you to specify the vcpkg CMake toolchain file when configuring your project.

### Building .NET Components

Kuku provides a .NET Standard library that wraps the functionality in Kuku for use in .NET development.
Using the existing [NuGet package](https://www.nuget.org/packages/Microsoft.Research.Kuku) is highly recommended, unless development of Kuku or building a custom NuGet package is intended.
Prior to building .NET components, the C wrapper library Kuku_C must be built following [Building C++ Components](#building-c-components).
The Kuku_C library is meant to be used only by the .NET library, not by end-users.

**Note**: Kuku_C and the .NET library only support 64-bit platforms.

#### Windows, Linux, and macOS

For compiling .NET code you will need to install a [.NET Core SDK (>= 3.1)](https://dotnet.microsoft.com/download).
Building the Kuku_C library with CMake will generate project files for the .NET wrapper library, examples, and unit tests.
The Kuku_C library must be discoverable when running a .NET application, e.g., be present in the same directory as your executable, which is taken care of by the .NET examples and tests project files.
Run the following commands to build (and, for tests and examples, run) each project:

```PowerShell
dotnet build dotnet/src --configuration <Debug|Release> # Build .NET wrapper library
dotnet test dotnet/tests # Build and run .NET unit tests
dotnet run -p dotnet/examples # Build and run .NET examples
```

You can use `--configuration <Debug|Release>` to run `Debug` or `Release` examples and unit tests.
You can use `--verbosity detailed` to print the list of unit tests that are being run.

On Windows, you can also use the Microsoft Visual Studio 2019 solution file `dotnet/KukuNet.sln` to build all three projects.

#### Using Kuku for .NET

To use Kuku for .NET in your own application you need to:

1. Add a reference in your project to `KukuNet.dll`;
1. Ensure the native shared library is available for your application when run.
The easiest way to ensure this is to copy the native shared library to the same directory where your application's executable is located.

#### Building Your Own NuGet Package

You can build your own NuGet package for Kuku by following the instructions in [NUGET.md](dotnet/nuget/NUGET.md).

## Using Kuku

### C++
The cuckoo hash table is represented by an instance of the `KukuTable` class. The
constructor of `KukuTable` takes as input the size of the hash table (`table_size`),
the size of the stash (`stash_size`), the number of hash functions (`loc_func_count`),
a seed for the hash functions (`loc_func_seed`), the maximum number of iterations allowed
in the insertion process (`max_probe`), and a value the hash table should contain to signal
an empty slot (`empty_item`). The hash table's items are restricted to 128-bit values
(`item_type`). These can be created from a pair of 64-bit integers using the `make_item`
function.

Once the table has been created, items can be inserted using the member function `insert`.
Items can be queried with the member function `query`, which returns a `QueryResult`
object. The `QueryResult` contains information about the location in the `KukuTable` where
the queried item was found, as well as the hash function that was used to eventually insert
it. `QueryResult` has an `operator bool()` defined which returns whether the queried item
was found in the hash table.

If Kuku fails to insert an item into the table or the stash, the `insert` function
returns `false`, and the leftover item is stored in a member variable that can be read with
`leftover_item()`. The same item cannot be inserted twice: `insert` returns `false` in
that case.

### .NET

Much like in the native library, the cuckoo hash table is represented by an instance of the
`KukuTable` class. The constructor of `KukuTable` takes as input a set of parameters,
defined by the `KukuTableParameters` class. The parameters contain the table size
(`TableSize`), the size of the stash (`StashSize`), the number of hash functions
(`LocFuncCount`), a seed for the hash functions (`LocFuncSeed`), the maximum number of
iterations allowed in the insertion process, and a value the hash table should contain to
signal an empty slot (`EmptyItem`). The hash table's items are restricted to 128-bit values.
These can be created by instantiating the `Item` class and setting its `Data` property to
a `ulong` array of size 2.

Once the table has been created, items can be inserted using the member function `Insert`.
Items can be queried with the member function `Query`, which returns a `QueryResult`
object. The `QueryResult` contains information about whether the queried item was
found in the hash table, the location where it was found, as well as the hash function that
was used to eventually insert it.

If `KukuTable.Insert` fails to insert an item into the table or the stash, it returns
`false`, and the leftover item is stored in a member variable that can be read
with `KukuTable.LastInsertFailItem()`. The same item cannot be inserted twice:
`Insert` returns `false` in that case.

## Contributing

For contributing to Kuku, please see [CONTRIBUTING.md](CONTRIBUTING.md).
146 changes: 76 additions & 70 deletions examples/example.cpp
@@ -5,6 +5,8 @@
#include <iomanip>
#include <iostream>
#include <kuku/kuku.h>
#include <sys/time.h>
#include <array>

using namespace std;
using namespace kuku;
@@ -14,7 +16,7 @@ ostream &operator<<(ostream &stream, item_type item)
stream << item[1] << " " << item[0];
return stream;
}

// print_table() is borrowed from Microsoft's Kuku example file
void print_table(const KukuTable &table)
{
table_size_type col_count = 8;
@@ -25,84 +27,88 @@ void print_table(const KukuTable &table)
<< i << ": " << setw(5) << get_high_word(item) << "," << get_low_word(item)
<< ((i % col_count == col_count - 1) ? "\n" : "\t");
}

cout << endl << endl << "Stash: " << endl;
for (table_size_type i = 0; i < table.stash().size(); i++)
{
const auto &item = table.stash(i);
cout << i << ": " << get_high_word(item) << "," << get_low_word(item) << endl;
// Our experiments use a stash size of 0, so print the stash only when one is configured
if(table.stash_size() > 0) {
cout << endl << endl << "Stash: " << endl;
for (table_size_type i = 0; i < table.stash().size(); i++)
{
const auto &item = table.stash(i);
cout << i << ": " << get_high_word(item) << "," << get_low_word(item) << endl;
}
cout << endl;
}
cout << endl;
}

double get_fill_rate(table_size_type, table_size_type, table_size_type, uint8_t, uint64_t, uint64_t, bool);

int main(int argc, char *argv[])
{
if (argc != 5)
{
cout << "Usage: ./example table_size stash_size loc_func_count max_probe" << endl;
cout << "E.g., ./example 256 2 4 100" << endl;

return 0;
}

auto table_size = static_cast<table_size_type>(atoi(argv[1]));
auto stash_size = static_cast<table_size_type>(atoi(argv[2]));
uint8_t loc_func_count = static_cast<uint8_t>(atoi(argv[3]));
item_type loc_func_seed = make_random_item();
uint64_t max_probe = static_cast<uint64_t>(atoi(argv[4]));
item_type empty_item = make_item(0, 0);

KukuTable table(table_size, stash_size, loc_func_count, loc_func_seed, max_probe, empty_item);

uint64_t round_counter = 0;
while (true)
{
cout << "Inserted " << round_counter * 20 << " items" << endl;
cout << "Fill rate: " << table.fill_rate() << endl;

char c;
cin.get(c);

for (uint64_t i = 0; i < 20; i++)
{
if (!table.insert(make_item(i + 1, round_counter + 1)))
{
cout << "Insertion failed: round_counter = " << round_counter << ", i = " << i << endl;
cout << "Inserted successfully " << round_counter * 20 + i << " items" << endl;
cout << "Fill rate: " << table.fill_rate() << endl;
const auto &item = table.leftover_item();
cout << "Leftover item: " << get_high_word(item) << "," << get_low_word(item) << endl << endl;
break;
}
}

print_table(table);

if (!table.is_empty_item(table.leftover_item()))
{
break;
/* Tables will have the following properties:
* 4000000 available indexes (table size)
* no stash
* bucket sizes between 1 and 8
* between 2 and 8 hash functions
* a limit of 10 swaps (max_probe) before an insertion fails, which
* indicates that the table would need to be rehashed
*/
table_size_type tableSizes = 4000000;
table_size_type stashSizes = 0;
table_size_type bucketSizes[] = {1, 2, 3, 4, 5, 6, 7, 8};
uint8_t hashFunctions[] = {2, 3, 4, 5, 6, 7, 8};
uint64_t swapLimit = 10;
uint64_t insertions = 4000000;

const int bucketListLength = sizeof(bucketSizes) / sizeof(bucketSizes[0]), hashListLength = sizeof(hashFunctions) / sizeof(hashFunctions[0]);
double fillRates[bucketListLength][hashListLength];
/*
cout << "Do not end after failure" << endl;
for(int bucketIndex = 0; bucketIndex < bucketListLength; ++bucketIndex) {
for(int hashIndex = 0; hashIndex < hashListLength; ++hashIndex) {
fillRates[bucketIndex][hashIndex] = get_fill_rate(tableSizes, stashSizes, bucketSizes[bucketIndex], hashFunctions[hashIndex], swapLimit, insertions, false);
}

round_counter++;
cout<< endl;
}

while (true)
{
cout << "Query item: ";
char hw[64];
char lw[64];
cin.getline(hw, 10, ',');
cin.getline(lw, 10, '\n');
item_type item = make_item(static_cast<uint64_t>(atoi(lw)), static_cast<uint64_t>(atoi(hw)));
QueryResult res = table.query(item);
cout << "Found: " << boolalpha << !!res << endl;
if (res)
{
cout << "Location: " << res.location() << endl;
cout << "In stash: " << boolalpha << res.in_stash() << endl;
cout << "Hash function index: " << res.loc_func_index() << endl << endl;
*/
cout << "End after need for rehash" << endl;
for(int bucketIndex = 0; bucketIndex < bucketListLength; ++bucketIndex) {
for(int hashIndex = 0; hashIndex < hashListLength; ++hashIndex) {
fillRates[bucketIndex][hashIndex] = get_fill_rate(tableSizes, stashSizes, bucketSizes[bucketIndex], hashFunctions[hashIndex], swapLimit, insertions, true);
}
cout<< endl;
}

return 0;
}
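// Builds a KukuTable with the given parameters, inserts up to `insertions` items,
// prints the fill rate, percentage of items inserted, and wall time, and returns the fill rate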
double get_fill_rate(
table_size_type table_size,
table_size_type stash_size,
table_size_type bucketSize,
uint8_t loc_func_count,
uint64_t max_probe,
uint64_t insertions,
bool endOnFail
) {
item_type loc_func_seed = make_random_item();
item_type empty_item = make_item(0, 0);

KukuTable table(table_size, stash_size, loc_func_count, loc_func_seed, max_probe, empty_item, bucketSize);

uint64_t insertions_failed = 0, succeeded = 0, inserted;
struct timeval time;
gettimeofday(&time, NULL);
double startTime = (double) time.tv_sec + (double) time.tv_usec * 0.000001;
for(inserted = 0; inserted < insertions; ++inserted) {
if (table.insert(make_item(inserted + 1, (uint64_t) rand()))) {
succeeded++;
} else {
insertions_failed++;
if(endOnFail) break;
}
}
gettimeofday(&time, NULL);
double totalTime = ((double) time.tv_sec + (double) time.tv_usec * 0.000001) - startTime;
cout << "Bucket Size : " << bucketSize << ", Hash Count : " << (int) loc_func_count << ", Fill Rates : " << table.fill_rate() << ", Percent Inserted : " << 100.0 * succeeded / insertions << ", Wall Time : " << totalTime << endl;
//if(loc_func_count == 2 && bucketSize == 3) {print_table(table);};
return table.fill_rate();
}