56 changes: 29 additions & 27 deletions README.md
@@ -19,33 +19,35 @@
Both HyperLogLog and MinHash require a precision
parameter. The tables below give basic guidelines; `p` must satisfy
`HLLCounter.MIN_P = 4 <= p <= 18 = HLLCounter.MAX_P`.

#### HyperLogLog p @ 99.7% Confidence

|p | Relative Error|
|---:|---:|
|4 | 75%|
|5 | 65%|
|6 | 47%|
|7 | 32%|
|8 | 23%|
|9 | 16%|
|10 | 10%|
|11 | 8%|
|12 | 5%|
|13 | 4%|
|14 | 2.5%|
|15 | 2%|
|16 | 1.3%|
|17 | 1%|
|18 | 0.7%|
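
As a minimal sketch of how the HyperLogLog table above might be used, the following Python snippet picks the smallest `p` whose listed relative error meets a target. The error values are copied from the table; the names `HLL_RELATIVE_ERROR` and `smallest_p_for_error` are hypothetical and not part of this project.

```python
# Relative error (as a fraction) at 99.7% confidence, copied from the table above.
# The dictionary and function names are illustrative, not part of the library.
HLL_RELATIVE_ERROR = {
    4: 0.75, 5: 0.65, 6: 0.47, 7: 0.32, 8: 0.23,
    9: 0.16, 10: 0.10, 11: 0.08, 12: 0.05, 13: 0.04,
    14: 0.025, 15: 0.02, 16: 0.013, 17: 0.01, 18: 0.007,
}


def smallest_p_for_error(target: float) -> int:
    """Return the smallest p whose tabulated relative error is <= target."""
    for p in sorted(HLL_RELATIVE_ERROR):
        if HLL_RELATIVE_ERROR[p] <= target:
            return p
    raise ValueError(
        f"no p in {min(HLL_RELATIVE_ERROR)}..{max(HLL_RELATIVE_ERROR)} "
        f"reaches a relative error of {target}"
    )


if __name__ == "__main__":
    print(smallest_p_for_error(0.05))  # 12, per the table
```

A target tighter than the last row (0.7%) raises `ValueError`, since the table stops at `p = 18`.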

#### MinHash k @ 99% Confidence

|**Relative Error** (rows) / **Intersection Size** (columns) | 0.01% | 0.1% | 1.0% | 5.0% | 10.0%|
|:------------------|--------------------------:|-------:|-----:|------:|-----:|
|100% | 90000 | 9000 |900 | 170 |75|
|50% | 313334 | 31334 |3134 | 587 |280|
|25% | - | 116800 |11520 | 2208 |1040|
|10% | - | - |68455 | 13128 |6210|
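
For illustration only, the MinHash table above can also be held as a small lookup in Python. The `k` values are copied from the table, and cells shown as `-` are left out. The names `MINHASH_K`, `INTERSECTION_SIZES` and `lookup_k` are hypothetical, and this sketch does not reproduce how `minhash_k.py` computes the values.

```python
from typing import Optional

# Column labels of the MinHash table above (intersection size).
INTERSECTION_SIZES = ("0.01%", "0.1%", "1.0%", "5.0%", "10.0%")

# k at 99% confidence, keyed by relative error, then by intersection size.
# Cells shown as "-" in the table are omitted.
MINHASH_K = {
    "100%": {"0.01%": 90000, "0.1%": 9000, "1.0%": 900, "5.0%": 170, "10.0%": 75},
    "50%": {"0.01%": 313334, "0.1%": 31334, "1.0%": 3134, "5.0%": 587, "10.0%": 280},
    "25%": {"0.1%": 116800, "1.0%": 11520, "5.0%": 2208, "10.0%": 1040},
    "10%": {"1.0%": 68455, "5.0%": 13128, "10.0%": 6210},
}


def lookup_k(relative_error: str, intersection_size: str) -> Optional[int]:
    """Return the tabulated k, or None where the table shows "-"."""
    if relative_error not in MINHASH_K:
        raise KeyError(f"unknown relative error row: {relative_error}")
    if intersection_size not in INTERSECTION_SIZES:
        raise KeyError(f"unknown intersection size column: {intersection_size}")
    return MINHASH_K[relative_error].get(intersection_size)


if __name__ == "__main__":
    print(lookup_k("25%", "1.0%"))   # 11520, per the table
    print(lookup_k("10%", "0.01%"))  # None: the table has no value here
```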

This MinHash k table can be generated by using `minhash_k.py` in the `utils`
directory. For now, the only requirement is scipy, which you can install with
@@ -63,4 +65,4 @@
```
MinHash k: 4800
Error at k: 0.25
```

Additional information is available with `./utils/minhash_k.py --help`.