Switch filter and verify function

Currently using a Scalable bloom filter from `github.com/tylertreat/BoomFilters` but suspect it might not be performing optimally/correctly?
- Amount of hash functions doesn't seem to fit with [optimal amount](https://sagi.io/bloom-filters-for-the-perplexed/#arbitrarily-small-probability-of-error-false-positive)
- Amount of `cells` doesn't seem to fit with optimal amount (see above)
- Default hash function used [results in inaccurate false probability rate](https://github.com/tylertreat/BoomFilters/pull/1)
- Repo seemed to be a little messy in general: some models not documented, models missing common features found in other models, unhandled errors and use of `unsafe` (impressions from the time I wrote feedloggr v3 and first found `BoomFilters`)

Might want to try to implement a ~~Stable~~ Scalable Bloom filter myself instead:
- Stable Bloom won't keep growing forever (sort of deletes items from itself to stay with a fixed size)
- Easier to track, verify and document correctness
- Performance less relevant vs correctness
- One less unknown dependency

Or there might be another filter model that fits better? Some kind of Cuckoo filter?
Think I found the original paper for a scalable bloom filter though, doi:10.1016/j.ipl.2006.10.007

Investigate:
- https://github.com/bits-and-blooms/bloom

References:
- https://en.wikipedia.org/wiki/Bloom_filter
- https://ricardoanderegg.com/posts/understanding-bloom-filters/
- https://sagi.io/bloom-filters-for-the-perplexed/
- http://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf
- https://bdupras.github.io/filter-tutorial/
- https://llimllib.github.io/bloomfilter-tutorial/
- https://www.waitingforcode.com/big-data-algorithms/stable-bloom-filter/read
- https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/
- https://blog.cloudflare.com/when-bloom-filters-dont-bloom/
- https://news.ycombinator.com/item?id=33318788
- https://austinhenley.com/blog/challengingalgorithms.html seems to have some good examples? (also might want to save the article itself)
- https://gopiandcode.uk/logs/log-bloomfilters-debunked.html seems to have "the most correct formula" for calculating false positive rates?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Switch filter and verify function #4

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Switch filter and verify function #4

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions