Skip to content

carypenn/Bloomfilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bloomfilter

This is an implementation of a Bloom Filter for Ruby.

In order to confirm the false positives, the filter includes an array of the words in the file. This is not needed by the algorithm and does effect the space requirements. It can easily removed to make this implementation more stream-lined.

Initial implementation used 4 keys per value using a SHA2 digest and spliting it into 4 16-char strings and using each of those for a k. This has been tested using the wordlist file included with 10,000,000 5-char strings and has returned 0 false positives.

Test results are as follows:

94 positive searches out of 100000 0 false positives values out of 100000

920 positive searches out of 1000000 0 false positives values out of 1000000

9014 positive searches out of 10000000 0 false positives values out of 10000000

This is expected since the hash is not constrained thus is can grow to have as many entries as needed.

A second attempt introduced a BitArray which constrains the size of the hash and means that all keys MUST be an integer. This restriction meant that more false positives would be expected but we should get much faster performance. This proved to be true if the size of the BitArray was smaller than the size of the list of words. Once these reversed, the results were very positive for this implementation and it is much faster than the previous one.

Results:

BloomFilter > ruby bloom_filter_test.rb Array Size = 100000 5807 positive searches out of 10000 5796 false positives values out of 10000

Array Size = 100000000 13 positive searches out of 10000 0 false positives values out of 10000

Array Size = 2000000 925 positive searches out of 10000 918 false positives values out of 10000

Array Size = 20000000 6 positive searches out of 10000 0 false positives values out of 10000

Requirement: BitArray gem from https://github.com/peterc/bitarray

Description: Initiazer takes a path to the work library you want to load in and a size for the Bit Array then calling #contains? with an agrument of the value you want to check for will tests for that items inclusion in the filter.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages