A very basic Bloom filter implementation in Elisp.
From wikipedia 1:
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set" or "definitely not in set".
I wrote it to practice Elisp and cement the notion of this filter in my overflowing lizard brain.
The small test suite passes. But I (nor anybody else AFAIK) haz not used this for a real app yet.
Needs more hash functions for real use. Only has 2 now, sxhash and one based on Jenkins hash.
Hash functions, number of hash functions, and size of the filter should be taken into consideration. 2
;; List of hash functions. See below [?] for refs to which
;; ones are best.
(defvar bfilter-hashers '(sxhash; shipped w/emacs
(lambda (in)
(bfilter--jenkins-hash in (length in)))))
;; The bit vector data structure size.
(defvar bfilter-bv-size 1024)The included el implementation results don't match the one in the C version on the wikipedia page (but is probably ok for non-production use) but the Emacs 25 dynamic module feature can solve it.3
;; Construct/return bloom filter of size bfilter-size.
(bfilter-init)
;; Given a string key INPUT, set the appro bloom filter bit vector BF slots.
;; Return the slot indexes set (can be ignored).
(bfilter-set input-key bf)
;; Given key as INPUT, return t if all associated hashed slots in BF equal t, else nil.
(bfilter-isset? input-key bf)(setq my-example-app-page-bf (bfilter-init))
(-let ((user "bob.dobbs@backleft.cult"))
;; haz user seen this page?
(if (not (bfilter-isset? user my-example-app-page-bf))
(progn
(message "do something useful here")
(bfilter-set user my-example-app-page-bf))))The tests can be run like so:
CASK_EMACS=/usr/local/Cellar/emacs/25.0.94/bin/emacs cask emacs --batch -l ert --script test/test-bfilter.el -f ert-run-tests-batch-and-exitGNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007