Hey, this is a bit unrelated, but I need some help running bff.
I am trying to run this command:
cargo run --release bff \
  --inputs /data/inputs \
  --output-directory /data/outputs \
  --expected-ngram-count \
  --fp-rate 0.01 \
  --min-ngram-size 13 \
  --max-ngram-size 13 \
  --filtering-threshold 0.8 \
  --remove-type old-both \
  --annotate
But I'm not sure what format my inputs should be in.
I think they should be json.gz files?
So I tried testing the command with a single input file named test.json.gz.
This file contains only 2 entries. The second entry's "text" field is just the first 1000 characters of the first entry's "text" field.
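For reference, a test file like this could be built roughly as in the sketch below. It assumes the expected layout is gzipped JSON lines (one JSON object per line, each with a "text" field), which is exactly the part I'm unsure about. The helper names write_test_file and read_records are made up for illustration and are not part of bff.

```python
import gzip
import json
import os
import tempfile


def write_test_file(path, text, prefix_len=1000):
    """Write two records; the second "text" is a prefix of the first.

    Assumption: one JSON object per line, gzip-compressed.
    """
    records = [{"text": text}, {"text": text[:prefix_len]}]
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records


def read_records(path):
    """Read the file back to check that every line parses as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    path = os.path.join(out_dir, "test.json.gz")
    # Placeholder text; a real test would use an actual document.
    write_test_file(path, "some long document text " * 200)
    for record in read_records(path):
        print(len(record["text"]))
```

Reading the file back like this at least confirms that each line is valid JSON with a "text" key, so a malformed file can be ruled out separately from the question of what bff expects.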
When I run the command, I get the following output:
Creating new bloom filter...
Bloom filter has size 120 B | FP Rate 0.009989874381662656
Files 0/1 [00:00:00/00:00:00] [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]Completed setup phase in 0 seconds
thread '' (2463081) panicked at src/main.rs:1556:50:
called Option::unwrap() on a None value
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Completed filtering all files in 0 seconds
After running, BFF sparsity was 0.051041666666666666
Completed full BFF run in 0 seconds
Stats: Saw 0 B of text | Removed NaN of them
Why is this happening? Am I missing something?