Running on test json.gz file #118

@Aditya-Manjunatha

Hey, this is a bit unrelated, but I need some help running bff.
I am trying to run this command:
cargo run --release bff \
  --inputs /data/inputs \
  --output-directory /data/outputs \
  --expected-ngram-count \
  --fp-rate 0.01 \
  --min-ngram-size 13 \
  --max-ngram-size 13 \
  --filtering-threshold 0.8 \
  --remove-type old-both \
  --annotate

But I'm not sure what format my inputs should be in.
I think they should be json.gz files?

So I tried testing with a single input file named test.json.gz in the command.

This file contains only 2 entries. The second entry's "text" field is just the first 1000 characters of the first entry's "text" field.
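For reference, a file like the one described could be built roughly as follows. This is only a sketch: it assumes one JSON object per line (JSON Lines) inside the gzip file, which may or may not be the layout bff expects, and the helper names `write_test_file` and `read_records` are hypothetical, not part of bff.

```python
import gzip
import json


def write_test_file(path, full_text, prefix_len=1000):
    """Write two JSON records, one per line, to a gzip-compressed file.

    The second record's "text" is the first `prefix_len` characters
    of the first record's "text". The one-object-per-line layout is
    an assumption, not a confirmed bff requirement.
    """
    records = [
        {"text": full_text},
        {"text": full_text[:prefix_len]},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_records(path):
    """Read the file back and parse each non-empty line as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    sample = "some long document text " * 200  # placeholder text
    write_test_file("test.json.gz", sample)
    for record in read_records("test.json.gz"):
        print(len(record["text"]))
```

Reading the file back like this is a quick way to confirm that it decompresses and that every line parses as a JSON object with a "text" field before handing it to bff.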

When I run the command, I get the following output:
Creating new bloom filter...
Bloom filter has size 120 B | FP Rate 0.009989874381662656
Files 0/1 [00:00:00/00:00:00] [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]Completed setup phase in 0 seconds

thread '<unnamed>' (2463081) panicked at src/main.rs:1556:50:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Completed filtering all files in 0 seconds
After running, BFF sparsity was 0.051041666666666666
Completed full BFF run in 0 seconds
Stats: Saw 0 B of text | Removed NaN of them

Why is this happening? Am I missing something?
