demo/tinytettuce.ipynb errors #29

@FrejaThoresen

Hi,

Running the notebook demo/tinytettuce.ipynb creates a dataset file, train_data.json. However, when I run the following command as described in the notebook:

python scripts/train.py \
    --ragtruth-path train_data.json \
    --model-name jhu-clsp/ettin-encoder-68m \
    --output-dir output/hallucination_detector \
    --batch-size 4 \
    --epochs 6 \
    --learning-rate 1e-5

I get the following error:

Traceback (most recent call last):
  File "LettuceDetect/scripts/train.py", line 155, in <module>
    main()
  File "LettuceDetect/scripts/train.py", line 98, in main
    ragtruth_data = HallucinationData.from_json(json.loads(ragtruth_path.read_text()))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/lettucedetect/datasets/hallucination_dataset.py", line 53, in from_json
    samples=[HallucinationSample.from_json(sample) for sample in json_dict],
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/lettucedetect/datasets/hallucination_dataset.py", line 38, in from_json
    dataset=json_dict["dataset"],
            ~~~~~~~~~^^^^^^^^^^^
KeyError: 'dataset'

If I add the key 'dataset' to each sample, a new error arises for the missing key 'language'. If 'language' is also added to train_data.json, a different error arises:

  File "LettuceDetect/scripts/train.py", line 155, in <module>
    main()
  File "LettuceDetect/scripts/train.py", line 124, in main
    train_loader = DataLoader(
                   ^^^^^^^^^^^
  File "LettuceDetect/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 388, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/.venv/lib/python3.12/site-packages/torch/utils/data/sampler.py", line 156, in __init__
    raise ValueError(
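For reference, the workaround described above can be sketched roughly as follows. This is only an illustration, not the project's own fix: the helper names `patch_samples` and `count_splits`, the placeholder values "ragtruth" and "en", the output file name, and the existence of a 'split' field on each sample are all assumptions. The per-split count is included because torch's RandomSampler raises a ValueError when it is given an empty dataset, which is one possible explanation for the second error; the truncated traceback does not confirm it.

```python
import json
from collections import Counter
from pathlib import Path

# Keys the first traceback reports as missing. The values are placeholders
# chosen for illustration (assumptions), not values the library requires.
ASSUMED_DEFAULTS = {"dataset": "ragtruth", "language": "en"}


def patch_samples(samples, defaults=ASSUMED_DEFAULTS):
    """Return copies of the samples with any missing keys filled from defaults."""
    # Keys already present in a sample take precedence over the defaults.
    return [{**defaults, **sample} for sample in samples]


def count_splits(samples):
    """Count samples per value of the (assumed) 'split' field."""
    return Counter(sample.get("split", "<missing>") for sample in samples)


if __name__ == "__main__" and Path("train_data.json").exists():
    # The traceback iterates the parsed JSON directly, so a top-level list
    # of sample dicts is expected here.
    samples = json.loads(Path("train_data.json").read_text())
    patched = patch_samples(samples)
    # A count of zero for the training split would leave the DataLoader empty.
    print(count_splits(patched))
    # Hypothetical output file name, so the original file is left untouched.
    Path("train_data_patched.json").write_text(json.dumps(patched, indent=2))
```

If the printed counts show no samples in the split that scripts/train.py trains on, the DataLoader would be built over an empty dataset.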

Do you have an updated working example with tinylettuce?

lettucedetect 0.1.8
python 3.12.3

Cheers,
Freja
