Files for reproduction are missing

I would first like to express my appreciation for this remarkable work. However, I ran into some problems when trying to reproduce the experiments. The details are as follows:

When I ran the code for the first time, I found that the example file referenced in **prompt_tuning.sh** was missing.

<img width="380" alt="pt" src="https://github.com/LogIntelligence/LogPPT/assets/140611148/fe94e0ac-333a-46f7-b617-4eca6b9e26e2">

Then, I followed the **Usage** section in readme.md and substituted the log files with the "${dataset}_2k.log_structured.csv" files. However, each result file contains 2000 lines of parsed logs, meaning that the model parsed every log message in the "2k.log_structured.csv" files for benchmarking.

The 32-shot examples for model training are also selected from the 2k log files of each dataset (as shown in **few_shot_sampling.py**):

<img width="485" alt="shot1" src="https://github.com/LogIntelligence/LogPPT/assets/140611148/3b25463e-c699-4a25-9dc3-cee5519c9624">

<img width="596" alt="shot2" src="https://github.com/LogIntelligence/LogPPT/assets/140611148/03802a2c-abcf-4f57-95a7-4de746a087c6">

I think that evaluating on the "${dataset}_2k.log_structured.csv" files could lead to data leakage, because the sampled training examples would also appear in the benchmark data. Could you upload the example files for log parsing used in your experiments so that I can better reproduce the results? Thanks for your time!
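To make the concern concrete, here is a minimal sketch of how the overlap could be measured and removed before benchmarking. It is not taken from the LogPPT code: the helper names `load_rows` and `split_held_out` are hypothetical, the toy rows are made up for illustration, and I am assuming the structured CSV has a `Content` column holding the raw log message.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def split_held_out(rows, shot_contents, key="Content"):
    """Return only the rows whose log message was NOT used as a few-shot example.

    `key` is an assumption: the column that stores the raw log message.
    """
    seen = set(shot_contents)
    return [row for row in rows if row[key] not in seen]


# Toy stand-in for a "${dataset}_2k.log_structured.csv" file (made-up rows).
structured = load_rows(
    "LineId,Content,EventTemplate\n"
    "1,Connection from 10.0.0.1 closed,Connection from <*> closed\n"
    "2,Session 42 opened,Session <*> opened\n"
    "3,Connection from 10.0.0.2 closed,Connection from <*> closed\n"
)

# Toy stand-in for the sampled few-shot examples.
shots = ["Connection from 10.0.0.1 closed"]

held_out = split_held_out(structured, shots)
leaked = len(structured) - len(held_out)
print(f"{leaked} of {len(structured)} benchmark lines were also training examples")
```

Note that this only removes exact duplicates of the sampled messages. Lines that share a template with a training example would still remain in the held-out set, so it is a lower bound on the overlap rather than a full fix.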
