Errors in the test dataset? #28

@caicre

For example, this:

{
  "query": "Find all text files in the testbed directory and subdirectories and concatenate them into a single file",
  "gold": "find /testbed -type f -name '*.txt' -exec cat {} ; > /testbed/concatenated_text_files.txt"
}

The ; is not escaped, so the shell treats it as a command separator: find never receives the terminator for -exec and fails with "-exec requires an argument". It should be written as \; or ';'. Also, the output file is written into /testbed itself and matches *.txt, so even with the terminator fixed, find will likely pick it up and try to append the file to itself.
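A minimal sketch of a corrected command, assuming a scratch directory in place of /testbed and an output file outside the searched tree. The variable names testbed and out and the sample files are made up for illustration; the dataset may well intend a different output path.

```shell
# Hypothetical scratch directory standing in for /testbed.
testbed=$(mktemp -d)
mkdir -p "$testbed/dir1"
printf 'alpha\n' > "$testbed/a.txt"
printf 'beta\n'  > "$testbed/dir1/b.txt"

# Assumption: write the result outside the searched tree so find cannot match it.
out=$(mktemp)

# Escaped terminator: find itself receives ';' as the end of the -exec command.
find "$testbed" -type f -name '*.txt' -exec cat {} \; > "$out"

# find's traversal order is not guaranteed, so sort before inspecting the result.
sort "$out"
```

Keeping the output outside the tree is only one option; excluding the output file by name in the find expression would also avoid the self-match.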

The evaluation just re-runs the same gold command and compares the outputs, so two runs that fail in the same way are marked as correct.
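The effect can be reproduced with a rough sketch like the following. It is not the benchmark's actual evaluation script: run_gold, the scratch directories, and the sample file are hypothetical, and /testbed is replaced by "$1" so the command can run anywhere.

```shell
# The gold command as written in the dataset, with /testbed replaced by "$1".
gold='find "$1" -type f -name "*.txt" -exec cat {} ; > "$1/concatenated_text_files.txt"'

run_gold() {
  # Run the command string in a fresh shell and discard find's error message,
  # then print whatever ended up in the output file.
  sh -c "$gold" sh "$1" 2>/dev/null
  cat "$1/concatenated_text_files.txt"
}

# Two scratch directories standing in for the two containers.
a=$(mktemp -d)
b=$(mktemp -d)
printf 'alpha\n' > "$a/a.txt"
printf 'alpha\n' > "$b/a.txt"

out_a=$(run_gold "$a")
out_b=$(run_gold "$b")

# Both runs fail the same way and leave an empty file, so the outputs agree.
if [ "$out_a" = "$out_b" ]; then verdict=match; else verdict=mismatch; fi
echo "$verdict (output length: ${#out_a})"
```

Note that the fresh shell still exits with status 0 here, because the last command it runs is the bare redirection, so checking only the exit status of the whole command line would not catch the failure either.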

There are several other queries with similar issues. One common problem is that many queries don’t specify an output path, while the gold command assumes one. So the solver has to guess the output path and usually fails. For example, this.

Another issue is this:

{
  "query": "List all files in the directory /testbed/dir1 and sort them by size in human-readable format",
  "gold": "ls -lhS /testbed/dir1"
}

The evaluation script starts two containers and runs the gold command to compare its stdout with the solution's. The two containers are initialized at different times, so the modification times that ls -l prints presumably differ, and the two outputs will not match even if you provide the correct answer.
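A small sketch of why the long listing is unstable, assuming two scratch directories in place of the two containers and using touch to simulate the different initialization times. The file names are made up, and comparing ls -S output is just one possible normalization, not something the benchmark does.

```shell
# Two directories with identical contents, standing in for the two containers.
a=$(mktemp -d)
b=$(mktemp -d)
for d in "$a" "$b"; do
  printf 'x\n' > "$d/small.txt"
  printf 'xxxxxxxxxx\n' > "$d/big.txt"
done

# Simulate a container initialized at a different time by backdating one copy.
touch -t 202001010000 "$b/small.txt" "$b/big.txt"

# The long listings differ because of the date column.
if [ "$(ls -lhS "$a")" = "$(ls -lhS "$b")" ]; then raw=same; else raw=different; fi

# Comparing only the names, in size order, ignores the timestamps.
if [ "$(ls -S "$a")" = "$(ls -S "$b")" ]; then names=same; else names=different; fi

echo "long listing: $raw, names by size: $names"
```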
