Some gold commands in the dataset are themselves wrong. For example, this one:
```json
{
  "query": "Find all text files in the testbed directory and subdirectories and concatenate them into a single file",
  "gold": "find /testbed -type f -name '*.txt' -exec cat {} ; > /testbed/concatenated_text_files.txt"
}
```
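The `;` is not escaped, so the shell treats it as a command separator: find never receives the terminator for `-exec` and fails with "-exec requires an argument", and the trailing `> /testbed/concatenated_text_files.txt` runs as a separate command that only creates an empty file. It should be written as `\;` or `';'`. Even with that fixed, the output file is written into /testbed itself and matches `*.txt`, so find will likely pick it up and try to append it to itself.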
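A minimal sketch of a corrected command, assuming a temporary directory stands in for /testbed (the file names are hypothetical). It escapes the terminator and excludes the output file by name; writing the output outside the searched tree would work just as well.

```shell
#!/bin/sh
# Sketch: a corrected version of the gold command, run in a throwaway
# directory. Assumption: "$testbed" stands in for /testbed and the input
# file names are made up. Remove "$testbed" afterwards.
set -eu
testbed=$(mktemp -d)

mkdir -p "$testbed/sub"
printf 'alpha\n' > "$testbed/a.txt"
printf 'beta\n' > "$testbed/sub/b.txt"
printf 'ignored\n' > "$testbed/c.log"

out="$testbed/concatenated_text_files.txt"

# '\;' reaches find as a literal ';' instead of ending the shell command.
# The shell creates "$out" before find starts and the name matches '*.txt',
# so '! -name' keeps find from passing the output file to cat.
find "$testbed" -type f -name '*.txt' ! -name concatenated_text_files.txt \
  -exec cat {} \; > "$out"

# Prints the contents of a.txt and sub/b.txt in find's traversal order.
cat "$out"
```

Using `-exec cat {} +` instead of `\;` would pass many files to a single `cat` call; the exclusion is still needed either way.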
The evaluation simply re-runs the same gold command and compares outputs, so a submission that makes the same mistake produces the same (empty) output and is marked correct. Two errors count as a match.
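A sketch of how this plays out, assuming the checker compares only stdout (a simplification; the real evaluation script is not reproduced here). It also shows that checking the exit status alone would not help, because the command list ends with a bare redirection that succeeds.

```shell
#!/bin/sh
# Sketch only: models a checker that compares the stdout of the gold command
# with the stdout of a candidate. Assumption: a temp dir stands in for /testbed.
set -u
testbed=$(mktemp -d)
printf 'alpha\n' > "$testbed/a.txt"

# Dataset-style gold command with the unescaped ';'.
gold="find $testbed -type f -name '*.txt' -exec cat {} ; > $testbed/out.txt"
# A candidate that repeats the same mistake.
candidate="$gold"

gold_out=$(sh -c "$gold" 2>/dev/null)
gold_status=$?
cand_out=$(sh -c "$candidate" 2>/dev/null)

# find fails before printing anything, so both stdouts are empty and match.
if [ "$gold_out" = "$cand_out" ]; then verdict=match; else verdict=mismatch; fi
echo "verdict: $verdict"

# The last command in the list is '> out.txt' on its own, which succeeds,
# so the overall exit status is 0 even though find failed.
echo "gold exit status: $gold_status"

# Only the requested file shows the bug: it exists but is empty.
if [ -s "$testbed/out.txt" ]; then echo "out.txt has content"; else echo "out.txt is empty"; fi
```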
Several other queries have similar issues. A common one is that the query does not specify an output path while the gold command assumes one, so the solver has to guess the path and usually fails. For example, this.
Another issue is this:
```json
{
  "query": "List all files in the directory /testbed/dir1 and sort them by size in human-readable format",
  "gold": "ls -lhS /testbed/dir1"
}
```
The evaluation script starts two containers and runs the gold command so it can compare its stdout with that of the solution. The two containers are initialized at different times, so the modification times printed by `ls -l` differ and the outputs will not match even when the answer is correct.
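A sketch that mimics the two containers with two directories holding identical files but different modification times, plus one hypothetical normalization (keeping only the size and name columns) that makes the listings comparable. The directory layout and the `normalize` helper are assumptions for illustration, not part of the evaluation script.

```shell
#!/bin/sh
# Sketch: identical files, different mtimes, as if created by two containers
# initialized at different times. Assumption: file names here are made up.
set -u

make_dir() {
  d=$(mktemp -d)
  head -c 2048 /dev/zero > "$d/big.bin"
  printf 'small\n' > "$d/small.txt"
  # touch -t takes [[CC]YY]MMDDhhmm; set both files to the given time.
  touch -t "$1" "$d/big.bin" "$d/small.txt"
  echo "$d"
}

a=$(make_dir 202001010000)
b=$(make_dir 202406151230)

raw_a=$(ls -lhS "$a")
raw_b=$(ls -lhS "$b")

# Hypothetical normalization: skip the "total" line and keep only the size
# ($5) and name ($NF) columns. Breaks on file names containing spaces.
normalize() { ls -lhS "$1" | awk 'NR > 1 { print $5, $NF }'; }
norm_a=$(normalize "$a")
norm_b=$(normalize "$b")

if [ "$raw_a" = "$raw_b" ]; then echo "raw: same"; else echo "raw: different"; fi
if [ "$norm_a" = "$norm_b" ]; then echo "normalized: same"; else echo "normalized: different"; fi
```

Dropping columns also discards permissions and ownership, which may matter for other queries; pinning file mtimes when the containers are built would be an alternative.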