Errors in the test dataset?


For example, [this](https://github.com/princeton-nlp/intercode/blob/master/data/nl2bash/test_queries.json#L7-L8):

```
{
  "query": "Find all text files in the testbed directory and subdirectories and concatenate them into a single file",
  "gold": "find /testbed -type f -name '*.txt' -exec cat {} ; > /testbed/concatenated_text_files.txt"
}
```

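The `;` is not escaped, so the shell treats it as a command separator: `find` is left without a terminator for `-exec` and fails with `-exec requires an argument`, and the redirection runs as a separate, empty command. It should be written as `\;` or `';'`. Also, the output file is written into `/testbed` itself and matches `*.txt`, so even a corrected `find` will likely pick it up and try to concatenate the file into itself.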
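As a rough sketch of what a corrected gold command could look like, the snippet below escapes the terminator and writes the result outside the searched tree. The scratch directories and the output file name are assumptions made for illustration; the query itself does not say where the output should go.

```shell
# Hypothetical scratch directories standing in for /testbed (illustrative only).
testbed="$(mktemp -d)"
outdir="$(mktemp -d)"
mkdir -p "$testbed/sub"
printf 'a\n' > "$testbed/one.txt"
printf 'b\n' > "$testbed/sub/two.txt"

# Assumed output location: outside the searched tree, so find cannot match it.
out="$outdir/concatenated_text_files.txt"

# '\;' is passed to find as a literal ';' argument, which terminates -exec.
find "$testbed" -type f -name '*.txt' -exec cat {} \; > "$out"

# find's traversal order is unspecified, so sort before displaying.
sort "$out"
```

If the output has to stay inside `/testbed`, excluding it by name with `! -name concatenated_text_files.txt` would be another option.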

The evaluation just re-runs the same gold command and compares the outputs, so when both runs fail in the same way, the result is still marked as correct.

There are several other queries with similar issues. One common problem is that many queries don’t specify an output path, while the gold command assumes one. So the solver has to guess the output path and usually fails. For example, [this](https://github.com/princeton-nlp/intercode/blob/master/data/nl2bash/test_queries.json#L15-L16).

Another issue is [this](https://github.com/princeton-nlp/intercode/blob/master/data/nl2bash/test_queries.json#L35-L36):

```
{
  "query": "List all files in the directory /testbed/dir1 and sort them by size in human-readable format",
  "gold": "ls -lhS /testbed/dir1"
}
```

The evaluation script starts two containers and runs the gold command to compare its stdout with the solution's. The two containers are initialized at different times, so the modification times printed by `ls -l` presumably differ, and the two outputs will not match even if you provide the correct answer.
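The effect can be reproduced without containers. The sketch below is a minimal illustration, not the repository's evaluation code: two hypothetical scratch directories stand in for the two containers, and `touch -t` simulates the different initialization times.

```shell
# Two hypothetical directories standing in for the two containers.
a="$(mktemp -d)"
b="$(mktemp -d)"
for d in "$a" "$b"; do
    printf 'xx\n' > "$d/small.txt"
    printf 'xxxxxxxx\n' > "$d/large.txt"
done

# Simulate different initialization times with different modification times.
touch -t 202001010000 "$a/small.txt" "$a/large.txt"
touch -t 202101010000 "$b/small.txt" "$b/large.txt"

# The long listing includes modification times; the name-only listing does not.
long_a="$(ls -lhS "$a")"
long_b="$(ls -lhS "$b")"
names_a="$(ls -S "$a")"
names_b="$(ls -S "$b")"

if [ "$long_a" = "$long_b" ]; then echo "long listings match"; else echo "long listings differ"; fi
if [ "$names_a" = "$names_b" ]; then echo "name-only listings match"; else echo "name-only listings differ"; fi
```

Comparing only fields that do not depend on when the container was created (here, file names in size order) is one possible way to make such a query checkable.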

