Description
The CodeBert-Preprocessor fails to preprocess Java files that contain character sequences starting with "\",
such as "\u00".
The resulting JSONL contains an unescaped "\" and cannot be parsed.
To Reproduce
- Change into the CodeBert preprocessing folder
- Add the content of error.txt to the example Java file
- Run the preprocessing on the example Java file using docker-compose
- Inspect altered_java.jsonl for \u00 sequences
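The inspection step can also be automated. The following is a minimal sketch, not part of the preprocessor: the helper name `find_invalid_jsonl_lines` and the `code` field in the sample records are assumptions made for illustration. It parses each line on its own and reports the lines that are not valid JSON.

```python
import json


def find_invalid_jsonl_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


# Hypothetical sample: the second record contains a raw, unescaped \u00.
sample = [
    r'{"code": "int x = 1;"}',
    r'{"code": "char c = \u00;"}',
]
invalid = find_invalid_jsonl_lines(sample)

# To check the real output file instead (path assumed from the steps above):
# with open("altered_java.jsonl", encoding="utf-8") as fh:
#     print(find_invalid_jsonl_lines(fh))
```

Only the second sample line is reported, because `\u` must be followed by four hexadecimal digits in JSON.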
Expected behavior
The backslash should be properly escaped, so that \u00 is written as \\u00 in the JSONL output.
In any case, the resulting JSON must be valid.
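One possible way to get this behavior, assuming the preprocessor currently assembles the JSONL lines by hand, is to let a JSON serializer do the escaping. This is a sketch only: the function `to_jsonl_line` and the `code` field are hypothetical names, and the actual preprocessor may be structured differently.

```python
import json


def to_jsonl_line(record):
    """Serialize one record as a single JSONL line; json.dumps escapes backslashes and quotes."""
    return json.dumps(record, ensure_ascii=False)


# Hypothetical Java snippet containing a backslash sequence like the one in this issue.
java_snippet = r"char c = '\u00';"
line = to_jsonl_line({"code": java_snippet})

# The line parses again and yields the original source text unchanged.
restored = json.loads(line)["code"]
```

Because the serializer writes the backslash as `\\`, the round trip returns the Java source exactly as it was read.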
Additional context
This was needed for the GridExperiment and is currently worked around by removing the three data points that contain a \u from the test data.