This repository includes automated validation for all JSON files in the data/ directory to ensure they follow the expected structure.
The validator recognizes two types of JSON structures:
The first is used for individual character voicelines:

```json
{
  "voiceline_id": "string",
  "timestamp": "string",
  "segments": [
    {
      "start": number,
      "end": number,
      "text": "string",
      "part": number
    }
  ]
}
```

The second is used for basic transcription files:
```json
{
  "file": "string",
  "segments": [
    {
      "start": number,
      "end": number,
      "text": "string"
    }
  ]
}
```

To validate all JSON files in the `data/` directory:
```bash
python3 validate_json.py
```

To validate JSON files in a different directory:
```bash
python3 validate_json.py /path/to/data/directory
```

The validation runs automatically on:
- Every push to the `main` or `master` branch that modifies JSON files
- Every pull request that modifies JSON files
The workflow will fail if any JSON file:
- Is not valid JSON
- Does not match any of the expected structures
- Has missing required fields
- Has fields with incorrect data types
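The checks listed above can be pictured with a minimal Python sketch. It is not the repository's `validate_json.py`; `validate_data`, `validate_file`, and `validate_dir` are hypothetical names, and the sketch assumes that extra, unlisted fields are allowed, that `number` means any JSON integer or float (booleans excluded), and that files are searched recursively under the target directory.

```python
# Hypothetical sketch, not the repository's validate_json.py.
import json
from pathlib import Path

# JSON numbers decode to int or float; bool is rejected separately below.
NUMBER = (int, float)

# Required fields and types per structure: (top-level spec, per-segment spec).
# Assumption: extra, unlisted fields are allowed.
STRUCTURES = {
    "voiceline": (
        {"voiceline_id": str, "timestamp": str, "segments": list},
        {"start": NUMBER, "end": NUMBER, "text": str, "part": NUMBER},
    ),
    "transcription": (
        {"file": str, "segments": list},
        {"start": NUMBER, "end": NUMBER, "text": str},
    ),
}


def check_fields(obj, spec, where):
    """Report missing fields and wrong types in a single JSON object."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    errors = []
    for name, expected in spec.items():
        if name not in obj:
            errors.append(f"{where}: missing required field '{name}'")
        # bool is a subclass of int in Python, so true/false must be rejected explicitly.
        elif isinstance(obj[name], bool) or not isinstance(obj[name], expected):
            errors.append(f"{where}: '{name}' has wrong type {type(obj[name]).__name__}")
    return errors


def validate_data(data):
    """Return [] if data matches any known structure, else every mismatch found."""
    report = []
    for kind, (top_spec, segment_spec) in STRUCTURES.items():
        errors = check_fields(data, top_spec, "root")
        if not errors:  # top level is fine, so data["segments"] is a list
            for i, segment in enumerate(data["segments"]):
                errors += check_fields(segment, segment_spec, f"segments[{i}]")
        if not errors:
            return []  # matched this structure
        report += [f"[{kind}] {e}" for e in errors]
    return report


def validate_file(path):
    """Parse one file and prefix each problem with its path."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return [f"{path}: invalid JSON ({exc})"]
    return [f"{path}: {e}" for e in validate_data(data)]


def validate_dir(directory="data"):
    """Validate every *.json file under directory (recursive search is an assumption)."""
    return [msg for path in sorted(Path(directory).rglob("*.json"))
            for msg in validate_file(path)]
```

An empty list from `validate_dir()` means every file matched one of the structures; otherwise each message names the offending file, the structure it was compared against, and the field at fault. Reporting mismatches against every structure, rather than only the first, makes it easier to see which structure a broken file was meant to follow.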
When adding or modifying JSON files, make sure to:
- Follow one of the supported structures above
- Ensure all required fields are present
- Use the correct data types for all fields
- Run `python3 validate_json.py` locally before submitting a PR
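If you generate voiceline files from a script rather than by hand, routing them through a small helper keeps field names and types in line with the voiceline structure. This is a hypothetical sketch: `build_voiceline` and `write_json` are illustrative names not found in this repository, segments are assumed to arrive as `(start, end, text, part)` tuples, and `part` is assumed to be a whole number.

```python
# Hypothetical helper, not part of this repository.
import json
from pathlib import Path


def build_voiceline(voiceline_id, timestamp, segments):
    """Return a dict in the voiceline structure.

    segments: iterable of (start, end, text, part) tuples.
    Values are coerced so numeric fields never end up as strings.
    Assumption: part is a whole number.
    """
    return {
        "voiceline_id": str(voiceline_id),
        "timestamp": str(timestamp),
        "segments": [
            {"start": float(start), "end": float(end), "text": str(text), "part": int(part)}
            for start, end, text, part in segments
        ],
    }


def write_json(path, data):
    """Write data as UTF-8, pretty-printed JSON, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
```

For example, `write_json("data/example_voiceline.json", build_voiceline("example_001", "00:00:01", [(0, 1.5, "Hello there.", 1)]))` writes a file (the path and values are made up) that you can then check with `python3 validate_json.py`.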
If the validation fails in CI, the error message will indicate which file(s) have issues and what needs to be fixed.