I recently learned that the TEI XML format is becoming popular in the linguistics community. In this format, texts are stored in small chunks with associated metadata (e.g. the speaker) and, sometimes, POS tags.
See:
https://tei-c.org/
https://tei-c.org/activities/projects/
https://dracor.org/
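
To make the "small chunks with metadata" idea concrete, here is a rough sketch of reading such a file with Python's standard library. The sample document is made up for illustration (it is not taken from any real corpus), and it assumes speeches are encoded as `<sp who="...">` elements with tokens as `<w pos="...">`; real TEI files vary a lot in which elements and attributes they use, so treat the element names and the `extract_speeches` helper as assumptions.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample; real corpora are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <sp who="#alice">
        <speaker>Alice</speaker>
        <p><w pos="INTJ">Hello</w> <w pos="ADV">there</w></p>
      </sp>
      <sp who="#bob">
        <speaker>Bob</speaker>
        <p><w pos="INTJ">Hi</w></p>
      </sp>
    </body>
  </text>
</TEI>"""


def extract_speeches(xml_text):
    """Return one dict per <sp> chunk: speaker reference plus (token, POS) pairs."""
    root = ET.fromstring(xml_text)
    speeches = []
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        # Assumption: tokens are marked up as <w> with a pos attribute.
        tokens = [(w.text, w.get("pos")) for w in sp.iterfind(".//tei:w", TEI_NS)]
        speeches.append({"who": sp.get("who"), "tokens": tokens})
    return speeches


speeches = extract_speeches(SAMPLE)
for speech in speeches:
    print(speech["who"], speech["tokens"])
```

Files without token-level markup would simply yield empty token lists here; in that case one would read the text of the `<p>` or `<l>` children instead.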