Cannot accurately detect text/plain. Tika (see branch) would perform better, but it is slower (38 sec vs. 6 sec).