would be much better than having 12M files...
I will probably try this, since the number of files is already a problem for me with cc12m (writing the 12M captions alone takes 10 min), so scaling to a larger number of files simply won't work in the current state.
I might write a generic downloader in the process.