I am a little unclear on the recommended workflow for generating simplitigs from a bacterial pangenome.
As pangenomes aim to encompass all potential variants, providing multiple input files via "-i" is probably unsuitable, as the intersection would only include core k-mers (those present in all isolates). Is this correct? If so, should the workflow be:
a) Concatenate all files (reads and assemblies separately) into a single FASTA file and then process the concatenated file with prophasm (might incur high memory overhead)
or
b) Compute simplitigs for each file, concatenate the resulting simplitig files, and then run prophasm again on the concatenated file? (lower memory usage)
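
To make the question concrete, here is a minimal sketch in plain Python of what I expect the k-mer content to be in each case. It does not call prophasm at all; the isolate names, the toy sequences, `K = 4`, and the helper `canonical_kmers` are all made up for illustration. It assumes that k-mers are treated canonically (a k-mer and its reverse complement are the same) and that simplitigs preserve the k-mer set of their input exactly, which is my understanding of the definition.

```python
# Toy illustration only: plain Python sets, no ProphAsm involved.
# All names and sequences below are hypothetical.

def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def canonical_kmers(records, k):
    """Collect canonical k-mers from a list of sequences.

    k-mers never span two records, which mirrors concatenating
    FASTA files (records stay separate).
    """
    kmers = set()
    for seq in records:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N etc.
                kmers.add(min(kmer, revcomp(kmer)))
    return kmers

K = 4  # tiny k for the toy example; real runs would use e.g. 31
isolates = {
    "isolate_1": ["ACGTTGCA", "GGATCC"],
    "isolate_2": ["ACGTTGCT"],
    "isolate_3": ["TTACGTTG"],
}

per_isolate = {name: canonical_kmers(seqs, K) for name, seqs in isolates.items()}

# Intersection: only k-mers shared by every isolate (the "core").
core = set.intersection(*per_isolate.values())

# Union: every k-mer seen in any isolate (what a pangenome should keep).
pan = set.union(*per_isolate.values())

# Option (a): pool all records first, then extract k-mers once.
pooled_records = [seq for seqs in isolates.values() for seq in seqs]
pooled = canonical_kmers(pooled_records, K)

# Option (b) stand-in: per-file k-mer sets (what per-file simplitigs
# would carry, under the assumption above) merged afterwards.
merged = set()
for kmer_set in per_isolate.values():
    merged |= kmer_set

print(len(core), len(pan), pooled == merged)
```

If that assumption holds, (a) and (b) should end up with the same k-mer set and differ only in memory use and run time, while the intersection keeps only the shared k-mers.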
Are there any other methodological considerations for these approaches that I have overlooked?
Thank you for your help,
Sion