The MSAGPT paper says that about 200 sequences whose MSAs contain fewer than 20 sequences are used as the test set. Apart from the ones shown in the Appendix, I can't tell which sequences these are, because I get a large MSA for every sequence I take from the CAMEO and CASP datasets. I suspect the difference comes from the method used to search for and select the ground-truth sequence families. Could the authors specify the test set, perhaps as a dict mapping each sequence ID to its MSA sequences (fewer than 20 per ID)?
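
To make the request concrete, here is a minimal sketch of the kind of mapping I have in mind. The IDs, sequences, and the `check_depth` helper are placeholders I made up for illustration, not real test-set entries or anything from the MSAGPT code:

```python
# Hypothetical layout: query sequence ID -> list of its MSA sequences.
# All IDs and sequences below are placeholders, not real test-set entries.
MAX_MSA_DEPTH = 20  # the paper says each test MSA has fewer than 20 sequences

test_set = {
    "EXAMPLE_ID_1": ["MKT-AYIAK", "MKTQAYLAK", "MRT-AYIGK"],
    "EXAMPLE_ID_2": ["GSHMLE", "GSHMIE"],
}


def check_depth(msas, max_depth=MAX_MSA_DEPTH):
    """Return the IDs whose MSA has max_depth or more sequences."""
    return [seq_id for seq_id, seqs in msas.items() if len(seqs) >= max_depth]


# An empty list means every entry satisfies the "fewer than 20" criterion.
print(check_depth(test_set))
```

With a file in this shape, anyone could check their own CAMEO/CASP MSAs against the same IDs and depth limit.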