The goal here should be:

* [ ] List all the data sets that are needed
* [ ] Give instructions on how to access the data
* [ ] Give instructions on how to determine the file sizes for all the data sets