diff --git a/manifiest_file.md b/manifiest_file.md index cf712ca..a77e7f8 100644 --- a/manifiest_file.md +++ b/manifiest_file.md @@ -1,141 +1,141 @@ -#Manifest File +# Manifest File -##raw_data_README.md: +## raw_data_README.md: Readme file that describes how to access my raw data. -##Requirements.txt: +## Requirements.txt: All of the needed modules to create my final data file. -##final_institution_doi_reproducible_JupyterNotebook.ipynb: +## final_institution_doi_reproducible_JupyterNotebook.ipynb: Reproducible notebook that walks through joining my intermediate data files. -##final_institution_doi_assessment_report.pdf: +## final_institution_doi_assessment_report.pdf: An assment of the value of my final dataset. -##DataCite.md: +## DataCite.md: Markdown file that descirbes my DataCite data file. -##GRID.md: +## GRID.md: Markdown file that describes my GRID data file. -##Harvard_DataVerse.md: +## Harvard_DataVerse.md: Markdown file that describes my GRID data file. -##OpenAire.md: +## OpenAire.md: Markdown file that desicirbres my OpenAire data file. -##Wieker_summary_slide.pptx: +## Wieker_summary_slide.pptx: Powerpoint that describes the goals and processes of my dataset project. -##resume_excerpt.pdf: +## resume_excerpt.pdf: An excerpt of what this dataset looks like in my Resume. -##original_DataCite_data (folder): +## original_DataCite_data (folder): A folder that contains a set of folders, which each contain the json data for each institution from DataCite. -##DataCite_API.py: +## DataCite_API.py: Accessing DataCite API for institution pages 1-40. -##DataCite_API_pages(41-100).py: +## DataCite_API_pages(41-100).py: Accessing DataCite API for institution pages 41-100 if they existed. -##datacentersfull.txt: +## datacentersfull.txt: Supplemental file for DataCite_API_pages(41-100).py, which is used to access institution names -##DataCite_csvconvert.py: +## DataCite_csvconvert.py: Extract & convert DataCite data to a csv file. -##DataCite_Doi.csv: +## DataCite_Doi.csv: File that was created by DataCite_csvconvert.py; It is the elements that I extracted my raw data. -##curated_DataCite_doi: +## curated_DataCite_doi: The csv file that I created from my DataCite_agg.py script. -##curated_DataCite_doi_2.0.csv: +## curated_DataCite_doi_2.0.csv: I made some hand eidts of my curated_DataCite_doi csv file. I added a column next to institution codes, which was a list of institution names that corresponded to the codes. This is the data file that I used for my jupyter notebook. -##DataCite_agg.py: +## DataCite_agg.py: Py file to aggregate the DataCite_Doi.csv file by institution. -##grid.csv: +## grid.csv: Main GRID database file that describes institutions. -##types.csv +## types.csv supplemental GRID database file that describers institution types. -##grid_merge.py: +## grid_merge.py: Merge GRID database files, so I have a file to show institution location and type. -##grid_merge.csv: +## grid_merge.csv: Merged GRID database that shows institution location and type. -##grid_curated.csv: +## grid_curated.csv: Only contains institutions from the grid_merge.csv file that are from my collected data. -##DataVerse_screenshot_1: +## DataVerse_screenshot_1: Screenshot 1 of the query I did in DataVerse's repository to get dataset metadata. -##DataVerse_screenshot_2: +## DataVerse_screenshot_2: Screenshot 2 of the query I did in DataVerse's repository to get dataset metadata. -##DataVerse_doi.csv: +## DataVerse_doi.csv: I handcoded this file from the dataset 2017 publication information that was available on DataVerse's repository. -##curated_DataVerse_doi: +## curated_DataVerse_doi: I curated my DataVerse_doi.csv file, and this is the file that is read into my jupyter reproducible notebook. -##OpenAire_doi.txt: +## OpenAire_doi.txt: I copy pasted all of the dataset 2017 publication dataset data that is present on their search repository page. -##OpenAire_screenshot: +## OpenAire_screenshot: screenshot of what OpenAire looked like when I copy pasted the publication data. -##OpenAire_doi.csv: +## OpenAire_doi.csv: Handcoded csv from OpenAire_doi.txt. -##curated_OpenAire_doi.csv: +## curated_OpenAire_doi.csv: I curated my OpenAire_doi.csv file, and this is the file that is read into my jupyter reproducible notebook. -##institution_doi.csv: +## institution_doi.csv: Merged curated_DataCite_doi_2.0.csv, curated_DataVerse_doi.csv, curated_OpenAire_doi.csv, and grid_curated.csv into one data file. -##cleaned_institution_doi.csv: +## cleaned_institution_doi.csv: Cleaned up some duplicate rows from institution_doi.csv. -##final_institution_doi.csv: +## final_institution_doi.csv: Final dataset; did some more cleaning (removed incorrect license information and reset index).