Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 36 additions & 36 deletions manifiest_file.md
Original file line number Diff line number Diff line change
@@ -1,141 +1,141 @@
#Manifest File
# Manifest File

##raw_data_README.md:
## raw_data_README.md:

Readme file that describes how to access my raw data.

##Requirements.txt:
## Requirements.txt:

All of the needed modules to create my final data file.

##final_institution_doi_reproducible_JupyterNotebook.ipynb:
## final_institution_doi_reproducible_JupyterNotebook.ipynb:

Reproducible notebook that walks through joining my intermediate data files.

##final_institution_doi_assessment_report.pdf:
## final_institution_doi_assessment_report.pdf:

An assment of the value of my final dataset.

##DataCite.md:
## DataCite.md:

Markdown file that descirbes my DataCite data file.

##GRID.md:
## GRID.md:

Markdown file that describes my GRID data file.

##Harvard_DataVerse.md:
## Harvard_DataVerse.md:

Markdown file that describes my GRID data file.

##OpenAire.md:
## OpenAire.md:

Markdown file that desicirbres my OpenAire data file.

##Wieker_summary_slide.pptx:
## Wieker_summary_slide.pptx:

Powerpoint that describes the goals and processes of my dataset project.

##resume_excerpt.pdf:
## resume_excerpt.pdf:

An excerpt of what this dataset looks like in my Resume.

##original_DataCite_data (folder):
## original_DataCite_data (folder):

A folder that contains a set of folders, which each contain the json data for each institution from DataCite.

##DataCite_API.py:
## DataCite_API.py:

Accessing DataCite API for institution pages 1-40.

##DataCite_API_pages(41-100).py:
## DataCite_API_pages(41-100).py:

Accessing DataCite API for institution pages 41-100 if they existed.

##datacentersfull.txt:
## datacentersfull.txt:

Supplemental file for DataCite_API_pages(41-100).py, which is used to access institution names

##DataCite_csvconvert.py:
## DataCite_csvconvert.py:

Extract & convert DataCite data to a csv file.

##DataCite_Doi.csv:
## DataCite_Doi.csv:

File that was created by DataCite_csvconvert.py; It is the elements that I extracted my raw data.

##curated_DataCite_doi:
## curated_DataCite_doi:

The csv file that I created from my DataCite_agg.py script.

##curated_DataCite_doi_2.0.csv:
## curated_DataCite_doi_2.0.csv:

I made some hand eidts of my curated_DataCite_doi csv file. I added a column next to institution codes, which was a list of institution names that corresponded to the codes. This is the data file that I used for my jupyter notebook.

##DataCite_agg.py:
## DataCite_agg.py:
Py file to aggregate the DataCite_Doi.csv file by institution.

##grid.csv:
## grid.csv:

Main GRID database file that describes institutions.

##types.csv
## types.csv

supplemental GRID database file that describers institution types.

##grid_merge.py:
## grid_merge.py:

Merge GRID database files, so I have a file to show institution location and type.

##grid_merge.csv:
## grid_merge.csv:

Merged GRID database that shows institution location and type.

##grid_curated.csv:
## grid_curated.csv:

Only contains institutions from the grid_merge.csv file that are from my collected data.

##DataVerse_screenshot_1:
## DataVerse_screenshot_1:

Screenshot 1 of the query I did in DataVerse's repository to get dataset metadata.

##DataVerse_screenshot_2:
## DataVerse_screenshot_2:

Screenshot 2 of the query I did in DataVerse's repository to get dataset metadata.

##DataVerse_doi.csv:
## DataVerse_doi.csv:

I handcoded this file from the dataset 2017 publication information that was available on DataVerse's repository.

##curated_DataVerse_doi:
## curated_DataVerse_doi:

I curated my DataVerse_doi.csv file, and this is the file that is read into my jupyter reproducible notebook.

##OpenAire_doi.txt:
## OpenAire_doi.txt:

I copy pasted all of the dataset 2017 publication dataset data that is present on their search repository page.

##OpenAire_screenshot:
## OpenAire_screenshot:

screenshot of what OpenAire looked like when I copy pasted the publication data.

##OpenAire_doi.csv:
## OpenAire_doi.csv:

Handcoded csv from OpenAire_doi.txt.

##curated_OpenAire_doi.csv:
## curated_OpenAire_doi.csv:

I curated my OpenAire_doi.csv file, and this is the file that is read into my jupyter reproducible notebook.

##institution_doi.csv:
## institution_doi.csv:

Merged curated_DataCite_doi_2.0.csv, curated_DataVerse_doi.csv, curated_OpenAire_doi.csv, and grid_curated.csv into one data file.

##cleaned_institution_doi.csv:
## cleaned_institution_doi.csv:

Cleaned up some duplicate rows from institution_doi.csv.

##final_institution_doi.csv:
## final_institution_doi.csv:

Final dataset; did some more cleaning (removed incorrect license information and reset index).

Expand Down