I read in, cleaned up, and visualized data from the real-world Scala project repository, drawing on both its version control system (Git) and its project hosting site (GitHub). From this I identified who has had the most influence on its development and who the experts are. The dataset, previously mined and extracted from GitHub, consists of three files:
- pulls_2011-2013.csv contains the basic information about the pull requests, and spans from the end of 2011 up to (but not including) 2014.
- pulls_2014-2018.csv contains the same kind of information for later pull requests, and spans from 2014 up to 2018.
- pull_files.csv contains the files that were modified by each pull request.
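
Since the pull request data is split across two files and the modified files live in a third, a first step is usually to stack the two pull request files and then join the file list onto them. The following is a minimal sketch of that idea using only Python's standard library. The column names `pid`, `user`, `date`, and `file` are assumptions for illustration (the text above does not list the actual columns), and the inline sample rows are placeholders, not real data from the repository.

```python
import csv
import io


def read_rows(source):
    """Read a CSV from an open text stream into a list of dicts."""
    return list(csv.DictReader(source))


def combine_pulls(*sources):
    """Stack the rows of several pull request CSVs that share the same columns."""
    rows = []
    for source in sources:
        rows.extend(read_rows(source))
    return rows


def attach_files(pulls, pull_files, key="pid", file_column="file"):
    """Return copies of the pull rows with a 'files' list added to each.

    `key` and `file_column` are assumed column names; adjust them to
    match the real headers.
    """
    files_by_pull = {}
    for row in pull_files:
        files_by_pull.setdefault(row[key], []).append(row[file_column])
    return [dict(pull, files=files_by_pull.get(pull[key], [])) for pull in pulls]


# Placeholder data standing in for the three files (made up for illustration).
pulls_early = io.StringIO("pid,user,date\n1,alice,2012-03-01\n")
pulls_late = io.StringIO("pid,user,date\n2,bob,2015-06-10\n3,alice,2017-01-05\n")
files_csv = io.StringIO("pid,file\n1,a.scala\n1,b.scala\n3,c.scala\n")

pulls = combine_pulls(pulls_early, pulls_late)
combined = attach_files(pulls, read_rows(files_csv))
```

With the real files, the `io.StringIO` objects would be replaced by opened files, for example `open("pulls_2011-2013.csv", newline="")`. Pull requests with no matching rows in the file list get an empty `files` list rather than being dropped, which keeps later counts per user honest.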