Understanding the Personality Traits of Stack Overflow Users: Text Analysis with IBM Personality Insights
The human aspect of programming has become a popular area of study in the software engineering community in recent years. Industry professionals and researchers alike are paying more attention to sentiment, emotion, and personality traits in software engineering, helped by advances in text analysis tools. The Big Five personality model is widely considered a reliable model for mapping human personality traits, and IBM Cloud's Watson Personality Insights service implements it to estimate personality facets from a body of text. This project therefore analyzes the posts of Stack Overflow users to look for a link between successful users and particular personality traits. Each user's posts are concatenated and sent to the IBM Personality Insights API, the results are aggregated programmatically, and statistical analysis is run on the aggregated scores. Similar trends were found in both trial groups: both showed high Openness and Neuroticism and low Agreeableness.
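The concatenation and aggregation steps described above could look roughly like the following Java sketch. The class and method names are hypothetical, the HTML stripping is a naive regex rather than a real HTML parser, and a per-trait mean is only one possible aggregate; the project's actual aggregation and statistics may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names are illustrative, not from the project code.
class PipelineSketch {

    // Naively strips HTML tags (Stack Overflow stores post bodies as HTML),
    // collapses whitespace, and joins one user's posts with blank lines.
    static String concatenatePosts(List<String> postBodies) {
        List<String> cleaned = new ArrayList<>();
        for (String body : postBodies) {
            String text = body.replaceAll("<[^>]+>", " ")
                              .replaceAll("\\s+", " ")
                              .trim();
            if (!text.isEmpty()) { // skip posts that contained only markup
                cleaned.add(text);
            }
        }
        return String.join("\n\n", cleaned);
    }

    // Averages each trait's percentile over the users in one group.
    // Each map holds trait name -> percentile (0.0 to 1.0) for one user;
    // a trait missing for some users is averaged over the users that have it.
    static Map<String, Double> meanPercentiles(List<Map<String, Double>> users) {
        Map<String, Double> sums = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, Double> user : users) {
            for (Map.Entry<String, Double> e : user.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
                counts.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            means.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return means;
    }
}
```

The concatenated string is what would be sent to Personality Insights as a single text, and the per-trait means give one summary row per trial group that can then be compared statistically.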
The final report for the project is available here.
Here's a brief description of some of the subdirectories in the repository:
| Directory Name | Description |
|---|---|
| First Run | Initial trial run on 9 of the highest-scoring post authors |
| Second Run | Second preliminary run on 40 of the highest-scoring post authors |
| HighScoreResults_json | Analyses of the authors of roughly 100 of the top-scoring posts. This data was used in the final report |
| ReputationResults_json | Analyses of roughly 100 of the highest-reputation users. This data was used in the final report |
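The two groups in the table could be selected along the lines of this Java sketch, which works on in-memory stand-ins for rows of the Stack Overflow `Posts` and `Users` tables. The record names, fields, and methods are hypothetical; in the project itself the data came from SOTorrent, and details such as whether questions and answers were both counted are not documented here. It assumes Java 16 or later (records and `Stream.toList()`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical selection helpers; not taken from the project code.
class GroupSelection {

    // Minimal stand-ins for rows of the Posts and Users tables.
    record Post(long id, long ownerUserId, int score) {}
    record User(long id, int reputation) {}

    // Walks posts from highest to lowest score and collects distinct authors
    // until n users are found, so a prolific author is counted only once.
    static List<Long> topPostAuthors(List<Post> posts, int n) {
        List<Post> sorted = new ArrayList<>(posts);
        sorted.sort(Comparator.comparingInt(Post::score).reversed());
        Set<Long> authors = new LinkedHashSet<>();
        for (Post p : sorted) {
            if (authors.size() == n) {
                break;
            }
            authors.add(p.ownerUserId());
        }
        return new ArrayList<>(authors);
    }

    // Returns the ids of the n users with the highest reputation.
    static List<Long> topReputationUsers(List<User> users, int n) {
        return users.stream()
                .sorted(Comparator.comparingInt(User::reputation).reversed())
                .limit(n)
                .map(User::id)
                .toList();
    }
}
```

Deduplicating authors matters because one user can own several of the top posts; without it, a group of "100 authors" could contain fewer distinct people.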
In addition to the Big Five personality facets, IBM Personality Insights returns percentile scores for several sub-facets of each Big Five dimension, as well as sets of scores for needs and values. This additional information is available in the raw JSON returned by Personality Insights or can be viewed here (Google Docs).
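In the Personality Insights v3 JSON response, each Big Five dimension lists its sub-facets in a nested `children` array, while needs and values appear as separate top-level arrays. A Java sketch of flattening such a tree into name-to-percentile pairs, for example to put all scores for one user into a single spreadsheet row, might look like this. The `Trait` record is a simplified, hypothetical model of that JSON rather than the IBM SDK's own classes, and the `"Parent/Child"` key format is an arbitrary choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattening helper; Trait is not an IBM SDK class.
class ProfileFlattener {

    // One node of the tree: a Big Five dimension, a sub-facet, a need, or a value.
    record Trait(String name, double percentile, List<Trait> children) {
        // Convenience constructor for leaf nodes with no children.
        Trait(String name, double percentile) {
            this(name, percentile, List.of());
        }
    }

    // Produces keys such as "Openness" and "Openness/Adventurousness",
    // preserving the order in which traits appear.
    static Map<String, Double> flatten(List<Trait> traits) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Trait t : traits) {
            addTrait(t, "", out);
        }
        return out;
    }

    // Records this node's percentile, then recurses into its children.
    private static void addTrait(Trait t, String prefix, Map<String, Double> out) {
        String key = prefix.isEmpty() ? t.name() : prefix + "/" + t.name();
        out.put(key, t.percentile());
        for (Trait child : t.children()) {
            addTrait(child, key, out);
        }
    }
}
```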
A poster with the project overview and results was also created; a digital version can be viewed here (PDF). Additional visual aids used are available here (Google Slides).
This project uses the SOTorrent dataset, which provides SQL access to the late-2018 Stack Overflow data dump. More info about SOTorrent
IBM Cloud Personality Insights: IBM provides a Personality Insights Java API, which was used for this project. As of 2019, IBM offered a limited number of free requests to users who wanted to try Personality Insights and other services.