
Build a large-scale dataset of sentiment and emotion on code review comments #11

@bcdasilv

Description

The initial attempt will use the Sourced ReviewComments dataset:
https://github.com/src-d/datasets/tree/master/ReviewComments

The overall process is: first, pre-process the comments (e.g., remove HTML tags and code snippets); second, adapt the text (the review comments) to the constraints of the sentiment/emotion analysis API in use (e.g., check comment length); third, make the API calls and store the results.
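The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `preprocess`, `fit_to_api`, `run_pipeline`, and `MAX_CHARS` are hypothetical names; code snippets are assumed to appear as Markdown fenced or inline code; the 1,000-character limit and the split-into-chunks policy are placeholders for whatever the chosen API actually requires; and `analyze` stands in for the real sentiment/emotion API client, with results stored as JSON Lines.

```python
import html
import io
import json
import re
from typing import Callable, Iterable, TextIO

# Hypothetical limit: the real maximum depends on the sentiment/emotion API in use.
MAX_CHARS = 1000

FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)  # ```...``` Markdown code blocks
INLINE_CODE = re.compile(r"`[^`\n]*`")             # `inline` code spans
HTML_TAG = re.compile(r"<[^>]+>")                   # <tag attr="..."> and </tag>
WHITESPACE = re.compile(r"\s+")


def preprocess(comment: str) -> str:
    """Step 1: strip code snippets and HTML tags, decode entities, normalize whitespace."""
    text = FENCED_CODE.sub(" ", comment)
    text = INLINE_CODE.sub(" ", text)
    text = HTML_TAG.sub(" ", text)
    text = html.unescape(text)  # e.g. "&amp;" -> "&"
    return WHITESPACE.sub(" ", text).strip()


def fit_to_api(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Step 2 (one possible policy): drop empty comments, split long ones at word boundaries."""
    if not text:
        return []
    chunks: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word[:max_chars]  # a single over-long token is truncated
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(
    comments: Iterable[tuple[int, str]],
    analyze: Callable[[str], dict],  # hypothetical stand-in for the real API client
    out: TextIO,
) -> None:
    """Step 3: analyze each chunk and store one JSON line per (comment, chunk) result."""
    for comment_id, body in comments:
        for index, chunk in enumerate(fit_to_api(preprocess(body))):
            record = {"comment_id": comment_id, "chunk": index,
                      "text": chunk, "result": analyze(chunk)}
            out.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    sample = [(1, "<p>Looks good</p>"), (2, "Please rename `foo` &amp; retest.")]
    buffer = io.StringIO()
    # Dummy analyzer used only for illustration; replace with the real API call.
    run_pipeline(sample, lambda text: {"length": len(text)}, buffer)
    print(buffer.getvalue())
```

Splitting long comments keeps all of the text but yields several results per comment; truncating or skipping long comments are equally valid choices, depending on how the per-comment label is meant to be aggregated.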

For the dataset, our plan B is the GitHub API: we would write a script that consumes the GitHub API to gather code review comments and their metadata, and then follow the process listed above. Another option would be GH Archive (but this is the one Sourced used).
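A plan-B collection script could start from GitHub's REST endpoint for listing pull request review comments in a repository (`GET /repos/{owner}/{repo}/pulls/comments`), following the `Link` header for pagination. This is a minimal, standard-library-only sketch: the function names and the subset of metadata fields kept in `to_record` are assumptions for illustration, and a real crawler would also need to handle rate limits and retries.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional

API = "https://api.github.com"
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def to_record(comment: dict) -> dict:
    """Keep the comment text plus some metadata (the chosen fields are an assumption)."""
    return {
        "id": comment["id"],
        "body": comment["body"],
        "author": (comment.get("user") or {}).get("login"),
        "created_at": comment["created_at"],
        "path": comment.get("path"),
        "pull_request_url": comment.get("pull_request_url"),
    }


def fetch_review_comments(owner: str, repo: str,
                          token: Optional[str] = None) -> Iterator[dict]:
    """Yield every pull request review comment in owner/repo, page by page."""
    url: Optional[str] = f"{API}/repos/{owner}/{repo}/pulls/comments?per_page=100"
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # authenticated requests get a much higher rate limit
        headers["Authorization"] = f"Bearer {token}"
    while url:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
            url = next_page_url(response.headers.get("Link"))
        for comment in page:
            yield to_record(comment)
```

The records it yields carry a `body` field that can then go through the same pre-processing, length-adaptation, and API-call steps as the Sourced data.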
