Evaluating whether NLP of progressive and conservative Reddit threads can give support to the horseshoe theory of politics.
The horseshoe theory of politics, at a high level, asserts that those on the far right and far left are quite similar to each other. If this were true, we would expect the language used in subreddits at the two poles of the political spectrum to be similar.
If a sophisticated NLP model can tell these groups apart, then their use of language or choice of topics differs enough to distinguish them, which gives us some reason to doubt the horseshoe theory's claim that these groups are highly alike.
However, if an unsophisticated NLP model can distinguish between these groups, there is more reason to doubt the horseshoe theory of politics. This, as well as time constraints, explains why I chose to do minimal preprocessing.
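To make "unsophisticated model with minimal preprocessing" concrete, here is a minimal sketch of one such classifier. It is not the model used in this project: the choice of a bag-of-words naive Bayes classifier, the function names, and the toy posts are all assumptions made for illustration only. The only preprocessing is lowercasing and splitting on whitespace.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def train_naive_bayes(texts, labels):
    # Count how often each token appears under each label.
    word_counts = {}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    # Pick the label with the highest log-probability,
    # using add-one (Laplace) smoothing for unseen label/token pairs.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are skipped
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    # Fraction of posts whose source is predicted correctly.
    correct = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy posts, invented for this sketch (not project data).
train_texts = [
    "raise the minimum wage and expand healthcare",
    "unions protect workers and healthcare is a right",
    "cut taxes and shrink the federal government",
    "protect gun rights and lower taxes",
]
train_labels = ["progressive", "progressive", "conservative", "conservative"]
test_texts = ["healthcare for workers", "lower taxes and smaller government"]
test_labels = ["progressive", "conservative"]

model = train_naive_bayes(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

If even a model this simple separates the two sources well on held-out posts, the separation is coming from surface vocabulary rather than from anything a heavier model had to learn.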
Since the best model correctly predicted the source thread 98% of the time, we have reason to reject the idea that NLP of these threads supports the horseshoe theory of politics.