Nationwide Investigation of Federal Prosecutors

This is a Natural Language Processing (NLP) and supervised Machine Learning (ML) problem: determine whether prosecutorial misconduct was involved in a federal prosecution case. We have a dataset of 624 labeled cases (467 "no misconduct" and 157 "misconduct"). We use 80% of it to train the model and 20% to validate it. We use the StratifiedKFold cross-validator so that every training and test fold keeps the same proportion of "misconduct" and "no misconduct" cases as the full dataset. We tried Logistic Regression, Random Forest, Support Vector Machine (SVM), Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) models on this problem. Of these, Logistic Regression achieved the highest accuracy, 80%. For reference, because the classes are imbalanced, always predicting "no misconduct" would already score about 75% (467/624).
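A minimal sketch of the evaluation setup described above, using scikit-learn. It assumes the case texts are already extracted into a list of strings and labeled 0 ("no misconduct") or 1 ("misconduct"). The TF-IDF features, n-gram range and `class_weight="balanced"` are illustrative assumptions, not necessarily the choices in this repository. Five stratified folds correspond to the 80%/20% train/validation split, since each fold holds out one fifth of the cases.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def fold_class_counts(labels, n_splits=5, seed=42):
    """Return [no_misconduct, misconduct] counts in each stratified test fold."""
    labels = np.asarray(labels)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Only the labels matter for stratification, so X can be a placeholder.
    return [
        np.bincount(labels[test_idx], minlength=2).tolist()
        for _, test_idx in cv.split(np.zeros(len(labels)), labels)
    ]


def evaluate_logreg(texts, labels, n_splits=5, seed=42):
    """Mean and per-fold accuracy of TF-IDF + logistic regression under stratified CV."""
    pipeline = make_pipeline(
        # Assumed feature extraction: word unigrams and bigrams, English stop words removed.
        TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
        # Assumed: reweight classes to offset the 467/157 imbalance.
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
    return scores.mean(), scores
```

With the 467/157 label distribution, each test fold from `fold_class_counts` holds roughly 93 to 94 "no misconduct" and 31 to 32 "misconduct" cases. This is the proportion-preserving behavior that stratification provides.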

Implementations

Tools

Requirements

pandas >= 0.23.4
numpy >= 1.14.5
wordcloud >= 1.5.0
matplotlib >= 3.0.0
scikit-learn >= 0.20.1
glove.6B.zip (pre-trained GloVe word vectors, downloaded separately)
textract (installed separately)
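The RNN and CNN models presumably rely on the GloVe vectors from glove.6B.zip. Each `.txt` file in that archive holds one word per line, followed by its space-separated vector components. The sketch below is an assumption about how those vectors could be loaded, not the repository's code. `load_glove`, `embedding_matrix` and the `word_index` mapping are hypothetical names. The `word_index` convention (ids starting at 1, row 0 reserved for padding) is also an assumption.

```python
import numpy as np


def load_glove(path, vocab=None):
    """Parse a GloVe text file into {word: float32 vector}.

    If `vocab` is given, keep only those words to save memory.
    """
    embeddings = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings


def embedding_matrix(word_index, embeddings, dim):
    """Build a (len(word_index) + 1, dim) matrix; row 0 stays zero for padding.

    Hypothetical convention: word_index maps each word to an id >= 1.
    Words missing from GloVe keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = embeddings.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix
```

The resulting matrix can initialize the embedding layer of an RNN or CNN. For example, `dim=100` would match `glove.6B.100d.txt`.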
