chirindaopensource / measuring_corruption_from_text_data

End-to-end Python implementation of Muço's (2025) corruption measurement framework. It combines an NLP pipeline (regex extraction, Porter stemming, TF-IDF), PCA-based dimensionality reduction, and fixed-effects OLS to quantify institutional quality from Brazilian audit reports. It also includes supervised-learning robustness checks and a leave-one-out (LOO) sensitivity analysis.

Topics: natural-language-processing, text-mining, text-classification, scikit-learn, nltk, econometrics, supervised-learning, dimensionality-reduction, principal-component-analysis, fixed-effects, political-economy, text-as-data, brazilian-data, government-transparency, portuguese-nlp, research-replication, corruption-measurement, dictionary-based-classification, institutional-quality, audit-analysis

Stars: 1. Updated Dec 14, 2025. Language: Jupyter Notebook.
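
The description names regex extraction, TF-IDF weighting, and PCA as pipeline stages. The following is a minimal sketch of how those three stages can fit together. It is not the repository's code. It uses only the Python standard library rather than the scikit-learn and NLTK packages listed among the topics, it skips Porter stemming and the fixed-effects OLS step, and the sample texts, function names, and the `tf * log(N / df)` weighting are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical toy snippets standing in for audit reports (not repository data).
REPORTS = [
    "fraud in procurement, overpriced contract, fraud again",
    "procurement delayed, contract signed late",
    "school meals delivered, records complete",
    "overpriced contract and missing records, fraud suspected",
]


def tokenize(text):
    """Lowercase the text and pull out alphabetic tokens with a regex."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (vocabulary, rows) using tf * log(N / df); assumes non-empty docs."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokens for t in set(doc))
    rows = []
    for doc in tokens:
        counts = Counter(doc)
        rows.append(
            [counts[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, rows


def first_component_scores(rows, iterations=200):
    """Score each row on the first principal axis, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Deterministic, non-uniform starting vector, normalized to unit length.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:  # all rows identical: no variance to explain
            break
        v = [x / norm for x in new]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


vocab, rows = tfidf_matrix(REPORTS)
scores = first_component_scores(rows)
for text, score in zip(REPORTS, scores):
    print(f"{score:+.3f}  {text}")
```

The sign of a principal component is arbitrary, so only the relative ordering and spacing of the scores is meaningful. In a fuller pipeline, a per-document score of this kind would be the input to a later regression stage.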