Financial institutions receive millions of consumer complaints across products such as credit cards, loans, mortgages, and digital banking services. This project analyzes a large real-world dataset of financial product complaints to uncover:
- What customers complain about the most
- Which companies and products show systemic issues
- How promptly companies respond
- Key text-based themes behind complaints
- Actionable insights stakeholders can use to improve service delivery
The project simulates a real corporate analytics workflow, including:
- Data ingestion (KaggleHub + local CSV)
- Cleaning & preparation
- Exploratory data analysis
- NLP on complaint text
- Topic modeling (NMF)
- Trend & correlation analysis
- Company performance assessment
- Recommendations & stakeholder-ready reporting
Consumers frequently report issues across financial services products, but organizations struggle to identify systemic failures, product-level pain points, and operational inefficiencies.
This project answers:
- What products drive the highest number of complaints?
- What issues appear most frequently and why?
- Which companies respond late or receive the most disputes?
- What themes emerge in unstructured complaint text?
- Are there geographic or state-level risk patterns?
- What operational recommendations can improve customer experience and compliance?
Languages & Libraries
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- NLP: NLTK, Scikit-learn (TF-IDF, NMF topics)
- Visualization: Matplotlib, Seaborn, Plotly
- Data Ingestion: KaggleHub
Tools
- Jupyter Notebook
- Git / GitHub
consumer-complaints-analysis/
│
├── Consumer_Complaints_End_to_End_Analysis.ipynb   # Main notebook
├── consumer_complaints_cleaned.csv                 # Output cleaned dataset
├── README.md                                       # Project documentation
└── requirements.txt                                # Python dependencies
The notebook supports both KaggleHub ingestion and an offline CSV mode, mirroring real enterprise pipelines.
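A minimal sketch of how the two ingestion modes could be wired together is shown below. The function name `load_complaints`, the `DATASET_HANDLE` placeholder, and the choice to read the first CSV found in the download are assumptions for illustration, not the notebook's actual code. `kagglehub.dataset_download` is the KaggleHub call that returns the local path of a downloaded dataset.

```python
from pathlib import Path
from typing import Optional

import pandas as pd

# Hypothetical placeholder: replace with the Kaggle dataset handle you actually use.
DATASET_HANDLE = "<owner>/<dataset-name>"


def load_complaints(local_csv: Optional[str] = None,
                    handle: str = DATASET_HANDLE) -> pd.DataFrame:
    """Load complaints from a local CSV if given, otherwise via KaggleHub."""
    if local_csv is not None:
        # Offline mode: read the file directly, no network access needed.
        return pd.read_csv(local_csv, low_memory=False)

    # Online mode: imported lazily so offline users do not need kagglehub.
    import kagglehub

    dataset_dir = Path(kagglehub.dataset_download(handle))
    csv_files = sorted(dataset_dir.rglob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found under {dataset_dir}")
    # Assumption: the first CSV in the download is the complaints table.
    return pd.read_csv(csv_files[0], low_memory=False)
```

Calling `load_complaints(local_csv="complaints.csv")` skips the network entirely, which is convenient for reproducible runs; calling it with no arguments falls back to the KaggleHub download.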
- Standardized column names
- Date parsing & validation
- Missing value treatment
- Text normalization (regex, stopwords, lowercasing)
- Duplicate detection
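The cleaning steps above can be sketched roughly as follows. This is an illustration under stated assumptions: the raw column names `Date received` and `Consumer complaint narrative` are assumed, the helper names are hypothetical, and the tiny stopword set stands in for a fuller list such as NLTK's.

```python
import re

import pandas as pd

# Tiny illustrative stopword set (assumption); a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "my", "i",
             "was", "is", "on", "in", "for"}


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names and replace non-alphanumeric runs with '_'."""
    out = df.copy()
    out.columns = [
        re.sub(r"[^0-9a-z]+", "_", col.strip().lower()).strip("_")
        for col in out.columns
    ]
    return out


def normalize_text(text: str) -> str:
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", str(text).lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def clean_complaints(df: pd.DataFrame) -> pd.DataFrame:
    """Apply column standardization, date validation, text cleanup, dedup."""
    out = standardize_columns(df)
    # Unparseable dates become NaT and those rows are dropped.
    out = out.assign(
        date_received=pd.to_datetime(out["date_received"], errors="coerce")
    ).dropna(subset=["date_received"])
    # Missing narratives become empty strings before normalization.
    out = out.assign(
        consumer_complaint_narrative=out["consumer_complaint_narrative"]
        .fillna("")
        .map(normalize_text)
    )
    # Exact duplicate rows are removed after normalization.
    return out.drop_duplicates().reset_index(drop=True)
```

Deduplicating after normalization means rows that differ only in casing or punctuation of the narrative collapse into one; whether that is desirable depends on the analysis.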
- Product-level complaint distribution
- Issue & sub-issue frequencies
- Company complaint volume
- Timeliness analysis
- Monthly time-series trends
- State-level complaint patterns
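Two of the exploratory views listed above, product-level distribution and monthly trends, could be computed along these lines. The function names and the standardized column names `product` and `date_received` are assumptions for this sketch.

```python
import pandas as pd


def product_distribution(df: pd.DataFrame,
                         product_col: str = "product") -> pd.DataFrame:
    """Return complaint counts and percentage share per product, largest first."""
    counts = df[product_col].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"complaints": counts, "share_pct": share})


def monthly_trend(df: pd.DataFrame,
                  date_col: str = "date_received") -> pd.Series:
    """Return complaint counts per calendar month in chronological order."""
    months = pd.to_datetime(df[date_col]).dt.to_period("M")
    return months.value_counts().sort_index()
```

Both results are plain pandas objects, so they can be passed straight to Matplotlib, Seaborn, or Plotly for charting.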
- Tokenization & text normalization
- TF-IDF vectorization
- Topic modeling (NMF)
- Keyword extraction (unigrams & bigrams)
- Insights into common complaint themes
KPIs include:
- % Timely responses
- Dispute rate
- Volume & severity clusters
- Root-cause patterns across issues
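The first two KPIs could be computed per company roughly as follows. This sketch assumes standardized column names (`company`, `timely_response`, `consumer_disputed`) and that both flag columns hold the strings "Yes" and "No"; missing values are counted as not timely and not disputed, which is a simplifying assumption.

```python
import pandas as pd


def company_kpis(df: pd.DataFrame,
                 company_col: str = "company",
                 timely_col: str = "timely_response",
                 disputed_col: str = "consumer_disputed") -> pd.DataFrame:
    """Return complaint volume, % timely responses, and dispute rate per company."""
    flags = pd.DataFrame({
        "company": df[company_col],
        # .eq("Yes") yields False for "No" and for missing values.
        "timely": df[timely_col].eq("Yes"),
        "disputed": df[disputed_col].eq("Yes"),
    })
    grouped = flags.groupby("company")
    kpis = pd.DataFrame({
        "complaints": grouped.size(),
        "timely_pct": (grouped["timely"].mean() * 100).round(1),
        "dispute_rate_pct": (grouped["disputed"].mean() * 100).round(1),
    })
    # Highest-volume companies first.
    return kpis.sort_values("complaints", ascending=False)
```

Rates for companies with very few complaints are noisy, so a minimum-volume filter on `complaints` is worth considering before ranking.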
Executive-level output includes:
- Root causes behind recurring issues
- Operational improvement areas
- Product-specific recommendations
- Customer experience impact analysis
- SLA optimization strategies
Key insights (these will vary depending on the dataset version):
- Mortgage and credit card issues dominate complaints
- Unauthorized transactions & billing disputes appear in top topics
- Some companies show consistently poor "Timely response?" performance
- State-level spikes correspond to population and specific product risks
- Topic modeling reveals themes such as fraud, delayed refunds, account access issues
git clone https://github.com/<your-username>/consumer-complaints-analysis.git
cd consumer-complaints-analysis
pip install -r requirements.txt
Open Jupyter or VS Code and run:
jupyter notebook Consumer_Complaints_End_to_End_Analysis.ipynb
Possible future improvements:
- Build a live dashboard (Power BI / Plotly Dash)
- Add supervised models to predict timeliness or dispute likelihood
- Add NER (Named Entity Recognition) for extracting merchants, amounts, and timelines
- Automate pipeline using Airflow or Prefect
- Connect to a relational database (PostgreSQL) for storage
Wonder Akwei
Data Analyst • Machine Learning Researcher • Operations Analyst
Focused on building analytics solutions that support financial services, risk management, and digital payments.