This repository collects end-to-end Natural Language Processing (NLP) and Machine Learning projects. Together they cover the full AI development lifecycle: raw text ingestion, preprocessing, deep learning model design, and deployment behind APIs.
- Core Languages: Python (NumPy, Pandas, Matplotlib, Seaborn)
- NLP Frameworks: spaCy, NLTK, TextBlob, Regular Expressions (Regex)
- Machine Learning: Scikit-Learn (Linear & Logistic Regression, SVM, KNN, Random Forest, Decision Trees)
- Deep Learning: TensorFlow, Keras (ANN, CNN, LSTM)
- Deployment & Tools: Flask (REST APIs), PyPI Package Management, Git, VS Code, Jupyter Notebooks
A modular approach to transforming unstructured text into machine-readable data:
- Text Cleaning: Automated pipelines for removing noise (URLs, mentions, stopwords) and expanding contractions.
- Linguistic Analysis: Tokenization, Part-of-Speech (POS) tagging, Lemmatization, and Stemming.
- Vectorization: Implementation of Bag-of-Words (BoW) and TF-IDF for statistical feature extraction.
- Embeddings: Utilizing Word2Vec and GloVe for semantic representation.
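The cleaning and TF-IDF steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the repository's pipeline: the stopword and contraction tables, the names `clean_text` and `tf_idf`, and the sample sentences are all assumptions made for the example. The weighting uses the plain textbook form (term frequency times `ln(N / df)`), which differs from the smoothed variant scikit-learn applies by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative tables (assumptions, not the repository's actual lists).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "it", "this"}
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def clean_text(text):
    """Lowercase, strip URLs and mentions, expand contractions, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = ln(number of documents / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


raw = [
    "I don't like this movie, see https://example.com @critic",
    "It's a great movie and the cast is great",
]
tokens = [clean_text(doc) for doc in raw]
vectors = tf_idf(tokens)
```

A term that appears in every document (here, "movie") gets a weight of zero, which is the property that makes TF-IDF favor distinctive words over common ones.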
Classic machine learning architectures applied to real-world datasets:
- Sentiment Analysis: Benchmarking Logistic Regression and SVM on the IMDB Movie Reviews dataset.
- Spam Detection: High-accuracy binary classification of text communications.
- Multi-Label Tagging: Predicting multiple relevant tags for Stack Overflow technical posts.
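A benchmark like the sentiment comparison above might be set up as follows with scikit-learn. This is a sketch under stated assumptions: the six reviews are made-up stand-ins for the IMDB data, `LinearSVC` is used as one possible SVM variant, and `build_models` is a hypothetical helper name. The notebooks in this repository may configure things differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up stand-in for IMDB reviews: 1 = positive, 0 = negative.
texts = [
    "a wonderful film with a moving story",
    "brilliant acting and a great script",
    "I loved every minute of it",
    "a dull plot and terrible acting",
    "boring, slow, and far too long",
    "I hated the ending and the characters",
]
labels = [1, 1, 1, 0, 0, 0]


def build_models():
    """Return the two candidate pipelines, each with its own TF-IDF vectorizer."""
    return {
        "logreg": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }


models = build_models()
predictions = {}
for name, model in models.items():
    # Fit on the toy data, then score one unseen review.
    model.fit(texts, labels)
    predictions[name] = model.predict(["a wonderful and moving film"]).tolist()
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted only on training text, which matters once the toy data is replaced by a real train/test split.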
Leveraging neural networks for complex sequence modeling:
- Sequence Modeling (LSTM): Generative AI model for automated poetry creation.
- Social Analytics (CNN): Deep learning classifiers for Hate Speech detection and Disaster Tweet categorization.
- Entity Recognition: Custom NER models using spaCy for automated Resume (CV) parsing and information extraction.
- Python Packaging: Developed and published a custom text-processing library to PyPI.
- Model-as-a-Service: Deployed machine learning models as web applications using the Flask framework, allowing for real-time inference via API endpoints.
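A Flask inference endpoint of the kind described above could look roughly like this. It is a sketch, not the repository's deployment code: the `/predict` route, the JSON shape, and `predict_label` are assumptions, and the keyword rule is a hypothetical placeholder standing in for a trained model so the example runs on its own.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical placeholder for a trained classifier: a simple keyword rule.
SPAM_WORDS = {"free", "winner", "prize"}


def predict_label(text):
    """Return 'spam' if any trigger word appears, otherwise 'ham'."""
    words = set(text.lower().split())
    return "spam" if words & SPAM_WORDS else "ham"


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return jsonify(error="Request body must be JSON with a non-empty 'text' field."), 400
    return jsonify(label=predict_label(text))
```

Saved as `app.py` (an assumed filename), this can be started with `flask --app app run` and queried by POSTing JSON such as `{"text": "claim your free prize"}` to `/predict`. In a real deployment, `predict_label` would load a serialized model once at startup and call its prediction method instead.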
├── packages/ # Source code for custom PyPI packages
├── classic_ml/ # Regression and Classification implementations
├── deep_learning/ # ANN, CNN, and LSTM models
├── nlp_pipelines/ # spaCy/NLTK preprocessing and NER scripts
├── deployment/ # Flask API and web application files
└── data/ # Dataset references and cleaning scripts
- Clone the repository:
git clone https://github.com/RutujaKumbhar17/Natural-Language-Processing-in-Python.git
- Install dependencies:
pip install -r requirements.txt
I am always open to collaborating on open-source projects or discussing new opportunities in AI/ML.
- Portfolio: rutujakumbhar.netlify.app
- LinkedIn: Rutuja Kumbhar
- GitHub: @RutujaKumbhar17
Developed with ❤️ by Rutuja Kumbhar