Welcome to Natural LangWiz!
This project is inspired by the book "Skip the Line", in which James Altucher champions the idea that mastery comes from running 10,000 experiments rather than simply putting in 10,000 hours. This repository embraces that philosophy by treating each notebook and script as a distinct experiment in the vast and fascinating world of Natural Language Processing (NLP).
This collection contains 24 unique NLP experiments, each designed to build practical skills and deepen understanding. Let's get experimenting!
- Clone the repository:
git clone https://github.com/Asifdotexe/Natural-LangWiz.git
- Navigate to the project directory:
cd Natural-LangWiz
- Install the required dependencies:
pip install -r requirements.txt
| Category | Concept & Notebook | Description |
|---|---|---|
| Fundamentals | Data Preprocessing | Covers essential text cleaning, tokenization, stop-word removal, stemming, and lemmatization. |
| Fundamentals | Vectorization | Demonstrates converting text to numbers using Bag-of-Words (BoW) and TF-IDF. |
| Fundamentals | N-Grams | Implements n-grams to analyze contiguous sequences of words in text. |
| Sentiment Analysis | AFINN Sentiment Analysis | Performs lexicon-based sentiment analysis using the AFINN library. |
| Sentiment Analysis | VADER Sentiment Analysis | Utilizes VADER for rule-based sentiment analysis on social media text. |
| Sentiment Analysis | Emotion Analysis (Transformer) | Uses a transformer model to detect specific emotions such as joy and disgust. |
| Transformer Models | Text Summarization | Generates concise summaries of long texts using a transformer pipeline. |
| Transformer Models | Text Generation | Leverages the GPT-2 model to generate coherent text from a given prompt. |
| Transformer Models | Question Answering | Implements a QA system with a RoBERTa model to find answers within a context. |
| Core Applications | Named Entity Recognition (NER) | Uses spaCy to identify and classify entities like people, organizations, and locations. |
| Core Applications | Spam Detection | Builds a model to classify SMS messages as spam or not spam (ham). |
| Core Applications | Topic Modelling | Discovers abstract topics in a corpus using Latent Dirichlet Allocation (LDA). |
| Text Comparison | Similarity Checker | Calculates semantic similarity between words and sentences using spaCy and WordNet. |
| Text Comparison | Fuzzy Matching | Implements fuzzy string matching to find similarities between non-identical strings. |
| Text Correction | Grammar Checking | Implements a grammar and spelling checker using language-tool-python. |
| Specialized Tools | Demojification | Handles emojis by either removing them or replacing them with text descriptions. |
| Specialized Tools | Translation | A simple script to translate text between languages using the Google Translate API. |
| Specialized Tools | Optical Character Recognition (OCR) | An experiment in extracting text from images using pytesseract. |
| API Integration | Python Gemini Integration | Shows how to interact with Google's Gemini API within a Python notebook. |
| API Integration | Gemini Tkinter Script | A simple desktop GUI application to chat with the Gemini model. |
| Misc. Scripts | Review Insights Extractor | A practical script that analyzes product reviews to extract positive/negative aspects. |
| Misc. Scripts | Web Scraping | Extracts data from Wikipedia and Amazon using Beautiful Soup. |
| Misc. Scripts | Word Cloud | Creates a visual representation of text data based on word frequency. |
| Misc. Scripts | API Calling | Demonstrates how to interact with external APIs to retrieve and use text data. |
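The Fundamentals rows (preprocessing, n-grams, Bag-of-Words and TF-IDF) fit together as one small pipeline. The sketch below shows that flow using only the Python standard library. It is an illustration, not the notebooks' code. The tiny stop-word list, the regex tokenizer, the smoothed IDF formula (one of several common variants) and the helper names `preprocess`, `ngrams`, `bag_of_words` and `tf_idf` are all hypothetical. Stemming and lemmatization are left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects usually
# take a fuller list from NLTK or spaCy.
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Map each tokenized document to its raw term counts."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """Weight terms by relative frequency times a smoothed IDF."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for counts in bag_of_words(docs):
        total = sum(counts.values())
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [preprocess("The cat sat on the mat."),
            preprocess("The dog sat on the log.")]
    print(docs)                # [['cat', 'sat', 'mat'], ['dog', 'sat', 'log']]
    print(ngrams(docs[0], 2))  # [('cat', 'sat'), ('sat', 'mat')]
    print(tf_idf(docs)[0])
```

A term shared by every document (here `sat`) gets the lowest IDF. Distinctive words such as `cat` therefore end up with higher TF-IDF weights.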
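The Fuzzy Matching row is about scoring how alike two non-identical strings are. The repository does not say which library its notebook uses. This sketch assumes Python's standard-library `difflib.SequenceMatcher` instead. Its `ratio()` method returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. The helpers `similarity` and `best_match` and the 0.6 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query, candidates, cutoff=0.6):
    """Return the candidate most similar to query, or None below cutoff."""
    scored = [(similarity(query, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= cutoff else None


if __name__ == "__main__":
    print(similarity("Natural", "natural"))                   # 1.0
    print(best_match("aple", ["apple", "banana", "cherry"]))  # apple
    print(best_match("xyz", ["apple", "banana"]))             # None
```

Lowercasing both strings makes the comparison case-insensitive. Raising `cutoff` returns fewer false matches but may miss genuine misspellings.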
Happy Learning!