The Complete Natural Language Processing (NLP) Pipeline For Text Processing

Author - Shubha Mishra (@Shubha23)

Project objective: to save you the time and effort of writing this script yourself.

Text processing pipeline for NLP problems with ready-to-use functions and text classification models.
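To give a rough idea of what a text preprocessing step does, here is a minimal pure-Python sketch. It is not taken from the notebooks. The stop-word list is a deliberately tiny, hypothetical stand-in for NLTK's stopwords corpus, and a full pipeline would typically also apply stemming or lemmatization after this step.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would typically load NLTK's stopwords corpus instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "or", "of", "in", "for", "you", "your"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation and digits, tokenize on whitespace, drop stop words."""
    text = text.lower()
    # Replace every punctuation character with a space so "free!!" becomes "free"
    text = text.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))
    # Remove runs of digits (phone numbers, amounts, etc.)
    text = re.sub(r"\d+", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("WINNER!! You have won a FREE ticket to the show. Call 09061701461 now"))
# ['winner', 'have', 'won', 'free', 'ticket', 'show', 'call', 'now']
```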

Description

Code file environment - Jupyter notebook
Programming language - Python (any recent stable version should work)

Input dataset:

Data source - https://www.kaggle.com/team-ai/spam-text-message-classification

Data format - .csv

This is only sample data for demonstrating and testing the code; you can replace it with any text dataset.
Note: the pipeline is designed for text. It will not work on quantitative (numerical) datasets for classification, regression, or clustering problems.
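A minimal sketch of loading a labelled CSV like the sample file, using only the standard library. The default column names "Category" (ham/spam) and "Message" are an assumption about the Kaggle file, and the helper name load_labelled_texts is hypothetical; pass other column names when you swap in a different text dataset.

```python
import csv
import io
from typing import Iterable

def load_labelled_texts(lines: Iterable[str], label_col: str = "Category", text_col: str = "Message"):
    """Read CSV rows (with a header line) and return a list of (label, text) pairs.

    The default column names are an assumption about the Kaggle spam file.
    """
    reader = csv.DictReader(lines)
    return [(row[label_col], row[text_col]) for row in reader]

# Usage with a real file (the path is hypothetical):
# with open("spam.csv", newline="", encoding="utf-8") as f:
#     data = load_labelled_texts(f)

sample = io.StringIO('Category,Message\nham,"Ok lar... Joking wif u oni..."\nspam,Free entry in 2 a wkly comp\n')
print(load_labelled_texts(sample))
# [('ham', 'Ok lar... Joking wif u oni...'), ('spam', 'Free entry in 2 a wkly comp')]
```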

Getting Started

Dependencies

-> NLTK (Natural Language Toolkit) is the NLP package used to build the full pipeline. To install it:

   Run pip install nltk in your terminal or command prompt.

   If NLTK is not installed, importing it raises a ModuleNotFoundError.
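If you would rather check for NLTK up front than hit the ModuleNotFoundError at import time, the standard library can look up a top-level package without importing it. A minimal sketch; the helper name is_installed is hypothetical.

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None

if not is_installed("nltk"):
    print("NLTK is missing; run `pip install nltk` first.")
```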

-> To use an NLTK dataset instead of the CSV file, replace the existing file import in the script with the following:

   import nltk
   nltk.download('dataset name')   # replace 'dataset name' with the NLTK package id, or call nltk.download('all')

** I'd recommend using Gensim for large datasets, as NLTK tends to become slower as the input data size increases. **

The repository includes all other required files, and the code makes all necessary imports, including scikit-learn.

Simply fork (or clone) the repository and run the notebooks as you would any Python project.
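The exact classifiers in the notebooks are not listed in this README. As a hedged, dependency-free illustration of one common text classification model, here is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, the same idea behind scikit-learn's MultinomialNB. The function names train_nb and predict_nb and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train multinomial Naive Bayes on (label, tokens) pairs."""
    class_counts = Counter(label for label, _ in docs)  # documents per class
    word_counts = defaultdict(Counter)                  # label -> token frequencies
    vocab = set()
    for label, tokens in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                # add-one smoothed log likelihood of this token given the class
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    ("spam", ["win", "free", "cash", "now"]),
    ("spam", ["free", "prize", "call", "now"]),
    ("ham", ["see", "you", "at", "lunch"]),
    ("ham", ["call", "me", "when", "you", "are", "free"]),
]
model = train_nb(train)
print(predict_nb(model, ["free", "cash", "prize"]))  # spam
print(predict_nb(model, ["lunch", "with", "you"]))   # ham
```

In practice the tokens would come from a preprocessing step and the labels from the CSV; with scikit-learn installed, CountVectorizer followed by MultinomialNB covers the same ground with less code.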

----------------------- End of file ---------------------------------------------------------------------------------------------
