New Feature: Apply NLTK

## Problem

DsKit currently does not provide a high-level text data cleaning pipeline.
Although the full version installs NLTK, there is no built-in NLTK-based preprocessing utility for text/NLP workflows. This forces users to repeatedly implement custom cleaning logic outside the library.

## Proposed Solution

Introduce a flexible, high-level text cleaning function named `apply_nltk` that enables NLP/text preprocessing directly within DsKit.

**The function:**

- Uses NLTK internally

- Offers fine-grained control to users (case handling, stopwords, token processing, etc.)

- Is configurable and reusable across NLP pipelines

- Reduces boilerplate code for common text-cleaning tasks

This would significantly improve DsKit’s usability for NLP and text-heavy datasets.
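
To make the proposal easier to discuss, here is a minimal sketch of what such a function could look like. It is not the code from the linked Pull Request: the signature, the parameter names (`lowercase`, `stopwords`, `stem`, `min_token_len`), and the list-of-strings return type are all assumptions made for illustration. It relies only on NLTK components that need no corpus download (`RegexpTokenizer`, `PorterStemmer`) and falls back to a plain regex tokenizer when NLTK is not installed, since only the full version of DsKit installs it.

```python
import re
from typing import Iterable, List, Optional

# NLTK is optional here (assumption: mirrors DsKit's "full" vs. light install).
try:
    from nltk.stem import PorterStemmer
    from nltk.tokenize import RegexpTokenizer
    _HAVE_NLTK = True
except ImportError:
    _HAVE_NLTK = False

_WORD_PATTERN = r"\w+"


def _tokenize(text: str) -> List[str]:
    """Split text into word tokens, dropping punctuation."""
    if _HAVE_NLTK:
        return RegexpTokenizer(_WORD_PATTERN).tokenize(text)
    # Fallback gives the same tokens for this simple pattern.
    return re.findall(_WORD_PATTERN, text)


def apply_nltk(
    texts: Iterable[str],
    lowercase: bool = True,
    stopwords: Optional[Iterable[str]] = None,
    stem: bool = False,
    min_token_len: int = 1,
) -> List[str]:
    """Hypothetical sketch of a configurable text-cleaning helper.

    All parameter names are illustrative, not the actual DsKit API.
    """
    stop = {w.lower() for w in stopwords} if stopwords else set()

    stemmer = None
    if stem:
        if not _HAVE_NLTK:
            raise ImportError("stem=True requires NLTK to be installed")
        stemmer = PorterStemmer()

    cleaned = []
    for text in texts:
        if lowercase:
            text = text.lower()
        tokens = [
            t for t in _tokenize(text)
            if t.lower() not in stop and len(t) >= min_token_len
        ]
        if stemmer is not None:
            tokens = [stemmer.stem(t) for t in tokens]
        cleaned.append(" ".join(tokens))
    return cleaned


if __name__ == "__main__":
    docs = ["The quick brown Fox!", "NLTK is great."]
    # A tiny inline stop-word set keeps the demo free of corpus downloads.
    print(apply_nltk(docs, stopwords={"the", "is"}))
```

In this sketch the caller supplies the stop-word list, so a user who wants NLTK's own list could pass `nltk.corpus.stopwords.words("english")` after downloading the `stopwords` corpus. Whether the real function should bundle that download step is one of the API design questions open for feedback.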

**Current Progress**

✅ Feature `apply_nltk` has already been implemented

✅ A Pull Request is open

🔄 Open to feedback and ready to revise:

- Code structure

- API design

- Contribution-guideline compliance

- Complexity or performance concerns

## Why This Matters

- Makes DsKit more NLP-friendly out of the box

- Encourages standardized text preprocessing

- Reduces repetitive user-side implementations

- Aligns with DsKit’s goal of simplifying data preparation workflows

## Related

Pull Request: https://github.com/Programmers-Paradise/DsKit/pull/2#issue-3792858190
