Data Preprocessing

You have to follow the below-mentioned steps to process further :
i. Sampled 1M data points because of computing and memory limitations.
ii. Separated code-snippets from Body
iii. Removed Special characters from Question title and description (not in code)
iv. Removed stop words (Except ‘C’)
v. Removed HTML Tags using Regular Expressions
vi. Converted all the characters into small letters
vii. Used SnowballStemmer to stem the words
Below we can find the example questions after preprocessed.

And now you have to create a new database called ‘Processed.db’ and loaded the preprocessed data into it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Preprocessing #15

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Data Preprocessing #15

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions