This repository was archived by the owner on Sep 22, 2025. It is now read-only.


@davchuks
Collaborator

Description

This PR introduces an exploratory data analysis (EDA) notebook for the emotion-tagging project and lays the groundwork for model training.

Key Changes

  • Data ingest & schema validation
      ◦ Loads nurse_emotion.csv and inspects the schema with df.info() and df.head()
  • Data cleaning pipeline
      ◦ Lower-casing, punctuation removal, stop-word filtering, lemmatization
  • Label profiling
      ◦ Class balance of emotionpolarity and emotionTags
      ◦ Plots saved: emotion_polarity_distribution.png, emotion_tags_distribution.png
  • Text statistics & visualizations
      ◦ Token counts, n-grams, word clouds
      ◦ Boxplot of note length vs. emotion tags → emotion_vs_note_length.png
  • Temporal profiling
      ◦ Derived time-of-day buckets (Morning/Afternoon/Evening/Night)
      ◦ Distribution plot saved as emotion_by_time_of_day.png
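For reviewers, here is a minimal sketch of the cleaning pipeline listed above, applied in the same order (lower-casing, punctuation removal, stop-word filtering, lemmatization). It is not the notebook's code: the small stop-word set, the function name `clean_text`, and the pluggable `lemmatize` argument are hypothetical. Lemmatization defaults to a no-op so the sketch runs without downloading any corpora.

```python
import string
from typing import Callable, Iterable, List

# Hypothetical stop-word list for illustration only; the notebook may use
# a fuller list such as NLTK's English stop words.
DEFAULT_STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "was", "and", "or", "to", "of", "in", "on", "with"}
)

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(
    text: str,
    stop_words: Iterable[str] = DEFAULT_STOP_WORDS,
    lemmatize: Callable[[str], str] = lambda word: word,
) -> List[str]:
    """Lower-case, remove punctuation, drop stop words, then lemmatize."""
    stops = set(stop_words)
    # 1. Lower-casing
    lowered = text.lower()
    # 2. Punctuation removal (ASCII only; curly quotes etc. are kept)
    no_punct = lowered.translate(_PUNCT_TABLE)
    # 3. Whitespace tokenization + stop-word filtering
    tokens = [tok for tok in no_punct.split() if tok not in stops]
    # 4. Lemmatization (identity by default; plug in a real lemmatizer)
    return [lemmatize(tok) for tok in tokens]
```

In a notebook this could be applied per note, e.g. `df[TEXT_COL].astype(str).map(clean_text)`, where `TEXT_COL` is a hypothetical name for the free-text column. For real lemmatization, one option is NLTK: pass `lemmatize=WordNetLemmatizer().lemmatize` (from `nltk.stem`, after downloading the `wordnet` corpus) and `stop_words=stopwords.words("english")` (from `nltk.corpus`). Because punctuation is deleted before stop-word filtering, contractions such as "don't" become "dont" and will not match a stop-word list that keeps the apostrophe.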
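The token-count and n-gram statistics from the text-statistics step can be sketched in plain Python over notes that are already tokenized (for example, by the cleaning pipeline listed above). The helper names `token_counts`, `ngrams`, and `top_ngrams` are hypothetical, and word clouds are not covered here.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def token_counts(docs: Iterable[Sequence[str]]) -> List[int]:
    """Number of tokens in each tokenized note."""
    return [len(tokens) for tokens in docs]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """Contiguous n-grams of one note; empty if the note has fewer than n tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(
    docs: Iterable[Sequence[str]], n: int = 2, k: int = 10
) -> List[Tuple[NGram, int]]:
    """Count n-grams across all notes and return the k most frequent."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)
```

The per-note token counts are the kind of length measure a note-length vs. emotion-tag boxplot would be drawn from, though the notebook may measure length differently (for example, in characters).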
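The time-of-day bucketing in the temporal-profiling step can be sketched as a simple hour-to-label mapping. The PR names the four buckets but not their boundaries, so the cut-offs below (Morning 05:00 to 11:59, Afternoon 12:00 to 16:59, Evening 17:00 to 20:59, Night otherwise) are assumptions, as are the function names and the timestamp format.

```python
from datetime import datetime


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to a bucket; the boundaries are assumed, not from the PR."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    # 21:00 to 04:59 wraps around midnight
    return "Night"


def bucket_timestamp(ts: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Parse a timestamp string (format assumed) and return its bucket."""
    return time_of_day(datetime.strptime(ts, fmt).hour)
```

With pandas, the same mapping could be applied as `pd.to_datetime(df[TS_COL]).dt.hour.map(time_of_day)`, where `TS_COL` is a hypothetical name for the note timestamp column. Keeping the boundaries in one function makes it easy to adjust them if they turn out to differ from those used in the notebook or to match shift schedules.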
