DS (first half) by SarahRana · Pull Request #5 · Stephen-Cole267/Data_Science_Project_HR_Analytics

SarahRana · 2022-08-16T12:14:43Z

Part 1. I will upload the second notebook with the DS work once it is complete.

alexnaylor1999 · 2022-08-23T15:39:17Z

Hey Sarah, first of all, thanks for submitting the first half of the project. Just a few pointers and suggestions:

When creating the connection object, store your credentials in environment variables rather than hard-coding them. You can then read them in the notebook using the os module. This is good practice because you don't want to make your credentials public, especially when dealing with sensitive data.
Really liked the additional checks after making the manipulations. Also loved the chart checking that TotalWorkingYears is consistent with Age.
Good suggestions for why there may be discrepancies in the dataset.
Good work using histograms and boxplots to analyse the distributions of MonthlyIncome, YearsSinceLastPromotion, TotalWorkingYears and YearsAtCompany.
Overall, really clear and effective visuals.
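A minimal sketch of the environment-variable approach. The variable names HR_DB_SERVER, HR_DB_USER and HR_DB_PASSWORD are hypothetical, and the database driver behind the connection object isn't shown in this PR, so the connect call is left as a comment:

```python
import os


def get_db_credentials(names=("HR_DB_SERVER", "HR_DB_USER", "HR_DB_PASSWORD")):
    """Read database credentials from environment variables.

    The default variable names are hypothetical; use whatever names you
    exported. Raises KeyError listing any missing variables so the notebook
    fails fast instead of trying to connect with blank credentials.
    """
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Hypothetical usage: the values come from the environment, never from the notebook.
# creds = get_db_credentials()
# conn = <your driver>.connect(server=creds["HR_DB_SERVER"], user=creds["HR_DB_USER"], ...)
```

Set the variables outside the notebook (for example, in your shell profile) so they never end up in the committed .ipynb file.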

Great stuff! Let me know when you submit the second half - I'll take a look over it.

Stephen-Cole267 · 2022-08-26T12:21:04Z

A lot of insight into the data without a lot of code! This is really good :)

What I liked:

As Alex mentioned, I really like how you ran additional checks after manipulating the data and made sure that every question about the data was answered.
Good display of function creation: percent_plot and percent_line
Analysed the distribution of each feature and its relationship with the attrition target using percentages, which makes the results easy to understand
Easy-to-follow code
Converted the ordinal columns into their respective categories, which mean a lot more to the stakeholder than integers and are also more representative within the model
Used an IPython magic command to see the wall time; this is especially useful for code that could take a long time to run
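The ordinal-to-categorical conversion and the attrition-percentage view praised above can be sketched in pandas roughly as follows. This is an illustrative sketch, not the notebook's percent_plot or percent_line code: the column name JobSatisfaction, its 1 to 4 label mapping, the toy data and the helper names are all assumptions.

```python
import pandas as pd

# Hypothetical label mapping for an integer-coded ordinal column; replace it
# with the codes actually documented for your dataset.
SATISFACTION_LABELS = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}


def to_ordered_category(series: pd.Series, labels: dict) -> pd.Series:
    """Map integer codes to readable labels and keep their order as an ordered categorical."""
    dtype = pd.CategoricalDtype(categories=list(labels.values()), ordered=True)
    return series.map(labels).astype(dtype)


def attrition_percent(df: pd.DataFrame, feature: str,
                      target: str = "Attrition", positive: str = "Yes") -> pd.Series:
    """Percentage of rows in each category of `feature` where `target == positive`."""
    is_positive = (df[target] == positive).astype(float)
    # observed=True keeps only categories that actually appear in the data.
    return (is_positive.groupby(df[feature], observed=True).mean() * 100).round(1)


# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "JobSatisfaction": [1, 1, 1, 2, 2, 3, 3, 4],
    "Attrition": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})
df["JobSatisfaction"] = to_ordered_category(df["JobSatisfaction"], SATISFACTION_LABELS)
pct = attrition_percent(df, "JobSatisfaction")
print(pct)
```

Keeping the categories ordered means grouping, sorting and plotting follow Low to Very High rather than alphabetical order.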

Suggestions to improve:
Note: I'm not sure whether these will be covered in the second half, so please ignore any points that you cover later.

Could check whether there are any nulls in the data and use your distribution plots to decide what to impute these features with
Remove any highly correlated features (depending on the model you are planning to use), as they will not contribute much to the resulting model. You can use your correlation and percent plots to decide which feature to remove.
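A minimal pandas sketch of both suggestions, under stated assumptions: median imputation is only one option (the distribution plots should drive that choice), and the 0.8 correlation threshold and the helper names are illustrative, not taken from the notebook. The correlation helper lists candidate pairs rather than dropping columns automatically, so deciding which feature to remove stays a judgement call.

```python
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, returning only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def impute_median(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy with nulls in the given numeric columns filled by each column's median.

    The median is a common default for skewed features such as income;
    check the distribution plots before preferring it over the mean or mode.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8):
    """Return (col_a, col_b, |r|) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes("number").corr().abs()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```

Typical usage: call null_report(df) first, impute only the columns it lists, then review highly_correlated_pairs(df) alongside the correlation heatmap before removing anything.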

Looking forward to the second half :) !

SarahRana added 7 commits on August 12, 2022 17:07:

Create sample.txt (85d3ea2)
Add files via upload (9325b29)
Delete sample.txt (471642d)
Delete HR Analytics DA Project.ipynb (8959306)
Add files via upload (8b8b75e)
Delete HR Analytics DA Project.ipynb (0f52ca6)
Add files via upload (1df9c6c)

SarahRana wants to merge 7 commits into Stephen-Cole267:Data_Science from SarahRana:Master
