A Mystery in Two Parts

Background

As a new data engineer at Pewlett Hackard, a research project on employees of the corporation from 1980s and 1990s was assigned. There are 2 parts of this project; data engineering and data analysis. The process contains designing the tables to hold data in the CSVs, importing the CSVs into a SQL database, and answering questions about the data.

Data Engineering

Entity Relationship Diagram (ERD)

Many-to-one relationship

emp_no in employees table and emp_no in dept_emp tables
emp_no in employees table and emp_no in dept_manager tables
title_id in titles table and emp_title_id in employees table
dept_no in departments table and dept_no in dept_emp tables.
dept_no in departments table and dept_no in dept_manager tables

One-to-one relationship

emp_no in employees table and emp_no in salaries table

Database

Create employee_db database in PostgreSQL
Import schema into database created
Import csv files to tables created as the following order:
- titles
- employees
- departments
- dept_emp
- dept_manager
- salaries

Data Analysis

Including a query excecuted in order to answer each question.

Part I: Reserch Questions

List the following details of each employee: employee number, last name, first name, sex, and salary. (Query-1)

  SELECT e.emp_no, e.last_name, e.first_name, e.sex, s.salary
  FROM employees AS e
  LEFT JOIN salaries AS s
  ON e.emp_no = s.emp_no;

List first name, last name, and hire date for employees who were hired in 1986. (Query-2)

  SELECT first_name, last_name, hire_date
  FROM employees
  WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31';

List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name. (Query-3)

  SELECT dm.dept_no, d.dept_name, dm.emp_no, e.last_name, e.first_name
  FROM dept_manager AS dm
  JOIN departments AS d 
  ON dm.dept_no = d.dept_no
  JOIN employees AS e 
  ON dm.emp_no = e.emp_no;

List the department of each employee with the following information: employee number, last name, first name, and department name. (Query-4)

  SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
  FROM employees AS e
  JOIN dept_emp AS de
  ON e.emp_no = de.emp_no
  JOIN departments AS d
  ON de.dept_no = d.dept_no;

List first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B." (Query-5)
```
  SELECT first_name, last_name, sex
  FROM employees 
  WHERE first_name = 'Hercules'
  AND last_name LIKE 'B%';
```

List all employees in the Sales department, including their employee number, last name, first name, and department name. (Query-6)

  SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
  FROM employees AS e
  JOIN dept_emp AS de
  ON e.emp_no = de.emp_no
  JOIN departments AS d
  ON de.dept_no = d.dept_no
  WHERE d.dept_name = 'Sales';

List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name. (Query-7)

  SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
  FROM employees AS e
  JOIN dept_emp AS de
  ON e.emp_no = de.emp_no
  JOIN departments AS d
  ON de.dept_no = d.dept_no
  WHERE d.dept_name IN ('Sales', 'Development');

In descending order, list the frequency count of employee last names. (Query-8)

  SELECT last_name,
  COUNT(*) AS "Frequency of Last name"
  FROM employees
  GROUP BY last_name
  ORDER BY "Frequency of Last name" DESC;

Part I: Bonus

Import database into Jupyter Notebook to visualize the data

Bonus-I: The most common salary

The most common salary for Pewlett Hackard's employees is between $40,000 and $50,000. (See Jupyter Notebook)

Bonus-II: Average salary by title

In postgres: Calculation of average salary by title which is stored in view. (See Query)

  CREATE VIEW "avg_salary_title" AS
  SELECT t.title,
  ROUND(AVG(s.salary),2) AS "avg salary"
  FROM employees AS e 
  JOIN salaries AS s
  ON e.emp_no = s.emp_no
  JOIN titles AS t
  ON e.emp_title_id = t.title_id
  GROUP BY t.title;

  SELECT * FROM avg_salary_title;

In Jupyter Notebook: Import view to visualize data. (See Jupyter Notebook)

Epilogue: See Query

After giving a presentation to my boss, I heard the word, "Search your ID number." Then, I looked up for ID 499942 and found that the name and hire date were incorrect. That name was April Foolday and hire date is before I was born. Therefore, I updated first name, last name and hire date. Then, displying the information of employee number 499942 again. 😉

    SELECT * FROM employees
    WHERE emp_no = 499942;

    UPDATE employees
    SET (last_name, first_name, hire_date) = ('Christmas', 'Merry', '2020-12-25')
    WHERE emp_no = 499942;

    SELECT e.emp_no, e.last_name, e.first_name, e.hire_date, d.dept_name
    FROM employees AS e
    JOIN dept_emp AS de
    ON e.emp_no = de.emp_no
    JOIN departments AS d
    ON de.dept_no = d.dept_no
    WHERE e.emp_no = 499942;

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
EmployeeSQL		EmployeeSQL
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

A Mystery in Two Parts

Background

Data Engineering

Entity Relationship Diagram (ERD)

Database

Data Analysis

Part I: Reserch Questions

Part I: Bonus

About

Uh oh!

Releases

Packages

abpuccini/employee-database-research

Folders and files

Latest commit

History

Repository files navigation

A Mystery in Two Parts

Background

Data Engineering

Entity Relationship Diagram (ERD)

Database

Data Analysis

Part I: Reserch Questions

Part I: Bonus

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages