Skip to content

Applied SQL query to data analysis by importing CSV files. Also, imported database into Jupyter Notebook to visualize the data.

Notifications You must be signed in to change notification settings

abpuccini/employee-database-research

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 

Repository files navigation

A Mystery in Two Parts

Background

As a new data engineer at Pewlett Hackard, a research project on employees of the corporation from 1980s and 1990s was assigned. There are 2 parts of this project; data engineering and data analysis. The process contains designing the tables to hold data in the CSVs, importing the CSVs into a SQL database, and answering questions about the data.

Data Engineering

Entity Relationship Diagram (ERD)

Many-to-one relationship

  • emp_no in employees table and emp_no in dept_emp tables
  • emp_no in employees table and emp_no in dept_manager tables
  • title_id in titles table and emp_title_id in employees table
  • dept_no in departments table and dept_no in dept_emp tables.
  • dept_no in departments table and dept_no in dept_manager tables

One-to-one relationship

  • emp_no in employees table and emp_no in salaries table

Employee_ERD

Database


Data Analysis

Including a query excecuted in order to answer each question.

Part I: Reserch Questions

  • List the following details of each employee: employee number, last name, first name, sex, and salary. (Query-1)

      SELECT e.emp_no, e.last_name, e.first_name, e.sex, s.salary
      FROM employees AS e
      LEFT JOIN salaries AS s
      ON e.emp_no = s.emp_no;
    

  • List first name, last name, and hire date for employees who were hired in 1986. (Query-2)

      SELECT first_name, last_name, hire_date
      FROM employees
      WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31';
    

  • List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name. (Query-3)

      SELECT dm.dept_no, d.dept_name, dm.emp_no, e.last_name, e.first_name
      FROM dept_manager AS dm
      JOIN departments AS d 
      ON dm.dept_no = d.dept_no
      JOIN employees AS e 
      ON dm.emp_no = e.emp_no;
    

  • List the department of each employee with the following information: employee number, last name, first name, and department name. (Query-4)

      SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
      FROM employees AS e
      JOIN dept_emp AS de
      ON e.emp_no = de.emp_no
      JOIN departments AS d
      ON de.dept_no = d.dept_no;
    

  • List first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B." (Query-5)

      SELECT first_name, last_name, sex
      FROM employees 
      WHERE first_name = 'Hercules'
      AND last_name LIKE 'B%';
    

  • List all employees in the Sales department, including their employee number, last name, first name, and department name. (Query-6)

      SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
      FROM employees AS e
      JOIN dept_emp AS de
      ON e.emp_no = de.emp_no
      JOIN departments AS d
      ON de.dept_no = d.dept_no
      WHERE d.dept_name = 'Sales';
    

  • List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name. (Query-7)

      SELECT e.emp_no, e.last_name, e.first_name, d.dept_name
      FROM employees AS e
      JOIN dept_emp AS de
      ON e.emp_no = de.emp_no
      JOIN departments AS d
      ON de.dept_no = d.dept_no
      WHERE d.dept_name IN ('Sales', 'Development');
    

  • In descending order, list the frequency count of employee last names. (Query-8)

      SELECT last_name,
      COUNT(*) AS "Frequency of Last name"
      FROM employees
      GROUP BY last_name
      ORDER BY "Frequency of Last name" DESC;
    

Part I: Bonus

Import database into Jupyter Notebook to visualize the data

Bonus-I: The most common salary

  • The most common salary for Pewlett Hackard's employees is between $40,000 and $50,000. (See Jupyter Notebook)


Bonus-II: Average salary by title

  • In postgres: Calculation of average salary by title which is stored in view. (See Query)

      CREATE VIEW "avg_salary_title" AS
      SELECT t.title,
      ROUND(AVG(s.salary),2) AS "avg salary"
      FROM employees AS e 
      JOIN salaries AS s
      ON e.emp_no = s.emp_no
      JOIN titles AS t
      ON e.emp_title_id = t.title_id
      GROUP BY t.title;
    
      SELECT * FROM avg_salary_title;
    
  • In Jupyter Notebook: Import view to visualize data. (See Jupyter Notebook)


Epilogue: See Query

After giving a presentation to my boss, I heard the word, "Search your ID number." Then, I looked up for ID 499942 and found that the name and hire date were incorrect. That name was April Foolday and hire date is before I was born. Therefore, I updated first name, last name and hire date. Then, displying the information of employee number 499942 again. 😉

    SELECT * FROM employees
    WHERE emp_no = 499942;

    UPDATE employees
    SET (last_name, first_name, hire_date) = ('Christmas', 'Merry', '2020-12-25')
    WHERE emp_no = 499942;

    SELECT e.emp_no, e.last_name, e.first_name, e.hire_date, d.dept_name
    FROM employees AS e
    JOIN dept_emp AS de
    ON e.emp_no = de.emp_no
    JOIN departments AS d
    ON de.dept_no = d.dept_no
    WHERE e.emp_no = 499942;

© Atcharaporn B Puccini

About

Applied SQL query to data analysis by importing CSV files. Also, imported database into Jupyter Notebook to visualize the data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published