Prescriptive Analytics and Program Evaluation

Client: Marvel Mart Department Store Chains (SQL, Python)

Created demographic and financial patient profiles from 50K+ records, identifying areas for improvement

Prescripted repurchase rates and costs by using feature engineering and ensemble models; identified best performing model through 10-fold cross validation and crafted strategies to control varieties utilization and cost

Author: Yinhui(Kami) Yang

Introduction

I have been recently hired by Marvel Mart, one of the world's leading department store chains, to be their new data analyst. I was hired because of your technical skills with Python. Immediately, they offer you a CSV file and ask you to provide specific business analytics based on the data. I am allowed required to use Python and the numPy and pandas libraries for this project. Use any other Python libraries that you would like as well. Take a few moments to familiarize myself with the CSV file called MM_Sales.csv Notice the columns, the headings, the format of the data in each column.

Marvel Mart has been providing both online and offline sales of a variety of products for many years. The provide services to countries all over the world and have stores in many countries. Marvel Mart divides their order up by an alphabetical priority labeling system:

C: Critical (most essential to be delivered quickly and accurately) 

H: High 

M: Medium 

L: Low

There are several columns of data for sales:

Unit Price: money collected for sale of 1 unit 

Unit Cost: money spent for purchase of 1 unit 

Total Revenue: money collected for sale of the collection of units 

Total Cost: money spent for purchase of the collection of units 

Total Profit: Total Cost - Total Revenue (profit)

Preliminary work

First and foremost, I'd want to add all of the csv, numpy, pandas, seaborn, and matplotlib.pyplot files to the project's beginning for individual coding. And I used the pd.set option script to prevent Python from performing substantial number sums with any of the floats in the scientific notation while I was working on the project.

Part 1: Cleaning the Data

Input initial ‘df’ data frame and read MM_Sales.csv file then follow the project requirements to find out any incorrect or missing data which is in the text type for that column in MM_Sales.csv file and then change it to the string "NULL" or number 0 (or 0.0 if it's a float). Writing code to clean the Order Priority, Item Type, individually by using the isnull() check out NULL and and filling any missing values with "NULL". Accordingly, for the required Country cleaning data which will need to apply the (lambda x: x if isinstance(x, str) else "NULL") into the code. Lastly, since the Order ID carries the numeric in the data columns then I am writing the errors='coerce' to avoid the invalid parsing errors which will be found in the data and set as NaN at the same time. Next, I am going to copy the data and double check on the NULL and 0 which I have completely cleaned out then output a new file to the MM_Sales_Clean.csv file for the rest of the project.

Part 2: Exploratory Data Analysis with Reports & Visualizations

Input the new MM_Sales_Clean.csv file for coding work, counting the sales transitions and identify the top 10 countries, next step create the new countries list carry the 'Trinidad and Tobago', 'Guinea', 'Maldives' to found out the shipping center build list for country from the country in top 10 countries but not in the countries list and then the solution written into the Rankings file. Next I am going to use the new MM_Sales_Clean.csv file for the online and offline sales channel count then write the solution written into the Rankings file. Input chart value group by and count from new MM_Sales_Clean.csv file, in order to get the pic chart. The particular reason for the circumstances is that I am doing similar logical coding work for the next two questions Order Priority and other Sales Channel Pie charts.

Import warnings and write the action='ignore' for the Part 2 Boxplot of Total Profits DISTRIBUTION avoid the error warnings.

Then comes to the part 2 Q3 I am using the value group by and cum from a clean new MM_Sales_Clean.csv file, in order to get a bar chart displaying the result. Using the previous data result, write into a dictionary and get the top 3 Item Types from Total profit then write the solution into the Randing file.

Finally, markdown the Boxplot of Total Profits DISTRIBUTION Discussion.

Discusstion

Boxplot is a standardized method for displaying data distribution based on five figures: minimum, first quartile, median, third quartile, and maximum. In the simplest Boxplot, the central rectangle spans from the first quartile to the third quartile. Segments in the rectangle show the middle value, and whiskers are displayed above and below the box, showing the positions of the minimum and maximum values. Accordingly, this boxplot image shows us the combinations with the Top 3 Total Profits by Item Type and the rectangle span are Cosmetic, Household, and Office Supplies. Segments in the rectangle show intermediate values of 0.9, 0.8, and 0.7, respectively.

Part 2 Q4 follows the project requirements determining the sum, average, maximum of Units Sold, Unit Cost, Total Revenue, Total Cost, Total Profit then write each solution into MM_Calc.txt file individually. Lastly, using the calculation data display the Line Plot.

Part 3: Cross-Reference Statistics

Import warnings and write the action='ignore' for the Part 3 dictionary data to avoid the error warnings. After inputting all of the dictionary data with a header for the data of the region and then convert the dictionary data to the CSV file, create a CSV file called Countries_By_Region.csv.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
Code Files		Code Files
Data Files		Data Files
Images Files		Images Files
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prescriptive Analytics and Program Evaluation

Introduction

Preliminary work

Part 1: Cleaning the Data

Part 2: Exploratory Data Analysis with Reports & Visualizations

Discusstion

Part 3: Cross-Reference Statistics

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Prescriptive Analytics and Program Evaluation

Introduction

Preliminary work

Part 1: Cleaning the Data

Part 2: Exploratory Data Analysis with Reports & Visualizations

Discusstion

Part 3: Cross-Reference Statistics

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages