20 changes: 20 additions & 0 deletions Introduction.md
@@ -0,0 +1,20 @@
This report summarizes an analysis of the Iowa Liquor Sales dataset (2012-2020). The objective was to identify the most popular item in each zipcode and each store's percentage of total sales for the years 2016-2019.
The analysis involved data extraction, transformation, and visualization using Python, Pandas, Matplotlib, and Seaborn.

Data Extraction
We queried the database to retrieve liquor sales data between 2016 and 2019 and exported the results to a CSV file for further analysis.

Data Transformation
Using Pandas, we performed the following transformations:
1. Filtering the data to the years 2016-2019.
2. Grouping and aggregating to find the total bottles sold per item per zipcode.
3. Calculating each store's percentage of total sales dollars.
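
The three transformations above can be sketched as a single function. This is a minimal illustration, not the exact code in this pull request: the function name `summarize_sales` and the sample rows are made up for the example, and the column names (`date`, `zip_code`, `item_description`, `bottles_sold`, `store_number`, `sale_dollars`) are assumed to match the exported CSV.

```python
import pandas as pd


def summarize_sales(df, start_year=2016, end_year=2019):
    """Hypothetical helper: return (top item per zipcode, sales share per store)."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])

    # 1. Filter to the requested years (both bounds inclusive).
    df = df.loc[df["date"].dt.year.between(start_year, end_year)]

    # 2. Total bottles sold per item per zipcode, then keep the top item per zipcode.
    per_item = df.groupby(["zip_code", "item_description"], as_index=False)["bottles_sold"].sum()
    top_items = per_item.loc[per_item.groupby("zip_code")["bottles_sold"].idxmax()]

    # 3. Each store's share of total sales dollars, as a percentage.
    per_store = df.groupby("store_number", as_index=False)["sale_dollars"].sum()
    per_store["percentage_of_sales"] = per_store["sale_dollars"] / per_store["sale_dollars"].sum() * 100

    return top_items, per_store


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "date": ["2015-12-31", "2016-01-05", "2017-03-10", "2019-12-31", "2020-01-01"],
    "zip_code": ["50314", "50314", "50314", "52240", "52240"],
    "item_description": ["Item A", "Item A", "Item B", "Item C", "Item C"],
    "bottles_sold": [100, 5, 8, 3, 50],
    "store_number": [1, 1, 2, 3, 3],
    "sale_dollars": [1000.0, 50.0, 100.0, 50.0, 500.0],
})

top_items, per_store = summarize_sales(sample)
print(top_items)
print(per_store)
```

The 2015 and 2020 rows fall outside the range and are dropped before aggregation, so they do not affect either result.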

Data Visualization
Two primary visualizations were created:
1. Bar Plot: Most Popular Items per Zipcode
2. Pie Chart: Sales per Store

Conclusion
The analysis identified the most popular liquor item in each zipcode, measured by bottles sold, and each store's share of total sales dollars for 2016-2019.
The bar plot and pie chart summarize these results by zipcode and by store, showing how consumer preferences and sales performance vary across areas.
62 changes: 62 additions & 0 deletions scratch.py
@@ -0,0 +1,62 @@
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Load the dataset; read zip codes as strings so they are treated as labels, not numbers
df = pd.read_csv(r"C:\Users\venic\Desktop\DIMITRIS\years16to19.csv", dtype={'zip_code': str})

# Convert date column to datetime and keep only sales from 2016-2019
df['date'] = pd.to_datetime(df['date'])
df = df[df['date'].dt.year.between(2016, 2019)]

# Find the most popular item per zipcode
popular_items = df.groupby(['zip_code', 'item_description'])['bottles_sold'].sum().reset_index()
most_popular_items = popular_items.loc[popular_items.groupby('zip_code')['bottles_sold'].idxmax()]

# Calculate the percentage of sales per store
total_sales = df['sale_dollars'].sum()
sales_per_store = df.groupby('store_number')['sale_dollars'].sum().reset_index()
sales_per_store['percentage_of_sales'] = (sales_per_store['sale_dollars'] / total_sales) * 100

# Print results
print("Most popular items per zipcode:\n", most_popular_items)
print("\nPercentage of sales per store:\n", sales_per_store)

# Set the aesthetic style of the plots
sns.set_style("whitegrid")

# Bar plot of the most popular items per zipcode
plt.figure(figsize=(12, 6))
bar_plot = sns.barplot(data=most_popular_items, x='zip_code', y='bottles_sold', hue='zip_code', dodge=False, palette='viridis', legend=False)
# Adding title and labels
bar_plot.set_title('Most Popular Items per Zipcode (2016-2019)', fontsize=16)
bar_plot.set_xlabel('Zipcode', fontsize=14)
bar_plot.set_ylabel('Bottles Sold', fontsize=14)
# Rotate x-axis labels for better readability
plt.xticks(rotation=45)
# Display the plot
plt.show()


# Pull out (explode) the slice of any store with more than 10% of total sales
explode_values = [0.1 if value > 10 else 0 for value in sales_per_store['percentage_of_sales']]
# Pie chart of sales per store
plt.figure(figsize=(10, 10))
pie_chart = sales_per_store.set_index('store_number')['percentage_of_sales'].plot(
    kind='pie',
    autopct='%1.1f%%',
    startangle=140,
    colors=sns.color_palette('viridis', len(sales_per_store)),
    explode=explode_values,
    legend=True
)
# Adding title
plt.title('Percentage of Sales per Store (2016-2019)', fontsize=16)
# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')
# Display the legend outside the pie chart
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
# Remove y-axis label for cleaner look
plt.ylabel('')
# Display the plot
plt.show()