The dataset used in this project is based on the Online Retail II dataset, which contains transactional data from a UK-based online retail company.
Because of GitHub's file size limits, and in line with common practice for structuring data science projects, the dataset is not included in this repository.
You can access the dataset from the official source below:
The dataset includes real-world transactional data with the following key attributes:
- Invoice: Unique transaction identifier
- StockCode: Product/item code
- Description: Product name
- Quantity: Number of items purchased
- InvoiceDate: Transaction timestamp
- Price: Unit price
- CustomerID: Unique customer identifier
- Country: Customer location
Before modeling, the dataset undergoes several preprocessing steps to ensure analytical quality:
- Removal of canceled transactions (invoice numbers starting with "C")
- Exclusion of non-product entries (e.g., shipping codes such as "POST")
- Filtering out invalid values (e.g., non-positive prices)
- Handling missing values
- Outlier capping using IQR-based thresholds
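The preprocessing steps above could look roughly like the following standard-library sketch. It is not the project's code, and several details are assumptions: the toy rows are made up, rows with any missing field are simply dropped (the README does not say how missing values are handled), and capping uses the common fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR on the 25th/75th percentiles. The project may use different quantiles or a different multiplier. `is_valid`, `iqr_limits`, and `cap_outliers` are hypothetical helper names.

```python
from statistics import quantiles

# Hypothetical toy rows (not real data); column names follow the attribute list above.
COLUMNS = ("Invoice", "StockCode", "Quantity", "Price", "CustomerID")
raw = [
    ("1001", "P1", 1, 2.5, "U1"),
    ("1002", "P2", 2, 1.0, "U2"),
    ("C1003", "P1", -2, 2.5, "U1"),   # cancelled invoice
    ("1004", "POST", 1, 18.0, "U3"),  # postage line, not a product
    ("1005", "P3", 2, 3.0, "U3"),
    ("1006", "P4", 5, 0.0, "U2"),     # non-positive price
    ("1007", "P4", 3, 2.0, None),     # missing customer
    ("1008", "P5", 3, 2.0, "U4"),
    ("1009", "P6", 4, 4.5, "U1"),
    ("1010", "P7", 5, 1.5, "U2"),
    ("1011", "P8", 6, 2.0, "U4"),
    ("1012", "P9", 500, 3.5, "U5"),   # extreme quantity
]
rows = [dict(zip(COLUMNS, r)) for r in raw]


def is_valid(row):
    """Row-level filters from the preprocessing list."""
    if any(value is None for value in row.values()):
        return False  # missing values: ASSUMED to be dropped
    if row["Invoice"].startswith("C"):
        return False  # cancelled transaction
    if row["StockCode"] == "POST":
        return False  # non-product (postage) entry
    if row["Price"] <= 0:
        return False  # invalid, non-positive price
    return True


def iqr_limits(values, k=1.5):
    """ASSUMED fences Q1 - k*IQR and Q3 + k*IQR (linear-interpolation quartiles)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(rows, column, k=1.5):
    """Clip a numeric column to its IQR fences, in place."""
    low, up = iqr_limits([row[column] for row in rows], k)
    for row in rows:
        row[column] = min(max(row[column], low), up)


clean = [dict(row) for row in rows if is_valid(row)]
for column in ("Quantity", "Price"):
    cap_outliers(clean, column)

print([row["Quantity"] for row in clean])  # [1, 2, 2, 3, 4, 5, 6, 10.125]
```

Capping (rather than deleting) keeps the invoice in the data, so a bulk order still counts as a purchase of that product while its extreme quantity no longer dominates later statistics.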
After preprocessing, an Association Rule Learning (ARL)-based recommendation system is developed using the Apriori algorithm.
The goal is to identify products that are frequently purchased together and generate data-driven product recommendations.
The recommendation system follows these steps:
- Transform transactional data into a basket (invoice-product) matrix
- Convert quantities into a binary format (1 if purchased, 0 otherwise)
- Apply the Apriori algorithm to extract frequent itemsets
- Generate association rules using support, confidence, and lift metrics
- Rank rules based on lift to identify strong product relationships
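The steps above can be sketched in plain Python as follows. This is an illustrative, standard-library-only version under stated assumptions, not the project's code: the invoice lines are made up, each basket is stored as a set of stock codes (a sparse form of the binary invoice-product matrix), and `build_baskets`, `binary_matrix`, `apriori`, and `association_rules` are hypothetical helper names. Projects like this often use pandas together with mlxtend's `apriori` and `association_rules` functions instead; the code below follows the classic Apriori join-and-prune scheme.

```python
from collections import defaultdict
from itertools import combinations


def build_baskets(lines):
    """Group (invoice, stock_code, quantity) lines into one item set per invoice."""
    baskets = defaultdict(set)
    for invoice, item, quantity in lines:
        if quantity > 0:  # binary view: any positive quantity counts as "purchased"
            baskets[invoice].add(item)
    return dict(baskets)


def binary_matrix(baskets):
    """Invoice x product matrix: 1 if the product was bought on the invoice, else 0."""
    items = sorted(set().union(*baskets.values()))
    return items, {inv: [int(i in b) for i in items] for inv, b in baskets.items()}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must already be frequent (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent, min_confidence=0.0):
    """Build A -> B rules from frequent itemsets, sorted by lift (descending)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, supp, confidence, lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Hypothetical invoice lines (invoice, stock code, quantity); not real data.
lines = [("1", "A", 1), ("1", "B", 2), ("2", "A", 1), ("2", "B", 1), ("2", "C", 1),
         ("3", "A", 3), ("3", "C", 1), ("4", "B", 1), ("4", "D", 1)]
baskets = build_baskets(lines)
items, matrix = binary_matrix(baskets)
frequent = apriori(list(baskets.values()), min_support=0.5)
rules = association_rules(frequent)
for ante, cons, supp, conf, lift in rules:
    print(sorted(ante), "->", sorted(cons),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Because every subset of a frequent itemset is itself frequent, `frequent[antecedent]` and `frequent[consequent]` are always available when confidence and lift are computed. In this toy data, A and C are bought together more often than chance (lift about 1.33), while A and B co-occur less often than chance (lift about 0.89).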
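The rules are characterized by three metrics:
- Support: proportion of transactions (invoices) that contain the itemset
- Confidence: probability of purchasing item B given that item A was purchased
- Lift: ratio of how often A and B are bought together to how often they would be if purchased independently (lift > 1 indicates a positive association; higher means stronger)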
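For reference, the three metrics can be written as follows for a rule A ⇒ B over a set of transactions T, where support(A ∪ B) is the share of transactions containing both A and B. The counts in the worked example (100 invoices, A in 20, B in 25, both in 10) are hypothetical.

```latex
\begin{aligned}
\operatorname{support}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)}
  = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)\,\operatorname{support}(B)} \\[6pt]
\text{Example: } \operatorname{support}(A \cup B) &= \tfrac{10}{100} = 0.10,\quad
\operatorname{confidence}(A \Rightarrow B) = \tfrac{0.10}{0.20} = 0.50,\quad
\operatorname{lift}(A \Rightarrow B) = \tfrac{0.50}{0.25} = 2.0
\end{aligned}
```

A lift of 2.0 means B is bought twice as often on invoices that contain A as it is across all invoices.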
To recommend products for a given item, the system:
- Finds rules in which the product appears in the antecedent (left-hand side)
- Retrieves the corresponding consequents (right-hand side)
- Sorts them by lift to prioritize stronger associations
- Returns the top-N recommended products
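The lookup described above might be sketched like this. It is a minimal illustration, not the project's function: `recommend` is a hypothetical name, and the rules are assumed to be `(antecedent, consequent, lift)` triples with frozenset sides, filled here with made-up values.

```python
def recommend(rules, product, top_n=3):
    """Return up to top_n products recommended for `product`.

    rules: iterable of (antecedent, consequent, lift) with frozenset sides
    (a simplified, hypothetical rule format).
    """
    # Keep rules whose antecedent contains the product; score each consequent by lift.
    scored = [(lift, item)
              for antecedent, consequent, lift in rules
              if product in antecedent
              for item in consequent]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # strongest associations first
    recommendations = []
    for _, item in scored:
        if len(recommendations) >= top_n:
            break
        if item != product and item not in recommendations:  # skip duplicates
            recommendations.append(item)
    return recommendations


# Hypothetical rules (not mined from the real dataset).
rules = [
    (frozenset({"P1"}), frozenset({"P2"}), 1.8),
    (frozenset({"P1"}), frozenset({"P3"}), 2.4),
    (frozenset({"P1", "P4"}), frozenset({"P5"}), 3.1),
    (frozenset({"P2"}), frozenset({"P1"}), 1.8),
    (frozenset({"P1"}), frozenset({"P3", "P6"}), 1.2),
]
print(recommend(rules, "P1"))  # ['P5', 'P3', 'P2']
```

This follows the literal reading of "appears in the antecedent", so the two-item rule {P1, P4} -> {P5} also contributes. A stricter variant would use only rules whose antecedent is exactly {product}, which avoids recommendations that depend on a second item the customer may not have bought.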
This system enables:
- Cross-selling opportunities (e.g., "Customers who bought this also bought…")
- Improved product bundling strategies
- Increased average order value (AOV)
- More personalized shopping experiences
If a customer purchases a specific product, the system can recommend complementary products based on historical co-purchase patterns.
This makes the solution directly applicable to:
- E-commerce recommendation engines
- Campaign targeting
- Product placement strategies