AshishSinha5/apriori

Apriori Algorithm

A simple Python implementation of the Apriori algorithm for frequent itemset mining and association rule learning over relational databases and dataframes.
Here I aim to implement an improved version of the algorithm, AprioriTID, following Agrawal and Srikant [1].
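
For orientation, the level-wise search that Apriori performs can be sketched as below. This is a minimal illustration of the classic Apriori join and prune steps, not the code in this repository and not AprioriTID itself (AprioriTID differs in that, after the first pass, it counts support against per-transaction candidate sets instead of rescanning the raw data). The function name `apriori` and its parameters are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support, max_len=None):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Illustrative sketch of classic Apriori; names and signature are hypothetical.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current and (max_len is None or k <= max_len):
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy usage: five small transactions, minimum support 0.6.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, s in sorted(apriori(toy, 0.6).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step relies on the fact that any subset of a frequent itemset is also frequent, so a candidate with an infrequent subset can be discarded without counting it.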

Status: Active

Dataset

The UCI Machine Learning Repository Bag of Words Dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words) by David Newman includes the following three collections of text documents:

  • Enron Emails - Contains data from about 150 users, mostly senior management of Enron.
  • NIPS papers - Contains data from papers appearing in the NIPS conference.
  • KOS blog entries - Contains data from KOS blog entries, predominantly about political news.
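
Each collection ships as a `docword.*.txt` file (three header lines giving the number of documents, vocabulary size and number of nonzero counts, followed by `docID wordID count` triples) and a `vocab.*.txt` file with one word per line. A minimal sketch of turning these into one word set per document is shown below; `load_transactions` is a hypothetical helper written for this example, not necessarily how main.py reads the data, and it assumes the files are already uncompressed.

```python
from collections import defaultdict


def load_transactions(docword_path, vocab_path):
    """Read a Bag of Words collection into a list of word sets, one per document."""
    # vocab.*.txt: one word per line; wordID in docword is the 1-based line number.
    with open(vocab_path, encoding="utf-8") as f:
        vocab = [line.strip() for line in f]

    docs = defaultdict(set)
    with open(docword_path, encoding="utf-8") as f:
        # Skip the three header lines: D (documents), W (vocabulary size), NNZ.
        for _ in range(3):
            next(f)
        for line in f:
            if not line.strip():
                continue
            doc_id, word_id, _count = map(int, line.split())
            # Word counts are dropped: itemset mining only needs presence.
            docs[doc_id].add(vocab[word_id - 1])
    # One transaction per document, ordered by docID.
    return [docs[d] for d in sorted(docs)]
```

Documents with no words never appear in the triples, so they are absent from the returned list.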

Getting Started

  • Download the data from the link above.
  • Clone the repository to your local PC.
  • To extract the required data, run the following command (see main.py for help on the arguments):

python main.py -d "data/docword.kos.txt" -v "data/vocab.kos.txt" -k 5 -ms 0.25 -o True

Inferences

The KOS dataset was passed through the Apriori algorithm multiple times with minimum support values of 0.1, 0.2, 0.25 and 0.3, whereas the NIPS dataset was run with minimum support values of 0.4, 0.45, 0.5 and 0.6.
Some of the interesting frequent itemsets in the KOS dataset include {'create', 'democrats', 'war'} and {'bush', 'general', 'republicans', 'split'}, whereas the NIPS data had {'abstract', 'algorithm', 'approach', 'information', 'neural'} and {'abstract', 'application', 'input', 'set'}. The word 'abstract' appears in all of the NIPS itemsets listed, which is expected since every document in the NIPS data contains the word 'abstract'. As the minimum support and the itemset length were increased, both datasets followed a characteristic trend in the number of frequent itemsets generated and the time taken to generate them, as shown in the graphs below.

[Graphs, one column per dataset (KOS, NIPS): KOS ITEMSETS and NIPS ITEMSETS show the number of frequent itemsets generated; KOS TIME and NIPS TIME show the time taken.]
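
The kind of sweep described above (several minimum support values, with the number of frequent itemsets per length and the run time recorded for each) can be sketched as follows. To keep the example short and self-contained, brute-force enumeration stands in for Apriori here; the function names and the toy transactions are assumptions for illustration and do not reproduce the reported results.

```python
import time
from collections import Counter
from itertools import combinations


def count_frequent(transactions, min_support, max_len):
    """Count frequent itemsets of each size 1..max_len by brute-force enumeration."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_len + 1):
            # Each transaction is a set, so every itemset is counted once per document.
            counts.update(combinations(sorted(t), k))
    per_size = Counter()
    for itemset, c in counts.items():
        if c / n >= min_support:
            per_size[len(itemset)] += 1
    return dict(per_size)


def sweep(transactions, supports, max_len):
    """Return a (min_support, itemsets-per-size, seconds) row for each threshold."""
    rows = []
    for ms in supports:
        start = time.perf_counter()
        per_size = count_frequent(transactions, ms, max_len)
        rows.append((ms, per_size, time.perf_counter() - start))
    return rows


# Toy transactions, invented for the example.
toy = [{"bush", "war", "iraq"}, {"bush", "war"}, {"bush", "senate"}, {"war", "iraq"}]
for ms, per_size, seconds in sweep(toy, [0.25, 0.5, 0.75], max_len=3):
    print(ms, per_size, f"{seconds:.6f}s")
```

Raising the minimum support can only shrink the set of frequent itemsets, so the per-size counts never increase from one threshold to the next.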

References

[1] Agrawal, Rakesh and Srikant, Ramakrishnan. Fast algorithms for mining association rules. 1994.
