Tuesday, March 29, 2011

Market Basket Analysis Example in Hadoop

Market Basket Analysis is one of the important approach to analyse the association in Data Mining. The basic idea is to find the associated pairs of items in a store when there are huge volumes of transaction data as follows:
trax1: cracker, icecream, beer
trax2: chicken, pizza, coke, bread
...

The following is the example code that I implemented on Hadoop 0.21.0, which takes the input "AssociationSP.txt" and generates the top 10 associated items that customers purchased together. After I complete a paper for conference with this example code, I will post more detailed info.

Donwload
- ItemCount.java Source file to have an idea how it looks like
- cloud9-csulaud-0.1.jar file to execute the code
- AssociationsSP.txt input file
- itemscount_sort2.txt and itemscount_sort4.txt sample outs for two- and four-pairs of items

(1) You need to create a dir "data" and upload the file to "data" on HDF:
> hadoop fs -mkdir data
> hadoop fs -put AssociationsSP.txt data/

(2) type in and run the example code (output dir: itemcount, 5 reducers, 2 pairs of association):
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.example.associations.ItemCount data/AssociationsSP.txt itemcount 5 2

(3) Type in the following to see the analysis:
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.utils.analysis.AnalyzeInputCount itemcount

Followers

Profile