Market Basket Analysis is one of the important approaches to analysing associations in Data Mining. The basic idea is to find items that are frequently purchased together in a store when there are huge volumes of transaction data, such as the following:
trax1: cracker, icecream, beer
trax2: chicken, pizza, coke, bread
...
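To make the basic idea concrete, here is a minimal single-machine sketch in Java (not the author's Hadoop code). It parses transaction lines in the "trax1: cracker, icecream, beer" format shown above and counts how many transactions contain each unordered pair of items. The class and method names (PairCountSketch, parseItems, countPairs) are hypothetical, chosen only for illustration.

```java
import java.util.*;

/** Hypothetical single-machine sketch of pair counting for market basket analysis.
 *  Assumes each transaction line looks like "trax1: cracker, icecream, beer". */
class PairCountSketch {

    // Parse "name: item1, item2, ..." into a list of trimmed item names.
    static List<String> parseItems(String line) {
        String body = line.substring(line.indexOf(':') + 1);
        List<String> items = new ArrayList<>();
        for (String raw : body.split(",")) {
            String item = raw.trim();
            if (!item.isEmpty()) items.add(item);
        }
        return items;
    }

    // Count how many transactions contain each unordered pair of distinct items.
    static Map<String, Integer> countPairs(List<String> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : transactions) {
            // Sort and de-duplicate so (beer, cracker) and (cracker, beer) share one key.
            List<String> items = new ArrayList<>(new TreeSet<>(parseItems(line)));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "," + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> data = List.of(
            "trax1: cracker, icecream, beer",
            "trax2: chicken, pizza, coke, bread");
        System.out.println(countPairs(data));
    }
}
```

Sorting each basket before generating pairs gives every pair a single canonical key; a distributed version needs the same canonical ordering so that identical pairs from different transactions end up being summed together.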
The following is the example code that I implemented on Hadoop 0.21.0. It takes the input file "AssociationsSP.txt" and generates the top 10 associated items that customers purchased together. After I complete a conference paper based on this example code, I will post more detailed information.
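The ItemCount source is not reproduced in this post, so the following is only a hypothetical, in-memory simulation of how a MapReduce job of this kind could be structured. The map step emits every sorted k-item combination of a transaction with a count of 1, the shuffle routes each key to one of R reducers using the same formula as Hadoop's default HashPartitioner, and each reducer sums the counts for its keys. Here k and R are assumed to correspond to the last two command-line arguments in step (2) (2 items, 5 reducers); ItemSetJobSimulation and its methods are illustrative names, not part of the author's JAR.

```java
import java.util.*;

/** Hypothetical in-memory simulation of a MapReduce item-set count job.
 *  Not the actual ItemCount implementation; it only mirrors the map/partition/reduce flow. */
class ItemSetJobSimulation {

    // Map-phase helper: collect every sorted k-item combination of one transaction.
    static void combinations(List<String> items, int k, int start,
                             Deque<String> current, List<String> out) {
        if (current.size() == k) {
            out.add(String.join(",", current));
            return;
        }
        for (int i = start; i < items.size(); i++) {
            current.addLast(items.get(i));
            combinations(items, k, i + 1, current, out);
            current.removeLast();
        }
    }

    // Same formula as Hadoop's default HashPartitioner. Note: String.hashCode differs
    // from Hadoop's Text.hashCode, so real reducer assignments would differ.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns one key-sorted map (item set -> count) per reducer, i.e. one "output file" each.
    static List<Map<String, Integer>> simulateJob(List<List<String>> transactions,
                                                  int numReducers, int k) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) reducers.add(new TreeMap<>());
        for (List<String> tx : transactions) {
            // Sort and de-duplicate so each item set has one canonical key.
            List<String> items = new ArrayList<>(new TreeSet<>(tx));
            List<String> keys = new ArrayList<>();
            combinations(items, k, 0, new ArrayDeque<>(), keys);
            for (String key : keys) {
                // Shuffle: route the key to its reducer; reduce: sum the emitted 1s.
                reducers.get(partition(key, numReducers)).merge(key, 1, Integer::sum);
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(
            List.of("cracker", "icecream", "beer"),
            List.of("chicken", "pizza", "coke", "bread"));
        List<Map<String, Integer>> out = simulateJob(data, 5, 2);
        for (int r = 0; r < out.size(); r++) {
            System.out.println("reducer " + r + ": " + out.get(r));
        }
    }
}
```

Because a given key always hashes to the same reducer, each reducer ends up with complete counts for its share of the item sets, which is why the job's output is split into one file per reducer.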
Download
- ItemCount.java: source file, to give an idea of what the code looks like
- cloud9-csulaud-0.1.jar: JAR file to execute the code
- AssociationsSP.txt: input file
- itemscount_sort2.txt and itemscount_sort4.txt: sample outputs for associations of two and four items
(1) Create a directory "data" on HDFS and upload the input file to it:
> hadoop fs -mkdir data
> hadoop fs -put AssociationsSP.txt data/
(2) Type in and run the example code (output directory: itemcount, 5 reducers, associations of 2 items):
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.example.associations.ItemCount data/AssociationsSP.txt itemcount 5 2
(3) Type in the following to see the analysis:
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.utils.analysis.AnalyzeInputCount itemcount
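The internals of AnalyzeInputCount are not shown here either, so this is only a hedged sketch of the kind of post-processing involved. With 5 reducers the counts are spread over 5 output files, each sorted by key rather than by count, so finding the top 10 associations requires a pass over all of them. The sketch assumes each output line is "itemset<TAB>count" (the default layout of Hadoop's TextOutputFormat) and keeps the N largest counts in a size-bounded min-heap. TopAssociations, Entry, and topN are hypothetical names, and the input lines in main are made up for illustration.

```java
import java.util.*;

/** Hypothetical sketch: pick the top-N item sets by count from several reducer outputs.
 *  Assumes each output line is "itemset<TAB>count". */
class TopAssociations {

    static final class Entry {
        final String itemSet;
        final int count;
        Entry(String itemSet, int count) { this.itemSet = itemSet; this.count = count; }
        @Override public String toString() { return itemSet + "\t" + count; }
    }

    static List<Entry> topN(List<List<String>> partFiles, int n) {
        // Ascending by count; on ties the alphabetically later item set counts as "smaller",
        // so it is evicted first and the result is deterministic.
        Comparator<Entry> byCount = (a, b) -> a.count != b.count
            ? Integer.compare(a.count, b.count)
            : b.itemSet.compareTo(a.itemSet);
        PriorityQueue<Entry> heap = new PriorityQueue<>(byCount); // min-heap
        for (List<String> lines : partFiles) {
            for (String line : lines) {
                int tab = line.lastIndexOf('\t');
                if (tab < 0) continue; // skip malformed lines
                heap.offer(new Entry(line.substring(0, tab),
                                     Integer.parseInt(line.substring(tab + 1).trim())));
                if (heap.size() > n) heap.poll(); // drop the current smallest
            }
        }
        List<Entry> result = new ArrayList<>(heap);
        result.sort(byCount.reversed()); // highest count first, ties alphabetical
        return result;
    }

    public static void main(String[] args) {
        // Made-up lines standing in for two reducer output files.
        List<List<String>> parts = List.of(
            List.of("beer,cracker\t7", "bread,coke\t2"),
            List.of("coke,pizza\t9"));
        for (Entry e : topN(parts, 10)) System.out.println(e);
    }
}
```

The heap never holds more than N+1 entries, so memory stays small even when the reducers emit a very large number of item sets.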
Blog Archive
Tuesday, March 29, 2011
Thursday, March 24, 2011
Profile
- Dalgual
- PhD, Consultant