Sunday, April 07, 2013
Hadoop Big Data class at California State University Los Angeles with the Cloudera training codes and the test platform "Hadoop in a Box"
A new Hadoop Big Data course (http://instructional1.calstatela.edu/jwoo5/classes/2013/spr/bigdata/) is open at California State University Los Angeles, supported by Cloudera through its Educational Sponsorships. Isn't it the first such class among the universities of Southern California?
The Cloudera training code covered in the class is at https://github.com/cloudera/cloudera-training, and Shopzilla has created a really cool "Hadoop in a Box" at https://github.com/shopzilla/hadoop-in-a-box, which simply launches a stand-alone Hadoop for testing Hadoop and its ecosystem.
Tuesday, March 29, 2011
Market Basket Analysis Example in Hadoop
Market Basket Analysis is one of the important approaches for analysing associations in data mining. The basic idea is to find the pairs of items that are bought together in a store, given huge volumes of transaction data such as the following:
trax1: cracker, icecream, beer
trax2: chicken, pizza, coke, bread
...
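To make the idea concrete, here is a minimal single-machine sketch in plain Java of counting item pairs across transactions. It is only an illustration and not the ItemCount.java mentioned in this post: the class name PairCounter, its method names, and the third transaction in main are made up for the example. The pairsOf step plays the role of a mapper and countPairs the role of a reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the jar described in this post.
class PairCounter {

    // "Map" step: emit every unordered pair of distinct items in one transaction.
    // The TreeSet sorts and de-duplicates the items, so (beer, cracker) and
    // (cracker, beer) always produce the same key "beer,cracker".
    static List<String> pairsOf(List<String> transaction) {
        List<String> items = new ArrayList<>(new TreeSet<>(transaction));
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(items.get(i) + "," + items.get(j));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the occurrences of each pair over all transactions.
    static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> transaction : transactions) {
            for (String pair : pairsOf(transaction)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The first two transactions come from the text; the third is made up.
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("cracker", "icecream", "beer"),
                Arrays.asList("chicken", "pizza", "coke", "bread"),
                Arrays.asList("beer", "cracker", "pizza"));
        countPairs(transactions)
                .forEach((pair, n) -> System.out.println(pair + "\t" + n));
    }
}
```

Sorting the items before pairing is the design choice that matters here: without it, the same two items could be counted under two different keys depending on the order in which they appear in a transaction.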
The following is the example code that I implemented on Hadoop 0.21.0, which takes the input "AssociationsSP.txt" and generates the top 10 associated items that customers purchased together. After I complete a conference paper with this example code, I will post more detailed information.
Download
- ItemCount.java: source file, to give an idea of what the code looks like
- cloud9-csulaud-0.1.jar: jar file to execute the code
- AssociationsSP.txt: input file
- itemscount_sort2.txt and itemscount_sort4.txt: sample outputs for associations of two and four items
(1) Create a directory "data" and upload the input file to "data" on HDFS:
> hadoop fs -mkdir data
> hadoop fs -put AssociationsSP.txt data/
(2) Type in and run the example code (output dir: itemcount, 5 reducers, associations of 2 items):
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.example.associations.ItemCount data/AssociationsSP.txt itemcount 5 2
(3) Type in the following to see the analysis:
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.utils.analysis.AnalyzeInputCount itemcount
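The example is described above as producing the top 10 associated items. As a rough sketch of what such a ranking step can look like, and not the actual AnalyzeInputCount code (which is not shown in this post), the following plain Java picks the n highest counts from a map of association counts. The class name TopAssociations, the method topN, and the sample counts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the jar described in this post.
class TopAssociations {

    // Return the n entries with the highest counts, highest first.
    // Ties are broken alphabetically by key so the result is deterministic.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up counts, standing in for the job output in "itemcount".
        Map<String, Integer> counts = new HashMap<>();
        counts.put("beer,cracker", 12);
        counts.put("coke,pizza", 9);
        counts.put("bread,chicken", 9);
        counts.put("cracker,icecream", 3);
        for (Map.Entry<String, Integer> e : topN(counts, 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Sorting everything in memory is fine for a sketch like this; for very large outputs a bounded priority queue of size n would avoid holding the full sorted list.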
Tuesday, August 11, 2009
Set up Hadoop in Eclipse
Set up Hadoop in Eclipse
- Hadoop on Windows with Eclipse
Hadoop Example: MyMaxTemperatureWithCombiner
- MyMaxTemperatureWithCombiner.java
- MaxTemperatureMapper.java
- MaxTemperatureReducer.java
How to run the example code:
- Set up Hadoop as shown above (Set up Hadoop in Eclipse).
- Make a directory named "tempIn" on HDFS:
bin/hadoop fs -mkdir tempIn
- Copy the input files 1901 and 1902 from your local directory to HDFS:
bin/hadoop fs -put 1901 tempIn/
bin/hadoop fs -put 1902 tempIn/
- In the Eclipse IDE, import the three Java files above into a package named "edu.calstatela.hipic.hadoop.util".
- Start the Hadoop cluster as shown above (Set up Hadoop in Eclipse).
- In the Eclipse IDE, right-click MyMaxTemperatureWithCombiner.java and choose "Run as" > "Run Hadoop Application".
- You will see the map/reduce results in the HDFS folder "tempOut" under DFS Locations in the Eclipse IDE.
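For readers wondering what the combiner in MyMaxTemperatureWithCombiner buys you, here is a small plain-Java sketch of the idea, with no Hadoop dependency. It is not the code of the three files listed above; the class MaxTemperatureSketch, its methods, and all the readings are made up, and I am assuming the job computes a maximum temperature per year, as the example name suggests. Taking a maximum is safe to do in two stages, so each split can be reduced to its local maximum before the results are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not one of the files listed in this post.
class MaxTemperatureSketch {

    // One (year, temperature) pair, as a mapper might emit it.
    static final class Reading {
        final int year;
        final int temperature;

        Reading(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // Combiner role: maximum temperature per year within a single input split.
    static Map<Integer, Integer> localMax(List<Reading> split) {
        Map<Integer, Integer> max = new TreeMap<>();
        for (Reading r : split) {
            max.merge(r.year, r.temperature, Integer::max);
        }
        return max;
    }

    // Reducer role: merge the per-split maxima into one maximum per year.
    static Map<Integer, Integer> globalMax(List<Map<Integer, Integer>> partials) {
        Map<Integer, Integer> max = new TreeMap<>();
        for (Map<Integer, Integer> partial : partials) {
            partial.forEach((year, t) -> max.merge(year, t, Integer::max));
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up readings split across two hypothetical input splits.
        List<Reading> split1 = Arrays.asList(
                new Reading(1901, 100), new Reading(1901, 317), new Reading(1902, 50));
        List<Reading> split2 = Arrays.asList(
                new Reading(1901, 200), new Reading(1902, 244));

        List<Map<Integer, Integer>> partials = new ArrayList<>();
        partials.add(localMax(split1));
        partials.add(localMax(split2));
        globalMax(partials).forEach((year, t) -> System.out.println(year + "\t" + t));
    }
}
```

With the combiner, each split sends at most one value per year to the reducer instead of every reading, and the final answer is the same as reducing all readings directly. The same trick would not work unchanged for something like a mean.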
Profile
- Dalgual
- PhD, Consultant