Showing posts with label hadoop example. Show all posts

Sunday, April 07, 2013

Hadoop Big Data class at California State University Los Angeles with the Cloudera training codes and the test platform "Hadoop in a Box"

A new Hadoop Big Data course (http://instructional1.calstatela.edu/jwoo5/classes/2013/spr/bigdata/) is now open at California State University Los Angeles, supported by Cloudera through its Educational Sponsorships. Could it be the first such class among the universities of Southern California?

The Cloudera training code used in the class is at https://github.com/cloudera/cloudera-training. Shopzilla has also created a really cool "Hadoop In a Box" at https://github.com/shopzilla/hadoop-in-a-box, which simply launches a stand-alone Hadoop so that you can test Hadoop and its ecosystem.

Tuesday, March 29, 2011

Market Basket Analysis Example in Hadoop

Market Basket Analysis is one of the important approaches to analysing associations in data mining. The basic idea is to find the pairs of items that are purchased together in a store, given huge volumes of transaction data such as the following:
trax1: cracker, icecream, beer
trax2: chicken, pizza, coke, bread
...

The following is the example code that I implemented on Hadoop 0.21.0. It takes the input file "AssociationsSP.txt" and generates the top 10 associated items that customers purchased together. After I complete a conference paper based on this example code, I will post more detailed information.

Download
- ItemCount.java: source file, to give an idea of what the code looks like
- cloud9-csulaud-0.1.jar: jar file to execute the code
- AssociationsSP.txt: input file
- itemscount_sort2.txt and itemscount_sort4.txt: sample outputs for two- and four-pairs of items

(1) Create a directory "data" on HDFS and upload the input file to it:
> hadoop fs -mkdir data
> hadoop fs -put AssociationsSP.txt data/

(2) Type in and run the example code (output dir: itemcount, 5 reducers, 2 pairs of association):
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.example.associations.ItemCount data/AssociationsSP.txt itemcount 5 2

(3) Type in the following to see the analysis:
> hadoop jar cloud9-csulaud-0.1.jar edu.calstatela.hadoop.utils.analysis.AnalyzeInputCount itemcount
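To make the pair-counting idea concrete, here is a minimal in-memory sketch in plain Java. It is not the ItemCount.java from the download list and it does not use the Hadoop API; it only imitates the map step (emit every pair of items in a transaction) and the reduce step (sum the counts per pair). The class and method names (PairCountSketch, pairsOf, countPairs, topPairs) are hypothetical, and the third transaction is made up for illustration so that one pair occurs more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class PairCountSketch {

    // "Map" step: emit every unordered pair of distinct items in one transaction.
    // Items are sorted first so that (beer, cracker) and (cracker, beer) share one key.
    public static List<String> pairsOf(List<String> transaction) {
        List<String> items = new ArrayList<>(new TreeSet<String>(transaction));
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(items.get(i) + "," + items.get(j));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the occurrences of each pair over all transactions.
    public static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> transaction : transactions) {
            for (String pair : pairsOf(transaction)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sort pairs by count (descending), then by name, and keep the first n.
    public static List<Map.Entry<String, Integer>> topPairs(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
            Arrays.asList("cracker", "icecream", "beer"),        // trax1 from the post
            Arrays.asList("chicken", "pizza", "coke", "bread"),  // trax2 from the post
            Arrays.asList("beer", "coke", "cracker"));           // made up for illustration
        for (Map.Entry<String, Integer> e : topPairs(countPairs(transactions), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With these three transactions, the pair "beer,cracker" is counted twice and is listed first. On a real cluster the counting would be spread over mappers and reducers, but the logic per transaction stays the same.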

Tuesday, August 11, 2009

Set up Hadoop in Eclipse

  1. Set up Hadoop in Eclipse

    Hadoop on Windows with Eclipse

  2. Hadoop Example

    MyMaxTemperatureWithCombiner.java,
    MaxTemperatureMapper.java,
    MaxTemperatureReducer.java


    How to run the example code:


    1. You need to set up Hadoop as shown above (Set up Hadoop in Eclipse)

    2. Make a directory named "tempIn" on your HDFS:

      bin/hadoop fs -mkdir tempIn

    3. Copy the input files 1901 and 1902 from your local disk to HDFS:

      bin/hadoop fs -put 1901 tempIn/

      bin/hadoop fs -put 1902 tempIn/

    4. In the Eclipse IDE, import the three Java files above under a package named "edu.calstatela.hipic.hadoop.util"

    5. Start the Hadoop cluster as shown above (Set up Hadoop in Eclipse)

    6. In the Eclipse IDE, right-click on
      MyMaxTemperatureWithCombiner.java and choose "Run as" > "Run Hadoop
      Application"

    7. You will see the map/reduce results in the HDFS folder "tempOut"
      under DFS Locations in the Eclipse IDE
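For readers who want to see what a max-temperature job with a combiner computes, here is a small plain-Java sketch. It is not the code of MyMaxTemperatureWithCombiner.java and it does not use the Hadoop API. I assume the mapper emits (year, temperature) pairs and that the combiner applies the same "keep the maximum" logic as the reducer, which is safe because taking a maximum gives the same answer whether it is applied once or in stages. The class name, method names, and temperature values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureSketch {

    // Shared logic of the "combiner" and the "reducer":
    // keep the highest temperature seen for each year.
    // Each int[] is a {year, temperature} pair.
    public static Map<Integer, Integer> maxByYear(List<int[]> pairs) {
        Map<Integer, Integer> max = new TreeMap<>();
        for (int[] p : pairs) {
            max.merge(p[0], p[1], Integer::max);
        }
        return max;
    }

    // Turn a per-year map back into {year, temperature} pairs,
    // i.e. the output a combiner would pass on to the reducer.
    public static List<int[]> toPairs(Map<Integer, Integer> maxByYear) {
        List<int[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : maxByYear.entrySet()) {
            pairs.add(new int[] {e.getKey(), e.getValue()});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up map outputs of two input splits.
        List<int[]> split1 = Arrays.asList(
            new int[] {1901, 0}, new int[] {1901, 20}, new int[] {1902, 10});
        List<int[]> split2 = Arrays.asList(
            new int[] {1901, 25}, new int[] {1902, 15}, new int[] {1902, -5});

        // With a combiner: reduce each split locally, then reduce the merged results.
        List<int[]> combined = new ArrayList<>(toPairs(maxByYear(split1)));
        combined.addAll(toPairs(maxByYear(split2)));
        Map<Integer, Integer> withCombiner = maxByYear(combined);

        // Without a combiner: the reducer sees every pair.
        List<int[]> all = new ArrayList<>(split1);
        all.addAll(split2);
        Map<Integer, Integer> withoutCombiner = maxByYear(all);

        System.out.println(withCombiner);                          // {1901=25, 1902=15}
        System.out.println(withCombiner.equals(withoutCombiner));  // true
    }
}
```

The combiner does not change the result; it only reduces how many pairs have to travel from the map side to the reduce side (four instead of six in this toy input).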
