Thursday, June 04, 2015

How to set up Spark on EC2

  1. Set up AWS keys 
Follow the Amazon AWS account setup guide at [5], then export the access keys in your shell:


export AWS_ACCESS_KEY_ID=xxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxx
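
spark-ec2 picks these two variables up from the environment when it talks to the EC2 API, so they must be set in the same shell that launches the cluster. A minimal way to make them persistent, assuming a bash login shell, is to append them to ~/.bashrc (the key values are placeholders):

# persist the keys for new shells; replace the placeholders with your real keys
echo 'export AWS_ACCESS_KEY_ID=xxxxxxxxx' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxx' >> ~/.bashrc
source ~/.bashrc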

In case you need to download Spark [2]:
The latest release of Spark is 1.3.1, released on April 17, 2015.
  1. Choose a Spark release: 1.3.1 (Apr 17 2015)
  2. Choose a package type (this post uses the pre-built package for Hadoop 2.6 and later)
  3. Choose a download type: select Apache Mirror or Direct Download
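
If you prefer the command line, the same package can be pulled from the Apache release archive; this is a sketch assuming 1.3.1 is still kept under the usual archive path:

wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
tar xzf spark-1.3.1-bin-hadoop2.6.tgz
cd spark-1.3.1-bin-hadoop2.6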

  2. Generate a key pair at AWS as shown in [1] (a CLI alternative is sketched after this list)

    1. Download the .pem file to the local machine, e.g. into the ec2 directory of the Spark distribution (for the ampcamp scripts, any directory may work)
      1. Otherwise an SSH error will show up when launching the cluster,
      2. for example, place it under the following directory: osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2
    2. chmod 400 [key.pem] worked for ampcamp3 but not for EC2
      1. in that case, use chmod 600 [key.pem] 
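
If the AWS CLI is installed and configured, the key pair can also be created from the shell. A sketch (the key name ampcamp3 and the path ~/.ssh/ampcamp3.pem are simply chosen to match the launch command below):

# create the key pair and save the private key locally
aws ec2 create-key-pair --key-name ampcamp3 --query 'KeyMaterial' --output text > ~/.ssh/ampcamp3.pem
chmod 600 ~/.ssh/ampcamp3.pem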

  3. Run instances

osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2$ ./spark-ec2 --key-pair=ampcamp3 --identity-file=~/.ssh/ampcamp3.pem --region=us-east-1 --zone=us-east-1a --copy launch my-spark-cluster

Note: use -s (or --slaves) to set the number of slaves, for example in the ampcamp training scripts:
osboxes@osboxes:~/proj/ampcamp3/training-scripts$ ./spark-ec2 -i ampcamp3.pem -k ampcamp3 -z us-east-1b -s 3 --copy launch amplab-training
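
The same spark-ec2 script also manages the cluster after launch. The actions below should be available in the 1.3.1 version of the script (a sketch; check ./spark-ec2 --help for the exact options):

./spark-ec2 -k ampcamp3 -i ~/.ssh/ampcamp3.pem --region=us-east-1 login my-spark-cluster
./spark-ec2 --region=us-east-1 stop my-spark-cluster
./spark-ec2 -k ampcamp3 -i ~/.ssh/ampcamp3.pem --region=us-east-1 start my-spark-cluster
./spark-ec2 --region=us-east-1 destroy my-spark-cluster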

4. Log in to the master node

You need to go to the AWS console to find the master instance. Select the instance and choose "Connect", which shows the shell command to connect to it remotely.

Then http://master_node:8080 should show the Spark master web UI.
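
The "Connect" dialog boils down to a plain SSH login; clusters launched by spark-ec2 use the root account, so something like the following should work (the hostname is a placeholder for your master's public DNS name, and get-master is run from the same ec2 directory used to launch):

# print the master hostname as recorded by spark-ec2
./spark-ec2 --region=us-east-1 get-master my-spark-cluster
# log in as root using the key pair from the launch step
ssh -i ~/.ssh/ampcamp3.pem root@ec2-xx-xx-xx-xx.compute-1.amazonaws.com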


5. Run HDFS on EC2 when HDFS holds the data file that Spark needs to read
    [root@ip-10-232-51-182 ~]$ cd /root/ephemeral-hdfs/bin
    [root@ip-10-232-51-182 bin]$ ./start-dfs.sh
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -put samplefile /user/myname

    Note: the security group for the master node needs to be open for TCP port 7077, so that jobs can be submitted to the Spark master from outside the cluster.
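
    If the -put command complains that the target directory does not exist, create it first and then verify the upload (myname is just the placeholder used above):

    [root@ip-10-232-51-182 bin]$ ./hadoop fs -mkdir /user/myname
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -put samplefile /user/myname
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /user/myname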

6. Run an example against the cluster [4, 7]

    6.a Word count for samplefile

    cd ~/spark-1.3.1-bin-hadoop2.6

    osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6$ ./bin/pyspark --master spark://54.205.231.93:7077
    >>> text_file = sc.textFile("/user/myname/samplefile")
    >>> counts = text_file.flatMap(lambda line: line.split(" ")) \
    ...                   .map(lambda word: (word, 1)) \
    ...                   .reduceByKey(lambda a, b: a + b)
    >>> counts.saveAsTextFile("hdfs://...")     
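
    The same word count can also be packaged as a standalone script and run with spark-submit instead of the interactive shell. This is a minimal sketch; the file name wordcount.py and the output path are assumptions to adapt to your cluster:

    # wordcount.py -- minimal PySpark word count (Spark 1.3-era RDD API)
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")
    text_file = sc.textFile("/user/myname/samplefile")        # input file loaded into HDFS in step 5
    counts = (text_file.flatMap(lambda line: line.split(" "))
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile("/user/myname/samplefile_counts")   # assumed output directory
    sc.stop()

    It would be submitted the same way as the SparkPi example below, e.g. ./bin/spark-submit --master spark://54.205.231.93:7077 wordcount.py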

    6.b SparkPi example
    cd ~/spark-1.3.1-bin-hadoop2.6
    osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6$ ./bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master spark://54.205.231.93:7077 \
     --executor-memory 20G \
     --total-executor-cores 100 \
     ~/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar 1000

    Or run it locally with 8 threads (the executor options are not needed in local mode):
    ./bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master local[8] \
     ~/spark/lib/spark-examples-1.3.1-hadoop1.0.4.jar 1000
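
    If the examples sources are included in your download, the same Pi computation is also available as a Python script, which avoids the versioned jar path (a sketch assuming the standard 1.3.1 layout):

    ./bin/spark-submit \
     --master spark://54.205.231.93:7077 \
     examples/src/main/python/pi.py 1000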




    References
    1. https://spark.apache.org/docs/latest/submitting-application
    2. http://dal-cloudcomputing.blogspot.com/2013/04/create-aws-account-and-access-keys.html
    3. https://spark.apache.org/examples.html
