Thursday, June 04, 2015

How to set up Spark on EC2

  1. Set up AWS keys
Follow the guide on how to set up Amazon AWS at [5], then export the access keys in your shell:


export AWS_ACCESS_KEY_ID=xxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxx
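
These exports only last for the current shell. A minimal sketch (assuming a bash login shell) for making them permanent:

# append the AWS credentials to ~/.bashrc so new shells pick them up automatically
echo 'export AWS_ACCESS_KEY_ID=xxxxxxxxx' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxx' >> ~/.bashrc
source ~/.bashrc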

In case you still need to download Spark [2]:
The latest release of Spark is Spark 1.3.1, released on April 17, 2015 (release notes) (git tag).
  1. Choose a Spark release: 1.3.1 (Apr 17 2015)
  2. Choose a package type, e.g. Pre-built for Hadoop 2.6 and later
  3. Choose a download type: select Apache Mirror or Direct Download (see the example commands below)
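
For reference, a sketch of fetching and unpacking the pre-built Hadoop 2.6 package from the command line; the archive.apache.org URL is an assumption, any Apache mirror works:

# download the pre-built Spark 1.3.1 / Hadoop 2.6 package and unpack it (mirror URL assumed)
wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
tar xzf spark-1.3.1-bin-hadoop2.6.tgz
cd spark-1.3.1-bin-hadoop2.6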

  2. Generate a key pair at AWS as shown in [1]

    1. Download it to the local machine, into the ec2 directory of the Spark distribution (for the ampcamp training scripts any directory may work); otherwise an SSH error will show up.
       For example, under the following directory: osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2
    2. chmod 400 [key.pem] was used for ampcamp3 but did not work for EC2 here; use chmod 600 [key.pem] instead (see the sketch after this list).
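
    A minimal sketch of putting the downloaded key in place, assuming it was saved as ampcamp3.pem in ~/Downloads:

    # move the key somewhere predictable and restrict its permissions so SSH accepts it
    mv ~/Downloads/ampcamp3.pem ~/.ssh/ampcamp3.pem
    chmod 600 ~/.ssh/ampcamp3.pem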

  3. Run instances

osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2$ ./spark-ec2 --key-pair=ampcamp3 --identity-file=~/.ssh/ampcamp3.pem --region=us-east-1 --zone=us-east-1a --copy launch my-spark-cluster

Note: -s (or --slaves) sets the number of slaves, as in the ampcamp training-scripts example:
osboxes@osboxes:~/proj/ampcamp3/training-scripts$ ./spark-ec2 -i ampcamp3.pem -k ampcamp3 -z us-east-1b -s 3 --copy launch amplab-training
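
The same spark-ec2 script also manages the cluster lifecycle. A sketch of the other common actions, reusing the flags from the launch command above:

# log in to, stop, restart, or tear down the cluster created above
./spark-ec2 --key-pair=ampcamp3 --identity-file=~/.ssh/ampcamp3.pem --region=us-east-1 login my-spark-cluster
./spark-ec2 --region=us-east-1 stop my-spark-cluster
./spark-ec2 --key-pair=ampcamp3 --identity-file=~/.ssh/ampcamp3.pem --region=us-east-1 start my-spark-cluster
./spark-ec2 --region=us-east-1 destroy my-spark-cluster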

4. Log in to the master node

You need to go to the AWS console to find the master instance. Select the instance and choose "Connect", which shows the shell command to connect to it remotely.
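
Alternatively, SSH in directly with the key pair. A sketch assuming the master's public DNS name is copied from the EC2 console (the spark-ec2 AMI logs in as root):

# connect to the master node; replace <master_public_dns> with the value from the console
ssh -i ~/.ssh/ampcamp3.pem root@<master_public_dns>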

Also, http://master_node:8080 should show the Spark standalone master web UI.


5. Run HDFS on EC2 so that HDFS holds the data file that Spark needs to read
    [root@ip-10-232-51-182 ~]$ cd /root/ephemeral-hdfs/bin
    [root@ip-10-232-51-182 bin]$ ./start-dfs.sh
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /
    [root@ip-10-232-51-182 bin]$ ./hadoop fs -put samplefile /user/myname

    Note: the security group for the master node needs to be open for TCP port 7077
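
    To confirm the file actually landed in HDFS before running the job, it can be listed and sampled from the same bin directory (same /user/myname path as above):

    # verify the upload to the ephemeral HDFS
    ./hadoop fs -ls /user/myname
    ./hadoop fs -cat /user/myname/samplefile | head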

    6. Run any example at the master node [4, 7]

    6.a wordcount for samplefile

    cd ~/spark-1.3.1-bin-hadoop2.6

    (The interactive shell is started with ./bin/pyspark, which provides the SparkContext as sc; spark-submit is for packaged, non-interactive applications.)

    osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/$ ./bin/pyspark
    >>> text_file = sc.textFile("/user/myname/samplefile")
    >>> counts = text_file.flatMap(lambda line: line.split(" ")) \
    ...              .map(lambda word: (word, 1)) \
    ...              .reduceByKey(lambda a, b: a + b)
    >>> counts.saveAsTextFile("hdfs://...")
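
    Once saveAsTextFile finishes, the output directory holds part-NNNNN files that can be inspected from the ephemeral-hdfs bin directory. The path below is only a hypothetical stand-in for whatever was passed to saveAsTextFile:

    # list and peek at the word-count output (output path is hypothetical)
    ./hadoop fs -ls /user/myname/wordcount-output
    ./hadoop fs -cat /user/myname/wordcount-output/part-00000 | head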

    6.b Spark Example
    cd ~/spark-1.3.1-bin-hadoop2.6
    osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/$ ./bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master spark://54.205.231.93:7077 \
     --executor-memory 20G \
     --total-executor-cores 100 \
     ~/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar   1000
    Or run the same example locally on 8 cores (the executor memory/cores flags from the cluster run are dropped since they do not apply in local mode):
    ./bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master local[8] \
     ~/spark/lib/spark-examples-1.3.1-hadoop1.0.4.jar 1000
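
    SparkPi prints its estimate to the driver's stdout. A quick way to pick it out of the log noise, using the same local run with a smaller argument for a faster check:

    # the example prints a line like "Pi is roughly 3.14..."; logging goes to stderr by default
    ./bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master local[8] \
     ~/spark/lib/spark-examples-1.3.1-hadoop1.0.4.jar 100 2>/dev/null | grep "Pi is roughly"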




    References
    1. https://spark.apache.org/docs/latest/submitting-applications.html
    2. http://dal-cloudcomputing.blogspot.com/2013/04/create-aws-account-and-access-keys.html
    3. https://spark.apache.org/examples.html
