1. Set up AWS keys
export AWS_ACCESS_KEY_ID=xxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxx
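A quick sanity check that the shell actually has the keys (these are the standard variable names the spark-ec2 script reads):
env | grep AWS_    # should list both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY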
2. Download Spark, if you do not already have it [2]:
- Choose a Spark release: 1.3.1 (Apr 17 2015)
- Choose a package type: Pre-built for Hadoop 2.6 and later
- Choose a download type: Direct Download (or Select Apache Mirror)
- Download Spark: spark-1.3.1-bin-hadoop2.6.tgz
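The same download can be scripted; a sketch, assuming Apache's usual archive layout for old releases:
wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
tar xzf spark-1.3.1-bin-hadoop2.6.tgz    # unpacks to ./spark-1.3.1-bin-hadoop2.6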
3. Generate a key pair at AWS as shown in [1].
- Download it to your local machine, for example under the following directory: osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2 (for the AMP Camp scripts any directory should work, as long as the path is passed via --identity-file)
- Restrict its permissions, or an SSH error will show up:
- chmod 400 [key.pem] (this worked for the ampcamp3 key but not for EC2)
- or chmod 600 [key.pem]
4. Launch instances
osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/ec2$ ./spark-ec2 --key-pair=ampcamp3 --identity-file=~/.ssh/ampcamp3.pem --region=us-east-1 --zone=us-east-1a --copy launch my-spark-cluster
Note: -s (or --slaves) sets the number of slaves, e.g., for the AMP Camp training scripts:
osboxes@osboxes:~/proj/ampcamp3/training-scripts$ ./spark-ec2 -i ampcamp3.pem -k ampcamp3 -z us-east-1b -s 3 --copy launch amplab-training
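Besides launch, the spark-ec2 script supports login, stop, start, and destroy actions; a sketch using the same key and cluster names as above:
./spark-ec2 -k ampcamp3 -i ~/.ssh/ampcamp3.pem --region=us-east-1 login my-spark-cluster     # SSH into the master
./spark-ec2 --region=us-east-1 destroy my-spark-cluster                                      # terminate the whole cluster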
5. Connect to the master node: in the AWS console, find the master instance, select it, and choose "Connect", which shows the shell command for connecting to it remotely.
[root@ip-10-232-51-182 ~]$ cd /root/ephemeral-hdfs/bin
[root@ip-10-232-51-182 bin]$ ./start-dfs.sh
[root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /
[root@ip-10-232-51-182 bin]$ ./hadoop fs -put samplefile /user/myname
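samplefile has to exist on the master (and /user/myname in HDFS) before the put; if the commands above complain, a minimal sketch to set things up (file name and contents are placeholders):
[root@ip-10-232-51-182 bin]$ echo "hello spark hello hdfs" > samplefile
[root@ip-10-232-51-182 bin]$ ./hadoop fs -mkdir /user/myname    # create the target directory if missing
[root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /user/myname       # confirm the upload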
Note: the master node's security group needs to allow inbound TCP on port 7077 (the standalone master port).
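The port can be opened in the console, or with the AWS CLI; a sketch, assuming the group name spark-ec2 generates (<cluster-name>-master):
# 0.0.0.0/0 opens the port to everyone; narrow the CIDR in practice
aws ec2 authorize-security-group-ingress --region us-east-1 \
    --group-name my-spark-cluster-master \
    --protocol tcp --port 7077 --cidr 0.0.0.0/0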
6. Run any example at the master node [4, 7]
6.a wordcount for samplefile
cd ~/spark-1.3.1-bin-hadoop2.6
osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6$ ./bin/pyspark
(the interactive Python shell is bin/pyspark; inside it, the SparkContext is already available as sc)
>>> text_file = sc.textFile("/user/myname/samplefile")
>>> counts = text_file.flatMap(lambda line: line.split(" ")) \
...     .map(lambda word: (word, 1)) \
...     .reduceByKey(lambda a, b: a + b)
>>> counts.saveAsTextFile("hdfs://...")
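Back in /root/ephemeral-hdfs/bin on the master, the saved output can be inspected; a sketch assuming the save path was hdfs:///user/myname/counts:
[root@ip-10-232-51-182 bin]$ ./hadoop fs -ls /user/myname/counts              # one part-XXXXX file per partition
[root@ip-10-232-51-182 bin]$ ./hadoop fs -cat /user/myname/counts/part-00000  # (word, count) pairs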
6.b SparkPi example (submitted to the cluster)
cd ~/spark-1.3.1-bin-hadoop2.6
osboxes@osboxes:~/spark-1.3.1-bin-hadoop2.6/$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://54.205.231.93:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
~/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar 1000
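The spark:// URL given to --master has to match the address the master actually advertises; the standalone web UI on port 8080 shows it (that port must also be open in the security group). A quick sketch:
curl -s http://54.205.231.93:8080 | grep -o 'spark://[^"<]*'    # the UI page embeds the master URL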
Or run the same example locally on 8 cores (--executor-memory and --total-executor-cores only take effect against a cluster manager; in local mode the core count comes from local[8]):
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
--executor-memory 20G \
--total-executor-cores 100 \
~/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar 1000
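In both modes SparkPi prints its result to the driver's stdout; one way to pick it out of the log output (the message text is what the bundled example actually prints):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[8] \
~/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar 100 2>&1 | grep "Pi is roughly"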
References
- https://spark.apache.org/docs/latest/submitting-applications.html
- http://dal-cloudcomputing.blogspot.com/2013/04/create-aws-account-and-access-keys.html
- https://spark.apache.org/examples.html