In this short blog, I will explain how to set up Apache Spark 1.6 with YARN. I assume Hadoop is already installed.
1. Download Apache Spark
Go to spark.apache.org/downloads.html and choose the Spark release you want to download (1.6.0 is the default currently). Then, under the package type, choose the build corresponding to your Hadoop version. Mine is 2.6, hence I choose "Pre-built for Hadoop 2.6 and later".
If you want, you can instead download the Source Code, navigate to the base folder, and build Spark yourself against your Hadoop version.
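As a sketch, the Maven build for Spark 1.6 with YARN support looks like the following. The exact Hadoop profile (`-Phadoop-2.6`) and version should match your cluster; the values here assume Hadoop 2.6.0.

```shell
# Run from the root of the extracted Spark source tree.
# -Pyarn enables YARN support; adjust -Phadoop-2.6 / -Dhadoop.version
# to match your installed Hadoop version.
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
```

The build can take a while the first time, since Maven downloads all dependencies.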
2. Set HADOOP_CONF_DIR
To run Spark in YARN mode, we need to set the HADOOP_CONF_DIR environment variable so that Spark can find the Hadoop configuration files (core-site.xml, yarn-site.xml, etc.).
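For example, in your shell profile (the path below is an assumption; point it at wherever your Hadoop configuration actually lives):

```shell
# Directory containing core-site.xml, yarn-site.xml, etc.
# /usr/local/hadoop/etc/hadoop is an assumed install location.
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
```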
3. Start the Master
Run start-master.sh, which is located in the sbin folder of Spark. This starts the Spark master, whose web UI listens on port 8080. Check http://localhost:8080 or http://yourip:8080.
From the Spark UI, copy the Master URL, which in my case is spark://Vishnus-MacBook-Pro.local:7077
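Concretely, the step above looks like this (the Spark install path is an assumption; use wherever you extracted Spark):

```shell
# Assumed Spark installation directory
cd /usr/local/spark

# Starts the standalone master; its UI appears on port 8080
./sbin/start-master.sh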
4. Start the Slave
Run start-slave.sh, which is also located in the sbin folder. Pass the Master URL copied in the previous step as an argument to the start-slave.sh script. This will start the Slave/Worker.
Go back to your Spark UI and you can see that Alive Workers is now 1 and the worker details are displayed under Workers.
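Sketched out, with the Master URL from my machine (your hostname will differ):

```shell
# Run from the Spark installation directory; the spark:// URL is the
# Master URL shown in the Spark UI and will be different on your machine
./sbin/start-slave.sh spark://Vishnus-MacBook-Pro.local:7077
```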
5. Start Spark Shell
Run the spark-shell script located in the bin folder of Spark, passing `--master yarn`.
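For example (this assumes HADOOP_CONF_DIR is set as in step 2 and that YARN is running):

```shell
# Launches an interactive Spark shell with YARN as the cluster manager.
# In Spark 1.6 this runs in yarn-client mode by default.
./bin/spark-shell --master yarn
```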
Spark setup is complete. Thanks for reading.