Friday 8 July 2016

Installing Spark on Ubuntu


STEPS

1. Install VirtualBox


2. Install Ubuntu on VirtualBox


Setting up Spark

Spark is pretty simple to set up and get running on your machine.
Assuming you already have Java and Python:
1. Visit the Spark downloads page.
2. Select the latest Spark release (1.2.0 at the time of this writing) and a prebuilt package for Hadoop 2.4, then download it directly.





3. Unzip the Spark folder and rename it to spark.
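
The unpack-and-rename step can also be done from a terminal. This is only a sketch: install_spark is a hypothetical helper name, and the archive file name and target directory in the usage comment are assumptions, so substitute whatever you actually downloaded and wherever you want Spark to live.

```shell
# Hypothetical helper: unpack a downloaded Spark archive and rename the
# extracted folder to "spark". Both arguments are supplied by the caller.
install_spark() {
    archive="$1"    # path to the downloaded .tgz file
    target="$2"     # directory that should end up containing "spark"
    # Spark archives unpack into a folder named after the file, minus .tgz
    extracted="$(basename "$archive" .tgz)"
    tar -xzf "$archive" -C "$target" || return 1
    mv "$target/$extracted" "$target/spark"
}

# Example call (file name and location are assumptions):
# install_spark "$HOME/Downloads/spark-1.2.0-bin-hadoop2.4.tgz" "$HOME"
```

Run it only once per target directory; if a spark folder already exists there, mv will move the new folder inside it rather than replace it.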



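If PySpark cannot find py4j.java_gateway, add Spark's Python directory and its bundled py4j archive to PYTHONPATH. The py4j version in the file name depends on your Spark release, so check $SPARK_HOME/python/lib for the exact name: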

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
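
Because the py4j file name changes between Spark releases, a wildcard can save you from hard-coding the version. A minimal sketch, assuming SPARK_HOME is already set; add_py4j is a hypothetical helper name, not something Spark provides.

```shell
# Sketch: put whichever py4j zip ships with this Spark install on PYTHONPATH.
# Assumes SPARK_HOME is already set; add_py4j is a hypothetical helper name.
add_py4j() {
    for zip in "$SPARK_HOME"/python/lib/py4j-*-src.zip; do
        # If nothing matches, the pattern stays literal, so check it exists
        if [ -e "$zip" ]; then
            export PYTHONPATH="$zip:${PYTHONPATH:-}"
        fi
    done
}

# Example call, after SPARK_HOME has been exported:
# add_py4j
```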


4. Edit your Bash profile to add Spark to your PATH and to set the SPARK_HOME environment variable. These settings let you run the Spark commands from any directory. On Ubuntu, simply edit the ~/.bash_profile or ~/.profile file and add the following:
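
A minimal sketch of those profile lines, assuming Spark was unpacked to a folder named spark in your home directory. The path is an assumption; point SPARK_HOME at wherever you actually placed the folder.

```shell
# Assumption: Spark was unpacked to ~/spark; adjust to your own location.
export SPARK_HOME="$HOME/spark"
# Put the Spark launch scripts (pyspark, spark-submit, ...) on the PATH
export PATH="$SPARK_HOME/bin:$PATH"
```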






5. After you source your profile (or simply restart your terminal), you should be able to run a pyspark interpreter locally. Execute the pyspark command; you should see the Spark startup banner followed by the Python >>> prompt.
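
If the shell reports that pyspark cannot be found instead, a quick check of the environment usually shows why. This is a sketch only; check_spark is a hypothetical helper name, and it assumes the pyspark launcher lives under $SPARK_HOME/bin, as it does in the prebuilt packages.

```shell
# Hypothetical helper: confirm SPARK_HOME is set and the launcher exists.
check_spark() {
    if [ -z "${SPARK_HOME:-}" ]; then
        echo "SPARK_HOME is not set; re-check your profile"
        return 1
    fi
    if [ ! -x "$SPARK_HOME/bin/pyspark" ]; then
        echo "no executable pyspark under $SPARK_HOME/bin"
        return 1
    fi
    echo "pyspark found at $SPARK_HOME/bin/pyspark"
}

# Example call:
# check_spark
```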

















