Installing Spark on Ubuntu
STEPS
1. Install VirtualBox
2. Install Ubuntu on VirtualBox
Setting up Spark
Spark is pretty simple to set up and get running on your machine. Assuming you already have Java and Python installed:
1. Select the latest Spark release (1.2.0 at the time of this writing), a prebuilt package for Hadoop 2.4, and download it directly.
2. Unzip the downloaded Spark folder and rename it to spark.
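Assuming the prebuilt package was saved under its default name (spark-1.2.0-bin-hadoop2.4.tgz is a guess based on the release and Hadoop version chosen above), the unzip-and-rename step might look like:

```shell
# Hypothetical tarball name, based on the 1.2.0 / Hadoop 2.4 choice above.
SPARK_TGZ="spark-1.2.0-bin-hadoop2.4.tgz"

# Extract and rename only if the download is actually present.
if [ -f "$SPARK_TGZ" ]; then
    tar -xzf "$SPARK_TGZ"
    mv spark-1.2.0-bin-hadoop2.4 ~/spark
fi
```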
3. So that PySpark can find py4j.java_gateway, add the py4j sources to your PYTHONPATH:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
4. Edit your Bash profile to add Spark to your PATH and to set the SPARK_HOME environment variable. These helpers will assist you on the command line. On Ubuntu, simply edit ~/.bash_profile or ~/.profile and add the following:
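The exact profile lines the post intends here are not shown; a minimal sketch, assuming Spark was unpacked to ~/spark as in the earlier step, would be:

```shell
# Hypothetical install location; adjust if you unpacked Spark elsewhere.
export SPARK_HOME="$HOME/spark"
export PATH="$SPARK_HOME/bin:$PATH"

# The py4j entries from the earlier step belong here too, so every new
# shell picks them up automatically.
export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH"
```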
5. After you source your profile (or simply restart your terminal), you should now be able to run a pyspark interpreter locally. Execute the pyspark command, and you should see a result as follows:
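The expected output is not reproduced above, but a quick way to confirm that the PATH change took effect (a hypothetical check, not from the original post) is:

```shell
# Check whether the pyspark launcher is now reachable on the PATH.
if command -v pyspark >/dev/null 2>&1; then
    PYSPARK_STATUS="found"
else
    PYSPARK_STATUS="missing"
fi
echo "pyspark: $PYSPARK_STATUS"
```

If this reports "missing", re-check the PATH line in your profile and that you sourced it in the current shell.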