Installing Spark on Ubuntu
STEPS
1. Install VirtualBox
2. Install Ubuntu on VirtualBox
Setting up Spark
Spark is pretty simple to set up and get running on your machine.
Assuming you already have Java and Python installed:
Select the latest Spark release (1.2.0 at the time of this writing), choose a package prebuilt for Hadoop 2.4, and download it directly.
Unzip the Spark folder and rename it to spark.
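The unzip-and-rename step can be sketched as a small shell helper. The function name unpack_spark is hypothetical, and the archive name in the usage note assumes the 1.2.0 package prebuilt for Hadoop 2.4; substitute whatever file you actually downloaded.

```shell
# Hypothetical helper: unpack a Spark .tgz into the current directory
# and rename the extracted folder.
# $1 = path to the downloaded archive, $2 = new folder name (e.g. spark)
unpack_spark() {
  archive="$1"
  dest="$2"
  # Stop early if the archive is not where we expect it
  [ -f "$archive" ] || { echo "missing $archive" >&2; return 1; }
  # Extract into the current directory
  tar -xzf "$archive" || return 1
  # The extracted folder is assumed to share the archive's base name
  mv "$(basename "$archive" .tgz)" "$dest"
}
```

For example, from your home directory: unpack_spark spark-1.2.0-bin-hadoop2.4.tgz spark. Make sure no folder named spark already exists there, or mv will move the extracted folder inside it instead of renaming it.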
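If PySpark cannot find py4j.java_gateway, add Spark's Python directories to PYTHONPATH (adjust the py4j version to match the zip file in $SPARK_HOME/python/lib):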
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
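The py4j zip is named after its version, which differs between Spark releases, so a hard-coded file name may not match your download. As a hedged alternative, the sketch below picks up whichever py4j zip is present. The function name add_pyspark_paths is hypothetical, and it assumes the zip lives under python/lib inside the Spark folder.

```shell
# Hypothetical helper: prepend PySpark and the bundled py4j zip to PYTHONPATH.
# $1 = Spark folder (defaults to $SPARK_HOME)
add_pyspark_paths() {
  spark_home="${1:-$SPARK_HOME}"
  # Match any py4j version instead of hard-coding one
  for zip in "$spark_home"/python/lib/py4j-*-src.zip; do
    if [ -f "$zip" ]; then
      PYTHONPATH="$zip${PYTHONPATH:+:$PYTHONPATH}"
    fi
  done
  # Put Spark's own Python package first
  export PYTHONPATH="$spark_home/python${PYTHONPATH:+:$PYTHONPATH}"
}
```

Call add_pyspark_paths from your profile after SPARK_HOME has been set.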
Edit your Bash profile to add Spark to your PATH and to set the SPARK_HOME environment variable. These helpers will assist you on the command line. On Ubuntu, simply edit the ~/.bash_profile or ~/.profile file and add the following:
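A minimal sketch of those profile lines, assuming the folder you renamed to spark sits in your home directory (adjust the path if you put it elsewhere):

```shell
# Assumption: Spark was unpacked and renamed to ~/spark
export SPARK_HOME="$HOME/spark"
# Make pyspark, spark-shell and the other launchers available by name
export PATH="$SPARK_HOME/bin:$PATH"
```

Putting $SPARK_HOME/bin at the front of PATH means these launchers take precedence over any other Spark installation on the machine.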
5. After you source your profile (or simply restart your terminal), you should now be able to run a pyspark interpreter locally. Execute the pyspark command, and you should see the Spark startup banner followed by a Python prompt.
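If the pyspark command is not found, a quick check along these lines can help narrow down whether the profile changes took effect. The function name check_spark_env is hypothetical; it only assumes that the launcher lives at bin/pyspark inside the Spark folder.

```shell
# Hypothetical helper: confirm SPARK_HOME is set and points at a
# folder that contains an executable pyspark launcher.
check_spark_env() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/pyspark" ]; then
    echo "pyspark not found under $SPARK_HOME/bin" >&2
    return 1
  fi
  echo "pyspark found at $SPARK_HOME/bin/pyspark"
}
```

Run check_spark_env in a fresh terminal; an error message points to the profile line that needs fixing.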