@tommycarpi
Last active September 3, 2021 10:14
Link Apache Spark with IPython Notebook

How to link Apache Spark 1.6.0 with IPython notebook (Mac OS X)

Tested with

Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 1.6.0 & Hadoop 2.6

Download Apache Spark & Build it

Download Apache Spark and build it or download the pre-built version.

I suggest downloading the pre-built version with Hadoop 2.6.

Install Anaconda

Download and install Anaconda.

Install Jupyter

Once you have installed Anaconda, open your terminal and type

conda install jupyter
conda update jupyter

Link Spark with IPython Notebook

Open a terminal and type the following, replacing /path_to_downloaded_spark with the folder that contains your Spark download

echo 'export PATH=$PATH:/path_to_downloaded_spark/spark-1.6.0/bin' >> ~/.profile
echo "export PYSPARK_DRIVER_PYTHON=ipython" >> ~/.profile
echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> ~/.profile
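Running the echo commands above more than once appends the same lines to the profile again. A minimal sketch of one way to avoid that, assuming a POSIX-compatible shell; the helper name append_once is hypothetical and not part of Spark or Jupyter:

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present in it.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    # A missing file makes grep fail, so the line is appended and the file created.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (commented out so that pasting this block changes nothing):
# append_once "export PYSPARK_DRIVER_PYTHON=ipython" "$HOME/.profile"
# append_once "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" "$HOME/.profile"
```

Calling the helper twice with the same line leaves a single copy in the file, so the setup can be re-run safely.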

Now source the file to make the changes available in the current terminal

source ~/.profile

or quit your terminal (Cmd+Q) and reopen it.
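Before launching pyspark, it can help to confirm that the new settings are actually visible in the current shell. The following is a rough sketch, assuming a POSIX-compatible shell; the function name check_pyspark_env is hypothetical, and the check only inspects the two variables and the PATH entry set up above:

```shell
# Hypothetical check: report whether the settings from the profile are
# visible in this shell. Returns 0 if everything is found, 1 otherwise.
check_pyspark_env() {
    rc=0
    for name in PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS; do
        # eval performs the indirect variable lookup in plain POSIX sh
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
            rc=1
        fi
    done
    # command -v prints the resolved path when pyspark is on PATH
    if command -v pyspark >/dev/null 2>&1; then
        echo "pyspark found at $(command -v pyspark)"
    else
        echo "pyspark is not on PATH"
        rc=1
    fi
    return $rc
}

check_pyspark_env || echo "Setup incomplete; re-check the export lines in your profile"
```

If any line reports a missing value, the profile was probably not sourced in this terminal, or the Spark path in the PATH line does not match where Spark was unpacked.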

Run IPython Notebook

Now, using your terminal, go to whatever folder you want and type pyspark. For example

cd Documents/my_spark_folder
pyspark

Now the IPython notebook should open in your browser.

To check whether Spark is correctly linked, create a new Python 2 notebook inside IPython Notebook, type sc in a cell and run it. You should see something like this

In [1]: sc
Out[1]: <pyspark.context.SparkContext at 0x1049bdf90>
@tommycarpi (author)

Have you run source .profile or closed and reopened the terminal? Otherwise try updating conda and jupyter

@hamedhsn

@Nomii5007
Pass the packages when you run pyspark, like this:
pyspark --packages com.databricks:spark-avro_2.11:3.0.0,com.databricks:spark-redshift_2.11:2.0.1,com.databricks:spark-csv_2.11:1.5.0,com.amazonaws:aws-java-sdk-s3:1.11.73,com.amazonaws:aws-java-sdk-core:1.11.73

@vkannanaen

I have spark-2.1.1, hadoop2.7, Python 3.6, Java, etc. Still the same issue.
