@canimus
Created March 22, 2023 15:15
An initializer for PySpark reading from S3
from pyspark.sql import SparkSession
from pyspark import SparkConf
conf = (
    SparkConf()
    .setAppName("Connect AWS")
    .setMaster("local[*]")
)
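# hadoop-aws provides the s3a:// connector; its version should match the Hadoop version bundled with Spark
conf.set("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.2")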
conf.set("spark.hadoop.fs.s3a.access.key", "XXX")
conf.set("spark.hadoop.fs.s3a.secret.key", "XXX")
conf.set("spark.driver.maxResultSize", "2g")
conf.set("spark.driver.memory", "8g")
conf.set("
spark = (
    SparkSession
    .builder
    .config(conf=conf)
    .getOrCreate()
)
spark
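The initializer above hardcodes the S3 keys as "XXX" placeholders. One possible way to avoid putting real keys in the source is to read them from the environment. The sketch below assumes the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables are set; the helper name `s3a_credentials` is hypothetical and not part of the original gist.

```python
import os


def s3a_credentials(env=None):
    """Return (key, value) pairs for the S3A credential settings.

    Hypothetical helper: reads the keys from a mapping (os.environ by
    default) instead of hardcoding them in the script.
    """
    env = os.environ if env is None else env
    try:
        access_key = env["AWS_ACCESS_KEY_ID"]
        secret_key = env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        # Fail early with the name of the variable that was not set
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None
    return [
        ("spark.hadoop.fs.s3a.access.key", access_key),
        ("spark.hadoop.fs.s3a.secret.key", secret_key),
    ]
```

With this in place, the two hardcoded `conf.set(...)` calls for the keys could be replaced by `conf.setAll(s3a_credentials())`, since `SparkConf.setAll` accepts a list of key/value pairs. Data would then be read through the `s3a://` scheme, for example `spark.read.parquet("s3a://my-bucket/some/path/")`, where the bucket and path are placeholders.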