Spark Session:

From Spark 2.0.0 onwards, SparkSession provides a single point of entry for interacting with the underlying Spark functionality and allows programming Spark with the DataFrame and Dataset APIs. All of the functionality available through SparkContext is also available through SparkSession.

To use the SQL, Hive, and Streaming APIs, there is no need to create separate contexts, because SparkSession includes all of these APIs.
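As a quick illustration of this point, the sketch below creates a session and uses it for SQL work and for the underlying SparkContext directly, with no separate SQLContext or StreamingContext. It is a minimal example, assuming a local master and a hypothetical people.json input file:

import org.apache.spark.sql.SparkSession

// One session is enough: SQL, DataFrames, and the underlying SparkContext
// are all reachable from it (app name and file name below are illustrative).
val demo = SparkSession
  .builder
  .appName("SingleEntryPointDemo")
  .master("local[*]")            // assumption: running locally
  .getOrCreate()

// DataFrame and SQL work without a separate SQLContext or HiveContext.
val people = demo.read.json("people.json")   // hypothetical input file
people.createOrReplaceTempView("people")
demo.sql("SELECT name FROM people").show()

// The SparkContext is still available for lower-level APIs.
println(demo.sparkContext.appName)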

Once the SparkSession is instantiated, we can configure Spark’s run-time config properties.

Example:

Creating a Spark session:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("WorldBankIndex")
  .getOrCreate()

Configuring properties:

spark.conf.set("spark.sql.shuffle.partitions", 6)
spark.conf.set("spark.executor.memory", "2g")   // note: core settings such as executor memory normally take effect only when set before the application starts (e.g. via spark-submit)
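To check that a run-time property actually took effect, it can be read back from the same configuration interface; a small sketch, reusing the spark session created above:

// Read the values back from the session's runtime conf.
println(spark.conf.get("spark.sql.shuffle.partitions"))   // "6"
println(spark.conf.get("spark.executor.memory"))          // "2g"

// SQL queries and DataFrame shuffles issued after this point
// use 6 shuffle partitions.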

From Spark 2.0.0 onwards, it is better to use SparkSession, as it provides access to all of the functionality that SparkContext provides and also exposes APIs for working with DataFrames and Datasets.

Spark Context:
Prior to Spark 2.0.0, SparkContext was the channel for accessing all Spark functionality.
The Spark driver program uses the SparkContext to connect to the cluster through a resource manager (YARN or Mesos).
A SparkConf object is required to create the SparkContext; it stores configuration parameters such as the application name (to identify your Spark driver) and the number of cores and the memory size of the executors running on the worker nodes.

To use the SQL, Hive, and Streaming APIs, separate contexts (SQLContext, HiveContext, StreamingContext) had to be created.
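For contrast, a rough pre-2.0 sketch of this setup; the class names come from the old APIs, while the master, memory size, and batch interval below are illustrative assumptions:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// SparkConf stores the configuration parameters for the driver and executors.
val conf = new SparkConf()
  .setAppName("WorldBankIndex")       // identifies the Spark driver
  .setMaster("local[*]")              // assumption: local run instead of YARN/Mesos
  .set("spark.executor.memory", "2g")

// SparkContext: the pre-2.0 entry point.
val sc = new SparkContext(conf)

// Separate contexts were required for SQL and Streaming.
val sqlContext = new SQLContext(sc)              // or new HiveContext(sc) (spark-hive) for Hive support
val ssc = new StreamingContext(sc, Seconds(10))  // assumption: 10-second batch interval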

https://data-flair.training/forums/topic/sparksession-vs-sparkcontext-in-apache-spark/
