Spark Session:

From Spark 2.0.0 onwards, SparkSession provides a single point of entry for interacting with the underlying Spark functionality and allows programming Spark with the DataFrame and Dataset APIs. All of the functionality available through SparkContext is also available through SparkSession.

To use the SQL, Hive, and Streaming APIs, there is no need to create separate contexts, because SparkSession includes all of these APIs.
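As a quick illustration of this point, the sketch below creates a session and uses it for SQL work and for the underlying SparkContext directly, with no separate SQLContext or StreamingContext. It is a minimal example, assuming a local master and a hypothetical people.json input file:

import org.apache.spark.sql.SparkSession

// One session is enough: SQL, DataFrames, and the underlying SparkContext
// are all reachable from it (app name and file name below are illustrative).
val demo = SparkSession
  .builder
  .appName("SingleEntryPointDemo")
  .master("local[*]")            // assumption: running locally
  .getOrCreate()

// DataFrame and SQL work without a separate SQLContext or HiveContext.
val people = demo.read.json("people.json")   // hypothetical input file
people.createOrReplaceTempView("people")
demo.sql("SELECT name FROM people").show()

// The SparkContext is still available for lower-level APIs.
println(demo.sparkContext.appName)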

Once the SparkSession is instantiated, we can configure Spark’s run-time config properties.

Example:

Creating a Spark session:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("WorldBankIndex")
  .getOrCreate()

Configuring properties:

spark.conf.set("spark.sql.shuffle.partitions", 6)
spark.conf.set("spark.executor.memory", "2g")   // note: core settings such as executor memory normally take effect only when set before the application starts (e.g. via spark-submit)
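To check that a run-time property actually took effect, it can be read back from the same configuration interface; a small sketch, reusing the spark session created above:

// Read the values back from the session's runtime conf.
println(spark.conf.get("spark.sql.shuffle.partitions"))   // "6"
println(spark.conf.get("spark.executor.memory"))          // "2g"

// SQL queries and DataFrame shuffles issued after this point
// use 6 shuffle partitions.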

From Spark 2.0.0 onwards, it is better to use SparkSession, as it provides access to all of the functionality that SparkContext provides and also exposes APIs for working with DataFrames and Datasets.

Spark Context:
Prior to Spark 2.0.0, SparkContext was the channel for accessing all Spark functionality.
The Spark driver program uses the SparkContext to connect to the cluster through a resource manager (YARN or Mesos).
A SparkConf object is required to create the SparkContext; it stores configuration parameters such as the application name (to identify your Spark driver) and the number of cores and the memory size of the executors running on the worker nodes.

To use the SQL, Hive, and Streaming APIs, separate contexts (SQLContext, HiveContext, StreamingContext) had to be created.
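For contrast, a rough pre-2.0 sketch of this setup; the class names come from the old APIs, while the master, memory size, and batch interval below are illustrative assumptions:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// SparkConf stores the configuration parameters for the driver and executors.
val conf = new SparkConf()
  .setAppName("WorldBankIndex")       // identifies the Spark driver
  .setMaster("local[*]")              // assumption: local run instead of YARN/Mesos
  .set("spark.executor.memory", "2g")

// SparkContext: the pre-2.0 entry point.
val sc = new SparkContext(conf)

// Separate contexts were required for SQL and Streaming.
val sqlContext = new SQLContext(sc)              // or new HiveContext(sc) (spark-hive) for Hive support
val ssc = new StreamingContext(sc, Seconds(10))  // assumption: 10-second batch interval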

https://data-flair.training/forums/topic/sparksession-vs-sparkcontext-in-apache-spark/
