Spark
[PySpark] How to sum a column
눈가락
2021. 6. 16. 10:00
Use sum from pyspark.sql.functions with groupBy + agg
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import sum as _sum  # alias so the builtin sum is not shadowed

# sqlContext in the original snippet is the pre-2.0 entry point;
# SparkSession is the current equivalent.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(owner=u'u1', a_d=0.1),
    Row(owner=u'u2', a_d=0.0),
    Row(owner=u'u1', a_d=0.3),
])

# Sum a_d within each owner group
df2 = df.groupBy('owner').agg(_sum('a_d').alias('a_d_sum'))
df2.show()
# +-----+-------+
# |owner|a_d_sum|
# +-----+-------+
# |   u1|    0.4|
# |   u2|    0.0|
# +-----+-------+
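If you want the total of the column across all rows rather than per group, the same sum function works without groupBy. A minimal sketch along the same lines (df as above; the select form and the collect access pattern are standard PySpark, not from the original post):

# Total over the whole column (no grouping)
df.select(_sum('a_d')).show()
# +--------+
# |sum(a_d)|
# +--------+
# |     0.4|
# +--------+

# Or pull the value out as a plain Python number
total = df.agg(_sum('a_d').alias('total')).collect()[0]['total']
print(total)  # 0.4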
See the Stack Overflow answer below for reference:
https://stackoverflow.com/a/36719760
Sum operation on PySpark DataFrame giving TypeError when type is fine
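For context, the TypeError in that question comes from Python's builtin sum shadowing pyspark.sql.functions.sum, which is why the import above aliases it to _sum. A quick illustration of the pitfall (my repro, not from the post):

# Without the aliased import, "sum" is Python's builtin, and a
# PySpark Column is not iterable, so this fails:
# df.groupBy('owner').agg(sum('a_d'))  # TypeError: Column is not iterable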