
PySpark: deduplication with dropDuplicates and distinct; withColumn, lit, col; unionByName, groupBy

Date: 2023-05-02
1. Deduplication: dropDuplicates, distinct

ff = d.select(['dnum']).dropDuplicates()
ff.count()
ff.show()
fff = d.select(['dnum']).distinct()

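2. withColumn, lit, col

withColumn adds a column, or replaces an existing column that has the same name.
lit creates a column that holds a constant (literal) value.
col refers to an existing column by name.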

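import pyspark.sql.functions as F

# Add a constant "date" column; F.lit wraps the Python value as a literal column.
temp_df = temp_df.withColumn("date", F.lit(target_date))
# Remove "[" from the tags column. The pattern is a regular expression, so the bracket must be escaped.
movie_feature_df = movie_feature_df.withColumn('tags', F.regexp_replace(F.col('tags'), r"\[", ""))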

3. unionByName, groupBy

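import datetime
import pyspark.sql.functions as F

# Load one daily partition for each of the last args.range days and stack them.
play_video_df = None
for i in range(args.range):
    t = target_date - datetime.timedelta(days=i)
    temp_df = spark.sql(
        "select * from ***album where year=%s and month=%s and day=%s" % (t.year, t.month, t.day))
    temp_df = temp_df.withColumn("date", F.lit(target_date))
    if play_video_df is None:
        play_video_df = temp_df
    else:
        # unionByName matches columns by name rather than by position.
        play_video_df = play_video_df.unionByName(temp_df)
target_df = play_video_df

# target_movie_df is not defined in this snippet; it is presumably derived from target_df.
# Keep the highest finish_rate for each (dnum, aid) pair.
target_groupped_movie_df = target_movie_df.groupBy("dnum", "aid").agg(F.max("finish_rate").alias("finish_rate"))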
