
Spark Low-Level API (RDD) Study Notes: Working with Keys

Date: 2023-05-12

Basic operations

# Create the RDD
myCollection = "Spark The Definitive Guide : Big Data Processing Made Simple".split(" ")
words = spark.sparkContext.parallelize(myCollection, 2)

# keyBy: build a pair RDD, keying each word by its first (lowercased) letter
keyword = words.keyBy(lambda word: word.lower()[0])

# mapValues: transform only the values, leaving the keys untouched
keyword.mapValues(lambda word: word.upper()).collect()

[('s', 'SPARK'), ('t', 'THE'), ('d', 'DEFINITIVE'), ('g', 'GUIDE'), (':', ':'),
 ('b', 'BIG'), ('d', 'DATA'), ('p', 'PROCESSING'), ('m', 'MADE'), ('s', 'SIMPLE')]

# Look up the results for a particular key
keyword.lookup("s")

['Spark', 'Simple']

# sampleByKey: sample an RDD by a set of keys
# Signature: RDD.sampleByKey(withReplacement, fractions, seed=None)
# The first argument controls whether to sample with replacement, the second
# gives the per-key sampling fraction, and the third is the random seed.
# Note: the size of the returned subset is not guaranteed.
import random

# Extract the distinct characters in words
distinctChars = words.flatMap(lambda word: list(word.lower())) \
    .distinct() \
    .collect()
sampleMap = dict(map(lambda c: (c, random.random()), distinctChars))
words.map(lambda word: (word.lower()[0], word)) \
    .sampleByKey(True, sampleMap, 6) \
    .collect()
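Because sampleByKey only approximates the requested per-key fractions, the number of records returned varies between runs when no seed is fixed, which is exactly why the subset size above cannot be pinned down. A minimal standalone sketch of this behaviour follows; the SparkSession bootstrap, the local[2] master, the app name, and the uniform 0.5 fraction are assumptions added here for illustration, not part of the original notes:

# Standalone sketch: SparkSession bootstrap, local[2] master, and the uniform
# 0.5 fraction per key are illustrative assumptions, not from the original notes.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[2]") \
    .appName("sampleByKey-demo") \
    .getOrCreate()

myCollection = "Spark The Definitive Guide : Big Data Processing Made Simple".split(" ")
words = spark.sparkContext.parallelize(myCollection, 2)
pairs = words.map(lambda word: (word.lower()[0], word))

# Give every key that appears the same 0.5 sampling fraction.
fractions = {key: 0.5 for key in pairs.keys().distinct().collect()}

# With no seed, sampleByKey draws independently each time, so the size of the
# sampled subset differs from run to run; only its expectation is fixed.
sizes = [len(pairs.sampleByKey(True, fractions).collect()) for _ in range(5)]
print(sizes)  # e.g. [4, 6, 5, 7, 5] -- varies per run

spark.stop()

Passing a fixed seed (as the notes do with seed 6) makes a single call reproducible, but it still does not guarantee an exact per-key sample count.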
