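I have a table whose rows are described by
case class Rating(val uid: Int, val mid: Int, val score: Double, val timestamp: Int)
and I need to pull out the 10 movies each user has watched most recently.
The difficulty: the result is needed for every user, not just for a single one.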
About case classes: a case class is well suited to modeling immutable data. It is a special kind of class that is optimized for use in pattern matching.
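To make those two properties concrete, here is a minimal sketch using the same Rating shape as in this post. The `describe` helper and the 4.0 threshold are hypothetical and exist only for illustration.

```scala
// Same fields as the Rating class in this post. The `val` keyword is omitted
// here because case class parameters are already public, immutable vals.
case class Rating(uid: Int, mid: Int, score: Double, timestamp: Int)

object CaseClassDemo {
  // Hypothetical helper: pattern matching destructures the fields directly.
  def describe(r: Rating): String = r match {
    case Rating(uid, mid, score, _) if score >= 4.0 => s"user $uid liked movie $mid"
    case Rating(uid, mid, _, _)                     => s"user $uid rated movie $mid"
  }

  def main(args: Array[String]): Unit = {
    val r1 = Rating(1, 100, 4.5, 1000)
    // Fields cannot be reassigned; copy() builds a new instance instead.
    val r2 = r1.copy(score = 2.0)

    println(describe(r1))                    // user 1 liked movie 100
    println(describe(r2))                    // user 1 rated movie 100
    println(r1 == Rating(1, 100, 4.5, 1000)) // true: equality is by field values
  }
}
```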
First, create two case classes
case class RentlyMovie(val mid: Int, val score: Double, val timestamp: Int)
case class UserRentlyMovie(val uid: Int, recs: Seq[RentlyMovie])
Write the business logic
import spark.implicits._

val redisData = spark.read
  .option("uri", config("mongo.url"))
  .option("collection", MONGODB_RATING_COLLECTION)
  .format("com.mongodb.spark.sql")
  .load()
  .as[Rating]
  .rdd
  // Collect all ratings that belong to the same user
  .groupBy(x => x.uid)
  .map { x =>
    UserRentlyMovie(
      x._1,
      x._2.toList
        .map(x => RentlyMovie(x.mid, x.score, x.timestamp))
        // Newest first, then keep at most 10 per user
        .sortBy(x => x.timestamp)(Ordering.Int.reverse)
        .take(10))
  }
  .toDF()
  .persist()
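The per-user logic above (group by uid, sort by timestamp descending, take the first N) can be tried without Spark or MongoDB. The following is only a sketch on plain Scala collections: `recentPerUser` and the sample data are hypothetical, and the final sort by uid is added just to make the printed order deterministic.

```scala
case class Rating(uid: Int, mid: Int, score: Double, timestamp: Int)
case class RentlyMovie(mid: Int, score: Double, timestamp: Int)
case class UserRentlyMovie(uid: Int, recs: Seq[RentlyMovie])

object TopNPerUser {
  // Hypothetical helper: the newest `n` ratings for each user.
  def recentPerUser(ratings: Seq[Rating], n: Int): Seq[UserRentlyMovie] =
    ratings
      .groupBy(_.uid) // Map[uid, all ratings of that user]
      .map { case (uid, rs) =>
        val newest = rs
          .sortBy(_.timestamp)(Ordering.Int.reverse) // newest first
          .take(n)                                   // keep at most n
          .map(r => RentlyMovie(r.mid, r.score, r.timestamp))
        UserRentlyMovie(uid, newest)
      }
      .toSeq
      .sortBy(_.uid) // only for a stable output order in this demo

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Rating(1, 10, 4.0, 100),
      Rating(1, 11, 3.0, 300),
      Rating(1, 12, 5.0, 200),
      Rating(2, 10, 2.0, 50)
    )
    // User 1 keeps movies 11 and 12 (timestamps 300 and 200); user 2 keeps movie 10.
    recentPerUser(sample, 2).foreach(println)
  }
}
```

One thing to keep in mind for the Spark version: groupBy on an RDD gathers every rating of a user into a single group before sorting, so users with very many ratings make that step expensive.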
About the sortBy() function:
*.sortBy(x => x.timestamp)(Ordering.Int.reverse)
sorts in descending order
*.sortBy(x => x.timestamp)(Ordering.Int)
sorts in ascending order
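A small, self-contained check of both orderings on a plain List. The `Event` class and its values are made up for this example. Note that passing Ordering.Int explicitly gives the same result as omitting the second argument list, since it is the implicit default for an Int key.

```scala
object SortByDemo {
  // Hypothetical record type, used only for this example.
  case class Event(name: String, timestamp: Int)

  val events: List[Event] = List(Event("a", 20), Event("b", 30), Event("c", 10))

  // Explicit ascending ordering on the Int key.
  val ascending: List[Event] = events.sortBy(_.timestamp)(Ordering.Int)

  // Same call without the second argument list: Ordering.Int is picked up implicitly.
  val ascendingImplicit: List[Event] = events.sortBy(_.timestamp)

  // Reversed ordering: largest timestamp first.
  val descending: List[Event] = events.sortBy(_.timestamp)(Ordering.Int.reverse)

  def main(args: Array[String]): Unit = {
    println(ascending.map(_.name))  // List(c, a, b)
    println(descending.map(_.name)) // List(b, a, c)
  }
}
```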