欢迎您访问365答案网,请分享给你的朋友!
生活常识 学习资料

Hive练习题之流量统计

时间:2023-06-16

需求: 统计每个用户各时间段的流量总计(相隔不超过10分钟的算在一起)

数据源:

1,2020-02-18 14:20:30,2020-02-18 14:46:30,201,2020-02-18 14:47:20,2020-02-18 15:20:30,301,2020-02-18 15:37:23,2020-02-18 16:05:26,401,2020-02-18 16:06:27,2020-02-18 17:20:49,501,2020-02-18 17:21:50,2020-02-18 18:03:27,602,2020-02-18 14:18:24,2020-02-18 15:01:40,202,2020-02-18 15:20:49,2020-02-18 15:30:24,302,2020-02-18 16:01:23,2020-02-18 16:40:32,402,2020-02-18 16:44:56,2020-02-18 17:40:52,503,2020-02-18 14:39:58,2020-02-18 15:35:53,203,2020-02-18 15:36:39,2020-02-18 15:24:54,30

建表

CREATE TABLE IF NOT EXISTS test.tb_flow(uid int,start_time timestamp,end_time timestamp,flow double)ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

创建并导入数据

vim /doit16/flow.txt LOAD DATA LOCAL INPATH "/doit16/flow.txt" INTO TABLE test.tb_flow;

解法

select uid, min(start_time) as start_time, max(end_time) as end_time, sum(flow) as total_flow from ( select uid, start_time, end_time, flow, sum(cn) over(partition by uid order by start_time) as flag from ( select uid, start_time, end_time, flow, if ((unix_timestamp(start_time) - unix_timestamp(lag_time) ) / 60 > 10 ,1,0) as cn from ( select uid, start_time, end_time, flow, lag(end_time,1,start_time) over(partition by uid order by start_time) as lag_time from tb_flow) t1 ) t2 ) t3group by uid, flag

Copyright © 2016-2020 www.365daan.com All Rights Reserved. 365答案网 版权所有 备案号:

部分内容来自互联网,版权归原作者所有,如有冒犯请联系我们,我们将在三个工作时内妥善处理。