2. Download the Hudi 0.10.0 source: https://github.com/apache/hudi/releases
2.1 In the pom.xml at the project root, change the Flink version to 1.13.5.
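A one-line sketch of that edit, assuming the root pom exposes the version through a flink.version property (as Hudi 0.10.0's root pom does):

# point the flink.version property at 1.13.5
sed -i 's|<flink.version>.*</flink.version>|<flink.version>1.13.5</flink.version>|' pom.xml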
2.2 Build (optionally pinning the Scala version, see below):

mvn clean package -DskipTests
# the resulting bundle lands at packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-*.*.*-SNAPSHOT.jar
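To pin the Scala version explicitly, a sketch using the profile switch Hudi's build accepts (flag as documented for 0.10.x; verify against the README in your source tree):

mvn clean package -DskipTests -Dscala-2.12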
2.3 The build hit an error:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [  1.642 s]
[INFO] hudi-common ........................................ SUCCESS [  9.808 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.306 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.623 s]
[INFO] hudi-client ........................................ SUCCESS [  0.082 s]
[INFO] hudi-client-common ................................. SUCCESS [  8.027 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.825 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 13.891 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.718 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  3.027 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.066 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.706 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [  9.535 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 25.923 s]
[INFO] hudi-utilities_2.12 ................................ FAILURE [  2.638 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SKIPPED
[INFO] hudi-cli ........................................... SKIPPED
[INFO] hudi-java-client ................................... SKIPPED
[INFO] hudi-flink-client .................................. SKIPPED
[INFO] hudi-spark3_2.12 ................................... SKIPPED
[INFO] hudi-dla-sync ...................................... SKIPPED
[INFO] hudi-sync .......................................... SKIPPED
[INFO] hudi-hadoop-mr-bundle .............................. SKIPPED
[INFO] hudi-hive-sync-bundle .............................. SKIPPED
[INFO] hudi-spark-bundle_2.12 ............................. SKIPPED
[INFO] hudi-presto-bundle ................................. SKIPPED
[INFO] hudi-timeline-server-bundle ........................ SKIPPED
[INFO] hudi-hadoop-docker ................................. SKIPPED
[INFO] hudi-hadoop-base-docker ............................ SKIPPED
[INFO] hudi-hadoop-namenode-docker ........................ SKIPPED
[INFO] hudi-hadoop-datanode-docker ........................ SKIPPED
[INFO] hudi-hadoop-history-docker ......................... SKIPPED
[INFO] hudi-hadoop-hive-docker ............................ SKIPPED
[INFO] hudi-hadoop-sparkbase-docker ....................... SKIPPED
[INFO] hudi-hadoop-sparkmaster-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkworker-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SKIPPED
[INFO] hudi-hadoop-presto-docker .......................... SKIPPED
[INFO] hudi-integ-test .................................... SKIPPED
[INFO] hudi-integ-test-bundle ............................. SKIPPED
[INFO] hudi-examples ...................................... SKIPPED
[INFO] hudi-flink_2.12 .................................... SKIPPED
[INFO] hudi-kafka-connect ................................. SKIPPED
[INFO] hudi-flink-bundle_2.12 ............................. SKIPPED
[INFO] hudi-kafka-connect-bundle .......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:29 min
[INFO] Finished at: 2022-02-06T17:59:02+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.4 in aliyunmaven (https://maven.aliyun.com/repository/public) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hudi-utilities_2.12
The fix for the error above is to download the missing jars by hand and install them into the local Maven repository. Note that the build resolves version 5.3.4 while the commands below install 5.3.0: make sure the -Dversion you install matches the version the build asks for, or adjust the pom to the version you installed.
mvn install:install-file -Dfile=/opt/myjar/common-config-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/common-utils-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/kafka-avro-serializer-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/kafka-schema-registry-client-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.0 -Dpackaging=jar
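These io.confluent artifacts are absent from the Aliyun mirror, but Confluent publishes them in its own Maven repository. A sketch of fetching the exact versions the build asked for (URLs assume the standard Maven directory layout under https://packages.confluent.io/maven/):

# download the four jars the failed module needs, at the version Maven resolved (5.3.4)
wget https://packages.confluent.io/maven/io/confluent/common-config/5.3.4/common-config-5.3.4.jar
wget https://packages.confluent.io/maven/io/confluent/common-utils/5.3.4/common-utils-5.3.4.jar
wget https://packages.confluent.io/maven/io/confluent/kafka-avro-serializer/5.3.4/kafka-avro-serializer-5.3.4.jar
wget https://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/5.3.4/kafka-schema-registry-client-5.3.4.jar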
With the jars installed, re-run the build; this time it completes:

[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [  1.370 s]
[INFO] hudi-common ........................................ SUCCESS [ 10.813 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.394 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.404 s]
[INFO] hudi-client ........................................ SUCCESS [  0.072 s]
[INFO] hudi-client-common ................................. SUCCESS [  7.295 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.848 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 15.158 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.681 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  2.856 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.054 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.296 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 10.521 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 26.299 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 11.262 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [01:39 min]
[INFO] hudi-cli ........................................... SUCCESS [ 15.297 s]
[INFO] hudi-java-client ................................... SUCCESS [  2.267 s]
[INFO] hudi-flink-client .................................. SUCCESS [01:06 min]
[INFO] hudi-spark3_2.12 ................................... SUCCESS [  6.117 s]
[INFO] hudi-dla-sync ...................................... SUCCESS [  6.830 s]
[INFO] hudi-sync .......................................... SUCCESS [  0.061 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  8.565 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [  1.131 s]
[INFO] hudi-spark-bundle_2.12 ............................. SUCCESS [ 11.139 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 38.706 s]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [  8.251 s]
[INFO] hudi-hadoop-docker ................................. SUCCESS [  1.166 s]
[INFO] hudi-hadoop-base-docker ............................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  0.627 s]
[INFO] hudi-hadoop-history-docker ......................... SUCCESS [  0.659 s]
[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [  7.320 s]
[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [  0.731 s]
[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [  0.638 s]
[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  0.667 s]
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [  0.671 s]
[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [  0.704 s]
[INFO] hudi-integ-test .................................... SUCCESS [ 36.320 s]
[INFO] hudi-integ-test-bundle ............................. SUCCESS [01:47 min]
[INFO] hudi-examples ...................................... SUCCESS [  8.120 s]
[INFO] hudi-flink_2.12 .................................... SUCCESS [ 38.207 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [ 19.832 s]
[INFO] hudi-flink-bundle_2.12 ............................. SUCCESS [ 27.658 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 14.287 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:30 min
[INFO] Finished at: 2022-02-06T20:29:29+08:00
[INFO] ------------------------------------------------------------------------
[root@node01 hudi-0.10.0]#
3. The compiled bundles sit under packaging:

[root@node01 packaging]# pwd
/opt/module/hudi/hudi-0.10.0/packaging
[root@node01 packaging]# ll
total 4
drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-flink-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hadoop-mr-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hive-sync-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:39 hudi-integ-test-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-kafka-connect-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-presto-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-spark-bundle
drwxr-xr-x 4 501 games  101 Feb  6 20:38 hudi-timeline-server-bundle
drwxr-xr-x 4 501 games   46 Feb  6 20:37 hudi-utilities-bundle
-rw-r--r-- 1 501 games 2206 Dec  8 10:38 README.md
[root@node01 packaging]#
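The jars themselves live one level deeper, in each bundle's target directory; a quick way to list them all (paths as above):

# list every bundle jar produced by the build
find /opt/module/hudi/hudi-0.10.0/packaging -maxdepth 3 -path '*/target/*' -name '*bundle*.jar'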
4. The jars Flink needs for the Hudi integration are mainly:
hudi-flink-bundle_2.12-0.10.0.jar
hudi-hadoop-mr-bundle-0.10.0.jar
Both go into Flink's lib directory:

[root@node01 lib]# pwd
/opt/module/flink/flink-1.13.5/lib
[root@node01 lib]# ll
total 316964
-rw-r--r-- 1 root root   7802399 Jan  1 08:27 doris-flink-1.0-SNAPSHOT.jar
-rw-r--r-- 1 root root    249571 Dec 27 23:32 flink-connector-jdbc_2.12-1.13.5.jar
-rw-r--r-- 1 root root    359138 Jan  1 10:17 flink-connector-kafka_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007     92315 Dec 15 08:23 flink-csv-1.13.5.jar
-rw-r--r-- 1 hive 1007 106535830 Dec 15 08:29 flink-dist_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007    148127 Dec 15 08:23 flink-json-1.13.5.jar
-rw-r--r-- 1 root root  43317025 Feb  6 20:51 flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
-rw-r--r-- 1 hive 1007   7709740 Dec 15 06:57 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 hive 1007  35051557 Dec 15 08:28 flink-table_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007  38613344 Dec 15 08:28 flink-table-blink_2.12-1.13.5.jar
-rw-r--r-- 1 root root  62447468 Feb  6 20:44 hudi-flink-bundle_2.12-0.10.0.jar
-rw-r--r-- 1 root root  17276348 Feb  6 20:51 hudi-hadoop-mr-bundle-0.10.0.jar
-rw-r--r-- 1 root root   1893564 Jan  1 10:17 kafka-clients-2.0.0.jar
-rw-r--r-- 1 hive 1007    207909 Dec 15 06:56 log4j-1.2-api-2.16.0.jar
-rw-r--r-- 1 hive 1007    301892 Dec 15 06:56 log4j-api-2.16.0.jar
-rw-r--r-- 1 hive 1007   1789565 Dec 15 06:56 log4j-core-2.16.0.jar
-rw-r--r-- 1 hive 1007     24258 Dec 15 06:56 log4j-slf4j-impl-2.16.0.jar
-rw-r--r-- 1 root root    724213 Dec 27 23:23 mysql-connector-java-5.1.9.jar
[root@node01 lib]#
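Getting the two bundles into place is just a copy out of the build tree (target paths assume the standard Maven layout noted in 2.2); restart the Flink cluster afterwards so the new jars are picked up:

# copy the Hudi bundles into Flink's classpath
cp /opt/module/hudi/hudi-0.10.0/packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar /opt/module/flink/flink-1.13.5/lib/
cp /opt/module/hudi/hudi-0.10.0/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.10.0.jar /opt/module/flink/flink-1.13.5/lib/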
5. Enter the Flink SQL shell:

./sql-client.sh embedded

# in the SQL CLI, set how query results are displayed
set execution.result-mode=tableau;
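tableau prints results as an ASCII table directly in the CLI; the client also offers the table and changelog display modes, switched with the same SET command:

-- stream results as a changelog (+I/-U/+U rows) instead of a materialized table
set execution.result-mode=changelog;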
6. Create the table:

CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'table.type' = 'MERGE_ON_READ'
);
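The DDL names no primary key because Hudi's Flink connector defaults the record key field to uuid and the precombine field to ts, which is exactly what this schema provides. A sketch of the same table (hypothetical name t1_explicit) with those defaults spelled out, using the option keys as understood in Hudi 0.10's FlinkOptions (verify against your version):

CREATE TABLE t1_explicit(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
  'hoodie.datasource.write.recordkey.field' = 'uuid', -- rows sharing a uuid are upserted
  'write.precombine.field' = 'ts',                    -- on duplicate keys, the row with the larger ts wins
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'table.type' = 'MERGE_ON_READ'
);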
6.1 Insert data:

INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');

INSERT INTO t1 VALUES
('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');

# As shown in the CLI:

Flink SQL> INSERT INTO t1 VALUES
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: d6c70e43969b0f2b5124104468c5e065

Flink SQL> select * from t1;
+----+------+---------+-----+-------------------------+-----------+
| op | uuid |    name | age |                      ts | partition |
+----+------+---------+-----+-------------------------+-----------+
| +I |  id6 |    Emma |  20 | 1970-01-01 00:00:06.000 |      par3 |
| +I |  id5 |  Sophia |  18 | 1970-01-01 00:00:05.000 |      par3 |
| +I |  id8 |     Han |  56 | 1970-01-01 00:00:08.000 |      par4 |
| +I |  id7 |     Bob |  44 | 1970-01-01 00:00:07.000 |      par4 |
| +I |  id2 | Stephen |  33 | 1970-01-01 00:00:02.000 |      par1 |
| +I |  id1 |   Danny |  28 | 1970-01-01 00:00:01.000 |      par1 |
| +I |  id4 |  Fabian |  31 | 1970-01-01 00:00:04.000 |      par2 |
| +I |  id3 |  Julian |  53 | 1970-01-01 00:00:03.000 |      par2 |
+----+------+---------+-----+-------------------------+-----------+
Received a total of 8 rows
7. Update. A Hudi update is just another insert with the same record key: writing a row whose uuid already exists replaces the previous version. Change Danny's age to 18:
INSERT INTO t1 VALUES('id1','Danny',18,TIMESTAMP '1970-01-01 00:00:01','par1');
Querying again shows the updated row:
Flink SQL> select * from t1;
+----+------+---------+-----+-------------------------+-----------+
| op | uuid |    name | age |                      ts | partition |
+----+------+---------+-----+-------------------------+-----------+
| +I |  id8 |     Han |  56 | 1970-01-01 00:00:08.000 |      par4 |
| +I |  id7 |     Bob |  44 | 1970-01-01 00:00:07.000 |      par4 |
| +I |  id4 |  Fabian |  31 | 1970-01-01 00:00:04.000 |      par2 |
| +I |  id3 |  Julian |  53 | 1970-01-01 00:00:03.000 |      par2 |
| +I |  id2 | Stephen |  33 | 1970-01-01 00:00:02.000 |      par1 |
| +I |  id1 |   Danny |  18 | 1970-01-01 00:00:01.000 |      par1 |
| +I |  id6 |    Emma |  20 | 1970-01-01 00:00:06.000 |      par3 |
| +I |  id5 |  Sophia |  18 | 1970-01-01 00:00:05.000 |      par3 |
+----+------+---------+-----+-------------------------+-----------+
Received a total of 8 rows

Flink SQL>
8. Jobs in Flink. Each INSERT above is submitted to the cluster as a Flink job (note the Job ID in the CLI output) and can be watched in the Flink Web UI.