I recently took over a CDH 6.3.1 big data cluster. The clusters I had built before were all vanilla Apache Hadoop, where debugging Spark SQL reads against Hive from an IDE is straightforward. The CDH-packaged cluster took some getting used to: the hive-site.xml shipped with the CDH environment was essentially useless for this, and none of the write-ups I found online gave a working answer. In the end I rewrote hive-site.xml along the lines of a vanilla Apache config, and with that I can launch a Spark program from local IDEA debugging and read Hive tables.
Project structure
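A minimal sketch of the assumed standard Maven/Scala layout (the file names below are illustrative); the key point is that hive-site.xml sits under src/main/resources so it lands on the classpath of the locally launched SparkSession:

```
spark-test/
├── pom.xml
└── src/
    └── main/
        ├── resources/
        │   └── hive-site.xml
        └── scala/
            └── HiveTest.scala
```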
hive-site.xml configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <!-- IP of the host running the Hive metastore service -->
        <value>thrift://metastore-host-ip:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- IP of the MySQL host backing the Hive metastore -->
        <value>jdbc:mysql://hive-mysql-host-ip:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
    </property>
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>cdh-06.prod.ycsInsight.yonyou.com,cdh-02.prod.ycsInsight.yonyou.com,cdh-08.prod.ycsInsight.yonyou.com</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <!-- IP of the NameNode -->
        <value>hdfs://namenode-ip:8020</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoStartMechanism</name>
        <value>checked</value>
    </property>
</configuration>
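If you prefer not to maintain a separate hive-site.xml in the project, the essential settings can also be passed programmatically on the SparkSession builder. This is a sketch under the same assumptions as the config above; `metastore-host-ip` and the warehouse path are placeholders, not real endpoints:

```scala
import org.apache.spark.sql.SparkSession

object HiveConfigDemo {
  def main(args: Array[String]): Unit = {
    // Supply the metastore URI and warehouse dir directly instead of via
    // hive-site.xml; substitute your cluster's actual values for the
    // placeholder host below.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hive-config-demo")
      .config("hive.metastore.uris", "thrift://metastore-host-ip:9083")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()
    spark.stop()
  }
}
```

Settings from a hive-site.xml on the classpath and settings passed via `.config(...)` are merged, with the programmatic values taking precedence, so this can also be used to override a single property.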
Test code:
import org.apache.spark.sql.SparkSession

object HiveTest {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder
      .master("local[*]")
      .appName("Java Spark Hive Example")
      .enableHiveSupport
      .getOrCreate

    spark.sql("show databases").show()
    spark.sql("use default").show() // replace "default" with the database you want to query
    spark.sql("show tables").show()
    // spark.sql("select * from person").show()

    spark.stop()
  }
}
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>spark-test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <spark.version>2.4.0</spark.version>
        <hive.version>2.1.1</hive.version>
        <scala.version>2.11.12</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>hive-metastore</artifactId>
                    <groupId>org.spark-project.hive</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hive-exec</artifactId>
                    <groupId>org.spark-project.hive</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>*</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Execution result:
Feel free to leave a comment if you run into problems.