Connecting to Hive with Spark SQL on the virtual machine
1. Copy Hive's hive-site.xml into the conf directory of spark-local
2. Copy the MySQL driver JAR into Spark's jars directory
3. Test 1: using the spark-sql script
start-dfs.sh
../spark-local/bin/spark-sql
You can now run SQL statements as usual:
spark-sql (default)> show tables;
spark-sql (default)> select * from emp;
spark-sql (default)> select deptno,max(sal) from emp group by deptno;
4. Test 2: using the beeline tool
start-dfs.sh
(1) First start the Thrift server: ../spark-local/sbin/start-thriftserver.sh
(2) Connect with the beeline command:
../spark-local/bin/beeline -u jdbc:hive2://localhost:10000 -n root
You can then run show databases;, show tables;, and select * from emp; as before.
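For reference, here is a minimal Scala sketch of what beeline does under the hood: a plain JDBC connection to the thriftserver. The object name is illustrative, and it assumes the hive-jdbc driver is on the classpath; the URL and user match the beeline command above.

package day03

import java.sql.DriverManager

object ThriftJdbcSketch {
  def main(args: Array[String]): Unit = {
    // Older drivers may need explicit registration; JDBC 4+ loads it automatically
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Same URL and user as the beeline command; empty password for an unsecured server
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "root", "")
    try {
      val rs = conn.createStatement().executeQuery("select * from emp")
      while (rs.next()) {
        println(rs.getString(1)) // print the first column of each row
      }
    } finally {
      conn.close()
    }
  }
}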
Connecting to Hive from IDEA
Tip: it is still safest to keep HDFS running on the virtual machine (Hive itself does not need to be running).
Copy hive-site.xml from Hive's conf directory on the virtual machine into both the resources folder and the target/classes folder of the IDEA project.
The other environment requirement is adding the MySQL driver dependency (matching your MySQL version); Hive itself also relies on MySQL for its metastore.
Dependencies in pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>spark_sz2102</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.3</spark.version>
        <hadoop.version>2.7.6</hadoop.version>
        <scala.binary.version>2.11</scala.binary.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.25</version>
        </dependency>
    </dependencies>
</project>
Code example: reading from Hive
package day03

import org.apache.spark.sql.{DataFrame, SparkSession}

object Test04_hive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("saveMode")
      .enableHiveSupport() // enable Hive support
      .getOrCreate()

    // Read the Hive table and register it as a temp view for querying
    val df: DataFrame = spark.table("sz2103.emp")
    df.createTempView("emp")
    val sqlText =
      """
        |select deptno,
        |       max(sal),
        |       min(sal),
        |       count(1)
        |from emp
        |group by deptno
        |""".stripMargin
    spark.sql(sqlText).show()
    spark.stop()
  }
}
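As a variation (a sketch, not part of the original notes): with enableHiveSupport() the Hive table can also be queried directly by database.table name inside spark.sql, without registering a temp view first. The object name is illustrative.

package day03

import org.apache.spark.sql.SparkSession

object Test04_hive_direct {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hiveDirect")
      .enableHiveSupport() // enable Hive support
      .getOrCreate()

    // Qualify the table with its database; resolved through the Hive metastore
    spark.sql(
      """
        |select deptno, max(sal), min(sal), count(1)
        |from sz2103.emp
        |group by deptno
        |""".stripMargin).show()

    spark.stop()
  }
}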
Hive write example
package day03

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object Test05_hive_write {
  def main(args: Array[String]): Unit = {
    // Write to HDFS as root to avoid permission errors
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("saveMode")
      .enableHiveSupport() // enable Hive support
      .getOrCreate()

    val df: DataFrame = spark.read
      .option("sep", "|")
      .option("header", "true")
      .csv("data/student.csv")
    // Ignore skips the write if the table already exists
    df.write.mode(SaveMode.Ignore).saveAsTable("sz2103.student")
    spark.stop()

    // Verify the write with a fresh session
    readTest()
  }

  def readTest(): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("saveMode")
      .enableHiveSupport() // enable Hive support
      .getOrCreate()
    val df: DataFrame = spark.table("sz2103.student")
    println(df.count())
    spark.stop()
  }
}
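For completeness, a hedged sketch of the other common save modes; the table name sz2103.student and the CSV path follow the example above, while the object name and mode choices are illustrative. SaveMode.Ignore (used above) skips the write when the table already exists, Overwrite replaces it, and Append adds rows on every run.

package day03

import org.apache.spark.sql.{SaveMode, SparkSession}

object Test05_hive_write_modes {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("saveModeVariants")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.read.option("sep", "|").option("header", "true").csv("data/student.csv")

    // Overwrite drops and recreates the table contents
    df.write.mode(SaveMode.Overwrite).saveAsTable("sz2103.student")
    // insertInto appends by column position into the existing table
    df.write.mode(SaveMode.Append).insertInto("sz2103.student")

    spark.stop()
  }
}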