Install the matching Scala version
The Spark and Scala versions must match; otherwise Spark will fail to start with errors after installation.
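Spark 2.4.4 is built against Scala 2.11 by default. A minimal install sketch (the download URL and /opt paths are assumptions, chosen to match the SCALA_HOME used below):
wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
tar -zxvf scala-2.11.12.tgz -C /opt
export SCALA_HOME=/opt/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
scala -version   # verify: should report 2.11.12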
Download the source package from the official site:
http://spark.apache.org/downloads.html
Other versions can be downloaded from https://archive.apache.org/dist/spark/
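For example, to fetch the Spark 2.4.4 source from the archive (URL follows the archive's directory layout):
wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4.tgz
tar -zxvf spark-2.4.4.tgz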
After extracting, update the Scala and Hadoop versions in pom.xml accordingly.
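In the root pom.xml the relevant properties look roughly like this (values shown for this build; verify against your own pom):
<scala.version>2.11.12</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<hadoop.version>3.2.1</hadoop.version>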
Also edit dev/make-distribution.sh and hardcode the version numbers there; this skips Maven's version-detection calls and makes the build a little faster. Change the version variables to the values below.
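These variables are normally computed with mvn help:evaluate; hardcoding them skips those calls. A sketch with this build's values (assumed, adjust to match your pom):
VERSION=2.4.4
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=3.2.1
SPARK_HIVE=0   # 0 = build without -Phive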
Then open Git Bash in the directory containing pom.xml (right-click → Git Bash Here) and run the build script; compilation takes quite a while:
./dev/make-distribution.sh --name "hadoop321-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-3.2.1"
This pins the Hadoop version and builds without Hive, which is required when Spark will serve as Hive's execution engine.
A successful build produces an installable tgz package.
Extract the package, then edit the configuration files:
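The package is named after the --name argument, so here it should be spark-2.4.4-bin-hadoop321-without-hive.tgz (matching the paths used below); for example:
tar -zxvf spark-2.4.4-bin-hadoop321-without-hive.tgz -C /opt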
cd $SPARK_HOME/conf
mv spark-env.sh.template spark-env.sh
mv spark-defaults.conf.template spark-defaults.conf
vim spark-env.sh
Add the following configuration (SPARK_DIST_CLASSPATH is what lets a hadoop-provided build find the Hadoop jars; without it Spark will not start):
export JAVA_HOME=/opt/jdk1.8.0_181
export SCALA_HOME=/opt/scala-2.11.12
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_MASTER_HOST=10.241.19.7   # SPARK_MASTER_IP is deprecated in Spark 2.x
export SPARK_LIBRARY_PATH=/opt/spark-2.4.4-bin-hadoop321-without-hive/lib
export SPARK_MASTER_WEBUI_PORT=8082
export SPARK_WORKER_DIR=/opt/spark-2.4.4-bin-hadoop321-without-hive/work
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_LOG_DIR=/opt/spark-2.4.4-bin-hadoop321-without-hive/log
export SPARK_PID_DIR=/opt/spark-2.4.4-bin-hadoop321-without-hive/run
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
vim spark-defaults.conf
Add the following configuration:
spark.master yarn
spark.submit.deployMode cluster
spark.home /opt/spark-2.4.4-bin-hadoop321-without-hive
spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.××.××.7:9000/spark-log
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 2g
spark.driver.memory 2g
spark.executor.cores 2
spark.cores.max 2
spark.default.parallelism 36
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.executor.extraClassPath /opt/spark-2.4.4-bin-hadoop321-without-hive/jars/*
spark.driver.extraClassPath /opt/spark-2.4.4-bin-hadoop321-without-hive/jars/*
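Note that Spark does not create the event log directory set in spark.eventLog.dir; create it in HDFS before starting (HDFS address elided as above):
hdfs dfs -mkdir -p /spark-log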
cd $SPARK_HOME/sbin
Commands to start/stop Spark:
./start-all.sh / ./stop-all.sh
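If startup succeeds, jps should list the Master and Worker processes alongside the Hadoop daemons:
jps
# ... Master
# ... Worker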
Open the web UI address configured in spark-env.sh (here http://10.241.19.7:8082) to check that Spark is running.
Then configure Spark as the execution engine in hive-site.xml:
cd $HIVE_HOME/conf
vim hive-site.xml
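A minimal sketch of the properties typically added for Hive on Spark (values assumed from the Spark configuration above; exact settings depend on your Hive version):
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
<property>
    <name>spark.master</name>
    <value>yarn</value>
</property>
<property>
    <name>spark.home</name>
    <value>/opt/spark-2.4.4-bin-hadoop321-without-hive</value>
</property>
<property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
</property>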