22/02/22 10:24:20 INFO Client: Application report for application_1642757441712_0012 (state: FAILED)
22/02/22 10:24:20 INFO Client:
     client token: N/A
     diagnostics: Application application_1642757441712_0012 failed 2 times due to AM Container for appattempt_1642757441712_0012_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://bigdata-dataos-001:18088/cluster/app/application_1642757441712_0012 Then, click on links to logs of each attempt.
Diagnostics: File file:/root/.sparkStaging/application_1642757441712_0012/__spark_libs__7954752360413169627.zip does not exist
java.io.FileNotFoundException: File file:/root/.sparkStaging/application_1642757441712_0012/__spark_libs__7954752360413169627.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Fix: modify conf/spark-env.sh
export HADOOP_HOME=/mnt/hadoop-2.7.7
export HADOOP_CONF_DIR=/mnt/hadoop-2.7.7/etc/hadoop
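The exitCode -1000 above is YARN failing to localize __spark_libs__: without HADOOP_CONF_DIR the Spark client falls back to the local filesystem and stages the libs under file:/root/.sparkStaging, which the NodeManagers cannot see. A quick sanity check after editing spark-env.sh (assuming the Hadoop client scripts are on the PATH):

export HADOOP_CONF_DIR=/mnt/hadoop-2.7.7/etc/hadoop
# Should now print hdfs://hacluster rather than file:///
hdfs getconf -confKey fs.defaultFS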
Error:
Exception in thread "main" java.io.IOException: Port 9000 specified in URI hdfs://hacluster:9000/eventLogs but host 'hacluster' is a logical (HA) namenode and does not use port information.
    at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:526)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:171)
    at org.apache.hadoop.hdfs.DFSClient.
Fix: modify conf/spark-defaults.conf
# The port on my cluster is 8020, so it needs to be changed here
spark.eventLog.dir hdfs://hacluster:8020/eventLogs
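As the exception says, hacluster is a logical HA nameservice and does not take arbitrary port information, so either drop the port from the URI or use the NameNodes' actual RPC port (8020 here). To double-check which RPC port your NameNodes listen on, you can grep the HDFS config (path taken from the spark-env.sh above):

# Show the NameNode RPC addresses for the hacluster nameservice
grep -A1 "dfs.namenode.rpc-address" /mnt/hadoop-2.7.7/etc/hadoop/hdfs-site.xml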
Problem:
22/02/22 10:56:07 WARN hive.HiveUtils: Hive jar path '/mnt/spark-2.4.3-bin-hadoop2.7/standalone-metastore/*' does not exist.
22/02/22 10:56:07 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 2.0 using
Exception in thread "main" java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath:
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars.
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:277)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:384)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
    at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
    at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Check the jars directory.
Fix: modify conf/spark-defaults.conf
spark.sql.hive.metastore.jars=/mnt/spark-2.4.3-bin-hadoop2.7/jars/*
spark.sql.hive.metastore.version=1.2.1
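This points spark.sql.hive.metastore.jars at the Hive 1.2.1 client jars bundled with Spark instead of the non-existent standalone-metastore directory. To double-check that the jars providing org.apache.hadoop.hive.conf.HiveConf are actually there:

# List the bundled Hive client jars
ls /mnt/spark-2.4.3-bin-hadoop2.7/jars/ | grep -i hive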
Complete configuration: conf/spark-defaults.conf
spark.driver.extraLibraryPath=/mnt/hadoop-2.7.7/lib/native:/mnt/hadoop-2.7.7/lib/native/Linux-amd64-64
spark.executor.extraJavaOptions=-XX:+UseNUMA
spark.executor.extraLibraryPath=/mnt/hadoop-2.7.7/lib/native:/mnt/hadoop-2.7.7/lib/native/Linux-amd64-64
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.store.path=/var/lib/spark2/shs_db
spark.io.compression.lz4.blockSize=128kb
spark.master=yarn
spark.shuffle.file.buffer=1m
spark.shuffle.io.backLog=8192
spark.shuffle.io.serverThreads=128
spark.shuffle.unsafe.file.output.buffer=5m
spark.sql.autoBroadcastJoinThreshold=26214400
spark.sql.hive.convertMetastoreOrc=true
spark.sql.hive.metastore.jars=/mnt/spark-2.4.3-bin-hadoop2.7/jars/*
spark.sql.hive.metastore.version=1.2.1
spark.sql.orc.filterPushdown=true
spark.sql.orc.impl=native
spark.sql.statistics.fallBackToHdfs=true
spark.unsafe.sorter.spill.reader.buffer.size=1m
spark.yarn.historyServer.address=10.19.32.30:18081
spark.yarn.queue=default
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hacluster:8020/eventLogs
spark.eventLog.compress true
spark.driver.cores 1
spark.driver.memory 800m
spark.executor.cores 1
spark.executor.memory 1000m
spark.executor.instances 1
spark.sql.warehouse.dir hdfs://hacluster/user/hive/warehouse
conf/spark-env.sh
export SPARK_DAEMON_MEMORY="2048m"
export SPARK_DRIVER_MEMORY="10240m"
export SPARK_EXECUTOR_CORES="4"
export SPARK_EXECUTOR_MEMORY="4096m"
# A string representing this instance of spark. (Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=/mnt/hadoop-2.7.7
export HADOOP_CONF_DIR=/mnt/hadoop-2.7.7/etc/hadoop
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_201
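With the two files above in place, a minimal smoke test is to submit the bundled SparkPi example to YARN (the jar name assumes the stock spark-2.4.3-bin-hadoop2.7 distribution built for Scala 2.11):

cd /mnt/spark-2.4.3-bin-hadoop2.7
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  examples/jars/spark-examples_2.11-2.4.3.jar 100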
Note: to connect to Hive, hive-site.xml must be placed under the conf directory.
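For example (the source path of hive-site.xml is only a placeholder for wherever your Hive config actually lives):

cp /etc/hive/conf/hive-site.xml /mnt/spark-2.4.3-bin-hadoop2.7/conf/
# Quick check that the metastore is reachable
/mnt/spark-2.4.3-bin-hadoop2.7/bin/spark-sql --master yarn -e "show databases;"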