If you just want the ready-made Docker image, skip straight to the end of this article.
The deployment steps are as follows:
1. Create the container:
docker run -itd --name spark -v /Users/lvhaiyang/workspace/docker/data:/root/data/ ubuntu:18.04
Enter the container:
docker exec -it -u root spark bash
2. Switch the APT sources:
apt-get update
apt-get install vim
cp /etc/apt/sources.list /etc/apt/sources.list_bak
vi /etc/apt/sources.list
Write the following content:
deb http://mirrors.aliyun.com/ubuntu/ bionic main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main multiverse restricted universe
Refresh the package index:
apt-get update
3. Install SSH and set up passwordless login:
apt-get install openssh-server
apt-get install ssh
/etc/init.d/ssh start
Set up passwordless login:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Verify with:
ssh localhost
Install pdsh and make it use ssh:
apt-get install pdsh
vi /etc/profile
Append at the end of the file:
export PDSH_RCMD_TYPE=ssh
source /etc/profile
4. Install the Java environment (jdk-8u281-linux-x64.tar.gz):
tar -xvf jdk-8u281-linux-x64.tar.gz -C /opt
mv /opt/jdk1.8.0_281/ /opt/jdk
Configure the environment variables with vim /etc/profile, appending the following at the end:
export JAVA_HOME=/opt/jdk
export JRE_HOME=/opt/jdk/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Make them take effect:
source /etc/profile
Verify:
java -version
5. Install Hadoop; see https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Download Hadoop:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
Extract it to /opt:
tar -xvf hadoop-3.2.2.tar.gz -C /opt
Append the following to the end of $HADOOP_HOME/etc/hadoop/hadoop-env.sh (here $HADOOP_HOME is /opt/hadoop-3.2.2):
export JAVA_HOME=/opt/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
(Per the linked guide, a pseudo-distributed setup also needs fs.defaultFS set in etc/hadoop/core-site.xml and dfs.replication in etc/hadoop/hdfs-site.xml, followed by bin/hdfs namenode -format, before HDFS will start cleanly.)
Start Hadoop:
/opt/hadoop-3.2.2/sbin/start-dfs.sh
6. Install Spark; see https://spark.apache.org/docs/latest/spark-standalone.html
Download Spark:
wget https://downloads.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
Extract it to /opt:
tar -xvf spark-3.2.1-bin-hadoop3.2.tgz -C /opt
At the end of the spark-config.sh file in the Spark installation's sbin directory, add the JAVA_HOME entry:
export JAVA_HOME=/opt/jdk
Start the standalone master and a worker (per the standalone guide; the master's web UI listens on port 8080):
/opt/spark-3.2.1-bin-hadoop3.2/sbin/start-master.sh
/opt/spark-3.2.1-bin-hadoop3.2/sbin/start-worker.sh spark://localhost:7077
To run an application on the Spark cluster, simply pass the master's spark://IP:PORT URL to the SparkContext constructor. To access Hadoop data from Spark, just use an hdfs:// URL (typically hdfs://<namenode>:9000/path, matching the fs.defaultFS configured for HDFS), as sketched below.
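To make those last two points concrete, here is a minimal PySpark sketch. It assumes the standalone master started above is reachable at spark://localhost:7077, that HDFS answers at hdfs://localhost:9000, and that the input path is only an illustrative placeholder; adjust all three to your own setup.

from pyspark.sql import SparkSession

# Pass the standalone master's spark://IP:PORT URL when building the session.
spark = (SparkSession.builder
         .master("spark://localhost:7077")   # assumed master address
         .appName("hdfs-smoke-test")
         .getOrCreate())
sc = spark.sparkContext

# Access Hadoop data through an hdfs:// URL (hypothetical path).
lines = sc.textFile("hdfs://localhost:9000/user/root/input.txt")
print(lines.count())

spark.stop()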
7. Install conda: https://www.anaconda.com/products/individual
Download the installer from https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh and run it directly:
bash Anaconda3-2021.11-Linux-x86_64.sh
Set a domestic pip mirror:
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
8. Install JupyterLab and PySpark:
pip install jupyterlab -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
pip install pyspark==3.2.1
Start JupyterLab:
jupyter-lab --allow-root --ip=0.0.0.0
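Once JupyterLab is up, a quick sanity check is to run a notebook cell like the one below. This is only a sketch: it uses local[*] so it works even without the standalone master, and the app name is arbitrary.

import pyspark
print(pyspark.__version__)   # expect 3.2.1, matching the pip install above

from pyspark.sql import SparkSession

# local[*] runs Spark inside the notebook process, independent of the cluster.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("notebook-check")
         .getOrCreate())
print(spark.range(100).count())   # expect 100
spark.stop()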
If you want to deploy it directly, download the image with:
docker pull wuchenlhy/jupyter_spark_hadoop:2.0
Start the container with:
docker run -itd --name jupyter_spark_hadoop -p 8888:8888 -p 8080:8080 -v ${The path to be mounted}:/root/data/ wuchenlhy/jupyter_spark_hadoop:2.0
JupyterLab is then available at http://ip:8888, and the Spark web UI at http://ip:8080.