
Setting up a Jupyter + Spark + Hadoop environment with Docker (detailed guide)

Date: 2023-05-29

If you just want the ready-made Docker image, jump straight to the end of this article.

The deployment steps are as follows:

1. Create the container:

docker run -itd --name spark -v /Users/lvhaiyang/workspace/docker/data:/root/data/ ubuntu:18.04

Enter the container:

docker exec -it -u root spark bash
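One note: JupyterLab (port 8888) and the Spark web UI (port 8080) are started inside this container in later steps. If you want to reach them from the host, it is simpler to publish the ports when the container is created; a minimal sketch (the -p flags are an addition here, not part of the original command):

# Same container as above, but with the JupyterLab (8888) and Spark web UI (8080) ports published to the host
docker run -itd --name spark \
    -p 8888:8888 -p 8080:8080 \
    -v /Users/lvhaiyang/workspace/docker/data:/root/data/ \
    ubuntu:18.04
docker exec -it -u root spark bash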

2. Switch to a domestic apt mirror

apt-get update
apt-get install vim
cp /etc/apt/sources.list /etc/apt/sources.list_bak
vi /etc/apt/sources.list

Replace the contents with:

deb http://mirrors.aliyun.com/ubuntu/ bionic main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main multiverse restricted universe

Refresh the package index:

apt-get update

3. Install SSH and set up passwordless login

apt-get install openssh-server
apt-get install ssh
/etc/init.d/ssh start

Set up passwordless login:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost

Install pdsh and make it use ssh:

apt-get install pdsh
vi /etc/profile and append at the end of the file: export PDSH_RCMD_TYPE=ssh
source /etc/profile
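Before moving on it is worth confirming that passwordless login and pdsh actually work; a quick check, assuming sshd was started as above:

# ssh should log in without prompting for a password
ssh -o StrictHostKeyChecking=no localhost 'echo ssh ok'
# pdsh should now go through ssh (PDSH_RCMD_TYPE=ssh) and reach localhost
pdsh -w localhost hostname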

4. Install the Java environment

Place jdk-8u281-linux-x64.tar.gz in the container (for example via the mounted /root/data directory), then:

tar -xvf jdk-8u281-linux-x64.tar.gz -C /opt
mv /opt/jdk1.8.0_281/ /opt/jdk

Configure the environment variables: vim /etc/profile and append the following lines at the end:

export JAVA_HOME=/opt/jdk
export JRE_HOME=/opt/jdk/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Apply the changes and verify:

source /etc/profile
java -version

5. Install Hadoop (see https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-common/SingleCluster.html)

Download Hadoop:

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

Extract it to /opt:

tar -xvf hadoop-3.2.2.tar.gz -C /opt

Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh (here /opt/hadoop-3.2.2/etc/hadoop/hadoop-env.sh):

export JAVA_HOME=/opt/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Start Hadoop:

/opt/hadoop-3.2.2/sbin/start-dfs.sh
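Note that start-dfs.sh by itself is not enough for a working single-node HDFS: the guide linked above also configures core-site.xml and hdfs-site.xml and formats the NameNode before the first start. A minimal sketch following that guide, using the paths from this article:

cd /opt/hadoop-3.2.2
# Point the default filesystem at a local HDFS instance (single-node setup)
cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Only one DataNode, so keep a single replica per block
cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
# Format HDFS once, then start the NameNode/DataNode daemons
bin/hdfs namenode -format
sbin/start-dfs.sh
# jps should now list NameNode, DataNode and SecondaryNameNode
jps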

6. Install Spark (see https://spark.apache.org/docs/latest/spark-standalone.html)

Download Spark:

wget https://downloads.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz

Extract it to /opt:

tar -xvf spark-3.2.1-bin-hadoop3.2.tgz -C /opt

In the Spark distribution, append the JAVA_HOME setting at the end of sbin/spark-config.sh:

export JAVA_HOME=/opt/jdk

To run an application on the Spark cluster, simply pass the master's spark://IP:PORT URL to the SparkContext constructor. To access Hadoop data from Spark, use an hdfs:// URL (usually hdfs://<namenode>:9000/path; you can find the correct URL on the Hadoop NameNode's web UI).

Start the Spark master:

/opt/spark-3.2.1-bin-hadoop3.2/sbin/start-master.sh
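To see the standalone cluster actually run something, you can attach a worker to the master and submit one of the bundled examples; a sketch, assuming the default master port 7077 and that the master advertises the container's hostname:

cd /opt/spark-3.2.1-bin-hadoop3.2
# With the master already started as above, attach one worker to it
sbin/start-worker.sh spark://$(hostname):7077
# Submit the bundled Pi example to the standalone master
bin/spark-submit --master spark://$(hostname):7077 examples/src/main/python/pi.py 10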

7. Install Conda

Download the installer from https://www.anaconda.com/products/individual, e.g. https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh, and run it directly:

bash Anaconda3-2021.11-Linux-x86_64.sh

Set a domestic pip mirror:

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

8. Install JupyterLab

pip install jupyterlab -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
pip install pyspark==3.2.1

Start it:

jupyter-lab --allow-root --ip=0.0.0.0
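A quick sanity check before using the notebooks, assuming Anaconda's python and pip are now first on PATH:

# The pip-installed PySpark should match the Spark 3.2.1 cluster installed above
python -c "import pyspark; print(pyspark.__version__)"
# PySpark still needs a JDK at runtime
java -version
# Start JupyterLab so it is reachable from outside the container
jupyter-lab --allow-root --ip=0.0.0.0 --no-browser

Inside a notebook, pass the master URL from step 6 (spark://<hostname>:7077) when building the SparkContext or SparkSession if you want jobs to run on the standalone cluster rather than in local mode.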

If you want to deploy directly instead, pull the prebuilt image:

docker pull wuchenlhy/jupyter_spark_hadoop:2.0

Start the image with:

docker run -itd --name jupyter_spark_hadoop -p 8888:8888 -p 8080:8080 -v ${the path to be mounted}:/root/data/ wuchenlhy/jupyter_spark_hadoop:2.0

JupyterLab is then available at http://ip:8888, and the Spark web UI at http://ip:8080.
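Once the container is up, a quick check from the host; a sketch that assumes JupyterLab is the container's foreground process and the default ports above:

# The container should be running with 8888 and 8080 published
docker ps --filter name=jupyter_spark_hadoop
# JupyterLab prints its access token in the container logs
docker logs jupyter_spark_hadoop 2>&1 | grep -i token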

  
