There is no such thing as a perfect program, but that is no reason to be discouraged: writing programs is a continual pursuit of perfection.
One note before starting: exceptions may come up during the setup; solutions to these exceptions are collected on the author's WeChat public account.
docker-ssh Dockerfile
FROM centos:base
MAINTAINER hbw
RUN yum -y install passwd openssl openssh-server \
    && ssh-keygen -q -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key -N '' \
    && ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N '' \
    && ssh-keygen -q -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N '' \
    && sed -i "s/#UsePrivilegeSeparation.*/UsePrivilegeSeparation no/g" /etc/ssh/sshd_config \
    && sed -i "s/UsePAM.*/UsePAM no/g" /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
Build:
docker build -t centos-ssh-1:base .
docker-ssh-jdk: jdk-8u111-linux-x64.tar.gz, Dockerfile
FROM centos-ssh-1:base
ADD jdk-8u111-linux-x64.tar.gz /home
RUN mv /home/jdk1.8.0_111 /home/jdk8
ENV JAVA_HOME /home/jdk8
ENV PATH $JAVA_HOME/bin:$PATH
Build:
docker build -t centos-ssh-jdk:base .
docker-ssh-jdk-hadoop Dockerfile
FROM centos-ssh-jdk:base
ADD hadoop-3.2.1.tar.gz /home/
RUN mv /home/hadoop-3.2.1 /home/hadoop
ENV HADOOP_HOME /home/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
Build:
docker build -t docker-ssh-jdk-hadoop:base .
Next, create a container from the docker-ssh-jdk-hadoop image and enter it.
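A minimal sketch of creating and entering the container; the container name hadoop-single and the host port 2222 are placeholders, not part of the original setup:
# run detached; the image's CMD keeps sshd in the foreground
docker run -d --name hadoop-single -p 2222:22 docker-ssh-jdk-hadoop:base
# open a shell inside the container
docker exec -it hadoop-single /bin/bash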
Hadoop configuration (the files below live under /home/hadoop/etc/hadoop):
vi hadoop-env.sh
export JAVA_HOME=/home/jdk8
vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Startup:
Start ssh: passwordless login must be configured first (a minimal sketch is given right after these startup steps), then run:
/usr/sbin/sshd
Start hadoop: first format HDFS, then start all daemons:
bin: ./hdfs namenode -format
sbin: ./start-all.sh
Some exceptions may be reported at startup; see ex-docker-hadoop for how to resolve them.
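A minimal sketch of setting up passwordless SSH for root inside the container; the key type and options are assumptions, any key that ends up in authorized_keys will do:
# create a key without a passphrase and authorize it for local logins
mkdir -p ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
# once sshd is running, this should log in without a password prompt
ssh -o StrictHostKeyChecking=no localhost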
Run docker-local containers to build the cluster.
docker run --name hadoop-1 --hostname hnode-1 -d -P -p 9870:9870 -p 8088:8088 hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-2 --hostname hnode-2 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
docker run --name hadoop-3 --hostname hnode-3 -d -P hadoop-local:1.0.0 /bin/sh -c "while true; do echo hello world; sleep 1; done"
Then look up each container's IP (one way is sketched below) and add all three hostnames to /etc/hosts in every container.
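A minimal sketch for reading the container IPs, assuming the containers sit on Docker's default bridge network:
docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop-1
docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop-2
docker inspect -f '{{.NetworkSettings.IPAddress}}' hadoop-3
The resulting /etc/hosts entries then look like this: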
172.17.0.6  hnode-1
172.17.0.11 hnode-2
172.17.0.12 hnode-3
Then modify the following configuration on hadoop-1:
hadoop-env.sh
export JAVA_HOME=/home/jdk8
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hnode-1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hnode-1</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
Then list the worker nodes in hadoop/etc/hadoop/workers (Hadoop 3.x; the file was named slaves in Hadoop 2.x):
hnode-2
hnode-3
Then copy the configured hadoop directory to the other two containers (make sure sshd is running in each of them first):
scp -rq /home/hadoop hnode-2:/home
scp -rq /home/hadoop hnode-3:/home
Then format HDFS and start all the Hadoop services:
hdfs namenode -format
./start-all.sh
Check the running processes. On the master node, jps shows:
3344 NodeManager
2726 SecondaryNameNode
3014 ResourceManager
2503 DataNode
4967 Jps
2329 NameNode
On the worker nodes, jps shows:
2936 NodeManager
2509 SecondaryNameNode
2286 DataNode
4255 Jps
Check the basic state of the hadoop cluster:
hdfs fsck /
The report shows that the number of data nodes is 3.
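If needed, the live DataNodes can also be cross-checked with the standard dfsadmin report (the grep pattern assumes the usual "Live datanodes" summary line in the report output):
hdfs dfsadmin -report | grep "Live datanodes"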
Run a test job from /home/hadoop/share/hadoop/mapreduce:
hadoop jar hadoop-mapreduce-examples-3.2.1.jar pi 3 5
wordcount test, also from /home/hadoop/share/hadoop/mapreduce:
vi test.txt
hello hello good good good
hdfs dfs -put test.txt /input/test.txt
hdfs dfs -ls /input
hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount /input/test.txt /output
hdfs dfs -ls /output
hdfs dfs -text /output/part-r-00000
Output:
good 5
hello 3
name 2
Then commit the container as an image:
docker commit hadoop-1 hadoop:node
Then save the image to a file:
docker save -o hadoop-node.tar hadoop:node
For exception handling during the whole process, see the author's WeChat public account.
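A closing note: the archive produced by docker save can be restored on another host with docker load, for example:
docker load -i hadoop-node.tar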