CentOS 7 Hadoop Single-Node Deployment


Preface

A quick record of deploying the Hadoop service on a single node.

Download and Install

Install JDK 1.8. Then download the CDH build of Hadoop, hadoop-2.6.0-cdh5.15.1.tar.gz, transfer it to the VM, and extract it:

```
tar -zxvf hadoop-2.6.0-cdh5.15.1.tar.gz -C /opt/
```

Environment variables:

```
# configure environment variables
vim /etc/profile
# HADOOP_HOME
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.15.1
export PATH=$PATH:$HADOOP_HOME/bin

source /etc/profile

# verify
hadoop version
Hadoop 2.6.0-cdh5.15.1
Subversion http://github.com/cloudera/hadoop -r 2d822203265a2827554b84cbb46c69b86ccca149
Compiled by jenkins on 2018-08-09T16:23Z
Compiled with protoc 2.5.0
From source with checksum 96bc735f7d923171f18968309fa3c477
This command was run using /opt/hadoop-2.6.0-cdh5.15.1/share/hadoop/common/hadoop-common-2.6.0-cdh5.15.1.jar
```
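The first step, installing JDK 1.8, is not spelled out above. Here is a minimal sketch using the JDK tarball (the archive name is an assumption; use whichever JDK 8 build you downloaded, as long as JAVA_HOME ends up matching the path used in the config files below):

```bash
# Extract the JDK under /usr/java so it lands at /usr/java/jdk1.8.0_261.
mkdir -p /usr/java
tar -zxvf jdk-8u261-linux-x64.tar.gz -C /usr/java/

# Point JAVA_HOME at it and verify.
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_261' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile
java -version   # should report 1.8.0_261
```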

Passwordless SSH Login

See the passwordless-login steps in the general configuration notes.
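That post is not reproduced here; for reference, a minimal sketch of what passwordless login to the local machine usually amounts to (assuming the root account used throughout this post):

```bash
# Generate an RSA key pair with no passphrase, then authorize it for
# logins to this same machine; ssh-copy-id appends the public key to
# ~/.ssh/authorized_keys and sets the permissions.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@localhost

# Verify: this should no longer prompt for a password.
ssh root@localhost exit
```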

Single-Node Pseudo-Distributed HDFS Configuration

Edit the HDFS config files under /opt/hadoop-2.6.0-cdh5.15.1/etc/hadoop.

hadoop-env.sh:

```
export JAVA_HOME=/usr/java/jdk1.8.0_261
```

core-site.xml (replace your-hostname with this machine's hostname, and create a tmp directory such as /opt/tmp to hold the temporary files first):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://your-hostname:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/tmp</value>
</property>
```

hdfs-site.xml:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```

slaves (again, the machine's hostname):

```
your-hostname
```

Start HDFS. The first time you bring it up, format the filesystem; do not run this more than once:

```
hdfs namenode -format
```

The output looks like this:

```
20/10/22 14:01:21 INFO common.Storage: Storage directory /opt/hadooptmp/dfs/name has been successfully formatted.
20/10/22 14:01:21 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadooptmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/10/22 14:01:21 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadooptmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds .
20/10/22 14:01:21 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/10/22 14:01:21 INFO util.ExitUtil: Exiting with status 0
20/10/22 14:01:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
```

Start the cluster:

```
./start-dfs.sh
```

Note the relationship between start/stop-dfs.sh and hadoop-daemons.sh:

```
start-dfs.sh =
  hadoop-daemons.sh start namenode
  hadoop-daemons.sh start datanode
  hadoop-daemons.sh start secondarynamenode
stop-dfs.sh = ....
```

Check with jps:

```
6978 NameNode
7125 DataNode
7305 SecondaryNameNode
```

During startup I ran into the following problem:

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

The fix:

```
vi /etc/ssh/ssh_config
# append the following two lines at the end:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
```
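Note that this disables host-key verification for every outgoing SSH connection from the machine, which is fine on a throwaway test VM but too broad elsewhere. A narrower alternative, sketched below, is to pre-accept the keys of just the hosts the start scripts connect to (the exact host list depends on your configuration):

```bash
# Append the host keys for the addresses start-dfs.sh SSHes into
# (0.0.0.0 is used for the secondarynamenode by default) so that
# the authenticity prompt never appears.
ssh-keyscan -H 0.0.0.0 localhost "$(hostname)" >> ~/.ssh/known_hosts
```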

If any daemon fails to come up during startup, go to /opt/hadoop-2.6.0-cdh5.15.1/logs and cat that daemon's .log file.

```
hadoop-root-datanode-node2.log    hadoop-root-secondarynamenode-node2.log
hadoop-root-datanode-node2.out    hadoop-root-secondarynamenode-node2.out
hadoop-root-namenode-node2.log    SecurityAuth-root.audit
hadoop-root-namenode-node2.out
```

At startup a warning appears:

```
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

The cause: libhadoop.so cannot be found on java.library.path. It is supposed to live under /opt/hadoop-2.6.0-cdh5.15.1/lib/native, but checking that directory shows it is empty. The fix:

```
# Download a precompiled native build and extract it.
# At http://dl.bintray.com/sequenceiq/sequenceiq-bin/ find the native build
# matching your Hadoop version and transfer it to the VM.

# extract into native
tar -xvf hadoop-native-64-2.7.0.tar -C /opt/hadoop-2.6.0-cdh5.15.1/lib/native
[root@master native]# ls
libhadoop.a       libhadoop.so        libhadooputils.a  libhdfs.so
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libhdfs.so.0.0.0

# with the native libs in place, configure the environment variable
vi /etc/profile
# add
export JAVA_LIBRARY_PATH=/opt/hadoop-2.6.0-cdh5.15.1/lib/native
source /etc/profile

# for the same warning from spark-shell, reference JAVA_LIBRARY_PATH in
# conf/spark-env.sh under the Spark install directory:
export LD_LIBRARY_PATH=$JAVA_LIBRARY_PATH
```

After this, the warning no longer appears when starting the services.
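To confirm that the native library is actually being picked up, Hadoop ships a built-in check:

```bash
# Reports which native libraries Hadoop can load; "hadoop: true"
# followed by the path to libhadoop.so indicates success.
hadoop checknative -a
```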

Finally, open Hadoop's web UI at http://<host-ip>:50070 (the default port).
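With HDFS up, a quick smoke test is to round-trip a file through the filesystem. A minimal sketch (the /tmp-test directory and the use of /etc/hosts as a sample file are arbitrary choices):

```bash
# Create a directory, upload a local file, list it, and read it back.
hdfs dfs -mkdir -p /tmp-test
hdfs dfs -put /etc/hosts /tmp-test/
hdfs dfs -ls /tmp-test
hdfs dfs -cat /tmp-test/hosts
```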

YARN Configuration

Edit the YARN config files under /opt/hadoop-2.6.0-cdh5.15.1/etc/hadoop.

yarn-env.sh:

```
export JAVA_HOME=/usr/java/jdk1.8.0_261
```

yarn-site.xml:

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- the address of YARN's ResourceManager -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>192.168.44.12</value>
</property>
```

Start YARN:

```
./start-yarn.sh
# jps shows
ResourceManager
NodeManager
```

Web UI: ip:8088.

YARN Architecture and Components

YARN consists of the Client, ResourceManager, NodeManager, and ApplicationMaster, in a master/slave arrangement (RM as master, NMs as slaves).

- Client: submits jobs to the RM, kills jobs, and so on.
- ApplicationMaster: one AM per application. The AM requests resources from the RM to launch the application's tasks on NMs, splits the input data, requests a resource (container) from the RM for each task, communicates with the NodeManagers, and monitors the tasks.
- NodeManager: many per cluster; they do the actual work. Each NM sends heartbeats and task status to the RM, launches tasks on requests coming from the RM, and handles commands from the AM.
- ResourceManager: only one serves the cluster at any given time; it owns everything resource-related. It handles client requests (submit, kill), launches and monitors AMs, and monitors NMs.
- Container: the execution abstraction for a task (memory, CPU, ...). Tasks run inside containers; a container can run an AM or a map/reduce task.

YARN Execution Flow

1. The Client submits a job to the RM.
2. The RM picks an NM and has it launch a container.
3. That NM runs the AM inside the container.
4. The AM registers with the RM and requests resources; the RM allocates resources to the AM.
5. The AM asks some set of NMs to launch containers.
6. The designated NMs run the tasks inside those containers.
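To see this whole flow in action, you can submit a small job to YARN. A minimal sketch, assuming mapred-site.xml already sets mapreduce.framework.name to yarn (not covered in this post) and that the examples jar sits at the usual CDH tarball path:

```bash
# Submit the bundled pi estimator to YARN: 2 map tasks, 100 samples each.
# The job should show up on the ResourceManager web UI at ip:8088.
hadoop jar /opt/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar pi 2 100
```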

This article is very comprehensive and worth consulting.
