I set up an HA (high availability) cluster using ZooKeeper. There are three hosts, node1, node2, and node3: node1 and node2 run NameNodes, node2 and node3 run DataNodes, and all three machines run ZooKeeper and a JournalNode. When I start the cluster with start-dfs.sh, the NameNodes and DataNodes come up, but the JournalNodes and the ZK Failover Controllers (zkfc) do not, and I get the following errors:
ERROR: Attempting to operate on hdfs journalnode as root
ERROR: but there is no HDFS_JOURNALNODE_USER defined. Aborting operation.
Starting ZK Failover Controllers on NN hosts [node1 node2]
ERROR: Attempting to operate on hdfs zkfc as root
ERROR: but there is no HDFS_ZKFC_USER defined. Aborting operation.
I searched around online for solutions and tried adding HDFS_JOURNALNODE_USER =root and HDFS_ZKFC_USER=root to the start-dfs.sh file, but startup still fails with the same errors. Could someone please help me figure out what's wrong?
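For reference (in case the exact formatting matters), here are the lines I added to start-dfs.sh, copied exactly as I typed them:

HDFS_JOURNALNODE_USER =root
HDFS_ZKFC_USER=root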
Here is a screenshot of the startup output, along with my hdfs-site.xml and core-site.xml:
-------------------hdfs-site.xml--------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
<!-- Map the logical (virtual) nameservice name to physical addresses -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- One NameNode on node1 and one on node2, named nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address for nn1 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:9000</value>
</property>
<!-- HTTP address for nn1 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<!-- RPC address for nn2 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:9000</value>
</property>
<!-- HTTP address for nn2 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<!-- Shared storage location for the NameNode's edits metadata, i.e. the JournalNode list.
URL format: qjournal://host1:port1;host2:port2;host3:port3/journalId
Using the nameservice as the journalId is recommended; the default port is 8485 -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<!-- Local directory where the JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/hadoop-3.2.0/ha/jn</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- How clients determine which NameNode is currently active -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Method used to fence the active NameNode during a failover -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- The sshfence mechanism requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
</configuration>
---------core-site.xml---------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Where metadata files are stored; they are loaded into memory when actually used -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/hadoop-3.2.0/ha</value>
</property>
<!-- Default filesystem URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- The ZooKeeper quorum -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
</configuration>
Those are my HA config files. I'd be grateful if someone could help me analyze the cause. Many thanks!
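P.S. One other variant I came across while searching puts the variables into etc/hadoop/hadoop-env.sh instead of start-dfs.sh, together with HDFS_NAMENODE_USER and HDFS_DATANODE_USER. This is just my rough, untested understanding of that suggestion:

# etc/hadoop/hadoop-env.sh (sketch of the suggestion I found online, not verified)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root

Does that sound like the right place, or should the lines really live in start-dfs.sh?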