
ResourceManager fails on startup: both ResourceManagers report this error; the processes come up, but neither RM becomes standby or active

Bounty: 20 points | [Solved] Resolved on 2017-03-14 22:35

Both ResourceManagers report the error below. The processes start, but neither RM transitions to standby or active.

2016-12-27 14:27:08,942 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file /etc/hadoop/conf.empty/fair-scheduler.xml
2016-12-27 14:27:08,944 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Bad element in allocations file: poolMaxJobsDefault
2016-12-27 14:27:08,944 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Bad element in allocations file: userMaxJobsDefault
2016-12-27 14:27:08,946 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn     OPERATION=transitionToActive    TARGET=RMHAProtocolService RESULT=FAILURE   DESCRIPTION=Exception transitioning to active   PERMISSIONS=All users are allowed
2016-12-27 14:27:08,946 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:124)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:291)
        at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
        ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Application with id application_1482728774024_0001 is already present! Cannot add a duplicate!
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:924)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:965)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:961)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:961)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:282)
        ... 5 more
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Application with id application_1482728774024_0001 is already present! Cannot add a duplicate!
        at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:310)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:427)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1126)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:501)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        ... 13 more
2016-12-27 14:27:08,947 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2016-12-27 14:27:08,951 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1593a905d6f13a0 closed

记忆残留 | Beginner Level 1 | Points: 168
Asked: 2016-12-27 15:00
Best Answer

Solved: I changed yarn.resourcemanager.recovery.enabled to false.

This property controls RM recovery: if jobs are running in YARN and the RM goes down, setting it to true makes the RM recover the unfinished applications when it restarts.

The log reported: Application with id application_1482728774024_0001 is already present! Cannot add a duplicate!

application_1482728774024_0001 no longer exists and cannot be recovered. I don't know why the application was deleted while its state was still kept in the state store. Very odd!
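
For reference, a minimal yarn-site.xml sketch of the change described above. Note the trade-off: with recovery disabled, applications that were running when the RM went down will not be restored after a restart.

<!-- yarn-site.xml: disable RM state recovery so a restarting RM no longer
     replays applications from the ZooKeeper state store -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>false</value>
</property>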

记忆残留 | Beginner Level 1 | Points: 168 | 2017-01-04 12:56
Other Answers (1)

Could the problem be in your scheduler configuration?

Bounty awarded: 20 points
让我发会呆 | Points: 2929 (Veteran Level 4) | 2016-12-27 15:09

Here is my yarn-site.xml configuration. Could you take a look?


<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
</property>

<property>
<name>yarn.log.aggregation-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data1/yarn/local,file:///data2/yarn/local,file:///data3/yarn/local</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data1/yarn/logs,file:///data2/yarn/logs,file:///data3/yarn/logs</value>
</property>

<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs:/var/log/hadoop-yarn/apps</value>
</property>

<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>34816</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>17408</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>34816</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>27852</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>


<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>myclusterrm</value>
</property>

<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>

<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>


<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node04</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node05</value>
</property>

<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>

<property>
<name>yarn.web-proxy.address</name>
<value>node04</value>
</property>

<property>
<name>yarn.resourcemanager.zk-acl</name>
<value>world:anyone:rwcda</value>
</property>

<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>node04:8032</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>node05:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>node04:8030</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>node05:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>node04:8031</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>node05:8031</value>
</property>
</configuration>

记忆残留 | Points: 168 (Beginner Level 1) | 2016-12-27 15:12

@记忆残留: Hehe, I'm not really familiar with Hadoop. I just noticed "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file /etc/hadoop/conf.empty/fair-scheduler.xml" followed by "2016-12-27 14:27:08,944 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Bad element in allocations file: poolMaxJobsDefault" and guessed that a configuration parameter might be wrong.
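
One possible explanation for those warnings: poolMaxJobsDefault and userMaxJobsDefault look like element names from the old MR1 fair scheduler, while the YARN FairScheduler allocation file uses queueMaxAppsDefault and userMaxAppsDefault instead. A minimal fair-scheduler.xml sketch with the newer element names (the queue name and the limits below are only placeholders):

<?xml version="1.0"?>
<allocations>
  <!-- replaces poolMaxJobsDefault: default cap on running apps per queue (placeholder value) -->
  <queueMaxAppsDefault>20</queueMaxAppsDefault>
  <!-- replaces userMaxJobsDefault: default cap on running apps per user (placeholder value) -->
  <userMaxAppsDefault>10</userMaxAppsDefault>
  <!-- placeholder queue definition -->
  <queue name="default">
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>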

让我发会呆 | Points: 2929 (Veteran Level 4) | 2016-12-27 15:41