HDFS Exception Notes

Background

I first set up Hadoop in plain (non-HA) mode and startup tests were fine. Then I switched to HA mode: startup still looked healthy, all processes were running, port 50070 reported normal status, one NameNode showed active and the other standby, and browsing http://sj-node1:50070/explorer.html#/ also worked. But every file operation from the command line failed with an error pointing at the IP 180.168.41.175. Searching for that IP, I learned it is the catch-all address my telecom ISP returns for unresolvable domain names (e.g. `ping sdfsdfasdf` resolves to it). I couldn't understand why the HDFS client was going out to the network at all.

Analysis

appcity is my custom nameservice (cluster) name; resolving it should never involve the network, so this had to be a configuration problem.
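For context, the client resolves a logical nameservice purely from per-nameservice configuration keys, where the nameservice name appears as a key suffix. A short sketch of that chain, using this cluster's own values from the configs below:

```xml
<!-- fs.defaultFS carries the logical nameservice name, not a hostname -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://appcity</value>
</property>
<!-- the same name must be declared as a nameservice... -->
<property>
  <name>dfs.nameservices</name>
  <value>appcity</value>
</property>
<!-- ...and every per-nameservice key must use it as the suffix -->
<property>
  <name>dfs.ha.namenodes.appcity</name>
  <value>nn1,nn2</value>
</property>
```

If any link in this chain uses a different suffix, the client cannot map the logical name to real NameNode addresses.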

  • core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
      <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
        <description>Indicates the number of retries a client will make to establish
        a server connection.
        </description>
      </property>
      <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>1000</value>
        <description>Indicates the number of milliseconds a client will wait for
        before retrying to establish a server connection.
        </description>
      </property>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://appcity</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://appcity</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/soft/hadoop-2.5.1/data</value>
      </property>
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>sj-node2:2181,sj-node3:2181,sj-node4:2181</value>
      </property>
    </configuration>

  • hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>appcity</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.appcity</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.appcity.nn1</name>
        <value>sj-node1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.appcity.nn2</name>
        <value>sj-node2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.appcity.nn1</name>
        <value>sj-node1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.appcity.nn2</name>
        <value>sj-node2:50070</value>
      </property>
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://sj-node2:8485;sj-node3:8485;sj-node4:8485/appcity</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa</value>
      </property>
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/soft/hadoop-2.5.1/data/jn</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    </configuration>

  • I went over the configuration files several times without finding anything wrong. Worse, the console reported no errors at startup; the error appeared only on file operations. I couldn't find this problem or a solution online either, and reformatting the cluster several times had no effect. Out of options, I went through the properties one by one and finally spotted the unreasonable part:

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

  • I had been staring at the values the whole time, but here the property name itself still contains `mycluster`. My nameservice is called `appcity`; `mycluster` must be left over from the official demo, and I forgot to change it. After replacing it with my nameservice name and restarting, file operations indeed worked again.
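As I understand it, the client-side failover proxy provider is looked up per nameservice, with the nameservice name as the key suffix. With the suffix `mycluster`, a client configured for `hdfs://appcity` finds no provider for `appcity`, stops treating it as a logical nameservice, and instead tries to resolve `appcity` as a hostname via DNS, which is exactly where the ISP's catch-all IP came from. The corrected property, matching this cluster's nameservice, would be:

```xml
<!-- the suffix must match dfs.nameservices (here: appcity) -->
<property>
  <name>dfs.client.failover.proxy.provider.appcity</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```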