Background
I first set up Hadoop in plain (non-HA) mode and it tested fine. After converting it to HA mode, startup also looked fine: all processes were up, port 50070 responded, one NameNode was active and the other standby, and browsing http://sj-node1:50070/explorer.html#/ displayed correctly. But every file operation from the command line failed, with the client trying to reach 180.168.41.175.
Searching for that IP revealed it is the address China Telecom's DNS returns for domains that fail to resolve (try `ping sdfsdfasdf` and you get the same IP). It was baffling why the client was going out to the network at all.
Analysis
appcity is a custom nameservice (cluster) name; resolving it should never involve the network, so this had to be a configuration problem.
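The mechanism can be illustrated with a simplified sketch (hypothetical, not Hadoop's actual source code): the DFS client only treats the authority in `hdfs://appcity` as a logical nameservice when a failover proxy provider is configured under a key suffixed with that exact name; otherwise the authority falls through to ordinary hostname resolution, where an ISP that hijacks NXDOMAIN answers returns its redirect IP.

```python
# Simplified sketch (hypothetical, NOT Hadoop source) of how the DFS client
# decides whether a URI authority is a logical nameservice or a plain hostname.

def resolve_authority(authority: str, conf: dict) -> str:
    # An HA nameservice is only recognized when a failover proxy provider
    # is configured under the key suffixed with that exact nameservice name.
    provider_key = f"dfs.client.failover.proxy.provider.{authority}"
    if provider_key in conf:
        return f"logical nameservice '{authority}': fail over between NameNodes"
    # Otherwise the client treats the authority as a real hostname and asks
    # DNS. An ISP that hijacks failed lookups (e.g. answering 180.168.41.175)
    # makes this failure mode especially confusing.
    return f"plain hostname '{authority}': DNS lookup"

# Misconfigured: the key suffix is 'mycluster', but the URI says 'appcity'.
bad_conf = {"dfs.client.failover.proxy.provider.mycluster":
            "ConfiguredFailoverProxyProvider"}
print(resolve_authority("appcity", bad_conf))   # falls through to DNS

# Fixed: the suffix matches the nameservice used in fs.defaultFS.
good_conf = {"dfs.client.failover.proxy.provider.appcity":
             "ConfiguredFailoverProxyProvider"}
print(resolve_authority("appcity", good_conf))  # recognized as HA nameservice
```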
- core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>ipc.client.connect.max.retries</name>
<value>100</value>
<description>Indicates the number of retries a client will make to establish
a server connection.
</description>
</property>
<property>
<name>ipc.client.connect.retry.interval</name>
<value>1000</value>
<description>Indicates the number of milliseconds a client will wait for
before retrying to establish a server connection.
</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://appcity</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://appcity</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/soft/hadoop-2.5.1/data</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>sj-node2:2181,sj-node3:2181,sj-node4:2181</value>
</property>
</configuration>

- hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.nameservices</name>
<value>appcity</value>
</property>
<property>
<name>dfs.ha.namenodes.appcity</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.appcity.nn1</name>
<value>sj-node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.appcity.nn2</name>
<value>sj-node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.appcity.nn1</name>
<value>sj-node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.appcity.nn2</name>
<value>sj-node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://sj-node2:8485;sj-node3:8485;sj-node4:8485/appcity</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/soft/hadoop-2.5.1/data/jn</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
Although I checked the configuration files several times, I could not spot anything wrong. Worst of all, startup logged no errors either; only file operations failed. I found nothing relevant online, and reformatting the cluster a few times did not help. In the end I had no choice but to go through the properties one by one, and finally found the suspicious spot:
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
I had been staring at the values all along, but the problem was in the name: it still contained `mycluster`, while my nameservice is called `appcity`. `mycluster` must be the name from the official demo, which I had forgotten to change. After replacing it with my own nameservice name and restarting, file operations worked again.