NIC parameter tuning on Exadata (mainly that the IB MTU differs from the usual value)

Contact: QQ (5163721)

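Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.]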

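Comparing an ordinary host with Exadata, the main difference I found is in the MTU configuration, which is tied to the IB-based network environment on Exadata (InfiniBand cards, IB switches, and so on):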

1. On both db and cell nodes, the Ethernet (eth0) MTU is the default: 1500
2. On cell nodes, the IB MTU is 2044:
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1    --------------IB MTU on a cell node: 2044
3. On db nodes, the IB MTU is 7000:
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1    -------------  IB MTU on a db node: 7000

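MTU stands for Maximum Transmission Unit: the largest packet that can be sent over the network in one piece.
The MTU usually depends on the communication interface (network interface card, serial port, and so on). A bigger MTU is not always better: the larger the MTU, the longer it takes to transmit a single packet, and the more likely it is that some bit in the packet gets corrupted. The value therefore has to be tuned for the network in question.
The MTU is measured in bytes. In general, if the local MTU is larger than the gateway's MTU, large packets are split up for transmission, which produces many fragments, raises the packet loss rate, and slows the network down.
Setting the local MTU to a value less than or equal to the gateway's MTU reduces packet loss.
Ordinary machines default to an MTU of 1500, which follows from the design of the Ethernet frame.
Ethernet has traditionally sent and received data split into frames, and the standard fixes the maximum frame size at 1,518 bytes: a 14-byte header, up to 1,500 bytes of payload (the MTU), and a 4-byte frame check sequence.
As network hardware has advanced, 10 Gigabit Ethernet and similar equipment now support jumbo frames. The supported frame size varies by device, mostly in the range of roughly 9,000 to 16,000 bytes.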
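To make the fragmentation cost concrete, here is a rough sketch of the arithmetic in Python. It assumes plain IPv4 with a 20-byte header and no options; the function name and the 7000-byte example datagram are only illustrations, and this is not a model of how IB traffic on Exadata is actually carried.

```python
import math

IPV4_HEADER = 20  # bytes; assumption: no IP options


def ipv4_fragments(datagram_len: int, mtu: int) -> int:
    """Packets needed to carry one IPv4 datagram (header included) over a link MTU."""
    if datagram_len <= mtu:
        return 1  # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = datagram_len - IPV4_HEADER
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    # The three MTU values seen in this article
    for mtu in (1500, 2044, 7000):
        print(mtu, ipv4_fragments(7000, mtu))
```

Under these assumptions a 7000-byte datagram needs 5 packets at MTU 1500, 4 at MTU 2044, and a single packet at MTU 7000.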
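Changing the MTU itself is simple (although many people have caused plenty of problems by changing it incorrectly in RAC environments):
ifconfig eth0 mtu xxxx (the value you need), for example: ifconfig eth0 mtu 9000
After the change, check the result with netstat -i or ifconfig | grep MTU.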
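If you want to collect the MTU of every interface in one go instead of reading the grep output by eye, something like the following Python sketch could do it. It assumes the classic net-tools ifconfig format with "MTU:1500" (newer ifconfig versions print "mtu 1500" instead), and parse_mtus is just a name chosen for this example.

```python
import re


def parse_mtus(ifconfig_text: str) -> dict:
    """Map interface name -> MTU from classic net-tools `ifconfig` output."""
    mtus = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An interface block starts with its name followed by "Link encap:"
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
            continue
        found = re.search(r"\bMTU:(\d+)", line)
        if found and current is not None:
            mtus[current] = int(found.group(1))
    return mtus


sample = """\
eth0      Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1
lo        Link encap:Local Loopback
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
"""

if __name__ == "__main__":
    print(parse_mtus(sample))  # {'eth0': 1500, 'ib0': 7000, 'lo': 16436}
```

In practice the text would come from running ifconfig on the node; the sample here is trimmed from the db node output in this article.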
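Oracle currently supports an MTU larger than 1500 on the private network (interconnect). The actual value still depends on the environment, as discussed above, and bigger is not automatically better; without a well-thought-out design it is usually left unchanged.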
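RAC requires multicast (MULTICAST) to be enabled, as Oracle's official best practices state explicitly:

For Grid Infrastructure 11.2.0.2, the private network must support multicast traffic on the 230.0.1.0 subnet.
Patch 9974223 for Oracle Grid Infrastructure 11.2.0.2
(included in GI PSU 11.2.0.2.1 and later) also enables multicast on the 224.0.0.251 address on the private network.
With the patch applied, multicast must be allowed on the private network for at least one of these two addresses.
Document 1212703.1 provides further information and a procedure for testing multicast.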

Oracle also provides a detailed method for testing multicast:

# perl mcasttest.pl -n <node1>,<node2>,<node_n...> -i <interface1>,<interface2>,<interface_n...>
For details, see:
Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (Doc ID 1212703.1)

Output like the following shows a failed multicast test (here on 230.0.1.0, while 224.0.0.251 succeeds):

Test for Multicast address 230.0.1.0
Nov 8 09:05:33 | Multicast Failed for eth1 using address 230.0.1.0:42000
Nov 8 09:05:34 | Multicast Failed for eth2 using address 230.0.1.0:42001
Test for Multicast address 224.0.0.251
Nov 8 09:05:35 | Multicast Succeeded for eth1 using address 224.0.0.251:42002
Nov 8 09:05:36 | Multicast Succeeded for eth2 using address 224.0.0.251:42003
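
The result lines follow a regular pattern, so they can be summarized with a small script. The Python sketch below is only an illustration: the function names are made up for this example, the regular expression is derived from the sample output above rather than from any documented format, and "usable" reflects my reading that an address counts only if every tested interface succeeded on it.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample output above (an assumption, not a documented format)
LINE = re.compile(
    r"Multicast (Succeeded|Failed) for (\S+) using address (\d+\.\d+\.\d+\.\d+):\d+"
)


def summarize(log_text: str) -> dict:
    """Return {address: {interface: True/False}} from mcasttest.pl-style output."""
    results = defaultdict(dict)
    for line in log_text.splitlines():
        m = LINE.search(line)
        if m:
            status, iface, addr = m.groups()
            results[addr][iface] = (status == "Succeeded")
    return dict(results)


def usable_addresses(results: dict) -> list:
    """Addresses on which every tested interface succeeded."""
    return sorted(a for a, ifaces in results.items() if all(ifaces.values()))


log = """\
Test for Multicast address 230.0.1.0
Nov 8 09:05:33 | Multicast Failed for eth1 using address 230.0.1.0:42000
Nov 8 09:05:34 | Multicast Failed for eth2 using address 230.0.1.0:42001
Test for Multicast address 224.0.0.251
Nov 8 09:05:35 | Multicast Succeeded for eth1 using address 224.0.0.251:42002
Nov 8 09:05:36 | Multicast Succeeded for eth2 using address 224.0.0.251:42003
"""

if __name__ == "__main__":
    print(usable_addresses(summarize(log)))  # ['224.0.0.251']
```

Since only one of the two addresses has to work once the patch is applied, a non-empty list is the thing to look for.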

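There are a few points to note about the MTU in ordinary (non-Exadata) environments:
1. Normally the default of 1500 is used. Going above it requires special configuration; for details see:
CRS root.sh Script Failing on Second Node When MTU Larger than 1500 (Doc ID 1085885.1)
2. If the MTU differs between nodes, instances may fail to start; for details see:
Instances Unable To Start If MTU Size Is Different for Cluster_interconnect (Doc ID 300388.1)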
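For the second point, a quick consistency check across nodes can catch a mismatch before it bites. The Python sketch below assumes you have already collected the interconnect MTU of each node by some means; the function name and the node names other than dm01db01 are hypothetical.

```python
from collections import Counter


def mtu_mismatches(node_mtus: dict) -> dict:
    """Return the nodes whose interconnect MTU differs from the most common value."""
    if not node_mtus:
        return {}
    # Treat the most frequent MTU as the intended one (an assumption of this sketch)
    expected, _ = Counter(node_mtus.values()).most_common(1)[0]
    return {node: mtu for node, mtu in node_mtus.items() if mtu != expected}


if __name__ == "__main__":
    # Hypothetical cluster: the third node was left at the Ethernet default
    cluster = {"dm01db01": 7000, "dm01db02": 7000, "dm01db03": 1500}
    print(mtu_mismatches(cluster))  # {'dm01db03': 1500}
```

An empty result means all nodes agree; it says nothing about whether the agreed value is the right one for the network.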
Below is the ifconfig output from a db node, with annotations:

[root@dm01db01 ~]# ifconfig
bondeth0  Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C1    --------encap:Ethernet means the link type is Ethernet; HWaddr is the NIC's hardware (MAC) address
          inet addr:10.45.11.50  Bcast:10.45.11.255  Mask:255.255.252.0  ---inet addr is the bound IP address, Bcast is the broadcast address for that IP, Mask is the subnet mask
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1   ---UP (enabled), BROADCAST, MULTICAST, RUNNING. MTU is the maximum transmission unit; the Ethernet default is 1500
          RX packets:7610626 errors:0 dropped:1950540 overruns:0 frame:0   ---RX packets is the number of packets received, errors the number of packets with errors, dropped the number of packets dropped
          TX packets:3719744 errors:0 dropped:0 overruns:0 carrier:0  ---TX packets is the number of packets transmitted, errors the number of packets with errors, dropped the number of packets dropped
          collisions:0 txqueuelen:0            ----------txqueuelen is the upper limit of the transmit queue length. 0 presumably means unlimited; Ethernet uses 1000
          RX bytes:544097236 (518.8 MiB)  TX bytes:230272474 (219.6 MiB)   ----RX bytes is bytes received, TX bytes is bytes transmitted

bondeth0:1 Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C1  
          inet addr:10.45.11.51  Bcast:10.45.11.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bondeth0:2 Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C1  
          inet addr:10.45.11.54  Bcast:10.45.11.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

eth0      Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C0  
          inet addr:10.45.1.191  Bcast:10.45.1.255  Mask:255.255.255.0
          inet6 addr: fd0b:dddd:b7d:0:210:e0ff:fe68:b7c0/64 Scope:Global
          inet6 addr: fe80::210:e0ff:fe68:b7c0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:215294 errors:0 dropped:0 overruns:0 frame:0
          TX packets:103056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:39010029 (37.2 MiB)  TX bytes:15403532 (14.6 MiB)

eth1      Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C1  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5660114 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3719744 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:395715879 (377.3 MiB)  TX bytes:230272474 (219.6 MiB)

eth2      Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C1  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1950512 errors:0 dropped:1950510 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:148381357 (141.5 MiB)  TX bytes:0 (0.0 b)

eth3      Link encap:Ethernet  HWaddr 00:10:E0:68:B7:C3  
          inet addr:192.168.18.11  Bcast:192.168.19.255  Mask:255.255.254.0
          inet6 addr: fe80::210:e0ff:fe68:b7c3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:81 errors:0 dropped:0 overruns:0 frame:0
          TX packets:221 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12056 (11.7 KiB)  TX bytes:31357 (30.6 KiB)

Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00   -----encap:InfiniBand means the link type is InfiniBand; HWaddr is the hardware address (20 bytes on IB, so only the first 8 are shown correctly, as the warning above says)
          inet addr:192.168.10.1  Bcast:192.168.11.255  Mask:255.255.252.0  ---inet addr is the bound IP address, Bcast is the broadcast address for that IP, Mask is the subnet mask
          inet6 addr: fe80::210:e000:15a:e611/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1    ---UP (enabled), BROADCAST, MULTICAST, RUNNING. MTU is the maximum transmission unit; the IB MTU on a db node is 7000
          RX packets:843743 errors:0 dropped:0 overruns:0 frame:0  ---RX packets is the number of packets received, errors the number of packets with errors, dropped the number of packets dropped
          TX packets:113542 errors:0 dropped:0 overruns:0 carrier:0  ---TX packets is the number of packets transmitted, errors the number of packets with errors, dropped the number of packets dropped
          collisions:0 txqueuelen:1024             ----------txqueuelen is the upper limit of the transmit queue length. 0 presumably means unlimited; the IB interface uses 1024
          RX bytes:188246789 (179.5 MiB)  TX bytes:7920860 (7.5 MiB)   ----RX bytes is bytes received, TX bytes is bytes transmitted

Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0:1     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:169.254.102.112  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1

Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib1       Link encap:InfiniBand  HWaddr 80:00:00:49:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.10.2  Bcast:192.168.11.255  Mask:255.255.252.0
          inet6 addr: fe80::210:e000:15a:e612/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1
          RX packets:1141936 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1965335 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024 
          RX bytes:894917258 (853.4 MiB)  TX bytes:878056699 (837.3 MiB)

Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib1:1     Link encap:InfiniBand  HWaddr 80:00:00:49:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:169.254.237.6  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:7000  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2204010 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2204010 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:191767284 (182.8 MiB)  TX bytes:191767284 (182.8 MiB)

[root@dm01db01 ~]# 