We found an excessive number of frame errors in the ifconfig output and a lot of rx_crc_errors in the ethtool output.
What made things more interesting is that it was observed on two different servers connected to the same switch (which gave us a clue that there might be something wrong with the switch itself).
Layer 1 issues
Most people advise checking cables or hardware (NIC/switch), since rx_crc_errors indicates layer 1 issues. That may well be the cause if only one host is affected, but seeing the same issue on different hosts in the same subnet made the switch the prime suspect from the very beginning.
ifconfig and ethtool outputs
In the ifconfig output, the errors and frame counters were rising:
eth0 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
...
RX packets:277593775 errors:12013 dropped:0 overruns:0 frame:11763
I started monitoring this with:
# for i in `seq 1 100`; do ifconfig eth0 | grep frame; sleep 1; done
RX packets:277593775 errors:12128 dropped:0 overruns:0 frame:11877
RX packets:277593775 errors:12135 dropped:0 overruns:0 frame:11884
RX packets:277593775 errors:12143 dropped:0 overruns:0 frame:11892
(...)
In the ethtool -S eth0 output, rx_crc_errors was rising at the same rate.
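The polling loop above prints raw totals, so the growth has to be read off by eye. A minimal sketch that prints the per-interval increase instead, assuming the kernel exposes the counter as a plain number in a file such as /sys/class/net/eth0/statistics/rx_crc_errors (the function name watch_counter is made up for this example, and driver-specific ethtool -S counters may not map one-to-one to the sysfs ones):

```shell
# Print a counter's value and its growth since the previous sample.
# $1: path to a file holding a single number (assumed sysfs-style counter)
# $2: number of samples (default 10)
# $3: seconds between samples (default 1)
watch_counter() {
    file=$1; samples=${2:-10}; interval=${3:-1}
    prev=$(cat "$file") || return 1
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        cur=$(cat "$file") || return 1
        echo "$cur (+$((cur - prev)))"
        prev=$cur
        i=$((i + 1))
    done
}
```

For example: watch_counter /sys/class/net/eth0/statistics/rx_crc_errors 100 1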
Verify NIC settings
Run ethtool eth0 to see the current speed and duplex:
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
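When several servers need checking, it may help to reduce this output to just the negotiated speed and duplex. A small sketch, assuming the "Speed:" and "Duplex:" lines look like the ones above (link_mode is a hypothetical helper name). Keep in mind that this only shows the NIC's view of the link: in the case described here the host reported full duplex, so the result still has to be compared with what the switch port reports.

```shell
# Read `ethtool <iface>` output on stdin and print "<speed> <duplex>".
link_mode() {
    awk -F': *' '
        /^[[:space:]]*Speed:/  { speed = $2 }
        /^[[:space:]]*Duplex:/ { duplex = $2 }
        END { print speed, duplex }
    '
}
```

For example: ethtool eth0 | link_mode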
You may also check dmesg to find out if there were any changes for eth0:
# dmesg | grep eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
8021q: adding VLAN 0 to HW filter on device eth0
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
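To see only the link state changes, the dmesg output can be filtered further. This is a rough sketch that assumes the driver logs lines containing "Link is Up" or "Link is Down" as e1000 does above; other drivers may word the message differently, and link_events is a name invented for this example.

```shell
# Read kernel log lines on stdin and keep only link up/down events
# for the interface given as $1.
link_events() {
    iface=$1
    grep -E "$iface.*Link is (Up|Down)"
}
```

For example: dmesg | link_events eth0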
Check switch settings
Finally, I asked the network team to verify the switch setup.
It turned out that they had recently replaced the old switch with a new one that had slightly different settings. All the ports on the new switch were set to auto:auto (speed and duplex).
They found that the switch had somehow negotiated 100 Mbit half duplex instead of 100 Mbit full duplex on all of the servers' connections.
We fixed the issue by setting 100 Mbit full duplex on all the required ports on the switch.
How to reset ifconfig counters?
After this issue, a lot of errors remained logged on the interfaces. Unfortunately, these counters can be reset in only two ways:
- reload the NIC driver module (modprobe -r module; modprobe module)
- reboot the box
If you're not sure which module to unload, check the ethtool -i eth0 output:
# ethtool -i eth0
driver: e1000
version: 7.3.21-k4-3-NAPI
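The two steps can be combined. The sketch below only extracts the driver name and prints the reload commands rather than running them, because unloading the module takes the interface down; it is safer to run the printed commands from a console than over an SSH session on that same NIC. The helper names driver_name and reload_commands are made up for this example.

```shell
# Read `ethtool -i <iface>` output on stdin and print the driver name.
driver_name() {
    awk -F': *' '$1 == "driver" { print $2 }'
}

# Print (do not run) the commands that would reload the module $1.
reload_commands() {
    echo "modprobe -r $1 && modprobe $1"
}
```

For example: reload_commands "$(ethtool -i eth0 | driver_name)"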