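Hardware and software can still ship with defects, even when strict quality-control procedures are followed throughout design, testing, manufacturing, and inspection. The hardware and software bugs described below are ones frequently encountered in practice; they are offered as a reference so that you can recognize and handle them quickly when you run into them.
1. A hardware bug in the Cisco 7200 router I/O controller module can cause the following errors at system startup: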
Warning: monitor nvram area is corrupt ... using default values
environment checksum in NVRAM failed
C7200 platform with 262144 Kbytes of main memory
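The router then drops into ROMMON mode. The probability of hitting this problem is reportedly about 10^-19; the fix is to return the I/O controller module for repair.
2. Messages seen on the Cisco 7200, 7500, and GSR platforms: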
*Nov 30 00:00:40.771:WARNING: Enviro Monitor Reference Voltage was ZERO !
*Nov 30 00:00:41.771:WARNING: Enviro Monitor Reference Voltage was ZERO !
*Nov 30 00:00:42.771:WARNING: Enviro Monitor Reference Voltage was ZERO !
*Nov 30 00:00:43.771:WARNING: Enviro Monitor Reference Voltage was ZERO !
*Nov 30 00:00:44.771: %ENVM-0-SHUT: Environmental Monitor initiated shutdown
Buffered messages:
System Bootstrap, Version 12.2(4r)B2, RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 2002 by cisco Systems, Inc.
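These messages look like a power-supply fault, but they too are caused by a hardware bug. The router boots normally; once startup completes, the warning repeats and the router then powers itself off (even with both power supplies switched on), after which it can only be powered on again manually. The show env output is as follows: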
Router#sh env
All measured values are normal
Router#sh env last
I/O Cont Inlet previously measured at 24C/75F
I/O Cont Outlet previously measured at 24C/75F
NPE Inlet previously measured at 25C/77F
NPE Outlet previously measured at 25C/77F
+3.45 V is unmeasured
+5.15 V is unmeasured
+12.15 V is unmeasured
-11.95 V is unmeasured
last shutdown reason - critical voltage
Router#sh env all
Power Supplies:
Power Supply 1 is unmeasured.
Power Supply 2 is unmeasured.
Temperature readings:
I/O Cont Inlet measured at 24C/75F
I/O Cont Outlet measured at 25C/77F
NPE Inlet measured at 26C/78F
NPE Outlet measured at 26C/78F
Voltage readings:
+3.45 V is unmeasured
+5.15 V is unmeasured
+12.15 V is unmeasured
-11.95 V is unmeasured
Envm stats saved 0 time(s) since reload
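Fix: return the I/O Controller for repair.
3. On a Cisco Catalyst 6500 with a SUP720 supervisor engine running IOS 12.2.14, if NAT is configured, the following messages appear once the number of NAT translation entries exceeds roughly 6000: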
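The symptom in item 2 (every voltage and power-supply reading reported as "unmeasured", together with a last shutdown reason of "critical voltage") can be checked mechanically against saved show env output. The following is a minimal Python sketch, not a Cisco tool: the function names summarize_env and looks_like_io_controller_fault are hypothetical, the heuristic is an assumption drawn only from the output shown here, and it expects cleanly spaced lines such as "+3.45 V is unmeasured".

```python
import re

# Matches lines such as "+3.45 V is unmeasured" or "Power Supply 1 is unmeasured."
READING_RE = re.compile(r"^\s*(?P<name>.+?)\s+is\s+unmeasured\.?\s*$", re.IGNORECASE)
# Matches "last shutdown reason - critical voltage"
SHUTDOWN_RE = re.compile(r"last shutdown reason\s*-\s*(?P<reason>.+)$", re.IGNORECASE)


def summarize_env(text):
    """Collect unmeasured readings and the last shutdown reason (hypothetical helper)."""
    unmeasured = []
    shutdown_reason = None
    for line in text.splitlines():
        m = READING_RE.match(line)
        if m:
            unmeasured.append(m.group("name"))
            continue
        m = SHUTDOWN_RE.search(line)
        if m:
            shutdown_reason = m.group("reason").strip()
    return {"unmeasured": unmeasured, "shutdown_reason": shutdown_reason}


def looks_like_io_controller_fault(summary):
    """Assumed heuristic: unmeasured readings plus a critical-voltage shutdown."""
    return bool(summary["unmeasured"]) and summary["shutdown_reason"] == "critical voltage"


SAMPLE = """\
Power Supply 1 is unmeasured.
I/O Cont Inlet measured at 24C/75F
+3.45 V is unmeasured
last shutdown reason - critical voltage
"""

summary = summarize_env(SAMPLE)
print(summary, looks_like_io_controller_fault(summary))
```

A match only suggests that the router shows the same pattern as the case described here; it does not rule out a genuine power problem.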
*Mar 2 01:19:56.738: %SYS-3-CPUHOG: Task ran for 2048 msec (39/2), process = IP NAT Ager, PC = 4021C300.
-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44
*Mar 2 01:20:06.974: %SYS-3-CPUHOG: Task ran for 2172 msec (44/2), process = IP NAT Ager, PC = 4021C300.
-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44
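When a log contains many such lines, it can help to tally the CPUHOG messages per process to see which one is responsible. The sketch below is a minimal Python example under the assumption that the messages follow exactly the format shown above; tally_cpuhog is a hypothetical name, not part of any Cisco tooling.

```python
import re
from collections import defaultdict

# Format assumed from the two sample messages above.
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task ran for (?P<msec>\d+) msec \(\d+/\d+\), "
    r"process = (?P<process>.+?), PC = (?P<pc>[0-9A-Fa-f]+)"
)


def tally_cpuhog(log_text):
    """Count CPUHOG messages per process and track the longest run in msec."""
    stats = defaultdict(lambda: {"count": 0, "max_msec": 0})
    for m in CPUHOG_RE.finditer(log_text):
        entry = stats[m.group("process")]
        entry["count"] += 1
        entry["max_msec"] = max(entry["max_msec"], int(m.group("msec")))
    return dict(stats)


LOG = """\
%SYS-3-CPUHOG: Task ran for 2048 msec (39/2), process = IP NAT Ager, PC = 4021C300.
-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44
%SYS-3-CPUHOG: Task ran for 2172 msec (44/2), process = IP NAT Ager, PC = 4021C300.
-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44
"""

print(tally_cpuhog(LOG))
```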
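The system reboots at intervals and records crash information in bootflash. At this point, show process cpu shows the IP NAT Ager process consuming a large share of the CPU: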
------------------ show process cpu ------------------
CPU utilization for five seconds: 44%/8%; one minute: 43%; five minutes: 44%
PID Runtime(ms)  Invoked  uSecs   5Sec   1Min   5Min TTY Process
 72         200     6178     32  0.00%  0.00%  0.00%   0 Spanning Tree
 73           0        2      0  0.00%  0.00%  0.00%   0 Const MPLS RP pr
 74     1465884  7504888    195  3.18%  3.25%  3.65%   0 IP Input
 75        2480     6141    403  0.00%  0.00%  0.00%   0 CDP Protocol
 76           0        1      0  0.00%  0.00%  0.00%   0 PPPATM Session d
 77           0        2      0  0.00%  0.00%  0.00%   0 PASVC create VA
 78     7856368  4520305   1738 28.58% 30.23% 30.71%   0 IP NAT Ager
 79          32      393     81  0.00%  0.00%  0.00%   0 HWIF QoS Process
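This was later traced to a software bug, and a Cisco-internal one at that: it cannot be looked up even with a CCO account.
After an upgrade to 12.2.17 the problem went away: with the same number of NAT entries, the IP NAT Ager process used only 0.16% of the CPU.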
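To compare a process's CPU share before and after an upgrade like this, the per-process rows of show process cpu can be parsed from a saved capture. The following is a minimal Python sketch that assumes cleanly whitespace-separated columns in the order PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process; the names parse_process_cpu and busy_processes and the 20% threshold are hypothetical choices for illustration.

```python
def parse_process_cpu(text):
    """Parse per-process rows from saved 'show process cpu' output."""
    rows = []
    for line in text.splitlines():
        # Split into at most 9 fields so the process name keeps its spaces.
        parts = line.split(None, 8)
        if len(parts) < 9 or not parts[0].isdigit():
            continue  # skip headers, summary lines, and blank lines
        try:
            five_sec, one_min, five_min = (float(p.rstrip("%")) for p in parts[4:7])
        except ValueError:
            continue  # percentage columns are not numeric, so not a data row
        rows.append({
            "pid": int(parts[0]),
            "five_sec": five_sec,
            "one_min": one_min,
            "five_min": five_min,
            "process": parts[8].strip(),
        })
    return rows


def busy_processes(rows, threshold=20.0):
    """Return names of processes whose 5-minute CPU share meets the threshold."""
    return [r["process"] for r in rows if r["five_min"] >= threshold]


CAPTURE = """\
CPU utilization for five seconds: 44%/8%; one minute: 43%; five minutes: 44%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
74 1465884 7504888 195 3.18% 3.25% 3.65% 0 IP Input
78 7856368 4520305 1738 28.58% 30.23% 30.71% 0 IP NAT Ager
"""

print(busy_processes(parse_process_cpu(CAPTURE)))
```

Running the same check on a capture taken after the upgrade gives a direct before-and-after comparison for the IP NAT Ager row.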