This last Monday we were paged twice more for down events on one of our switches, once at 8am and once at 5pm. This did not impact service, but it is troubling because we want everything in our environment to be healthy all the time. For a description of the problem we are seeing, take a look at my earlier blog post and its follow-up.
I put in a support call to HP and referenced the older ticket and the repeat of the problem. Support requested a copy of the output of the command show tech all from each switch. I dumped the output and sent it off to the helpful support person. Later that day the support person called back and asked why we were on such a new version of the firmware! So I pointed out that it was their support who gave us the copy of the firmware and told us to run it. At the end of this support call HP came back with two changes. They would like us to add loop protect on the ports that feed our blade centers. They would also like us to reconfigure the switch2 in each site so that its trunk ports are statically defined instead of auto-detected/dynamic.
So our next maintenance window is this Saturday and we will perform the following changes:
All switches will get:
    config
    loop-protect mode port
    loop-protect a2
    write mem
    exit
Each switch2 will get (where ? is 1 for site1 and 2 for site2):
    config
    no interface <port list> lacp
    trunk <port list> trk? lacp
    write mem
    exit
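
Since the switch2 commands differ between sites only in the trunk group name, the substitution can be sketched as a small script. This is a minimal illustration and not something we actually ran: the function name switch2_trunk_commands is hypothetical, and the port list stays a placeholder to fill in per switch.

```python
def switch2_trunk_commands(site: int, port_list: str) -> list[str]:
    """Build the CLI lines that replace dynamic LACP with a static LACP trunk.

    site      -- 1 for site1, 2 for site2 (selects trk1 or trk2)
    port_list -- the switch's own port list, passed through unchanged
    """
    if site not in (1, 2):
        raise ValueError("site must be 1 or 2")
    return [
        "config",
        f"no interface {port_list} lacp",       # drop the dynamic LACP setting
        f"trunk {port_list} trk{site} lacp",    # define the static LACP trunk
        "write mem",
        "exit",
    ]


if __name__ == "__main__":
    for site in (1, 2):
        print(f"# switch2, site{site}")
        # "<port list>" is left as a placeholder on purpose.
        print("\n".join(switch2_trunk_commands(site, "<port list>")))
```

Printing the commands first and pasting them by hand keeps a human in the loop during the maintenance window, which seems the safer choice for a change to trunk ports.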