We made the changes requested of us in Part 2. However, we are still experiencing the occasional ping/fail on one of the switches. We have not seen a loop detected. It is interesting to have loop protection turned on, and it doesn’t hurt anything, but the blade centers and Flex-10s do not seem to be the problem.

So where do we go now? We put in another call to HP support, and they have directed us to perform the following steps:

  1. Re-seat all components in the switch that keeps paging us.
  2. Enable syslog on all switches and see if they say anything.
  3. Add a timezone offset to the ntp configuration.

Hopefully this will show us what is happening, as we still do not have a resolution. Or is it time to start replacing hardware? Is the SFP unit in the switch bad? Is the 10G module bad? Is the switch itself bad? We have lots of questions and no firm answers as to why we get woken up in the middle of the night when all seems fine except for a quick down/up event.
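Once syslog is flowing, the quick down/up events could be pulled out of the collected logs with something like the sketch below. This is only an illustration: the log line shape (a timestamp followed by "port X is now off-line" or "on-line"), the sample lines, the 60 second threshold, and the names `EVENT` and `find_flaps` are all assumptions, and real switch output may look different.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "<severity> MM/DD/YY HH:MM:SS ... port <id> is now on-line|off-line".
# Adjust the pattern to whatever the syslog server actually records.
EVENT = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*"
    r"port (?P<port>\S+) is now (?P<state>on-line|off-line)"
)


def find_flaps(lines, max_gap=timedelta(seconds=60)):
    """Return (port, down_time, up_time) for each down/up pair within max_gap."""
    last_down = {}  # port -> time it most recently went off-line
    flaps = []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue  # not a port state message
        ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
        port = m.group("port")
        if m.group("state") == "off-line":
            last_down[port] = ts
        elif port in last_down:
            down = last_down.pop(port)
            if ts - down <= max_gap:
                flaps.append((port, down, ts))
    return flaps


# Made-up sample lines, for illustration only.
sample = [
    "I 05/13/13 08:01:12 ports: port A2 is now off-line",
    "I 05/13/13 08:01:15 ports: port A2 is now on-line",
    "I 05/13/13 09:00:00 ports: port 5 is now off-line",
    "I 05/13/13 11:30:00 ports: port 5 is now on-line",
]

for port, down, up in find_flaps(sample):
    print(port, down, "down for", up - down)
```

Only the first pair counts as a flap here; the second port stayed down for hours, which would be a different kind of problem.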

Part 1 | Part 1 follow up | Part 2 | Part 3 | Part 4

** Maintenance Announcement – No service interruption anticipated **

We will be applying a configuration change to our iSCSI switches that support our StoreVirtual SAN. This is the storage network that backs our VMware infrastructure.

We do not anticipate any service interruption. Our switching is redundant, we will only change the switches one at a time, and the changes should not be service interrupting.

Start: 06/01/2013 10:00 PM

End: 06/01/2013 11:00 PM

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

On May 25th I will be applying VMware updates to the ESXi hosts in the DEV cluster. This is a rolling maintenance and will not create any outages for running guest VMs, as they will be live migrated off one blade at a time as each blade is put into maintenance mode and patched.

Regular remote access mechanisms like ssh or remote desktop to the VMs will be unaffected. All VMs and their services will continue to run as normal. There should be no customer impact.

Start: 05/25/2013 9:00 PM

End: 05/25/2013 11:30 PM

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

This last Monday we received two more pages for down events on one of our switches, one at 8am and one at 5pm. This did not impact service, but it is troubling, as we want everything in our environment to be healthy all the time. For a description of the problem we are seeing, take a look at my earlier blog post and its follow up.

I put in a support call to HP and referenced the older ticket and the recurrence of the problem. Support requested the output of the command show tech all from each switch. I dumped the output and sent it off to the helpful support person. Later that day the support person called back and asked why we were on such a new version of the firmware! So I pointed out that it was their support who gave us the copy of the firmware and told us to run it. At the end of this support call, HP came back with two changes. They would like us to add loop protect on the ports that feed our blade centers. They would also like us to reconfigure switch2 in each site so that its trunk ports are statically defined instead of auto-detected/dynamic.

So our next maintenance window is this Saturday and we will perform the following changes:

All switches will get:

config
loop-protect mode port
loop-protect a2
write mem
exit

Each switch2 will get (where ? is 1 for site1 and 2 for site2):

config
no interface <port list> lacp
trunk <port list> trk? lacp
write mem
exit


** Maintenance Announcement – No service interruption anticipated **

We will be applying a configuration change to our iSCSI switches that support our StoreVirtual SAN. This is the storage network that backs our VMware infrastructure.

We do not anticipate any service interruption. Our switching is redundant, we will only change the switches one at a time, and the changes should not be service interrupting.

Start: 05/18/2013 10:00 PM

End: 05/18/2013 11:00 PM

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

** Maintenance Announcement – NO service interruption anticipated **

The first change will be to deny non-secure SSL renegotiation, to address the vulnerability described in RFC 5746.  This will not have an impact on legitimate users, nor will they notice any changes.

The second change will be to replace the self-signed certificate used for logging into the management interface and to change the address (URL) used for management. This change will only impact those who log into the Netscaler to manage it. An email will be sent to those who will be affected.

The last change will be to implement a fix for an asymmetric routing issue we’ve been dealing with. This will not have a noticeable impact on existing services or users. We have to make this change to allow “cross talk” among services.

Start: 5/18/2013  2200

End: 5/18/2013 2300


If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

We will be working with NOC to add VLAN 418 to our blade centers’ trunk. This will be done one trunk port at a time, and one data center at a time.

Start Time: 5/11/13 at 9:00 pm

End Time: 5/11/13 at 9:30 pm

If you have any questions or concerns about this maintenance, please contact OSU-SIG ( at ) oregonstate.edu or call 7-help

** Maintenance Announcement – NO service interruption anticipated **

 

In conjunction with NOC, we will be adding VLAN 479 to the port channel of the Netscalers. This change will not result in any interruption of service, nor will it be noticeable to end users.

The standby device will have the VLAN added first and will then be made primary. It will then have the VLAN defined on it. Once the addition has been verified to have no negative effects, the VLAN will be added to the trunk port of the now-standby Netscaler. That device will then be made primary again and verified.

Start: 5/4/2013 2200

End: 5/4/2013 2300


If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

** Maintenance Announcement – NO service interruption anticipated **

On Tuesday April 30th at 2200 (10:00pm) we will be modifying the SSL cert for secure.oregonstate.edu.  This is not a cert renewal, it is a new cert that has a Subject Alternative Name (SAN) for oregonstate.edu.  End users will not be affected.
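For anyone who wants to confirm the new name after the change, a minimal Python sketch of a SAN check follows. It assumes a certificate dict shaped like the one returned by `ssl.SSLSocket.getpeercert()`; the helper names (`fetch_cert`, `san_dns_names`, `covers`) and the example dict are made up for illustration and are not the real certificate contents. It does exact name matching only and ignores wildcards.

```python
import socket
import ssl


def fetch_cert(host, port=443, timeout=5.0):
    """Fetch the server certificate as a dict (requires network access)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_dns_names(cert):
    """List the DNS entries in the cert's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def covers(cert, hostname):
    """True if hostname appears verbatim among the SAN DNS entries."""
    wanted = hostname.lower()
    return any(name.lower() == wanted for name in san_dns_names(cert))


# Hypothetical cert dict, for illustration only.
example_cert = {
    "subject": ((("commonName", "secure.oregonstate.edu"),),),
    "subjectAltName": (
        ("DNS", "secure.oregonstate.edu"),
        ("DNS", "oregonstate.edu"),
    ),
}

print(covers(example_cert, "oregonstate.edu"))
```

Against the live service, `covers(fetch_cert("secure.oregonstate.edu"), "oregonstate.edu")` would be the equivalent call.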

 

Start: 04/30/2013 at 2200

End:  04/30/2013 at 2300

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

Part1.A Post Config Follow up:

We applied the sNTP change last night and none of the switches will pull time. After a quick search on the internet we found this: 2910al-48G-can-not-get-time-from-W2K3-NTP-server

From this thread we learned that the firmware we are running (W.14.38) has a bug when talking to the management host, which also serves as the NTP server. So while the sNTP config we have now is good, the firmware is not. When we install the new firmware during this coming Saturday’s maintenance, we will hopefully fix sNTP and get good timestamps on logs!

The spanning tree configuration went as expected, and we saw the changes take effect as we made them. We have not had a recurrence of spanning tree flapping, but we have previously gone a day or two without an event, so we are still playing a wait-and-see game.

Part1.B Pre-firmware questions to HP Support:

I sent the following response in on our open ticket with HP:

HP Support recommended firmware version W.15.10.0010 (and gave us a copy)
I see the current HP stable version is W.15.08.0012
And there is an early release version of W.15.12.0006
(see 2910al firmware download) Is there a particular reason support sent us a version in the middle of these two? Which would be the best version to load on the switches?

HP support then replied back with the response:

Hello

This email is regarding the case 4************, for the 2910al also the version W.15.10.0010 is stable but havent been posted on the website, and earliest availability version W.15.12.0006 you are right seems to be a new one but I dont see it under the list of 2910al software release versions that’s why suggest to use the W.15.10.0010

If HP support recommends this version, considers it stable, and will support us running it, then that is what we will do. We just want to be running a current, supported configuration. So we will apply W.15.10.0010 during this Saturday’s maintenance window as planned.
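To make the "version in the middle" point concrete, here is a small Python sketch that orders the version strings mentioned above. It assumes the format is a letter prefix followed by dot-separated numeric fields and that comparing those fields numerically reflects release order; the function name `parse_version` is made up for the example. Ordering says nothing about which release is the more stable one.

```python
def parse_version(text):
    """Split 'W.15.10.0010' into ('W', 15, 10, 10) so versions compare numerically."""
    parts = text.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError("unexpected version format: %r" % text)
    return (parts[0],) + tuple(int(p) for p in parts[1:])


# Version strings taken from the posts above.
versions = ["W.15.10.0010", "W.15.08.0012", "W.15.12.0006", "W.14.38"]

for v in sorted(versions, key=parse_version):
    print(v)
```

Sorted this way, W.15.10.0010 lands between the posted stable release and the early release, which is what prompted the question to support.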
