Our Problem:

We have an HP StoreVirtual / LeftHand OS multi-site SAN. Part of this SAN is its switching infrastructure, which is built out of four 2910al-24G switches. Each switch has a 10-gig add-on module providing two 10-gig ports.

Each site has two switches joined together via an LACP trunk. Between the sites we have two 10-gig fiber pairs linking them. Hanging off this switching core we have a blade center per site, as well as 6 x P4500 G2 StoreVirtual nodes per site. The blade centers have a pair of 10-gig uplinks (1 per switch per side) in an active/passive configuration. Each P4500 G2 node has a pair of 1-gig uplinks (1 per switch per side) in an ALB (adaptive load balancing) configuration. So our SAN network looks like this:

The configuration on these switches was set up for us by a vendor. At the time we were very new to the StoreVirtual world and needed the help. All was well for over a year! Then about a week ago we started to get pages and notifications that a switch was down. This was disconcerting, but you would jump on the switches and all seemed well. We never saw any traffic problems or anything wrong at all. This week we started to get multiple pages per night. We were not too happy. My colleague Josh put a call into HP support, and they noticed we have a spanning tree problem: which switch thinks it is the root bridge is flapping around.
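
For anyone chasing something similar on ProCurve gear, two read-only commands are enough to watch the election from each switch (a sketch; exact output varies by firmware revision):

show spanning-tree
show logging -r

The first reports which bridge this switch currently believes is the root; the second prints the event log newest-first, which makes the flapping easy to see in the topology change entries.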

Looking into this problem and others has illustrated what seems to be a misconfiguration on our switches. Switch1 in each location is configured. Switch2 in each location is auto-detecting its world and has no configuration set other than its local IP address.

So here we are! Part 1 of switch re-configuration. Let's see if we can get these switches configured optimally. Our part 1 strategy is simply to mitigate the spanning tree flapping. We do not know if this is indicating a hardware error or if the flapping is caused by every switch having the same default priority. We also discovered that no NTP was set, so the timestamps in the logs out of the switches are less than useful.

Part 1.A: Configuration changes on our next Tuesday maintenance window:

All switches will get:

config
timesync sntp
sntp unicast
sntp server priority 1 ***.***.***.1
show sntp
write mem
exit

Then each switch will get a spanning tree priority set. We are going to be as minimally disruptive as possible, so we will keep the switch that is presently (and most often) winning the election as the spanning tree root. On these switches the configured priority is a 0-15 value that is multiplied by 4096, and the lowest result wins the root election, so the intended root gets priority 1 and the others get 2, 3, and 4. Below, X is 1, 2, 3, or 4 depending on which switch it is.

config
spanning-tree clear-debug-counters
spanning-tree priority X
show spanning-tree
write mem
exit

Part 1.B: Firmware upgrade to latest on our next Saturday maintenance window:

We will be applying the latest firmware to each switch. This is also a bit interesting, so we will follow up with HP support to answer the question… which firmware should we apply?

Current stable: W.15.08.0012
Support provided: W.15.10.0010
Early Availability: W.15.12.0006


** Maintenance Announcement – DEV VM service interruption anticipated **

We are upgrading our 2910al-24g switches to the latest firmware. During the upgrade of the switch that provides the storage network for the DEV VMware cluster, that storage will be unavailable; as such, the DEV VMware cluster will also be shut down.

Production SANs are redundantly connected, and this maintenance will have no noticeable effect on them; the OSU systems used by students, staff, and faculty will not experience a service interruption.

Start: 04/20/2013 at 9:00 PM

End:  04/20/2013 at 11:59 PM

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

** Maintenance Announcement – No service interruption anticipated **

We will be applying a configuration change to our iSCSI switches that support our StoreVirtual SAN. This is the storage network that backs our VMware infrastructure.

We do not anticipate any service interruption: our switching is redundant, we will change only one switch at a time, and the changes should not be service interrupting.

Start: 04/16/2013 10:00 PM

End: 04/16/2013 11:00 PM

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

We will be working with NOC to add VLAN 414 to our blade centers' trunk. This will be done one trunk port at a time, and one data center at a time.

Start Time: 4/06/13 at 9:00 pm

End Time: 4/06/13 at 9:30 pm

If you have any questions or concerns about this maintenance, please contact OSU-SIG ( at ) oregonstate.edu or call 7-help

This guide contains instructions for enabling LDAP authentication in Zenoss Core 4.2+ on a relatively clean install of CentOS 6 (64-bit).

Assumptions

  • you are running CentOS 6
  • you have installed Zenoss Core 4.2+ using the autodeploy script

Before You Begin

It’s recommended that you back up your Zenoss configuration, either through a VM snapshot (if that’s an option) or via the backup tool (Advanced -> Backups). You may also want to back up your acl_users settings as follows:

  1. Go to https://YOUR_ZENOSS_SERVER/zport/manage and log in as admin.
  2. Click acl_users in the tree view on the left side of the page.
  3. Click Import/Export.
  4. Leave “Export object id” blank, select dumpfile location, then click Export.
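
As an aside, the backup tool mentioned above can also be driven from the shell as the zenoss user. A minimal sketch, assuming a stock autodeploy install where the zenbackup utility is on the zenoss user's path (by default it writes a timestamped archive under $ZENHOME/backups):

zenoss@zenprod:~$ zenbackup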

Install Required Auth Plugins

Download LDAPMultiPlugins, LDAPUserFolder, and python-ldap. The versions used as of the time of writing this guide are as follows:

  • LDAPMultiPlugins 1.14
  • LDAPUserFolder 2.24
  • python-ldap 2.4.10

Copy the downloaded tarballs to the Zenoss server.
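
How you copy them is up to you; scp works fine, for example (a sketch — the destination assumes the build directory used in the install transcript below):

$ scp Products.LDAPMultiPlugins-1.14.tar.gz \
      Products.LDAPUserFolder-2.24.tar.gz \
      python-ldap-2.4.10.tar.gz \
      root@YOUR_ZENOSS_SERVER:~/build/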

Next, install the prerequisite packages.

# yum install gcc python-devel openssl-devel openldap-devel

Then, use easy_install to install the three packages you downloaded above. (Note: You must use the easy_install tool if you installed Zenoss using the autodeploy script.)

# su - zenoss
zenoss@zenprod:~$ su
Password:
# cd ~/build
# easy_install Products.LDAPMultiPlugins-1.14.tar.gz
...
# easy_install Products.LDAPUserFolder-2.24.tar.gz
...
# easy_install python-ldap-2.4.10.tar.gz
...
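
Before restarting, a quick sanity check that python-ldap built correctly against the Python that Zenoss uses (run in the same shell as the installs above; the expected output is simply the version string):

# python -c "import ldap; print ldap.__version__"
2.4.10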

Restart Zope.

zenoss@zenprod:~$ zopectl restart

Configure the LDAP Multi Plugin

  1. Go to https://YOUR_ZENOSS_SERVER/zport/manage and log in as admin.
  2. Click acl_users in the tree view on the left side of the page.
  3. Select LDAP Multi Plugin from the dropdown list and click Add.
  4. Configure the plugin. (Note: your configuration may vary depending on what you want to do, e.g. whether or not you will be assigning roles based on LDAP groups.)

ID: <enter an ID>
Title: <enter a title>
LDAP Server: YOUR_LDAP_SERVER
check Use SSL if necessary
check Read-only
Login Name Attribute, User ID Attribute, RDN Attribute: UID (uid)
Users Base DN: YOUR_BASE_DN
select Groups not stored on LDAP server
Groups Base DN: <blank>
Manager DN: <blank>
User password encryption: SHA
Default User Roles: <blank>

  5. Click acl_users, then click the LDAP config you just created in the list.
  6. Check the boxes next to “Authentication”, “User_Enumeration”, and “Role_Enumeration”.

At this point, you should be able to log in to Zenoss using credentials from LDAP.
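
If the login does not work, it can help to confirm the server and Base DN from the Zenoss host before digging into the plugin itself. A sketch using ldapsearch (assumes the openldap-clients package is installed; someuser is a hypothetical test account, and use ldaps:// if your server requires SSL):

$ ldapsearch -x -H ldap://YOUR_LDAP_SERVER -b "YOUR_BASE_DN" "(uid=someuser)" uid

If that returns the expected entry, the network path and Base DN are good, and the problem is in the plugin configuration.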

Configure Authorization

To configure Zenoss role mappings from LDAP groups, please see this post: http://community.zenoss.org/message/30124#30124

Restricting Zenoss access to a specific subset of users

  1. Go to https://YOUR_ZENOSS_SERVER/zport/manage and log in as admin.
  2. Click acl_users in the tree view on the left side of the page.
  3. Click roleManager.
  4. Click Add a Role and enter “ZenNone” for the ID, then save.
  5. Click acl_users in the tree view on the left side of the page.
  6. Click your LDAP config.
  7. Select the Contents tab.
  8. Click acl_users in the list.
  9. Change Default User Roles to “ZenNone” and apply changes.
  10. Click acl_users in the tree view on the left side of the page.
  11. Click roleManager.
  12. Select the Security tab.
  13. Check all the checkboxes under Manager, Owner, and ZenManager. (IMPORTANT! If you do not do this step, you will lock your admin account out of the system!)
  14. Uncheck all the checkboxes under Acquire permission settings?
  15. Check the checkboxes for “Access contents information” and “View” under ZenUser.
  16. Click Save Changes.

When finished, users who are in LDAP are given restricted access (via the ZenNone role) by default, unless they have been granted a different Zenoss role. You can edit Zenoss role assignments via Zope manager -> acl_users -> roleManager.

Last night we upgraded from HP StoreVirtual LeftHand OS 10.0.00.1896 to 10.5.00.0149.

When it came time for the Central Management Console (CMC) to reboot each node, our Linux and Windows hosts noticed their respective gateway connections disappear. Each host retried once, got a new gateway connection from one of the remaining nodes in the cluster, and all was well. This manifested in the logs of the affected hosts as follows:

Windows host:

3/26/2013 11:59:54 PM – Error Event ID 20 – iScsiPrt – Connection to the target was lost. The initiator will attempt to retry the connection.
3/26/2013 11:59:55 PM – Error Event ID 1 – iScsiPrt – Initiator failed to connect to the target. Target IP address and TCP Port number are given in dump data.
3/26/2013 11:59:59 PM – Informational Event ID 34 – iScsiPrt – A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name.

Linux Host:

03/27 00:08:15 iscsid: connection3:0 is operational after recovery (1 attempts)
03/27 00:08:14 kernel: [22973221.827841] connection3:0: detected conn error (1020)
03/27 00:08:12 iscsid: Kernel reported iSCSI connection 3:0 error (1020) state (3)
03/27 00:08:12 kernel: [22973219.322744] connection3:0: detected conn error (1020)

Several of our hosts were unlucky and randomly received their new gateway connection from a node that had yet to reboot as part of the LeftHand OS update. Those hosts then went through a second event when that node's turn came, leading them to receive yet another gateway connection.
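
If you want to see which node a Linux initiator's gateway connection is currently riding on, the open-iscsi tools will show it (a sketch; requires the iscsiadm utility that ships with open-iscsi):

# iscsiadm -m session -P 1

The Current Portal line for each session is the portal the connection actually landed on, which makes it easy to spot hosts that reconnected to a not-yet-rebooted node.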

What is interesting is that our VMware ESXi 5.1 hosts did not notice their respective gateway connections drop or disappear throughout the reboots of each StoreVirtual cluster.

Throughout the entire LeftHand OS upgrade no customer-facing service was impacted and all hosts kept on serving.

On Thursday the 28th we will be working with NOC to try to resolve an issue that arose when the 15 subnet was placed behind the new firewall. Starting at 10pm, NOC will re-enable the other member of the VPC and then implement the fix. We will then verify that things are working as expected. The estimate is for about 10 minutes total. During this 10-minute window mail will not be sent, but will be queued on the webheads. Those that use proxy.oregonstate.edu will not be able to view the homepage or any other CWS-hosted sites.

Start: 3/28/2013 @ 2200

End: 3/28/2013 @ 2230

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

** Maintenance Announcement – No service interruption anticipated **

On Saturday the 30th at 10pm we will be upgrading the Netscalers to version 10. Since they are in HA mode, no outages or downtime are expected. In the unlikely event of problems, the changes will be rolled back and the maintenance will be rescheduled for a later date.

Start: 03/30/2013 2200

End: 03/30/2013 2259

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.

We will be working with NOC to add VLAN 1140 to and remove VLAN 3817 from our blade centers' trunk. This will be done one trunk port at a time, and one data center at a time.

Start Time: 3/23/13 at 9:00 pm

End Time: 3/23/13 at 9:30 pm

If you have any questions or concerns about this maintenance, please contact OSU-SIG ( at ) oregonstate.edu or call 7-help

** Maintenance Announcement – No service interruption anticipated **

We will be moving our Milne Blade Center from mccnet101 to Nexus gear. We will migrate non-redundant VMs to KAD b210. During the migration we expect to drop 1-2 packets when we cut over from the active fiber pair on mccnet101 to the Nexus gear. However, we do not expect a service interruption, as all critical VMs will be moved to our other data center first.

Start: 4/27/2013 10pm

End: 4/27/2013 10:30pm

If you have questions or concerns about this maintenance, please contact the Shared Infrastructure Group at osu-sig (at) oregonstate.edu or call 737-7SIG.