HPE Storage Users Group

A Storage Administrator Community




 Post subject: Path lost during 3parOS upgrade
Posted: Tue Dec 11, 2018 4:43 am

Joined: Sun Feb 12, 2017 2:06 pm
Posts: 9
Hi all,

We had an unpleasant experience during a firmware upgrade on our 20800: we lost FC paths during the node reboots.
HPE upgraded the OS to 3.3.1 MU3, and every machine logged in to the array lost paths during the activity.
Luckily, almost all the machines had a solid multipath and zoning configuration, so they only logged single-path failures as the nodes were rebooted in turn:

Nov  12 11:20:15 srv0001 multipathd: 129:80: mark as failed
Nov  12 11:20:15 srv0001 multipathd: remaining active paths: 3
Nov  12 11:20:15 srv0001 kernel: [86171.322400] sd 4:0:4:160: rejecting I/O to offline device
Nov  12 11:20:15 srv0001 kernel: [86171.322421] sd 4:0:4:160: [sdet] killing request
Nov  12 11:20:15 srv0001 kernel: [86171.322428] sd 4:0:4:160: rejecting I/O to offline device
Nov  12 11:20:15 srv0001 kernel: [86171.322463] sd 4:0:4:160: [sdet]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov  12 11:20:15 srv0001 kernel: [86171.322475] sd 4:0:4:160: [sdet] CDB: Read(10): 28 00 00 01 00 03 00 00 01 00
Nov  12 11:20:15 srv0001 kernel: [86171.322502] end_request: I/O error, dev sdet, sector 65539
Nov  12 11:20:15 srv0001 kernel: [86171.322540] device-mapper: multipath: Failing path 129:80.
Nov  12 11:20:15 srv0001 multipathd: sdet - tur checker reports path is down
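
For anyone wanting to check many hosts at once, here is a minimal sketch that pulls the failed paths and the lowest "remaining active paths" count out of syslog text like the excerpt above. The regular expressions only assume the message layout shown in these lines; other multipath-tools versions may word their messages differently, and the `summarize` helper is a made-up name for illustration.

```python
import re

# Lines copied from the excerpt above.
LOG = """\
Nov  12 11:20:15 srv0001 multipathd: 129:80: mark as failed
Nov  12 11:20:15 srv0001 multipathd: remaining active paths: 3
Nov  12 11:20:15 srv0001 kernel: [86171.322540] device-mapper: multipath: Failing path 129:80.
Nov  12 11:20:15 srv0001 multipathd: sdet - tur checker reports path is down
"""

# Assumed layout: "<timestamp> <host> multipathd: <major:minor>: mark as failed"
FAILED = re.compile(
    r"^(?P<ts>\w+\s+\d+ [\d:]+) (?P<host>\S+) multipathd: "
    r"(?P<dev>\d+:\d+): mark as failed"
)
REMAINING = re.compile(r"multipathd: remaining active paths: (?P<n>\d+)")


def summarize(log_text):
    """Return (list of (host, dev, timestamp) failures, lowest remaining-path count)."""
    failed = []
    lowest = None
    for line in log_text.splitlines():
        m = FAILED.search(line)
        if m:
            failed.append((m["host"], m["dev"], m["ts"]))
            continue
        m = REMAINING.search(line)
        if m:
            n = int(m["n"])
            lowest = n if lowest is None else min(lowest, n)
    return failed, lowest


failed, lowest = summarize(LOG)
print("failed paths:", failed)
print("lowest remaining active paths:", lowest)
```

A lowest count of 0 on any host would mean that host lost all paths at some point, which is the case worth chasing first.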


The HPE team had assured us that, with port persistence, we wouldn't see any issue on the servers themselves during the upgrade. Has anyone had similar issues?


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Tue Dec 11, 2018 7:11 am

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
I've never had that issue, but I haven't upgraded anything to 3.3.1 MU3 either...

Was it the first upgrade of the box? If so, the SAN switches need to have NPIV enabled for persistent ports to work. On Brocade it is enabled by default; on Cisco it is disabled by default (and it needs to be enabled before the port comes up).

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Tue Dec 11, 2018 7:57 pm
Site Admin

Joined: Tue Aug 18, 2009 10:35 pm
Posts: 1328
Location: Dallas, Texas
Spitballing some ideas:

You may need to validate that your 3PAR host ports are divided between SAN-A and SAN-B in line with port persistence best practice. Also, on AIX (and possibly other flavors of *NIX), the installation guide calls for enabling DynamicTracking and Fastfail on the HBA.

_________________
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Wed Dec 12, 2018 11:42 am

Joined: Sun Feb 12, 2017 2:06 pm
Posts: 9
MammaGutt wrote:
I've never had that issue, but I haven't upgraded anything to 3.3.1 MU3 either...

Was it the first upgrade of the box? If so, the SAN switches need to have NPIV enabled for persistent ports to work. On Brocade it is enabled by default; on Cisco it is disabled by default (and it needs to be enabled before the port comes up).



Yes, it was the first upgrade of this box. We have Brocade DCX switches so, as far as I've been able to assess, NPIV is enabled.

We are calling HPE support to analyze our logs; at first glance everything seems correct.


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Wed Dec 12, 2018 11:48 am

Joined: Sun Feb 12, 2017 2:06 pm
Posts: 9
Richard Siemers wrote:
Spitballing some ideas:

You may need to validate that your 3PAR host ports are divided between SAN-A and SAN-B in line with port persistence best practice. Also, on AIX (and possibly other flavors of *NIX), the installation guide calls for enabling DynamicTracking and Fastfail on the HBA.


Hi Richard, that was our first suspicion, but we checked and found the cabling, zoning and masking redundancy to be correct.
All machines reach the array via separate fabrics and land on different nodes, following HPE best practices.

We don't have enterprise machines, only Linux hosts.


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Wed Dec 12, 2018 11:59 am

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 392
Did you see a path reconnect in the logs too? If so, how much time passed between the two events?

Some of our Solaris hosts saw paths disappear and reappear during node reboots, but with less than a second between the events. You can't have two ports presenting the same WWN at once, so I'd expect a brief break between the rebooting node's ports going offline and the partner node's ports presenting those WWNs.

This was on a 3.3.1 MU2 to 3.3.1 MU3 upgrade.
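
The gap between the two events can be measured straight from the host's syslog. Below is a rough sketch that pairs each "mark as failed" line with the next "reinstated" line for the same major:minor device and reports the seconds in between. The "reinstated" line and its timestamp in the sample are made up for illustration (only the "mark as failed" line comes from this thread), the exact multipathd wording may vary by version, and `outage_seconds` is a hypothetical helper name. Syslog timestamps carry no year, so the year is passed in as an assumption.

```python
import re
from datetime import datetime

# Assumed layout: "<timestamp> <host> multipathd: <major:minor>: <event>"
EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ multipathd: "
    r"(?P<dev>\d+:\d+): (?P<what>mark as failed|reinstated)"
)


def outage_seconds(lines, year=2018):
    """Return [(dev, seconds)] for each failed -> reinstated pair, in log order."""
    down = {}   # dev -> time it was marked as failed
    gaps = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        stamp = " ".join(m["ts"].split())  # collapse the padded day field
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if m["what"] == "mark as failed":
            down[m["dev"]] = when
        elif m["dev"] in down:
            gaps.append((m["dev"], (when - down.pop(m["dev"])).total_seconds()))
    return gaps


sample = [
    "Nov  12 11:20:15 srv0001 multipathd: 129:80: mark as failed",
    # Hypothetical line: the real reinstate time was not posted in the thread.
    "Nov  12 11:21:27 srv0001 multipathd: 129:80: reinstated",
]
print(outage_seconds(sample))
```

With one-second syslog resolution, a working port persistence failover would be expected to show up as a gap of 0 or 1 second, while a gap of many seconds or minutes suggests the path only came back when the node itself returned.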


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Mon Dec 17, 2018 9:16 am

Joined: Sun Feb 12, 2017 2:06 pm
Posts: 9
ailean wrote:
Did you see a path reconnect in the logs too? If so, how much time passed between the two events?

Some of our Solaris hosts saw paths disappear and reappear during node reboots, but with less than a second between the events. You can't have two ports presenting the same WWN at once, so I'd expect a brief break between the rebooting node's ports going offline and the partner node's ports presenting those WWNs.

This was on a 3.3.1 MU2 to 3.3.1 MU3 upgrade.


Hi,
More than a minute passed between the path being lost and being reinstated. I reckon it was reinstated only because the node that rebooted came back online.
We are looking at our multipath timeout configuration.

TY


 Post subject: Re: Path lost during 3parOS upgrade
Posted: Mon Dec 17, 2018 9:30 am

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 392
master3117 wrote:
Hi,
More than a minute passed between the path being lost and being reinstated. I reckon it was reinstated only because the node that rebooted came back online.
We are looking at our multipath timeout configuration.

TY


I think our MU3 update took around 10 minutes for each node reboot. So when it's working you should see a sub-second blip as port persistence takes over the rebooting node's paths, then 10 minutes of node restart, then another sub-second blip as the paths are handed back to the node. Typically the engineer will then pause while you check hosts before starting the next reboot.

If it's around 60 seconds, that sounds like a host timeout value; these are often set to 30 or 60 seconds by default on a lot of hosts.
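
On Linux, one place to look for such timeouts is the FC transport class in sysfs, where each remote port exposes `fast_io_fail_tmo` and `dev_loss_tmo`. The sketch below just reads those values so they can be compared across hosts. It is only a starting point: whether either of them explains the gap seen here is an assumption, and multipath.conf, the HBA driver and the SCSI layer have timeouts of their own. The `read_rport_timeouts` name is made up for illustration.

```python
from pathlib import Path


def read_rport_timeouts(base="/sys/class/fc_remote_ports"):
    """Return {rport name: {attribute: value or None}} for every FC remote port."""
    result = {}
    for rport in sorted(Path(base).glob("rport-*")):
        values = {}
        for attr in ("port_state", "fast_io_fail_tmo", "dev_loss_tmo"):
            try:
                values[attr] = (rport / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel/driver.
                values[attr] = None
        result[rport.name] = values
    return result


if __name__ == "__main__":
    # Prints nothing on a host without FC remote ports.
    for name, values in read_rport_timeouts().items():
        print(name, values)
```

Running it on a host that behaved well and on one that did not, and comparing the output, is a cheap first check before digging into the multipath configuration.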

