3PAR Users Group

A Storage Administrator Community




 Post subject: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Thu Sep 07, 2017 1:02 am

Joined: Mon Aug 07, 2017 8:18 pm
Posts: 3
Hi all,

We had a frightening experience on Tuesday night with our brand new (2 weeks into production) 3PAR 8400 running 3.3.1 MU4: it crashed and then performed a cold reboot on all four nodes. The array then started integrity checking on the 25 TB of data, which was inaccessible for the entire 6-hour process.

HPE have told us that they haven't seen this before in 3.3.1 and have escalated it to their Engineering team to determine what went wrong and, hopefully for all of us, provide a fix to prevent it from happening again.

Logs:

Event ID: 918446 Node 0 Customer Alert - No, Service Alert - Yes
Severity: Critical
Event time: Tue Sep 05 19:46:14 2017
Event type: Configuration Lock Hold Time
Alert ID: null
Msg ID: e001c
Component: System Manager
Event string: lock hold seconds: 0, virtual volume lock count: 1, ioctl request count: 7, mcall active count: 12, mcall request waiting count: 0, mcall request blocked count: 0, mcalls (msec/name/pid): 984589336/MC_NEVER_RETURN/30015 338092/MCKV_OKV_QUERY/40619 550011/MCKV_OKV_QUERY/40535 154533/MCKV_OKV_QUERY/40578 520083/MCKV_OKV_QUERY/40623 429607/MCKV_OKV_QUERY/40625 161511/MCKV_OKV_QUERY/40644 373667/MCKV_OKV_QUERY/40645 245012/MCKV_OKV_QUERY/40651 64181/MCKV_OKV_QUERY/40744 53980/MCVL_REMOVE/40745 42161/MCVL_MAKE/40756.
Notification key: 0x00e001c

-->
Event ID: 919637 Node 0 Customer Alert - No, Service Alert - Yes
Severity: Critical
Event time: Tue Sep 05 20:05:01 2017
Event type: Process Event Handling Appears Unresponsive
Alert ID: null
Msg ID: 3f0003
Component: Node 0
Event string: sysmgr event handling appears to be unresponsive.
Notification key: 0x03f0003

-->
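For anyone trying to read the first event: the mcalls field packs each outstanding call as msec/name/pid. Below is a minimal Python sketch that splits that field out and sorts the calls by duration. It assumes the field order is exactly as printed in the "(msec/name/pid)" header, and it assumes msec means how long each call had been active when the event fired; the log itself does not document either. The names EVENT_STRING, MCall and parse_mcalls are made up for illustration.

```python
from typing import NamedTuple

# Event string copied verbatim from Event ID 918446 above.
EVENT_STRING = (
    "lock hold seconds: 0, virtual volume lock count: 1, ioctl request count: 7, "
    "mcall active count: 12, mcall request waiting count: 0, "
    "mcall request blocked count: 0, mcalls (msec/name/pid): "
    "984589336/MC_NEVER_RETURN/30015 338092/MCKV_OKV_QUERY/40619 "
    "550011/MCKV_OKV_QUERY/40535 154533/MCKV_OKV_QUERY/40578 "
    "520083/MCKV_OKV_QUERY/40623 429607/MCKV_OKV_QUERY/40625 "
    "161511/MCKV_OKV_QUERY/40644 373667/MCKV_OKV_QUERY/40645 "
    "245012/MCKV_OKV_QUERY/40651 64181/MCKV_OKV_QUERY/40744 "
    "53980/MCVL_REMOVE/40745 42161/MCVL_MAKE/40756."
)


class MCall(NamedTuple):
    msec: int  # ASSUMED: time the call has been outstanding, in milliseconds
    name: str
    pid: int


def parse_mcalls(event_string: str) -> list[MCall]:
    """Split the 'mcalls (msec/name/pid):' tail into records, longest first."""
    _, _, tail = event_string.partition("mcalls (msec/name/pid):")
    calls = []
    # Tokens are space-separated; the last one carries the sentence's full stop.
    for token in tail.strip().rstrip(".").split():
        msec, name, pid = token.split("/")
        calls.append(MCall(int(msec), name, int(pid)))
    return sorted(calls, key=lambda c: c.msec, reverse=True)


if __name__ == "__main__":
    for c in parse_mcalls(EVENT_STRING):
        minutes = c.msec / 60_000
        print(f"{c.name:<16} pid={c.pid:<6} {minutes:12.1f} min")
```

The parser finds 12 calls, which matches the "mcall active count: 12" in the same event. If msec really is elapsed time, the MC_NEVER_RETURN entry works out to about 984,589 seconds (roughly 11.4 days), while the MCKV_OKV_QUERY calls range from about 1 to 9 minutes.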

35 minutes later it completed an automatic cold reboot.
Just thought people should be informed about what is being seen in the field.
Still feeling very nervous.

The only answer I have back from them so far is as below:

We have reviewed the logs and could see a few IO control commands outstanding causing the array to not respond, the outcome of this is, the array wouldn’t respond to commands in CLI and also in SSMC resulting in crashing the node to access the array again.

Will let you know what is found, once I know myself.

Andrew.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Thu Sep 07, 2017 6:16 am

Joined: Tue Mar 29, 2016 9:32 pm
Posts: 32
morrie_morrie wrote:

HPE have told us that they haven't seen this before in 3.3.1 and have escalated it to their Engineering team to determine what went wrong and hopefully for all of us, provide a fix to prevent it from happening again..



That's a stock answer with 3PAR support, you know... we had that same answer for a full crash on 3.2.2 EMU2 code, and also for an issue with high latency. Everything is a surprise to level 1 support. Get to the level 4 guys and you get the real, honest, non-politically correct answers (which I much prefer!)

Why are you running 3.3.1 code, by the way? Our local HPE engineer says it isn't GA yet and is only installed for customers who want the latest bells and whistles (and to be guinea pigs). I know I'm not going there on the arrays I look after until it is GA + 6 months and the release notes stop mentioning "unexpected node restarts"... oh, and I'd like updatevv to work properly, and having dedupe and compression working would be good too ;-)


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Thu Sep 07, 2017 11:34 pm

Joined: Mon Aug 07, 2017 8:18 pm
Posts: 3
Hmm... this is even more concerning.

3.3.1 isn't GA?
I was told it went GA in February.

I'm assuming we're now stuck on 3.3.1.

Andrew.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Fri Sep 08, 2017 1:30 am

Joined: Wed May 07, 2014 10:29 am
Posts: 123
You are either running 3.2.2 MU4 or 3.3.1 something.

If it is 3.3.1, which version? That release is only at MU1.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Fri Sep 08, 2017 8:36 pm

Joined: Mon Aug 07, 2017 8:18 pm
Posts: 3
Yep, 3.3.1 MU1 with P02 and P04.
I'm just about to work with HPE to upgrade to P12.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Sat Sep 09, 2017 9:16 pm
Site Admin

Joined: Tue Aug 18, 2009 10:35 pm
Posts: 1135
Location: Ft Worth, Texas
Please keep us posted if this turns out to be a bug and not a localized incident.

_________________
Richard Siemers
Storage Admin, Pier 1 Imports
The opinions and positions expressed are my own and do not necessarily reflect those of Pier 1 Imports.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Wed Sep 13, 2017 10:58 am

Joined: Thu Jan 22, 2015 3:37 pm
Posts: 17
We just updated one of our 20840s to 3.3.1 MU1 P07. I asked about the P12 referenced above, but the remote engineer said he didn't know of a P12. Unfortunately I wasn't able to check the website for available patches because it has been down since last night.


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Thu Sep 14, 2017 5:21 am

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 173
mitchellm3 wrote:
We just updated one of our 20840's to 3.3.1 mu1 P07. I asked about the P12 referenced above but the remote engineer said he doesn't know of a P12. Unfortunately I was not able to look on the website on what patches are available because it has been down since last night.

Not seen a P12 yet but there is a P11 "Improves SSMC connectivity when LDAP is used".


 Post subject: Re: 3Par 3.3.1 - Crashed and completed Cold Reboot
Posted: Sun Sep 17, 2017 7:49 pm

Joined: Thu Jan 22, 2015 3:37 pm
Posts: 17
Found this reference to P12. I'm asking my HPE team for updates as this is a concern. We use VEEAM with storage snapshots.

https://forums.veeam.com/veeam-backup-r ... 45526.html

