HPE Storage Users Group

A Storage Administrator Community




 Post subject: Re: node rescue fails
Posted: Wed Oct 18, 2017 9:32 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Well I have no idea where our purchasing department sourced the replacement from, but it definitely wouldn't have been eBay. It appears to have been "refurbished" but apparently not enough.

I have checked our original serial number against the public warranty checker and it shows as expired - the dates correspond to when it would have been sold to its original owner. It was a proper factory refurb when it was supplied to us, but the records weren't updated when that happened (which is consistent with all the other HP/E hardware we acquired in the same way).


 Post subject: Re: node rescue fails
Posted: Thu Oct 19, 2017 2:56 am

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
The HP/HPE system serial number of the node you are booting is: CZ34029001
The 3PAR system serial number of your system is: 1610528

On all systems I've seen, the last few digits of these two serial numbers match, so my guess is that you have the issue others have mentioned: the node you got isn't a "clean" spare part but simply a node pulled from another working system.

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: node rescue fails
Posted: Thu Oct 19, 2017 9:53 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Hmmmm... I'm not seeing where the HP/E serial of our good node matches the 3PAR serial either. However, looking at one of our other 3PARs, both nodes do indeed have matching HP/E serials and they match the 3PAR serial. Weird.


 Post subject: Re: node rescue fails
Posted: Fri Oct 20, 2017 3:19 am

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
Okay....

Does the serial of the other node in your Frankenstein3PAR match the serial of the replacement?


[ 38.528755] Assembly Serial Number: PCMBUA8TM5M0ZV <--- This should be unique per node
[ 38.538626] Assembly Part Number: QR483-63001 <--- This needs to be the same for the entire cluster
[ 38.548148] Saleable Serial Number: CZ34029001 <--- This needs to be the same for the entire cluster
[ 38.557323] Saleable Product Number: QR483A <--- This needs to be the same for the entire cluster
[ 38.567360] Spare Part Number: 683246-001 <--- This needs to be the same for the entire cluster



 Post subject: Re: node rescue fails
Posted: Thu Oct 26, 2017 5:08 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
From the good node:

Assembly Serial Number: PCMBUA3TM3K03U <--- This should be unique per node. It is.
Assembly Part Number: QR483-63001 <--- This needs to be the same for the entire cluster. It is.
Saleable Serial Number: 2MD25201SL <--- This needs to be the same for the entire cluster. It is not.
Saleable Product Number: QR483A <--- This needs to be the same for the entire cluster. It is.
Spare Part Number: 683246-001 <--- This needs to be the same for the entire cluster. It is.

So apparently it's the Saleable Serial Number that needs changing or clearing. I understand there is a way to do this from the whack prompt, but I'm reluctant to try it without some documentation and/or guidance.
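
The field-by-field comparison above can also be scripted. What follows is only a minimal sketch in Python: the unique/shared rules are taken from the annotations quoted in this thread, while the function names (parse_identity, find_mismatches) and the regular expression are illustrative assumptions, not part of any HPE tooling.

```python
import re

# Optional kernel-style timestamp, then "Field Name: value".
FIELD_RE = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?([A-Za-z ]+?):\s*(\S+)")

# Rules as annotated in this thread:
# "unique" = must differ between nodes, "shared" = must match cluster-wide.
RULES = {
    "Assembly Serial Number": "unique",
    "Assembly Part Number": "shared",
    "Saleable Serial Number": "shared",
    "Saleable Product Number": "shared",
    "Spare Part Number": "shared",
}

REPLACEMENT_LOG = """\
[ 38.528755] Assembly Serial Number: PCMBUA8TM5M0ZV
[ 38.538626] Assembly Part Number: QR483-63001
[ 38.548148] Saleable Serial Number: CZ34029001
[ 38.557323] Saleable Product Number: QR483A
[ 38.567360] Spare Part Number: 683246-001
"""

GOOD_LOG = """\
Assembly Serial Number: PCMBUA3TM3K03U
Assembly Part Number: QR483-63001
Saleable Serial Number: 2MD25201SL
Saleable Product Number: QR483A
Spare Part Number: 683246-001
"""


def parse_identity(log_text):
    """Collect the identity fields named in RULES from boot-log text."""
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match and match.group(1) in RULES:
            fields[match.group(1)] = match.group(2)
    return fields


def find_mismatches(good, replacement):
    """Return the field names that violate the unique/shared rules."""
    problems = []
    for name, rule in RULES.items():
        same = good.get(name) == replacement.get(name)
        if (rule == "shared" and not same) or (rule == "unique" and same):
            problems.append(name)
    return problems


if __name__ == "__main__":
    print(find_mismatches(parse_identity(GOOD_LOG), parse_identity(REPLACEMENT_LOG)))
```

With the two sets of values posted in this thread, only the Saleable Serial Number is flagged.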

In the meantime we are chasing our purchasing department to see where they got the replacement from. We did not request or authorize an unrefurbished part so hopefully they can be convinced to source a clean one from somewhere.

I am on a three week tour of sites and facilities so I may be slow updating this thread, but I appreciate all the help.


 Post subject: Re: node rescue fails
Posted: Tue Oct 31, 2017 9:27 pm

Joined: Mon Jul 29, 2013 9:01 pm
Posts: 62
We had a similar fault on our 7400, but it was under warranty.

The HPE tech bought a switch and hard-cabled his laptop and node0 (the good working node) to node1, and then did the rescue, as he stated that some network switches etc. have issues.

It was just a cheap switch from a tech shop, but it allowed him to set speed and duplex.

Worked very well.


 Post subject: Re: node rescue fails
Posted: Wed Nov 22, 2017 8:39 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Just a quick little update on this. We have received advice from the seller of the replacement node that running Node Rescue from the SP will reset the serial number, whereas node-to-node rescue doesn't. However, only a hardware SP can perform a rescue because it requires a serial connection to the node as well as network - or is there a way to do it from the virtual SP?

Another line of enquiry has yielded this document:
https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0111944 which addresses a different issue but contains a procedure for initiating node rescue manually, and it includes the set perm sys_serial command. It makes no mention of requiring a serial port connection so I'm wondering whether it could work with a virtual SP.

Has anyone tried a node rescue from a virtual SP?


 Post subject: Re: node rescue fails
Posted: Tue Nov 28, 2017 11:40 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Further information: Entering whack and running set perm sys_serial=2MD25201SL rejects the alphanumeric serial. It will accept set perm sys_serial=1610528 but it already had this number and still won't join the cluster. Watching the serial output of the new node as noderescue runs you can see the good node running set perm sys_serial=1610528 on it as part of the automated process. Noderescue still fails in the same way - this doesn't seem to be what we need.

So I took a stab in the dark and tried set perm saleable_serial=2MD25201SL and it seemed to run and accept the value, however later in the boot process the new node reports its Saleable Serial as CZ34029001 and it still won't join the cluster so that isn't what we need either. I wish there was a way to read the value of these parameters from within whack rather than setting them blindly, but I don't know enough about the syntax to take a guess.


 Post subject: Re: node rescue fails
Posted: Thu Dec 07, 2017 11:02 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Minor Progress!

I have succeeded in changing the saleable serial number of the replacement node to match the existing good node! The whack command prom hp displays fields from the EEPROM containing identifying information, and the command prom hp edit lets you edit them line-by-line. This is where some information from an internal support document I read about two years ago came back to me. (Oh, if only I still had access to it now...) On boot the system verifies the integrity of the EEPROM data with a checksum and will halt if it doesn't match up, so I believe any time you make a change to the EEPROM you need to follow it with prom checksum to update it. I did this and it resulted in an immediate fatal error; however, after a hard reset the system booted and reported the correct saleable serial in the process.
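
To illustrate why an edit has to be followed by a checksum update, here is a small Python sketch. It is purely conceptual: the checksum algorithm and the EEPROM layout used by the 3PAR PROM are not stated anywhere in this thread, so the sum-of-bytes-modulo-256 function and the "payload plus one trailing checksum byte" record below are stand-in assumptions, and all function names are hypothetical.

```python
def checksum(data: bytes) -> int:
    """Stand-in checksum: sum of all bytes modulo 256 (NOT the real PROM algorithm)."""
    return sum(data) % 256


def seal(payload: bytes) -> bytes:
    """Append the checksum byte, as a hypothetical 'update the checksum' step would."""
    return payload + bytes([checksum(payload)])


def verify(record: bytes) -> bool:
    """Boot-time style check: the last byte must equal the checksum of the rest."""
    return checksum(record[:-1]) == record[-1]


if __name__ == "__main__":
    original = seal(b"CZ34029001")
    print(verify(original))  # the stored checksum matches the stored data

    # Edit the field but keep the old checksum byte: verification now fails.
    edited_stale = b"2MD25201SL" + original[-1:]
    print(verify(edited_stale))

    # Recompute the checksum after the edit: verification passes again.
    print(verify(seal(b"2MD25201SL")))
```

The point is only the ordering: change the data first, then recompute the stored checksum, otherwise the integrity check at boot sees a mismatch.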

The node still fails to join the cluster after a rescue. Nothing has changed in that respect. However what I'm counting as a tiny bit of progress is this:
Code:
cli% shownode
                                                               Control    Data        Cache
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1610528-0 OK      Yes    Yes       Off          GreenBlnk    8192    8192          100
   1           Failed  No     No        Unknown      Unknown         0       0            0

It now recognizes that node 1 exists! Previously only node0 was listed.
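
For anyone who wants to check this from a script rather than by eye, here is a rough Python sketch that reads the shownode table above. It assumes the column layout exactly as pasted in this post (columns located by the position of each header word, which copes with the blank Name field for node 1); the function names are illustrative and the sketch does not call the 3PAR CLI itself.

```python
import re

SHOWNODE_OUTPUT = """\
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1610528-0 OK      Yes    Yes       Off          GreenBlnk    8192    8192          100
   1           Failed  No     No        Unknown      Unknown         0       0            0
"""


def parse_shownode(text):
    """Parse the fixed-width table using header word positions as column starts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("Node"))
    header = lines[header_idx]
    # (column name without the dash padding, start offset in the line)
    cols = [(m.group().strip("-"), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[header_idx + 1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else len(line)
            # Keep the first occurrence of duplicate header names such as Mem(MB).
            row.setdefault(name, line[start:end].strip())
        rows.append(row)
    return rows


def nodes_not_in_cluster(rows):
    """Return the node numbers whose InCluster column is not 'Yes'."""
    return [row["Node"] for row in rows if row["InCluster"] != "Yes"]


if __name__ == "__main__":
    print(nodes_not_in_cluster(parse_shownode(SHOWNODE_OUTPUT)))
```

Slicing by header position rather than splitting on whitespace matters here, because a plain split would shift every column left for the row whose Name is empty.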


Last edited by cheese2 on Tue Dec 12, 2017 8:57 am, edited 1 time in total.

 Post subject: Re: node rescue fails
Posted: Fri Dec 08, 2017 10:39 am

Joined: Thu Oct 05, 2017 5:24 am
Posts: 13
Success!

Analysis of the serial port boot logs of the replacement node revealed the following error:
Code:
Prom Node ID Value and Slot ID mismatch. NodeID: 0 SlotID: 1

Which leads me to think there is a node ID value stored in the EEPROM along with the serial numbers. Running prom edit steps through a different set of values than the prom hp edit I tried yesterday, and sure enough Node ID is one of them:
Code:
Whack>prom edit
Board Spin:       04
Size * 256 bytes: 04
Board Class:      920
Board Base:       200040
Board Rev:        A8
Assembly Vendor:  FXN
Assembly Year:    2013
Assembly Week:    44
Assembly Day:     04
Assembly Serial:  02021391
System Serial:    1610528
Node ID:          00
Midplane Type:    1b
Node Type:        40
W19:              0fffffff
Whack>
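
The boot-log error and the prom edit listing can be cross-checked mechanically. The Python sketch below is an illustration only: it assumes the log message and listing formats exactly as pasted in this post, it assumes the Node ID field is a hexadecimal value (a guess based on other fields such as "1b"), and the function names are hypothetical.

```python
import re

MISMATCH_RE = re.compile(
    r"Prom Node ID Value and Slot ID mismatch\.\s*NodeID:\s*(\d+)\s*SlotID:\s*(\d+)"
)

BOOT_LOG = "Prom Node ID Value and Slot ID mismatch. NodeID: 0 SlotID: 1\n"

PROM_LISTING = """\
Whack>prom edit
Board Spin:       04
Size * 256 bytes: 04
Board Class:      920
Board Base:       200040
Board Rev:        A8
Assembly Vendor:  FXN
Assembly Year:    2013
Assembly Week:    44
Assembly Day:     04
Assembly Serial:  02021391
System Serial:    1610528
Node ID:          00
Midplane Type:    1b
Node Type:        40
W19:              0fffffff
Whack>
"""


def find_node_id_mismatch(log_text):
    """Return (node_id, slot_id) from the mismatch message, or None if absent."""
    match = MISMATCH_RE.search(log_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_prom_listing(text):
    """Turn 'Field: value' lines into a dict, skipping the Whack> prompt lines."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("Whack>") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def node_id_matches_slot(fields, slot_id):
    """Compare the stored Node ID (assumed hexadecimal) with the physical slot."""
    return int(fields["Node ID"], 16) == slot_id


if __name__ == "__main__":
    node_id, slot_id = find_node_id_mismatch(BOOT_LOG)
    fields = parse_prom_listing(PROM_LISTING)
    print(node_id, slot_id, node_id_matches_slot(fields, slot_id))
```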

Now this isn't node 0, it's supposed to be node 1, so I changed Node ID to 01, ran a quick prom checksum, which returned PASS instead of a fatal error, and then ran go to complete the boot process. The node came up and after a few moments joined the cluster!
Code:
cli% shownode -i -svc
-------------------------------------------------------------------------Nodes--------------------------------------------------------------------------
Node --Name--- -Manufacturer- ---Assem_Part--- --Assem_Serial-- -Saleable_Serial- --Saleable_PN--- ----Spare_PN---- -------Model_Name------- -Assem_Rev-
   0 1610528-0 FXN            QR483-63001      PCMBUA3TM3K03U   2MD25201SL        QR483A           683246-001       HP_3PAR 7400                    004
   1 1610528-1 FXN            QR483-63001      PCMBUA8TM5M0ZV   2MD25201SL        QR483A           683246-001       HP_3PAR 7400                    004
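
As a final sanity check, the inventory table above can be verified against the rules discussed earlier in the thread (assembly serial unique per node, the other identifiers identical across the cluster). This Python sketch assumes the shownode -i -svc layout exactly as pasted here and only reads the first eight whitespace-separated columns; the function names are illustrative.

```python
SHOWNODE_I = """\
Node --Name--- -Manufacturer- ---Assem_Part--- --Assem_Serial-- -Saleable_Serial- --Saleable_PN--- ----Spare_PN---- -------Model_Name------- -Assem_Rev-
   0 1610528-0 FXN            QR483-63001      PCMBUA3TM3K03U   2MD25201SL        QR483A           683246-001       HP_3PAR 7400                    004
   1 1610528-1 FXN            QR483-63001      PCMBUA8TM5M0ZV   2MD25201SL        QR483A           683246-001       HP_3PAR 7400                    004
"""

# Only the leading columns are used; Model_Name contains a space, so later
# columns would need fixed-width parsing instead of a plain split.
COLUMNS = ["Node", "Name", "Manufacturer", "Assem_Part",
           "Assem_Serial", "Saleable_Serial", "Saleable_PN", "Spare_PN"]
SHARED = ["Assem_Part", "Saleable_Serial", "Saleable_PN", "Spare_PN"]


def parse_inventory(text):
    """Return one dict per node row (rows are the lines starting with a node number)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            rows.append(dict(zip(COLUMNS, tokens)))
    return rows


def check_cluster(rows):
    """Return a list of human-readable problems; an empty list means consistent."""
    issues = []
    for col in SHARED:
        values = {row[col] for row in rows}
        if len(values) > 1:
            issues.append(f"{col} differs between nodes: {sorted(values)}")
    serials = [row["Assem_Serial"] for row in rows]
    if len(set(serials)) != len(serials):
        issues.append("Assem_Serial is not unique per node")
    return issues


if __name__ == "__main__":
    print(check_cluster(parse_inventory(SHOWNODE_I)))
```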

There is still an error in the management console:
Code:
Cage 0, Interface Card 1 Failed (Interface Card Firmware Unknown {0x0} )
that needs further troubleshooting but that may be a separate issue. Needless to say I am *very* relieved to have this up and running. Thanks everyone for your ideas and suggestions through this process.

