HPE Storage Users Group

A Storage Administrator Community




 Post subject: lun heartbeat in vmware
PostPosted: Tue Sep 05, 2017 1:53 pm 

Joined: Fri Jan 20, 2017 9:39 am
Posts: 58
I have a case open with HP for an issue where LUNs randomly show as lost in VMware. According to the VMware logs, the heartbeat is lost and then it comes back right away.

One of the things I got back from the HP CSS ERT was that the VAAI ATS heartbeat is enabled and should be disabled per VMware KB2146451.

I knew about that KB but couldn't find anything about having to disable it for 3PAR. Is reverting to the old behavior normal for 3PAR, or is HP just guessing at things for me to try?

EDIT: per that KB I linked to, it only suggests disabling ATS heartbeat if you see miscompare messages in the logs, and I don't see that. I just see a lost message and then a reconnected message. In the VMware logs, one of them points to loss of heartbeat, but there's no indication of an issue that I noticed before it logs the error.
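
For anyone wanting to check the same thing, here is a rough sketch of how the log check described above could be scripted on a host. The search strings and the default log path are assumptions based on commonly reported messages, not something confirmed in this thread; the lost/restored messages may be in a different log than the miscompare ones (for example vobd.log), so point LOG_FILE at whichever log you are checking. The function names are made up for illustration.

```shell
#!/bin/sh
# Count lines in a log file that contain a fixed string (case-insensitive).
# Prints 0 if the file is unreadable or nothing matches.
count_matches() {
    if [ -r "$2" ]; then
        grep -i -c -F -- "$1" "$2" || true
    else
        echo 0
    fi
}

# Print one count per message type for the given log file.
# The three search strings are assumed message fragments; adjust as needed.
summarize_heartbeat_log() {
    log=$1
    echo "ats_miscompare=$(count_matches 'ATS Miscompare' "$log")"
    echo "lost_access=$(count_matches 'Lost access to volume' "$log")"
    echo "restored_access=$(count_matches 'restored access to volume' "$log")"
}

# Default path is an assumption; override with LOG_FILE=/path/to/log.
summarize_heartbeat_log "${LOG_FILE:-/var/log/vmkernel.log}"
```

If ats_miscompare stays at 0 while lost_access and restored_access are non-zero, that matches what I described: lost/restored pairs with no miscompare messages.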


 Post subject: Re: lun heartbeat in vmware
PostPosted: Wed Sep 06, 2017 11:59 am 

Joined: Mon Jul 08, 2013 9:59 am
Posts: 11
It is normal practice to turn off ATS heartbeat for 3Par arrays using the methods in the article, at least according to support. It's possible you may see miscompare messages in the 3Par debug log.

We have an issue that presents itself similarly sometimes (Other times we'll completely drop paths) that's been escalated to the highest level in HPE as it's been ongoing for several months caused by a problem with the dedupe code.

Are you using dedupe by any chance?


 Post subject: Re: lun heartbeat in vmware
PostPosted: Wed Sep 06, 2017 2:30 pm 

Joined: Fri Jan 20, 2017 9:39 am
Posts: 58
SBrayne wrote:
It is normal practice to turn off ATS heartbeat for 3Par arrays using the methods in the article, at least according to support. It's possible you may see miscompare messages in the 3Par debug log


Are we both talking about running this command on all the hosts basically:

esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5

ATS is important for VAAI I think but maybe 3par doesn't like it for heartbeats? I got essentially the same comment (turn off ATS heartbeats) but I can't find any documentation about it. If this needs to be done, wouldn't it be in a best practice document?
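
A minimal sketch of how that change could be wrapped so the value is shown before and after, assuming the `list` output contains an "Int Value:" line (that output layout is an assumption here, and the helper names are hypothetical). It only touches the VMFS5 option from the command above and does nothing unless called with --apply.

```shell
#!/bin/sh
# Advanced option from the command quoted above (VMFS5 datastores).
OPTION=/VMFS3/UseATSForHBOnVMFS5

# Read `esxcli system settings advanced list` output on stdin and print the
# current integer value. Assumes a line of the form "   Int Value: 1".
current_int_value() {
    awk -F': *' '/^ *Int Value:/ { print $2; exit }'
}

# Show the current value, set it to 0 (ATS heartbeat off), show it again.
disable_ats_heartbeat() {
    before=$(esxcli system settings advanced list -o "$OPTION" | current_int_value)
    echo "before: $before"
    esxcli system settings advanced set -i 0 -o "$OPTION"
    after=$(esxcli system settings advanced list -o "$OPTION" | current_int_value)
    echo "after: $after"
}

# Only change anything when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
    disable_ats_heartbeat
fi
```

This would have to be run on every host, since the setting is per host. Setting the value back to 1 with the same `set` command should re-enable the ATS heartbeat.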

SBrayne wrote:
We have an issue that presents itself similarly sometimes (Other times we'll completely drop paths) that's been escalated to the highest level in HPE as it's been ongoing for several months caused by a problem with the dedupe code.

Are you using dedupe by any chance?


Yes I am. On all my volumes except 1. Getting 2:1 savings so I'd like to keep my dedupe if possible :)


 Post subject: Re: lun heartbeat in vmware
PostPosted: Wed Sep 06, 2017 5:10 pm 

Joined: Mon Jul 08, 2013 9:59 am
Posts: 11
Yes, sounds like we're both on the same page.

You would have thought that they would have added it to the VMware best practices guide by now, but here is the advisory information copied from the last major upgrade that I had done: 3.2.2 MU2 -> 3.2.2 MU4

 For VMware Hosts

For ESXi 5.5 Update 2 or ESXi 6.0 disable VAAI-ATS as per VMware Advisory below:

https://kb.vmware.com/selfservice/micro ... %202538771
As per the latest VMware implementation guide - http://h20564.www2.hpe.com/hpsc/doc/pub ... =c03290624 , Page 51 indicates not to install the 3PAR VAAI Plug-in 2.2.0 on the ESXi 5.x if it is connected to a 3PAR StoreServ Storage running 3PAR OS 3.1.1 or later. The VAAI primitives are handled by the default T10, VMware plug-in and do not require the 3PAR VAAI plug-in.



We have several factors in play (periodic remote copy, snapshots and dedupe) that combined have caused us significant problems for a long time, ranging from minor issues similar to yours all the way up to random host outages that require reboots. Online operations such as compactcpg or tunevv exacerbate the problem and almost always cause host outages due to the excessively long I/O stalls, i.e. the array will on occasion refuse data for a specific LUN for 19 seconds or more.

We've had the case open for such a long time that it's been escalated almost all the way up to Meg Whitman, and HPE have just loaned us an additional 16x 8TB SSDs to provide temporary buffer space to complete the "fix" that they have in mind.

Essentially they want us to un-dedupe everything, upgrade to 3.3.1 MU1 (will be GA/default very shortly) and then turn on dedupe for selected volumes that benefit.

Since you are having similar but less severe problems than ours, moving to 3.3.1 MU1 might just do the trick. You will likely be able to use tunevv to migrate your VVs to dedupe 3.0.

