Hi,
I'm writing because we are suddenly having huge performance problems on our 3PAR, and we are struggling to find the root cause.
Our setup is the following:
3PAR 7400 with a three-tier AO setup (SSD/FC/NL), 64 disks in total.
Behind it we have an HP C7000 with many ESX hosts, mounting gold/silver/bronze LUNs corresponding to the 3PAR VVs we created with different AO configs.
Performance had been fine for three years, then a few days ago we suddenly got massive complaints from different teams: huge timeouts, disconnected services, applications dropping their database connections (as if there were network issues). The symptoms look network-related, but we suspect the problem is coming from the storage, as on the VMware side we are getting dozens of "disk latency" alerts.
In the VMware logs we can suddenly see dozens of "performance has deteriorated" messages on 3PAR LUNs, with latency going from 5,000 to 5,000,000 microseconds.
We stopped half of our VMs (all non-production) to relieve some of the load.
Now, looking at the 3PAR disks with statpd, the total I/O per second across the disks is around 13,000, about 200 per disk on average.
In terms of queue length, all FC and SSD disks are at 0, while all NL disks have a Qlen of about 20.
In VMware, it looks like all the VMs using the BRONZE configuration (so mostly NL disks) are the ones having problems.
We suspected a failing disk, but the health check didn't show anything; all disks are still marked as healthy.
All disks' service times are around 30 to 50 ms (in statpd).
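For what it's worth, those statpd figures can be sanity-checked with a quick back-of-envelope calculation (a sketch only, using the averages quoted above and a simple FIFO-queue approximation):

```python
# Rough sanity check on the statpd figures quoted above.
# With a FIFO queue, the time an I/O spends waiting at a disk is roughly
# (queued I/Os + 1) * per-I/O service time.

qlen_nl = 20      # average queue length seen on the NL disks
svctime_ms = 40   # mid-point of the 30-50 ms service times

est_wait_ms = (qlen_nl + 1) * svctime_ms
print(f"Estimated NL I/O wait: ~{est_wait_ms} ms")  # ~840 ms per I/O

# FC and SSD disks report Qlen 0, so their I/Os see only the service
# time itself -- consistent with only BRONZE (NL-backed) VMs suffering.
```

An NL I/O waiting the best part of a second lines up with the VMware latency alerts quoted above.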
Could I ask for your help investigating this problem?
What kind of things should we check to find out what the cause might be?
thanks again
regards
3Par 7400 performance suddenly drops
Re: 3Par 7400 performance suddenly drops
Just from the information provided, it seems like your NL tier could be the one having issues.
NL is usually rated at around 75 IOPS per disk, FC 10k at 150, FC 15k at 200, and SSD in the thousands.
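Putting the numbers from the original post against those ratings suggests the NL tier is well past what it can sustain (a back-of-envelope sketch; the exact split of the 64 disks across tiers isn't given, so the per-disk average from statpd is used):

```python
# Back-of-envelope: observed load vs. what NL spindles are rated for.
rated_nl_iops = 75        # typical 7.2k NL rating quoted above
observed_per_disk = 200   # statpd average per disk from the post

oversubscription = observed_per_disk / rated_nl_iops
print(f"NL disks at ~{oversubscription:.1f}x their rated IOPS")  # ~2.7x

# Anything much above 1.0x on NL means queues build up (hence the Qlen
# of ~20 on the NL disks) while FC/SSD sit comfortably at Qlen 0.
```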
If the NL tier has issues, every volume with some data on it is in scope for trouble. My general recommendation when using multiple tiers is to set limits on volumes (or on VMs in VMware, depending on which licenses you have on the 3PAR and VMware side).
In cases like this, what I usually see is some "test VM" on a bronze volume suddenly doing a lot of IOPS. You can use statvv in the CLI to see performance by volume; if you spot a bronze volume with a lot of IOPS or high latency, start digging into that.
statvlun -hostsum will show traffic by host, which can help you narrow things down if the issue is tied to a single host, or pinpoint the host running that crazy VM.
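If the statvv output is long, a small script can rank the volumes by IOPS (a sketch only; the column layout below is an assumption, so check the field positions against your CLI version before relying on it):

```python
# Rank volumes by total IOPS from captured statvv-style output.
# The column layout (name, read IOPS, write IOPS, latency ms) is assumed;
# adjust the field indexes for your CLI version.
sample = """\
bronze-vol01  180  2100  41.2
bronze-vol02   90   300   8.5
gold-vol01    400   450   1.1
"""

def top_talkers(text, n=3):
    rows = []
    for line in text.strip().splitlines():
        name, reads, writes, latency = line.split()
        rows.append((name, int(reads) + int(writes), float(latency)))
    # highest total IOPS first
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

for name, iops, lat in top_talkers(sample):
    print(f"{name}: {iops} IOPS, {lat} ms")
```

A bronze volume sitting at the top of that list with high latency is the one to dig into.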
Also, one thing to remember with VMware and AO: when something is deleted in VMware (Storage vMotion, snapshot deletion, etc.), VMware only deletes the pointers (pre-VMFS6). Those blocks become completely inactive, and AO will demote them to NL. When VMware starts using those blocks again, they will sit on NL until the next AO run. With VMFS6 and automatic UNMAP the blocks are released, and all new writes go to the volume's user CPG (usually FC).
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.
Re: 3Par 7400 performance suddenly drops
Cross-stack analytics could really help you find the culprit(s) here.
https://d8tadude.com/2018/08/15/configuring-3par-with-infosight/
SSMC 3.4 is also worth a look:
https://d8tadude.com/2018/09/19/3par-ssmc-3-4-whats-new/