HPE Storage Users Group

A Storage Administrator Community




 Post subject: 3PAR 7200 high read/write latency after controller failover
PostPosted: Fri Nov 18, 2022 9:03 am 

Joined: Wed Aug 31, 2022 3:27 am
Posts: 6
Hi all,

I have a 3PAR 7200 with 2 controllers. Controller 1 had a drive issue and was replaced; Controller 0 is now master. Since the replacement three weeks ago we have noticed pretty bad read/write latency, from 400 ms up to 1000 ms, on a filer VM. tunesys and tuneld had no effect.

I have read other posts here about write cache after failovers. Could it be that Controller 1 was initially master and the setup worked, and that after the failover Controller 0 became master but the system somehow still thinks it has only one controller? statport says that all ports have IO.

Can I reboot the now master Controller 0 with shutdownnode reboot 0, to see if the situation improves?

Code:
15:15:16 11/18/2022 r/w  I/O per second     KBytes per sec      Svt ms    IOSz KB
     Port       D/C       Cur  Avg  Max    Cur   Avg   Max   Cur   Avg   Cur  Avg Qlen
    0:0:1      Data   t  3473  958 3473  64466 23677 81677  8.12  6.50  18.6 24.7   45
    0:0:2      Data   t  3580  646 3580  64540  9627 64540  8.11  7.47  18.0 14.9   48
    0:1:1      Data   t   332  161  593  19787  4319 28357 58.05 17.03  59.5 26.8   64
    0:1:2      Data   t   168  139  744  21976  4101 27750 88.88 11.82 131.1 29.5    1
    1:0:1      Data   t  4087  933 4087  75466 24513 75466 15.12  7.79  18.5 26.3    3
    1:0:2      Data   t  3974  595 3974  69275  8470 69275 13.41  8.49  17.4 14.2    2
    1:1:1      Data   t   168  128  369  21469  4757 25317 81.62 17.44 128.1 37.1    0
    1:1:2      Data   t   220  127  512  22629  3604 28422 95.74 17.87 102.9 28.4    1
--------------------------------------------------------------------------------------
        8      Data   t 16001 3689      359608 83068       15.08  8.75  22.5 22.5  164
Press the enter key to stop...

15:17:09 11/18/2022 r/w I/O per second     KBytes per sec       Svt ms    IOSz KB
     Port       D/C      Cur  Avg  Max    Cur   Avg   Max    Cur   Avg   Cur  Avg Qlen
    0:0:1      Data   t 1438  387 1438  26300  5605 26300  13.68  8.38  18.3 14.5    5
    0:0:2      Data   t 1436  378 1436  26146  5649 26146  14.10  8.45  18.2 15.0    8
    0:1:1      Data   t  126   87  222   7480  2695  7480  28.62 10.63  59.6 31.0    7
    0:1:2      Data   t   27   78  251   5323  1948  5323 137.00 14.88 194.2 24.8   10
    1:0:1      Data   t  948  339  948  17832  4913 17832   7.13  5.87  18.8 14.5    8
    1:0:2      Data   t  909  328  909  16998  4145 16998   7.16  5.79  18.7 12.6    5
    1:1:1      Data   t  190   51  190  10883  2252 10883  37.75 15.66  57.3 44.5    7
    1:1:2      Data   t  109  164  492   5794  2824  5794  58.30 15.76  53.3 17.2    1
--------------------------------------------------------------------------------------
        8      Data   t 5182 1812      116757 30030        14.29  8.72  22.5 16.6   51
Press the enter key to stop...
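
For readers who want to pick the slow ports out of statport output like the two samples above, here is a minimal sketch. It is not an HPE tool: the function name `slow_ports`, the 50 ms threshold, and the assumption that every data row has the 14 columns shown above are all mine, taken only from the output pasted in this post.

```python
def slow_ports(statport_text, threshold_ms=50.0):
    """Return (port, current service time in ms) for rows above threshold.

    Assumes the 14-column row layout pasted in this thread:
    port, D/C, r/w, 3x I/O per second, 3x KBytes per sec,
    Svt Cur, Svt Avg, IOSz Cur, IOSz Avg, Qlen.
    """
    slow = []
    for line in statport_text.splitlines():
        fields = line.split()
        # Port rows have 14 fields and a node:slot:port identifier.
        # The header, separator and totals rows do not match and are skipped.
        if len(fields) != 14 or fields[0].count(":") != 2:
            continue
        svt_cur = float(fields[9])
        if svt_cur > threshold_ms:
            slow.append((fields[0], svt_cur))
    return slow


# Three rows and the totals row copied from the first sample above.
SAMPLE = """\
    0:0:1      Data   t  3473  958 3473  64466 23677 81677  8.12  6.50  18.6 24.7   45
    0:1:1      Data   t   332  161  593  19787  4319 28357 58.05 17.03  59.5 26.8   64
    1:1:2      Data   t   220  127  512  22629  3604 28422 95.74 17.87 102.9 28.4    1
        8      Data   t 16001 3689      359608 83068       15.08  8.75  22.5 22.5  164
"""

if __name__ == "__main__":
    for port, svt in slow_ports(SAMPLE):
        print(port, svt)
```

The threshold is arbitrary; pick whatever service time counts as slow for your workload.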



shownode says also the 2 controllers are OK:
Code:
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1618843-0 OK      Yes    Yes       Off          GreenBlnk    8192    4096            0
   1 1618843-1 OK      No     Yes       Off          GreenBlnk    8192    4096            0


Thank you!
Best Regards
Ferenc


 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Sat Nov 19, 2022 2:58 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1449
Location: Europe
Quote:

shownode says also the 2 controllers are OK:
Code:
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1618843-0 OK      Yes    Yes       Off          GreenBlnk    8192    4096            0
   1 1618843-1 OK      No     Yes       Off          GreenBlnk    8192    4096            0




You might think so, but it states 0% write cache.

What does

checkhealth -svc -detail

Say?
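
For anyone checking the same thing on their own array, here is a minimal sketch that scans pasted shownode output and lists the nodes whose last column, Available(%), reads 0. It is not an HPE tool: the function name and the assumption that the cache percentage is always the final column are mine, based only on the output quoted above.

```python
def nodes_without_write_cache(shownode_text):
    """Return node IDs whose last column (Available(%)) is 0.

    Assumes the layout quoted in this thread: data rows start with a
    numeric node ID and end with the cache-available percentage.
    """
    flagged = []
    for line in shownode_text.splitlines():
        fields = line.split()
        # Skip the header and any blank or non-data lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        if not fields[-1].isdigit():
            continue
        if int(fields[-1]) == 0:
            flagged.append(int(fields[0]))
    return flagged


# Output copied from the post quoted above.
SAMPLE = """\
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1618843-0 OK      Yes    Yes       Off          GreenBlnk    8192    4096            0
   1 1618843-1 OK      No     Yes       Off          GreenBlnk    8192    4096            0
"""

if __name__ == "__main__":
    print(nodes_without_write_cache(SAMPLE))
```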

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Sun Nov 20, 2022 6:40 am 

Joined: Wed Aug 31, 2022 3:27 am
Posts: 6
MammaGutt wrote:

You might think so, but it states 0% write cache.

What does

checkhealth -svc -detail

Say?


Hi MammaGutt,
Thanks for your reply. I read that two-controller 7200 setups have no write cache. We also have a 4-node 7400 3PAR, and there shownode says 100%. How can I check whether the 7200 has a write cache?

The first disks in this 7200 date from 2013, and at some point 1200GB FC disks were installed as replacements for 900GB ones, so I can follow that line of the output. I only found a command here for fixing remote chunklets on a logical disk, not on a PD.

Code:
cli% checkhealth -svc -detail
Checking alert
Checking ao
Checking cabling
Checking cage
Checking cert
Checking dar
Checking date
Checking file
Checking fs
Checking host
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking pdch
Checking port
Checking qos
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Checking sp
Component ---------------Summary Description--------------- Qty
Alert     New alerts                                          2
PD        Disks experiencing a high level of I/O per second   1
PD        Too few PDs of type/speed/size behind Nodes         1
pdch      Chunklets on remote disks                           7
---------------------------------------------------------------
        4 total                                              11

Component ---Identifier--- --------------------------Detailed Description---------------------------
Alert     sw_cp:5:AO_NL_r6 CPG AO_NL_r6 SD and/or user space has reached allocation warning of 9500G
Alert     sw_sysmgr        Total FC raw space usage at 39616G (above 50% of total 79218G)
PD        disk:4           Disk is experiencing a high level of I/O per second: 153.8
PD        Nodes:0&1        Only 2 FC/10K/1200GB PDs are attached to these nodes; the minimum is 6
pdch      ch:72:1540       Chunklet is on a remote disk
pdch      ch:75:1716       Chunklet is on a remote disk
pdch      ch:75:1717       Chunklet is on a remote disk
pdch      ch:78:1716       Chunklet is on a remote disk
pdch      ch:82:1716       Chunklet is on a remote disk
pdch      ch:82:1717       Chunklet is on a remote disk
pdch      ch:83:1717       Chunklet is on a remote disk
----------------------------------------------------------------------------------------------------
       11 total

 cli% shownode -mem
Node Slot SlotID -Name-- -Usage- ---Type--- --Manufacturer--- -Serial- -Latency-- Size(MB)
   0    0 J0155  DIMM0.0 Control DDR3_SDRAM Micron Technology DB86F40E CL5.0/10.0     8192
   0  n/a J0300  DIMM0.0 Data    DDR2_SDRAM Micron Technology DC955F09 CL4.0/6.0      2048
   0  n/a J0301  DIMM1.0 Data    DDR2_SDRAM Micron Technology DC955EBA CL4.0/6.0      2048
   1    0 J0155  DIMM0.0 Control DDR3_SDRAM Micron Technology DB8CA8A1 CL5.0/10.0     8192
   1  n/a J0300  DIMM0.0 Data    DDR2_SDRAM Micron Technology DDA9A891 CL4.0/6.0      2048
   1  n/a J0301  DIMM1.0 Data    DDR2_SDRAM Micron Technology 10C68EC2 CL4.0/6.0      2048


Thank you!
Regards
Ferenc


 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Sun Nov 20, 2022 10:48 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1449
Location: Europe
All storage systems I know of have write cache. Without it, write performance is usually horrible. Write cache is disabled when the system is running on only one node.

The checkhealth output gives me no real clue as to why it has no write cache….

 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Sun Nov 20, 2022 11:03 am 

Joined: Wed Aug 31, 2022 3:27 am
Posts: 6
MammaGutt wrote:
All storage systems I know of have write cache. Without it, the write performance is usually horrible. Write cache is disabled when system is running on only one node.

The checkhealth gives me no real clue as to why it has no write cache….

Thanks for the hint. When Controller 1 was replaced, the local technician mixed up the SAS cables, and I had to put them back in their previous order. Until then the cluster status did not recover. HPE support checked remotely and found no issues. Can it be that it still thinks it is running on one controller? Would it make sense to reboot controller 1 and then, if it comes back successfully, controller 0? I could power down the ESX hosts so that no IO is present.

I read your other post about statcmp. I get only read cache hits, right?
Code:
17:17:41 11/20/2022 ---- Current ----- ---------- Total -----------
    Node Type       Accesses Hits Hit% Accesses  Hits Hit%  LockBlk
       0 Read             29   26   90    57674 53467   93    25846
       0 Write         16520    0    0   441458     0    0 34056508
       1 Read             28   20   71    32151 27429   85    20744
       1 Write         16529    0    0   447151     0    0 20407887

        Queue Statistics
Node  Free  Clean Write1 WriteN WrtSched Writing DcowPend DcowProc RcpyRev
   0 17447 204850      0      0        0       1        0        0       0
   1 17386 209558      0      0        0       4        0        0       0

        Temporary and Page Credits
Node Node0 Node1 Node2 Node3 Node4 Node5 Node6 Node7
   0     0 17305   ---   ---   ---   ---   ---   ---
   1 17138     0   ---   ---   ---   ---   ---   ---

        Page Statistics
     ----------CfcDirty----------- --------------CfcMax--------------- -------------DelAck--------------
Node FC NL SSD_150KRPM SSD_100KRPM     FC   NL SSD_150KRPM SSD_100KRPM FC     NL SSD_150KRPM SSD_100KRPM
   0  1  0           0           0 104340 5550           0           0 28 614029           0           0
   1  4  0           0           0 104340 5550           0           0  0      0           0           0
Press the enter key to stop...





Thank you,
Regards
Ferenc


 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Sun Nov 20, 2022 3:44 pm 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1449
Location: Europe
Correct, statcmp shows only read hits when you don’t have write cache.
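
To make that check repeatable, here is a minimal sketch that pulls the total Hit% per node out of the first statcmp table. It is not an HPE tool: the function names and the 9-column row layout are assumptions taken from the output pasted earlier in this thread. A 0% write hit rate is only consistent with missing write cache, not proof of it.

```python
def cache_hit_rates(statcmp_text):
    """Map (node, access type) to the total Hit% from statcmp output.

    Assumes rows of the first table look like the ones in this thread:
    node, Read|Write, three 'Current' columns, three 'Total' columns, LockBlk.
    """
    rates = {}
    for line in statcmp_text.splitlines():
        f = line.split()
        # Rows from the other tables lack the Read/Write keyword and are skipped.
        if len(f) == 9 and f[0].isdigit() and f[1] in ("Read", "Write"):
            rates[(int(f[0]), f[1])] = int(f[7])
    return rates


def nodes_with_zero_write_hits(rates):
    """Return the sorted node IDs whose total write Hit% is 0."""
    return sorted(node for (node, kind), pct in rates.items()
                  if kind == "Write" and pct == 0)


# Rows copied from the statcmp output posted earlier in this thread.
SAMPLE = """\
    Node Type       Accesses Hits Hit% Accesses  Hits Hit%  LockBlk
       0 Read             29   26   90    57674 53467   93    25846
       0 Write         16520    0    0   441458     0    0 34056508
       1 Read             28   20   71    32151 27429   85    20744
       1 Write         16529    0    0   447151     0    0 20407887
"""

if __name__ == "__main__":
    rates = cache_hit_rates(SAMPLE)
    print(rates)
    print(nodes_with_zero_write_hits(rates))
```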

Interesting that the local tech didn’t spot the issue. It took me 10 seconds….

If you replaced node1, you could try and reboot it. It should be online.

If you can power down everything so there is no IO you could also try and restart sysmgr or reboot the entire array as well…

But considering you had HPE support engaged when the node was replaced, I would contact them, complain, and ask whether they consider the array OK without write cache.

 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Wed Nov 23, 2022 4:27 am 

Joined: Wed Aug 31, 2022 3:27 am
Posts: 6
I rebooted controller 1. Controller 0 stayed master with no outage, and controller 1 came back, but there is still no write cache.
Unfortunately we do not have HPE support anymore, so I cannot ask.
I will try to reboot controller 0, and then the whole storage, during the holidays. As a workaround I will put an SSD appliance in place for the problematic VM. checkhealth also says many PDs are under heavy IO.
Thanks a lot for your support!
Regards
Ferenc


 Post subject: Re: 3PAR 7200 high read/write latency after controller failo
PostPosted: Wed Nov 23, 2022 4:59 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1449
Location: Europe
HPE support continues to work on support cases after the contract expires. So I would try to claim that the previous case was closed (if it was closed) on an incorrect basis and that the problem wasn't fixed.

I'm not saying they would honor it, but I would say it is worth a try. Worst case, try to extend the support for a few months to get this issue resolved.
