HPE Storage Users Group https://3parug.com/ |
3PAR 7200 high read/write latency after controller failover https://3parug.com/viewtopic.php?f=18&t=3855 |
Author: | ferencmatyas [ Fri Nov 18, 2022 9:03 am ] |
Post subject: | 3PAR 7200 high read/write latency after controller failover |
Hi all,

I have a 3PAR 7200 with 2 controllers. Controller 1 had a drive issue and was replaced; Controller 0 is now master. Since the replacement three weeks ago we have noticed pretty bad read/write latency, up to 400ms-1000ms, on a filer VM. tunesys and tuneld had no effect.

I read other posts here about write cache on failovers. Can it be that Controller 1 was initially master and the setup worked, and that after the failover to Controller 0 the system somehow still thinks it has only one controller? statport shows that all ports have I/O.

Can I reboot the current master, Controller 0, with shutdownnode reboot 0 to see if the situation improves?

Code:
15:15:16 11/18/2022 r/w  I/O per second      KBytes per sec        Svt ms      IOSz KB
     Port       D/C       Cur   Avg   Max    Cur    Avg    Max    Cur    Avg   Cur   Avg Qlen
    0:0:1      Data   t  3473   958  3473  64466  23677  81677   8.12   6.50  18.6  24.7   45
    0:0:2      Data   t  3580   646  3580  64540   9627  64540   8.11   7.47  18.0  14.9   48
    0:1:1      Data   t   332   161   593  19787   4319  28357  58.05  17.03  59.5  26.8   64
    0:1:2      Data   t   168   139   744  21976   4101  27750  88.88  11.82 131.1  29.5    1
    1:0:1      Data   t  4087   933  4087  75466  24513  75466  15.12   7.79  18.5  26.3    3
    1:0:2      Data   t  3974   595  3974  69275   8470  69275  13.41   8.49  17.4  14.2    2
    1:1:1      Data   t   168   128   369  21469   4757  25317  81.62  17.44 128.1  37.1    0
    1:1:2      Data   t   220   127   512  22629   3604  28422  95.74  17.87 102.9  28.4    1
---------------------------------------------------------------------------------------------
        8      Data   t 16001  3689        359608  83068         15.08   8.75  22.5  22.5  164

Press the enter key to stop...

15:17:09 11/18/2022 r/w  I/O per second      KBytes per sec        Svt ms      IOSz KB
     Port       D/C       Cur   Avg   Max    Cur    Avg    Max    Cur    Avg   Cur   Avg Qlen
    0:0:1      Data   t  1438   387  1438  26300   5605  26300  13.68   8.38  18.3  14.5    5
    0:0:2      Data   t  1436   378  1436  26146   5649  26146  14.10   8.45  18.2  15.0    8
    0:1:1      Data   t   126    87   222   7480   2695   7480  28.62  10.63  59.6  31.0    7
    0:1:2      Data   t    27    78   251   5323   1948   5323 137.00  14.88 194.2  24.8   10
    1:0:1      Data   t   948   339   948  17832   4913  17832   7.13   5.87  18.8  14.5    8
    1:0:2      Data   t   909   328   909  16998   4145  16998   7.16   5.79  18.7  12.6    5
    1:1:1      Data   t   190    51   190  10883   2252  10883  37.75  15.66  57.3  44.5    7
    1:1:2      Data   t   109   164   492   5794   2824   5794  58.30  15.76  53.3  17.2    1
---------------------------------------------------------------------------------------------
        8      Data   t  5182  1812        116757  30030         14.29   8.72  22.5  16.6   51

Press the enter key to stop...

shownode also says the 2 controllers are OK:

Code:
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1618843-0 OK      Yes    Yes       Off          GreenBlnk    8192    4096            0
   1 1618843-1 OK      No     Yes       Off          GreenBlnk    8192    4096            0

Thank you!
Best Regards
Ferenc |
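As a rough way to read the statport samples above, the following Python sketch parses the per-port rows and lists the ports whose current service time exceeds a cutoff. The 50 ms threshold, the function names and the assumption that every port row has exactly 14 whitespace-separated fields are illustrative choices, not part of the 3PAR CLI; the rows are copied from the 15:15:16 sample.

```python
# Per-port rows copied from the 15:15:16 statport sample in the post above.
SAMPLE = """\
0:0:1 Data t 3473 958 3473 64466 23677 81677 8.12 6.50 18.6 24.7 45
0:0:2 Data t 3580 646 3580 64540 9627 64540 8.11 7.47 18.0 14.9 48
0:1:1 Data t 332 161 593 19787 4319 28357 58.05 17.03 59.5 26.8 64
0:1:2 Data t 168 139 744 21976 4101 27750 88.88 11.82 131.1 29.5 1
1:0:1 Data t 4087 933 4087 75466 24513 75466 15.12 7.79 18.5 26.3 3
1:0:2 Data t 3974 595 3974 69275 8470 69275 13.41 8.49 17.4 14.2 2
1:1:1 Data t 168 128 369 21469 4757 25317 81.62 17.44 128.1 37.1 0
1:1:2 Data t 220 127 512 22629 3604 28422 95.74 17.87 102.9 28.4 1
"""


def parse_statport_rows(text):
    """Return one dict per port row; header, separator and total rows are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a port row has 14 fields and a node:slot:port identifier.
        if len(fields) != 14 or fields[0].count(":") != 2:
            continue
        rows.append({
            "port": fields[0],
            "iops_cur": int(fields[3]),
            "kb_cur": int(fields[6]),
            "svt_cur_ms": float(fields[9]),
            "iosz_cur_kb": float(fields[11]),
            "qlen": int(fields[13]),
        })
    return rows


def slow_ports(rows, threshold_ms=50.0):
    """List ports whose current service time is above threshold_ms (hypothetical cutoff)."""
    return [row["port"] for row in rows if row["svt_cur_ms"] > threshold_ms]


if __name__ == "__main__":
    print(slow_ports(parse_statport_rows(SAMPLE)))
```

On this sample the sketch flags the four x:1:x ports, which are also the ports carrying the larger I/O sizes (roughly 60 to 130 KB instead of about 18 KB).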
Author: | MammaGutt [ Sat Nov 19, 2022 2:58 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
Quote:
shownode says also the 2 controllers are OK:

Code:
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 1618843-0 OK      Yes    Yes       Off          GreenBlnk    8192    4096            0
   1 1618843-1 OK      No     Yes       Off          GreenBlnk    8192    4096            0

You might think so, but the last column states 0% write cache available. What does checkhealth -svc -detail say? |
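The check described in this reply (reading the Available(%) column of shownode) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is made up, the rows are the ones pasted in the thread, and the parser assumes every node row has exactly 10 whitespace-separated fields with the cache percentage last.

```python
# shownode output as pasted in the thread (whitespace collapsed).
SHOWNODE = """\
Node --Name--- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
0 1618843-0 OK Yes Yes Off GreenBlnk 8192 4096 0
1 1618843-1 OK No Yes Off GreenBlnk 8192 4096 0
"""


def nodes_without_write_cache(text):
    """Return the node IDs whose Available(%) column reads 0."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: node rows start with a numeric node ID and have 10 fields.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        if int(fields[-1]) == 0:
            flagged.append(int(fields[0]))
    return flagged


if __name__ == "__main__":
    print(nodes_without_write_cache(SHOWNODE))
```

For the output in this thread both nodes are flagged, even though State, InCluster and the LEDs all look healthy.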
Author: | ferencmatyas [ Sun Nov 20, 2022 6:40 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
MammaGutt wrote:
You might think so, but it states 0% write cache. What does checkhealth -svc -detail Say?

Hi MammaGutt,

Thanks for your reply. I read that 7200 two-controller setups have no write cache. We also have a 4-node 7400 3PAR, and there shownode says 100%. How can I check whether there is a write cache on the 7200?

The first disks in this 7200 date from 2013, and at some point 1200GB FC disks were installed as replacements instead of 900GB ones, so I can follow the "Too few PDs" line. I only found a command here for fixing remote chunklets on a logical disk, not on a PD.

Code:
cli% checkhealth -svc -detail
Checking alert
Checking ao
Checking cabling
Checking cage
Checking cert
Checking dar
Checking date
Checking file
Checking fs
Checking host
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking pdch
Checking port
Checking qos
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Checking sp
Component ---------------Summary Description--------------- Qty
Alert     New alerts                                          2
PD        Disks experiencing a high level of I/O per second   1
PD        Too few PDs of type/speed/size behind Nodes         1
pdch      Chunklets on remote disks                           7
---------------------------------------------------------------
        4 total                                              11

Component ---Identifier--- --------------------------Detailed Description---------------------------
Alert     sw_cp:5:AO_NL_r6 CPG AO_NL_r6 SD and/or user space has reached allocation warning of 9500G
Alert     sw_sysmgr        Total FC raw space usage at 39616G (above 50% of total 79218G)
PD        disk:4           Disk is experiencing a high level of I/O per second: 153.8
PD        Nodes:0&1        Only 2 FC/10K/1200GB PDs are attached to these nodes; the minimum is 6
pdch      ch:72:1540       Chunklet is on a remote disk
pdch      ch:75:1716       Chunklet is on a remote disk
pdch      ch:75:1717       Chunklet is on a remote disk
pdch      ch:78:1716       Chunklet is on a remote disk
pdch      ch:82:1716       Chunklet is on a remote disk
pdch      ch:82:1717       Chunklet is on a remote disk
pdch      ch:83:1717       Chunklet is on a remote disk
----------------------------------------------------------------------------------------------------
       11 total

cli% shownode -mem
Node Slot SlotID -Name-- -Usage- ---Type--- --Manufacturer--- -Serial- -Latency-- Size(MB)
   0    0 J0155  DIMM0.0 Control DDR3_SDRAM Micron Technology DB86F40E CL5.0/10.0     8192
   0  n/a J0300  DIMM0.0 Data    DDR2_SDRAM Micron Technology DC955F09 CL4.0/6.0      2048
   0  n/a J0301  DIMM1.0 Data    DDR2_SDRAM Micron Technology DC955EBA CL4.0/6.0      2048
   1    0 J0155  DIMM0.0 Control DDR3_SDRAM Micron Technology DB8CA8A1 CL5.0/10.0     8192
   1  n/a J0300  DIMM0.0 Data    DDR2_SDRAM Micron Technology DDA9A891 CL4.0/6.0      2048
   1  n/a J0301  DIMM1.0 Data    DDR2_SDRAM Micron Technology 10C68EC2 CL4.0/6.0      2048

Thank you!
Regards
Ferenc |
Author: | MammaGutt [ Sun Nov 20, 2022 10:48 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
ferencmatyas wrote:
I read that 7200 2 Controller Setups have no write cache. We have a 4 Node 7400 3PAR and there shownode says 100%. How can I check if there is a write cache on the 7200?

All storage systems I know of have write cache. Without it, write performance is usually horrible. Write cache is disabled when the system is running on only one node. The checkhealth output gives me no real clue as to why it has no write cache… |
Author: | ferencmatyas [ Sun Nov 20, 2022 11:03 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
MammaGutt wrote:
All storage systems I know of have write cache. Without it, the write performance is usually horrible. Write cache is disabled when system is running on only one node. The checkhealth gives me no real clue as to why it has no write cache….

Thanks for the hint. I see. When Controller 1 was replaced, the local technician mixed up the SAS cables, and I had to rearrange them into the previous order; until then the cluster status did not recover. HPE support checked remotely and found no issues.

Can it be that it still thinks it's running on one controller? Would it make sense to reboot controller 1 and then, if it comes back successfully, controller 0? I could power down the ESX hosts so that no I/O is present.

I read your other post about statcmp: I only get read cache hits, right?

Code:
17:17:41 11/20/2022 ---- Current ----- ---------- Total -----------
Node Type           Accesses Hits Hit% Accesses  Hits Hit%  LockBlk
   0 Read                 29   26   90    57674 53467   93    25846
   0 Write             16520    0    0   441458     0    0 34056508
   1 Read                 28   20   71    32151 27429   85    20744
   1 Write             16529    0    0   447151     0    0 20407887

Queue Statistics
Node  Free  Clean Write1 WriteN WrtSched Writing DcowPend DcowProc RcpyRev
   0 17447 204850      0      0        0       1        0        0       0
   1 17386 209558      0      0        0       4        0        0       0

Temporary and Page Credits
Node Node0 Node1 Node2 Node3 Node4 Node5 Node6 Node7
   0     0 17305   ---   ---   ---   ---   ---   ---
   1 17138     0   ---   ---   ---   ---   ---   ---

Page Statistics
     ----------CfcDirty----------- --------------CfcMax--------------- -------------DelAck--------------
Node FC NL SSD_150KRPM SSD_100KRPM     FC   NL SSD_150KRPM SSD_100KRPM FC     NL SSD_150KRPM SSD_100KRPM
   0  1  0           0           0 104340 5550           0           0 28 614029           0           0
   1  4  0           0           0 104340 5550           0           0  0      0           0           0

Press the enter key to stop...

Thank you,
Regards
Ferenc |
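To make the "only read cache hits" observation concrete, here is a small Python sketch that recomputes the cumulative hit ratios from the Accesses and Hits columns of the statcmp rows pasted above. The function name and the assumption that each Read/Write row has 9 whitespace-separated fields are illustrative; nothing here is part of the 3PAR CLI.

```python
# Read/Write rows copied from the statcmp output in the post above.
STATCMP = """\
0 Read 29 26 90 57674 53467 93 25846
0 Write 16520 0 0 441458 0 0 34056508
1 Read 28 20 71 32151 27429 85 20744
1 Write 16529 0 0 447151 0 0 20407887
"""


def total_hit_ratios(text):
    """Map (node, type) to hits/accesses using the 'Total' columns."""
    ratios = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: node, type, 3 'Current' columns, 3 'Total' columns, LockBlk.
        if len(fields) != 9 or fields[1] not in ("Read", "Write"):
            continue
        node, kind = int(fields[0]), fields[1]
        accesses, hits = int(fields[5]), int(fields[6])
        ratios[(node, kind)] = hits / accesses if accesses else 0.0
    return ratios


if __name__ == "__main__":
    for (node, kind), ratio in sorted(total_hit_ratios(STATCMP).items()):
        print(f"node {node} {kind:<5} {ratio:.1%}")
```

The read ratios round to the 93% and 85% that statcmp prints, while both nodes show zero write hits across several hundred thousand write accesses.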
Author: | MammaGutt [ Sun Nov 20, 2022 3:44 pm ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
Correct, statcmp shows only read hits when you don’t have write cache. Interesting that the local tech didn’t spot the issue. It took me 10 seconds… If you replaced node 1, you could try to reboot it. It should be online. If you can power down everything so there is no I/O, you could also try restarting sysmgr or rebooting the entire array. But considering you had HPE support engaged when the node was replaced, I would contact them, complain, and ask whether they consider the array OK without write cache. |
Author: | ferencmatyas [ Wed Nov 23, 2022 4:27 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
I rebooted controller 1; controller 0 stayed master with no outage. Controller 1 came back, but there is still no write cache. Unfortunately we do not have HPE support anymore, so I cannot ask. I will try rebooting controller 0 and then the whole storage system during the holidays, and as a workaround I will put the problematic VM on an SSD appliance. checkhealth also says many PDs are under heavy I/O.

Thanks a lot for your support!
Regards
Ferenc |
Author: | MammaGutt [ Wed Nov 23, 2022 4:59 am ] |
Post subject: | Re: 3PAR 7200 high read/write latency after controller failo |
ferencmatyas wrote:
unfortunately we do not have HPE support anymore, so I cannot ask.

HPE support continues to work on support cases after the contract expires. So I would try to claim that the previous case was closed (if it was closed) on an incorrect basis and that the problem wasn't fixed. I'm not saying they would honor it, but I would say it is worth a try. Worst case, try to extend the support for a few months to get this issue resolved. |
All times are UTC - 5 hours |