HPE Storage Users Group https://3parug.com/ |
RC in "Failed" status after network outage https://3parug.com/viewtopic.php?f=18&t=3216 |
Author: | keenerb [ Thu Jun 13, 2019 7:20 pm ] |
Post subject: | RC in "Failed" status after network outage |
An existing periodic remote copy setup that has worked flawlessly for several years suddenly stopped working after a lengthy network outage about two weeks ago. I have been on vacation and returned to a real mess. To make matters worse, my SAN administrator had already turned in his notice and is long gone now. Both targets are marked as "FAILED" under Remote Copy Configuration/Targets. All RCIP ports report READY and can ping each other with no issues, but the links are in "Down" status. What other information can I provide to help here? The only option available for the remote copy groups is "Failover remote copy groups." |
Author: | MammaGutt [ Fri Jun 14, 2019 6:06 am ] |
Post subject: | Re: RC in "Failed" status after network outage |
I don't think RCIP ports should respond to ping. If they do, you might have an IP conflict causing you trouble. Edit: I suggest using the checkrclink command in the CLI to check the link status, or maybe log a call with HPE. |
Author: | keenerb [ Fri Jun 14, 2019 8:20 am ] |
Post subject: | Re: RC in "Failed" status after network outage |
Code:
Running Client Side
Running link test on: 0:3:1
Test length (secs): 10
Destination Addr: X.X.210.X
Local IP Addr: X.X.121.X
Local Device name: eth1
------------------------------------------------------------
Measuring link latency
------------------------------------------------------------
Average measured latency: 12.575 ms
Pings Lost: 0 %
------------------------------------------------------------
Starting max MTU test, from 0:3:1 -> X.X.210.X
------------------------------------------------------------
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
MTU: 1500
------------------------------------------------------------
Starting throughput test, from 0:3:1 -> X.X.210.X
------------------------------------------------------------
------------------------------------------------------------
Client connecting to X.X.210.X, TCP port 5001
TCP window size: 4096 KByte (WARNING: requested 2048 KByte)
------------------------------------------------------------
[ 12] local X.X.121.X port 45132 connected with X.X.210.X port 5001
[ 6] local X.X.121.X port 45126 connected with X.X.210.X port 5001
[ 9] local X.X.121.X port 45129 connected with X.X.210.X port 5001
[ 8] local X.X.121.X port 45128 connected with X.X.210.X port 5001
[ 4] local X.X.121.X port 45125 connected with X.X.210.X port 5001
[ 5] local X.X.121.X port 45124 connected with X.X.210.X port 5001
[ 3] local X.X.121.X port 45123 connected with X.X.210.X port 5001
[ 11] local X.X.121.X port 45131 connected with X.X.210.X port 5001
[ 7] local X.X.121.X port 45127 connected with X.X.210.X port 5001
[ 10] local X.X.121.X port 45130 connected with X.X.210.X port 5001

All the ports report virtually identical output from checkrclink. It appears to "hang" at this point and never returns to the CLI; I'm not sure whether that's normal.
HPE support is sadly not an option; executive leadership decided that we have no need for HP support on hardware that will be replaced in a few months. I mean, why would we need support for our production SAN and DR environment? That's crazy talk. |
Author: | MammaGutt [ Fri Jun 14, 2019 10:39 am ] |
Post subject: | Re: RC in "Failed" status after network outage |
For the "Message too long" errors, you need to reduce the MTU. Try 1450 and increase it one step at a time until it stops working. I'm not sure that's your only problem, though. |
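The "start at 1450 and step up until it breaks" search is easy to sketch. This is a minimal illustration, not anything from HPE's tooling: `probe` is a hypothetical stand-in for whatever test you run (for example, a don't-fragment ping from a Linux host with `ping -M do -s <payload> <address>`, where the payload is the MTU minus 28 bytes of IPv4 and ICMP headers). The `floor` and `ceiling` bounds are assumptions.

```python
from typing import Callable, Optional

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_working_mtu(
    probe: Callable[[int], bool],
    start: int = 1450,
    floor: int = 576,
    ceiling: int = 9000,
) -> Optional[int]:
    """Step the MTU one byte at a time from `start`.

    `probe(mtu)` is a hypothetical callable that returns True when a
    don't-fragment test packet of that total size gets across the link.
    Returns the largest working MTU found, or None if nothing down to
    `floor` works.
    """
    mtu = start
    if not probe(mtu):
        # The starting value already fails: walk down until something passes.
        while mtu > floor:
            mtu -= 1
            if probe(mtu):
                return mtu
        return None
    # The starting value works: walk up until the next size fails.
    while mtu < ceiling and probe(mtu + 1):
        mtu += 1
    return mtu


if __name__ == "__main__":
    # Simulated path that silently drops anything above 1472 bytes.
    print(find_working_mtu(lambda size: size <= 1472))
    print(ping_payload_for_mtu(1500))
```

Stepping one byte at a time mirrors the advice literally; a binary search would need far fewer probes, but the linear walk is easier to follow when you are running each probe by hand.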
Author: | jbguy [ Fri Jun 14, 2019 3:57 pm ] |
Post subject: | Re: RC in "Failed" status after network outage |
I'm going to throw my hat in the ring here as well and say network issue. The only time I have seen it fail like that is when it couldn't reach the other side. We use periodic remote copy as well. Can your network team help you see whether the traffic is reaching the other side? What caused the outage? Maybe a switch they are connected to lost its config? Did someone forget to do a write mem?

I'm a total newb on this (I've only been a 3PAR admin for about a year and a half), but here are a couple of commands that may give you more info. They may not help, but I wasn't sure whether you knew them.

Show remote copy links: showrctransport -rcip
Show all targets and links for remote copy: showrcopy -d targets (or showrcopy -d links)
Start and stop remote copy from the command line:
List remote copy groups: showrcopy
Stop a remote copy group: stoprcopygroup <groupname>
Start a remote copy group: startrcopygroup <groupname>

Report back if you figure it out. |
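If you end up running these commands repeatedly on both arrays, they can be wrapped in a small script on an admin workstation. This is a minimal sketch that assumes SSH access to the array CLI; the user name, host name and group name are hypothetical placeholders. By default it only prints the commands, so nothing changes on the array until you pass `dry_run=False`.

```python
import shlex
import subprocess
from typing import List

# Hypothetical CLI user and array address; replace with your own.
ARRAY = "3paradm@prod-array.example.com"


def cli(*args: str, array: str = ARRAY) -> List[str]:
    """Build an ssh argv that runs one CLI command on the array."""
    return ["ssh", array, *args]


def diagnostics(array: str = ARRAY) -> List[List[str]]:
    """Read-only status commands listed in the post above."""
    return [
        cli("showrctransport", "-rcip", array=array),    # RCIP transport/link state
        cli("showrcopy", "-d", "targets", array=array),  # detailed target status
        cli("showrcopy", "-d", "links", array=array),    # detailed link status
        cli("showrcopy", array=array),                   # remote copy groups
    ]


def restart_group(group: str, array: str = ARRAY) -> List[List[str]]:
    """Stop, then start, one remote copy group."""
    return [
        cli("stoprcopygroup", group, array=array),
        cli("startrcopygroup", group, array=array),
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(diagnostics())
    run(restart_group("rcg_example"))  # hypothetical group name
```

Keep `dry_run=True` until you have reviewed the list: stopping and starting groups changes replication state on a production array.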
Author: | keenerb [ Mon Jun 17, 2019 12:50 pm ] |
Post subject: | Re: RC in "Failed" status after network outage |
RCIP pings are succeeding. I've run "stoprcopy" and "startrcopy" several times on both ends. showrctransport -rcip reports "State" as "Missing" on all four ports (two local, two remote). The configuration looks good otherwise. checkrclink freezes, as shown in my previous listing, when the server runs in production and the client tests from DR. When I run startserver in DR, I get the following at the end, after the normal MTU check and whatnot:
Code:
------------------------------------------------------------
Starting throughput test, from 0:3:1 -> x.x.121.x
------------------------------------------------------------
Could not connect with server. Please ensure server is running.
============================================================
TEST SUMMARY from 0:3:1 -> x.x.121.x
Test Started: Mon Jun 17 13:44:04 EDT 2019
Test Finished: Mon Jun 17 13:45:09 EDT 2019
============================================================
Latency: 12.058 ms
Lost pings: 0 %
Through-put: 0 Bits/second
Max MTU: 1500
Tx TCP Segs: 688
Rx TCP Segs: 647
TCP retrans: 8 %
Errored Segs: 0 %
Check remote server is running.
Link 0:3:1 is NOT SUITABLE for Remote Copy Use
============================================================ |
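When several ports and both directions have to be tested, comparing TEST SUMMARY blocks by eye gets tedious. Here is a minimal Python sketch for pulling the summary fields out of saved checkrclink output, assuming the field labels look exactly like the runs quoted in this thread (other firmware versions may word them differently). It records only the NOT SUITABLE verdict the tool prints and makes no judgment of its own about acceptable latency or retransmit rates.

```python
import re
from typing import Dict, Union

# Patterns for the TEST SUMMARY fields, as they appear in the output quoted above.
_FIELDS = {
    "latency_ms": r"Latency:\s*([\d.]+)\s*ms",
    "lost_pings_pct": r"Lost pings:\s*([\d.]+)\s*%",
    "throughput_bps": r"Through-put:\s*([\d.]+)\s*Bits/second",
    "max_mtu": r"Max MTU:\s*(\d+)",
    "tx_tcp_segs": r"Tx TCP Segs:\s*(\d+)",
    "rx_tcp_segs": r"Rx TCP Segs:\s*(\d+)",
    "tcp_retrans_pct": r"TCP retrans:\s*([\d.]+)\s*%",
    "errored_segs_pct": r"Errored Segs:\s*([\d.]+)\s*%",
}


def parse_summary(text: str) -> Dict[str, Union[float, str, bool]]:
    """Extract the numeric summary fields and the NOT SUITABLE verdict."""
    result: Dict[str, Union[float, str, bool]] = {}
    for key, pattern in _FIELDS.items():
        match = re.search(pattern, text)
        if match:
            result[key] = float(match.group(1))
    # Only the negative verdict wording is known from this thread.
    verdict = re.search(r"Link (\S+) is NOT SUITABLE", text)
    result["not_suitable"] = verdict is not None
    if verdict:
        result["port"] = verdict.group(1)
    return result


if __name__ == "__main__":
    sample = """Latency: 12.058 ms
Lost pings: 0 %
Through-put: 0 Bits/second
Max MTU: 1500
TCP retrans: 8 %
Link 0:3:1 is NOT SUITABLE for Remote Copy Use"""
    print(parse_summary(sample))
```

Because the patterns allow any whitespace between fields, the same function also works on output that has been flattened onto one line.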
Author: | khasck [ Wed Jul 03, 2019 6:12 pm ] |
Post subject: | Re: RC in "Failed" status after network outage |
We're seeing an almost identical situation here. We were testing a fiber failover, and after we finished testing and put everything back the way it was, we saw exactly the same situation as you: links in "Down" status even though all the ports are pingable and up. Same thing here: the DR side shows the links up, production shows them down.
Code:
PRODSAN1 cli% checkrclink startclient 0:9:1 x.x.110.131 60
Running Client Side
Running link test on: 0:9:1
Test length (secs): 60
Destination Addr: x.x.110.131
Local IP Addr: x.x.110.41
Local Device name: eth1
------------------------------------------------------------
Measuring link latency
------------------------------------------------------------
Average measured latency: 16.761 ms
Pings Lost: 3 %
------------------------------------------------------------
Starting max MTU test, from 0:9:1 -> x.x.110.131
------------------------------------------------------------
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
MTU: 1500
------------------------------------------------------------
Starting throughput test, from 0:9:1 -> x.x.110.131
------------------------------------------------------------
Could not connect with server. Please ensure server is running.
============================================================
TEST SUMMARY from 0:9:1 -> x.x.110.131
Test Started: Wed Jul 3 19:02:31 EDT 2019
Test Finished: Wed Jul 3 19:02:37 EDT 2019
============================================================
Latency: 16.761 ms
Lost pings: 3 %
Through-put: 0 Bits/second
Max MTU: 1500
Tx TCP Segs: 806
Rx TCP Segs: 758
TCP retrans: 16 %
Errored Segs: 0 %
Check remote server is running.
Link 0:9:1 is NOT SUITABLE for Remote Copy Use
============================================================ |
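Both failing runs end with "Could not connect with server. Please ensure server is running.", and the throughput phase in keenerb's first listing connects to TCP port 5001. One way to narrow this down is a plain TCP connect test from a host that can reach the RCIP subnets, run while `checkrclink startserver` is active on the far port. This is a minimal sketch under that assumption: it tests reachability from that host only, not from the array's RCIP port, and the idea that a filtering change during the outage is involved is a guess rather than something the thread confirms.

```python
import socket


def tcp_port_open(host: str, port: int = 5001, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, call `tcp_port_open("<far-end RCIP address>")` from each side. A `False` while the server is confirmed running points at something in the path (a firewall, ACL or switch config) rather than at the arrays themselves.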
Author: | keenerb [ Wed Jul 03, 2019 7:10 pm ] |
Post subject: | Re: RC in "Failed" status after network outage |
No resolution yet. I've got some professional assistance from a vendor scheduled for early next week; leadership is still unwilling to spring for actual HP support. |
Author: | keenerb [ Tue Nov 12, 2019 12:08 pm ] |
Post subject: | Re: RC in "Failed" status after network outage |
Professional services assistance never happened, but this article at least allowed me to clean up the mess. It didn't fix the connectivity issue, but it let me un-replicate the volumes and clean up snapshots and whatnot. https://community.hpe.com/t5/3PAR-Store ... crmidV7kuU Specifically, cli% setrcopytarget no_mirror_config <target array name> is what let me clean up the leftover RC pieces. |
All times are UTC - 5 hours