I already have tickets in with HP for these issues, but figured I'd reach out to this community just in case anyone has some insight.
Our V400 was upgraded to 3.1.3 MU2 on Wednesday evening. Later that night, our monitoring software alerted that the RCIP ports were no longer available. I started up a CLI session and determined that the ports were up and Remote Copy was transferring properly. I could not, however, ping the Remote Copy ports. Now, I don't know if I was ever able to ping them, but it seemed strange to me.
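For anyone who wants to check the same thing, the verification I did from the CLI was roughly the following (commands from memory, so double-check the flags against the CLI help on your release):

```shell
# Show the RCIP ports, their state, and the configured IPs/gateways
showport -rcip

# Show Remote Copy status: targets, link state, and group sync status
showrcopy
```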
Logging into the IMC, I went to Systems --> Ports --> Remote Copy. Again, everything looked fine in terms of IPs, gateways, etc. I then used the ping function from node 0's RCIP port to ping the gateway: success. Its replication partner's node 0 RCIP port: success. Then I tried pinging my computer's IP. This is where it got fun. Immediately, my IMC froze up, and my CLI sessions shut down. I killed the IMC and tried to log back in, but the 3PAR array wasn't reachable. Pinged the management port... nothing. I asked a coworker, and he could get to the array fine. My PC, however, was completely locked out from communicating with the array. Thinking maybe something had gone wonky with my network adapter, I reset it, flushed my ARP cache, and tried again... no dice.
I logged onto a server where I keep a bunch of management tools, and I was able to access the array. Now everything looked good, except Remote Copy link 0 (node 0 here to node 0 remote) was down. Now's a good time to mention I was running a constant ping to the management IP from my desktop, with 100% packet loss. I disabled RCIP port 0 on the 3PAR, thinking I'd reset it. Instantly, my computer could ping the management port again (which is on an entirely different VLAN from RCIP). Sure enough, I was able to log into the IMC and CLI. I re-enabled port 0, Remote Copy indicated the link was up, data began transferring, and I was still able to reach the 3PAR from my PC.
Figuring it might have been a fluke or some routing issue, I tried the same thing, but this time pinging my management server (which is on the same VLAN as the 3PAR management port) from the Remote Copy port. Same thing! My constant ping to the 3PAR immediately dropped, and resetting the RCIP port restored my access.
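For what it's worth, the array-side ping that the IMC performs can also be run from the CLI. A rough sketch; the exact controlport syntax may vary by release, and the IP and node:slot:port below are placeholders, not my real values:

```shell
# Ping an address from a specific RCIP port (node:slot:port);
# 192.0.2.50 and 0:3:1 are example placeholders only
controlport rcip ping 192.0.2.50 0:3:1
```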
Okay, that's issue number 1, and at least Remote Copy keeps working despite the ports not responding to pings (except when performed from the core). Issue number 2 is even worse: despite a CPG having plenty of room (>25TB), a volume was unable to allocate space.
Event id:     7252821
Node:         1
Cust Alert:   Yes
Svc Alert:    Yes
Severity:     Critical
Event type:   TP VV allocation failure
Alert ID:     554
Msg ID:       270007
Component:    Virtual Volume 31556 3PAR-Volume-Name CPG 11 CPG_Name
Short Dsc:    TP VV 3PAR-Volume-Name allocation failure
Event String: Thin provisioned VV 3PAR-Volume-Name unable to allocate SD space from CPG CPG_Name
This directly impacted the underlying hosts. Luckily, our VMware admin was in the middle of a Storage vMotion, so no data was lost, but it could have been. I used DO (Dynamic Optimization) on the affected volume to tune it back into the same CPG. That grew the free space on the CPG, essentially bypassing the automatic SD space allocation process, and the VV was then able to grow as required... for a while. Once that free space was consumed, the error reappeared, and I had to DO the volume again to create wiggle room.
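In case it helps anyone hitting the same alert, the workaround looked roughly like this. The VV/CPG names are the placeholders from the event above, and you should verify the tunevv syntax on your release before running it:

```shell
# Check the free/estimated space the CPG reports
showspace -cpg CPG_Name

# DO (Dynamic Optimization): re-tune the TPVV's user space back into
# the same CPG, which freed enough space for the VV to keep growing
tunevv usr_cpg CPG_Name 3PAR-Volume-Name
```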
So yeah, fun times. Anyone have any ideas or have an array on 3.1.3 MU2 that they can check for the same RCIP port behavior?
Thanks, Adam
::edit::
Got an update from HP on the CPG issue.
L2 investigation yields that we currently have a SW glitch that is preventing thin-provisioned volumes from grabbing more space from the CPG occasionally, even though there is free space available. The issue is under investigation.
Options at this time are:
1) convert your volumes to fully provisioned, which might not be feasible for you
2) upgrade your current OS 3.1.3MU2 to 3.2.1MU2, as we have not seen the issue in the 3.2.1 code base
There's no way we could accommodate fully provisioning our VVs and we, as an organization, try to steer clear of the bleeding edge, so neither option is really suitable for us. Thanks HP!