Hello all,
first of all, thanks for having this forum here.
It has already helped a lot over time.
We are currently testing a new 8440 "Cluster" for VMware only.
While doing acceptance tests, we encountered some behaviour that we think is quite strange and which is "not amusing" for the customer.
The setup is as follows:
System_A (primary) ---> RCG ---> System_B (secondary)
LUNs in the RCG are presented to Host_A from both System_A and System_B.
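In case it helps, the RCG state and the exports were sanity-checked from the 3PAR CLI with something along these lines (host name as above, nothing array-specific):

Code:
# 3PAR CLI, on System_A and System_B
showrcopy groups        # remote copy group state, role and targets
showvlun -host Host_A   # VLUNs exported to the ESXi host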
When simulating a power outage of the secondary 3PAR system (i.e. the array that is the target of the remote copy group) by pulling all of its power cords,
the ESXi hosts that only have standby paths to that system (the replicated read-only LUNs are presented from it)
experience an APD (all paths down) for the LUNs that live on the primary array.
The APD itself is quite short (1-2 s while there is no I/O, maybe longer when there is more I/O), but VMs running on those LUNs experience an I/O freeze of about 15 s.
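For what it's worth, this is roughly the kind of check we run on the hosts to watch the APD behaviour (the naa device ID below is just a placeholder):

Code:
# ESXi shell
esxcli system settings advanced list -o /Misc/APDHandlingEnable   # 1 = APD handling enabled (default)
esxcli system settings advanced list -o /Misc/APDTimeout          # default 140 s
esxcli storage core path list -d naa.xxxxxxxxxxxxxxxx             # active/standby state per path
grep -i apd /var/log/vmkernel.log                                 # APD enter/exit events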
The main question is: is this normal? Working as designed? Is anyone else seeing the same phenomenon?
To clarify things here:
LUNs that are active on the powered-off array and have to be switched over (failed over) to the other site (read: System_A "breaks" the RCG, makes the copy LUNs active, and accepts I/O) usually do _not_ experience this delay.
It is the LUNs that are active on the primary array - the one that is _not_ powered off - that are having this problem.
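Just to spell out what I mean by "breaks the RCG and makes the copy LUNs active": whether it happens automatically or by hand, it boils down to the remote copy group being failed over on the surviving array, roughly like this (group name is a placeholder):

Code:
# 3PAR CLI on the surviving array (System_A in this test)
setrcopygroup failover <rcg_name>   # promote the secondary copies to primary / read-write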
I just cannot wrap my head around this.
An HPE case is in the works, same with VMware.
The best practice guide(s) have been followed, with the exception of one "recommendation" (dynamic FC-ID assignment should be "off").
Specs:
Code:
3PAR arrays: 2x 8440, flash only
3PAR OS: 3.2.2 (MU4)
Host persona: 11 / SATP rule for round robin with iops=1 in place on ESXi (sketched below)
ESXi: 5.5u3a & 6.0.0u3
HBAs: QLogic & Emulex (CNAs & "classic" HBAs), recommended FW levels
SAN: Brocade G620 FOS 8.0.1b (1st fabric) & 5100 FOS 7.4.1d (2nd fabric)
Distance between primary/secondary: 10 km, 8x dark fibre
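For completeness, the SATP rule mentioned in the specs was put in place roughly along the lines of the usual HPE/VMware recommendation (the description string is just a placeholder):

Code:
# ESXi shell - claim 3PAR VVs with ALUA, round robin and one I/O per path switch
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O "iops=1" \
    -c "tpgs_on" -V "3PARdata" -M "VV" -e "3PAR custom ALUA rule"
# verify the rule is listed
esxcli storage nmp satp rule list | grep 3PARdata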
Thankful for any input.
TIA & Cheers,
mlu