3Par/OEL7/Multipath Issue
Author:  steve_berry [ Wed Jun 26, 2019 3:32 pm ]
3Par/OEL7/Multipath Issue

Hi All,

We have a strange issue: when we simulate a failover, only 3 of the 4 paths come back.

The environment is IBM/Lenovo SR 560 servers running Oracle Linux 7.6. These are part of a RAC cluster but that doesn't come into play here.

Basically, we have two fiber cards connected to two different switches, giving 4 paths to each LUN (2 per fabric). When you pull one cable (or shut down one switch port), you see 2 active and 2 failed paths. This much is normal. When you plug the cable back in, we end up with 3 active paths and 1 failed.

Restarting multipathd drops the failed path completely, and only a reboot gets it back.

Can someone point us in the right direction here? Below is our multipath.conf file.

defaults {
    user_friendly_names yes
    find_multipaths yes
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^(hd|xvd|vd)[a-z]*"
    wwid "*"
    devnode "^asm/*" # for DBA
    devnode "ofsctl" # for DBA
}

devices {
    device {
        vendor "3PARdata"
        product "VV"
        features "0"
        hardware_handler "1 alua"
        path_selector "round-robin 0"
        #getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_grouping_policy "group_by_prio"
        prio "alua"
        failback "immediate"
        rr_weight "uniform"
        no_path_retry fail
        rr_min_io_rq 1
        path_checker "tur"
        detect_prio "yes"
        fast_io_fail_tmo 30
        dev_loss_tmo "infinity"
    }
}

blacklist_exceptions {
    wwid "360002ac0000000000000037c0002140b"
    wwid "360002ac0000000000000037d0002140b"

    wwid "360002ac0000000000000036e0002140b"
    wwid "360002ac0000000000000036f0002140b"

    wwid "360002ac0000000000000037a0002140b"
    wwid "360002ac000000000000003820002140b"
}

Author:  MammaGutt [ Thu Jun 27, 2019 2:04 am ]
Re: 3Par/OEL7/Multipath Issue

What persona is the host configured with?

Author:  steve_berry [ Thu Jun 27, 2019 8:41 am ]
Re: 3Par/OEL7/Multipath Issue

The persona is set to 2.

Author:  MammaGutt [ Thu Jun 27, 2019 1:21 pm ]
Re: 3Par/OEL7/Multipath Issue

That means that ALUA is correct.

Taking a quick look at the implementation guide and your multipath.conf, there seem to be a few differences. I suggest looking into those. I'm not an OEL expert, but the guide defines a "polling interval" under defaults, which might come into play (I'm not sure what the default value is).
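
As a rough sketch only, setting that option in the defaults section of the posted config would look something like the fragment below. The option name is polling_interval; the value 10 is a placeholder assumption, not a number confirmed from the guide, so check it against the implementation guide before using it.

defaults {
    # assumption: 10 seconds is an illustrative value, not a confirmed recommendation
    polling_interval 10
    user_friendly_names yes
    find_multipaths yes
}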

There are also some comments in SPOCK if you are using Cisco SAN switches.

https://support.hpe.com/hpsc/doc/public ... =c04448818

