The company I am working for recently had an issue, and an RCA followed. The ESXi environment suffered serious iSCSI latency, to the point that ESXi hosts hung. The HPE 3PAR 7400 reported no issues, and in fact its performance went up slightly because of the reduced workload from the ESXi environment it supports.
HPE 3PAR Central did a full analysis of the array and only advised that it was outside of supported firmware levels.
One graph has now been singled out. It shows iSCSI utilisation avgbusy running consistently at around 96-98% until the exact time of the issue, when it dropped to around 76-78%.
It is my understanding that this is purely a metric of host port activity, where any traffic at all is registered as the port being busy.
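To make that interpretation concrete, here is a minimal Python sketch of how I read a busy-time style metric. It rests on my own assumption, not on anything HPE has documented or confirmed: that avgbusy is the percentage of a sampling window during which the port has at least one I/O outstanding. The function name `busy_percent` and the interval data are hypothetical and purely illustrative.

```python
def busy_percent(io_intervals, window_start, window_end):
    """Percent of the window with at least one I/O outstanding.

    ASSUMPTION: this models avgbusy as a busy-time metric. It is not
    HPE's documented definition of the counter.
    io_intervals is a list of (start, end) times for individual I/Os.
    """
    # Clip each I/O to the sampling window and sort by start time.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in io_intervals
        if end > window_start and start < window_end
    )

    busy = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Gap before this I/O: close the previous busy stretch.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping I/O: extends the stretch, adds no extra busy time.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start

    return 100.0 * busy / (window_end - window_start)


# One I/O at a time, almost back to back, over a 10 second window.
light = [(i, i + 0.97) for i in range(10)]
# Several overlapping I/Os covering the same stretch of time.
heavy = [(0, 9.7), (1, 5), (2, 9.7)]

print(busy_percent(light, 0, 10))
print(busy_percent(heavy, 0, 10))
```

Under that assumption both workloads come out at roughly 97% busy, even though one has a single I/O in flight and the other has several. If avgbusy works this way, a high value says the port was rarely idle, not that it was saturated.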
They have four 10 Gb iSCSI ports to the array. It is my belief that a single iSCSI port became congested, and that the Juniper switches had an issue and dropped the port; there are no logs available from the switches. Round Robin was also set up incorrectly on a number of hosts. The problem is that the RCA report is being submitted with the iSCSI avgbusy graph front and centre as the major contributor, if not the actual root cause.
HPE cannot even tell me exactly what the graph means or what it represents.
As I said, it is my understanding that iSCSI utilisation avgbusy is only an indicator of the port being active, and has no bearing on array performance.
Any references would be most welcome, as I need to present something in black and white.