iSCSI latency messages (problem with NMS?)
While waiting for an answer about 3.1.3 availability, I moved one VM (CentOS 6.2) to the iSCSI LUN. I fired it up and started doing things in it, and noticed messages like this in the VMware/ESXi console:
2012-03-20T13:20:04.277Z cpu2:2050)WARNING: ScsiDeviceIO: 1218: Device naa.600144f0a874410000004f5b89590001 performance has deteriorated. I/O latency increased from average value of 6753 microseconds to 834157 microseconds.
2012-03-20T13:20:04.333Z cpu4:2052)WARNING: ScsiDeviceIO: 1218: Device naa.600144f0a874410000004f5b89590001 performance has deteriorated. I/O latency increased from average value of 6753 microseconds to 1721544 microseconds.
2012-03-20T13:20:04.621Z cpu7:2055)ScsiDeviceIO: 1198: Device naa.600144f0a874410000004f5b89590001 performance has improved. I/O latency reduced from 1721544 microseconds to 336203 microseconds.
2012-03-20T13:20:28.237Z cpu0:2056)ScsiDeviceIO: 1198: Device naa.600144f0a874410000004f5b89590001 performance has improved. I/O latency reduced from 336203 microseconds to 65804 microseconds.
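Those microsecond figures are hard to eyeball, so here's a quick sketch for converting them to milliseconds (assumes a GNU-ish sed/awk; on ESXi 5 the live log these lines come from is /var/log/vmkernel.log):

```shell
# Pull the "to N microseconds" figure out of ScsiDeviceIO log lines on stdin
# and print it in milliseconds, one value per line.
to_ms() {
  sed -n 's/.*to \([0-9][0-9]*\) microseconds.*/\1/p' |
    awk '{ printf "%.1f ms\n", $1 / 1000 }'
}

# Example with one of the lines above:
echo 'WARNING: ScsiDeviceIO: 1218: ... I/O latency increased from average value of 6753 microseconds to 834157 microseconds.' | to_ms
# prints: 834.2 ms
```

So the worst spike above is ~1.7 seconds of I/O latency, which is what ESXi is complaining about.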
...and so on, ad infinitum. I also noticed in the Nexenta SSH window that the load factor was over 2. This is a lightly loaded box, and the SAN is a 3.2 GHz Pentium D (dual-core) with 8 GB RAM; there is no way it should be overloaded. Googling turned me to a thread in this forum from back around 3.0.4 that mentioned multiple NMS processes. I made a change per that thread and the load factor went down to 0.7 or so. Still seeing latency complaints from ESXi, but not as many. Gotta say this doesn't give me a warm and fuzzy :(
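For anyone hitting the same thing, checking for duplicate NMS processes is roughly this (NexentaStor is Solaris-based, but these commands are portable; the exact process name may vary by release):

```shell
# List nms-related processes; more than one instance may indicate the
# duplicate-NMS issue from the 3.0.4-era thread. The bracket trick keeps
# grep from matching itself; '|| true' tolerates zero matches.
ps -ef | grep '[n]ms' || true
# Check the load averages; ~2 on a mostly idle single-purpose SAN is suspect.
uptime
```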
Wondering too if this could be exacerbated by having had a management GUI window open in the background. Killed that, and the load dropped to 0.44. On the other hand, I then started a bulk write (8 GB) to a test file in the VM, so it goes to the iSCSI LUN. Messages started streaming in ESXi, and the Nexenta load shot up to 1.68. This is a vanilla 1 Gb link, so I can't understand why a config like this can't handle that :(
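For reference, the bulk write was just a big sequential write inside the guest, something like the sketch below (the path is arbitrary — any file on the LUN-backed filesystem works; COUNT is trimmed here for a quick run, the actual test used 8192 x 1 MiB = 8 GiB):

```shell
# Sequential bulk-write test inside the guest; the file lands on the
# iSCSI-backed datastore. conv=fsync forces the data out so the write
# actually hits the LUN rather than sitting in the page cache.
TESTFILE=/tmp/iscsi-writetest   # hypothetical path for illustration
COUNT=64                        # 64 x 1 MiB here; use 8192 for the full 8 GiB run
dd if=/dev/zero of="$TESTFILE" bs=1M count=$COUNT conv=fsync
ls -l "$TESTFILE"
```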
Hmmm, over on the VMware forum, people have been commenting that it may be an ESXi 5 software-initiator bug. Disabling delayed ACK on the initiator has helped some folks. I can also try using the QLogic iSCSI HBA instead of the software initiator and see if that makes a difference. Can't try that until later this evening, though.