Write Cache flushing on HBA for ZFS Virtual Machine on ESX
Any help with this would be greatly appreciated.
We're trialling a couple of ZFS (Nexenta 3.1.3) virtual machines to provide NFS storage to our VMware farm. I realise best practice is to use PassThrough to give the VM direct control of an HBA (and disks), but unfortunately our hosts (DL385 G6s) don't support it. We're therefore stuck with physical disks being presented to ESX, formatted as VMFS, and vmdks created and presented to the ZFS VM. At the moment we're also using the onboard RAID controllers (P410/512 w/ BBWC) and creating a RAID0 volume from each disk.
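For reference, the per-disk RAID0 volumes are created on the Smart Array along these lines, assuming hpacucli is available (e.g. in the ESX service console or via HP's offline bundle); the slot number and drive addresses are just examples, adjust for your box:

    # one RAID0 logical drive per physical disk (repeat for each drive)
    hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
    hpacucli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
    # verify the logical drives and cache settings
    hpacucli ctrl slot=0 show config detail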
I now need to add additional drives and would ideally buy HBAs (e.g. LSI 9211-8i) as these could be reused in a future PassThrough/bare-metal build.
QUESTION 1: Does anyone know if a "Cache Flush" command issued from ZFS to a vmdk disk would be passed through to the underlying disk (or how to test this)? If not, then I would need to disable the physical drive write caches, which I believe would kill performance.
QUESTION 2: Does anyone know if the physical drive write caches can be enabled (and survive a reboot) on drives connected via an HBA to ESX?
I'd probably keep the ZIL behind the RAID controller, so the main issue is the "Cache Flushes" issued when txgs are committed to the pool disks every 5s.
- 4 vCPU, 16GB RAM
- 1 zpool (3 mirrors of 2x 500GB 7.2k SAS drives, 1x Intel 320 L2ARC and 1x overprovisioned Intel 320 ZIL)
- dd 4k seq write - fdatasync (or dsync with sync disabled) = 110MB/s
- dd 4k seq write - dsync = 8MB/s
- iometer vm 4k seq write QD1 - (sync disabled) = 5.68MB/s
- iometer vm 4k seq write QD1 - (sync standard) = 3.2MB/s
- disabling the ZFS cache flush makes no difference, so the controller must ignore those commands (or they're not making it through ESX to the physical controller) - test commands sketched below.
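The tests above were along these lines (file paths are illustrative, and this assumes the GNU dd shipped with Nexenta; the nocacheflush toggle is the standard Solaris/Nexenta tunable):

    # 4k sequential writes: one flush at the end vs. a sync per write
    dd if=/dev/zero of=/volumes/pool1/ddtest bs=4k count=262144 conv=fdatasync
    dd if=/dev/zero of=/volumes/pool1/ddtest bs=4k count=262144 oflag=dsync

    # disable ZFS cache flushes on the fly (revert with W0t0)
    echo zfs_nocacheflush/W0t1 | mdb -kw
    # or persistently via /etc/system (takes effect after reboot):
    #   set zfs:zfs_nocacheflush = 1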
No, the cache flush is not passed through to the underlying disk. ESXi buffers the IO.
I'm not 100% sure of all the ESXi parameters, but there are some in ESXi which influence this:
BufferCache.SoftMaxDirty : Flush immediately if this many buffers are dirty (percent)
BufferCache.HardMaxDirty : Block writers if this many buffers are dirty (percent)
BufferCache.FlushInterval : Flush periodically at this interval (msec)
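These can be inspected and changed from the ESXi shell; on 5.x the options live under /BufferCache, while on 4.x the equivalent tool is esxcfg-advcfg (the value below is just an example):

    # ESXi 5.x: show and set the buffer cache options
    esxcli system settings advanced list -o /BufferCache/FlushInterval
    esxcli system settings advanced set -o /BufferCache/SoftMaxDirty -i 15
    # ESXi 4.x equivalent
    esxcfg-advcfg -g /BufferCache/FlushInterval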
The on-disk write cache normally doesn't survive a power loss, but modern enterprise-class SSDs
do have a supercap to buffer their internal DRAM until the data is written to stable storage.
So for the ZIL you should always use a good SSD; it's your data you'd lose.
The best SSDs are:
There is also a cheaper one with a supercap:
Heiko, thanks. So on the basis that the cache flush isn't passed through from ESX, I'd conclude that you have to use controllers with BBWC if you're going to run a ZFS VSA on ESX without passthrough.
Answer 2: Sorry, I actually meant for the write cache setting to stay enabled after a reboot rather than for the contents of the cache to survive. But this is irrelevant if we can't use HBAs in this setup.
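On the existing P410, for comparison, that setting is stored in the controller config and does survive reboots; it can be toggled with hpacucli like this (slot number is an example):

    # enable the disks' own write caches behind the P410
    hpacucli ctrl slot=0 modify drivewritecache=enable
    # check the current setting
    hpacucli ctrl slot=0 show | grep -i "drive write cache"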
Hi, the best approach is to use no RAID controllers at all for the datastore;
map the disks to the NexentaStor VM with passthrough or RDM.
Think about what happens if your server loses power in this configuration: the ESX buffers are lost, but ZFS thinks everything was written. So you should put a UPS in front of the server!
Heiko, thanks. I agree it's best to use HBAs with passthrough if available. I hadn't really considered RDM, so I might research that option a bit more.
btw, the following posts seem to indicate that ESX doesn't cache writes from VMs, only from userland apps, so I think integrity should be OK. (The server's got dual PSUs on redundant feeds in a Tier 4 datacenter, but I still want it to survive if power was ever lost - hard power reset etc.)
That's why I recommend using no RAID controller with ZFS and presenting the disks as JBODs to the Nexenta VM!
RAID controllers are too slow and cause problems when ZFS has no direct access to the disks!
So in your configuration my recommendation would be to have RDM mappings of every single disk into the VM!
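If you try this, a physical-mode RDM pointer is created per disk roughly like this from the ESX(i) console (the device name and datastore path are examples; the resulting .vmdk is then added to the VM):

    # find the physical device names
    ls /vmfs/devices/disks/
    # create a physical compatibility (pass-through) RDM for one disk
    vmkfstools -z /vmfs/devices/disks/naa.600508b1001c0123456789abcdef01 \
      /vmfs/volumes/datastore1/nexenta/disk1-rdm.vmdk

Physical mode (-z) passes SCSI commands through to the device, which is exactly what matters for your cache-flush question; virtual mode (-r) does not.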
Could you test it with RDM?