Question about Used Space and refreservation

Added by David Marker about 1 year ago

We're using Nexenta in our VMware production environment. We recently ran into an issue where our SAN filled up to the point that our auto-sync jobs didn't have enough space left to create their snapshots, so they failed. Bear in mind that we've come nowhere close to filling our VMFS datastores. The ZFS dataset sizes were only a couple hundred GB. We've found that in our particular case, usedbyrefreservation was using all or almost all of our zvol. Is this the typical case when you put a VMFS datastore on a zvol? The calculation for used space includes usedbyrefreservation + usedbydataset. In my mind, for our particular use case, this overstates the used space reported to the main zpool and under-reports the amount of free space left for new zvols, snapshots, and other space-consuming objects.
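
As a rough illustration of the accounting described above, here is a minimal Python sketch. It assumes a simplified model in which a zvol with a refreservation is charged for whichever is larger, the data it references or the reservation itself, with snapshot space added on top. The function name and the 2 TiB / 200 GiB figures are hypothetical and are not taken from the poster's system.

```python
def charged_space_gib(referenced_gib, refreservation_gib, snapshots_gib=0.0):
    """Space a zvol is charged against its pool, in GiB (simplified model)."""
    usedbydataset = referenced_gib
    # Unwritten part of the reservation still counts as used.
    usedbyrefreservation = max(0.0, refreservation_gib - referenced_gib)
    return usedbydataset + usedbyrefreservation + snapshots_gib

# Hypothetical 2 TiB zvol holding 200 GiB of VMFS data.
thick = charged_space_gib(200, 2048)  # refreservation equal to volsize
thin = charged_space_gib(200, 0)      # no refreservation (sparse zvol)
print(thick, thin)
```

Under this model the pool sees the full 2 TiB as used as soon as the refreservation is set, no matter how little the datastore has written, which matches the behaviour described in the question.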

Am I incorrect in my thinking? Would it make more sense to drop into !bash and create our zvols using the reservation option instead? If we create a VMFS datastore, does that guarantee that the full datastore amount of space is available for the datastore?
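
To make the reservation versus refreservation question concrete, here is a small sketch based on the documented difference between the two properties: reservation counts the dataset together with its descendents (snapshots included), while refreservation counts only the data the dataset itself references. The model is a simplification, and the function names and sizes are made up for illustration.

```python
def charged_with_reservation(referenced, snapshots, reservation):
    """reservation covers the dataset plus its snapshots (all sizes in GiB)."""
    return max(referenced + snapshots, reservation)

def charged_with_refreservation(referenced, snapshots, refreservation):
    """refreservation covers only the dataset; snapshots are charged on top."""
    return max(referenced, refreservation) + snapshots

# Hypothetical 2048 GiB zvol: 700 GiB referenced, 100 GiB held by snapshots.
print(charged_with_reservation(700, 100, 2048))     # 2048
print(charged_with_refreservation(700, 100, 2048))  # 2148
```

In this model refreservation costs the pool more, but snapshots cannot eat into the space promised to the datastore, whereas with reservation they share the same guarantee.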


Replies

RE: Question about Used Space and refreservation - Added by Ryan W about 1 year ago

When I set refreservation on one of my ZVOLs (I did it just now) the USED portion of 'zfs list' jumped from 100GB to 300GB. So the accounting should be right.

What I've always had a problem with is the fact that the datasets status page uses the zpool volume sizes, which are the raw figures before accounting for parity, metadata and redundancy. I've raised this issue with Nexenta a number of times, but their answer at present is that they use these numbers because that's what licensing is based on (raw capacity), even though your actual capacity is less.

Perhaps that's where you ran out of space? Were you eyeballing AVAIL/Free in the data management GUI? That figure did not change when I set refreservation.

root@calculon:/volumes# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
raidpool  1.63T   913G   755G    54%  1.00x  ONLINE  -
syspool    136G  19.3G   117G    14%  1.00x  ONLINE  -
tank      3.25T   282G  2.97T     8%  1.00x  ONLINE  -
root@calculon:/volumes# zfs set refreserv=300GB raidpool/esxi_zvol
root@calculon:/volumes# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
raidpool  1.63T   913G   755G    54%  1.00x  ONLINE  -
syspool    136G  19.3G   117G    14%  1.00x  ONLINE  -
tank      3.25T   282G  2.97T     8%  1.00x  ONLINE  -
root@calculon:/volumes# zfs set refreserv=none raidpool/esxi_zvol
root@calculon:/volumes# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
raidpool  1.63T   913G   755G    54%  1.00x  ONLINE  -
syspool    136G  19.3G   117G    14%  1.00x  ONLINE  -
tank      3.25T   282G  2.97T     8%  1.00x  ONLINE  -
root@calculon:/volumes# zfs set refreserv=300GB raidpool/esxi_zvol
root@calculon:/volumes# zfs list raidpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
raidpool   757G  64.2G    60K  /volumes/raidpool
root@calculon:/volumes# zfs set refreserv=none raidpool/esxi_zvol
root@calculon:/volumes# zfs list raidpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
raidpool   457G   364G    60K  /volumes/raidpool
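
The transcript above can be summarized with a little arithmetic. This sketch hard-codes the figures printed by zfs list and zpool list (in GiB, as rounded by the tools) and compares the state with and without the 300GB refreservation. The dictionary keys are just labels chosen for this example.

```python
# Figures read from the transcript, in GiB as printed by the tools.
with_refres = {"zfs_used": 757.0, "zfs_avail": 64.2, "zpool_free": 755.0}
without_refres = {"zfs_used": 457.0, "zfs_avail": 364.0, "zpool_free": 755.0}

# Change caused by setting the refreservation, rounded to one decimal place.
deltas = {k: round(with_refres[k] - without_refres[k], 1) for k in with_refres}
print(deltas)
```

USED rises by 300 and AVAIL drops by about the same amount in zfs list, while FREE in zpool list does not move at all, which is why a figure taken from zpool list can look healthy while the datasets are nearly out of space.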

RE: Question about Used Space and refreservation - Added by David Marker about 1 year ago

No, it was definitely an issue where ZFS reported only a few GB of space left in the main pool. After looking at our numbers again, we suspect that somewhere among the various auto-syncs back and forth between our production cluster and replication cluster during the testing phase, the usedbyrefreservation value was somehow increased or corrupted. The additional zvol we added for backups after we went into production is fine and reporting correctly. We've svMotioned a couple of machines to a newly created replacement zvol and everything appears to be reporting correctly now.

RE: Question about Used Space and refreservation - Added by David Marker about 1 year ago

Well, I have apparently spoken too soon. Our box crashed overnight during an auto-sync job and now we're having the issue crop up again. Here's what our zvol looks like:

root@I6K-Left:/volumes# zfs get all JJC/ZVOL01
NAME        PROPERTY              VALUE                  SOURCE
JJC/ZVOL01  type                  volume                 -
JJC/ZVOL01  creation              Wed Dec  7 10:04 2011  -
JJC/ZVOL01  used                  2.24T                  -
JJC/ZVOL01  available             4.63T                  -
JJC/ZVOL01  referenced            703G                   -
JJC/ZVOL01  compressratio         1.18x                  -
JJC/ZVOL01  reservation           none                   local
JJC/ZVOL01  volsize               2T                     local
JJC/ZVOL01  volblocksize          64K                    -
JJC/ZVOL01  checksum              on                     default
JJC/ZVOL01  compression           on                     local
JJC/ZVOL01  readonly              off                    default
JJC/ZVOL01  copies                1                      default
JJC/ZVOL01  refreservation        2T                     local
JJC/ZVOL01  primarycache          all                    default
JJC/ZVOL01  secondarycache        all                    default
JJC/ZVOL01  usedbysnapshots       1.37G                  -
JJC/ZVOL01  usedbydataset         703G                   -
JJC/ZVOL01  usedbychildren        0                      -
JJC/ZVOL01  usedbyrefreservation  1.55T                  -
JJC/ZVOL01  logbias               latency                default
JJC/ZVOL01  dedup                 off                    default
JJC/ZVOL01  mlslabel              none                   default
JJC/ZVOL01  sync                  standard               default
JJC/ZVOL01  nms:swap              no                     local
JJC/ZVOL01  nms:description       JJC 1st zvol           local

The usedbyrefreservation and usedbydataset values were adding up to exactly 2TB until the system crashed. This may be turning into more of a Help issue than a discussion.
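
To put numbers on that, here is a small sketch that converts the values from the zfs get output above to GiB and compares the sum of usedbydataset and usedbyrefreservation with volsize. The to_gib helper is a hypothetical convenience that only understands the G and T suffixes appearing in this output. The inputs are the rounded values as printed, so the results are approximate.

```python
UNIT_GIB = {"G": 1.0, "T": 1024.0}

def to_gib(value):
    """Convert a zfs-style size such as '703G' or '1.55T' to GiB."""
    if value[-1].isdigit():  # plain number without a suffix, e.g. "0"
        return float(value)
    return float(value[:-1]) * UNIT_GIB[value[-1]]

# Values copied from the `zfs get all JJC/ZVOL01` output.
props = {
    "volsize": "2T",
    "usedbydataset": "703G",
    "usedbyrefreservation": "1.55T",
}

total = to_gib(props["usedbydataset"]) + to_gib(props["usedbyrefreservation"])
excess = total - to_gib(props["volsize"])
print(f"dataset + refreservation: {total:.1f} GiB")
print(f"over volsize by:          {excess:.1f} GiB")
```

With the rounded figures above, the two properties sum to roughly 242 GiB more than the 2T volsize, which is the discrepancy that appeared after the crash.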

RE: Question about Used Space and refreservation - Added by Linda Kateley about 1 year ago

If you have a support contract, it might be a good time to use it. This appears to be a bug. You can also report it as a bug in the tool as a community member.

Do you have any log data from the crash? Crashes are typically caused by failed hardware.

RE: Question about Used Space and refreservation - Added by David Marker about 1 year ago

Yes, we have gold support. I've already got another ticket open with them regarding an issue with auto-sync. I last commented in that issue on Nov. 28th and haven't heard back since then. Regardless, I'll put in another ticket and see what happens.

RE: Question about Used Space and refreservation - Added by Linda Kateley about 1 year ago

David, if you have a ticket number, can you send it to me by email and I will follow up? linda.kateley@nexenta.com