NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain

Added by Edmund White over 2 years ago

I have a Sun x4540 storage server running NexentaStor Enterprise. It's serving NFS over 10GbE CX4 to several VMware vSphere hosts. There are 30 virtual machines running.

For the past few weeks, I've had random crashes spaced 10-14 days apart. This system used to run OpenSolaris and was stable in that arrangement. The crashes trigger the automated system recovery feature on the hardware, forcing a hard system reset.

Here's the output from the mdb debugger:

panic[cpu5]/thread=ffffff003fefbc60: 
Deadlock: cycle in blocking chain
ffffff003fefb570 genunix:turnstile_block+795 ()
ffffff003fefb5d0 unix:mutex_vector_enter+261 ()
ffffff003fefb630 zfs:dbuf_find+5d ()
ffffff003fefb6c0 zfs:dbuf_hold_impl+59 ()
ffffff003fefb700 zfs:dbuf_hold+2e ()
ffffff003fefb780 zfs:dmu_buf_hold+8e ()
ffffff003fefb820 zfs:zap_lockdir+6d ()
ffffff003fefb8b0 zfs:zap_update+5b ()
ffffff003fefb930 zfs:zap_increment+9b ()
ffffff003fefb9b0 zfs:zap_increment_int+68 ()
ffffff003fefba10 zfs:do_userquota_update+8a ()
ffffff003fefba70 zfs:dmu_objset_do_userquota_updates+de ()
ffffff003fefbaf0 zfs:dsl_pool_sync+112 ()
ffffff003fefbba0 zfs:spa_sync+37b ()
ffffff003fefbc40 zfs:txg_sync_thread+247 ()
ffffff003fefbc50 unix:thread_start+8 ()

What's going on?


Replies

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Jan Krupka about 1 year ago

Hi,

Same issue here last night, for the first time.

Jun  1 00:07:14 nexenta1 genunix: [ID 783603 kern.notice] Deadlock: cycle in blocking chain
Jun  1 00:07:14 nexenta1 unix: [ID 100000 kern.notice]
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863570 genunix:turnstile_block+795 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff00118635d0 unix:mutex_vector_enter+261 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863630 zfs:dbuf_find+5d ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff00118636c0 zfs:dbuf_hold_impl+59 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863700 zfs:dbuf_hold+2e ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863780 zfs:dmu_buf_hold+8e ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863820 zfs:zap_lockdir+6d ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff00118638b0 zfs:zap_update+5b ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863930 zfs:zap_increment+9b ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff00118639b0 zfs:zap_increment_int+68 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863a10 zfs:do_userquota_update+8a ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863a70 zfs:dmu_objset_do_userquota_updates+de ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863af0 zfs:dsl_pool_sync+112 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863ba0 zfs:spa_sync+37b ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863c40 zfs:txg_sync_thread+247 ()
Jun  1 00:07:14 nexenta1 genunix: [ID 655072 kern.notice] ffffff0011863c50 unix:thread_start+8 ()

This machine serves NFS to VMware ESXi over an aggregate of four 1 GbE links.

Edmund, do you have any information about this crash type?

Regards Jan

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

I just set up Nexenta on my X4540 and am experiencing the exact same issues! Does anyone have any information about this? I even had Oracle out to look at the hardware, but no hardware errors have been identified.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

Is there anyone here with a support contract where we could take a deeper look at this? It looks like it should be a hardware issue, but not with all the ZFS errors. I remember there used to be an issue with the controllers on those boxes. Have you looked to see if there are any firmware upgrades?

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Dan Swartzendruber about 1 year ago

Huh? Linda, how is this hardware related? The kernel is complaining of a kernel lock botch! I've seen similar bug reports on other sites (not related to Nexenta).

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

If I remember correctly (and it has been several years since I've worked with a Thumper), the old bug was that ZFS sends a zio to be queued on the controller and the controller doesn't want to take it, or the logic was such that the zio contains something the controller doesn't understand.

I will do some research on it and see what we have on it.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

Thank you for the help. I upgraded the BIOS/ILOM to the latest firmware release for the X4540. I also decided to reinstall Nexenta 3.1.2, as I still haven't been able to put this storage array into production. Now, during the second wizard, just before pool creation (right after the list of disks in the system), I get a kernel panic. I can reproduce the kernel panic if I do a zpool import as well. At one time I did have pools set up on this filer, but I have deleted them. Is there something corrupt on the disks that's causing the panic at this stage of the wizard?

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

I did a quick internet search and found a few reports of this when importing OpenSolaris pools into Nexenta. Most were solved by reworking the pools to be Nexenta pools.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

The thing is I'm not trying to import the pools. When I do a zpool list, no pools (besides sysvol) come up. This has been very strange indeed...

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Dan Swartzendruber about 1 year ago

Linda Kateley wrote:

If I remember correctly (and it has been several years since I've worked with a Thumper), the old bug was that ZFS sends a zio to be queued on the controller and the controller doesn't want to take it, or the logic was such that the zio contains something the controller doesn't understand.

I will do some research on it and see what we have on it.

I'd love to see what you turn up. No offense, but the HW/controller thing doesn't make sense to me. I'm a kernel engineer, and have to deal with lock hierarchies and such all the time, and I don't care what HW/firmware issues you have - the kernel should not be hitting an assertion like this.
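As a rough illustration of what a "cycle in blocking chain" means, here is a toy sketch in Python, not the kernel's code. It follows "thread waits for lock, lock is owned by thread" links and reports when the chain loops back on itself. The function name and the thread and lock names are made up for the example; nothing here says which locks were actually involved in the panic above.

```python
def find_blocking_cycle(start, waits_for, owner_of):
    """Follow the blocking chain from `start`.

    waits_for: maps a blocked thread to the lock it is waiting on.
    owner_of:  maps a lock to the thread currently holding it.
    Returns the list of threads forming the chain if it revisits a
    thread (a deadlock), or None if the chain ends at a running thread.
    """
    chain = [start]
    thread = start
    while thread in waits_for:
        owner = owner_of.get(waits_for[thread])
        if owner is None:
            return None          # lock is free, nobody to wait behind
        chain.append(owner)
        if owner in chain[:-1]:
            return chain         # chain revisits a thread: cycle
        thread = owner
    return None                  # reached a thread that is not blocked


# Hypothetical names: two threads taking two locks in opposite order.
waits_for = {"sync_thread": "lock_a", "other_thread": "lock_b"}
owner_of = {"lock_a": "other_thread", "lock_b": "sync_thread"}
print(find_blocking_cycle("sync_thread", waits_for, owner_of))
```

With a consistent lock order the second thread would never hold lock_a while waiting on lock_b, the walk would end at a runnable thread, and the function would return None. That is why a panic like this points at lock ordering in software rather than at the disks.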

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

I am thinking you might have a bad drive. Can you run fmadm faulty and fmdump -V?

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

I did have a bad drive last week and replaced it with a drive that was once formatted as NTFS. I did this while the server was powered off, just before rebuilding Nexenta. I had no need for the old zpools, so I didn't take any steps to prep the drive.

Could this be what's causing my problem? Is it scanning the drives for exported zpools and coming across this NTFS-formatted drive in the slot that was once part of the exported zpool?

If so, how can I reinitialize all the drives so no stale exported or corrupted zpools exist?

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Dan Swartzendruber about 1 year ago

If this did prevent a crash, it's a band-aid, not a fix. There is a kernel or driver bug.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

The problem I'm having now is new; the problem I was having before I replaced the drive and reinstalled Nexenta hasn't been resolved yet. I can't get back to a configured system to do additional testing until I get past the kernel panic when going through the wizard at pool creation time.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

Yes, I will report it as a bug if the findings are conclusive. I have sent this issue back to engineering for review.

It would be nice if this were under support, so we could look at the core (if you have it) and examine it further. Maybe you can just upload the core to the downloads page?

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

Unfortunately, I can't. The filer is in a classified facility, so I can't even export logs. It's very frustrating, I assure you.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmund White about 1 year ago

I solved my original problem by wiping and rebuilding the pools. They were originally imported from an OpenSolaris instance on the same hardware. Nexenta did not behave well until I wiped the pools and formatted with the proper zpool versions. Further issues were caused by NexentaStor's poor handling of the SATA drives in the x4540 unit. Support essentially told me that SATA was not supported/recommended (even though I was using purpose-built ZFS hardware), so I abandoned the x4540 and built a new SAS-based solution on HP ProLiant hardware.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmond DeMattia about 1 year ago

Can you tell me how you wiped the pools and formatted the disks?

At this point, there are no pools (that I can see) on the disks. The pools that were on the disks were created in Nexenta, but with the repeated kernel panics, I'm sure everything is corrupt.

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Edmund White about 1 year ago

zpool destroy... and then I recreated everything. Also see the answer on the related post on Server Fault.
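For disks that no longer show a pool to destroy, it may help to know where stale pool metadata would live. This is a minimal sketch, assuming the standard ZFS on-disk layout of four 256 KiB vdev labels, two at the front of the device and two at the end. The function name is made up, it only computes offsets and touches no disk, and the offsets are relative to whatever device or slice ZFS was given, which is usually not the raw disk.

```python
LABEL_SIZE = 256 * 1024   # each ZFS vdev label is 256 KiB
LABEL_COUNT = 4           # two at the front, two at the end


def zfs_label_offsets(device_size_bytes):
    """Return the byte offsets of the four vdev labels (L0..L3)."""
    # ZFS rounds the usable size down to a multiple of the label size.
    usable = device_size_bytes - device_size_bytes % LABEL_SIZE
    if usable < LABEL_COUNT * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    front = [i * LABEL_SIZE for i in range(2)]
    back = [usable - (LABEL_COUNT - i) * LABEL_SIZE for i in range(2, 4)]
    return front + back


# Example with a hypothetical 1 GiB device.
for index, offset in enumerate(zfs_label_offsets(1024 ** 3)):
    print(f"L{index}: offset {offset}, length {LABEL_SIZE}")
```

The takeaway is that clearing only the start of a disk leaves the two trailing labels behind, so a later import scan can still find traces of an old pool.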

RE: NexentaStor kernel panic/crash on Sun x4540 - Deadlock: cycle in blocking chain - Added by Linda Kateley about 1 year ago

Before you reimage, can you see if you can upload the core to this site under Downloads?