Well, I'm at the
# cat /sys/kernel/debug/kmemleak
part of my day.
I guess it's my turn to be "it's a kernel bug!" guy.
Turned out there were no locks around the atomic commit, so it was only atomic if page flipping was synchronous. If another commit came in, it would clobber the commit obj and/or abandon the completion event. Vendor solution in the unholy fork was to disable async page flipping, at the cost of a terrible performance hit. But they also actually fixed the bug later. :facepalm:
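For anyone who wants the shape of the race without reading driver code, here's a rough toy model. To be clear, this is not the kernel code: `Commit`, `Crtc`, `commit_async`, `flip_done`, and `run` are all names made up for illustration, and the "guarded" variant is just one assumed way of serializing commits, not necessarily what the real fix did.

```python
# Toy model of the race described above. NOT kernel code; every name here
# is hypothetical and exists only to illustrate the clobbering.

class Commit:
    def __init__(self, name):
        self.name = name
        self.completed = False  # set once the flip-done event is delivered


class Crtc:
    """A single pending-commit slot standing in for the unprotected driver state."""

    def __init__(self, guarded):
        self.guarded = guarded
        self.pending = None
        self.queue = []  # only used by the guarded variant

    def commit_async(self, commit):
        if self.guarded and self.pending is not None:
            # Assumed fix: hold the new commit until the in-flight flip is done.
            self.queue.append(commit)
            return
        # Unguarded: silently overwrites whatever was still in flight.
        self.pending = commit

    def flip_done(self):
        if self.pending is not None:
            self.pending.completed = True
        self.pending = self.queue.pop(0) if self.queue else None


def run(guarded):
    crtc = Crtc(guarded)
    a, b = Commit("a"), Commit("b")
    crtc.commit_async(a)
    crtc.commit_async(b)  # arrives before a's flip has finished
    crtc.flip_done()
    crtc.flip_done()
    return a.completed, b.completed
```

In the unguarded run, `a` never gets its completion and nothing points at it any more, which is the kind of orphaned object a leak detector would flag.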
f'realz tho...
# grep drm_atomic_helper_commit /sys/kernel/debug/kmemleak | wc -l
51838