April 11th, 2025

The case of the UI thread that hung in a kernel call

Raymond Chen

A customer asked for help with a longstanding but low-frequency hang that they have never been able to figure out. From what they could tell, their UI thread was calling into the kernel, and the call simply hung for no apparent reason. Unfortunately, the kernel dump couldn’t show a stack from user mode because the stack had been paged out. (Which makes sense, because a hung thread isn’t using its stack, so once the system is under some memory pressure, that stack gets paged out.)

0: kd> !thread 0xffffd18b976ec080 7
THREAD ffffd18b976ec080  Cid 79a0.7f18  Teb: 0000003d7ca28000
    Win32Thread: ffffd18b89a8f170 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
    ffffd18b976ec360  NotificationEvent
Not impersonating
DeviceMap                 ffffad897944d640
Owning Process            ffffd18bcf9ec080       Image:         contoso.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14112735       Ticks: 1235580 (0:05:21:45.937)
Context Switch Count      1442664        IdealProcessor: 2             
UserTime                  00:02:46.015
KernelTime                00:01:11.515

 nt!KiSwapContext+0x76
 nt!KiSwapThread+0x928
 nt!KiCommitThreadWait+0x370
 nt!KeWaitForSingleObject+0x7a4
 nt!KiSchedulerApc+0xec
 nt!KiDeliverApc+0x5f9
 nt!KiCheckForKernelApcDelivery+0x34
 nt!MiUnlockAndDereferenceVad+0x8d
 nt!MmProtectVirtualMemory+0x312
 nt!NtProtectVirtualMemory+0x1d9
 nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8707`a9bef3a0)
 ntdll!ZwProtectVirtualMemory+0x14
 [end of stack trace]

Although we couldn’t see what the code was doing in user mode, there was something unusual in the information that was present.

Observe that the offending thread is Suspended. And it appears to have been suspended for over five hours.

THREAD ffffd18b976ec080  Cid 79a0.7f18  Teb: 0000003d7ca28000
    Win32Thread: ffffd18b89a8f170 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
    ffffd18b976ec360  NotificationEvent
Not impersonating
DeviceMap                 ffffad897944d640
Owning Process            ffffd18bcf9ec080       Image:         contoso.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14112735       Ticks: 1235580 (0:05:21:45.937)
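
The “over five hours” figure comes from the Ticks field. This assumes the common default clock interval of 15.625 ms per tick; that value is inferred from the numbers, not read from the dump, and the debugger uses whatever the machine’s actual time increment is. Under that assumption, 1,235,580 ticks × 15.625 ms = 19,305,937.5 ms, which is 5 hours, 21 minutes, and 45.9 seconds. That matches the 0:05:21:45.937 shown. Here is a minimal C++ sketch of the conversion. FormatTicks and kTickIn100ns are hypothetical names, not debugger APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: one tick is 15.625 ms, i.e., 156,250 units of 100 ns.
// The real interval depends on the machine; this value is inferred from
// the numbers in the dump, not read from it.
constexpr std::uint64_t kTickIn100ns = 156250;

// Formats a tick count as days:hh:mm:ss.mmm (the layout !thread appears
// to use), truncating any fractional millisecond.
std::string FormatTicks(std::uint64_t ticks)
{
    assert(ticks <= UINT64_MAX / kTickIn100ns);        // avoid overflow below
    std::uint64_t ms = ticks * kTickIn100ns / 10000;   // 10,000 x 100 ns = 1 ms

    const std::uint64_t days = ms / 86400000;  ms %= 86400000;
    const std::uint64_t hours = ms / 3600000;  ms %= 3600000;
    const std::uint64_t minutes = ms / 60000;  ms %= 60000;
    const std::uint64_t seconds = ms / 1000;   ms %= 1000;

    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%llu:%02llu:%02llu:%02llu.%03llu",
                  static_cast<unsigned long long>(days),
                  static_cast<unsigned long long>(hours),
                  static_cast<unsigned long long>(minutes),
                  static_cast<unsigned long long>(seconds),
                  static_cast<unsigned long long>(ms));
    return buffer;
}
```

For example, FormatTicks(1235580) produces "0:05:21:45.937", the same string the debugger printed.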

Naturally, a suspended UI thread is going to manifest itself as a hang.

Functions like SuspendThread exist primarily for debuggers to use, so we asked them if they had a debugger attached to the process when they captured the kernel dump. They said that they did not.

So who suspended the thread, and why?

The customer then realized that they had a watchdog thread which monitors the UI thread for responsiveness, and every so often, it suspends the UI thread, captures a stack trace, and then resumes the UI thread. And in the dump file, they were able to observe their watchdog thread in the middle of its stack trace capturing code. But why was the stack trace capture taking five hours?

The stack of the watchdog thread looks like this:

ntdll!ZwWaitForAlertByThreadId(void)+0x14
ntdll!RtlpAcquireSRWLockSharedContended+0x15a
ntdll!RtlpxLookupFunctionTable+0x180
ntdll!RtlLookupFunctionEntry+0x4d
contoso!GetStackTrace+0x72
contoso!GetStackTraceOfUIThread+0x127
...

Okay, so we see that the watchdog thread is trying to get a stack trace of the UI thread, but it’s hung inside RtlLookupFunctionEntry which is waiting for a lock.

You know who I bet holds the lock?

The UI thread.

Which is suspended.

The UI thread is probably trying to dispatch an exception, which means that it is walking the stack looking for an exception handler. But in the middle of this search, it got suspended by the watchdog thread. Then the watchdog thread tries to walk the stack of the UI thread, but it can’t do that yet because the function table is locked by the UI thread’s stack walk.

This is a practical exam for a previous discussion: Why you should never suspend a thread.

Specifically, the title should say “Why you should never suspend a thread in your own process.” Suspending a thread in your own process runs the risk that the thread you suspended was in possession of some resource that the rest of the program needs. In particular, it might possess a resource that is needed by the code which is responsible for eventually resuming the thread. Since it is suspended, it will never get a chance to release those resources, and you end up with a deadlock between the suspended thread and the thread whose job it is to resume that thread.

If you want to suspend a thread and capture stacks from it, you’ll have to do it from another process, so that you don’t deadlock with the thread you suspended.¹
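
The shape of the deadlock can be reproduced with a small C++ simulation that uses only the standard library. The same simulation shows why moving the watchdog to another process breaks the cycle. This is a model, not Windows. A cooperative Suspension class stands in for SuspendThread and ResumeThread, even though real suspension is preemptive. A std::shared_timed_mutex stands in for the function table lock, and a timeout stands in for “hangs forever.” In the model, the UI thread holds the lock exclusively when the suspension takes effect, which is enough to make the watchdog’s shared acquisition wait. All names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical, cooperative stand-in for SuspendThread/ResumeThread.
// A real suspension is preemptive; here the target parks at Checkpoint().
class Suspension {
public:
    void Suspend() {
        std::lock_guard<std::mutex> lock(m_);
        requested_ = true;
    }
    void Resume() {
        {
            std::lock_guard<std::mutex> lock(m_);
            requested_ = false;
        }
        cv_.notify_all();
    }
    // Called by the suspender: wait until the target has actually stopped.
    void WaitUntilParked() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return parked_; });
    }
    // Called by the target: stop here while a suspension is requested.
    void Checkpoint() {
        std::unique_lock<std::mutex> lock(m_);
        if (!requested_) return;
        parked_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !requested_; });
        parked_ = false;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool requested_ = false;
    bool parked_ = false;
};

// Returns true if the watchdog cannot get the lock it needs within `timeout`
// while the UI thread is suspended (in the real program: it waits forever).
bool WatchdogStalls(bool watchdogInSameProcess, std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    std::shared_timed_mutex functionTableLock;  // the UI process's lock
    std::shared_timed_mutex watchdogOwnLock;    // stand-in for a lock in another process
    std::shared_timed_mutex& lockWatchdogNeeds =
        watchdogInSameProcess ? functionTableLock : watchdogOwnLock;

    Suspension suspension;
    suspension.Suspend();  // requested up front so it lands while the lock is held

    std::thread ui([&] {
        // The UI thread's own stack walk (exception dispatch) holds the lock...
        std::unique_lock<std::shared_timed_mutex> lock(functionTableLock);
        suspension.Checkpoint();  // ...when the suspension takes effect.
    });

    suspension.WaitUntilParked();
    // The watchdog walks the suspended thread's stack, which needs the lock.
    const bool acquired = lockWatchdogNeeds.try_lock_shared_for(timeout);
    if (acquired) lockWatchdogNeeds.unlock_shared();

    suspension.Resume();  // a truly stalled watchdog would never reach this line
    ui.join();
    return !acquired;
}
```

WatchdogStalls(true, …) reports a stall because the watchdog needs the very lock the suspended thread holds. WatchdogStalls(false, …) does not, because in the model an out-of-process watchdog never touches the target’s lock. The model says nothing about cross-process resources, which is the caveat in footnote ¹.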

Bonus chatter: In this kernel stack, you can see evidence that SuspendThread operates asynchronously. When the watchdog thread called SuspendThread to suspend the UI thread, the UI thread was in the kernel, in the middle of changing memory protections. The thread does not suspend immediately. Rather, the kernel finishes its work, and before returning to user mode, it does a CheckForKernelApcDelivery to see whether any requests are waiting. It picks up the request to suspend, and that is when the thread actually suspends.²
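
The ordering described above can be modeled with standard C++ threads: the request is queued immediately, but it is honored only once the thread has left the kernel’s critical work. This is a hypothetical simulation, not the kernel’s implementation. SuspendState, RunScenario, and kernelLock are invented for illustration. kernelLock stands in for something like the page table lock from footnote ², and CheckForPendingRequests plays the role of the KiCheckForKernelApcDelivery frame in the stack.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of asynchronous suspension (not kernel code).
class SuspendState {
public:
    // Like SuspendThread: record the request and return right away.
    void RequestSuspend() {
        std::lock_guard<std::mutex> lock(m_);
        assert(!pending_ && "model supports one outstanding request");
        pending_ = true;
    }
    void Resume() {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_ = false;
        }
        cv_.notify_all();
    }
    bool IsStopped() {
        std::lock_guard<std::mutex> lock(m_);
        return stopped_;
    }
    void WaitUntilStopped() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stopped_; });
    }
    // Runs on the target at a safe point: stop here if a request is pending.
    void CheckForPendingRequests() {
        std::unique_lock<std::mutex> lock(m_);
        if (!pending_) return;
        stopped_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !pending_; });
        stopped_ = false;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool pending_ = false;
    bool stopped_ = false;
};

struct Observations {
    bool stoppedWhenRequestReturned;  // did the request stop the thread at once?
    bool kernelLockHeldWhileStopped;  // was the stand-in kernel lock still held?
};

Observations RunScenario() {
    std::mutex kernelLock;                    // stand-in for an internal kernel lock
    std::atomic<bool> kernelLockHeld{false};
    std::mutex handshake;
    std::condition_variable handshakeCv;
    bool inKernel = false;
    bool requestSent = false;
    SuspendState state;

    std::thread target([&] {
        {
            // "System call": e.g., changing memory protections under a kernel lock.
            std::lock_guard<std::mutex> guard(kernelLock);
            kernelLockHeld = true;
            std::unique_lock<std::mutex> lock(handshake);
            inKernel = true;
            handshakeCv.notify_all();
            handshakeCv.wait(lock, [&] { return requestSent; });  // request arrives mid-call
        }
        kernelLockHeld = false;
        state.CheckForPendingRequests();  // "before returning to user mode"
    });

    {
        std::unique_lock<std::mutex> lock(handshake);
        handshakeCv.wait(lock, [&] { return inKernel; });
    }
    state.RequestSuspend();  // returns immediately
    Observations obs{};
    obs.stoppedWhenRequestReturned = state.IsStopped();
    {
        std::lock_guard<std::mutex> lock(handshake);
        requestSent = true;
    }
    handshakeCv.notify_all();
    state.WaitUntilStopped();  // only now has the thread actually suspended
    obs.kernelLockHeldWhileStopped = kernelLockHeld;
    state.Resume();
    target.join();
    return obs;
}
```

In this model, RunScenario reports that the thread was still running when RequestSuspend returned, and that the stand-in kernel lock was free by the time the thread stopped.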

Bonus bonus chatter: “What if the kernel delayed suspending a thread if it held any user-mode locks? Wouldn’t that avoid this problem?” First of all, how would the kernel even know whether a thread held any user-mode locks? There is no reliable signature for a user-mode lock. After all, you can make a user-mode lock out of any byte of memory by using it as a spin lock. Second, even if the kernel somehow could figure out whether a thread held a user-mode lock, you don’t want that to block thread suspension, because that would let a program make itself un-suspendable! Just call AcquireSRWLockShared(&some_global_srwlock) and never call the corresponding Release function. Congratulations, the thread perpetually owns the global lock in shared mode and would therefore now be immune from suspension.
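
A toy model of the hypothetical policy shows the loophole. Everything here is invented for the thought experiment: ModelThread, AcquireUserLock, SafePoint, and EverSuspended are not real APIs. The model also assumes, counterfactually, that the kernel could count user-mode locks.

```cpp
#include <cassert>

// Toy model of the rejected policy: "defer suspension while the thread
// holds any user-mode lock". Assumes (counterfactually) that the kernel
// can see a count of user-mode locks.
struct ModelThread {
    int userLocksHeld = 0;
    bool suspendPending = false;
    bool suspended = false;
};

void AcquireUserLock(ModelThread& t) { ++t.userLocksHeld; }

void ReleaseUserLock(ModelThread& t) {
    assert(t.userLocksHeld > 0);
    --t.userLocksHeld;
}

// A point where the kernel could honor a pending suspension.
void SafePoint(ModelThread& t) {
    if (t.suspendPending && t.userLocksHeld == 0) {
        t.suspendPending = false;
        t.suspended = true;
    }
}

// Returns whether the thread is ever suspended within `steps` further
// safe points after it takes a lock and a suspension is requested.
bool EverSuspended(bool releasesItsLock, int steps) {
    ModelThread t;
    AcquireUserLock(t);        // e.g., AcquireSRWLockShared(&some_global_srwlock)
    t.suspendPending = true;   // someone calls SuspendThread
    SafePoint(t);              // deferred: a user-mode lock is held
    if (releasesItsLock) ReleaseUserLock(t);
    for (int i = 0; i < steps; ++i) SafePoint(t);
    return t.suspended;
}
```

A well-behaved thread gets suspended as soon as it releases its lock. A thread that never calls the Release function stays running no matter how many safe points go by.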

¹ Of course, this also requires that the code that does the suspending does not wait on cross-process resources like semaphores, mutexes, or file locks, because those might be held by the suspended thread.

² The kernel doesn’t suspend the thread immediately because it might be in possession of internal kernel locks, and suspending a thread while it owns a kernel lock (such as the lock that synchronizes access to the page tables) would result in the kernel itself deadlocking!

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

12 comments


Igor Levicki 1 week ago

@Joshua Hudson
> Process debugging itself is surprisingly natural once you’ve done it once.

Seems to me you have no idea how to write code.

> I tried to build a garbage collector that worked like that. The only reason it didn’t work is it needed prolog/epilog tables, and then x64 came along and made prolog/epilog tables a thing anyway.

Or you could have, you know, just used a garbage collected language like C#? No? Then how about not forgetting to free allocated stuff? It’s a solved problem.

> This is about 200 lines of code (because it can’t call malloc or stdio) and is otherwise actually simpler than attaching to another process and less likely to set off a virus scanner. The only wrinkle is as this customer discovered, getting a stacktrace inside kernel32.dll is broken. Solving it for your own dlls is easy; we just have to annotate the calls to outside functions.

The point that you keep missing is — if you have to write a watchdog to collect stack traces of your UI thread to check for it becoming unresponsive then you are already doing something wrong on the UI thread and you should be fixing that, not adding another thing you shouldn’t be doing in Windows and that is debugging your own process from within.

> Hint: There is no suspend process, only suspend thread.

NtSuspendProcess in ntdll.dll. Try harder.

Steve April 18, 2025

“you don’t want that to block thread suspension, because that would let a program make itself un-suspendable”: There are some well-documented cases and code examples of processes and threads making themselves immune to suspension via SuspendThread. For example, a program can avoid suspension by calling CreateThread in a loop and calling ResumeThread on its own threads, creating a race condition between itself and other processes trying to suspend the thread or process, not to mention that remote threads injected by other software prevent suspending a process. The MSDN documentation has a page named “Controlling Processes and Threads” that recommends using the ‘freeze’ functionality included with newer versions of Windows when debugging using ~f and ~u, since it fixes a number of issues with suspension, which is especially important while debugging.

Arben Tapia April 17, 2025

This is great analysis Raymond. I am just peeved by a similar problem but my bet is on Windows Explorer. This is the google query/rant I wrote before I remembered the wonderful “The Old New Thing” of yours (which I have not visited for a long time):
“Why is windows explorer so crappy so it can not sort even an small list of files while is doing some copying?”

Do you think these issues might be related? Thanks!
Ivan Kljajic April 16, 2025

Wouldn’t that be like reintroducing cooperative multitasking to Windows?
Doug Nebeker April 15, 2025

It seems to me that the proposed functions would let any user mode program hang the kernel. The loader lock does cause some pain but I’m sure it’s not there because the Windows devs are dumb or stubborn.
Sigge Mannen April 15, 2025 · Edited

I like the fact that the customer is having problems with a hanging thread but failed to mention out of the gate that they’re using nuclear code like SuspendThread in their application. And also that the watchdog thread actually made things… less responsive, cue the Star Wars meme.
Joshua Hudson April 12, 2025

No, Raymond. Microsoft did this one to themselves. This problem comes up repeatedly in different contexts.

The actual problem is despite over twenty years of problems caused by the loader lock being inaccessible, Microsoft has neither learned nor listened. Having an inaccessible lock in the middle of the lock graph is not sound engineering.

Microsoft can solve this entire class with two methods:

1) BOOL AcquireLoaderLock(BOOL *acquired);
2) void ReleaseLoaderLock();

And then every time somebody has to do something wild they can take the loader lock and make sure this nonsense doesn’t happen.

I’m pretty sure this just works here because nothing should be able to acquire the function table lock exclusively without first acquiring the loader lock; otherwise the lock graph would have a cycle.

- Raymond Chen Author 1 week ago
  
  The loader lock is not involved in this deadlock. The loader lock is not held. The problem is with the function table lock, not the loader lock. The problem is that taking a stack trace requires the function table lock, so you are in trouble if you suspend a thread while it also happens to hold the function table lock.
- Steve April 18, 2025
  
  I’ve debugged a lot of software and seen a lot of issues over the last 20 years but the loader lock has never been a problem?
  
  Your “Microsoft can solve this” idea of using try/acquire/release semantics with the loader lock is exactly what Windows NT has been doing for over 30 years. Windows 2000 and XP had a single function for the loader lock that did the ‘try/acquire’ part using the TryEnterCriticalSection function and the ‘release’ part using LeaveCriticalSection on the address of the loader lock; versions of Windows after XP probably have lots of changes but work essentially the same way.
  
- Chris Sarbora April 17, 2025
  What does any particular lock at all have to do with this? The situation appears to be the following:
  
    // UI Thread
    // .. la di da, just ui thread things ..
    {
        std::unique_lock lock{gSomeMutex};
        gFoo = doSomeStuffLocked();
        // GOT SUSPENDED
        doSomeOtherStuffLocked(gFoo);
    } // release gSomeMutex

    // Watchdog Thread
    std::unique_ptr GetStacktrace(Thread target)
    {
        std::unique_ptr trace;
        SuspendThread(target);
        {
            std::shared_lock lock{gSomeMutex}; // ruh roh
            trace = CreateTrace(target, gFoo); // y i am dedlock?
        }
        ResumeThread(target);
        return trace;
    }
  
  The problem is not any sort of WINAPI lock ordering, it’s that the application is making assumptions about object consistency that it cannot make in an asynchronous-interruption state. This could be equally invalid without any locks at all:
  
    // UI Thread
    // .. la di da, just ui thread things ..
    {
        gFoo.Length = newLength; // old Length < newLength
        // GOT SUSPENDED
        gFoo.Data = newDataPtr;
    }

    // Watchdog Thread
    std::unique_ptr GetStacktrace(Thread target)
    {
        SuspendThread(target);
        std::unique_ptr trace = CreateTrace(target, gFoo.Data, gFoo.Length); // y i am crash??
        ResumeThread(target);
        return trace;
    }
  
  The problem is not the loader lock availability, it’s that the application is violating its own concurrency expectations.
- Igor Levicki April 15, 2025 · Edited
  
  Sorry, but I am with Raymond on this one.
  
  > And then every time somebody has to do something wild…
  
  Someone doing something wild (stupid?) is how we got here in the first place, with the wild thing in this case being “process trying to debug itself” (or “watchdog living inside the thing it is supposed to be watching” if the first explanation didn’t trigger alarms for you).
  
  What would solve the problem is the customer using an external watchdog process to monitor the UI thread: post a custom WM_HEARTBEAT message to the UI thread every second, and if it doesn’t respond, suspend the whole process and create a minidump instead of fiddling with function table locks and UI thread stack traces.
  
  Access to loader lock has nothing to do with this.
  
  - Joshua Hudson April 15, 2025
    
    “process trying to debug itself”
    
    Seems far more reasonable to me than the last time I had to reach for SuspendThread. Process debugging itself is surprisingly natural once you’ve done it once. I tried to build a garbage collector that worked like that. The only reason it didn’t work is it needed prolog/epilog tables, and then x64 came along and made prolog/epilog tables a thing anyway.
    
    “watchdog living inside the thing it is supposed to be watching”
    
    This is about 200 lines of code (because it can’t call malloc or stdio) and is otherwise actually simpler than attaching to another process and less likely to set off a virus scanner. The only wrinkle is as this customer discovered, getting a stacktrace inside kernel32.dll is broken. Solving it for your own dlls is easy; we just have to annotate the calls to outside functions.
    
    Hint: There is no suspend process, only suspend thread.
    