3

I've been given a 20-core CPU with an intermittent fault on one of its cores. Allegedly Intel confirmed the fault, but I have no further details, and I'm told it isn't replaceable under warranty (I can't check now). The CPU was returned to its owner and then offered to me. If I disable that core in the BIOS, I might have a free 19-core CPU. Fun times!

To identify the defective core before stress testing and use, I could run software repeatedly with affinity for one or a few cores and see which runs eventually fault, or disable each core in turn in the BIOS.

But it seems more elegant to run something like 40 threads of prime95 or some other stressy software, or something prone to causing bugchecks on this CPU as a whole. When the system BSODs, I could in theory use WinDbg to find the faulting thread and, from that, the physical (or is it logical?) core it was running on at bugcheck. I could then disable that specific core in the BIOS and recheck CPU reliability, in case there's a second defective core I don't know about.

I could run 20 copies of some single-threaded software, but I'd like to use this as a "teaching moment".

How would I use WinDbg on a crash dump to find the physical core number that the bugchecking thread was running on?

Stilez
  • 1,825

1 Answer

3

I know it's a year late, but since no one answered: you can issue !running -i -t in WinDbg, either connected to a live kernel target or on a minidump. It shows the call stack of the thread currently running on each logical core.

If you have 1 socket with 4 physical cores and Hyper-Threading/Simultaneous Multithreading enabled, the OS sees 8 logical cores, so the command lists 8 current threads (idle ones included).

Here's a sample from my target machine:

6: kd> !running -i -t

System Processors: (00000000000000ff) Idle Processors: (00000000000000ba)

   Prcbs             Current         (pri) Next            (pri) Idle

0 fffff80633125180 ffffa5067382c080 ( 8) fffff80635638400 ................

Child-SP RetAddr Call Site

00 ffffb585704d6878 fffff80635273151 0xfffff80634f90003
01 ffffb585704d6880 fffff8063533591c nt!HvcallpExtendedFastHypercall+0x51
02 ffffb585704d6890 fffff80635335b10 nt!HvlpFastFlushListTb+0xac
03 ffffb585704d6950 fffff806353355f3 nt!HvlpFlushRangeListTb+0x88
04 ffffb585704d69b0 fffff806352f9642 nt!HvlFlushRangeListTb+0x63
[..]
1c ffffb585704d78b0 fffff806356df621 nt!IopSynchronousServiceTail+0x2d8
1d ffffb585704d7950 fffff8063527ac15 nt!NtQueryVolumeInformationFile+0x431
1e ffffb585704d7a10 00007ffb4ec9c9e4 nt!KiSystemServiceCopyEnd+0x25
1f 0000000f24cfed48 0000000000000000 0x00007ffb4ec9c9e4

1 ffff918159ea5180 ffff918159eb6240 ( 0) ffff918159eb6240 ................

Child-SP RetAddr Call Site

00 ffffb5856e22f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
01 ffffb5856e22f760 fffff80635161dee nt!PpmIdleExecuteTransition+0x70c
02 ffffb5856e22fa80 fffff8063526ce88 nt!PoIdle+0x36e
03 ffffb5856e22fbe0 0000000000000000 nt!KiIdleLoop+0x48

2 ffff918159f54180 ffffa506714860c0 (15) ffff918159f65240 ................

Child-SP RetAddr Call Site

00 ffffb5856f42f920 fffff806351b9ee7 nt!KeAbPostRelease+0x11f
01 ffffb5856f42f970 fffff8026a4f3f5c nt!ExReleasePushLockSharedEx+0x37
02 ffffb5856f42f9b0 fffff02f9262d81a dxgkrnl!CCompositionFrameCollection::FindCompositionFrame+0x9c
03 ffffb5856f42f9f0 fffff8063527ac15 win32kbase!NtDCompositionGetFrameSurfaceUpdates+0x14a
04 ffffb5856f42fa80 00007ffb4c8136e4 nt!KiSystemServiceCopyEnd+0x25
05 0000000da5cff2b8 0000000000000000 0x00007ffb`4c8136e4

3 ffff91815a080180 ffff91815a091240 ( 0) ffff91815a091240 ................

Child-SP RetAddr Call Site

00 ffffb5856e24f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
01 ffffb5856e24f760 fffff80635161dee nt!PpmIdleExecuteTransition+0x70c
02 ffffb5856e24fa80 fffff8063526ce88 nt!PoIdle+0x36e
03 ffffb5856e24fbe0 0000000000000000 nt!KiIdleLoop+0x48

4 ffff91815a180180 ffff91815a191240 ( 0) ffff91815a191240 ................

Child-SP RetAddr Call Site

00 ffffb5856e25f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
01 ffffb5856e25f760 fffff80635161dee nt!PpmIdleExecuteTransition+0x70c
02 ffffb5856e25fa80 fffff8063526ce88 nt!PoIdle+0x36e
03 ffffb5856e25fbe0 0000000000000000 nt!KiIdleLoop+0x48

5 ffff91815a1d9180 ffff91815a1ea240 ( 0) ffff91815a1ea240 ................

Child-SP RetAddr Call Site

00 ffffb5856e26f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
01 ffffb5856e26f760 fffff80635161dee nt!PpmIdleExecuteTransition+0x70c
02 ffffb5856e26fa80 fffff8063526ce88 nt!PoIdle+0x36e
03 ffffb5856e26fbe0 0000000000000000 nt!KiIdleLoop+0x48

6 ffff91815a306180 ffffa5066f961040 (13) ffff91815a317240 ................

Child-SP RetAddr Call Site

00 ffffb5856fa66df0 fffff8026bf01497 kmsysinfo!GetProcTopology+0x2ab [C:\Users\qwe\source\repos\kmsysinfo\cpufeatures.c @ 703]
01 ffffb5856fa66ec0 fffff8026bf0278f kmsysinfo!GetCpuInfo+0x157 [C:\Users\qwe\source\repos\kmsysinfo\cpufeatures.c @ 782]
02 ffffb5856fa677e0 fffff8026bf11020 kmsysinfo!DriverEntry+0x15f [C:\Users\qwe\source\repos\kmsysinfo\hpkmsysinfo.c @ 402]
03 ffffb5856fa678a0 fffff8063579f72e kmsysinfo!GsDriverEntry+0x20 [minkernel\tools\gs_support\kmodefastfail\gs_driverentry.c @ 47]
04 ffffb5856fa678d0 fffff8063579ee6e nt!IopLoadDriver+0x4c2
05 ffffb5856fa67ab0 fffff8063519b3c5 nt!IopLoadUnloadDriver+0x4e
06 ffffb5856fa67af0 fffff80635112ce5 nt!ExpWorkerThread+0x105
07 ffffb5856fa67b90 fffff806352709ca nt!PspSystemThreadStartup+0x55
08 ffffb5856fa67be0 0000000000000000 nt!KiStartSystemThread+0x2a

7 ffff91815a400180 ffff91815a411240 ( 0) ffff91815a411240 ................

Child-SP RetAddr Call Site

00 ffffb5856e28f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
01 ffffb5856e28f760 fffff80635161dee nt!PpmIdleExecuteTransition+0x70c
02 ffffb5856e28fa80 fffff8063526ce88 nt!PoIdle+0x36e
03 ffffb5856e28fbe0 0000000000000000 nt!KiIdleLoop+0x48

What we see is 8 threads, one per logical core, and you can judge what each one is doing from its call stack once the public MS symbols are loaded.

  • Thread 0: something related to an NtQueryVolumeInformationFile call and the hypervisor
  • Thread 1: idle
  • Thread 2: something related to graphics/directx
  • Thread 3: idle
  • Thread 4: idle
  • Thread 5: idle
  • Thread 6: the kernel driver I'm writing, which hit a DebugBreak()
  • Thread 7: idle
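The idle/busy split in that list can be cross-checked against the two bitmasks on the first line of the output. Here is a minimal Python sketch, assuming bit n of each mask stands for logical core n (which is consistent with the sample); the helper name processors_in_mask is mine, not something WinDbg provides.

```python
def processors_in_mask(mask):
    """Return the logical processor numbers whose bit is set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Values copied from the first line of the sample output above.
system_mask = 0x00000000000000FF  # "System Processors"
idle_mask = 0x00000000000000BA    # "Idle Processors"

idle = processors_in_mask(idle_mask)
busy = processors_in_mask(system_mask & ~idle_mask)

print("idle:", idle)
print("busy:", busy)
```

For the sample masks this gives cores 1, 3, 4, 5 and 7 as idle and cores 0, 2 and 6 as busy, matching the call stacks.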

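In your scenario, the bugcheck will be clearly visible in the call stack of the thread that hit it, and the number at the start of that entry is the logical core it was running on.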
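With 40 logical cores the output gets long, so you may prefer to save it to a text file and search it. The following is a rough Python sketch, not a WinDbg feature: it assumes the line layout shown in the sample above (a header line per core with the priority in parentheses, followed by frame lines), and the function name is made up for illustration. For a real bugcheck you would probably search for something like nt!KeBugCheck, which is my assumption about what the faulting stack contains; the demo below searches for my driver's module instead because that is what the sample has.

```python
import re

# Header line per logical core: "<n> <prcb> <current thread> (<pri>) ..."
# Assumed from the sample output; backticks are allowed inside addresses.
HEADER = re.compile(r"^\s*(\d+)\s+[0-9a-f`]+\s+[0-9a-f`]+\s+\(\s*\d+\)")


def processors_with_symbol(text, symbol):
    """Return the logical core numbers whose call stack mentions symbol."""
    hits = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = int(match.group(1))  # a new core's section starts
        elif current is not None and symbol in line and current not in hits:
            hits.append(current)
    return hits


# Trimmed copy of two entries from the sample output above.
sample = """\
1 ffff918159ea5180 ffff918159eb6240 ( 0) ffff918159eb6240 ................
00 ffffb5856e22f720 fffff8063516269c nt!PpmIdleGuestExecute+0x1e
03 ffffb5856e22fbe0 0000000000000000 nt!KiIdleLoop+0x48
6 ffff91815a306180 ffffa5066f961040 (13) ffff91815a317240 ................
00 ffffb5856fa66df0 fffff8026bf01497 kmsysinfo!GetProcTopology+0x2ab
04 ffffb5856fa678d0 fffff8063579f72e nt!IopLoadDriver+0x4c2
"""

print(processors_with_symbol(sample, "kmsysinfo!"))
```

On the trimmed sample this reports core 6, the one running my driver.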

I'm afraid I don't think WinDbg can show you the physical core or socket. As far as I know, no WDM API gives you this directly either: KeQueryActiveProcessors and the other *Processors kernel APIs, despite their names, always return logical core information. Windows does not really distinguish between physical cores here; only the BIOS does that.

However, you can query this information directly from the CPU with CPUID leaf 0xB: call the OS APIs to set your kernel thread's affinity to each logical core in turn, read the x2APIC ID on each one, and apply a bunch of crazy bit shifting as defined in intel-64-architecture-processor-topology-enumeration.pdf. From that you can derive the package (socket) count, the physical cores and the logical cores. Unfortunately all of this requires kernel code and won't work on a minidump. Maybe someone will enlighten me with WinDbg tips I don't know about for doing the above.
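To give an idea of the bit shifting, here is a small Python sketch of the decoding step only. Python cannot execute CPUID itself, so the x2APIC ID and the two shift widths are passed in as plain numbers. As I understand the Intel document, the SMT shift comes from EAX[4:0] of CPUID leaf 0xB sub-leaf 0 and the core shift from EAX[4:0] of sub-leaf 1, with the x2APIC ID in EDX. The function and type names are mine, and the shift values in the example are hypothetical, not read from any real CPU.

```python
from typing import NamedTuple


class Topology(NamedTuple):
    package: int  # socket number
    core: int     # physical core within the package
    smt: int      # hardware thread within the core


def decode_x2apic_id(x2apic_id, smt_shift, core_shift):
    """Split an x2APIC ID into package, core and SMT IDs.

    smt_shift:  bits used for the SMT ID (assumed from sub-leaf 0).
    core_shift: bits below the package ID (assumed from sub-leaf 1).
    """
    smt_mask = (1 << smt_shift) - 1
    core_mask = (1 << (core_shift - smt_shift)) - 1
    return Topology(
        package=x2apic_id >> core_shift,
        core=(x2apic_id >> smt_shift) & core_mask,
        smt=x2apic_id & smt_mask,
    )


# Hypothetical values: 1 bit for SMT, 5 bits in total below the package ID.
print(decode_x2apic_id(0x0B, smt_shift=1, core_shift=5))
```

With those hypothetical shifts, x2APIC ID 0x0B decodes to package 0, core 5, SMT thread 1. Note that the core ID obtained this way need not be contiguous or match the numbering the BIOS uses for disabling cores, so treat any mapping as something to verify on the actual machine.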