Merge pull request #46 from microsoft/bw

Formatting and changes for debug session
This commit is contained in:
Bob Ward 2023-03-04 22:30:40 -06:00 коммит произвёл GitHub
Родитель 76f9e3cf6b 8b2aa326a8
Коммит e261d00dea
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
9 изменённых файлов: 84 добавлений и 80 удалений

Просмотреть файл

@ -1 +1 @@
windbg -y c:\debgugsql\adm64 -z SQLDump0008.mdmp
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols;c:\debugsql\debug\symbols.pri\amd64\dll;c:\debugsql\debug\symbols.pri\amd64\exe -z SQLDump0001.mdmp

Просмотреть файл

@ -1 +1 @@
windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols -pn sqlservr.exe

Просмотреть файл

@ -1,52 +1,49 @@
# Demos for an Access Violation
1. Load up crash.sln in Visual Studio
1. Load up **crash.sln** in Visual Studio
Show the basics of the code
Hit Play
Show the console window with exit code. The number in hex is C0000005. Show winnt.h for what C0000005 means which is STATUS_ACCESS_VIOLATION
Now uncomment the code for SEH and show the behavior. That is what SQL Server tries to do for many AVs.
- Show the basics of the code
- Hit Play
- Show the console window with exit code. The number in hex is **C0000005**. Show **winnt.h** for what C0000005 means which is STATUS_ACCESS_VIOLATION
- Now uncomment the code for SEH and show the behavior. That is what SQL Server tries to do for many AVs.
2. Show an AV with SQL Server
Stop the SQL Server Service
Run the debug server from the debugsql directory using sqlservr.cmd
Load crash.sql and talk about DBCC UNITTEST
Show the ERRORLOG
Call stack is in the ERRORLOG but let's use the debugger
Go to the c:\debugsql\sqlservr\log directory. Talk about the SQLDUMPER_ERRORLOG.LOG file
Edit debugav.cmd to put in the name of the MDMP file and run it.
Show the header at the top of the debugger
To get the right AV context type in .ecxr
Type in k to see the callstack
Show the assembly instruction for the AV. Type r to see registers and show it is a write to rax which is 0.
- Stop the SQL Server Service
- Run the debug server from the debugsql\debug\sqlservr directory using **sqlservr.cmd**
- Load **crash.sql** and talk about **DBCC UNITTEST**
- Show the ERRORLOG
- Call stack is in the ERRORLOG but let's use the debugger
- Go to the c:\debugsql\debug\sqlservr\log directory. Talk about the SQLDUMPER_ERRORLOG.LOG file
- Edit **debugav.cmd** to put in the name of the MDMP file and run it.
- Show the header at the top of the debugger
- To get the right AV context type in `.ecxr`
- Type in `k` to see the callstack
- Show the assembly instruction for the AV. Type `r` to see registers and show it is a write to **rax** which is 0.
3. Now let's have some fun and cause our own AV
Stop the debug server
Startup the SQL Server service
Attach the debugger with debugsql.cmd
Set a breakpoint with this command
- Stop the debug server
- Startup the normal SQL Server service
- Attach the debugger with **debugsql.cmd**
- Set a breakpoint with this command
bp sqllang!process_commands
g
`bp sqllang!process_commands`
`g`
Try to run a query with SSMS
4. Try to run a query with SSMS
Trace the assembly until you get to this line of code
- Trace the assembly until you get to this line of code
mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}
```mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}```
Now type in
bd 0
r rcx=0
gu
`bd 0`
`r rcx=0`
`gu`
`g`
Go look at the ERRORLOG directory and see the AV. Notice there is no input buffer because we hit an AV before the server read in the query
- Go look at the ERRORLOG directory and see the AV. Notice there is no input buffer because we hit an AV before the server read in the query
Detach the debugger and shutdown SQL Server
Use the debugger on a live SQL Server and change a register to cause an AV using public symbols – Attach debugger and set breakpoint at sqllang!process_commands. After breakpoint hit hit “t” until you come to this line of code mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}. Then execute r rcx=0 and gu. Look at ERRORLOG.
5. Detach the debugger and shutdown SQL Server

Просмотреть файл

@ -0,0 +1 @@
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols;c:\debugsql\debug\symbols.pri\amd64\dll;c:\debugsql\debug\symbols.pri\amd64\exe -z SQLDump0002.mdmp

Просмотреть файл

@ -2,38 +2,37 @@
1. Run debug server again
Load nonyield.sql in SSMS and run it
Use requests.sql to see the status is RUNNING
After a few seconds you will see info in ERRORLOG
Wait a bit longer and you get a stack dump
- Load **nonyield.sql** in SSMS and run it
- Use **requests.sql** to see the status is RUNNING
- After a few seconds you will see info in ERRORLOG
- Wait a bit longer and you get a stack dump
2. Go to the LOG directory
Edit debughang.cmd to get the last .MDMP file
Look at the debugger header
Type in k to see the call stack
Why is this a non-yielding problem based on the assembly instruction?
Type in u and find these instructions
- Edit **debughang.cmd** to get the last .MDMP file
- Look at the debugger header
- Type in `k` to see the call stack
- Why is this a non-yielding problem based on the assembly instruction?
- Type in `u` and find these instructions
```
00007ff8`ed0b7b0a 85c0 test eax,eax
00007ff8`ed0b7b0c 7402 je sqlmin!NonYieldTest+0x410 (00007ff8`ed0b7b10)
00007ff8`ed0b7b0e eb02 jmp sqlmin!NonYieldTest+0x412 (00007ff8`ed0b7b12)
00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)
```
These are typical of a counter and a loop. This instruction points "backwards" before our current instruction
- These are typical of a counter and a loop. This instruction points "backwards" before our current instruction
00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)
```00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)```
So basically we are in a big loop not yielding to SQLOS
- So basically we are in a big loop not yielding to SQLOS. How is the server detecting the dump?
How is the server detecting the dump?
Type in !uniqstack
Search for SchedulerMonitor
Notice the callstack is the one triggering the dump
- Type in `!uniqstack`
- Search for **SchedulerMonitor**
- Notice the callstack is the one triggering the dump
3. Let's see a stalled scheduler scenario
Follow the readme.md in the scheduler_deadlock folder
Follow the readme.md in the **scheduler_deadlock** folder

Просмотреть файл

@ -0,0 +1,6 @@
SELECT er.session_id, er.status wait_type, wait_time, wait_resource, blocking_session_id
FROM sys.dm_exec_requests er
JOIN sys.dm_exec_sessions es
ON er.session_id = es.session_id
WHERE es.is_user_process = 1
go

Просмотреть файл

@ -1,3 +0,0 @@
del /f ".\ostress_connectme\*.out" > NUL
rem ostress.exe -Utest -Ptest -Slpc:. -iConnectionDelay.rml -n2000 -q -dlockhashme -l120
ostress.exe -E -Slpc:. -iConnectionDelay.rml -n2000 -q -dlockhashme -l120 -oostress_connectme

Просмотреть файл

@ -1,26 +1,27 @@
# Debugging a spinlock
1. restart sql server with startsqllockhash.cmd
- Create a database called lockhashme.
- Create a login called test with password of test and make that dbo of the db
2. Load up execrequests.sql and get_spinlock_stats.sql to observe waits and spinlock stats
3. Run start connectme.cmd. This connects a bunch of users to the same db
4. Run resetit.cmd to run a workload to connect/disconnect
1. restart sql server with **startsqllockhash.cmd**
- Create a database called **lockhashme**.
- Create a login called **test** with password of **test** and make that dbo of the db
2. Load up **execrequests.sql** and get_**spinlock_stats.sql** to observe waits and spinlock stats
3. Run start **connectme.cmd**. This connects a bunch of users to the same db
4. Run **resetit.cmd** to run a workload to connect/disconnect
5. Observe CPU and spinlock stats. Look at Query store to see no one query chewing up a bunch of CPU
6. Attach windbg with public symbols like this:
`windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe`
`windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe`
2. Set a breakpoint on the "backoff" code like this
`bp sqldk!SpinlockBase::Backoff`
`bp sqldk!SpinlockBase::Backoff`
3. Type in
`g`
`g`
4. Dump out callstack using k when the breakpoint is hit. It should look similar to this:
```
00 000000e3`31679608 00007ff9`04a89b21 sqldk!SpinlockBase::Backoff
01 000000e3`31679610 00007ff9`04a899dd sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
02 000000e3`31679690 00007ff9`0634236c sqlmin!lck_lockInternal+0x10aa
@ -46,10 +47,11 @@
16 000000e3`3167f920 00007ff9`355f4ed0 sqldk!SchedulerManager::ThreadEntryPoint+0x3cf
17 000000e3`3167fa30 00007ff9`370be40b KERNEL32!BaseThreadInitThunk+0x10
18 000000e3`3167fa60 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
```
4. Notice this part of the call stack
sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
`sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5`
This is our code to "get" the spinlock AFTER we have attempted to get it once (start the spin, acquire, backoff)
@ -58,30 +60,32 @@ of the spinlock type or name. What does this map to?
type in
`bd *`
`g`
`bd *`
`g`
to disable breakpoints and go so we can run a query.
Run this query
```sql
select * from sys.dm_xe_map_values
where name = 'spinlock_types' and map_key = 156
```
and you will find that the name of this spinlock is LOCK_HASH
and you will find that the name of this spinlock is **LOCK_HASH**
5. Back to our code. What does the code look like to "acquire" the spinlock? The "compare and swap"?
Enable the breakpoint by typing in
`be *`
`g`
`be *`
`g`
6. You should hit a breakpoint pretty quickly again that looks the same as before. Let's find the "compare and swap" code
Type in this to get the symbol address for sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
Type in this to get the symbol address for `sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5`
x sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff
`x sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff`
you should see something like this
@ -89,14 +93,14 @@ you should see something like this
Take that value and use the "uf" function to dump out the assembly
`uf 00007ffd``7caae500`
`uf 00007ffd``7caae500`
7. Search this for keyword cmpxchg
7. Search this for keyword **cmpxchg**
You will see something like this
00007ff9`0508995f f04d0fb126 lock cmpxchg qword ptr [r14],r12
`00007ff9`0508995f f04d0fb126 lock cmpxchg qword ptr [r14],r12`
This is the "compare and swap" instruction called cmpxchg on Intel CPUs. In our code we actually call InterlockedCompareExchange() but this is a compiler "intrinsic" so not a call into any DLL. It gets translated into this intruction
This is the "compare and swap" instruction called cmpxchg on Intel CPUs. In our code we actually call **InterlockedCompareExchange()** but this is a compiler "intrinsic" so not a call into any DLL. It gets translated into this instruction
8. Now scroll down and look at the instructions and branches to either call the backoff or "loop back" and do it all over again.