зеркало из https://github.com/microsoft/bobsql.git
Merge pull request #46 from microsoft/bw
Formatting and changes for debug session
This commit is contained in:
Коммит
e261d00dea
|
@ -1 +1 @@
|
|||
windbg -y c:\debgugsql\adm64 -z SQLDump0008.mdmp
|
||||
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols;c:\debugsql\debug\symbols.pri\amd64\dll;c:\debugsql\debug\symbols.pri\amd64\exe -z SQLDump0001.mdmp
|
|
@ -1 +1 @@
|
|||
windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe
|
||||
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols -pn sqlservr.exe
|
|
@ -1,52 +1,49 @@
|
|||
# Demos for an Access Violation
|
||||
|
||||
1. Load up crash.sln in Visual Studio
|
||||
1. Load up **crash.sln** in Visual Studio
|
||||
|
||||
Show the basics of the code
|
||||
Hit Play
|
||||
Show the console window with exit code. The number in hex is C0000005. Show winnt.h for what C0000005 means which is STATUS_ACCESS_VIOLATION
|
||||
Now uncomment the code for SEH and show the behavior. That is what SQL Server tries to do for many AVs.
|
||||
- Show the basics of the code
|
||||
- Hit Play
|
||||
- Show the console window with exit code. The number in hex is **C0000005**. Show **winnt.h** for what C0000005 means which is STATUS_ACCESS_VIOLATION
|
||||
- Now uncomment the code for SEH and show the behavior. That is what SQL Server tries to do for many AVs.
|
||||
|
||||
2. Show an AV with SQL Server
|
||||
|
||||
Stop the SQL Server Service
|
||||
Run the debug server from the debugsql directory using sqlservr.cmd
|
||||
Load crash.sql and talk about DBCC UNITTEST
|
||||
Show the ERRORLOG
|
||||
Call stack is in the ERRORLOG but let's use the debugger
|
||||
Go to the c:\debugsql\sqlservr\log directory. Talk about the SQLDUMPER_ERRORLOG.LOG file
|
||||
Edit debugav.cmd to put in the name of the MDMP file and run it.
|
||||
Show the header at the top of the debugger
|
||||
To get the right AV context type in .ecxr
|
||||
Type in k to see the callstack
|
||||
Show the assembly instruction for the AV. Type r to see registers and show it is a write to rax which is 0.
|
||||
- Stop the SQL Server Service
|
||||
- Run the debug server from the debugsql\debug\sqlservr directory using **sqlservr.cmd**
|
||||
- Load **crash.sql** and talk about **DBCC UNITTEST**
|
||||
- Show the ERRORLOG
|
||||
- Call stack is in the ERRORLOG but let's use the debugger
|
||||
- Go to the c:\debugsql\debug\sqlservr\log directory. Talk about the SQLDUMPER_ERRORLOG.LOG file
|
||||
- Edit **debugav.cmd** to put in the name of the MDMP file and run it.
|
||||
- Show the header at the top of the debugger
|
||||
- To get the right AV context type in `.ecxr`
|
||||
- Type in `k` to see the callstack
|
||||
- Show the assembly instruction for the AV. Type `r` to see registers and show it is a write to **rax** which is 0.
|
||||
|
||||
3. Now let's have some fun and cause our own AV
|
||||
|
||||
Stop the debug server
|
||||
Startup the SQL Server service
|
||||
Attach the debugger with debugsql.cmd
|
||||
Set a breakpoint with this command
|
||||
- Stop the debug server
|
||||
- Startup the normal SQL Server service
|
||||
- Attach the debugger with **debugsql.cmd**
|
||||
- Set a breakpoint with this command
|
||||
|
||||
bp sqllang!process_commands
|
||||
g
|
||||
`bp sqllang!process_commands`
|
||||
`g`
|
||||
|
||||
Try to run a query with SSMS
|
||||
4. Try to run a query with SSMS
|
||||
|
||||
Trace the assembly until you get to this line of code
|
||||
- Trace the assembly until you get to this line of code
|
||||
|
||||
mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}
|
||||
```mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}```
|
||||
|
||||
Now type in
|
||||
|
||||
bd 0
|
||||
r rcx=0
|
||||
gu
|
||||
`bd 0`
|
||||
`r rcx=0`
|
||||
`gu`
|
||||
`g`
|
||||
|
||||
Go look at the ERRORLOG directory and see the AV. Notice there is no input buffer because we hit an AV before the server read in the query
|
||||
- Go look at the ERRORLOG directory and see the AV. Notice there is no input buffer because we hit an AV before the server read in the query
|
||||
|
||||
Detach the debugger and shutdown SQL Server
|
||||
|
||||
|
||||
|
||||
Use the debugger on a live SQL Server and change a register to cause an AV using public symbols – Attach debugger and set breakpoint at sqllang!process_commands. After breakpoint hit hit “t” until you come to this line of code mov rax,qword ptr [rcx] ds:000001b3`11138f40={sqllang!CBatch::`vftable' (00007ffb`68058478)}. Then execute r rcx=0 and ‘gu’. Look at ERRORLOG.
|
||||
5. Detach the debugger and shutdown SQL Server
|
||||
|
|
|
@ -0,0 +1 @@
|
|||
windbg -y srv*c:\symbols*https://msdl.microsoft.com/download/symbols;c:\debugsql\debug\symbols.pri\amd64\dll;c:\debugsql\debug\symbols.pri\amd64\exe -z SQLDump0002.mdmp
|
|
@ -2,38 +2,37 @@
|
|||
|
||||
1. Run debug server again
|
||||
|
||||
Load nonyield.sql in SSMS and run it
|
||||
Use requests.sql to see the status is RUNNING
|
||||
After a few seconds you will see info in ERRORLOG
|
||||
Wait a bit longer and you get a stack dump
|
||||
- Load **nonyield.sql** in SSMS and run it
|
||||
- Use **requests.sql** to see the status is RUNNING
|
||||
- After a few seconds you will see info in ERRORLOG
|
||||
- Wait a bit longer and you get a stack dump
|
||||
|
||||
2. Go to the LOG directory
|
||||
|
||||
Edit debughang.cmd to get the last .MDMP file
|
||||
Look at the debugger header
|
||||
Type in k to see the call stack
|
||||
Why is this a non-yielding problem based on the assembly instruction?
|
||||
Type in u and find these instructions
|
||||
- Edit **debughang.cmd** to get the last .MDMP file
|
||||
- Look at the debugger header
|
||||
- Type in `k` to see the call stack
|
||||
- Why is this a non-yielding problem based on the assembly instruction?
|
||||
- Type in `u` and find these instructions
|
||||
|
||||
|
||||
```
|
||||
00007ff8`ed0b7b0a 85c0 test eax,eax
|
||||
00007ff8`ed0b7b0c 7402 je sqlmin!NonYieldTest+0x410 (00007ff8`ed0b7b10)
|
||||
00007ff8`ed0b7b0e eb02 jmp sqlmin!NonYieldTest+0x412 (00007ff8`ed0b7b12)
|
||||
00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)
|
||||
```
|
||||
|
||||
These are typical of a counter and a loop. This instruction points "backwards" before our current instruction
|
||||
- These are typical of a counter and a loop. This instruction points "backwards" before our current instruction
|
||||
|
||||
00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)
|
||||
```00007ff8`ed0b7b10 ebb3 jmp sqlmin!NonYieldTest+0x3c5 (00007ff8`ed0b7ac5)```
|
||||
|
||||
So basically we are in a big loop not yielding to SQLOS
|
||||
- So basically we are in a big loop not yielding to SQLOS. How is the server detecting the dump?
|
||||
|
||||
How is the server detecting the dump?
|
||||
|
||||
Type in !uniqstack
|
||||
Search for SchedulerMonitor
|
||||
Notice the callstack is the one triggering the dump
|
||||
- Type in `!uniqstack`
|
||||
- Search for **SchedulerMonitor**
|
||||
- Notice the callstack is the one triggering the dump
|
||||
|
||||
3. Let's see a stalled scheduler scenario
|
||||
|
||||
Follow the readme.md in the scheduler_deadlock folder
|
||||
|
||||
|
||||
Follow the readme.md in the **scheduler_deadlock** folder
|
|
@ -0,0 +1,6 @@
|
|||
SELECT er.session_id, er.status wait_type, wait_time, wait_resource, blocking_session_id
|
||||
FROM sys.dm_exec_requests er
|
||||
JOIN sys.dm_exec_sessions es
|
||||
ON er.session_id = es.session_id
|
||||
WHERE es.is_user_process = 1
|
||||
go
|
|
@ -1,3 +0,0 @@
|
|||
del /f ".\ostress_connectme\*.out" > NUL
|
||||
rem ostress.exe -Utest -Ptest -Slpc:. -iConnectionDelay.rml -n2000 -q -dlockhashme -l120
|
||||
ostress.exe -E -Slpc:. -iConnectionDelay.rml -n2000 -q -dlockhashme -l120 -oostress_connectme
|
|
@ -1,26 +1,27 @@
|
|||
# Debugging a spinlock
|
||||
|
||||
1. restart sql server with startsqllockhash.cmd
|
||||
- Create a database called lockhashme.
|
||||
- Create a login called test with password of test and make that dbo of the db
|
||||
2. Load up execrequests.sql and get_spinlock_stats.sql to observe waits and spinlock stats
|
||||
3. Run start connectme.cmd. This connects a bunch of users to the same db
|
||||
4. Run resetit.cmd to run a workload to connect/disconnect
|
||||
1. restart sql server with **startsqllockhash.cmd**
|
||||
- Create a database called **lockhashme**.
|
||||
- Create a login called **test** with password of **test** and make that dbo of the db
|
||||
2. Load up **execrequests.sql** and get_**spinlock_stats.sql** to observe waits and spinlock stats
|
||||
3. Run start **connectme.cmd**. This connects a bunch of users to the same db
|
||||
4. Run **resetit.cmd** to run a workload to connect/disconnect
|
||||
5. Observe CPU and spinlock stats. Look at Query store to see no one query chewing up a bunch of CPU
|
||||
6. Attach windbg with public symbols like this:
|
||||
|
||||
`windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe`
|
||||
`windbg -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -pn sqlservr.exe`
|
||||
|
||||
2. Set a breakpoint on the "backoff" code like this
|
||||
|
||||
`bp sqldk!SpinlockBase::Backoff`
|
||||
`bp sqldk!SpinlockBase::Backoff`
|
||||
|
||||
3. Type in
|
||||
|
||||
`g`
|
||||
`g`
|
||||
|
||||
4. Dump out callstack using k when the breakpoint is hit. It should look similar to this:
|
||||
|
||||
```
|
||||
00 000000e3`31679608 00007ff9`04a89b21 sqldk!SpinlockBase::Backoff
|
||||
01 000000e3`31679610 00007ff9`04a899dd sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
|
||||
02 000000e3`31679690 00007ff9`0634236c sqlmin!lck_lockInternal+0x10aa
|
||||
|
@ -46,10 +47,11 @@
|
|||
16 000000e3`3167f920 00007ff9`355f4ed0 sqldk!SchedulerManager::ThreadEntryPoint+0x3cf
|
||||
17 000000e3`3167fa30 00007ff9`370be40b KERNEL32!BaseThreadInitThunk+0x10
|
||||
18 000000e3`3167fa60 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
|
||||
```
|
||||
|
||||
4. Notice this part of the call stack
|
||||
|
||||
sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
|
||||
`sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5`
|
||||
|
||||
This is our code to "get" the spinlock AFTER we have attempted to get it once (start the spin, acquire, backoff)
|
||||
|
||||
|
@ -58,30 +60,32 @@ of the spinlock type or name. What does this map to?
|
|||
|
||||
type in
|
||||
|
||||
`bd *`
|
||||
`g`
|
||||
`bd *`
|
||||
`g`
|
||||
|
||||
to disable breakpoints and go so we can run a query.
|
||||
|
||||
Run this query
|
||||
|
||||
```sql
|
||||
select * from sys.dm_xe_map_values
|
||||
where name = 'spinlock_types' and map_key = 156
|
||||
```
|
||||
|
||||
and you will find that the name of this spinlock is LOCK_HASH
|
||||
and you will find that the name of this spinlock is **LOCK_HASH**
|
||||
|
||||
5. Back to our code. What does the code look like to "acquire" the spinlock? The "compare and swap"?
|
||||
|
||||
Enable the breakpoint by typing in
|
||||
|
||||
`be *`
|
||||
`g`
|
||||
`be *`
|
||||
`g`
|
||||
|
||||
6. You should hit a breakpoint pretty quickly again that looks the same as before. Let's find the "compare and swap" code
|
||||
|
||||
Type in this to get the symbol address for sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5
|
||||
Type in this to get the symbol address for `sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff+0x1f5`
|
||||
|
||||
x sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff
|
||||
`x sqlmin!Spinlock<183,7,257>::SpinToAcquireWithExponentialBackoff`
|
||||
|
||||
you should see something like this
|
||||
|
||||
|
@ -89,14 +93,14 @@ you should see something like this
|
|||
|
||||
Take that value and use the "uf" function to dump out the assembly
|
||||
|
||||
`uf 00007ffd``7caae500`
|
||||
`uf 00007ffd``7caae500`
|
||||
|
||||
7. Search this for keyword cmpxchg
|
||||
7. Search this for keyword **cmpxchg**
|
||||
|
||||
You will see something like this
|
||||
|
||||
00007ff9`0508995f f04d0fb126 lock cmpxchg qword ptr [r14],r12
|
||||
`00007ff9`0508995f f04d0fb126 lock cmpxchg qword ptr [r14],r12`
|
||||
|
||||
This is the "compare and swap" instruction called cmpxchg on Intel CPUs. In our code we actually call InterlockedCompareExchange() but this is a compiler "intrinsic" so not a call into any DLL. It gets translated into this intruction
|
||||
This is the "compare and swap" instruction called cmpxchg on Intel CPUs. In our code we actually call **InterlockedCompareExchange()** but this is a compiler "intrinsic" so not a call into any DLL. It gets translated into this instruction
|
||||
|
||||
8. Now scroll down and look at the instructions and branches to either call the backoff or "loop back" and do it all over again.
|
Загрузка…
Ссылка в новой задаче