Bug 1651086 - Handle tgkill failure - r=canaltinova

On Linux (including Android), it was assumed that a registered thread could always be suspended through `tgkill`.
However, a thread may exit without being correctly unregistered, in which case `tgkill` fails: this would trigger the `MOZ_ASSERT` in debug builds, or make the sampler wait forever in the loop that follows.

This will especially be needed when `profiler_{,un}register_thread()` are made less strict in the following patch.

Windows and Mac already handle suspension failures.

Differential Revision: https://phabricator.services.mozilla.com/D83292
Gerald Squelart 2020-07-13 13:14:32 +00:00
Parent 760f059b00
Commit 7610ff4326
2 changed files with 110 additions and 108 deletions
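
The failure described in the commit message can be observed outside the profiler. The following is a minimal sketch, assuming Linux and the raw `SYS_tgkill` syscall; `TrySignalThread` and `CurrentTid` are hypothetical helper names chosen for illustration, not functions from the profiler. It only shows that the result of `tgkill` has to be checked instead of asserted.

```cpp
// Sketch only: TrySignalThread and CurrentTid are hypothetical helpers,
// not functions from the profiler.
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <climits>

// Kernel thread id of the calling thread.
pid_t CurrentTid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

// Sends aSignal to thread aTid of process aPid through the raw tgkill
// syscall. Returns true on success; on failure returns false and leaves
// the reason in errno (ESRCH when no such thread exists).
bool TrySignalThread(pid_t aPid, pid_t aTid, int aSignal) {
  return syscall(SYS_tgkill, aPid, aTid, aSignal) == 0;
}
```

With signal 0 the kernel performs its checks without delivering anything, so `TrySignalThread(getpid(), CurrentTid(), 0)` should succeed. A tid that names no live thread, for example `INT_MAX` (above the kernel's maximum pid), should make the call fail with `errno` set to `ESRCH`, which is the case the patch now tolerates.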


@@ -329,66 +329,67 @@ void Sampler::SuspendAndSampleAndResumeThread(

   // Send message 1 to the samplee (the thread to be sampled), by
   // signalling at it.
+  // This could fail if the thread doesn't exist anymore.
   int r = tgkill(mMyPid, sampleeTid, SIGPROF);
-  MOZ_ASSERT(r == 0);
-
-  // Wait for message 2 from the samplee, indicating that the context
-  // is available and that the thread is suspended.
-  while (true) {
-    r = sem_wait(&sSigHandlerCoordinator->mMessage2);
-    if (r == -1 && errno == EINTR) {
-      // Interrupted by a signal. Try again.
-      continue;
-    }
-    // We don't expect any other kind of failure.
-    MOZ_ASSERT(r == 0);
-    break;
-  }
+  if (r == 0) {
+    // Wait for message 2 from the samplee, indicating that the context
+    // is available and that the thread is suspended.
+    while (true) {
+      r = sem_wait(&sSigHandlerCoordinator->mMessage2);
+      if (r == -1 && errno == EINTR) {
+        // Interrupted by a signal. Try again.
+        continue;
+      }
+      // We don't expect any other kind of failure.
+      MOZ_ASSERT(r == 0);
+      break;
+    }

-  //----------------------------------------------------------------//
-  // Sample the target thread.
+    //----------------------------------------------------------------//
+    // Sample the target thread.

-  // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
-  //
-  // The profiler's "critical section" begins here. In the critical section,
-  // we must not do any dynamic memory allocation, nor try to acquire any lock
-  // or any other unshareable resource. This is because the thread to be
-  // sampled has been suspended at some entirely arbitrary point, and we have
-  // no idea which unsharable resources (locks, essentially) it holds. So any
-  // attempt to acquire any lock, including the implied locks used by the
-  // malloc implementation, risks deadlock. This includes TimeStamp::Now(),
-  // which gets a lock on Windows.
+    // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+    //
+    // The profiler's "critical section" begins here. In the critical section,
+    // we must not do any dynamic memory allocation, nor try to acquire any lock
+    // or any other unshareable resource. This is because the thread to be
+    // sampled has been suspended at some entirely arbitrary point, and we have
+    // no idea which unsharable resources (locks, essentially) it holds. So any
+    // attempt to acquire any lock, including the implied locks used by the
+    // malloc implementation, risks deadlock. This includes TimeStamp::Now(),
+    // which gets a lock on Windows.

-  // The samplee thread is now frozen and sSigHandlerCoordinator->mUContext is
-  // valid. We can poke around in it and unwind its stack as we like.
+    // The samplee thread is now frozen and sSigHandlerCoordinator->mUContext is
+    // valid. We can poke around in it and unwind its stack as we like.

-  // Extract the current register values.
-  Registers regs;
-  PopulateRegsFromContext(regs, &sSigHandlerCoordinator->mUContext);
-  aProcessRegs(regs, aNow);
+    // Extract the current register values.
+    Registers regs;
+    PopulateRegsFromContext(regs, &sSigHandlerCoordinator->mUContext);
+    aProcessRegs(regs, aNow);

-  //----------------------------------------------------------------//
-  // Resume the target thread.
+    //----------------------------------------------------------------//
+    // Resume the target thread.

-  // Send message 3 to the samplee, which tells it to resume.
-  r = sem_post(&sSigHandlerCoordinator->mMessage3);
-  MOZ_ASSERT(r == 0);
+    // Send message 3 to the samplee, which tells it to resume.
+    r = sem_post(&sSigHandlerCoordinator->mMessage3);
+    MOZ_ASSERT(r == 0);

-  // Wait for message 4 from the samplee, which tells us that it has
-  // finished with |sSigHandlerCoordinator|.
-  while (true) {
-    r = sem_wait(&sSigHandlerCoordinator->mMessage4);
-    if (r == -1 && errno == EINTR) {
-      continue;
-    }
-    MOZ_ASSERT(r == 0);
-    break;
-  }
+    // Wait for message 4 from the samplee, which tells us that it has
+    // finished with |sSigHandlerCoordinator|.
+    while (true) {
+      r = sem_wait(&sSigHandlerCoordinator->mMessage4);
+      if (r == -1 && errno == EINTR) {
+        continue;
+      }
+      MOZ_ASSERT(r == 0);
+      break;
+    }

-  // The profiler's critical section ends here. After this point, none of the
-  // critical section limitations documented above apply.
-  //
-  // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+    // The profiler's critical section ends here. After this point, none of the
+    // critical section limitations documented above apply.
+    //
+    // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+  }

   // This isn't strictly necessary, but doing so does help pick up anomalies
   // in which the signal handler is running when it shouldn't be.


@@ -323,66 +323,67 @@ void Sampler::SuspendAndSampleAndResumeThread(

   // Send message 1 to the samplee (the thread to be sampled), by
   // signalling at it.
+  // This could fail if the thread doesn't exist anymore.
   int r = tgkill(mMyPid, sampleeTid, SIGPROF);
-  MOZ_ASSERT(r == 0);
-
-  // Wait for message 2 from the samplee, indicating that the context
-  // is available and that the thread is suspended.
-  while (true) {
-    r = sem_wait(&sSigHandlerCoordinator->mMessage2);
-    if (r == -1 && errno == EINTR) {
-      // Interrupted by a signal. Try again.
-      continue;
-    }
-    // We don't expect any other kind of failure.
-    MOZ_ASSERT(r == 0);
-    break;
-  }
+  if (r == 0) {
+    // Wait for message 2 from the samplee, indicating that the context
+    // is available and that the thread is suspended.
+    while (true) {
+      r = sem_wait(&sSigHandlerCoordinator->mMessage2);
+      if (r == -1 && errno == EINTR) {
+        // Interrupted by a signal. Try again.
+        continue;
+      }
+      // We don't expect any other kind of failure.
+      MOZ_ASSERT(r == 0);
+      break;
+    }

-  //----------------------------------------------------------------//
-  // Sample the target thread.
+    //----------------------------------------------------------------//
+    // Sample the target thread.

-  // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
-  //
-  // The profiler's "critical section" begins here. In the critical section,
-  // we must not do any dynamic memory allocation, nor try to acquire any lock
-  // or any other unshareable resource. This is because the thread to be
-  // sampled has been suspended at some entirely arbitrary point, and we have
-  // no idea which unsharable resources (locks, essentially) it holds. So any
-  // attempt to acquire any lock, including the implied locks used by the
-  // malloc implementation, risks deadlock. This includes TimeStamp::Now(),
-  // which gets a lock on Windows.
+    // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+    //
+    // The profiler's "critical section" begins here. In the critical section,
+    // we must not do any dynamic memory allocation, nor try to acquire any lock
+    // or any other unshareable resource. This is because the thread to be
+    // sampled has been suspended at some entirely arbitrary point, and we have
+    // no idea which unsharable resources (locks, essentially) it holds. So any
+    // attempt to acquire any lock, including the implied locks used by the
+    // malloc implementation, risks deadlock. This includes TimeStamp::Now(),
+    // which gets a lock on Windows.

-  // The samplee thread is now frozen and sSigHandlerCoordinator->mUContext is
-  // valid. We can poke around in it and unwind its stack as we like.
+    // The samplee thread is now frozen and sSigHandlerCoordinator->mUContext is
+    // valid. We can poke around in it and unwind its stack as we like.

-  // Extract the current register values.
-  Registers regs;
-  PopulateRegsFromContext(regs, &sSigHandlerCoordinator->mUContext);
-  aProcessRegs(regs, aNow);
+    // Extract the current register values.
+    Registers regs;
+    PopulateRegsFromContext(regs, &sSigHandlerCoordinator->mUContext);
+    aProcessRegs(regs, aNow);

-  //----------------------------------------------------------------//
-  // Resume the target thread.
+    //----------------------------------------------------------------//
+    // Resume the target thread.

-  // Send message 3 to the samplee, which tells it to resume.
-  r = sem_post(&sSigHandlerCoordinator->mMessage3);
-  MOZ_ASSERT(r == 0);
+    // Send message 3 to the samplee, which tells it to resume.
+    r = sem_post(&sSigHandlerCoordinator->mMessage3);
+    MOZ_ASSERT(r == 0);

-  // Wait for message 4 from the samplee, which tells us that it has
-  // finished with |sSigHandlerCoordinator|.
-  while (true) {
-    r = sem_wait(&sSigHandlerCoordinator->mMessage4);
-    if (r == -1 && errno == EINTR) {
-      continue;
-    }
-    MOZ_ASSERT(r == 0);
-    break;
-  }
+    // Wait for message 4 from the samplee, which tells us that it has
+    // finished with |sSigHandlerCoordinator|.
+    while (true) {
+      r = sem_wait(&sSigHandlerCoordinator->mMessage4);
+      if (r == -1 && errno == EINTR) {
+        continue;
+      }
+      MOZ_ASSERT(r == 0);
+      break;
+    }

-  // The profiler's critical section ends here. After this point, none of the
-  // critical section limitations documented above apply.
-  //
-  // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+    // The profiler's critical section ends here. After this point, none of the
+    // critical section limitations documented above apply.
+    //
+    // WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
+  }

   // This isn't strictly necessary, but doing so does help pick up anomalies
   // in which the signal handler is running when it shouldn't be.
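
The four-message handshake in the hunks above, together with the new early exit, can be sketched as a standalone program. This is an illustration under stated assumptions, not the Gecko implementation: `Coordinator`, `SigprofHandler`, `SuspendAndResume`, `SampleeMain` and `RunDemo` are hypothetical names, the handler side is inferred from the comments in the diff, and the sketch assumes Linux with POSIX unnamed semaphores. Like the code it mirrors, it calls `sem_wait` inside a signal handler, which POSIX does not list as async-signal-safe.

```cpp
// Sketch only: Coordinator, SigprofHandler, SuspendAndResume and RunDemo are
// hypothetical names for illustration; this is not the Gecko implementation.
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>

struct Coordinator {
  sem_t mMessage2;  // samplee -> sampler: "I am suspended"
  sem_t mMessage3;  // sampler -> samplee: "you may resume"
  sem_t mMessage4;  // samplee -> sampler: "I am done with the coordinator"
};

Coordinator gCoordinator;
std::atomic<pid_t> gSampleeTid{0};
std::atomic<bool> gStop{false};

// sem_wait() that retries when interrupted by a signal (EINTR).
void WaitRetryingOnEintr(sem_t* aSem) {
  int r;
  do {
    r = sem_wait(aSem);
  } while (r == -1 && errno == EINTR);
  assert(r == 0);
  (void)r;
}

// Runs on the samplee thread when SIGPROF (message 1) arrives.
void SigprofHandler(int) {
  int savedErrno = errno;
  sem_post(&gCoordinator.mMessage2);
  WaitRetryingOnEintr(&gCoordinator.mMessage3);
  sem_post(&gCoordinator.mMessage4);
  errno = savedErrno;
}

// Returns false, without waiting, when the signal could not be sent.
bool SuspendAndResume(pid_t aPid, pid_t aTid) {
  if (syscall(SYS_tgkill, aPid, aTid, SIGPROF) != 0) {
    return false;  // Nobody will post message 2, so waiting would hang.
  }
  WaitRetryingOnEintr(&gCoordinator.mMessage2);
  // The samplee is frozen here; a real sampler would read its registers.
  sem_post(&gCoordinator.mMessage3);
  WaitRetryingOnEintr(&gCoordinator.mMessage4);
  return true;
}

// Body of the thread to be sampled: publishes its tid, then idles.
void* SampleeMain(void*) {
  gSampleeTid = static_cast<pid_t>(syscall(SYS_gettid));
  while (!gStop) {
    usleep(1000);
  }
  return nullptr;
}

// Samples one live thread, then one tid that cannot exist.
bool RunDemo() {
  sem_init(&gCoordinator.mMessage2, 0, 0);
  sem_init(&gCoordinator.mMessage3, 0, 0);
  sem_init(&gCoordinator.mMessage4, 0, 0);

  struct sigaction sa = {};
  sa.sa_handler = SigprofHandler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(SIGPROF, &sa, nullptr);

  pthread_t thread;
  pthread_create(&thread, nullptr, SampleeMain, nullptr);
  while (gSampleeTid == 0) {
    usleep(1000);
  }
  bool sampledLive = SuspendAndResume(getpid(), gSampleeTid);
  gStop = true;
  pthread_join(thread, nullptr);

  // INT_MAX is above the kernel's maximum pid, so no thread has this tid.
  bool sampledMissing = SuspendAndResume(getpid(), INT_MAX);
  return sampledLive && !sampledMissing;
}
```

`RunDemo()` is expected to return `true`: the live thread completes the handshake, and the missing one is skipped instead of blocking on `mMessage2`. Returning early on a failed `tgkill` is the design point of the patch, since an unconditional wait for message 2 has no one left to answer it.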