After some investigation, I was able to find a theoretical codepath
which could lead to the "missing initial frame browsing context" error:
1. Two iframes are created for the same origin, and begin process
switching.
2. The first iframe finishes process switching, but for some reason
(e.g. being in shutdown) the call to `LaunchSubprocessResolve`
errors.
3. The second callback is called and also calls LaunchSubprocessResolve,
which this time returns `true` due to it previously having been
called.
4. The BrowserParent is created in the new content process despite
`InitInternal()` never having been finished, and therefore the
ContentParent never becoming subscribed to the BrowsingContextGroup.
To fix this, I made 2 changes:
1. Abort from process switching if the target process which we're going
to be creating a BrowserParent in `IsDead()`, and
2. Track the return value from `LaunchSubprocessResolve`, so we return
`false` if it is called a second time after a failed content process
launch.
I'm not confident that this is the cause of the crashes, as I was unable
to reproduce the issue.
Differential Revision: https://phabricator.services.mozilla.com/D123548
This will, for example, make it possible to behave differently for a normal navigation,
a reload navigation, a history navigation, and a pushstate navigation.
Differential Revision: https://phabricator.services.mozilla.com/D122532
Looks like the code below does similar thing already for iframes, so this is adding InternalSetRequestedIndex
call only for the top level case.
This is based on the same assumptions as bug 1697905, but unfortunately testing this is still
super hard.
Differential Revision: https://phabricator.services.mozilla.com/D122355
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
To support more cases, change this value to more general name and use a count instead, if the count is larger than zero, then we would not suspend the page.
In addition, this value now can be set in any processes (but still for the top level only), which is different from before where we would only set the value from the chrome process.
Differential Revision: https://phabricator.services.mozilla.com/D119837
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
This accomplishes 2 things:
1. Allows us to directly fetch the layersId of the process that is
autoscrolling, which avoids having to fetch it in AutoScrollChild and pass it
around. This fixes autoscrolling out-of-process frames with Fission enabled.
2. Makes it easier to handle autoscrolling of in-process documents, since that
can't happen through PBrowser.
Differential Revision: https://phabricator.services.mozilla.com/D120766
This accomplishes 2 things:
1. Allows us to directly fetch the layersId of the process that is
autoscrolling, which avoids having to fetch it in AutoScrollChild and pass it
around. This fixes autoscrolling out-of-process frames with Fission enabled.
2. Makes it easier to handle autoscrolling of in-process documents, since that
can't happen through PBrowser.
Differential Revision: https://phabricator.services.mozilla.com/D120766
Instead of initializing on first use - which can race with workers - do the
initialization in nsLayoutStatics::Initialize(). Also make `sInShutdown` an
Atomic since it is accessed without locks.
Differential Revision: https://phabricator.services.mozilla.com/D121143
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
There are a number of modules that we import from C++ and can't continue
running without. We have a number of crashes for some of those failed loads. A
lot of them are from OOMs or corruption, but we're not sure about the rest.
This patch adds a crash annotation with the details of the error wherever we
abort for failing to load a module.
Differential Revision: https://phabricator.services.mozilla.com/D120290
There are a number of modules that we import from C++ and can't continue
running without. We have a number of crashes for some of those failed loads. A
lot of them are from OOMs or corruption, but we're not sure about the rest.
This patch adds a crash annotation with the details of the error wherever we
abort for failing to load a module.
Differential Revision: https://phabricator.services.mozilla.com/D120290
Ensure firing the onStateChange on the BrowsingContext's WebProgress
of the page we navigate *from* instead of the one we navigate *to*.
(which was what used to happen before fission/bfcacheInParent)
Differential Revision: https://phabricator.services.mozilla.com/D119648
I don't have a test for this, since the race condition is super hard to trigger.
The patch is based on code auditing.
The patch just prevents bfcaching if we're in middle of another load.
Differential Revision: https://phabricator.services.mozilla.com/D119992
This makes sure to clear and set the value more consistently when replacing
documents within a WindowGlobal, and makes sure to include the relevant flag in
the initializer.
In addition, the place where the flag is set is moved ahead to happen before
the call to `Embed` so that the information is ready before the window is
created.
Differential Revision: https://phabricator.services.mozilla.com/D119815
This moves the logic of whether a pres shell should be active to a
single place to make it sane to reason about, and fixes the
subdocument propagation when a BrowserChild becomes visible.
Differential Revision: https://phabricator.services.mozilla.com/D118703
Move the counting of private browsing contexts to the parent
process. Also change to only consider non-chrome browsing contexts
when counting private contexts. The latter is possible due to bug
1528115, because we no longer need to support hidden private windows.
With counting in the parent process we can make sure that when we're
changing remoteness on a private browsing context the private browsing
context count never drops to zero. This fixes an issue with Fission,
where we remoteness changes could transiently have a zero private
browsing context count, that would be mistaken for the last private
browsing context going away.
Changing to only count non-chrome browsing contexts makes us only fire
'last-pb-context-exited' once, and since we count them in the parent
there is no missing information about contexts that makes us wait for
a content process about telling us about insertion or removal of
browsing contexts.
Differential Revision: https://phabricator.services.mozilla.com/D118182
Move the counting of private browsing contexts to the parent
process. Also change to only consider non-chrome browsing contexts
when counting private contexts. The latter is possible due to bug
1528115, because we no longer need to support hidden private windows.
With counting in the parent process we can make sure that when we're
changing remoteness on a private browsing context the private browsing
context count never drops to zero. This fixes an issue with Fission,
where we remoteness changes could transiently have a zero private
browsing context count, that would be mistaken for the last private
browsing context going away.
Changing to only count non-chrome browsing contexts makes us only fire
'last-pb-context-exited' once, and since we count them in the parent
there is no missing information about contexts that makes us wait for
a content process about telling us about insertion or removal of
browsing contexts.
Differential Revision: https://phabricator.services.mozilla.com/D118182
Move the counting of private browsing contexts to the parent
process. Also change to only consider non-chrome browsing contexts
when counting private contexts. The latter is possible due to bug
1528115, because we no longer need to support hidden private windows.
With counting in the parent process we can make sure that when we're
changing remoteness on a private browsing context the private browsing
context count never drops to zero. This fixes an issue with Fission,
where we remoteness changes could transiently have a zero private
browsing context count, that would be mistaken for the last private
browsing context going away.
Changing to only count non-chrome browsing contexts makes us only fire
'last-pb-context-exited' once, and since we count them in the parent
there is no missing information about contexts that makes us wait for
a content process about telling us about insertion or removal of
browsing contexts.
Differential Revision: https://phabricator.services.mozilla.com/D118182
During a `TextControlState` alive, `PasswordMaskData` should be alive too.
Otherwise, we cannot keep unmasked range at reframing.
Depends on D118756
Differential Revision: https://phabricator.services.mozilla.com/D118757
We can't rely on the Children list since it may have been cleared on shutdown.
Since we don't clear parent edges, walking the parent chain and using
mChildOffset works.
Differential Revision: https://phabricator.services.mozilla.com/D118384
We can't rely on the Children list since it may have been cleared on shutdown.
Since we don't clear parent edges, walking the parent chain and using
mChildOffset works.
Differential Revision: https://phabricator.services.mozilla.com/D118384
This moves the logic of whether a pres shell should be active to a
single place to make it sane to reason about, and fixes the
subdocument propagation when a BrowserChild becomes visible.
Differential Revision: https://phabricator.services.mozilla.com/D118703
We can't rely on the Children list since it may have been cleared on shutdown.
Since we don't clear parent edges, walking the parent chain and using
mChildOffset works.
Differential Revision: https://phabricator.services.mozilla.com/D118384
If a <browser> is removed in the middle of a process change, the previous
DocShell may be torn down while it's still expecting a process change. There's
really nothing more to do in that case, though, so we can just skip the
PrepareForProcessChange call rather than asserting.
Differential Revision: https://phabricator.services.mozilla.com/D118514