gecko-dev/tools/profiler/lul
Julian Seward d952612d22 Bug 1665029 - LUL: reduce space consumption by not storing duplicate RuleSets. r=fitzgen.
LUL (a Lightweight Unwind Library) performs unwinding on targets that use
Dwarf CFI metadata.  As each Linux/Android shared object is mapped into
memory, LUL reads unwind data from the object's .eh_frame and/or .debug_frame
sections, and from that info produces a set of canned how-to-unwind recipes,
called RuleSets, which are stored in a SecMap object.  There is one SecMap
object for each mapped object in the process.

Each RuleSet describes how to do a step of unwinding for some code address
range.  Most code address ranges are very short (a few bytes) and so there are
many RuleSets.  libxul.so as of Sept 2020 has nearly 4 million RuleSets,
for example.  Currently, each is 48 bytes long, and so storing them all
requires considerable space, over 200MB.

This patch reduces the storage requirement by almost a factor of 6.  The key
observation is that although there are many RuleSets, almost all of them are
duplicates.  libxul.so typically has fewer than 300 different RuleSets.  This
patch exploits that observation using two different compression schemes.

Firstly, it makes sense to store each different RuleSet only once, in a vector
("the dictionary").  Then, instead of storing (for libxul.so) a vector of 4
million 48-byte-sized RuleSets, we store a vector of 4 million triples, of the
form

 (code_address, len, dictionary_index)

If `code_address` is 64-bit, and we (entirely reasonably) constrain `len` and
`dictionary_index` to be 32 bits, then a triple takes 16 bytes.  This would
give a factor of 3 memory saving, assuming (again reasonably) that the
dictionary's size is insignificant.
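
As a rough illustration of the first scheme, the triple could be laid out as
below.  This is only a sketch: `WideTriple` and its field names are
hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical name: the intermediate (not yet compressed) form of the
// triple described above.  Not part of the patch.
struct WideTriple {
  uint64_t code_address;      // start of the code address range
  uint32_t len;               // length of the range, in bytes
  uint32_t dictionary_index;  // index of the shared RuleSet in the dictionary
};

// 8 + 4 + 4 bytes, with no padding needed on common ABIs.
static_assert(sizeof(WideTriple) == 16, "triple should occupy 16 bytes");

// Saving relative to a 48-byte RuleSet, ignoring the dictionary itself.
constexpr unsigned kSavingFactor = 48 / sizeof(WideTriple);
```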

Secondly, we observe that (a) all `code_address`es for any specific shared
object (hence, for the associated SecMap) span at most about 120MB, (b) that
the highest observed `dictionary_index` is less than 400, and (c) that almost
all `len` values are less than 2^16.  Hence we can represent that triple as

  (32-bit offset_from_SecMap_base_address, 16-bit len, 16-bit dictionary_index)

For the few `len` values that won't fit into 16 bits, we can chop the range up
into a sequence of chunks, each at most 2^16-1 bytes long.  This is exceedingly
rare in practice.

With this arrangement, each triple is 8 bytes, hence giving the final
compression figure of 6 == 48 / 8.

In the patch, the triple is represented by a new struct, `Extent`.
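
The 8-byte layout and the chopping of over-long ranges can be sketched as
follows.  `PackedExtent`, `kMaxLen` and `MakeExtents` are hypothetical names
used only for illustration; the real `Extent` may differ in field names,
layout and helper functions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the patch's `Extent`.
struct PackedExtent {
  uint32_t offset;      // range start, relative to the object's base address
  uint16_t len;         // range length in bytes, at most 2^16-1
  uint16_t dict_index;  // index into the dictionary of unique RuleSets
};
static_assert(sizeof(PackedExtent) == 8, "4 + 2 + 2 bytes");

constexpr uint32_t kMaxLen = 0xFFFF;  // largest length a 16-bit field holds

// Splits [offset, offset+len) into chunks of at most 2^16-1 bytes, each
// referring to the same dictionary entry.  Assumes offset+len fits in 32 bits.
std::vector<PackedExtent> MakeExtents(uint32_t offset, uint64_t len,
                                      uint16_t dict_index) {
  std::vector<PackedExtent> out;
  while (len > 0) {
    uint16_t chunk = static_cast<uint16_t>(std::min<uint64_t>(len, kMaxLen));
    out.push_back(PackedExtent{offset, chunk, dict_index});
    offset += chunk;
    len -= chunk;
  }
  return out;
}
```

For example, a 70000-byte range would become two entries under this sketch:
one of 65535 bytes and one of 4465 bytes, both with the same dictionary index.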

This scheme is described (more or less) in
https://blog.mozilla.org/jseward/2013/09/03/how-compactly-can-cfiexidx-stack-unwinding-info-be-represented/

and there is background information on the representations at
https://blog.mozilla.org/jseward/2013/08/29/how-fast-can-cfiexidx-based-stack-unwinding-be/

---

Specific changes are:

class RuleSet: fields `mAddr` and `mLen`, which used to specify the address
range to which the RuleSet applied, have been removed.  They can no longer be
part of RuleSet because each RuleSet is now stored only once, and referenced
by each address range fragment which uses it.  The address information has
instead been moved to ..

struct Extent: this is a new, 8 byte structure, which specifies address
ranges, and indices into the dictionary of RuleSets, as described above.

class SecMap: this holds all the unwind information harvested from a single
Linux/Android shared object.

* Per the description above, the may-contain-duplicates vector of RuleSets,
  `mRuleSet`, has been removed.  Instead it is replaced by a vector of
  `Extent`s, `mExtents`, and the duplicate-free vector of RuleSets,
  `mDictionary`, entries in which are referred to from `mExtents`.

* `mDictionary` cannot be efficiently created until we know all the RuleSets
  that it will need to contain.  Hence, while reading unwind data, a hash
  table, `mUniqifier`, is used as an intermediate.  This maps RuleSets to
  unique integer IDs.  Once reading is complete, `mUniqifier` is enumerated in
  order of the unique IDs, and the RuleSets are copied into their correct
  locations in `mDictionary`.  `mUniqifier` is then deleted, and plays no
  further role.
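
The uniqifier-then-dictionary flow can be sketched roughly as below.  This is
an illustration under assumptions, not the patch's code: `FakeRuleSet` (a
plain string) stands in for the real RuleSet class, `Deduplicator`, `Intern`
and `Finish` are hypothetical names, and the dictionary is filled by indexing
on the ID rather than by an ordered enumeration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a RuleSet: any hashable, comparable value type will do here.
using FakeRuleSet = std::string;

class Deduplicator {
 public:
  // Returns the unique ID for `rs`, assigning the next free ID on first sight.
  uint32_t Intern(const FakeRuleSet& rs) {
    auto it = mUniqifier.find(rs);
    if (it != mUniqifier.end()) {
      return it->second;
    }
    uint32_t id = static_cast<uint32_t>(mUniqifier.size());
    mUniqifier.emplace(rs, id);
    return id;
  }

  // Builds the dictionary so that dictionary[id] is the RuleSet with that
  // ID, then discards the hash table.
  std::vector<FakeRuleSet> Finish() {
    std::vector<FakeRuleSet> dictionary(mUniqifier.size());
    for (const auto& entry : mUniqifier) {
      dictionary[entry.second] = entry.first;
    }
    mUniqifier.clear();
    return dictionary;
  }

 private:
  std::unordered_map<FakeRuleSet, uint32_t> mUniqifier;
};
```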

In terms of actions, the main changes are:

* SecMap::AddRuleSet: the new RuleSet is looked up in `mUniqifier`, or added
  if missing.  This generates a dictionary-index for it.  This is the core of
  the de-duplication process.  Also, a new `mExtents` entry is added for the
  range.

* SecMap::PrepareRuleSets: this is called once all info has been read, but
  before we commence unwinding.  The address ranges implied by `mExtents` are
  sorted, trimmed and generally tidied up.  `mDictionary` is created from
  `mUniqifier` and the latter is deleted.
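
A minimal sketch of the tidy-up step and the lookup it enables is shown
below.  The trimming policy (cut a range short where it runs into its
successor, then drop empty ranges) is an assumption made for illustration;
the patch's actual rules may differ.  `SketchExtent`, `Prepare` and `Find`
are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SketchExtent {
  uint32_t offset;      // range start, relative to the map's base address
  uint16_t len;         // range length in bytes
  uint16_t dict_index;  // index into the dictionary
};

// Sorts by start offset, trims any range that overlaps its successor, and
// removes ranges that end up empty.
void Prepare(std::vector<SketchExtent>& extents) {
  std::sort(extents.begin(), extents.end(),
            [](const SketchExtent& a, const SketchExtent& b) {
              return a.offset < b.offset;
            });
  for (size_t i = 0; i + 1 < extents.size(); ++i) {
    uint64_t end = uint64_t(extents[i].offset) + extents[i].len;
    uint32_t next = extents[i + 1].offset;
    if (end > next) {
      // next - offset is smaller than the old len, so it fits in 16 bits.
      extents[i].len = static_cast<uint16_t>(next - extents[i].offset);
    }
  }
  extents.erase(std::remove_if(extents.begin(), extents.end(),
                               [](const SketchExtent& e) { return e.len == 0; }),
                extents.end());
}

// Binary search: returns the dictionary index of the range containing
// `offset`, or -1 if no range covers it.  Requires Prepare() to have run.
int Find(const std::vector<SketchExtent>& extents, uint32_t offset) {
  auto it = std::upper_bound(
      extents.begin(), extents.end(), offset,
      [](uint32_t off, const SketchExtent& e) { return off < e.offset; });
  if (it == extents.begin()) {
    return -1;
  }
  --it;
  return (offset - it->offset < it->len) ? it->dict_index : -1;
}
```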

Secondary changes:

* SecMap::mSummaryMinAddr and SecMap::mSummaryMaxAddr have been removed and
  replaced by `mMapMinAVMA` and `mMapMaxAVMA`.

  `mSummaryMinAddr` and `mSummaryMaxAddr` previously held the minimum and
  maximum code addresses of any RuleSets in this SecMap.  However, computing
  them incrementally is no longer possible, and in any case we need to have a
  fixed address for the SecMap against which the Extent::offset fields are
  based.

  Hence we store instead the lowest and highest code addresses for the mapped
  text segment that this SecMap covers -- hence `mMapMinAVMA` and
  `mMapMaxAVMA`.  These are known before we start reading unwind info for this
  SecMap, and are guaranteed to be a superset of the range previously
  specified by `mSummaryMinAddr` and `mSummaryMaxAddr`.  These ranges are
  guaranteed not to overlap the ranges of any other SecMap in the system, and
  hence can still be used for their intended purpose of binary-searching the
  top-level collection of SecMaps (which is owned by the one-and-only PriMap).
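
  To illustrate how non-overlapping map ranges support both the top-level
  search and the offset computation, here is a small sketch.  `SketchSecMap`,
  `FindSecMap` and `OffsetInMap` are hypothetical names, and treating the
  maximum address as inclusive is an assumption, not something this message
  states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a SecMap's address summary.
struct SketchSecMap {
  uint64_t mapMinAVMA;  // lowest code address of the mapped text segment
  uint64_t mapMaxAVMA;  // highest code address (assumed inclusive)
};

// Assumes `maps` is sorted by mapMinAVMA and that no two ranges overlap.
// Returns the map containing `avma`, or nullptr if there is none.
const SketchSecMap* FindSecMap(const std::vector<SketchSecMap>& maps,
                               uint64_t avma) {
  auto it = std::upper_bound(
      maps.begin(), maps.end(), avma,
      [](uint64_t a, const SketchSecMap& m) { return a < m.mapMinAVMA; });
  if (it == maps.begin()) {
    return nullptr;
  }
  --it;
  return avma <= it->mapMaxAVMA ? &*it : nullptr;
}

// Converts an AVMA inside `m` to a 32-bit offset from the map's fixed base.
uint32_t OffsetInMap(const SketchSecMap& m, uint64_t avma) {
  assert(avma >= m.mapMinAVMA && avma <= m.mapMaxAVMA);
  return static_cast<uint32_t>(avma - m.mapMinAVMA);
}
```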

* Some comments have been cleaned up.  Some imprecise uses of the term
  "address" have been replaced with the more precise terminology "AVMA"
  (Actual Virtual Memory Address).  See existing comment at the top of
  LulMain.h.

Differential Revision: https://phabricator.services.mozilla.com/D90289
2020-09-16 10:18:36 +00:00
AutoObjectMapper.cpp
AutoObjectMapper.h
LulCommon.cpp
LulCommonExt.h
LulDwarf.cpp
LulDwarfExt.h Bug 1648972 - Fix profiler non-unified build - r=canaltinova 2020-06-30 05:07:21 +00:00
LulDwarfInt.h
LulDwarfSummariser.cpp Bug 1665029 - LUL: reduce space consumption by not storing duplicate RuleSets. r=fitzgen. 2020-09-16 10:18:36 +00:00
LulDwarfSummariser.h
LulElf.cpp
LulElfExt.h
LulElfInt.h
LulMain.cpp Bug 1665029 - LUL: reduce space consumption by not storing duplicate RuleSets. r=fitzgen. 2020-09-16 10:18:36 +00:00
LulMain.h
LulMainInt.h Bug 1665029 - LUL: reduce space consumption by not storing duplicate RuleSets. r=fitzgen. 2020-09-16 10:18:36 +00:00
platform-linux-lul.cpp
platform-linux-lul.h