Bug 1696393 - Add in-tree HTTP cache docs r=necko-reviewers,sylvestre,dragana DONTBUILD

Differential Revision: https://phabricator.services.mozilla.com/D107203
This commit is contained in:
Valentin Gosu 2021-03-08 10:28:11 +00:00
Parent dcd6a34965
Commit 07f95b4824
4 changed files with 570 additions and 0 deletions


@@ -20,6 +20,7 @@ categories:
- mobile/android/geckoview
- dom/bindings/webidl
- modules/libpref
- networking
- remote
- services
- uriloader

netwerk/docs/cache2/doc.rst (new file)

@@ -0,0 +1,558 @@
HTTP Cache
==========
This document describes the **HTTP cache implementation**.
The code resides in `/netwerk/cache2 (searchfox)
<https://searchfox.org/mozilla-central/source/netwerk/cache2>`_.
API
---
Here is a detailed description of the HTTP cache v2 API, with examples
included. This document only covers what cannot be found in, or may not
be clear directly from, the comments in the `IDL files <https://searchfox.org/mozilla-central/search?q=&path=cache2%2FnsICache&case=false&regexp=false>`_.
- The cache API is **completely thread-safe** and **non-blocking**.
- There is **no IPC support**. It's only accessible on the default
chrome process.
- When there is no profile, the HTTP cache still works, but everything is
  stored only in memory, not obeying any particular limits.
nsICacheStorageService
----------------------
- The HTTP cache entry-point. Accessible as a service only, fully
thread-safe, scriptable.
- `nsICacheStorageService.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheStorageService.idl>`_
- ``"@mozilla.org/netwerk/cache-storage-service;1"``
- Provides methods for accessing "storage" objects (see :ref:`nsICacheStorage`
  below), which in turn give access to cache entries (see
  :ref:`nsICacheEntry` below) for a specific URL.
- Currently we have three types of storage; all the access methods return
  an :ref:`nsICacheStorage` object:
- **memory-only** (``memoryCacheStorage``): stores data only in the
  memory cache; data in this storage is never written to disk
- **disk** (``diskCacheStorage``): stores data on disk, but for
existing entries also looks into the memory-only storage; when
instructed via a special argument also primarily looks into
application caches
.. note::
   **application cache** (``appCacheStorage``): when a consumer has a
   specific ``nsIApplicationCache`` (i.e. a particular app cache
   version in a group) in hand, this storage provides read and
   write access to entries in that application cache; when the app
   cache is not specified, this storage operates over all
   existing app caches. **This kind of storage is deprecated and will be removed** in `bug 1694662 <https://bugzilla.mozilla.org/show_bug.cgi?id=1694662>`_
- The service also provides methods to clear the whole disk and memory
  cache content or purge any intermediate memory structures (see the
  sketch after this list):
- ``clear`` – after it returns, all entries are no longer accessible
  through the cache APIs; the method is fast and non-blocking; the
  actual erase happens in the background
- ``purgeFromMemory`` – removes (or schedules removal of) any
  intermediate cache data held in memory for faster access (more
  about :ref:`Intermediate_Memory_Caching` below)
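
Below is a minimal chrome-process JavaScript sketch of obtaining the service
and storage objects and clearing the cache. The contract ID and the method
names come from the text above; ``Services.loadContextInfo`` is described in
the next section, and the extra ``diskCacheStorage`` argument (app-cache
lookup) is an assumption tied to the deprecated application cache support
and may differ or disappear.

.. code:: js

   // The cache API is only available on the default (parent) process.
   const cacheService = Cc["@mozilla.org/netwerk/cache-storage-service;1"]
                          .getService(Ci.nsICacheStorageService);

   // Scope the storage with an nsILoadContextInfo (see the next section).
   const lci = Services.loadContextInfo.default;

   // Assumption: the trailing boolean selects app-cache lookup (deprecated,
   // being removed in bug 1694662).
   const diskStorage = cacheService.diskCacheStorage(lci, false);
   const memStorage = cacheService.memoryCacheStorage(lci);

   // Erase everything: fast and non-blocking, the actual erase happens
   // in the background.
   cacheService.clear();
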
nsILoadContextInfo
------------------
- Distinguishes the scope of the storage demanded to open.
- Mandatory argument to ``*Storage`` methods of :ref:`nsICacheStorageService`.
- `nsILoadContextInfo.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/base/nsILoadContextInfo.idl>`_
- It is a helper interface wrapping the following arguments into a single object:
- **private-browsing** boolean flag
- **anonymous load** boolean flag
- **origin attributes** JS value
.. note::
Helper functions to create nsILoadContextInfo objects:
- C++ consumers: functions at ``LoadContextInfo.h`` exported
header
- JS consumers: ``Services.loadContextInfo`` which is an instance of ``nsILoadContextInfoFactory``.
- Two storage objects created with the same set of
  ``nsILoadContextInfo`` arguments are identical and contain the same
  cache entries.
- Two storage objects created with ``nsILoadContextInfo`` arguments that
  differ in any way are strictly and completely distinct, and cache
  entries in them do not overlap even when they have the same URIs
  (see the sketch below).
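
As a hedged illustration of the scoping rules above, here is how a chrome-JS
consumer might obtain distinct ``nsILoadContextInfo`` objects (reusing
``cacheService`` from the earlier sketch). The factory attribute and method
names (``default``, ``private``, ``anonymous``, ``custom``) are assumptions
based on ``nsILoadContextInfoFactory`` and should be checked against the IDL.

.. code:: js

   // Distinct scopes; storages created from different infos never share entries.
   const defaultInfo = Services.loadContextInfo.default;
   const privateInfo = Services.loadContextInfo.private;     // private browsing
   const anonymousInfo = Services.loadContextInfo.anonymous; // anonymous load

   // Assumed factory method: custom(anonymous, originAttributes).
   const customInfo = Services.loadContextInfo.custom(false, {
     userContextId: 1, // an origin attribute, e.g. a container tab
   });

   // Same arguments => identical storage with the same entries.
   // Different arguments => completely distinct storage.
   const storageA = cacheService.diskCacheStorage(defaultInfo, false);
   const storageB = cacheService.diskCacheStorage(privateInfo, false);
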
nsICacheStorage
---------------
- `nsICacheStorage.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheStorage.idl>`_
- Obtained from call to one of the ``*Storage`` methods on
:ref:`nsICacheStorageService`.
- Represents a distinct storage area (or scope) into which cache entries,
  mapped by URL, are put and from which they are retrieved.
- *Similarity with the old cache*: with some limitations this interface
  can be considered a mirror of ``nsICacheSession``, but it is less
  generic and less prone to abuse.
nsICacheEntryOpenCallback
-------------------------
- `nsICacheEntryOpenCallback.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheEntryOpenCallback.idl>`_
- The result of ``nsICacheStorage.asyncOpenURI`` is always and only
sent to callbacks on this interface.
- These callbacks are guaranteed to be invoked when ``asyncOpenURI``
  returns ``NS_OK`` (a usage sketch follows this section).

.. note::

   When the cache entry object is already present in memory or is opened
   as "force-new" (a.k.a. "open-truncate"), this callback is invoked
   sooner than the ``asyncOpenURI`` method returns (i.e.
   immediately); there is currently no way to opt out of this feature
   (see `bug
   938186 <https://bugzilla.mozilla.org/show_bug.cgi?id=938186>`__).
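
A hedged sketch of opening a URL through one of the storage objects obtained
above and receiving the result on ``nsICacheEntryOpenCallback``. The exact
``asyncOpenURI`` argument order, the ``OPEN_NORMALLY`` flag name and the
callback parameter lists are assumptions taken from the IDL comments and may
include additional (deprecated app-cache) parameters.

.. code:: js

   const uri = Services.io.newURI("https://example.com/resource");

   storage.asyncOpenURI(uri, "" /* id extension */,
                        Ci.nsICacheStorage.OPEN_NORMALLY, {
     QueryInterface: ChromeUtils.generateQI(["nsICacheEntryOpenCallback"]),

     // Called only for existing entries; decide whether you want them.
     onCacheEntryCheck(entry) {
       return Ci.nsICacheEntryOpenCallback.ENTRY_WANTED;
     },

     // Guaranteed to be called when asyncOpenURI returned NS_OK.
     // aNew is true only for a freshly created (empty) entry, making
     // this consumer the "writer".
     onCacheEntryAvailable(entry, aNew, status) {
       if (!Components.isSuccessCode(status) || !entry) {
         return; // open failed or the entry was not wanted
       }
       // read or write the entry here ...
     },
   });
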
nsICacheEntry
-------------
- `nsICacheEntry.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheEntry.idl>`_
- Obtained asynchronously or pseudo-asynchronously by a call to
``nsICacheStorage.asyncOpenURI``.
- Provides access to a cache entry's data and meta data for reading,
  writing, or in some cases both; see below.
Lifetime of a new entry
-----------------------
- Such an entry is initially empty (no data or meta data is stored in it).
- The ``aNew`` argument in ``onCacheEntryAvailable`` is ``true`` for,
  and only for, new entries.
- Only one consumer (the so called "*writer*") may have such an entry
available (obtained via ``onCacheEntryAvailable``).
- Other parallel openers of the same cache entry are blocked (wait) for
invocation of their ``onCacheEntryAvailable`` until one of the
following occurs:
- The *writer* simply throws the entry away: the next waiting opener in
  line gets the entry, again as "*new*", and the cycle repeats.

.. note::

   This applies in general: a writer throwing the cache entry away
   means a failure to write it, a new writer is looked for again, and
   the cache entry remains empty (a.k.a. "new").
- The *writer* stored all necessary meta data in the cache entry and
called ``metaDataReady`` on it: other consumers now get the entry
and may examine and potentially modify the meta data and read the
data (if any) of the cache entry.
- When the *writer* has data (i.e. the response payload) to write to
  the cache entry, it **must** open the output stream on it
  **before** it calls ``metaDataReady`` (see the sketch after this list).
- While the *writer* still keeps the cache entry and keeps the output
  stream on it open, other consumers may open input streams on the
  entry. The data becomes available to them as the *writer* writes it
  to the cache entry's output stream, even before the output stream is
  closed. This is called :ref:`concurrent
  read/write <Concurrent_read_and_write>`.
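
The required ordering for a *writer* filling a brand-new entry can be
sketched as follows. Only the ordering (output stream opened *before*
``metaDataReady``) comes from the list above; ``setMetaDataElement``,
``openOutputStream`` and their parameters are assumptions to be checked
against ``nsICacheEntry.idl``.

.. code:: js

   // Writer-side sketch for a new entry (aNew == true in onCacheEntryAvailable).
   function fillNewEntry(entry, responseHead, body) {
     // 1. Store the meta data other consumers will examine in onCacheEntryCheck.
     entry.setMetaDataElement("response-head", responseHead);

     // 2. When there is a payload, open the output stream *before* metaDataReady.
     //    (Second argument, the predicted size, is an assumption.)
     const out = entry.openOutputStream(0, body.length);

     // 3. Publish the meta data; blocked parallel openers now get the entry.
     entry.metaDataReady();

     // 4. Write the data; concurrent readers can read it while it is written.
     out.write(body, body.length);
     out.close();
   }
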
Concurrent read and write
-------------------------
The cache supports reading a cache entry data while it is still being
written by the first consumer - the *writer*.
This can only be engaged for resumable responses that (`bug
960902 <https://bugzilla.mozilla.org/show_bug.cgi?id=960902#c17>`__)
don't need revalidation. The reason is that when the writer is interrupted
(e.g. by external canceling of the loading channel), concurrent readers
would not be able to reach the remaining unread content.
.. note::
This could be improved by keeping the network load running and being
stored to the cache entry even after the writing channel has been
canceled.
When the *writer* is interrupted, the first concurrent *reader* in line
issues a range request for the rest of the data and thereby becomes the
new *writer*. The rest of the *readers* keep reading the content
concurrently, since the output stream for the cache entry is again open
and kept by the current *writer*.
Lifetime of an existing entry with only a partial content
---------------------------------------------------------
- Such a cache entry is first examined in the
``nsICacheEntryOpenCallback.onCacheEntryCheck`` callback, where it
has to be checked for completeness.
- In this case, the ``Content-Length`` header (or a different indicator)
  doesn't equal the data size reported by the cache entry.
- The consumer then indicates the cache entry needs to be revalidated
  by returning ``ENTRY_NEEDS_REVALIDATION`` from
  ``onCacheEntryCheck`` (see the sketch after this list).
- This consumer, from the point of view of the cache, takes the role of
  the *writer*.
- Other parallel consumers, if any, are blocked until the *writer*
calls ``setValid`` on the cache entry.
- The consumer is then responsible for validating the partial cache
  entry with the server and attempting to load the rest of the data.
- When the server responds positively (in case of an HTTP server with a
206 response code) the *writer* (in this order) opens the output
stream on the cache entry and calls ``setValid`` to unblock other
pending openers.
- Concurrent read/write is engaged.
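
A hedged sketch of the completeness check described above; the meta data
key and the ``dataSize`` attribute name are illustrative assumptions.

.. code:: js

   // Inside the consumer's nsICacheEntryOpenCallback implementation.
   function onCacheEntryCheck(entry) {
     const expected = parseInt(entry.getMetaDataElement("content-length"), 10);
     if (entry.dataSize < expected) {
       // Partial entry: take the writer role, issue a range request for the
       // rest and call setValid() after opening the output stream.
       return Ci.nsICacheEntryOpenCallback.ENTRY_NEEDS_REVALIDATION;
     }
     return Ci.nsICacheEntryOpenCallback.ENTRY_WANTED;
   }
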
Lifetime of an existing entry that doesn't pass server revalidation
-------------------------------------------------------------------
- Such a cache entry is first examined in the
``nsICacheEntryOpenCallback.onCacheEntryCheck`` callback, where the
consumer finds out it must be revalidated with the server before use.
- The consumer then indicates the cache entry needs to be revalidated
  by returning ``ENTRY_NEEDS_REVALIDATION`` from
  ``onCacheEntryCheck``.
- This consumer, from the point of view of the cache, takes the role of
  the *writer*.
- Other parallel consumers, if any, are blocked until the *writer*
calls ``setValid`` on the cache entry.
- The consumer is then responsible for revalidating the cache entry
  with the server.
- The server responds with a 200 response, which means the cached
  content is no longer valid and a new version must be loaded from the
  network.
- The *writer* then calls ``recreate`` on the cache entry (see the sketch
  after this list). This returns a new empty entry to write the meta data
  and data to; the *writer* exchanges its cache entry for this new one
  and handles it as a new entry.
- The *writer* then (in this order) fills the necessary meta data of
the cache entry, opens the output stream on it and calls
``metaDataReady`` on it.
- Any other pending openers, if any, are now given this new entry to
examine and read as an existing entry.
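
The 200-response path can be sketched like this; only the
``recreate``/``openOutputStream``/``metaDataReady`` ordering comes from the
list above, while the helper name and stream parameters are assumed.

.. code:: js

   // Writer path when revalidation brings back a 200 instead of a 304.
   function replaceInvalidatedEntry(oldEntry, newResponseHead) {
     // The cached version is dead; get a fresh, empty entry in its place.
     const entry = oldEntry.recreate();

     // Handle it exactly like a new entry: meta data, output stream,
     // then metaDataReady so pending openers get this new entry.
     entry.setMetaDataElement("response-head", newResponseHead);
     const out = entry.openOutputStream(0, -1); // before metaDataReady
     entry.metaDataReady();
     return { entry, out }; // stream the new payload into `out`
   }
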
Adding a new storage
--------------------
Should there be a need to add a new distinct storage for which the
current scoping model would not be sufficient - use one of the two
following ways:
#. *[preferred]* Add a new ``<Your>Storage`` method on
:ref:`nsICacheStorageService` and if needed give it any arguments to
   specify the storage scope even more. The implementation should only
   need to enhance the context key generation and parsing code and enhance
   the current - or create new, when needed - :ref:`nsICacheStorage`
   implementations to carry any additional information down to the cache
   service.
#. *[*\ **not**\ *preferred]* Add a new argument to
:ref:`nsILoadContextInfo`; **be careful
   here**, since some arguments on the context may not be known at
   load time, which may lead to inter-context data leaking or
   implementation problems. Adding more distinction to
   :ref:`nsILoadContextInfo` also affects all existing storages, which may
   not always be desirable.
See context keying details for more information.
Threading
---------
The cache API is fully thread-safe.
The cache uses a single background thread where any IO operations,
like opening, reading, writing and erasing, happen. Memory pool
management, eviction, and visiting loops also happen on this thread.

The thread supports several priority levels. Dispatching to a level with
a lower number is executed sooner than dispatching to higher-numbered
levels; also, any loop on a lower-priority level yields to the
higher-priority levels, so that a scheduled deletion of 1000 files will
not block opening cache entries.

#. **OPEN_PRIORITY:** besides opening priority cache files, file
   dooming also happens here to prevent races
#. **READ_PRIORITY:** cache files for top-level documents and
   head-blocking scripts are opened and read first
#. **OPEN**
#. **READ:** any normal-priority content, such as images, is opened and
   read here
#. **WRITE:** writes are processed last; data is kept in memory in the
   meantime
#. **MANAGEMENT:** level for the memory pool and CacheEntry background
   operations
#. **CLOSE:** file closing level
#. **INDEX:** the index is rebuilt here
#. **EVICT:** files exceeding the disk space consumption limit are
   evicted here
NOTE: Special case for eviction - when an eviction is scheduled on the
IO thread, all operations pending on the OPEN level are first merged to
the OPEN_PRIORITY level. The eviction preparation operation - i.e.
clearing of the internal IO state - is then put to the end of the
OPEN_PRIORITY level. All this happens atomically.
Storage and entries scopes
--------------------------
A *scope key* string used to map the storage scope is based on the
arguments of :ref:`nsILoadContextInfo`. The form is as follows (currently
pending in `bug
968593 <https://bugzilla.mozilla.org/show_bug.cgi?id=968593>`__):
.. code:: bz_comment_text
a,b,i1009,p,
- Regular expression: ``(.([-,]+)?,)*``
- The first letter is an identifier; identifiers are sorted
  alphabetically and always terminate with ','
- a - when present, the scope belongs to an **anonymous** load
- b - when present, the scope is an **in browser element** load
- i - when present, it must have a decimal integer value that represents
  the app ID the scope belongs to; otherwise there is no app (the app ID
  is considered ``0``)
- p - when present, the scope is a **private browsing** load; this
  never persists
``CacheStorageService`` keeps a global hashtable mapped by the *scope
key*. Elements in this global hashtable are hashtables of cache entries.
The cache entries are mapped by the concatenation of the enhance ID and
the URI passed to ``nsICacheStorage.asyncOpenURI``. When an entry is
being looked up, the global hashtable is first searched using the
*scope key*; this yields an entries hashtable, which is then searched
using the <enhance-id:><uri> string. The elements in this hashtable are
``CacheEntry`` objects, see below.
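
Conceptually, the two-level lookup described above behaves like the sketch
below (the real code is C++ inside ``CacheStorageService`` and is protected
by a global lock; the helper name is invented for illustration).

.. code:: js

   // scope key (e.g. "a,b,i1009,p,")  ->  Map of entries for that scope
   const globalTable = new Map();

   function lookupEntry(scopeKey, enhanceId, uri) {
     const entries = globalTable.get(scopeKey);
     if (!entries) {
       return null;
     }
     // entries are keyed by <enhance-id:><uri>
     return entries.get(`${enhanceId}:${uri}`) || null;
   }
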
The hash tables keep a strong reference to ``CacheEntry`` objects.
``CacheEntry`` objects are removed from memory only by exhausting the
memory limit for :ref:`Intermediate_Memory_Caching`, which triggers a
background process of purging expired and then least-used entries from
memory, or by directly calling the
``nsICacheStorageService.purgeFromMemory`` method. That method is also
called automatically on the ``"memory-pressure"`` notification.
Access to the hashtables is protected by a global lock. We also - in a
thread-safe manner - count the number of consumers keeping a reference
on each entry. The open callback doesn't actually hand the consumer
the ``CacheEntry`` object directly, but a small wrapper class that
manages the 'consumer reference counter' on its cache entry. These two
mechanisms ensure thread-safe access and make it impossible to have more
than a single instance of a ``CacheEntry`` for a single
<scope+enhanceID+URL> key.
``CacheStorage``, implementing the :ref:`nsICacheStorage` interface, is
forwarding all calls to internal methods of ``CacheStorageService``
passing itself as an argument. ``CacheStorageService`` then generates
the *scope key* using the ``nsILoadContextInfo`` of the storage. Note:
CacheStorage keeps a thread-safe copy of ``nsILoadContextInfo`` passed
to a ``*Storage`` method on ``nsICacheStorageService``.
Invoking open callbacks
-----------------------
``CacheEntry``, implementing the ``nsICacheEntry`` interface, is
responsible for managing the cache entry's internal state and for properly
invoking the ``onCacheEntryCheck`` and ``onCacheEntryAvailable`` callbacks
for all callers of ``nsICacheStorage.asyncOpenURI``.
- Keeps a FIFO of all openers.
- Keeps its internal state like NOTLOADED, LOADING, EMPTY, WRITING,
READY, REVALIDATING.
- Keeps the number of consumers keeping a reference to it.
- Refers a ``CacheFile`` object that holds actual data and meta data
and, when told to, persists it to the disk.
The openers FIFO is an array of ``CacheEntry::Callback`` objects.
``CacheEntry::Callback`` keeps a strong reference to the opener plus the
opening flags. ``nsICacheStorage.asyncOpenURI`` forwards to
``CacheEntry::AsyncOpen`` and triggers the following pseudo-code:
**CacheStorage::AsyncOpenURI** - the API entry point:
- globally atomic:
- look up the given ``CacheEntry`` in the ``CacheStorageService`` hash
  tables
- if not found: create a new one, add it to the proper hash table
and set its state to NOTLOADED
- consumer reference ++
- call to `CacheEntry::AsyncOpen`
- consumer reference --
**CacheEntry::AsyncOpen** (entry atomic):
- the opener is added to FIFO, consumer reference ++ (dropped back
after an opener is removed from the FIFO)
- state == NOTLOADED:
- state = LOADING
- when the OPEN_TRUNCATE flag was used:
- a ``CacheFile`` is created as 'new', state = EMPTY
- otherwise:
- a ``CacheFile`` is created and a load on it is started
- ``CacheEntry::OnFileReady`` notification is now expected
- state == LOADING: just do nothing and exit
- call to `CacheEntry::InvokeCallbacks`
**CacheEntry::InvokeCallbacks** (entry atomic):
- called on:
- a new opener has been added to the FIFO via an ``AsyncOpen`` call
- asynchronous result of the CacheFile open - ``CacheEntry::OnFileReady``
- the writer throws the entry away - ``CacheEntry::OnHandleClosed``
- the **output stream** of the entry has been **opened** or
  **closed**
- ``metaDataReady`` or ``setValid`` has been called on the entry
- the entry has been **doomed**
- state == EMPTY:
- on OPEN_READONLY flag use: onCacheEntryAvailable with
  ``null`` for the cache entry
- otherwise:
- state = WRITING
- opener is removed from the FIFO and remembered as the current
'*writer*'
- onCacheEntryAvailable with ``aNew = true`` and this entry is
  invoked (on the caller thread) for the *writer*
- state == READY:
- onCacheEntryCheck with the entry is invoked on the first opener in
FIFO - on the caller thread if demanded
- result == RECHECK_AFTER_WRITE_FINISHED:
- opener is left in the FIFO with a flag ``RecheckAfterWrite``
- such openers are skipped until the output stream on the entry
is closed, then ``onCacheEntryCheck`` is re-invoked on them
- Note: here is a potential for endless looping when
RECHECK_AFTER_WRITE_FINISHED is abused
- result == ENTRY_NEEDS_REVALIDATION:
- state = REVALIDATING, this prevents invocation of any callback
until ``CacheEntry::SetValid`` is called
- continue as in state ENTRY_WANTED (just below)
- result == ENTRY_WANTED:
- consumer reference ++ (dropped back when the consumer releases
the entry)
- onCacheEntryAvailable is invoked on the opener with
  ``aNew = false`` and the entry
- opener is removed from the FIFO
- result == ENTRY_NOT_WANTED:
- ``onCacheEntryAvailable`` is invoked on the opener with
  ``null`` for the entry
- opener is removed from the FIFO
- state == WRITING or REVALIDATING:
- do nothing and exit
- any other value of state is unexpected here (assertion failure)
- loop this process while there are openers in the FIFO
**CacheEntry::OnFileReady** (entry atomic):
- load result == failure or the file has not been found on disk (is
new): state = EMPTY
- otherwise: state = READY since the cache file has been found and is
usable containing meta data and data of the entry
- call to ``CacheEntry::InvokeCallbacks``
**CacheEntry::OnHandleClosed** (entry atomic):
- Called when any consumer throws the cache entry away
- If the handle is not the handle given to the current *writer*, then
exit
- state == WRITING: the writer failed to call ``metaDataReady`` on the
entry - state = EMPTY
- state == REVALIDATING: the writer failed the re-validation process
and failed to call ``setValid`` on the entry - state = READY
- call to ``CacheEntry::InvokeCallbacks``
**All consumers release the reference:**
- the entry may now be purged (removed) from memory when found expired
or least used on overrun of the :ref:`memory
pool <Intermediate_Memory_Caching>` limit
- when this is a disk cache entry, its cached data chunks are released
from memory and only meta data is kept
Intermediate memory caching
---------------------------
Intermediate memory caching of frequently used metadata (a.k.a. disk cache memory pool).
For the disk cache entries we keep some of the most recent and most used
cache entries' meta data in memory for immediate zero-thread-loop
opening. The default size of this meta data memory pool is only 250kB
and is controlled by the ``browser.cache.disk.metadata_memory_limit``
preference. When the limit is exceeded, we purge (throw away) first
**expired** and then **least used** entries to free up memory again.
Only ``CacheEntry`` objects that are already loaded and filled with data
and having the 'consumer reference == 0' (`bug
942835 <https://bugzilla.mozilla.org/show_bug.cgi?id=942835#c3>`__) can
be purged.
The 'least used' entries are recognized by the lowest value of
`frecency <https://wiki.mozilla.org/User:Jesse/NewFrecency?title=User:Jesse/NewFrecency>`__
which we re-compute for each entry on every access. The decay time is
controlled by the ``browser.cache.frecency_half_life_hours`` preference
and defaults to 6 hours. The best decay time will be based on results of
`an experiment <https://bugzilla.mozilla.org/show_bug.cgi?id=986728>`__.
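
For illustration only, a half-life based decay could look like the sketch
below; the actual formula lives in the cache code and may differ, and the
bonus added on each access is an assumption.

.. code:: js

   // halfLifeHours mirrors browser.cache.frecency_half_life_hours (default 6).
   function updatedFrecency(oldFrecency, hoursSinceLastAccess, halfLifeHours = 6) {
     // Decay the previous value so it halves every halfLifeHours ...
     const decayed = oldFrecency * Math.pow(2, -hoursSinceLastAccess / halfLifeHours);
     // ... then add a fixed bonus for the current access (assumed).
     return decayed + 1;
   }
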
The memory pool is represented by two lists (strongly referring, ordered
arrays) of ``CacheEntry`` objects:

#. Sorted by expiration time (which defaults to 0xFFFFFFFF)
#. Sorted by frecency (which defaults to 0)
We have two such pools, one for memory-only entries actually
representing the memory-only cache and one for disk cache entries for
which we only keep the meta data. Each pool has a different limit:
the memory cache pool is controlled by
``browser.cache.memory.capacity``, while the disk entries pool limit is
described above. The pools can be accessed and modified only on the
cache background thread.

netwerk/docs/index.rst (new file)

@@ -0,0 +1,9 @@
Networking
==========
These linked pages contain design documents for the Networking stack implementation in Gecko. They live in-tree under the 'netwerk/docs' directory.
.. toctree::
   :maxdepth: 1

   cache2/doc


@@ -32,3 +32,5 @@ DIRS += ["locales"]
DIRS += ["build"]
TEST_DIRS += ["test"]
SPHINX_TREES["/networking"] = "docs"