memory-barriers: Replace uses of "transitive"

The current version of memory-barriers.txt misuses the term "transitive",
so this commit replaces it with multi-copy atomic, also adding a
definition of this term.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney 2017-08-29 15:49:21 -07:00
Parent: d3cf5176d0
Commit: f1ab25a30c
1 changed file with 86 additions and 89 deletions

Documentation/memory-barriers.txt

@@ -53,7 +53,7 @@ CONTENTS
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
      - Read memory barriers vs load speculation.
-     - Transitivity
+     - Multicopy atomicity.
 
  (*) Explicit kernel barriers.
@@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
 
+Note well that the ordering provided by a data dependency is local to
+the CPU containing it.  See the section on "Multicopy atomicity" for
+more information.
+
+
 The data dependency barrier is very important to the RCU system,
 for example.  See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h.  This permits the current target of an RCU'd
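
As an aside for readers of this patch (the sketch below is not part of the
diff), the RCU usage mentioned in that context is the canonical beneficiary
of the CPU-local data-dependency ordering the added text describes.  Assuming
a hypothetical RCU-protected structure and global pointer (struct foo, gp,
publish() and reader() are illustrative names only), a minimal sketch is:

	struct foo {
		int a;
	};
	static struct foo *gp;		/* RCU-protected pointer, initially NULL. */

	/* Publisher: initialize the structure, then publish the pointer. */
	void publish(struct foo *p)
	{
		p->a = 42;			/* Initialization is ordered ...         */
		rcu_assign_pointer(gp, p);	/* ... before the pointer becomes visible. */
	}

	/* Reader: the address dependency from the pointer load to the field
	 * load is what the data dependency orders, and that ordering is
	 * local to the CPU executing this code. */
	int reader(void)
	{
		struct foo *q;
		int val = -1;

		rcu_read_lock();
		q = rcu_dereference(gp);	/* Load the pointer. */
		if (q)
			val = q->a;		/* Dependent load: never sees pre-init garbage. */
		rcu_read_unlock();
		return val;
	}

The guarantees here come from rcu_assign_pointer() and rcu_dereference()
as documented in include/linux/rcupdate.h.
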
@@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause
 and else-clause of the if-statement in question (including functions
 invoked by those two clauses), not to code following that if-statement.
 
-Finally, control dependencies do -not- provide transitivity.  This is
-demonstrated by two related examples, with the initial values of
-'x' and 'y' both being zero:
-
-	CPU 0			  CPU 1
-	=======================  =======================
-	r1 = READ_ONCE(x);	  r2 = READ_ONCE(y);
-	if (r1 > 0)		  if (r2 > 0)
-	  WRITE_ONCE(y, 1);	    WRITE_ONCE(x, 1);
-
-	assert(!(r1 == 1 && r2 == 1));
-
-The above two-CPU example will never trigger the assert().  However,
-if control dependencies guaranteed transitivity (which they do not),
-then adding the following CPU would guarantee a related assertion:
-
-	CPU 2
-	=====================
-	WRITE_ONCE(x, 2);
-
-	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
-
-But because control dependencies do -not- provide transitivity, the above
-assertion can fail after the combined three-CPU example completes.  If you
-need the three-CPU example to provide ordering, you will need smp_mb()
-between the loads and stores in the CPU 0 and CPU 1 code fragments,
-that is, just before or just after the "if" statements.  Furthermore,
-the original two-CPU example is very fragile and should be avoided.
-
-These two examples are the LB and WWC litmus tests from this paper:
-http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
-site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
+Note well that the ordering provided by a control dependency is local
+to the CPU containing it.  See the section on "Multicopy atomicity"
+for more information.
 
 In summary:
@@ -922,8 +900,8 @@ In summary:
 
   (*) Control dependencies pair normally with other types of barriers.
 
-  (*) Control dependencies do -not- provide transitivity.  If you
-      need transitivity, use smp_mb().
+  (*) Control dependencies do -not- provide multicopy atomicity.  If you
+      need all the CPUs to see a given store at the same time, use smp_mb().
 
   (*) Compilers do not understand control dependencies.  It is therefore
       your job to ensure that they do not break your code.
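
The litmus tests removed above prescribed exactly this smp_mb() when more
than the local CPU must observe the ordering.  Purely as an illustration of
that prescription (not part of this patch; cpu0()/cpu1() and the variables
are made up for the sketch), the two-CPU fragment with full barriers is:

	int x, y;		/* Both initially zero. */
	int r1, r2;

	void cpu0(void)
	{
		r1 = READ_ONCE(x);
		smp_mb();		/* Full barrier: orders the load above against   */
					/* the store below for all CPUs, not just CPU 0. */
		if (r1 > 0)
			WRITE_ONCE(y, 1);
	}

	void cpu1(void)
	{
		r2 = READ_ONCE(y);
		smp_mb();		/* Likewise for CPU 1. */
		if (r2 > 0)
			WRITE_ONCE(x, 1);
	}

With only the control dependencies and no smp_mb(), each CPU's ordering
remains local to itself, which is the point of the summary item above.
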
@@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should
 always be paired.  A lack of appropriate pairing is almost certainly an error.
 
 General barriers pair with each other, though they also pair with most
-other types of barriers, albeit without transitivity.  An acquire barrier
-pairs with a release barrier, but both may also pair with other barriers,
-including of course general barriers.  A write barrier pairs with a data
-dependency barrier, a control dependency, an acquire barrier, a release
-barrier, a read barrier, or a general barrier.  Similarly a read barrier,
-control dependency, or a data dependency barrier pairs with a write
-barrier, an acquire barrier, a release barrier, or a general barrier:
+other types of barriers, albeit without multicopy atomicity.  An acquire
+barrier pairs with a release barrier, but both may also pair with other
+barriers, including of course general barriers.  A write barrier pairs
+with a data dependency barrier, a control dependency, an acquire barrier,
+a release barrier, a read barrier, or a general barrier.  Similarly a
+read barrier, control dependency, or a data dependency barrier pairs
+with a write barrier, an acquire barrier, a release barrier, or a
+general barrier:
 
 	CPU 1		      CPU 2
 	===============	      ===============
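
The body of the pairing table that follows is unchanged context and is
therefore not shown in this hunk.  As a free-standing illustration of one
such pairing (a write barrier on the producer matched by a read barrier on
the consumer; writer(), reader() and the variables a/b are made up for this
sketch and are not part of the patch):

	int a, b;			/* Both initially zero. */

	void writer(void)		/* e.g. CPU 1 */
	{
		WRITE_ONCE(a, 1);
		smp_wmb();		/* Write barrier: keep the store to a ...  */
		WRITE_ONCE(b, 2);	/* ... ahead of the store to b.            */
	}

	void reader(void)		/* e.g. CPU 2 */
	{
		int x, y;

		x = READ_ONCE(b);
		smp_rmb();		/* Paired read barrier: keep the load of b ... */
		y = READ_ONCE(a);	/* ... ahead of the load of a.                 */

		/* If x == 2, then y is guaranteed to be 1. */
	}
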
@@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded:
 	retrieved				:	:	+-------+
 
 
-TRANSITIVITY
-------------
+MULTICOPY ATOMICITY
+--------------------
 
-Transitivity is a deeply intuitive notion about ordering that is not
-always provided by real computer systems.  The following example
-demonstrates transitivity:
+Multicopy atomicity is a deeply intuitive notion about ordering that is
+not always provided by real computer systems, namely that a given store
+is visible at the same time to all CPUs, or, alternatively, that all
+CPUs agree on the order in which all stores took place.  However, use of
+full multicopy atomicity would rule out valuable hardware optimizations,
+so a weaker form called ``other multicopy atomicity'' instead guarantees
+that a given store is observed at the same time by all -other- CPUs.  The
+remainder of this document discusses this weaker form, but for brevity
+will call it simply ``multicopy atomicity''.
+
+The following example demonstrates multicopy atomicity:
 
 	CPU 1			CPU 2			CPU 3
 	=======================	=======================	=======================
 		{ X = 0, Y = 0 }
-	STORE X=1		LOAD X			STORE Y=1
-				<general barrier>	<general barrier>
-				LOAD Y			LOAD X
+	STORE X=1		r1=LOAD X (reads 1)	LOAD Y (reads 1)
+				<general barrier>	<read barrier>
+				STORE Y=r1		LOAD X
 
-Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
-This indicates that CPU 2's load from X in some sense follows CPU 1's
-store to X and that CPU 2's load from Y in some sense preceded CPU 3's
-store to Y.  The question is then "Can CPU 3's load from X return 0?"
+Suppose that CPU 2's load from X returns 1 which it then stores to Y and
+that CPU 3's load from Y returns 1.  This indicates that CPU 2's load
+from X in some sense follows CPU 1's store to X and that CPU 2's store
+to Y in some sense preceded CPU 3's load from Y.  The question is then
+"Can CPU 3's load from X return 0?"
 
-Because CPU 2's load from X in some sense came after CPU 1's store, it
+Because CPU 3's load from X in some sense came after CPU 2's load, it
 is natural to expect that CPU 3's load from X must therefore return 1.
-This expectation is an example of transitivity: if a load executing on
-CPU A follows a load from the same variable executing on CPU B, then
-CPU A's load must either return the same value that CPU B's load did,
-or must return some later value.
+This expectation is an example of multicopy atomicity: if a load executing
+on CPU A follows a load from the same variable executing on CPU B, then
+an understandable but incorrect expectation is that CPU A's load must
+either return the same value that CPU B's load did, or must return some
+later value.
 
-In the Linux kernel, use of general memory barriers guarantees
-transitivity.  Therefore, in the above example, if CPU 2's load from X
-returns 1 and its load from Y returns 0, then CPU 3's load from X must
-also return 1.
+In the Linux kernel, the above use of a general memory barrier compensates
+for any lack of multicopy atomicity.  Therefore, in the above example,
+if CPU 2's load from X returns 1 and CPU 3's load from Y returns 1,
+then CPU 3's load from X must also return 1.
 
-However, transitivity is -not- guaranteed for read or write barriers.
-For example, suppose that CPU 2's general barrier in the above example
-is changed to a read barrier as shown below:
+However, dependencies, read barriers, and write barriers are not always
+able to compensate for non-multicopy atomicity.  For example, suppose
+that CPU 2's general barrier is removed from the above example, leaving
+only the data dependency shown below:
 
 	CPU 1			CPU 2			CPU 3
 	=======================	=======================	=======================
 		{ X = 0, Y = 0 }
-	STORE X=1		LOAD X			STORE Y=1
-				<read barrier>		<general barrier>
-				LOAD Y			LOAD X
+	STORE X=1		r1=LOAD X (reads 1)	LOAD Y (reads 1)
+				<data dependency>	<read barrier>
+				STORE Y=r1		LOAD X (reads 0)
 
-This substitution destroys transitivity: in this example, it is perfectly
-legal for CPU 2's load from X to return 1, its load from Y to return 0,
-and CPU 3's load from X to return 0.
+This substitution allows non-multicopy atomicity to run rampant: in
+this example, it is perfectly legal for CPU 2's load from X to return 1,
+CPU 3's load from Y to return 1, and its load from X to return 0.
 
-The key point is that although CPU 2's read barrier orders its pair
-of loads, it does not guarantee to order CPU 1's store.  Therefore, if
-this example runs on a system where CPUs 1 and 2 share a store buffer
-or a level of cache, CPU 2 might have early access to CPU 1's writes.
-General barriers are therefore required to ensure that all CPUs agree
-on the combined order of CPU 1's and CPU 2's accesses.
+The key point is that although CPU 2's data dependency orders its load
+and store, it does not guarantee to order CPU 1's store.  Therefore,
+if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
+share a store buffer or a level of cache, CPU 2 might have early access
+to CPU 1's writes.  A general barrier is therefore required to ensure
+that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
 
-General barriers provide "global transitivity", so that all CPUs will
-agree on the order of operations.  In contrast, a chain of release-acquire
-pairs provides only "local transitivity", so that only those CPUs on
-the chain are guaranteed to agree on the combined order of the accesses.
-For example, switching to C code in deference to Herman Hollerith:
+General barriers can compensate not only for non-multicopy atomicity,
+but can also generate additional ordering that can ensure that -all-
+CPUs will perceive the same order of -all- operations.  In contrast, a
+chain of release-acquire pairs does not provide this additional ordering,
+which means that only those CPUs on the chain are guaranteed to agree
+on the combined order of the accesses.  For example, switching to C code
+in deference to the ghost of Herman Hollerith:
 
 	int u, v, x, y, z;
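
Restating the two tables above in the C-style notation this document uses
elsewhere (this sketch is not part of the patch; the function names and the
registers r1, r2, r3 simply follow the table's labels):

	int x, y;			/* Both initially zero. */
	int r1, r2, r3;

	void cpu1(void)
	{
		WRITE_ONCE(x, 1);
	}

	void cpu2(void)
	{
		r1 = READ_ONCE(x);	/* Assume this reads 1.                    */
		smp_mb();		/* General barrier, as in the first table. */
		WRITE_ONCE(y, r1);
	}

	void cpu3(void)
	{
		r2 = READ_ONCE(y);	/* Assume this reads 1.                    */
		smp_rmb();		/* Read barrier, as in both tables.        */
		r3 = READ_ONCE(x);	/* With CPU 2's smp_mb(): must read 1.     */
	}

Dropping CPU 2's smp_mb() and relying only on the data dependency from r1
into the store to y corresponds to the second table, in which case r3 == 0
becomes a legal outcome.
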
@@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith:
 		r3 = READ_ONCE(u);
 	}
 
-Because cpu0(), cpu1(), and cpu2() participate in a local transitive
-chain of smp_store_release()/smp_load_acquire() pairs, the following
-outcome is prohibited:
+Because cpu0(), cpu1(), and cpu2() participate in a chain of
+smp_store_release()/smp_load_acquire() pairs, the following outcome
+is prohibited:
 
 	r0 == 1 && r1 == 1 && r2 == 1
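
The cpu0(), cpu1(), cpu2() and cpu3() bodies referred to here are unchanged
context and hence largely elided from this hunk (only the tail of cpu3() is
visible above).  For readers of the patch, a sketch consistent with the
surrounding prose and outcomes looks roughly as follows; consult
memory-barriers.txt itself for the authoritative listing:

	int u, v, x, y, z;
	int r0, r1, r2, r3, r4, r5;

	void cpu0(void)
	{
		r0 = smp_load_acquire(&x);
		WRITE_ONCE(u, 1);
		smp_store_release(&y, 1);	/* Hand off to cpu1(). */
	}

	void cpu1(void)
	{
		r1 = smp_load_acquire(&y);	/* Acquire cpu0()'s release. */
		r4 = READ_ONCE(v);
		r5 = READ_ONCE(u);
		smp_store_release(&z, 1);	/* Hand off to cpu2(). */
	}

	void cpu2(void)
	{
		r2 = smp_load_acquire(&z);	/* Acquire cpu1()'s release. */
		smp_store_release(&x, 1);	/* Close the chain back to cpu0(). */
	}

	void cpu3(void)				/* Not part of the chain. */
	{
		WRITE_ONCE(v, 1);
		smp_mb();
		r3 = READ_ONCE(u);
	}
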
@@ -1460,9 +1452,9 @@ outcome is prohibited:
 
 	r1 == 1 && r5 == 0
 
-However, the transitivity of release-acquire is local to the participating
-CPUs and does not apply to cpu3().  Therefore, the following outcome
-is possible:
+However, the ordering provided by a release-acquire chain is local
+to the CPUs participating in that chain and does not apply to cpu3(),
+at least aside from stores.  Therefore, the following outcome is possible:
 
 	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
@@ -1490,8 +1482,8 @@ following outcome is possible:
 Note that this outcome can happen even on a mythical sequentially
 consistent system where nothing is ever reordered.
 
-To reiterate, if your code requires global transitivity, use general
-barriers throughout.
+To reiterate, if your code requires full ordering of all operations,
+use general barriers throughout.
 
 ========================
@@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming
 	Chapter 7.1: Memory-Access Ordering
 	Chapter 7.4: Buffering and Combining Memory Writes
 
+ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
+	Chapter B2: The AArch64 Application Level Memory Model
+
 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
 	System Programming Guide
 	Chapter 7.1: Locked Atomic Operations
@@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9
 	Appendix D: Formal Specification of the Memory Models
 	Appendix J: Programming with the Memory Models
 
+Storage in the PowerPC (Stone and Fitzgerald)
+
 UltraSPARC Programmer Reference Manual
 	Chapter 5: Memory Accesses and Cacheability
 	Chapter 15: Sparc-V9 Memory Models