
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
 <HEAD>
   <TITLE> [Mono-gc-list] Fast allocation vs lightweight collection
   </TITLE>
   <LINK REL="Index" HREF="index.html" >
   <LINK REL="made" HREF="mailto:lupus%40ximian.com">
   <META NAME="robots" CONTENT="index,nofollow">

   <LINK REL="Previous"  HREF="000035.html">

 </HEAD>
 <BODY BGCOLOR="#ffffff">
   <H1>[Mono-gc-list] Fast allocation vs lightweight collection
   </H1>
    <B>Paolo Molaro
    </B>
    <A HREF="mailto:lupus%40ximian.com"
       TITLE="[Mono-gc-list] Fast allocation vs lightweight collection">lupus@ximian.com
       </A><BR>
    <I>Thu, 28 Aug 2003 17:50:04 +0200</I>
    <P><UL>
        <LI> Previous message: <A HREF="000035.html">[Mono-gc-list] Fast allocation vs lightweight collection
</A></li>

         <LI> <B>Messages sorted by:</B>
              <a href="date.html#36">[ date ]</a>
              <a href="thread.html#36">[ thread ]</a>
              <a href="subject.html#36">[ subject ]</a>
              <a href="author.html#36">[ author ]</a>
         </LI>
       </UL>
    <HR>
<!--beginarticle-->
<PRE>On 08/26/03 David Jeske wrote:
&gt;<i> On Tue, Aug 26, 2003 at 08:59:02AM +0100, Torstensson, Patrik wrote:
</I>&gt;<i> &gt; Not really true. The cmpxchg (and other) will lock the CPU cache (or
</I>&gt;<i> &gt; cause a cache invalid signal to happen) on a x86. This causes serious
</I>&gt;<i> &gt; performance problems
</I>&gt;<i>
</I>&gt;<i> &gt; (it's better than a kernel lock but still..)
</I>&gt;<i>
</I>&gt;<i> That is quite an understatement. I don't have a manual handy to look
</I>&gt;<i> up the numbers, but my gut ballparks that the kernel context switch,
</I>&gt;<i> probable TLB flush, plus the cache invalidation which is required for
</I>&gt;<i> the kernel locks anyhow will come in at over 10x of the cost of L1/L2
</I>&gt;<i> cache invalidation alone in overall performance cost.
</I>
There is no need to go with kernel-based locks: a spinlock would be
sufficient if there are no suitable atomic instructions or sequences
available for a platform. Still, even if there isn't a function call
with 'lock' in the name, the atomic instructions are expensive,
and they add to the speed overhead that thread-unsafe reference
counting already has.
There is also a conditional branch before each inc/dec, since you
need to make sure the reference is not NULL, and that puts extra
pressure on the branch predictor.

&gt;<i> I think we have a disagreement about what it means to be &quot;very
</I>&gt;<i> expensive&quot;. The 1 second worst case pause time of most &quot;modern&quot; GC
</I>&gt;<i> systems is very expensive to me. It usually means I can't write
</I>&gt;<i> software with them and have to resort to C, C++, or a ref-counted
</I>&gt;<i> system like Python. I don't mind a 10%, 20% or sometimes even 30%
</I>&gt;<i> performance hit, as long as it is spread evenly throughout the
</I>&gt;<i> program.
</I>
Many disagree with you about the price people are willing to pay, but
fear not: this is free software, and proving your point is just a SMOP :-)
You can start adding an integer counter to the MonoObject struct in
metadata/object.h and adding the proper atomic inc/dec instructions
whenever a reference is written or read.
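
To give an idea of what that amounts to, here is a toy model in Python,
not Mono code. ToyObject, incref, decref and write_ref are made-up
names, and a global lock merely stands in for the atomic instructions.
It only tries to show that every reference store ends up touching two
counters, each behind a NULL check.

```python
import threading


class ToyObject:
    """Stand-in for a managed object header carrying a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}


# One global lock stands in for the atomic inc/dec instructions.
_count_lock = threading.Lock()
freed = []


def incref(obj):
    # The NULL check: a branch before every increment.
    if obj is None:
        return
    with _count_lock:
        obj.refcount += 1


def decref(obj):
    if obj is None:
        return
    with _count_lock:
        obj.refcount -= 1
        dead = obj.refcount == 0
    if dead:
        freed.append(obj.name)
        # Release everything the dead object still points to.
        for child in list(obj.fields.values()):
            decref(child)
        obj.fields.clear()


def write_ref(holder, field, new_value):
    """Write barrier: every reference store adjusts two counts."""
    old_value = holder.fields.get(field)
    incref(new_value)  # increment first, so self-assignment is safe
    holder.fields[field] = new_value
    decref(old_value)


root = ToyObject("root")
incref(root)  # pretend a stack slot holds the root
a = ToyObject("a")
b = ToyObject("b")
write_ref(root, "x", a)
write_ref(root, "x", b)  # overwriting drops the last reference to a
print(freed, b.refcount)
```

Overwriting the field is enough to free "a": the store itself pays for
the increment, the decrement and the deallocation.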

&gt;<i> For me, ANY pausing scheme is &quot;very expensive&quot;, and ANY incremental
</I>&gt;<i> scheme with bounded worst-case pauses is acceptable.
</I>
I guess you mean unbounded? Reference counting can have unbounded delays
when deallocating, too (dropping the last reference to a large data
structure frees every object in it), though I agree they are less likely.
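
The deallocation delay is easy to reproduce in a toy model. The sketch
below is Python, not Mono code, and Node, release and build_chain are
invented names: releasing the head of a singly linked chain has to free
every cell, so the work done by one decrement grows with the chain.

```python
class Node:
    """Toy refcounted list cell; not a Mono data structure."""

    def __init__(self, next_node=None):
        # One reference: held by the creator at first, then handed over
        # to the cell that points here.
        self.refcount = 1
        self.next_node = next_node


def release(node):
    """Drop one reference; return how many objects were freed as a result."""
    freed = 0
    # Iterative on purpose: a recursive version would also need stack
    # space proportional to the length of the chain.
    while node is not None:
        node.refcount -= 1
        if node.refcount != 0:
            break
        freed += 1
        following = node.next_node
        node.next_node = None
        node = following
    return freed


def build_chain(length):
    head = None
    for _ in range(length):
        head = Node(head)
    return head


for length in (10, 1000, 100000):
    print(length, release(build_chain(length)))
```

Each line prints the chain length followed by the number of objects
freed by that single release, and the two are equal. A real system can
defer part of this work, but then it is no longer plain reference
counting.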

&gt;<i> Compared to other incremental schemes, reference counting adds some
</I>&gt;<i> cost to the mutator in exchange for greater throughput when memory is
</I>&gt;<i> being turned over quickly.
</I>&gt;<i>
</I>&gt;<i> The write-barrier of a tri-color scheme is possibly less costly, but
</I>&gt;<i> yields more work for the incremental collector to do.
</I>
I think the first steps to take if we want to improve or experiment with
the GC in mono are these:
*) make sure we can identify precisely all and only the managed
pointers, in the heap, the stack and the registers. This alone is a big
task. A tweaked libgc that moves the objects around randomly at each
collection can help to shake out the bugs.
*) define an interface for pluggable GCs, so that anyone can get to
experiment with their preferred GC model.
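
A rough illustration of both points, as a toy model in Python and not a
proposal for the actual interface: ToyHeap, NullCollector and
ShakerCollector are invented names, and references are plain integer
addresses. The heap only knows that a collector has a collect() method,
and the "shaker" relocates every live object on each collection, so a
reference the runtime does not know about stops resolving right away.

```python
import random


class ToyHeap:
    """Objects live at integer addresses; a reference is just an address."""

    def __init__(self, collector):
        self.cells = {}    # address: list of references (or None)
        self.roots = []    # registered root slots
        self.collector = collector
        self.next_addr = 1000

    def fresh_address(self, gap=1):
        # Addresses only grow, so an old address is never handed out again.
        self.next_addr += gap
        return self.next_addr

    def alloc(self, nfields):
        addr = self.fresh_address()
        self.cells[addr] = [None] * nfields
        return addr

    def collect(self):
        self.collector.collect(self)


class NullCollector:
    """Pluggable collector that never frees or moves anything."""

    def collect(self, heap):
        pass


class ShakerCollector:
    """Moves every live object at each collection to expose stale pointers."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def collect(self, heap):
        # Mark: find everything reachable from the registered roots.
        live = set()
        pending = [r for r in heap.roots if r is not None]
        while pending:
            addr = pending.pop()
            if addr in live:
                continue
            live.add(addr)
            pending.extend(f for f in heap.cells[addr] if f is not None)
        # Move: give each live object a new address at a random distance.
        forward = {}
        for addr in sorted(live):
            forward[addr] = heap.fresh_address(self.rng.randint(1, 64))
        # Fix up every reference the collector knows about.
        moved = {}
        for old, new in forward.items():
            moved[new] = [None if f is None else forward[f]
                          for f in heap.cells[old]]
        heap.cells = moved
        heap.roots = [None if r is None else forward[r] for r in heap.roots]


def hidden_copy_survives(collector):
    heap = ToyHeap(collector)
    a = heap.alloc(1)
    heap.roots.append(a)    # a properly registered root
    b = heap.alloc(0)
    heap.cells[a][0] = b    # b is reachable through a
    hidden = b              # unregistered copy of the reference: the bug
    heap.collect()
    tracked = heap.cells[heap.roots[0]][0]
    assert tracked in heap.cells    # the registered path always survives
    return hidden in heap.cells


print("null collector:", hidden_copy_survives(NullCollector()))
print("shaker collector:", hidden_copy_survives(ShakerCollector()))
```

With the collector that never moves anything the hidden copy keeps
working and the bug stays invisible; with the shaker it fails after the
first collection.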

As I see it, implementing the actual GC algorithm is the easy part
and can only be done after the first step is accomplished.

lupus

--
-----------------------------------------------------------------
<A HREF="mailto:lupus@debian.org">lupus@debian.org</A>                                     debian/rules
<A HREF="mailto:lupus@ximian.com">lupus@ximian.com</A>                             Monkeys do it better

</PRE>
<!--endarticle-->
    <HR>
</body></html>