[Cplex] Strands teaser

Blaine Garst blaine at mac.com
Fri Jun 21 00:28:39 CEST 2013


On Jun 20, 2013, at 2:17 PM, "Robison, Arch" <arch.robison at intel.com> wrote:

> From first glance, I believe that a Cilk run-time might be implemented on top of your strands abstraction, since we have an abstraction in the run-time that serves a similar purpose.  [Note: Cilk literature uses “strand” to mean something very different.]  A Cilk runtime cannot be implemented efficiently using FIFO queues.  It uses per-thread deques that are implemented using atomic reads, atomic writes, and mutual exclusion. 

This sounds good.  Rather than explain explain explain how it works here, I would rather spend the time on the paper with its examples.  There was once some proof work done on the minimalism of the reader-writer scheduling policies written using equivalent primitives.  I'm happy to code up some A:B:C comparisons using existing OpenMP & Cilk & the actor strand library across some simplistic micro-benchmarks just to reassure the performance crowd (not that they are easily consoled).



> C11 gives us atomic reads and atomic writes, so if strands support exclusion and some kind of coroutine-like switching (block current strand; resume another specified strand), I think that’s enough to implement a Cilk runtime. 

This co-routine-like switching is the fundamental operation I'm speaking of.  To let Schrodinger's cat out of the box, we actually queue the strand itself.  Each strand has 3 links: one for basic contention and two for the scheduling queue.

>  
> Heidi Pan’s thesis is the best work to date that I know of on the issue of composing OpenMP/Cilk/TBB and other parallel libraries.  Seehttp://lithe.eecs.berkeley.edu/pubs/pan-phd-thesis.pdf  It employs a hierarchical scheme along the lines of what you suggest.

That will be a good read.

>  
> When OpenMP code is called from code written using any other threading package, the results are predictably unsatisfactory, because OpenMP requires (by tradition of paying customers) that it start up a new pool of P-1 threads for each top-level parallel construct entered by a non-OpenMP thread.  For example, if P threads from Cilk [or TBB or PPL] call OpenMP, up to P*(P-1) threads may be running.  The other way around is not so bad: OpenMP calling Cilk results in at most 2P-1 threads running  since Cilk shares a common pool of workers.

Pools of threads are a concession to the simulation-of-multi-core that I mentioned earlier.  They are, for now, a necessary evil, but one that the strand library can mitigate nicely.

One of the driving forces behind this work is the notion that multi-core will become so prevalent so fast that compilers will have the choice of using strands for implicit parallelism.  Also that, in the Actor world, memory coherency can be managed explicitly with less power, chip area, and probably greater speed.

Blaine

>  
> - Arch
>  
> From: cplex-bounces at open-std.org [mailto:cplex-bounces at open-std.org] On Behalf Of Blaine Garst
> Sent: Thursday, June 20, 2013 12:15 PM
> To: cplex at open-std.org
> Cc: Carl Hewitt
> Subject: [Cplex] Strands teaser
>  
> Over the past several years I have been working, with Carl Hewitt, on a "strand" library built around the notion of provably minimal scheduling operations.  I was alluding to this during the conference call.
>  
> The idea is that a strand consists of a stack, register save area, and a few minimal accounting hooks.  The basic primitives for contention arbitration use standard compare-and-swap and look, at a very high level, like mutex and condition variables, but operate directly on the strand rather than a separate mutex variable.  These strands cooperate with the heavier-weight pthreads as necessary but otherwise replace them for computation.  
>  
> Venturing out of strand code into general code that calls OS primitives requires a bit of stack fixup that the compiler can arrange (a few instructions).  The key observation here is that threads have been an operating system invention to simulate multi-core and are now mostly overhead.
>  
> The scheduling primitives (enter exclusion, leave exclusion, defer to queue, resume queue) are used to implement many different policies: we have two forms of reader-writer (which has priority), a low-level gate (blocks until set), etc.  My belief is that these primitives are sufficient to model CSP, Go, Erlang, and probably several other systems (Linda?), and (the question in this forum) the lower level interactions that go on in the Cilk and OpenMP runtime libraries.  Having implemented the primitives, I can attest to their utter simplicity and minimalism, and would love to compare notes with the mechanisms in place for Cilk and OpenMP.  I suspect that OpenMP will require use of real-time scheduling policies for the underlying pthreads, which is straight-forward, which may well be of general interest.
>  
> A key issue with strands is the mechanisms to provide adaptive scheduling.  The mechanism we see is a top-down hierarchy of scheduling "objects" we call Sponsors.   The highest level Sponsor deals with pthread-pool operating system primitives found on Mac OS X (FreeBSD), and *should* be able to work with Microsoft thread pools as well (not sure about the underlying OS support there).   It subdivides computing resources with its tree of sub-Sponsors, each of which only knows about itself, its parent, and its children.  Work stealing across sponsors is a natural tree operation.
>  
> Whereas Carl and I are using these strand primitives to implement Actors, we see them as a general purpose mechanism and have been intending to offer them as such to the C committee for several months now.
>  
>  
> Until I get that paper written, I have a question to ask: what happens when Cilk and OpenMP are used in the same program?   Unless they are built on the same underlying strand mechanism I can't help but believe that results are unpredictable and unsatisfactory.
>  
>  
> Blaine
>  
>  

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.open-std.org/pipermail/cplex/attachments/20130620/a6012cfa/attachment.html 


More information about the Cplex mailing list