[Cplex] Strands teaser

Robison, Arch arch.robison at intel.com
Thu Jun 20 23:17:03 CEST 2013

>From first glance, I believe that a Cilk run-time might be implemented on top of your strands abstraction, since we have an abstraction in the run-time that serves a similar purpose.  [Note: Cilk literature uses "strand" to mean something very different.]  A Cilk runtime cannot be implemented efficiently using FIFO queues.  It uses per-thread deques that are implemented using atomic reads, atomic writes, and mutual exclusion.  C11 gives us atomic reads and atomic writes, so if strands support exclusion and some kind of coroutine-like switching (block current strand; resume another specified strand), I think that's enough to implement a Cilk runtime.

Heidi Pan's thesis is the best work to date that I know of on the issue of composing OpenMP/Cilk/TBB and other parallel libraries.  See http://lithe.eecs.berkeley.edu/pubs/pan-phd-thesis.pdf  It employs a hierarchical scheme along the lines of what you suggest.

When OpenMP code is called from code written using any other threading package, the results are predictably unsatisfactory, because OpenMP requires (by tradition of paying customers) that it start up a new pool of P-1 threads for each top-level parallel construct entered by a non-OpenMP thread.  For example, if P threads from Cilk [or TBB or PPL] call OpenMP, up to P*(P-1) threads may be running.  The other way around is not so bad: OpenMP calling Cilk results in at most 2P-1 threads running  since Cilk shares a common pool of workers.

- Arch

From: cplex-bounces at open-std.org [mailto:cplex-bounces at open-std.org] On Behalf Of Blaine Garst
Sent: Thursday, June 20, 2013 12:15 PM
To: cplex at open-std.org
Cc: Carl Hewitt
Subject: [Cplex] Strands teaser

Over the past several years I have been working, with Carl Hewitt, on a "strand" library built around the notion of provably minimal scheduling operations.  I was alluding to this during the conference call.

The idea is that a strand consists of a stack, register save area, and a few minimal accounting hooks.  The basic primitives for contention arbitration use standard compare-and-swap and look, at a very high level, like mutex and condition variables, but operate directly on the strand rather than a separate mutex variable.  These strands cooperate with the heavier-weight pthreads as necessary but otherwise replace them for computation.

Venturing out of strand code into general code that calls OS primitives requires a bit of stack fixup that the compiler can arrange (a few instructions).  The key observation here is that threads have been an operating system invention to simulate multi-core and are now mostly overhead.

The scheduling primitives (enter exclusion, leave exclusion, defer to queue, resume queue) are used to implement many different policies: we have two forms of reader-writer (which has priority), a low-level gate (blocks until set), etc.  My belief is that these primitives are sufficient to model CSP, Go, Erlang, and probably several other systems (Linda?), and (the question in this forum) the lower level interactions that go on in the Cilk and OpenMP runtime libraries.  Having implemented the primitives, I can attest to their utter simplicity and minimalism, and would love to compare notes with the mechanisms in place for Cilk and OpenMP.  I suspect that OpenMP will require use of real-time scheduling policies for the underlying pthreads, which is straight-forward, which may well be of general interest.

A key issue with strands is the mechanisms to provide adaptive scheduling.  The mechanism we see is a top-down hierarchy of scheduling "objects" we call Sponsors.   The highest level Sponsor deals with pthread-pool operating system primitives found on Mac OS X (FreeBSD), and *should* be able to work with Microsoft thread pools as well (not sure about the underlying OS support there).   It subdivides computing resources with its tree of sub-Sponsors, each of which only knows about itself, its parent, and its children.  Work stealing across sponsors is a natural tree operation.

Whereas Carl and I are using these strand primitives to implement Actors, we see them as a general purpose mechanism and have been intending to offer them as such to the C committee for several months now.

Until I get that paper written, I have a question to ask: what happens when Cilk and OpenMP are used in the same program?   Unless they are built on the same underlying strand mechanism I can't help but believe that results are unpredictable and unsatisfactory.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.open-std.org/pipermail/cplex/attachments/20130620/54da0e5f/attachment.html 

More information about the Cplex mailing list