[Cplex] Integrating OpenMP and Cilk into C++

Pablo Halpern lwg at halpernwightsoftware.com
Fri Jun 21 21:45:25 CEST 2013

On 06/20/2013 09:10 PM, Bronis R. de Supinski wrote:
> Pablo:
> Re:
>> This study group is a subgroup of the C standards committee.  Since Cilk
>> Plus is not a standard, it makes sense for this group to be considering
>> which parts of Cilk Plus should be part of the C standard.
> That sounds rather parochial. I thought this group was
> considering how to add higher level parallelism to C.

I don't see how what I said is in any way in conflict with that.  Cilk 
Plus is one way of adding higher-level parallelism to C.

I was responding to Jay's question about why we would want to merge 
concepts from Cilk Plus and OpenMP into C when they already exist 
outside of C.  My point was simply that the best way to get 
interoperability and (eventually) ubiquitous adoption is to standardize 
those parts that we want to be generally available.

>> OpenMP IS a standard, but it is not part of the C standard. As you point
>> out, it does not interoperate with Cilk Plus.  In fact, parts of OpenMP
>> do not compose well with some other parts of OpenMP.
> Excuse me? Could you be specific? I disagree with you
> vehemently. The fact that OpenMP has a far more extensive
> user history and implementation base makes it a better
> candidate for adoption.

There is no need to get defensive.  I am not attacking OpenMP.  However,
it is well known that widely used OpenMP features, particularly static
scheduling, do not compose well with libraries that also use
parallelism. If the author of a piece of code does not know whether that
code might be called in a parallel context, then he cannot use
parallelism without risking exponential oversubscription, since each
nested region may start its own team of threads.  (I have seen this
happen often.)  If run on a desktop or mobile system rather than a
dedicated HPC system, where other processes compete for the cores,
static scheduling creates load imbalances that hurt performance.
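
To put rough numbers on the oversubscription point, here is a minimal
sketch in C.  The helper names are made up for illustration and are not
part of OpenMP.  The model assumes that nested regions are enabled and
that every nested region starts a full team, which is the worst case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an OpenMP routine: the number of threads that
 * can be live at once if every nesting level opens its own team of
 * `team_size` threads.  Each level multiplies the count, so the total is
 * team_size raised to the power `depth`. */
uint64_t worst_case_threads(unsigned team_size, unsigned depth)
{
    assert(team_size > 0);
    uint64_t total = 1;
    for (unsigned level = 0; level < depth; ++level)
        total *= team_size;
    return total;
}

/* Live threads per hardware thread, rounded down. */
uint64_t oversubscription(unsigned hw_threads, unsigned team_size,
                          unsigned depth)
{
    assert(hw_threads > 0);
    return worst_case_threads(team_size, depth) / hw_threads;
}
```

Under those assumptions, a team of 8 at three nesting levels on an 8-way
machine gives worst_case_threads(8, 3) == 512 live threads, or 64 per
hardware thread.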

The subset of OpenMP that uses dynamic scheduling is well-suited for 
general-purpose computing and it is that subset that we are advocating 
folding into the C standard.  As Darryl and Robert's paper shows, this 
part of OpenMP is quite similar to Cilk.
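
As a rough illustration of that similarity (my own stock example, not
one taken from the paper), here is a recursive Fibonacci written with
OpenMP tasks, with the corresponding Cilk Plus keywords noted in
comments.  The function names are arbitrary.

```c
#include <assert.h>

/* Recursive Fibonacci expressed with OpenMP tasks. */
long fib(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;

    long x, y;

    /* Cilk Plus spelling: x = cilk_spawn fib(n - 1); */
    #pragma omp task shared(x)
    x = fib(n - 1);

    /* The parent task keeps working while the child may run elsewhere. */
    y = fib(n - 2);

    /* Cilk Plus spelling: cilk_sync; */
    #pragma omp taskwait
    return x + y;
}

/* Entry point: one thread of the team creates the root of the task tree
 * and the other threads pick up tasks as they are created. */
long parallel_fib(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

In both spellings the programmer says only what may run in parallel and
where the results are needed; how the work is mapped onto threads is
left to the runtime.  Built without OpenMP support, the pragmas are
ignored and the code runs serially with the same result.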

BTW, both Cilk and OpenMP have about 15 years of history behind them, so 
both are better candidates for standardization than any newly-invented 
parallelism mechanisms that we might come up with on the fly.

>> It is not part of the charter of the C committee to dictate how runtime
>> libraries are implemented and whether two parallelism systems with
>> different origins should share a common thread pool.  All of that is a
>> Quality of Implementation (QOI) issue.
> What was proposed is not a quality of implementation
> issue, it is an issue of semantics for high-level
> parallel constructs.

We're in full agreement here.  Again, I was responding to Jay's question 
about why we are considering adding these features to the C language 
instead of leaving them as separate standards (or, in the case of Cilk 
Plus, a non-standard technology).

The QOI issue is that of combining standard and non-standard parallelism 
models, such as Cilk and OpenMP.  A vendor is welcome to use a 
harmonized scheduler for both, but since neither is part of the C 
standard, the C standard would have nothing to say about that.  We could 
make it our business to say something about it, but I don't think that's 
a good idea.


> Bronis
>> It makes sense for this study group to take the best ideas from Cilk
>> Plus and OpenMP and move them into the C standard as a unified thing.
>> Once that's done, the implementers can figure out how to make them play
>> nice with their implementations of Cilk Plus and OpenMP.
>> BTW, Cilk Plus is NOT a threading library.  The runtime may use OS
>> threads, but the concepts are parallelism, not concurrency.  The sooner we
>> stop thinking in terms of threads, the sooner we can really learn to
>> write composable and scalable parallel programs.
>> -Pablo
>> On 06/20/2013 10:19 AM, Hoeflinger, Jay P wrote:
>>> Thanks for clarifying.  I think my argument applies to creating
>>> yet-another-threading-model within C just as much as it applies to
>>> making one within C++.  By the way, I am speaking for myself, not Intel
>>> Corporation.
>>> Jay
>>> *From:*John Benito [mailto:benito at bluepilot.com]
>>> *Sent:* Wednesday, June 19, 2013 6:14 PM
>>> *To:* Hoeflinger, Jay P
>>> *Cc:* cplex at open-std.org
>>> *Subject:* Re: [Cplex] Integrating OpenMP and Cilk into C++
>>> This study group is a WG 14 (C) study group, not a WG 21 (C++) study
>>> group.
>>> John Benito, ISO/IEC JTC 1/SC 22/WG 14 - Convener
>>> On Jun 19, 2013, at 4:04 PM, "Hoeflinger, Jay P"
>>> <jay.p.hoeflinger at intel.com> wrote:
>>> I'm struggling with what it means to merge OpenMP and Cilk into C++.
>>> OpenMP and Cilk already exist outside of C++ and can both be used
>>> today in a C++ program.  Pulling syntax or semantics of OpenMP and Cilk
>>> into C++ for some vendors would just mean a large effort to move code
>>> from one part of their compiler and/or runtime to another, with the
>>> result being no more than we already have today.  And today, the
>>> complexity is less because the implementations are partitioned.
>>>
>>> The thing that we don't have today, that makes this something useful to
>>> discuss, is some way of making the OpenMP, Cilk, and C++ threading
>>> models work together.  I think that should be the focus of this effort.
>>> I have lobbied within OpenMP for better interoperability with other
>>> threading models, but nothing has come of that yet.  Perhaps *this*
>>> effort can allow that to start happening.
>>>
>>> The interoperability problem comes down to knowing how many threads to
>>> use for an OpenMP parallel region, and how to manage thread usage with
>>> Cilk.  For that, the scheduler needs to predict future thread usage.
>>> One way to do that is to give the programmer some set of API routines
>>> to give the schedulers hints.  Other API routines could be used to query
>>> the current overall threading state, to allow programmers to adjust
>>> their parallelism accordingly.
>>>
>>> So, I say we should keep OpenMP and Cilk separate: to allow C++ to be as
>>> agile as possible, reduce complexity, and allow OpenMP and Cilk to
>>> continue changing in their own organic ways, but add ways that allow
>>> them to work together with C++ threads and each other.
>>>
>>> Jay
>>> -----Original Message-----
>>> From: cplex-bounces at open-std.org
>>> [mailto:cplex-bounces at open-std.org] On
>>> Behalf Of cplex-request at open-std.org
>>> Sent: Wednesday, June 19, 2013 4:12 PM
>>> To: cplex at open-std.org
>>> Subject: Cplex Digest, Vol 2, Issue 19
>>> Today's Topics:
>>>    1. Re: Cplex: suggested topics for discussion on the next
>>>       teleconf. (Jeffrey Yasskin)
>>>    2. Re: Cplex: suggested topics for discussion on the next
>>>       teleconf. (Darryl Gove)
>>> ----------------------------------------------------------------------
>>> Message: 1
>>> Date: Wed, 19 Jun 2013 10:43:04 -0700
>>> From: Jeffrey Yasskin <jyasskin at google.com>
>>> Subject: Re: [Cplex] Cplex: suggested topics for discussion on the
>>> next teleconf.
>>> To: Herb Sutter <hsutter at microsoft.com>
>>> Cc: Artur Laksberg <Artur.Laksberg at microsoft.com>,
>>> "chandlerc at google.com" <chandlerc at google.com>,
>>> Niklas Gustafsson <Niklas.Gustafsson at microsoft.com>,
>>> "cplex at open-std.org" <cplex at open-std.org>
>>> Message-ID:
>>> <CANh-dX=9tNk+mnpxz3WCoNHH=JLmkeJJLn1C_0+kc28EPC9Pjw at mail.gmail.com>
>>> Content-Type: text/plain; charset="windows-1252"
>>> On Wed, Jun 19, 2013 at 7:57 AM, Herb Sutter <hsutter at microsoft.com>
>>> wrote:
>>> [adding 4 folks to the To: line who are working on parallel
>>> "executors"/schedulers in WG21/SG1, but I'm not sure if they're on
>>> this list; they may have relevant comments about the possibility of
>>> standardizing a cross-language scheduling layer at the C level]
>>> 2 thoughts:
>>> (0: your email's quoting is so confusing.)
>>> As Hans wrote:
>>> I fully agree with the need for a common parallel runtime across
>>> languages. One could go even further and ask for a common runtime
>>> across applications, allowing applications to adapt their degree of
>>> parallelism to the current system load.
>>> Yes. IMO the three problems we're solving in the mainstream industry,
>>> in necessary order, are: (A) Make it possible to even express
>>> parallelism reliably. We've all been working on enabling that, and we're
>>> partway there.
>>> Only once (A) is in place do you get to the second-order problems,
>>> which arise only after people can and do express parallelism: (B1)
>>> Make it possible for multiple libraries in the same application to use
>>> parallelism internally without conflicting/oversubscribing/etc. =
>>> common intra-app scheduler. (B2) Ditto across multiple applications,
>>> driving the scheduling into the OS (or equivalent
>>> inter-app/intra-machine scheduler).
>>> It would be immensely valuable to standardize (B).
>>> Establishing a common scheduling runtime is quite hard, since you have
>>> to cover both CPU-bound tasks that should be limited to 1-per-core, and
>>> IO-bound tasks that need many more scheduled at once. What existing
>>> examples of this do we have to learn from? Google's internal attempt has
>>> not been very successful, IMO. Are people happy with Microsoft's system
>>> of TaskCreationOptions passed to the global TaskScheduler? Are people
>>> happy with Grand Central Dispatch's global queue options? By "happy
>>> with", I mean, do they use these to control all concurrency in their
>>> systems, or do many of them create other threads manually?
>>> Yes. I would love to see us undertake to first do (2) and enable
>>> different forms of (1) as library and language extensions, then see if
>>> we can standardize (1). As someone noted, there is work on (2) being
>>> done in
>>> WG21/SG1 right now with Google's (and collaborators') "executors"
>>> proposal.
>>> Should I see if those folks can join this group if they aren't on it
>>> already? (CC'ing three of the people on that effort.)
>>> Google's "executors" get a lot of mileage by assuming that users with
>>> different constraints can instantiate different executors. That
>>> assumption conflicts with the idea that we'll have one scheduling
>>> library to coordinate across multiple processes. If we get a shared
>>> scheduling library, we'd likely wrap it in a set of executors, but it
>>> shouldn't itself be an executor: it needs too many options.
>>> HTH,
>>> Jeffrey
>>> ------------------------------
>>> Message: 2
>>> Date: Wed, 19 Jun 2013 14:11:56 -0700
>>> From: Darryl Gove <darryl.gove at oracle.com>
>>> Subject: Re: [Cplex] Cplex: suggested topics for discussion on the
>>> next teleconf.
>>> Cc: "'chandlerc at google.com'" <chandlerc at google.com>,
>>> Artur Laksberg <Artur.Laksberg at microsoft.com>,
>>> Jeffrey Yasskin <jyasskin at google.com>,
>>> "cplex at open-std.org" <cplex at open-std.org>,
>>> Niklas Gustafsson <Niklas.Gustafsson at microsoft.com>
>>> Message-ID: <51C21E9C.8090801 at oracle.com>
>>> Content-Type: text/plain; charset=windows-1252; format=flowed
>>> Hi,
>>> This is an interesting discussion. I'd like to try and capture what I
>>> think the key points are, pulling in some of the earlier discussion on
>>> the alias.
>>> We have some general parallelisation concepts extracted from Cilk and
>>> OpenMP which are tasks, parallel for, parallel region, reductions, etc.
>>> In OpenMP we have a set of what might be called scheduling or execution
>>> directives which have no direct equivalent in Cilk.
>>> In Cilk we have composability because everything resolves down to tasks,
>>> and the tasks can be placed on a "single" queue and therefore it doesn't
>>> matter who produced the tasks because they all end up on the same queue.
>>> In OpenMP we have to manage composability through nested parallelism.
>>> This gives us more control over which threads perform a task, or where
>>> that task is executed, but it makes it difficult if you have nested
>>> parallelism from combined applications and libraries from different
>>> sources - the developer needs to more carefully manage the nesting.
>>> The recent discussions on this alias have talked about "schedulers", and
>>> the Ada paper talked about "parallelism manager". I've not seen a
>>> definitive definition, so I'm mapping them onto what amounts to a thread
>>> pool plus some kind of "how do I schedule the work" manager (which
>>> kind-of looks a bit like a beefed up OpenMP schedule directive).
>>> Conceptually I think we can do the following.
>>> We have a parallelism manager which has a pool of threads. Each thread
>>> could be bound to a particular hardware thread or locality group. The
>>> parallelism manager also handles how a new task is handled, and which
>>> task is picked next for execution.
>>> A parallel program has a default manager which has a single pool of
>>> threads - which would give Cilk-like behaviour. If we encounter a
>>> parallel region, or a parallel-for, the generated tasks are assigned to
>>> the default manager.
>>> However, we can also create a new manager, give it some threads, set up
>>> a scheduler, and then use that manager in a delineated region. This
>>> could enable us to provide the same degree of control as nested
>>> parallelism provides in OpenMP.
>>> For example:
>>> parallel_for(...) {...} // would use the current manager.
>>> Or
>>> p_manager_t pman = my_new_manager();
>>> p_manager_t old_pman = _Use_manager(pman);
>>> parallel_for(...) {...} // would use a new manager for this loop
>>> _Use_manager(old_pman);
>>> [Note: I'm not proposing this as syntax or API, just trying out the
>>> concept.]
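
The swap-and-restore idea quoted above can be sketched in plain C.
Everything here is hypothetical: the struct layout, the default of 4
threads, and the names use_manager and manager_for_next_loop are
invented for illustration (identifiers like _Use_manager are reserved,
so the sketch avoids them), and no real scheduling takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "parallelism manager": it only records
 * how many worker threads it would own. */
typedef struct p_manager {
    const char *name;
    unsigned    num_threads;
} p_manager_t;

/* The default manager that parallel constructs use unless told
 * otherwise.  The thread count of 4 is an arbitrary placeholder. */
static p_manager_t default_manager = { "default", 4 };
static p_manager_t *current_manager = &default_manager;

/* Make `next` the current manager and return the one it replaces, so
 * that the caller can restore it afterwards. */
p_manager_t *use_manager(p_manager_t *next)
{
    assert(next != NULL);
    p_manager_t *previous = current_manager;
    current_manager = next;
    return previous;
}

/* Stand-in for a parallel-for: reports which manager would receive the
 * loop's tasks instead of running anything in parallel. */
const p_manager_t *manager_for_next_loop(void)
{
    return current_manager;
}
```

A caller would bracket a region with use_manager(&mine) and
use_manager(old), exactly as in the quoted pseudocode.  A real design
would presumably make the current manager per-thread or per-task rather
than a single global, which this sketch glosses over.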
>>> If the above doesn't seem too outlandish, then I think we can separate
>>> the "parallelism manager" from the parallelism keywords. So we should be
>>> able to put together a separate proposal for the "manager".
>>> This is good because the starting point proposal that Robert and I
>>> provided was based on existing Cilk/OpenMP functionality. This "manager"
>>> concept is less nailed down, so would presumably take a bit more
>>> refinement.
>>> One of the other comments was about how this works on a system-wide
>>> level, where multiple applications are competing for resources. That is
>>> a concern of mine as well. But reflecting on that issue this morning,
>>> it's not that dissimilar to the current situation. We will certainly be
>>> making it easier to develop applications that request multiple threads,
>>> but the instance of Thunderbird that I'm currently running has 98
>>> threads. I rely on the OS to mediate, or I can potentially partition the
>>> system to appropriately allocate resources. Hence I'm not convinced that
>>> we need to prioritise solving this in the general case, and potentially
>>> it becomes a separate "proposal" that works with the "manager" proposal.
>>> Regards,
>>> Darryl.
>>> _______________________________________________
>>> Cplex mailing list
>>> Cplex at open-std.org
>>> http://www.open-std.org/mailman/listinfo/cplex
