[Cplex] Integrating OpenMP and Cilk into C++

Pablo Halpern lwg at halpernwightsoftware.com
Mon Jun 24 22:58:22 CEST 2013


Thank you, Herb.  You said it better than I could.

I'll emphasize one more point: we should not assume that parallelism is 
built on top of threads.  The agent that executes a piece of code in 
parallel with other agents could be a CPU, a hardware thread, a SIMD 
lane, or a GPU thread.  Some of these *might* be represented as threads 
at the OS level, but they might not be.  Even if the parallelism runtime 
is built on top of threads, there might not be a straightforward
mapping to threads that the user can take advantage of.

 > In the current proposal I see no way for even an expert programmer to
 > produce a C11 threads, or pthreads etc., library to compose sensibly
 > with the extension.

As far as a *language extension* is concerned, the only composition that 
is needed (and both Cilk and, I believe, OpenMP already have this 
quality) is that if you execute a parallel computation from within a 
thread, then the computation is fully nested within that thread.

That's not to say that there shouldn't be a tuning API that does give 
you access to the underlying agents, but such an API, even if 
standardized, is not part of the language extension, per se.  I think 
we can make progress on the language extension and the tuning API 
somewhat independently.

-Pablo

On 06/23/2013 12:14 PM, Herb Sutter wrote:
>  > I actually agree with you on this point in that expressing
>  > parallelism is a simpler problem to which we have a couple of
>  > reasonable solutions on the table already.  The big issue is that I
>  > don't think we can reasonably release such an extension unless it
>  > can interoperate sensibly with the low level alternatives,
>  > especially threading at the existing C11 level.
>
> Agreed. However, the following isn’t quite what’s needed:
>
>  > In the current proposal I see no way for even an expert programmer
>  > to produce a C11 threads, or pthreads etc., library to compose
>  > sensibly with the extension.  There isn't even a way for such a
>  > library to query the number of threads in use by the system.  For
>  > that matter, given a threaded application that wants to call a
>  > library using the new extension, how is it meant to convey the
>  > amount of parallelism for the library to use, let alone give it
>  > existing thread resources to run on?
>
> The problem with querying “# HW threads” is that it’s too low-level, and
> too lumpy.
>
> We definitely don’t want users to have to specify this, and mainstream
> parallel libraries have generally surfaced this knob only as a stopgap
> when they couldn’t tune well enough on their own and are now getting
> away from this.
>
> But I think neither do we want libraries to have to specify this,
> because it doesn’t load balance well. The width of a computation can and
> should be able to change during the computation. For example, at the
> point where I want to begin a parallel computation, N parallel resources
> may be available, but that doesn’t mean I should take N – if during my
> computation M others end, I want to scale up to N+M, and if during my
> computation M others try to begin, I don’t want to starve them but want
> to share and scale down to, say, N/(N+M).
>
> (It also doesn’t interact well with hypervisors or with systems where
> cores can be turned on and off to conserve power, where in both cases
> the HW resources can grow or shrink during execution.)
>
> And Tom then alludes to the right granularity, which is *enqueueing tasks*:
>
>  > This group needs a runtime interface to rely upon so that it can
>  > specify a standard ABI for enqueueing tasks etc. and the scheduling
>  > interface would serve that purpose if developed.
>
> Enqueueing tasks is exactly the right model, so that the
> point-in-time workload can be dynamically scheduled across the hardware.
>
> Herb
>
> *From:*cplex-bounces at open-std.org [mailto:cplex-bounces at open-std.org]
> *On Behalf Of *Tom Scogland
> *Sent:* Saturday, June 22, 2013 10:25 PM
> *To:* Nelson, Clark
> *Cc:* cplex at open-std.org
> *Subject:* Re: [Cplex] Integrating OpenMP and Cilk into C++
>
> On Sat, Jun 22, 2013 at 12:10 AM, Nelson, Clark <clark.nelson at intel.com
> <mailto:clark.nelson at intel.com>> wrote:
>
>      > In my opinion, if we do not specify a scheduler interface as part
>     of this
>      > effort, we will have done nothing worth doing.  Neither Cilk nor
>     OpenMP are
>      > useful without a runtime managing concurrency, scheduling work
>     and (in some
>      > fashion) allowing the user to control concurrency.  Either one
>     can be used
>      > without one, but then all that's left is an overly verbose serial
>     program
>      > (well, with better than average SIMD usage, but I digress).  I am
>     personally
>      > not interested in specifying a parallel language extension with
>     no standard
>      > way to control its behavior.
>
>     I'm sure you're right -- from your perspective. I absolutely believe
>     that you,
>     personally, would benefit in no way.
>
>     However, I'm pretty confident that there are quite a few less
>     sophisticated
>     programmers in the world who would benefit considerably from having an
>     easier way to write a parallel program than by using pthreads, and
>     an easier
>     way of writing a scalable, composable parallel program than by using
>     OpenMP;
>     and they would benefit even more if that way were in some way standard.
>
>     BTW, when you talk about a "standard way to control its behavior",
>     do you mean
>     "control" in any broader sense than would be covered by "tune"?
>
> "​Tune​" to me implies that it is an alteration to increase performance
> by some metric without affecting correctness.  For example, changing the
> number of threads available to the program as a whole, requesting a
> specific minimum chunk size for a loop, limiting a section of code to a
> specific number of threads, specifying an alternative scheduling scheme,
> etc. all of these could reasonably be called "tuning".  In that sense,
> tuning is sufficient, but I have a feeling that is not the meaning you
> had in mind.
>
>
>      > In that sense I believe we need to specify *something* as a standard
>      > scheduler interface. The question at hand is not necessarily
>     whether we
>      > should attempt to make *a single scheduler* which is perfect for all
>      > occasions, a nigh impossible task on the best of days, but rather
>     a standard
>      > interface and mechanism for composing schedulers within an
>     application,
>      > probably a standard library interface of some sort.  In that
>     fashion we
>      > allow users and runtime implementers to define schedulers which
>     fit their
>      > needs, but still incorporate into the standard system.
>
>     I wholeheartedly support this idea. But I feel it's a separate and
>     deeper
>     topic (and in a lot of ways more interesting :-) than the simple
>     ability to
>     express what I'll call opportunistic parallelism, as in a program
>     that can
>     take advantage of more than one processor, which can still be useful
>     even if
>     it isn't tuned to within an inch of its life.
>
>     Please note that I *didn't* say that this separate and deeper topic
>     should
>     be given lower priority than the other.
>
> I actually agree with you on this point in that expressing parallelism
> is a simpler problem to which we have a couple of reasonable solutions
> on the table already.  The big issue is that I don't think we can
> reasonably release such an extension unless it can interoperate sensibly
> with the low level alternatives, especially threading at the existing
> C11 level.
>
> In the current proposal I see no way for even an expert programmer to
> produce a C11 threads, or pthreads etc., library to compose sensibly
> with the extension.  There isn't even a way for such a library to query
> the number of threads in use by the system.  For that matter, given a
> threaded application that wants to call a library using the new
> extension, how is it meant to convey the amount of parallelism for the
> library to use, let alone give it existing thread resources to run on?
>
> I have no issue with providing a sensible default for when a user does
> not care, but there has to be a way for library writers and experts to
> be able to interface with the new runtime intelligently, and without
> being forced to reimplement everything in terms of the new extension.
> At a minimum that means offering hooks to set properties such as the
> number of threads and the binding of threads, and to query those same
> values.  Given that interface, we could at least introduce this
> without clobbering everyone else; this is acceptable but sub-optimal
> territory as far as I'm concerned.
>
> Perhaps that's what we should seek to do in this group, and create
> another group to look into the issue of composing schedulers, although
> I'm not sure what it would necessarily accomplish.  This group needs a
> runtime interface to rely upon so that it can specify a standard ABI for
> enqueueing tasks etc. and the scheduling interface would serve that
> purpose if developed.
>
> For that matter, I'm not sure how much more effort would really be
> required.  There is already an interface defined for passing tasks to a
> scheduler, and another (even if it's presently hidden) for specifying the
> resources that scheduler is allowed to use.  To use a well-established
> example, in OpenMP parallelism is managed in a hierarchy, or a scope if
> you will.  If you run a work-sharing loop in a parallel region with 8
> threads, it will spread across exactly those 8 threads, even if at an
> outer parallel region there are 80 available.  Effectively each parallel
> region is a scheduling scope.  Would it be so bad to allow the
> specification of a (potentially user-defined) scheduler and resources
> (number of threads etc.) for each region?  When I say scheduler here, I
> mean something like the "executor" discussed in another thread on this
> list.  At the present moment, I can't think of any issue I have that
> could not be solved by some combination or implementation of those options.
>
> --
> -Tom Scogland
>
> http://tom.scogland.com
> "A little knowledge is a dangerous thing.
>   So is a lot."
> -Albert Einstein
>
>
>
> _______________________________________________
> Cplex mailing list
> Cplex at open-std.org
> http://www.open-std.org/mailman/listinfo/cplex
>


