[Cplex] Cplex: suggested topics for discussion on the next teleconf.

Bronis R. de Supinski bronis at llnl.gov
Thu Jun 20 05:50:30 CEST 2013


Darryl:

The single-queue concept that people like from Cilk is
a concern for real programs, particularly as parallelism
grows even within individual nodes. The requirement
of a single queue -- even just conceptually -- is a
classic scalability bottleneck that anyone who has
worked in large-scale computing should recognize.
I would expect it to limit performance significantly
as nodes with 100+ threads become common. I
must point out that this issue is not a minor one:
the reason to use parallelism is typically
performance, so limiting performance in your
parallelism model pretty much misses the point.

The concept would almost certainly be a huge performance
problem if you tried to extend it to multiple applications,
which would create even more contention for the single
queue to manage.
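
To make the bottleneck concrete, here is a minimal sketch of
the pattern in question (illustrative only, not code from any
proposal): a single mutex-protected queue that every worker
must lock, so enqueues and dequeues serialize no matter how
many threads you add.

#include <pthread.h>
#include <stddef.h>

typedef struct task {
    struct task *next;
    void (*fn)(void *);
    void *arg;
} task_t;

/* One global queue: every worker contends on this single lock. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static task_t *q_head = NULL;

static void push_task(task_t *t)
{
    pthread_mutex_lock(&q_lock);    /* the serialization point */
    t->next = q_head;
    q_head = t;
    pthread_mutex_unlock(&q_lock);
}

static task_t *pop_task(void)
{
    pthread_mutex_lock(&q_lock);    /* same lock: pops serialize with pushes */
    task_t *t = q_head;
    if (t != NULL)
        q_head = t->next;
    pthread_mutex_unlock(&q_lock);
    return t;
}

/* With 100+ worker threads, time spent waiting on q_lock grows with
   the thread count while the useful work per task stays fixed. */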

Bronis


On Wed, 19 Jun 2013, Darryl Gove wrote:

> Hi,
>
> This is an interesting discussion. I'd like to try and capture what I
> think the key points are, pulling in some of the earlier discussion on
> the alias.
>
> We have some general parallelisation concepts extracted from Cilk and
> OpenMP: tasks, parallel for, parallel regions, reductions, etc.
>
> In OpenMP we have a set of what might be called scheduling or execution
> directives which have no direct equivalent in Cilk.
>
> In Cilk we have composability because everything resolves down to tasks,
> and the tasks can be placed on a "single" queue; it doesn't matter who
> produced a task because they all end up on the same queue. In OpenMP we
> have to manage composability through nested parallelism. This gives us
> more control over which threads perform a task, or where that task is
> executed, but it becomes difficult when nested parallelism comes from
> applications and libraries combined from different sources - the
> developer needs to manage the nesting more carefully.
>
> The recent discussions on this alias have talked about "schedulers", and
> the Ada paper talked about a "parallelism manager". I've not seen a
> precise definition of either, so I'm mapping them onto what amounts to a
> thread pool plus some kind of "how do I schedule the work" manager
> (which looks a bit like a beefed-up OpenMP schedule directive).
>
> Conceptually I think we can do the following.
>
> We have a parallelism manager which has a pool of threads. Each thread
> could be bound to a particular hardware thread or locality group. The
> parallelism manager also determines how a new task is handled, and which
> task is picked next for execution.
>
> A parallel program has a default manager which has a single pool of
> threads - which would give Cilk-like behaviour. If we encounter a
> parallel region, or a parallel-for, the generated tasks are assigned to
> the default manager.
>
> However, we can also create a new manager, give it some threads, set up
> a scheduler, and then use that manager in a delineated region. This
> could enable us to provide the same degree of control as nested
> parallelism provides in OpenMP.
>
> For example:
>
> parallel_for(...) {...} // would use the current manager.
>
> Or
>
> p_manager_t pman = my_new_manager();
>
> p_manager_t old_pman = _Use_manager(pman);
>
> parallel_for(...) {...} // would use a new manager for this loop
>
> _Use_manager(old_pman);
>
> [Note: I'm not proposing this as syntax or API, just trying out the
> concept.]
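>
> One way to picture such a manager in C (a sketch only; every name here
> is hypothetical, in the spirit of the note above):
>
> #include <pthread.h>
> #include <stddef.h>
>
> /* Hypothetical task descriptor. */
> typedef struct p_task {
>     void (*fn)(void *);
>     void *arg;
>     struct p_task *next;
> } p_task_t;
>
> /* Hypothetical manager: a pool of threads plus the two policy hooks
>    described above. */
> typedef struct p_manager {
>     pthread_t *threads;   /* the pool; each thread could be bound to a
>                              hardware thread or locality group */
>     size_t nthreads;
>     /* how a newly created task is handled */
>     void (*submit)(struct p_manager *, p_task_t *);
>     /* which task is picked next for execution */
>     p_task_t *(*pick_next)(struct p_manager *);
> } p_manager_t;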
>
> If the above doesn't seem too outlandish, then I think we can separate
> the "parallelism manager" from the parallelism keywords. So we should be
> able to put together a separate proposal for the "manager".
>
> This is good because the starting point proposal that Robert and I
> provided was based on existing Cilk/OpenMP functionality. This "manager"
> concept is less nailed down, so it would presumably need a bit more
> refinement.
>
> One of the other comments was about how this works on a system-wide
> level, where multiple applications are competing for resources. That is
> a concern of mine as well. But reflecting on that issue this morning, I
> realised that it's not that dissimilar from the current situation. We
> will certainly be
> making it easier to develop applications that request multiple threads,
> but the instance of Thunderbird that I'm currently running has 98
> threads. I rely on the OS to mediate, or I can potentially partition the
> system to appropriately allocate resources. Hence I'm not convinced that
> we need to prioritise solving this in the general case, and potentially
> it becomes a separate "proposal" that works with the "manager" proposal.
>
> Regards,
>
> Darryl.
>
>
>
> On 06/19/13 07:57, Herb Sutter wrote:
>> /[adding 4 folks to the To: line who are working on parallel
>> “executors”/schedulers in WG21/SG1, but I’m not sure if they’re on this
>> list – they may have relevant comments about the possibility of
>> standardizing a cross-language scheduling layer at the C level]/
>>
>> Tom wrote:
>>
>> The way Herb put his arguments brings up an interesting point. During
>> the call there was a great deal of discussion with regards to the scope
>> of our mandate, and whether this should be a language-independent or
>> C-only proposal; maybe both are true.
>>
>> Specifically I'm referring to the statement "IMO we cannot live with a
>> long-term requirement that applications use multiple schedulers."
>>
>> (BTW, I’m sorry that I missed the call – I just got busy and forgot. :(
>> Mea culpa.)
>>
>> the idea of designing what we need in two components, a scheduler API
>> and a language extension, seems to solve several of the problems that
>> have been nagging at me.
>>
>> Yes! And starting with the lower one.
>>
>> As Hans wrote:
>>
>> I fully agree with the need for a common parallel runtime across
>> languages. One could go even further and ask for a common runtime across
>> applications, allowing applications to adapt their degree of parallelism
>> to the current system load.
>>
>> Yes. IMO the three problems we’re solving in the mainstream industry, in
>> necessary order, are: (A) Make it possible to even express parallelism
>> reliably. We’ve all been working on enabling that, and we’re partway
>> there. Only once (A) is in place do you get to the second-order
>> problems, which arise only after people can and do express parallelism:
>> (B1) Make it possible for multiple libraries in the same application to
>> use parallelism internally without conflicting/oversubscribing/etc. =
>> common intra-app scheduler. (B2) Ditto across multiple applications,
>> driving the scheduling into the OS (or equivalent
>> inter-app/intra-machine scheduler).
>>
>> It would be immensely valuable to standardize (B).
>>
>> These (A) and (B) map pretty directly to Hans's (1) and (2) below:
>>
>> I like the distinction between language and runtime system. It will
>> probably prove crucial.
>>
>> […]
>>
>> 1) Try to provide one common parallel language extension that reconciles
>> all of these contradictory requirements. This sounds to me like redoing
>> OpenMP, but ten times harder, because C/C++ has a much wider scope than
>> OpenMP. I think this is what the Geva/Gove proposal attempts to do.
>>
>> 2) Pitch a common parallel language extension at a lower level
>> (preferably more abstracted and better integrated with the language than
>> pthreads) and invite various libraries that extend this language and use
>> it to implement various behaviours and provide various guarantees that
>> meet different user requirements. The added value of the language
>> extension would be that these libraries would be able to co-exist by
>> definition of the language, and by using the common runtime.
>>
>> I think 2) needs to be solved before taking on 1).
>>
>> Fully agree! This would be immensely valuable – and, better still,
>> language-neutral. But that’s what C is for, providing the lingua franca.
>>
>> I also fear that if 1) is attempted then the end result would not be
>> accepted as the single correct way forward by the community at large.
>> The result would be too much of a compromise, and so clearly just one
>> way of doing things, that it would not push alternative libraries (TBB,
>> Qt, Boost.Threads, …) out of the market.
>>
>> Yes. I would love to see us undertake to first do (2) and enable
>> different forms of (1) as library and language extensions, then see if
>> we can standardize (1). As someone noted, there is work on (2) being
>> done in WG21/SG1 right now with Google’s (and collaborators’)
>> “executors” proposal. Should I see if those folks can join this group if
>> they aren’t on it already? (CC’ing three of the people on that effort.)
>>
>> Thanks,
>>
>> Herb
>>
>> *From:*cplex-bounces at open-std.org [mailto:cplex-bounces at open-std.org]
>> *On Behalf Of *Hans Vandierendonck
>> *Sent:* Wednesday, June 19, 2013 5:11 AM
>> *To:* Tom Scogland
>> *Cc:* cplex at open-std.org
>> *Subject:* Re: [Cplex] Cplex: suggested topics for discussion on the
>> next teleconf.
>>
>> My two cents:
>>
>> I fully agree with the need for a common parallel runtime across
>> languages. One could go even further and ask for a common runtime across
>> applications, allowing applications to adapt their degree of parallelism
>> to the current system load.
>>
>> I like the distinction between language and runtime system. It will
>> probably prove crucial.
>>
>> I can imagine that a language extension for parallelism would be used to
>> build libraries or interfaces that simplify or streamline the expression
>> of parallelism for a particular group of programmers or projects. Such
>> libraries already exist and serve different needs, think TBB, Qthreads
>> (Sandia), Qt, Boost.Threads, ... At the moment these are typically
>> implemented on top of pthreads. Would it be the goal of cplex to provide
>> a language definition that subsumes all of these efforts, or is the goal
>> to provide a language definition that can be used as a building block
>> for such libraries and that is a better abstraction than the pthreads
>> library in a number of ways?
>>
>> The high-level question here is: What is the void that this language
>> extension needs to fill?
>>
>> There seem to be different "user requirements" for parallel language
>> extensions. Several opposing views have been expressed: a light-touch
>> expression of parallelism vs. full control over the execution of
>> parallel threads and data placement; also task-orientation (typically
>> Cilk) versus thread-orientation (OpenMP allows both). I think that
>> these distinct requirements are also reflected in the existence of
>> libraries that provide different abstractions of parallelism. Roughly
>> speaking there are two distinct approaches to work around opposing
>> requirements:
>>
>> 1) Try to provide one common parallel language extension that reconciles
>> all of these contradictory requirements. This sounds to me like redoing
>> OpenMP, but ten times harder, because C/C++ has a much wider scope than
>> OpenMP. I think this is what the Geva/Gove proposal attempts to do.
>>
>> 2) Pitch a common parallel language extension at a lower level
>> (preferably more abstracted and better integrated with the language than
>> pthreads) and invite various libraries that extend this language and use
>> it to implement various behaviours and provide various guarantees that
>> meet different user requirements. The added value of the language
>> extension would be that these libraries would be able to co-exist by
>> definition of the language, and by using the common runtime.
>>
>> I think 2) needs to be solved before taking on 1). I also fear that if
>> 1) is attempted then the end result would not be accepted as the single
>> correct way forward by the community at large. The result would be too
>> much of a compromise, and so clearly just one way of doing things, that
>> it would not push alternative libraries (TBB, Qt, Boost.Threads, …) out
>> of the market. I may have a limited view of what a standard is, but if
>> there is room for alternatives to the standard, then perhaps the
>> standard is not good enough.
>>
>> Working out 2) requires solving some hard problems that actually also
>> need to be solved in the case of 1), but in the case of 1) it is easy to
>> sweep them under the carpet and consider them part of the implementation
>> of the runtime system. One example is composition of parallel regions.
>> The challenge here is not so much the functional correctness of
>> composition, which should be automatic given the common language and
>> runtime, but the performance aspect of composition. An important
>> question is how to assign threads to a parallel region in a way that
>> adapts to the dynamic context in which the region is called.
>> This may involve taking away threads from a parallel region, or
>> assigning new threads to the region while it executes. In my opinion, a
>> parallel language should define an API to control sharing of threads
>> between parallel regions. An application may perhaps choose to ignore
>> this (I am thinking about HPC here where programmers like full control)
>> but in other cases programmers would prefer to leave such issues to the
>> system. In any case, it is important to define how such mechanisms may
>> operate.
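>>
>> To make that concrete, here is a sketch of what such an API could look
>> like (the names are hypothetical, purely to illustrate the idea):
>>
>> /* Opaque handle to a running parallel region. */
>> typedef struct p_region p_region_t;
>>
>> /* Move up to n threads from one region to another; returns the number
>>    of threads actually moved. */
>> int p_region_yield_threads(p_region_t *from, p_region_t *to, int n);
>>
>> /* Let the runtime grow or shrink the region's thread count while it
>>    executes. */
>> void p_region_set_adaptive(p_region_t *r, int min_threads, int max_threads);
>>
>> /* Full control, HPC-style: pin the region to exactly n threads. */
>> void p_region_set_fixed(p_region_t *r, int n);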
>>
>> I would also like to draw attention to the determinism of programs,
>> which has not been mentioned on this list before. Determinism means
>> that any parallel execution will produce an equivalent functional
>> result. It is probably impossible to guarantee determinism when
>> providing a low-level view of parallel tasks or threads; however, I
>> believe it would be useful to define exactly the conditions under which
>> a program would exhibit deterministic behaviour. This could include the
>> set of controls on the runtime system that may be used without
>> sacrificing determinism, the types of reduction variables that may be
>> used, which concurrent thread interactions are allowed, etc.
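>>
>> A small example of the kind of condition I mean: floating-point
>> addition is not associative, so a reduction whose combination order
>> depends on the schedule can give different results from run to run,
>> whereas an integer sum cannot.
>>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>     /* Two combination orders a parallel sum reduction might use. */
>>     float a = 1e8f, b = -1e8f, c = 1.0f;
>>     printf("%f\n", (a + b) + c);  /* prints 1.000000 */
>>     printf("%f\n", a + (b + c));  /* prints 0.000000: c is lost when
>>                                      added to b at this magnitude */
>>     return 0;
>> }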
>>
>> Hans.
>>
>> On 19 Jun 2013, at 09:43, Tom Scogland <tom at scogland.com> wrote:
>>
>>
>>
>>     The way Herb put his arguments brings up an interesting point.
>>     During the call there was a great deal of discussion with regards to
>>     the scope of our mandate, and whether this should be a
>>     language-independent or C-only proposal; maybe both are true.
>>
>>     Specifically I'm referring to the statement "IMO we cannot live with
>>     a long-term requirement that applications use multiple schedulers."
>>
>>     I agree with that statement, and would further argue that it applies
>>     across more disparate languages than just C and C++. It does not say
>>     anything about the actual parallel extension or specification,
>>     however, just about the runtime system. The current merged proposal
>>     explores extensions for the expression of parallelism which are
>>     completely dependent on a runtime system to run efficiently, or at
>>     all in a concurrent context. Even so, following from its Cilk roots,
>>     the proposal does not specify anything about that runtime system
>>     beyond that it will not violate the guarantees of the language-level
>>     constructs.
>>
>>     If the interface of the runtime scheduler is specified such that it
>>     can be language independent, with a common design and layout for
>>     tasks or ranges of tasks, their corresponding data, dependencies and
>>     scheduler controls, that should be sufficient to allow for
>>     interoperability. Note that I said the runtime scheduler, by which I
>>     mean concurrency manager and task scheduler, not parallel language
>>     extension.
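>>
>>     Roughly, such a scheduler interface might look like the following
>>     in C (an illustrative sketch only; every name here is hypothetical):
>>
>>     /* Language-neutral task descriptor. */
>>     typedef struct sched_task {
>>         void (*run)(void *);       /* task body, callable from any language */
>>         void *arg;                 /* its corresponding data */
>>         struct sched_task **deps;  /* tasks that must complete first */
>>         int ndeps;
>>     } sched_task_t;
>>
>>     /* Opaque concurrency manager and task scheduler. */
>>     typedef struct scheduler scheduler_t;
>>
>>     void sched_submit(scheduler_t *s, sched_task_t *t);
>>     void sched_wait(scheduler_t *s, sched_task_t *t);
>>     void sched_set_threads(scheduler_t *s, int n);   /* a control point */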
>>
>>     Then a language-specific syntax can be developed for C, C++, Ada, or
>>     any other language that could submit tasks. Perhaps they could even
>>     be used to implement alternative runtime schedulers. In the end,
>>     this gives us a C-specific extension for parallelism that could be
>>     composed with similar systems in other languages, libraries, and
>>     whatever else.
>>
>>     Clearly this is just my thought process, but the idea of designing
>>     what we need in two components, a scheduler API and a language
>>     extension, seems to solve several of the problems that have been
>>     nagging at me. I think it would also provide a nice separation
>>     between the parallel specification and tuning/concurrency control as
>>     Clark suggested. The default could be to only use the parallel
>>     language extension, simply using whatever scheduler is the
>>     language/compiler/standard library default, allowing the system to
>>     do whatever it wants. But, if the user is so inclined, an
>>     alternative scheduler or control points on the default one could be
>>     tuned through an additional interface.
>>
>>     --
>>     -Tom Scogland
>>
>>     http://tom.scogland.com
>>     "A little knowledge is a dangerous thing.
>>     So is a lot."
>>     -Albert Einstein
>>
>>
>> --
>> Hans Vandierendonck
>> PhD, Lecturer (a UK lecturer is equivalent to a US assistant professor)
>> High Performance and Distributed Computing
>> School of Electronics, Electrical Engineering and Computer Science
>> Queen's University Belfast
>>
>> Address:
>> Bernard Crossland Building
>> 18 Malone Road
>> Belfast
>> BT9 5BN
>>
>> http://www.cs.qub.ac.uk/~H.Vandierendonck/
>>
>>
>>
>
> -- 
> Darryl Gove
> Phone: +408 276 7421
> Blog : http://blogs.oracle.com/d/
> Books: http://my.safaribooksonline.com/9780321711441
>        http://my.safaribooksonline.com/9780768681390
>        http://my.safaribooksonline.com/0595352510
> _______________________________________________
> Cplex mailing list
> Cplex at open-std.org
> http://www.open-std.org/mailman/listinfo/cplex
>

