[Cplex] Parallel Manager and More

Stephen Michell stephen.michell at maurya.on.ca
Thu Jun 20 20:32:38 CEST 2013


This took a while to respond to because I have been in Europe at standards meetings with sometimes marginal email access.

I want to address Darryl's comments about the parallelism manager, etc.

First, a qualification: I would like us to be careful not to call it the "Ada model". Right now it is a model that the three of us (Miguel, Brad and myself) have presented unofficially to Ada Europe and more officially to the 16th International Real Time Ada Workshop (IRTAW 16). It certainly generated some interest, but we also took some cannon fire across our bows. I'll discuss some of the issues later.

We submitted the Ada Europe paper at the end of November, and our thinking evolved after that first paper, in particular around the Parallelism Manager and the Worker Task (thread in C/C++/POSIX) Pool Manager. The job of the Parallelism Manager is to manage the application of the concurrency for each Parallelism Opportunity (POP). The Parallelism Manager has no Tasks to schedule of its own. This is very much along the lines of Darryl's separate parallelism managers.

The Worker Task Pool Manager manages a single collection of Tasks that serve as the worker tasks executing the Strands for as many POPs as need the service. There can be a single Worker Task Pool Manager or there can be more than one: the Tasks in a managed pool can be created dynamically as needed, can be a fixed number, or can be one per CPU.
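
To make the split concrete, here is a minimal C sketch of the two managers. All of the names (pool_policy_t, worker_pool_t, parallelism_manager_t) are hypothetical illustrations of the concept, not proposed syntax or API:

    #include <stddef.h>

    /* Hypothetical worker task pool manager state: it owns the workers. */
    typedef enum { POOL_DYNAMIC, POOL_FIXED, POOL_ONE_PER_CPU } pool_policy_t;

    typedef struct worker_pool {
        pool_policy_t policy;  /* how workers are created and sized */
        size_t        size;    /* current number of worker tasks */
        /* ... FIFO queue of ready workers ... */
    } worker_pool_t;

    /* Hypothetical parallelism manager: it owns no tasks of its own. It
       decides how much concurrency to apply at each POP and borrows
       workers from a pool to execute the Strands. */
    typedef struct parallelism_manager {
        worker_pool_t *pool;                 /* where workers come from */
        size_t (*degree)(size_t n_strands);  /* concurrency policy per POP */
    } parallelism_manager_t;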

The point of the above is that there can be multiple pools of tasks. This is particularly useful when preemption and priorities are involved. If a Task reaches a POP and invokes the Parallelism Manager associated with that POP, it needs worker tasks at the same priority. One way to do this is to change the priority of each Worker as it collects work. Another way is to have a pool of workers at each priority.
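
A small sketch of the two approaches, assuming hypothetical worker_t, pool_t and pool_take names invented purely for illustration:

    #include <stddef.h>

    #define N_PRIORITIES 32

    typedef struct worker { int priority; /* ... */ } worker_t;
    typedef struct pool   { worker_t **ready; size_t n; } pool_t;

    /* Hypothetical: pop a ready worker from a pool, or NULL if empty. */
    static worker_t *pool_take(pool_t *p) {
        return p->n ? p->ready[--p->n] : NULL;
    }

    /* Approach 1: a single shared pool; the worker's priority is changed
       to match the Task that reached the POP as the worker collects work. */
    static worker_t *take_and_reprioritize(pool_t *shared, int prio) {
        worker_t *w = pool_take(shared);
        if (w) w->priority = prio;  /* stands in for a real runtime call */
        return w;
    }

    /* Approach 2: one pool per priority; workers never change priority. */
    static worker_t *take_at_priority(pool_t pools[N_PRIORITIES], int prio) {
        return pool_take(&pools[prio]);
    }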

Someone complained about the single queue. In our model, there will be one queue for each pool of ready worker tasks. There is no efficiency issue because each queue truly is FIFO.

I have a few things to say about blocking. Originally in our model, we followed the Cilk Plus model and permitted Strands to block. One shot across our bow was that IRTAW 16 abhors blocking in such cases. We note, however, that some parallel algorithms do not work without blocking (for instance when you do a parallel->serial->parallel sequence and it is important that the two parallel pieces work on the same data). Our conclusion is that Strand blocking is one of those issues that needs a configuration setting.
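
For illustration, here is the parallel->serial->parallel shape written with POSIX threads standing in for Strands; the point is that the strands block at a barrier while one of them performs the serial step, then resume on the same per-strand data. This is only a motivating sketch of the pattern, not how Strands would be written:

    #include <pthread.h>

    static pthread_barrier_t bar;
    static double partial[8], scale;

    static void *strand(void *arg) {
        int i = *(int *)arg;
        partial[i] = i * 2.0;                  /* phase 1: parallel */
        if (pthread_barrier_wait(&bar) == PTHREAD_BARRIER_SERIAL_THREAD)
            scale = 1.0 / 8.0;                 /* serial step, one strand */
        pthread_barrier_wait(&bar);            /* strands block here */
        partial[i] *= scale;                   /* phase 2: same data */
        return NULL;
    }

    int main(void) {
        pthread_t t[8]; int id[8];
        pthread_barrier_init(&bar, NULL, 8);
        for (int i = 0; i < 8; i++) { id[i] = i; pthread_create(&t[i], NULL, strand, &id[i]); }
        for (int i = 0; i < 8; i++) pthread_join(t[i], NULL);
        pthread_barrier_destroy(&bar);
    }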

Note that we do not actually say that the workers will be tasks, or how they are scheduled, unless the programmer takes charge and binds his own parallelism manager and worker task pool manager(s) to the program. Until then the compiler and runtime are free to do as necessary.
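
A hypothetical binding might look like the following, in the spirit of Darryl's _Use_manager sketch quoted below; the functions are assumed runtime hooks invented for illustration, not anything we have proposed:

    typedef struct worker_pool worker_pool_t;
    typedef struct parallelism_manager parallelism_manager_t;

    /* Assumed runtime hooks; an implementation would supply these. */
    extern worker_pool_t *new_worker_pool_one_per_cpu(void);
    extern parallelism_manager_t *new_parallelism_manager(worker_pool_t *);
    extern void bind_parallelism_manager(parallelism_manager_t *);

    void configure_parallelism(void) {
        /* Take charge: one worker per CPU, our own manager for all POPs.
           Until this runs, the compiler/runtime is free to choose. */
        worker_pool_t *pool = new_worker_pool_one_per_cpu();
        bind_parallelism_manager(new_parallelism_manager(pool));
    }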

Hope that helped.

...stephen

> 
> Message: 2
> Date: Wed, 19 Jun 2013 14:11:56 -0700
> From: Darryl Gove <darryl.gove at oracle.com>
> Subject: Re: [Cplex] Cplex: suggested topics for discussion on the next teleconf.
> Cc: "'chandlerc at google.com'" <chandlerc at google.com>, Artur Laksberg <Artur.Laksberg at microsoft.com>, Jeffrey Yasskin <jyasskin at google.com>, "cplex at open-std.org" <cplex at open-std.org>, Niklas Gustafsson <Niklas.Gustafsson at microsoft.com>
> Message-ID: <51C21E9C.8090801 at oracle.com>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> 
> Hi,
> 
> This is an interesting discussion. I'd like to try and capture what I 
> think the key points are, pulling in some of the earlier discussion on 
> the alias.
> 
> We have some general parallelisation concepts extracted from Cilk and 
> OpenMP which are tasks, parallel for, parallel region, reductions, etc.
> In OpenMP we have a set of what might be called scheduling or execution 
> directives which have no direct equivalent in Cilk.
> 
> In Cilk we have composability because everything resolves down to tasks, 
> and the tasks can be placed on a "single" queue and therefore it doesn't 
> matter who produced the tasks because they all end up on the same queue. 
> In OpenMP we have to manage composability through nested parallelism. 
> This gives us more control over which threads perform a task, or where 
> that task is executed, but it makes it difficult if you have nested 
> parallelism from combined applications and libraries from different 
> sources - the developer needs to more carefully manage the nesting.
> 
> The recent discussions on this alias have talked about "schedulers", and 
> the Ada paper talked about "parallelism manager". I've not seen a 
> definitive definition, so I'm mapping them onto what amounts to a thread 
> pool plus some kind of "how do I schedule the work" manager (which 
> kind-of looks a bit like a beefed up OpenMP schedule directive).
> 
> Conceptually I think we can do the following.
> 
> We have a parallelism manager which has a pool of threads. Each thread 
> could be bound to a particular hardware thread or locality group. The 
> parallelism manager also handles how a new task is handled, and which 
> task is picked next for execution.
> 
> 
> 
> A parallel program has a default manager which has a single pool of 
> threads - which would give Cilk-like behaviour. If we encounter a 
> parallel region, or a parallel-for, the generated tasks are assigned to 
> the default manager.
> 
> However, we can also create a new manager, give it some threads, set up 
> a scheduler, and then use that manager in a delineated region. This 
> could enable us to provide the same degree of control as nested 
> parallelism provides in OpenMP.
> 
> For example:
> 
> 
> 
> parallel_for(...) {...} // would use the current manager.
> 
> 
> 
> Or
> 
> 
> 
> p_manager_t pman = my_new_manager();
> 
> p_manager_t old_pman = _Use_manager(pman);
> 
> parallel_for(...) {...} // would use a new manager for this loop
> 
> _Use_manager(old_pman);
> 
> [Note: I'm not proposing this as syntax or API, just trying out the 
> 
> concept.]
> 
> 
> 
> If the above doesn't seem too outlandish, then I think we can separate 
> the "parallelism manager" from the parallelism keywords. So we should be 
> able to put together a separate proposal for the "manager".
> 
> This is good because the starting point proposal that Robert and I 
> provided was based on existing Cilk/OpenMP functionality. This "manager" 
> concept is less nailed down, so would presumably take a bit more
> refinement.
> 
> One of the other comments was about how this works on a system-wide 
> level, where multiple applications are competing for resources. That is 
> a concern of mine as well. But reflecting on that issue this morning, 
> it's not that dissimilar to the current situation. We will certainly be 
> making it easier to develop applications that request multiple threads, 
> but the instance of Thunderbird that I'm currently running has 98 
> threads. I rely on the OS to mediate, or I can potentially partition the 
> system to appropriately allocate resources. Hence I'm not convinced that 
> we need to prioritise solving this in the general case, and potentially 
> it becomes a separate "proposal" that works with the "manager" proposal.
> 
> 
> 
> Regards,
> 
> 
> 
> Darryl.
> 
> 
> 
> 
> 
> 
> 
> On 06/19/13 07:57, Herb Sutter wrote:
> 
> > /[adding 4 folks to the To: line who are working on parallel
> > "executors"/schedulers in WG21/SG1, but I'm not sure if they're on
> > this list - they may have relevant comments about the possibility of
> > standardizing a cross-language scheduling layer at the C level]/
> >
> > Tom wrote:
> >
> > The way Herb put his arguments brings up an interesting point. During
> > the call there was a great deal of discussion with regards to the
> > scope of our mandate, and whether this should be a language
> > independent or C-only proposal; maybe both are true.
> >
> > Specifically I'm referring to the statement "IMO we cannot live with
> > a long-term requirement that applications use multiple schedulers."
> >
> > (BTW, I'm sorry that I missed the call - I just got busy and forgot.
> > :( Mea culpa.)
> >
> > the idea of designing what we need in two components, a scheduler API
> > and a language extension, seems to solve several of the problems that
> > have been nagging at me.
> >
> > Yes! And starting with the lower one.
> >
> > As Hans wrote:
> >
> > I fully agree with the need for a common parallel runtime across
> > languages. One could go even further and ask for a common runtime
> > across applications, allowing applications to adapt their degree of
> > parallelism to the current system load.
> >
> > Yes. IMO the three problems we're solving in the mainstream industry,
> > in necessary order, are: (A) Make it possible to even express
> > parallelism reliably. We've all been working on enabling that, and
> > we're partway there. Only once (A) is in place do you get to the
> > second-order problems, which arise only after people can and do
> > express parallelism: (B1) Make it possible for multiple libraries in
> > the same application to use parallelism internally without
> > conflicting/oversubscribing/etc. = common intra-app scheduler. (B2)
> > Ditto across multiple applications, driving the scheduling into the
> > OS (or equivalent inter-app/intra-machine scheduler).
> >
> > It would be immensely valuable to standardize (B).
> >
> > These (A) and (B) map pretty directly to Hans's (1) and (2) below:
> >
> > I like the distinction between language and runtime system. It will
> > probably prove crucial.
> >
> > [...]
> >
> > 1) Try to provide one common parallel language extension that
> > reconciles all of these contradictory requirements. This sounds to me
> > like redo-ing OpenMP but then 10 times harder because C/C++ has a
> > much wider scope than OpenMP. I think this is what the Geva/Gove
> > proposal attempts to do.
> >
> > 2) Pitch a common parallel language extension at a lower level
> > (preferably more abstracted and better integrated with the language
> > than pthreads) and invite various libraries that extend this language
> > and use it to implement various behaviours and provide various
> > guarantees that meet different user requirements. The added value of
> > the language extension would be that these libraries would be able to
> > co-exist by definition of the language, and by using the common
> > runtime.
> >
> > I think 2) needs to be solved before taking on 1).
> >
> > Fully agree! This would be immensely valuable - and, better still,
> > language-neutral. But that's what C is for, providing the lingua
> > franca.
> >
> > I also fear that if 1) is attempted then the end result would not be
> > accepted as the single correct way forward by the community at large.
> > The result would be too much of a compromise, and clearly just one
> > way of doing things, that it would not push alternative libraries
> > (TBB, Qt, Boost.Threads, ...) out of the market.
> >
> > Yes. I would love to see us undertake to first do (2) and enable
> > different forms of (1) as library and language extensions, then see
> > if we can standardize (1). As someone noted, there is work on (2)
> > being done in WG21/SG1 right now with Google's (and collaborators')
> > "executors" proposal. Should I see if those folks can join this group
> > if they aren't on it already? (CC'ing three of the people on that
> > effort.)
> >
> > Thanks,
> >
> > Herb
> >
> > *From:* cplex-bounces at open-std.org [cplex-bounces at open-std.org]
> > *On Behalf Of* Hans Vandierendonck
> > *Sent:* Wednesday, June 19, 2013 5:11 AM
> > *To:* Tom Scogland
> > *Cc:* cplex at open-std.org
> > *Subject:* Re: [Cplex] Cplex: suggested topics for discussion on the
> > next teleconf.
> 
> > My two cents:
> >
> > I fully agree with the need for a common parallel runtime across
> > languages. One could go even further and ask for a common runtime
> > across applications, allowing applications to adapt their degree of
> > parallelism to the current system load.
> >
> > I like the distinction between language and runtime system. It will
> > probably prove crucial.
> >
> > I can imagine that a language extension for parallelism would be used
> > to build libraries or interfaces that simplify or streamline the
> > expression of parallelism for a particular group of programmers or
> > projects. Such libraries already exist and serve different needs;
> > think TBB, Qthreads (Sandia), Qt, Boost.Threads, ... At the moment
> > these are typically implemented on top of pthreads. Would it be the
> > goal of cplex to provide a language definition that subsumes all of
> > these efforts, or is the goal to provide a language definition that
> > can be used as a building block for such libraries and that is a
> > better abstraction than the pthreads library in a number of ways?
> >
> > The high-level question here is: What is the void that this language
> > extension needs to fill?
> >
> > There seems to be a need for different "user requirements" of
> > parallel language extensions. Several opposing views have been
> > expressed: a light-touch expression of parallelism vs. full control
> > over the execution of parallel threads and data placement. Also
> > task-orientation (typically Cilk) versus thread-orientation (OpenMP
> > allows both task- and thread-orientation). I think that these
> > distinct requirements are also reflected in the existence of
> > libraries that provide different abstractions of parallelism. Roughly
> > speaking there are two distinct approaches to work around opposing
> > requirements:
> >
> > 1) Try to provide one common parallel language extension that
> > reconciles all of these contradictory requirements. This sounds to me
> > like redo-ing OpenMP but then 10 times harder because C/C++ has a
> > much wider scope than OpenMP. I think this is what the Geva/Gove
> > proposal attempts to do.
> >
> > 2) Pitch a common parallel language extension at a lower level
> > (preferably more abstracted and better integrated with the language
> > than pthreads) and invite various libraries that extend this language
> > and use it to implement various behaviours and provide various
> > guarantees that meet different user requirements. The added value of
> > the language extension would be that these libraries would be able to
> > co-exist by definition of the language, and by using the common
> > runtime.
> >
> > I think 2) needs to be solved before taking on 1). I also fear that
> > if 1) is attempted then the end result would not be accepted as the
> > single correct way forward by the community at large. The result
> > would be too much of a compromise, and clearly just one way of doing
> > things, that it would not push alternative libraries (TBB, Qt,
> > Boost.Threads, ...) out of the market. I may have a limited view of
> > what a standard is, but if there is room for alternatives to the
> > standard, then perhaps the standard is not good enough.
> >
> > Working out 2) requires solving some hard problems that actually also
> > need to be solved in the case of 1), but in the case of 1) it is easy
> > to sweep them under the carpet and consider them part of the
> > implementation of the runtime system. One example is composition of
> > parallel regions. The challenge here would not be so much the
> > functional correctness of composition, which should be automatic
> > given the common language and runtime, but the performance aspect of
> > composition. Important questions are how to assign a number of
> > threads to a parallel region in a way that can adapt to the dynamic
> > context in which the parallel region is called. This may involve
> > taking away threads from a parallel region, or assigning new threads
> > to the region while it executes. In my opinion, a parallel language
> > should define an API to control sharing of threads between parallel
> > regions. An application may perhaps choose to ignore this (I am
> > thinking about HPC here, where programmers like full control) but in
> > other cases programmers would prefer to leave such issues to the
> > system. In any case, it is important to define how such mechanisms
> > may operate.
> >
> > I would also like to draw attention to determinism of programs, which
> > has not been mentioned on this list before. Determinism states that
> > any parallel execution will produce a functional result that is
> > equivalent. It is probably impossible to guarantee determinism when
> > providing a low-level view on parallel tasks or threads; however, I
> > believe it would be useful to define exactly the conditions under
> > which a program would exhibit deterministic behaviour. This could
> > include the set of controls on the runtime system that may be used
> > without sacrificing determinism, the type of reduction variables that
> > may be used, which concurrent thread interactions are allowed, etc.
> >
> > Hans.
> >
> > On 19 Jun 2013, at 09:43, Tom Scogland <tom at scogland.com> wrote:
> >
> 
> >     The way Herb put his arguments brings up an interesting point.
> >     During the call there was a great deal of discussion with regards
> >     to the scope of our mandate, and whether this should be a
> >     language independent or C-only proposal; maybe both are true.
> >
> >     Specifically I'm referring to the statement "IMO we cannot live
> >     with a long-term requirement that applications use multiple
> >     schedulers."
> >
> >     I agree with that statement, and would further argue that it
> >     applies across more disparate languages than just C and C++. It
> >     does not say anything about the actual parallel extension or
> >     specification, however, just the runtime system. The current
> >     merged proposal explores extensions for the expression of
> >     parallelism which are completely dependent on a runtime system to
> >     run efficiently, or at all in a concurrent context. Even so,
> >     following from its Cilk roots, the proposal does not specify
> >     anything about that runtime system beyond that it will not
> >     violate the guarantees of the language level constructs.
> >
> >     If the interface of the runtime scheduler is specified such that
> >     it can be language independent, with a common design and layout
> >     for tasks or ranges of tasks, their corresponding data,
> >     dependencies and scheduler controls, that should be sufficient to
> >     allow for interoperability. Note that I said the runtime
> >     scheduler, by which I mean concurrency manager and task
> >     scheduler, not parallel language extension.
> >
> >     Then a language specific syntax can be developed for C, C++, Ada,
> >     or any other language that could submit tasks. Perhaps they could
> >     even be used to implement alternative runtime schedulers as well.
> >     In the end, this gives us a C specific extension for parallelism
> >     that could be composed with similar systems in other languages,
> >     libraries and whatever else.
> >
> >     Clearly this is just my thought process, but the idea of
> >     designing what we need in two components, a scheduler API and a
> >     language extension, seems to solve several of the problems that
> >     have been nagging at me. I think it would also provide a nice
> >     separation between the parallel specification and
> >     tuning/concurrency control as Clark suggested. The default could
> >     be to only use the parallel language extension, simply using
> >     whatever scheduler is the language/compiler/standard library
> >     default, allowing the system to do whatever it wants. But, if the
> >     user is so inclined, an alternative scheduler or control points
> >     on the default one could be tuned through an additional
> >     interface.
> >
> >     --
> >     -Tom Scogland
> >     http://tom.scogland.com
> >     "A little knowledge is a dangerous thing.
> >     So is a lot."
> >     -Albert Einstein
> >
> > --
> > Hans Vandierendonck
> > PhD, Lecturer (a UK lecturer is equivalent to a US assistant professor)
> > High Performance and Distributed Computing
> > School of Electronics, Electrical Engineering and Computer Science
> > Queen's University Belfast
> >
> > Address:
> > Bernard Crossland Building
> > 18 Malone Road
> > Belfast
> > BT9 5BN
> >
> > http://www.cs.qub.ac.uk/~H.Vandierendonck/
> >
> 
> -- 
> 
> Darryl Gove
> 
> Phone: +408 276 7421
> 
> Blog : http://blogs.oracle.com/d/
> Books: http://my.safaribooksonline.com/9780321711441
> http://my.safaribooksonline.com/9780768681390
> http://my.safaribooksonline.com/0595352510


