[Cplex] Comments on Michell et al. paper

Darryl Gove darryl.gove at oracle.com
Tue Jun 18 07:07:07 CEST 2013


Hi,

I didn't notice the "Straw proposal for the development of support for 
language level parallelism that includes other languages" paper until 
just before this morning's meeting. I skimmed it quickly, but I've now 
had a bit longer to examine it. I did not see an announcement or earlier 
discussion of it, so I apologise if my summary and comments are redundant.

Darryl.


URL for paper:
http://wiki.edg.com/twiki/pub/CPLEX/MeetingJun172013/proposal-for-cplex-2013-06-15.pdf

My summary:

- It seems that there is significant commonality between the Cilk/OpenMP 
proposal and the Ada proposal. This is reassuring: it suggests that we 
are identifying some "generic" concepts.

- Ada has an interesting idea of a Manager that controls aspects of the 
parallel regionisation strategy.


My detailed comments:

1.2 a) Development of applications with mixed language parallelism. This 
is a concern. There are potentially immediate issues with mixing OpenMP, 
Cilk, and language-level parallelism, and obvious further issues once 
multiple languages are added to the mix. In OpenMP this is solved by having 
all languages use the same framework, and that option is available to 
us. The critical point here is that there needs to be a mapping from the 
language(s) onto the framework. In the Cilk/OpenMP proposal we suggest 
that there may be separate requirements for documents on the interaction 
between, say, OpenMP and the language. This could be extended to include 
other languages (like Ada), or it could be possible for Ada to describe 
its commonality with (say) OpenMP.
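
To make the "common framework" point concrete, here is a minimal sketch 
(my own, not text from either proposal) of a C routine parallelised with 
OpenMP and given plain C linkage. Another language such as Ada could bind 
to it, and both sides would then end up on the same OpenMP runtime. The 
function name and signature are purely illustrative:

#include <omp.h>
#include <stddef.h>

/* Hypothetical routine: the name and signature are invented for
 * illustration. Compile with an OpenMP-enabled compiler (e.g. -fopenmp);
 * an Ada program could import it with pragma Import and would then share
 * the same OpenMP runtime as any C callers. */
void scale_array(double *a, size_t n, double factor)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;
}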

1.2 b) If there are multiple parallel regions within an application then 
it is possible to oversubscribe the machine if the regions all request 
large numbers of threads. This is a concern, which in our proposal is 
the motivation for breaking parallel_for into tasks - rather than using 
the nested parallelism approach of OpenMP. This would set a bound on 
the total number of threads, rather than having the thread count scale with 
the number of parallel regions. There is also a concern about how many 
threads an application should request when it is sharing the machine 
with other applications (i.e. how do multiple applications play nicely together).
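
As a rough sketch (again mine, not text from the proposal) of the tasking 
approach: the inner work is expressed as tasks that are picked up by a 
single team of threads, so the total thread count stays bounded instead of 
multiplying with nested parallel regions. Shown here with OpenMP tasks for 
concreteness:

#include <omp.h>
#include <stdio.h>

/* Each "block" of work is split into tasks rather than opening a nested
 * parallel region, so the existing team of threads does all the work. */
static void process_block(int block)
{
    for (int i = 0; i < 4; i++) {
        #pragma omp task firstprivate(block, i)
        printf("block %d, piece %d ran on thread %d\n",
               block, i, omp_get_thread_num());
    }
}

int main(void)
{
    #pragma omp parallel   /* one team; its size bounds the thread count */
    #pragma omp single     /* one thread generates the tasks */
    for (int b = 0; b < 8; b++)
        process_block(b);
    return 0;
}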

1.2 The paper recommends that SC22 should define an overarching 
parallelisation model. My personal concern would be that we could end up 
trying to solve the unsolvable, or potentially spend a long time 
defining all the possible interactions for a "generic" parallelisation 
model that at the end of the day is only used in "specific" situations 
(i.e. we could end up defining unnecessary flexibility). That said, if 
SC22 were to take it on, then anything we achieve here could feed into 
that model.

2.1 The paper proposes Tasklette, Tasklet, Strand, or Fibre instead of 
Task. [Task in Ada is analogous to Thread]. At the end of the day it's 
the concepts that are important, not the names (so long as the names do 
not suck). I would point out that Strands and Fibres have been used to 
describe Threads. I don't see a problem with using different "localised" 
terminology in different languages - in fact, I suspect it is necessary.

2.3 They propose using a "parallel" keyword which could be applied to 
for loops, or to delimit parallel regions. This is rather like the 
"parallel" directive in OpenMP.

2.4 Agreement that exceptional conditions in parallel regions are an 
area for study.

2.6 Ada has support for user-defined reduction operators. The paper 
proposes, for performance reasons, that reduction operators are more 
like the OpenMP reductions, which do not attempt to replicate serial 
semantics, than the Cilk Hyper-objects.
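
For comparison, a minimal OpenMP-style reduction in C. The partial results 
are combined in an unspecified order, which is exactly why it does not 
reproduce serial semantics bit-for-bit for floating point:

#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    /* Each thread accumulates a private partial sum; the partial sums are
     * combined at the end of the loop in an unspecified order. */
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;
    printf("harmonic(1e6) ~= %f\n", sum);
    return 0;
}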

2.7 The paper makes the point that for parallel applications, the 
topology of the system is much more apparent than for serial 
applications. The paper makes the point about memory locality, but the 
same applies to whether execution pipelines are shared, and to whether you 
get better performance from one thread per core or from multiple threads per core. 
OpenMP has added a locality directive (gather or scatter) that tries to 
help the runtime identify the best way of distributing threads across a 
system. I believe that it would be very helpful to provide standardised 
ways of accessing topological or configuration information about the 
system. I also suspect that there are currently very few people who 
would write code that takes advantage of that information.
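
As a sketch of what is portably available today, the OpenMP API already 
exposes a small amount of configuration information; anything finer grained 
(shared pipelines, cache sharing, NUMA layout) still needs 
platform-specific interfaces, which is the gap described above:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* Coarse information the runtime will give us portably. */
    printf("logical processors available: %d\n", omp_get_num_procs());
    printf("default thread team size:     %d\n", omp_get_max_threads());
    return 0;
}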

2.8 The paper expresses concern with the idea of simd-for, in particular 
the mixing of parallelism and simd. This is certainly an area for 
discussion.
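
For reference, this is roughly what the combined form looks like in the 
OpenMP 4.0 draft (the "parallel for simd" construct): iterations are 
divided among threads, and each thread's portion is then vectorised. A 
sketch, not a recommendation:

#include <stddef.h>

void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* Threads split the iteration space; each chunk is vectorised. */
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}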

2.9 If I'm reading this correctly, Ada already has support for array 
sections.

2.10 The paper proposes that there be a "chunk size" attribute which 
specifies a minimum amount of work to place into a chunk. This seems 
similar to the chunk size property in OpenMP scheduling directives. The 
interesting question to me is whether an application developer will be 
able to better specify the chunk size for their application on arbitrary 
hardware than the developer of the runtime library and compiler for that 
hardware. Perhaps I am too optimistic, but I would expect (or hope) that 
the developer of the compiler and runtime library would be able to come 
up with some very good heuristics for determining chunk size, and that 
these heuristics would evolve with new platforms rather than being baked 
into the application.
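
For comparison, this is what the OpenMP analogue looks like: the second 
argument to schedule() is the chunk size, and leaving it out lets the 
runtime (and its heuristics) choose:

#include <stddef.h>

void scale(double *a, size_t n)
{
    /* Hand out work in chunks of 64 iterations at a time. */
    #pragma omp parallel for schedule(dynamic, 64)
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0;
}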

2.11/2.12 The paper proposes a "parallelism manager" which would be 
responsible for scheduling etc. I think this is a very interesting idea.
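
Purely as a thought experiment (this interface is my own invention, not 
anything from the paper), a parallelism manager might look like a 
pluggable set of callbacks that the runtime consults instead of applying a 
fixed policy:

#include <stddef.h>

/* Hypothetical sketch only: names and fields are invented. The application
 * (or a library) supplies these callbacks and the runtime asks them how to
 * schedule each parallel region. */
typedef struct parallelism_manager {
    /* How many workers should this region use right now? */
    int    (*workers_for_region)(size_t total_iterations);
    /* How many iterations should each worker grab at a time? */
    size_t (*chunk_size)(size_t total_iterations, int workers);
} parallelism_manager;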

Conclusions:

- I don't see any major discrepancies with the OpenMP/Cilk proposal.

- The idea of using a parallelisation manager is worthy of further 
exploration.

