[Cplex] Comments on Michell et al. paper
darryl.gove at oracle.com
Tue Jun 18 07:07:07 CEST 2013
I didn't notice the "Straw proposal for the development of support for
language level parallelism that includes other languages" paper until
just before this morning's meeting. I skimmed it quickly, but I've now
had a bit longer to examine it. I did not see an announcement or earlier
discussion of it, so I apologise if my summary and comments are redundant.
URL for paper:
- There seems to be significant commonality between the Cilk/OpenMP
proposal and the Ada proposal. This is reassuring; it suggests that we
are identifying some "generic" concepts.
- Ada has an interesting idea of a Manager that controls aspects of the
parallel regionisation strategy.
My detailed comments:
1.2 a) Development of applications with mixed language parallelism. This
is a concern. There are potentially immediate issues with mixing OpenMP,
Cilk, and language-level parallelism. There are obviously issues with adding
multiple languages into this. Taking OpenMP, this is solved by having
all languages use the same framework, and that is an option available to
us. The critical point here is that there needs to be a mapping from the
language(s) onto the framework. In the Cilk/OpenMP proposal we suggest
that there may be separate requirements for documents on the interaction
between, say, OpenMP and the language. This could be extended to include
other languages (like Ada), or it could be possible for Ada to describe
its commonality with (say) OpenMP.
1.2 b) If there are multiple parallel regions within an application then
it is possible to oversubscribe the machine if the regions all request
large numbers of threads. This is a concern, and in our proposal it is
the motivation for breaking parallel_for into tasks rather than using
the nested parallelism approach of OpenMP. This would set a bound on
the total number of threads, and not have threads scale with the number
of parallel regions. There is also a concurrency concern here: how many
threads should an application request when it is sharing the machine
with other applications (i.e. how do multiple apps play nicely together)?
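As a rough illustration of the task-based decomposition, here is a minimal
sketch. It uses existing OpenMP task pragmas as a stand-in, not the syntax
of either proposal; the function name sum_as_tasks and its chunk parameter
are hypothetical. The thread count is fixed by the single enclosing parallel
region, and the loop only creates more tasks, not more threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a[0..n) by cutting the loop into tasks of
   `chunk` iterations each. A compiler without OpenMP support ignores the
   pragmas and runs the same code serially. */
long sum_as_tasks(const int *a, size_t n, size_t chunk)
{
    long total = 0;

    assert(chunk > 0);

    #pragma omp parallel
    #pragma omp single
    {
        /* One thread creates the tasks; the whole team executes them. */
        for (size_t lo = 0; lo < n; lo += chunk) {
            size_t hi = (lo + chunk < n) ? lo + chunk : n;

            #pragma omp task firstprivate(lo, hi) shared(total, a)
            {
                long partial = 0;
                for (size_t i = lo; i < hi; i++) {
                    partial += a[i];
                }
                /* Combine partial results without a data race. */
                #pragma omp atomic
                total += partial;
            }
        }
    }   /* implicit barrier: all tasks have finished here */

    return total;
}
```

Calling sum_as_tasks from several places in a program adds tasks to be
scheduled, whereas nesting parallel regions can multiply the thread count.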
1.2 The paper recommends that SC22 should define an overarching
parallelisation model. My personal concern would be that we could end up
trying to solve the unsolvable, or potentially spend a long time
defining all the possible interactions for a "generic" parallelisation
model that at the end of the day is only used in "specific" situations
(i.e. we could end up defining unnecessary flexibility). That said, if
SC22 were to take it on, then anything we achieve here could feed into
2.1 The paper proposes Tasklette, Tasklet, Strand, or Fibre instead of
Task. [Task in Ada is analogous to Thread]. At the end of the day it's
the concepts that are important, not the names (so long as the names do
not suck). I would point out that Strands and Fibres have been used to
describe Threads. I don't see a problem with using different "localised"
terminology in different languages - in fact, I suspect it is necessary.
2.3 They propose using a "parallel" keyword which could be applied to
for loops, or to delimit parallel regions. This is rather like the
"parallel" directive in OpenMP.
2.4 Agreement that exceptional conditions in parallel regions are an
area for study.
2.6 Ada has support for user-defined reduction operators. The paper
proposes, for performance reasons, that reduction operators are more
like the OpenMP reductions, which do not attempt to replicate serial
semantics, than the Cilk Hyper-objects.
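To show what "not replicating serial semantics" can mean in practice, here
is a minimal sketch in plain C. It does not use OpenMP or Cilk; it only
mimics one combination order that a two-thread reduction might choose. The
function names are hypothetical, and IEEE 754 double arithmetic is assumed.

```c
#include <stddef.h>

/* Adds the elements strictly left to right, as a serial loop would. */
double serial_sum(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Sums each half separately and then combines the two partial sums,
   which is one order a two-thread reduction could legitimately use. */
double two_chunk_sum(const double *v, size_t n)
{
    size_t mid = n / 2;
    return serial_sum(v, mid) + serial_sum(v + mid, n - mid);
}
```

For the input {1e17, 1.0, -1e17, 1.0} the serial order gives 1.0 and the
two-chunk order gives 0.0, because each 1.0 is lost to rounding when it is
added to a value of magnitude 1e17. A reduction that must reproduce the
serial answer has to constrain the combination order, which is where the
performance cost comes from.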
2.7 The paper makes the point that for parallel applications, the
topology of the system is much more apparent than for serial
applications. It illustrates this with memory locality, but the same
applies to whether pipelines are shared, and to whether you get better
performance from one thread per core or multiple threads per core.
OpenMP has added a thread affinity clause (proc_bind, with policies such
as close or spread) that tries to help the runtime identify the best way
of distributing threads across a
system. I believe that it would be very helpful to provide standardised
ways of accessing topological or configuration information about the
system. I also suspect that there are currently very few people who
would write code that takes advantage of that information.
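For context, here is a minimal sketch of the kind of query that is commonly
available without a standard, assuming a Unix-like system. sysconf with
_SC_NPROCESSORS_ONLN is widely supported (glibc, the BSDs, macOS) but is
not required by POSIX, and it reports only a processor count: nothing
about cores, shared pipelines, caches, or memory locality.
online_processors is a hypothetical helper name.

```c
#include <unistd.h>

/* Returns the number of processors currently online, or 1 if the
   system cannot report it. Assumes a Unix-like platform. */
long online_processors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}
```

Anything richer than this count currently needs platform-specific
interfaces, which is the gap a standardised query would fill.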
2.8 The paper expresses concern with the idea of simd-for, in particular
the mixing of parallelism and simd. This is certainly an area for
2.9 If I'm reading this correctly, Ada already has support for array
2.10 The paper proposes that there be a "chunk size" attribute which
specifies a minimum amount of work to place into a chunk. This seems
similar to the chunk size property in OpenMP scheduling directives. The
interesting question to me is whether an application developer will be
able to better specify the chunk size for their application on arbitrary
hardware than the developer of the runtime library and compiler for that
hardware. Perhaps I am too optimistic, but I would expect (or hope) that
the developer of the compiler and runtime library would be able to come
up with some very good heuristics for determining chunk size, and that
these heuristics would evolve with new platforms rather than being baked
into the application.
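For comparison, this is roughly how a developer-specified chunk size looks
in OpenMP today. It is a minimal sketch; square_all and min_chunk are
hypothetical names. With the guided schedule, the chunk size argument is a
minimum: chunks start larger and shrink towards it.

```c
#include <assert.h>

/* Hypothetical example: squares each element, asking the runtime to
   hand out chunks of at least min_chunk iterations (the final chunk
   may be smaller). Without OpenMP the pragma is ignored. */
void square_all(const int *in, long *out, int n, int min_chunk)
{
    assert(min_chunk > 0);

    #pragma omp parallel for schedule(guided, min_chunk)
    for (int i = 0; i < n; i++) {
        out[i] = (long)in[i] * in[i];
    }
}
```

Dropping the schedule clause altogether leaves the choice to the
implementation, which is the alternative argued for above.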
2.11/2.12 The paper proposes a "parallelism manager" which would be
responsible for scheduling etc. I think this is a very interesting idea.
- I don't see any major discrepancies with the OpenMP/Cilk proposal.
- The idea of using a parallelisation manager is worthy of further