Implicit barrier vs. nowait in the case of two successive #pragma omp for loops
Looking at the document here, the following construct is well defined:

#pragma omp parallel           // Line 1
{
    #pragma omp for nowait     // Line 3
    for (i=0; i<N; i++)
        a[i] = // some expression
    #pragma omp for            // Line 6
    for (i=0; i<N; i++)
        b[i] = ...... a[i] ......
}

since:

Here the nowait clause implies that threads can start on the second loop while other threads are still working on the first. Since the two loops use the same schedule here, an iteration that uses a[i] can indeed rely on that value having been computed.
I am having a tough time understanding why this would be. Suppose Line 3 were:

#pragma omp for

Then, since there is an implicit barrier just before Line 6, the next for loop will have the values at all indices of a fully computed. But with the nowait on Line 3, how would it work?
Suppose Line 1 spawns 4 threads, t1, t2, t3 and t4. Suppose N is 8 and the partition of indices in the first for loop is:

t1: 0, 4
t2: 1, 5
t3: 2, 6
t4: 3, 7

Suppose t1 completes indices 0 and 4 first and arrives at Line 6. What exactly happens now? How is it guaranteed that it now gets to operate on the same indices 0 and 4, for which the a values were correctly computed by it in the first loop? And what if the second for loop accesses a[i+1]?
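As a side note, the partition assumed above (t1: 0, 4; t2: 1, 5; t3: 2, 6; t4: 3, 7) is what a round-robin static schedule with chunk size 1 would produce on 4 threads. A minimal compilable sketch of the snippet with that schedule spelled out; num_threads(4), schedule(static, 1) and the concrete expression i * i are illustrative additions, not part of the quoted document:

double a[8], b[8];
const int N = 8;

void question_snippet()
{
    #pragma omp parallel num_threads(4)              // Line 1: threads t1..t4
    {
        #pragma omp for schedule(static, 1) nowait   // Line 3
        for (int i = 0; i < N; i++)
            a[i] = i * i;                            // some expression
        #pragma omp for schedule(static, 1)          // Line 6
        for (int i = 0; i < N; i++)
            b[i] = a[i];
    }
}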
c++ openmp
asked Nov 13 '18 at 8:46 – Tryer
2 Answers
The material you quote is wrong. It becomes correct if you add schedule(static) to both loops - this guarantees the same distribution of indices among threads for successive loops. The default schedule is implementation defined; you cannot assume it to be static. To quote the standard:

Different loop regions with the same schedule and iteration count, even if they occur in the same parallel region, can distribute iterations among threads differently. The only exception is for the static schedule as specified in Table 2.5. Programs that depend on which thread executes a particular iteration under any other circumstances are non-conforming.

If the second for loop accesses a[i+1], you must absolutely leave the barrier there.
answered Nov 13 '18 at 8:57 – Zulan
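For concreteness, a minimal sketch of the corrected pattern (illustrative code, not part of the original answer; the function name and array contents are made up):

#include <vector>

void fill_then_use(std::vector<double>& a, std::vector<double>& b)
{
    const int N = static_cast<int>(a.size());

    #pragma omp parallel
    {
        // The nowait is safe only because both loops use the same explicit
        // static schedule and the same iteration count, so each thread reads
        // exactly the a[i] values it wrote itself.
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;                  // some expression

        #pragma omp for schedule(static)
        for (int i = 0; i < N; i++)
            b[i] = a[i] + 1.0;               // reads only this thread's own a[i]

        // If this second loop read a[i+1] instead, the nowait above would have
        // to go: a[i+1] may belong to a thread that has not written it yet.
    }
}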
To me, the statement that there is no potential problem in the example is wrong.

Indeed, the scheduling will be the same for both loops, as it is not explicitly defined: it will be the default one. Furthermore, if the scheduling were of static type, then indeed there wouldn't be any issue, since the thread that handles any given element of array a in the second loop would be the same one that wrote it in the first loop.

But the actual problem here is that the default scheduling is not defined by the OpenMP standard; it is implementation defined. For the (many) implementations where the default scheduling is static, there cannot be any race condition in this snippet. But if the default scheduling is dynamic, then, as you noticed, a race condition can happen and the result is undefined.

answered Nov 13 '18 at 8:59 – Gilles
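To illustrate the dynamic case (a hypothetical sketch, not part of the answer above; N, a and b are assumed to be declared as in the question):

#pragma omp parallel
{
    #pragma omp for schedule(dynamic) nowait    // iterations handed out on demand
    for (int i = 0; i < N; i++)
        a[i] = i * i;                           // some expression

    // Race: a thread may be handed an iteration i here whose a[i] has not yet
    // been written by whichever thread owns iteration i of the first loop.
    #pragma omp for schedule(dynamic)
    for (int i = 0; i < N; i++)
        b[i] = a[i];
}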
what if the second loop was like for(int i = 0; i < N/2; i++). How would the allocation be maintained consistently? – Tryer Nov 13 '18 at 9:04

It would not - both loops have to use the same iteration count and same static schedule. – Zulan Nov 13 '18 at 9:11

Exactly as @Zulan said. – Gilles Nov 13 '18 at 9:18

What about the following type of second loop: for(int i = 1; i < N+1; i++) b[i-1] = a[i-1] vs for(int i = 1; i < N+1; i++) b[i] = a[i]? Both seem to satisfy the standard if I understand it correctly (the iteration count is N), so how would the iterations be partitioned? – Tryer Nov 13 '18 at 9:18

The fact that there is no race condition in your initial example with static scheduling isn't just due to identical looping; it is also (as I mentioned) because it is the thread that wrote a given element of a that subsequently reads it, and it alone. Your second example doesn't follow this rule, and isn't safe. – Gilles Nov 13 '18 at 9:22
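Spelling out the two variants from the last comments (an illustrative sketch, not from either answer; a and b are assumed to be arrays of N doubles, and both loops use an explicit static schedule with the same iteration count):

#pragma omp parallel
{
    #pragma omp for schedule(static) nowait
    for (int i = 0; i < N; i++)          // logical iteration k writes a[k]
        a[i] = i * i;

    // Safe: here logical iteration k has i = k + 1, so it reads a[k],
    // i.e. exactly the element this same thread wrote above.
    #pragma omp for schedule(static)
    for (int i = 1; i < N + 1; i++)
        b[i - 1] = a[i - 1];

    // NOT safe with the nowait above: logical iteration k would have i = k + 1
    // and read a[k+1], which may still be pending on another thread (and a[N]
    // would be out of bounds anyway).
    // for (int i = 1; i < N + 1; i++)
    //     b[i] = a[i];
}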