Example 1
Consider the following example of a parallel program that sums the
elements of an array:
// File: example1.cpp
//
#include <stdio.h>
#include <omp.h>

int main() {
   int reps = 10000 ;
   long int A[reps] ;
   long int sum = 0 ;

   // initialize the array
   for (long int i=0; i < reps ; i++) {
      A[i] = i ;
   }

   omp_set_dynamic(0) ;

   #pragma omp parallel shared(A,sum,reps) num_threads(4)
   {
      #pragma omp single
      {  // only one thread has to do this
         // omp_set_num_threads(4);
         printf("Number of threads = %d\n", omp_get_num_threads() ) ;
      }

      #pragma omp for schedule(static,5)
      for (long int i=0; i < reps; i++) {
         sum += A[i] ;
      }
   }  // end of parallel region

   printf("sum = %ld\n", sum) ;
   return 0 ;
}
Download: example1.cpp
Download this program. Then, compile and run it a few times. On GL, you
need to use the -fopenmp flag to compile:
g++ -fopenmp example1.cpp
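With the command above, g++ writes the executable to the default name a.out, so a typical compile-and-run sequence (run it more than once) looks like this:
g++ -fopenmp example1.cpp
./a.out
./a.out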
What happens when you run this program a few times? Does it give the
same answer every time? (Hint: no.) The reason is that the shared
variable sum is updated by different threads, and some of the updates
can be lost: each sum += A[i] is a read-modify-write, so if two threads
read sum before either writes its result back, one thread's update
overwrites the other's.
This is a common situation. OpenMP provides the reduction clause to
solve this problem.
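For reference, here is a sketch of how the summation loop in example1.cpp could be written with a reduction clause (the rest of the program stays the same); this is only an illustration of the clause, not necessarily the exact version used later:

#pragma omp for schedule(static,5) reduction(+:sum)
for (long int i=0; i < reps; i++) {
   sum += A[i] ;   // each thread adds into its own private copy of sum;
                   // the private copies are combined into the shared sum
                   // when the loop finishes
}

Because every thread accumulates into a private copy and the combine step is handled by OpenMP, no updates are lost and the program prints the same sum on every run.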