I think the general recommendation with OpenMP is that, for nested loops, the inner loops should not be parallelized. More precisely, use only one level of parallelism, and usually the outermost is the ...
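To make the one-level idea concrete, here is a minimal sketch in C. It assumes a compiler with OpenMP enabled (for example `-fopenmp` on GCC or Clang); `scale_matrix` and its parameters are hypothetical names chosen for illustration, not anything from the original question. Only the outer loop carries a pragma, and the inner loop runs serially inside each thread.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): multiplies every element of a
 * rows x cols row-major matrix by `factor`. */
void scale_matrix(double *a, size_t rows, size_t cols, double factor)
{
    /* One level of parallelism: the rows are divided among the threads. */
    #pragma omp parallel for
    for (long i = 0; i < (long)rows; ++i) {
        /* No pragma here: each thread walks its own rows serially. */
        for (size_t j = 0; j < cols; ++j) {
            a[(size_t)i * cols + j] *= factor;
        }
    }
}
```

If OpenMP is not enabled, the pragma is ignored and the function still produces the same result, just on a single thread. Each iteration of the outer loop touches a distinct row, so the threads do not write to the same elements.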