Had cause to use the STL parallel algorithms this week, with excellent results. The algorithm was a small matrix inversion repeated for each pixel in an image (to correct for the effects of cross-polarisation).

Used std::for_each, iota views, the parallel execution policy and a lambda with variable capture.

These are all fairly new features and needed GCC v15 for proper performance, but with that in place the speed-up was a very impressive factor of 30 on a 32-core machine. The best thing is that only four lines of code needed changing:

  • std::for_each iteration over a std::ranges::views::iota(0ul, N)
  • Using std::execution::par policy
  • Capturing the variables, including the this pointer, in the lambda: [this, &lim, images](size_t j) {

Rough structure:

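#include <algorithm>  // std::for_each
#include <execution>  // std::execution::par
#include <ranges>     // std::ranges::views::iota

void A::fn(double *images[]){
    // One index per pixel; the view generates them lazily
    auto r = std::ranges::views::iota(0ul, imgsize);
    std::for_each(std::execution::par,
                  r.begin(), r.end(),
                  [this, images](size_t j) {
                      auto beam = this->Value(j);
                      beam.invert(); // This is the 4x4 matrix inversion that was expensive
                      correct(images, beam);
                  }
                  );
}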