29th August 2023, 20:41     #43877
DrTiTus
HENCE WHY FOREVER ALONE
 
Tasty

Python (pypy3, zoom zoom) vs C++ with -O3 -mavx512bw (AVX512 support and auto-vectorization) [CPU: Intel 11400F]



The same calculation, structured the same way (classes, overloaded operators as function calls, naive coding), but letting the C++ compiler optimize and use SIMD instructions (zero effort on my part). Single thread for a fair fight.
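For anyone wondering what "naive coding with overloaded operators" might look like on the C++ side, here is a minimal sketch. It is not the code behind the numbers in this post: the `Seq` type, the `uint32_t` element type, and reading "64x64" as two length-64 sequences are all assumptions made purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size sequence type (not from the original benchmark).
// N = 64 would match one possible reading of "64x64".
template <std::size_t N>
struct Seq {
    std::array<std::uint32_t, N> v{};  // zero-initialised
};

// Overloaded operator acting as a plain function call:
// full linear convolution of two length-N sequences, written the naive
// O(N^2) way. No intrinsics; any SIMD use is left to the compiler.
template <std::size_t N>
Seq<2 * N - 1> operator*(const Seq<N>& a, const Seq<N>& b) {
    Seq<2 * N - 1> out;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            out.v[i + j] += a.v[i] * b.v[j];
        }
    }
    return out;
}
```

Called from a small `main` and built with something like `g++ -O3 -mavx512bw`, a loop nest of this shape is the kind of thing the auto-vectorizer gets a shot at without any changes to the source. Whether it actually vectorizes depends on the compiler version and the target CPU.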

With standard CPython (no pypy3), the 64x64 bit convolution was coming back at 769 per second, so I just cancelled it. What a joke. 131x slower. I was actually quite happy with pypy3 - getting a speedup of over 100x for free is pretty sweet. C++ is clearly a lot better even than that, but of course I had to rewrite my code.

I'll probably think C++ on CPU is shit once I get to CUDA or OpenCL doing everything hugely parallel, but that's for another day because I think I have to understand how to do the optimization myself.
__________________
Finger rolling rhythm, ride the horse one hand...