Recently I found myself writing a fixed-size version of
std::memcmp(,,) == 0;
I was surprised to find that the assembly generated for each variant is in fact different. This seems like a recognizable pattern that an optimizer could pick up on. There are several operator combinations that all accomplish the same goal, but each seems to compile to a one-to-one translation of the C++ code. I have not benchmarked anything yet, but it seems like there may be a general performance gain available here.
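To make the comparison concrete, here is a minimal sketch of the kind of variants I mean, assuming a 16-byte size. The function names (equal_memcmp, load64, equal16_xor_or, equal16_and) are hypothetical and only for illustration; which of these produces the best code will depend on the compiler and target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Baseline: memcmp with a size known at compile time.
template <std::size_t N>
bool equal_memcmp(const void* a, const void* b) {
    return std::memcmp(a, b, N) == 0;
}

// Hypothetical helper: load 8 bytes via memcpy, which avoids
// alignment and strict-aliasing problems.
inline std::uint64_t load64(const void* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Variant 1: XOR each 8-byte half, OR the results, test against zero.
inline bool equal16_xor_or(const void* a, const void* b) {
    const auto* pa = static_cast<const unsigned char*>(a);
    const auto* pb = static_cast<const unsigned char*>(b);
    return ((load64(pa) ^ load64(pb)) |
            (load64(pa + 8) ^ load64(pb + 8))) == 0;
}

// Variant 2: compare each half, combine with a non-short-circuit AND.
inline bool equal16_and(const void* a, const void* b) {
    const auto* pa = static_cast<const unsigned char*>(a);
    const auto* pb = static_cast<const unsigned char*>(b);
    return (load64(pa) == load64(pb)) &
           (load64(pa + 8) == load64(pb + 8));
}
```

All three return the same result for any pair of 16-byte buffers, so in principle a compiler could canonicalize them to the same instruction sequence.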
Also, I am not an SSE expert, so I’d be curious to learn what the right way to do this would be. I am of course assuming
std::memcmp is already written with care.
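For reference, a 16-byte equality check is commonly written with SSE2 intrinsics along the following lines. This is only a sketch, assuming an x86 target with SSE2 available; the name equal16_sse2 is hypothetical, and I am not claiming this is what a tuned memcmp does or that it is the optimal sequence.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical name. Compares two 16-byte buffers for equality.
inline bool equal16_sse2(const void* a, const void* b) {
    // Unaligned loads, so the buffers need no particular alignment.
    const __m128i va = _mm_loadu_si128(static_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(static_cast<const __m128i*>(b));
    // Each byte lane becomes 0xFF where equal and 0x00 where different.
    const __m128i eq = _mm_cmpeq_epi8(va, vb);
    // Gather the top bit of each lane; all 16 bits set means all bytes match.
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```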