There's a common belief that an empty volatile asm in GCC is a scheduling barrier :
__asm__ volatile("")but that appears to not actually be the case (* or rather, it is actually the case, but not according to spec). What I believe it does do is it splits the "basic blocks" for compilation, but then after initial optimization there's another merge pass where basic blocks are combined and they can in fact then schedule against each other.
The GCC devs seem to be specifically defending their right to schedule across asm volatile by refusing to give gaurantees : gcc.gnu.org/bugzilla/show_bug.cgi?id=17884 or gcc-patches/2004-10/msg01048.html . In fact it did appear (in an unclear way) in the docs that they wouldn't schedule across asm volatile, but they claim that's a documentation error.
Now they also have the built-in "__sync" stuff . But I don't see any simple compiler scheduling barrier there, and in fact __sync has problems.
__sync is defined to match the Itanium memory model (!?) but then was ported to other platforms. They also do not document the semantics well. They say :
"In most cases, these builtins are considered a full barrier"
What do you mean by "full barrier" ? I think they mean LL,LS,SL,SS , but do they also mean Seq_Cst ? ( there also seem to be some bugs in some of the __sync implementations bugzilla/show_bug.cgi?id=36793 )
For example on PS3 __sync_val_compare_and_swap appears to be :
sync lwarx .. loop .. stwcx isyncwhich means it is a full Seq_Cst operation like a lock xchg on x86. That would be cool if I actually knew it always was that on all platforms, but failing to clearly document the gauranteed memory semantics of the __sync operations makes them dangerous. (BTW as an aside, it looks like the GCC __sync intrinsics are generating isync for Acquire unlike Xenon which uses lwsync in that spot).
(note of course PS3 also has the atomic.h platform-specific implementation; the atomic.h has no barriers at all, which pursuant to previous blog Mystery - Does the Cell PPU need Memory Control might actually be the correct thing).
I also stumbled on this thread where Linus rants about GCC .
I think the example someone gives in there about signed int wrapping is pretty terrible - doing a loop like that is fucking bananas and you deserve to suffer for it. However, signed int wrapping does come up in other nasty unexpected places. You actually might be wrapping signed ints in your code right now and not know about it. Say I have some code like :
char x; x -= y; x += z;What if x = -100 , y = 90 , and z = 130 ? You should have -100 - 90 + 130 = -60. But your intermediate was -190 which is too small for char. If your compiler just uses an 8 bit machine register it will wrap and it will all do the right thing, but under gcc that is not gauranteed.