Cpusim instruction memory

6/29/2023

On this blog, I’ve been rambling on about lock-free programming subjects such as acquire and release semantics and weakly-ordered CPUs. I’ve tried to make these subjects approachable and understandable, but at the end of the day, talk is cheap! Nothing drives the point home better than a concrete example.

If there’s one thing that characterizes a weakly-ordered CPU, it’s that one CPU core can see values change in shared memory in a different order than another core wrote them. That’s what I’d like to demonstrate in this post using pure C++11.

For normal applications, the x86/64 processor families from Intel and AMD do not have this characteristic. So we can forget about demonstrating this phenomenon on pretty much every modern desktop or notebook computer in the world. What we really need is a weakly-ordered multicore device. Fortunately, I happen to have one right here in my pocket: an iPhone. It runs on a dual-core ARM-based processor, and the ARM architecture is, in fact, weakly-ordered.

Our experiment will consist of a single integer, sharedValue, protected by a mutex. We’ll spawn two threads, and each thread will run until it has incremented sharedValue 10000000 times.

We won’t let our threads block waiting on the mutex. Instead, each thread will loop repeatedly doing busy work (i.e. just wasting CPU time) and attempting to lock the mutex at random moments. If the lock succeeds, the thread will increment sharedValue, then unlock. If the lock fails, it will just go back to doing busy work. In pseudocode terms, each thread starts with count = 0 and keeps looping while count is below 10000000.

The mutex itself is built on a shared variable, flag. Locking it uses a read-modify-write operation, compare_exchange_strong, and unlocking it is a store of 0. In the first version of IncrementSharedValue10000000Times(RandomDelay& randomDelay), the lock attempt is written with relaxed ordering:

if (flag.compare_exchange_strong(expected, 1, memory_order_relaxed))

At this point, it’s informative to look at the resulting ARM assembly code generated by the compiler, in Release, using the Disassembly view in Xcode. If you aren’t very familiar with assembly language, don’t worry. All we want to know is whether the compiler has reordered any operations on shared variables. This would include the two operations on flag, and the increment of sharedValue in between. As it turns out, we got lucky: the compiler chose not to reorder those operations, even though the memory_order_relaxed argument means that, in all fairness, it could have.

I’ve put together a sample application which repeats this experiment indefinitely, printing the final value of sharedValue at the end of each trial run. It’s available on GitHub if you’d like to view the source code or run it yourself.

I ran the experiment on the iPhone and watched the Output panel in Xcode. Check it out! The final value of sharedValue is consistently less than 20000000, even though both threads perform exactly 10000000 increments, and the order of assembly instructions exactly matches the order of operations on shared variables as specified in C++.

As you might have guessed, these results are entirely due to memory reordering on the CPU. To point out just one possible reordering (and there are several), the memory interaction of the str.w r0 instruction (the store to sharedValue) could be reordered with that of the str r5 instruction (the store of 0 to flag). In other words, the mutex could be effectively unlocked before we’re finished with it! As a result, the other thread would be free to wipe out the change made by this one, resulting in a mismatched sharedValue count at the end of the experiment, just as we’re seeing here.

Using Acquire and Release Semantics Correctly

Fixing our sample application, of course, means putting the correct C++11 memory ordering constraints back in place:

if (flag.compare_exchange_strong(expected, 1, memory_order_acquire))

As a result, the compiler now inserts a couple of dmb ish instructions, which act as memory barriers in the ARMv7 instruction set. I’m not an ARM expert (comments are welcome), but it’s safe to assume this instruction, much like lwsync on PowerPC, provides all the memory barrier types needed for acquire semantics on compare_exchange_strong, and release semantics on store.

This time, our little home-grown mutex really does protect sharedValue, ensuring all modifications are passed safely from one thread to the next each time the mutex is locked. If you still don’t grasp intuitively what’s going on in this experiment, I’d suggest a review of my source control analogy post.
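For readers who want to try the experiment described in this post, here is a minimal C++11 sketch of a single trial with the acquire and release constraints in place. It is not the code from the sample application: the names TrySpinMutex and runTrial are hypothetical, the busy work is a simplified stand-in for the random delay, and the increment count is a parameter so a trial can be kept short.

```cpp
#include <atomic>
#include <random>
#include <thread>

// Hypothetical helper (not from the original sample): a try-only mutex
// built on a single atomic flag, where 1 means held and 0 means free.
class TrySpinMutex {
public:
    // Succeeds only if flag changes from 0 to 1 in one atomic step.
    // Acquire ordering keeps the protected accesses after the lock.
    bool tryLock() {
        int expected = 0;
        return flag.compare_exchange_strong(expected, 1,
                                            std::memory_order_acquire);
    }

    // Release ordering keeps the protected accesses before the unlock.
    void unlock() {
        flag.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> flag{0};
};

// Hypothetical driver: runs one trial with two threads and returns the
// final value of sharedValue.
int runTrial(int incrementsPerThread) {
    TrySpinMutex mutex;
    int sharedValue = 0;  // plain int, protected only by the mutex

    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> spins(0, 15);
        volatile int sink = 0;  // keeps the busy loop from being optimized out
        int count = 0;
        while (count < incrementsPerThread) {
            // Busy work of a random length (assumed stand-in for RandomDelay).
            int n = spins(rng);
            for (int i = 0; i < n; ++i) {
                sink = i;
            }
            if (mutex.tryLock()) {
                // The lock succeeded.
                ++sharedValue;
                mutex.unlock();
                ++count;
            }
            // Otherwise the lock failed: go back to doing busy work.
        }
    };

    std::thread t1(worker, 1u);
    std::thread t2(worker, 2u);
    t1.join();
    t2.join();
    return sharedValue;
}
```

Calling runTrial(10000000) mirrors the experiment in the post, and with these orderings the result is always twice the argument. Swapping both orderings for std::memory_order_relaxed introduces a data race on sharedValue, which is the broken variant the post observes losing increments on a weakly-ordered CPU.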
