Atomix in C++11: A lock-free experiment

Hi all!
Well, it’s been a while, isn’t it?
Some months ago, i got the C++ Concurrency in Action book which is considered one of the best books in the subject of concurrency & multithreading for C++11, but i really didn’t have enough time to study it. As a programmer, well i just have my ears/eyes open for new technologies/concepts, thus i revisited the concurrency subject, and most specifically lock-free programming & data structures, the topic is kinda hot in these days, where high-performance & throughput really matter.

Fuuuck!!

That is an interesting & challenging topic in my opinion!
It is a way of designing algorithms & data structures in a multi-threaded, lock-less environment, where multiple threads manipulate a shared data structure, WITHOUT LOCKS, but by only using atomics!!

Hmmm, no? Don’t know what atomic data types are?? Well, they are
data types such as integers, booleans, floats, even objects (<template me!>) that read modify read operations are applied on them atomically? What?Atomically?Yes, that’s right! They appear as one sole instruction people! What does this mean? And where does this apply?In a modern computer hosting a multiprocessor. A multiprocessor that any specific moment, a number of threads can be spawned simultaneously, run & execute concurrently code on each of the processors of the system.

So, you’ll tell me, atomicity & threads, how are they related?
Well, a simple example is the following:

in a pc with one process/thread executes the following instruction

a = a + 1;// assuming a has value 2

the following things will happen:
a) the value of a (2) will be read
b) it will be added to 1
c) written back to a, its new value is 3
and that is fine & great!

In a multi-threaded system with 2 cores, things could go slightly wrong, assuming the same assignment & initial value of a, things could develop like this:
a) the value of a (2) will be read by processor A
b) the value of a (2) will be read by processor B
c) processor A will add 1 and 2
d) processor A will assign 3 to a
e) processor B will add 1 to 2
f) processor B will assign 3 to a.

What???Oh yes! After executing the assignment, yes, it is possible that a could have the incorrect value 3 instead of 4. Why? Well, in our example, processor A read a‘s value and processor B read a‘s value after some time, which in both cases, in their cache, they have stored the value of a equal to 2, thus the processors didn’t have a way to understand that this specific variable, the variable a could be shared among their fellowship!

What do we need? A way to say to the compiler that a is a shared integer across multiple processors..How? Well, atomics! Atomic semantics allows us to notify the compiler that an atomic variable should be accessed in all processors, & the instruction that modifies should be executed atomically, in a way that the consequences of that instruction will be viewed by all processors, none will have a stale value of a equal to 2. When someone will read modify write the atomic variable a, the others will know that a has now value 3, and then some other thread could attempt to apply the same instruction to augment a by 1 again.

Anyways, guys please allow me to stop now cause i just wanted to share my enthusiasm about the whole atomic lock-free programming thing with the following lock-free push-only stack to understand what i am saying. If you need more context about the lock-free thing, please check the link list in the end of the article…just wanted to share 1 or 2 snippets showing the problem, not explain the whole situation & concepts!
Thank you!

The Experiment
Three C++11 threads are accessing 3 CPU cores to fill an integer stack with 100 aces 100 times (30000+1 nodes), first by creating a new stack node, writing its value (v), connecting its next link to the head pointer & then assign the head to point at the new node..

        struct node{
    
            int v;
            node* next;
        
            node():next(nullptr),v(-1){}
            ~node(){}
        };

        node* head;
        void push(int v){
    
            auto n  = (new node());
            n->v = v;
            n->next = head;
            std::cout<<n->next<<"  "<<head<<std::endl;
            assert (n->next == head);
            head = n;
            assert(head == n);
    
        }

Everything is cool and fine, but holy shit! What happened at the first assert function call?Is it failing?Why?Maybe another thread accessed and modified the head pointer while the current one was holding the address of a pointer that just 1 millisecond ago was the head?The head pointer changed just after the assignment…thus, if initially two threads attempted to push an ace at the same time, then after the concurrent push, it is possible that the stack holds 3 elements, 2 nodes & the initial head of the stack, where both nodes have as right neighbor the initial head sentinel (and the one is the head, the other??)…We could check that by removing the assertions and call length() after the loop execution to see that lesser than 30001 nodes have been pushed to our stack, where are there rest??Dangling pointers maybe??
Catastrophic, but let’s look at the bright side!

Atomics are here for us! By treating the head pointer as an atomic variable, we test to see if the value of the head pointer is the one that our current thread assigned as a right neighbor to the new node, and then atomically assign the new node to its, which would be the new head pointer of the stack. I said “we test”, because as explained earlier, it is possible that at the same time another thread could have accessed the atomic head pointer atomically (load call) just some milliseconds earlier than our current thread, thus the other thread read wrote modified the head pointer just before the current one.

        void push(int v){

            node* n  = new node();
            n->v = v;
            n->next = head.load();
            
            while(!head.compare_exchange_weak(n->next,n )){ 
                      //if head == n->next, then head = n, return true finally
                std::cout<<n->next<<"  "<<head<<std::endl;
            
            };
            //Outside of the cmp_xchng atomic loop, head == n->next 
            //for a slight moment, afterwards head gets value == n
            //std::cout<<head<<"=="<<n<<std::endl;
            
            if(head != n){
                //Head changed again by another thread that called 
               //compare_exchange_weak after the successful cmp_xch 
               //call of the current thread!!
                //Shit!!!
                std::cout<<head<<"   WHAT   "<<n<<std::endl;//exit(1);
            }
            
        }

In the second snippet, it is visible that, compare_exchange_weak fails to evaluate to true some times, due to the fact that another thread modified the head pointer just before the current one. And of course, the comparison after the successful compare & xchange function call could also be true, the new node is not anymore the head pointer of the stack. It was just a biiiit earlier its head, just to allow compare_exchange_weak to evaluate to true, by atomically comparing the head as the right neighbor of the new node, and atomically assign the new node to the head pointer in one instruction. After the success of compare_exchange_weak, anything can happen after all…

That’s it,
i found it an interesting subject to spend some time, and see what’s going on.. Interesting concepts, interesting food for thought i guess..

Where to run:
a) Linux/Unix
b) g++ -v >= 4.8
c) multiprocessor
d) -std=c++11 -pthread

What to run:

https://github.com/gclkaze/tzertzelos-lock-free-stack-experiment.git

How to run:
a) make all
b) ./a.out  (the failing stack)
c)./atom (the concurrent-correct one)

For more info check:
C++ Concurrency in Action
An Introduction to Lock Free Programming
atomic Weapons: The C++ Memory Model and Modern Hardware by Herb Sutter
CppCon 2014: Herb Sutter “Lock-Free Programming (or, Juggling Razor Blades), Part I
CppCon 2014: Jeff Preshing “How Ubisoft Develops Games for Multicore – Before and After C++11”
Lock-Free Data Structures
The Atomic Bitchwax – ll (2000) (Full Album)
The Atomic Bitchwax – III [Full Album]

Remarks:
i) If you liked it or disagree or whatever, give some feedback, maybe i ‘ll start writing more often to the blog.
ii) Yeap, i didn’t mention anything about memory ordering & the memory model, but hey, i just wanted to share 1-2 things 🙂

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s