Code Generation with the C Preprocessor

In this article, I will present a quite neat technique that allows you to utilize the C preprocessor's power in order to dynamically generate content for your data structures, based on the content of a generated text file.

Before showing & explaining the demo code, it would be nice to first share some background information regarding the technique & its objectives. The latest project that we were assigned at work is the implementation of a compiler, which compiles a text file containing source code of a specific language. I won't dive into detail about the language's syntax; all I can say is that it is an imperative language, like C.

In order to have a compiler, the first thing you need is the ability to read characters from an input stream & parse that input. The input text should conform to the rules of the grammar of that specific language. In order to write a grammar for their language, a compiler maker could use a combination of Lex/Bison/Yacc, or Antlr4. In our case, we selected Antlr4, a quite modern, fast & neat parser generator. Antlr4 accepts a grammar file as input & generates code that can parse input text conforming to the grammar specified in that file. The generated code's programming language can vary based on your preference; Antlr4 can emit lexer, parser, listener and visitor class files in C#, Python, Java & JavaScript. In addition, it generates a token file, containing all the different tokens accepted by our grammar's rules. A token is a word of one or more characters that is accepted by our parser. For instance, if our grammar contains a multiplication rule such as the following:

MULTIPLICATION: '*';

the token file will contain the following line:

MULTIPLICATION = 1

which means that the token '*' has been assigned the rule label 'MULTIPLICATION', with token id equal to 1, because our grammar consists of just one rule, the multiplication rule. If we later decide to extend our grammar with an additional rule, let's say a division rule, we will add the new rule

DIVISION: '/';

and rerun Antlr. The generated token file will now contain two tokens,

MULTIPLICATION = 1
DIVISION = 2

Each time we extend our language with new rules, new tokens are generated. We need a way to handle the dynamically generated tokens of our language; if someone removed a rule from the grammar by mistake, the syntactic analysis part would no longer be in sync with the semantic analysis part, and that would not be good at all.

Imagine now that you would like to encapsulate all the different tokens in an enum class named TokenType, an instance of which will be a private member of our Token class. You would like this enum class to be populated with all the different tokens each time Antlr4 parses the grammar file & generates a token file. The enum class will contain exactly the tokens contained in the token file. How could someone do that, without parsing anything, in just a few lines of C code? Well, by using the C preprocessor of course!

Demo

1. Preparation – Convert the token file

Let's reconsider our small grammar containing only the multiplication & division rules, add some additional rules, and conclude with the following rule set for our demo grammar file (it is not valid Antlr4 code, just a demo):

ADDITION : '+';
DEDUCTION : '-';
MULTIPLICATION : '*';
DIVISION : '/';
LEFT_PARENTHESIS : '(';
RIGHT_PARENTHESIS : ')';
INCREMENT : '++';
DECREMENT : '--';
POINTER_ARROW : '->';

From this grammar file, Antlr4 generated the following token file:

ADDITION = 1
DEDUCTION = 2
MULTIPLICATION = 3
DIVISION = 4
LEFT_PARENTHESIS = 5
RIGHT_PARENTHESIS = 6
INCREMENT = 7
DECREMENT = 8
POINTER_ARROW = 9

In order to make our technique work, we need to convert our token file into the following format:

TOKEN(ADDITION,1)
TOKEN(DEDUCTION,2)
TOKEN(MULTIPLICATION,3)
TOKEN(DIVISION,4)
TOKEN(LEFT_PARENTHESIS,5)
TOKEN(RIGHT_PARENTHESIS,6)
TOKEN(INCREMENT,7)
TOKEN(DECREMENT,8)
TOKEN(POINTER_ARROW,9)

You will see later why we need to perform this conversion. The converted token file will be named tokens.inc.
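The conversion itself is trivial and can be done with any text tool. Below is a minimal C++ sketch of one possible converter (the input/output file names are assumptions; a real Antlr4 token file may also contain literal lines such as '*' = 3, which you would filter out):

#include <fstream>
#include <string>

// Turns lines of the form "NAME = ID" into lines of the form "TOKEN(NAME,ID)".
int main()
{
    std::ifstream in("Demo.tokens");
    std::ofstream out("tokens.inc");
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos || eq == 0)
            continue; // skip lines without a "NAME = ID" shape
        // Trim the whitespace around the name & the id
        std::string name = line.substr(0, line.find_last_not_of(" \t", eq - 1) + 1);
        std::size_t idStart = line.find_first_not_of(" \t", eq + 1);
        if (idStart == std::string::npos)
            continue;
        out << "TOKEN(" << name << "," << line.substr(idStart) << ")\n";
    }
}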

2. Populate the token types

Our Token class contains the enum class TokenType, which enumerates all the tokens detected by our parser. This is possible by including the tokens.inc file, whose lines are expanded by the C preprocessor through a function-style #define. For each TOKEN(a,b) statement of the tokens.inc file, a new enumerator is added to our enum class TokenType; each label (a) of the enum is initialized with the value of its token (b). In the end, we discard the TOKEN definition. Of course, the label of the definition could be anything; I just used the word TOKEN for convenience.

#include <iostream>

class Token {
public:
    enum class TokenType : int {
        #ifndef TOKEN
            // Each TOKEN(name,value) line of tokens.inc expands to "name = value,"
            #define TOKEN(name,value) name = value,
            #include "tokens.inc"
            #undef TOKEN
        #endif
        LAST
    };
private:
    TokenType m_Type;
public:
    Token(TokenType type) : m_Type(type) {}
    int getNumberOfTokenTypes() { return static_cast<int>(TokenType::LAST) - 1; }
};

int main(int argc, char **argv)
{
    Token token(Token::TokenType::MULTIPLICATION);
    std::cout << token.getNumberOfTokenTypes() << std::endl;
}

Another nice thing that I wanted to show is the LAST token type that I have added after the "initialization list" of the token types. By adding that last type, we know the amount of different available token types. The cpp file instantiates a multiplication token and prints the amount of available types (Token::TokenType::LAST - 1). Note that if the token type MULTIPLICATION is for some reason not included in the tokens.inc file, the cpp program won't compile.
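For reference, with the nine-token file from above, the preprocessor expands the enum body into roughly the following (a sketch of the post-preprocessing output, which you can inspect yourself with g++ -E, not something written by hand):

enum class TokenType : int {
    ADDITION = 1,
    DEDUCTION = 2,
    MULTIPLICATION = 3,
    DIVISION = 4,
    LEFT_PARENTHESIS = 5,
    RIGHT_PARENTHESIS = 6,
    INCREMENT = 7,
    DECREMENT = 8,
    POINTER_ARROW = 9,
    LAST
};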

In our case, using the previous grammar, by compiling (g++ -std=c++11) & running the cpp executable, we get:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/C++/PreprocessorPower$ ./a.out
9

If I remove the last rule (POINTER_ARROW), regenerate the token file & rerun the cpp program, we get:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/C++/PreprocessorPower$ ./a.out
8

The amount of available types changes based on the content of the tokens.inc file, which is generated from the most recent Antlr4-powered grammar.

I find it quite powerful to be able to generate code content in such an easy way; with just a conditional include statement, a define preprocessor directive & the C preprocessor itself, it is such a neat & concise solution.
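A nice bonus, not shown in the original demo: since tokens.inc is just a list of TOKEN(a,b) entries, the same file can be expanded more than once with different definitions of TOKEN. As a sketch (assuming the Token class from above is visible), a name-lookup function can be generated from the very same file:

#include <string>

// Each TOKEN(name,value) expands into a switch case that returns
// the token's label as a string, via the # stringizing operator.
std::string tokenTypeToString(Token::TokenType type)
{
    switch (type) {
        #define TOKEN(name,value) case Token::TokenType::name: return #name;
        #include "tokens.inc"
        #undef TOKEN
        default: return "UNKNOWN";
    }
}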

Hope you enjoyed & learned a thing or two,

if you already knew, even better,

kazeone

Links

Antlr4 : ANother Tool for Language Recognition

Yacc : Yet Another Compiler-Compiler

Bison : The YACC-compatible Parser Generator

Lex : A Lexical Analyzer Generator

C preprocessor

enum class (C++11)

Syntactic Analysis (Compiler Construction)

Semantic Analysis (Compiler Construction)

Music

Cult Of Luna – Salvation


Color clustering – Part #1: Clustering by color relevance

Hi all,

in a previous post, I promised to write an article about color clustering & gif creation. After around half a year, I decided to refactor the code a bit & tell you more about it. However, I wouldn't like to tire you with one huge article about the whole process. Thus, I have divided it into three parts, which constitute the Color Clustering Epic.

The parts are the following:

  • Part 1: Color Clustering by color relevance
  • Part 2: Image generation by color aggregation
  • Part 3: Gif generation

Let’s dive into it!

Color Clustering by color relevance

But what exactly do I call Color Clustering? It is the way someone could group relevant colors by examining their actual pixel populations in a target image and their rgb relevance, where a pixel's color is specified by the values of red, green & blue (0 <= value <= 255). The rgb relevance of one color to another can be found by:

  • finding the Euclidean distance between the rgb components of the two colors
  • comparing the distance with a fixed offset number in order to determine whether the colors are relevant.

If the offset number is quite small, then the set of groups will be quite big. For instance, consider the following image, which is quite “monolithic” color-wise:

[Image: the target artwork, dominated by red & black with some white & blue]

This image consists mostly of red, black, white & some blue. Let's take two “almost” red pixels: one from the center of the image, and one from the upper right. We decide upon a small offset equal to 4. The first pixel (p) has values r=220, g=8, b=10 and the second one (q) has r=240, g=40, b=10. By applying the Euclidean distance

d(p, q) = sqrt( (p_r - q_r)^2 + (p_g - q_g)^2 + (p_b - q_b)^2 )

their distance is ~37, and 37 > 4 (the offset value). Even though both pixels are mainly red, by picking such a small offset we decide that these two red pixels are not relevant, when they should be. Thus, we should increase the value of our offset in order to capture relevant colors, but without overdoing it; if we pick a quite big offset (1000, for example), it is possible that all colors will be marked as relevant to just the first color encountered in the image. In my example, I have assigned the offset the value 100.

This was the basic idea, and what you need to grasp in order to understand what I am showing in the rest of this article.

Setup

The code has been developed on Ubuntu Linux, and the dependencies are the following:

  • Python 2.7
  • PIL Python module for reading & writing pixel values from/to image

Code

1. Read image file

from PIL import Image

def image_pixels(file):
    img = Image.open(file)

    pixels = img.load() # this is not a list, nor is it list()'able
    width, height = img.size

    all_pixels = []
    for x in range(width):
        for y in range(height):
            cpixel = pixels[x, y]
            all_pixels.append(cpixel)
    return all_pixels, img.size

We use PIL’s Image module in order to read the pixel value tuples & store them in a list.

2. Build color table

def color_counter(pixels):
    T = {}
    colors = 0
    total = 0
    for i in xrange(0, len(pixels)):
        if len(pixels[i]) == 3:
            r, g, b = pixels[i]
        else:
            # RGBA image; ignore the alpha component
            r, g, b, a = pixels[i]
        key = '%s,%s,%s' % (r, g, b)
        if key in T:
            T[key] += 1
        else:
            T[key] = 1
            colors += 1
        total += 1
    print 'Different:', colors, 'Amount of pixels:', total
    assert(len(pixels) == total)
    return T, colors, total

With the pixel tuple list in hand, we can build a color table: a dictionary that has as keys all the unique colors encountered in the image, and as values the pixel population of each associated color.

3. Color Clustering

Having obtained the color table, we know all the colors contained in the target image, and also the amount of pixels of each color. We are ready to compute the Euclidean distance between the colors of the color table and group all colors into color clusters.

import copy
from math import sqrt

# transfer_absorbed_colors, json_out & get_filename are small helpers
# defined elsewhere in the script.
def cluster_relevant_colors(table, offset, img_name, size):
    substitute = copy.deepcopy(table)
    keys = table.keys()
    deleted = 0

    # Holds key => [absorbed color1, absorbed color2, absorbed color3],
    # so we know which color was absorbed by which dominant one
    absorbed = {}
    for i in xrange(0, len(keys)):
        key = keys[i]
        if key not in substitute:
            continue
        r1, g1, b1 = [int(c) for c in key.split(',')]
        removed = 0
        on_absorb = False
        for j in xrange(i + 1, len(keys)):
            key2 = keys[j]
            if key not in substitute:
                break
            if key2 not in substitute:
                continue
            # Find the Euclidean distance between these two colors,
            # to check if they are relevant with respect to their rgb components
            r2, g2, b2 = [int(c) for c in key2.split(',')]
            dr, dg, db = abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)
            ediff = sqrt((dr * dr) + (dg * dg) + (db * db))
            if ediff <= offset:
                on_absorb = True
                # Transfer the pixel population & remove the weak color
                # from the table (it has been absorbed)
                if substitute[key] >= substitute[key2]:
                    absorbed = transfer_absorbed_colors(key2, key, absorbed)
                    substitute[key] += substitute[key2]
                    del substitute[key2]
                else:
                    absorbed = transfer_absorbed_colors(key, key2, absorbed)
                    substitute[key2] += substitute[key]
                    del substitute[key]
                    break
                removed += 1
                deleted += 1
        # Current color wasn't absorbed by any color
        if on_absorb == False:
            if key not in absorbed:
                absorbed[key] = []

    # Output the absorbed table & the remaining colors
    json_out(absorbed, name=get_filename(img_name, 'json'))
    json_out(substitute, name=get_filename(img_name, 'clusterjson'))


The first step is to create a copy of the original color table, on which we will operate (substitute). Then, for each color, we iterate over all the color keys & calculate the Euclidean distance between the current color with value key and the color encountered during the iteration with value key2. If the distance is smaller than or equal to the provided offset, one of the colors is absorbed by the other. In order to decide which one, we check the populations of the two colors in our table, and assume that the color with the larger pixel population is the dominant color. The dominant color absorbs the second color and its population. In the end, the substitute color table will contain as keys all the dominant colors, which are quite different from each other, and as values the total amount of pixels of each color; alongside it, the absorbed table will contain all the dominant colors along with the colors that were absorbed by them.

{
    "252,156,105": 34951,
    "50,68,88": 92838,
    "255,30,0": 187878,
    "171,44,131": 3198,
    "0,4,0": 339535,
    "100,126,175": 505,
    "104,9,0": 136452,
    "156,196,248": 20,
    "179,113,17": 34,
    "255,214,190": 4588,
    "255,86,177": 1
}

This json depicts the contents of the substitute dict, which contains all the dominant colors along with their populations. As we can see, the majority of the image's pixels are black (0,4,0: 339,535 pixels), bordeaux (104,9,0: 136,452 pixels) and red (255,30,0: 187,878 pixels), as expected. All variations of the dominant colors have been absorbed.

The contents of the absorbed & substitute tables will be used in the next phase/part of the epic. For now, I won't provide any more code until the last & final chapter of the Color Clustering epic.

However, I will provide some gifs in order to show you what you can generate by the end of the Color Clustering epic. Of course, I do not own any of the original artwork that was used to generate the gifs.

Hope you enjoyed,

kazeone


Appetizers


Baroness – Blue Record



Pelican – Australasia


Black Sabbath – Vol. 4


Om – Advaitic Songs

(We do not own any of the artwork that was used to generate the presented gifs.)

C/C++ function call from Python

Hey there,

in a previous post, I explained how someone could set things up in a C++ class in order to load a Python module & call a function through that class. Today, I am going to demonstrate the exact opposite: how someone could call a C function from a Python script. This is the objective of the current mission!

Setup

The code has been developed on Ubuntu Linux, and the dependencies are the following:

  • C / C++ 11
  • Python 2.7

In Windows, instead of building shared library (*.so) files, you will build DLLs.

Preparation

1. Pack all functions in a shared library (*.so)

In our hypothetical scenario, we would like a version of Libc's fgetws() where the size of the buffer into which the input data is read from the keyboard/standard input can vary. Our function should be contained in a shared library that the Python program can load dynamically, in order to then call our kgets function that resides inside the library.

1.1 The header file klib.h

extern "C" wchar_t * kgets(int);

1.2 The implementation file klib.cpp

#include <cstdio>
#include <cwchar>
#include "klib.h"

wchar_t * kgets(int size)
{
    // Read up to size - 1 wide characters from the standard input
    wchar_t *b = new wchar_t[size];
    return fgetws(b, size, stdin);
}

Notice that, in the declaration of kgets in klib.h, by adding extern "C" we give kgets C linkage, allowing it to be called from outside. Seeing that declaration, the compiler won't mangle the exported function's name, allowing a client program to link with it; the plain function name will be visible to the linker (and to ctypes, as we will see below).
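As a side note, when the same header must also be consumable by a plain C compiler, a common variant (an assumption here, not the author's exact header) wraps the declarations in an extern "C" block guarded by __cplusplus:

#include <wchar.h> /* wchar_t is a builtin type in C++, but C needs this header */

#ifdef __cplusplus
extern "C" {
#endif

wchar_t * kgets(int size);

#ifdef __cplusplus
}
#endif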

We compile them through a Makefile:

all:
        g++ -std=c++11 -Wall -fPIC -O2 -c klib.cpp klib.h
        g++ -std=c++11 -shared -o klib.so klib.o
clean:
        rm -f *.so
        rm -f *.o


And then we type:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ make all
g++ -std=c++11 -Wall -fPIC -O2 -c klib.cpp klib.h
g++ -std=c++11 -shared -o klib.so klib.o
gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ 


And our klib.so containing our almighty kgets is ready to be loaded!

2. Load the shared library through Python & call the function

Our Python script uses the ctypes Python module, which actually performs the whole thing. It provides library loading & data type conversion utilities, allowing us first to describe the function that resides inside the shared library, then load it & finally call it. By describing what exactly the function takes as arguments & what data type it returns, ctypes sets things up in order to call the function. Of course, the given description/prototype of the function on the Python side should match one-to-one the exported C function on the C side; otherwise the function call will fail & our Python script will get an exception before calling the exported C function.

Let’s see the client.py script:

# -*- coding: utf-8 -*-
import ctypes as c

# Load the shared library
klib = c.cdll.LoadLibrary("./klib.so")

# Load the function
kgets = klib.kgets

# Prepare the return value & argument types
kgets.restype = c.c_wchar_p
kgets.argtypes = [c.c_int]

# Get up to 20 wide string characters from the
# standard input. Skip the new line in the end ('\n')
res = kgets(20)[:-1]
print 'kgets says:', res


(*Here ctypes.cdll.LoadLibrary is given a shared library file; on Windows you would load your DLL instead.)

Execute it:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ python client.py 
Tzertzelos Blog!
kgets says: Tzertzelos Blog!

It works!

We were able to:

  • pack fast C functions into a shared library (.so)
  • write a Python script which, by utilizing the power of the ctypes module, was able to load our library & call an exported C function

Now go gain some high performance by moving your hot code into C shared libraries & using Python to wrap them up.

Cheers,

kazeone


Linx

Static, Shared Dynamic and Loadable Linux Libraries

Shared Libraries

ctypes — A foreign function library for Python


Musix

Oranssi Pazuzu – Värähtelijä