A Generic C++ 11 Column-based Data Table – A full template member method specialization & inheritance use case


It’s been a while since I wrote a programming-related post, right?


Well, today I am going to show you how to use template specializations in order to support different implementations of a member function (by data type) & how to hide the underlying specialization data type from the owner of the templated resources by using (unique) pointers to the resources’ base class.

Hmm, all this sounds a bit vague, right? Well, consider the following use case!

1. Problem
Make a Data Table class that accepts & stores an arbitrary number of vectors of integers/doubles/wide character strings and is able to access each vector independently at any given moment. Each vector could be viewed as a Data Column, where each data cell in the column has the same data type as the rest of the cells of that column. The Data Table class has the additional property that the last column is somewhat special: it accepts an old school C style struct (the Object struct) that stores either a string, an integer or a double into that column of our table.

#include <string>
typedef struct Object{
	enum ObjectType{
	    INT,
	    DOUBLE,
	    STRING,
	    ALL
	};
	ObjectType t;
    std::wstring str;
    int i;
    double d;
} Object;


A representation of a Data Table instance could be like the following:

+-----+-------------------------------------------------------+---------+------------+---------------+-----------------+---------------+
| id  | name                                                  | city_id | country_id | creation_date | longitude       | latitude      |
+-----+-------------------------------------------------------+---------+------------+---------------+-----------------+---------------+
|   1 | download festival                                     |     345 |        826 |    1486116627 |  -1.37659071746 | 52.8253416376 |
|   2 | roadburn festival                                     |     249 |        528 |    1486116627 |         5.09283 |      51.55774 |
|  89 | crossing border festival                              |    1222 |        528 |    1486125508 |       4.3161918 |    52.0806249 |
| 174 | sweden rock festival                                  |    1281 |        752 |    1486125656 |              -1 |            -1 |
| 203 | leeds festival                                        |     283 |        826 |    1486125855 |  -1.38795856125 | 53.8692024936 |
| 204 | reading festival                                      |     187 |        840 |    1486125855 | -0.991504837082 | 51.4644779002 |
| 212 | i'll be your mirror - all tomorrow's parties festival |     181 |        826 |    1486585624 | -0.131165437732 | 51.5936679342 |
| 236 | bloodstock open air festival, catton hall             |    1286 |        826 |    1486585635 |  -1.69612337663 | 52.7419809397 |
| 284 | bråvalla festivalen - site office                     |    1297 |        752 |    1486672011 |        16.12269 |      58.60874 |
+-----+-------------------------------------------------------+---------+------------+---------------+-----------------+---------------+


Hmmm, the Data Table & its Data Columns look like a MySQL table, don’t they? 🙂 The first column (id), along with city_id, country_id & creation_date, is of integer type, the name Data Column is of type wide string and longitude & latitude have double as their data type.

2. Implementation
We will start with the Table class. As mentioned in the Problem definition section, our Table should store any number of Columns, each of one specific data type, plus one additional special Column, the Value Column. These requirements are implemented by keeping a vector of unique pointers to Columns (the Table::m_Columns member variable) & a unique pointer to the Value Column (the Table::m_Value member variable) respectively. Why am I using unique pointers? Well, the Table class is the owner of the Columns & I want all Column resources to be freed when the Table’s lifetime ends (that is, when Table::~Table() is called).

//Table.h
#include "DataColumnFactory.h"
#include <assert.h>

class Table{
    using ColumnRsrc = std::unique_ptr<Column>;

    std::vector<ColumnRsrc> m_Columns;
    ColumnRsrc m_Value;
    std::wstring m_Name;

    public:
    Table(const std::wstring& name):m_Name(name){}
    ~Table(){}

    template <typename T>
    void add(std::vector<T> t) {
        auto column = DataColumnFactory::getColumn<T>();
        column.get()->add(t);
        m_Columns.push_back(std::move(column) );
    }

    void addValue(Object* o){
        if(!o) return;
        if(!m_Value.get() ){
            m_Value = DataColumnFactory::getColumnByType(o);
        }            
        assert(m_Value.get());
        m_Value.get()->add(o);
    }

    void dump(){
        dumpTableGeneralInfo();

        int i = 0;
    	for(auto &column : m_Columns){
            std::wcout << "Column #" << ++i << std::endl;
    	    column.get()->dump();
    	}

        if(m_Value.get()){
            std::wcout << "Column #" << ++i << std::endl;
            m_Value.get()->dump();
        }
    }

private:
    void dumpTableGeneralInfo(){
        //...implementation skipped
    }
};
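
The ownership argument can be seen in isolation with a minimal sketch. The names below (Resource, Owner, liveAfterOwnerDies) are hypothetical and not part of the Table code; Resource only counts its live instances so that the effect of the unique pointers becomes observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a column: counts how many instances are alive.
struct Resource {
    static int alive;
    Resource() { ++alive; }
    virtual ~Resource() { --alive; }
};
int Resource::alive = 0;

// Hypothetical owner, playing the role of Table.
class Owner {
    std::vector<std::unique_ptr<Resource>> m_Items;
public:
    void add() { m_Items.push_back(std::unique_ptr<Resource>(new Resource)); }
    std::size_t count() const { return m_Items.size(); }
};

// Returns the number of live resources after the owner went out of scope.
int liveAfterOwnerDies() {
    {
        Owner owner;
        owner.add();
        owner.add();
        assert(Resource::alive == 2);
    }   // ~Owner() runs here; every unique_ptr deletes its Resource
    return Resource::alive;
}
```

No destructor had to be written for Owner: destroying the vector destroys each unique pointer, which in turn deletes the object it owns.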


The Table class provides two methods for inserting data: the add() method allows the user to insert Column data, which will be stored in the m_Columns member variable, & the addValue() method accepts as argument an Object*, allowing us to insert data into the Table’s special Value Column. Both methods use the DataColumnFactory class in order to allocate a new DataColumn instance (addValue() does so only the first time it is called). add() gets a template argument T that is passed to the DataColumnFactory, indicating the type of data that the newly allocated DataColumn will store. addValue() accepts as argument an Object* that is then passed to the DataColumnFactory. The DataColumnFactory knows the Object instance’s internal representation, thus it allocates & returns to our Table instance a DataColumn whose type depends on the Object instance’s storage type (INT, STRING or DOUBLE).

//DataColumnFactory.h
#include "Column.h"

class DataColumnFactory{
public:
    template<typename T>
    static std::unique_ptr<DataColumn<T>> getColumn(){
         std::unique_ptr<DataColumn<T>> column(new DataColumn<T>);
         return std::move(column);
    }

    static std::unique_ptr<Column> getColumnByType(Object* o){
        if(!o) return nullptr;
        switch(o->t){
            case Object::ObjectType::INT:{
                return std::move ( DataColumnFactory::getColumn<int>() );
            }
            case Object::ObjectType::DOUBLE:{
                return std::move ( DataColumnFactory::getColumn<double>() );
            }
            case Object::ObjectType::STRING:{
                return std::move ( DataColumnFactory::getColumn<std::wstring>() );
            }
            default: break;
        }
        return nullptr;
    }
};


The DataColumnFactory returns a (moved) DataColumn unique pointer to the Table; as we said, the Table instance is the owner of the columns. The factory contains two static methods that are responsible for allocating the DataColumns. The first, templated method DataColumnFactory::getColumn<T>() accepts as template argument T the type that was passed through Table::add(std::vector<T>), representing the new Data Column’s data type. getColumn() allocates a new DataColumn<T> & returns a unique pointer to it to the Table instance. The second static method, DataColumnFactory::getColumnByType(Object*), allocates & returns a unique pointer to a new DataColumn, depending on the type value (t) of the Object* argument. This static method returns a unique pointer to a Column instance, the base class of all DataColumns. In both cases, after calling a factory method, the Table instance stores the newly created DataColumn/Column objects in m_Columns & m_Value respectively.
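
The two factory flavours can be sketched on a smaller, hypothetical hierarchy (Shape, Circle, Square, Kind, make and makeByKind are illustration names, not part of the article’s code). The sketch also shows that a unique pointer to a derived class converts to a unique pointer to its base when it is returned, so no explicit std::move is required there.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical hierarchy standing in for Column / DataColumn<T>.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

enum class Kind { CIRCLE, SQUARE, UNKNOWN };

// Typed factory: the caller names the concrete type at compile time.
template <typename T>
std::unique_ptr<T> make() {
    return std::unique_ptr<T>(new T);
}

// Tag-driven factory: the concrete type is picked at run time and
// hidden behind a pointer to the base class.
std::unique_ptr<Shape> makeByKind(Kind k) {
    switch (k) {
        case Kind::CIRCLE: return make<Circle>();  // converts to unique_ptr<Shape>
        case Kind::SQUARE: return make<Square>();
        default: break;
    }
    return nullptr;
}

// Small self-check exercising both factories.
void factoryDemo() {
    assert(make<Circle>()->name() == "circle");
    assert(makeByKind(Kind::SQUARE)->name() == "square");
    assert(makeByKind(Kind::UNKNOWN) == nullptr);
}
```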

3. Polymorphism
Keep in mind that m_Columns & m_Value actually hold pointers to Column (the base class), not to DataColumn (the class derived from Column), allowing us to use the columns polymorphically, hiding away from the Table instance their underlying data types. Is it a std::wstring, a double, an integer Column, who knows? Let’s call a member function on them & see how they behave!

//Column.h
#include <iostream>
#include <memory>
#include <vector>
#include "Object.h"

class Column{
public:
    virtual ~Column(){}
    virtual void add(Object *) = 0;
    virtual void dump() = 0;
    virtual int size() = 0;
};

template <class T>
class DataColumn : public Column{
    std::vector<T> values;
    public:
    	DataColumn():values({}){}
    	virtual ~DataColumn(){}
    	virtual void add(Object*) override;

    	void add(std::vector<T> v){
	    for(int i = 0;i<v.size();i++){
                values.push_back(v[i]);
	    }
    	}

    	virtual void dump() override{
            for(auto &k : values){
    	        std::wcout << "* " << k << std::endl;
    	    }
            std::wcout << std::endl;
    	}

        virtual int size() override {return values.size();}
};

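//Full specializations of add(Object*), one per supported data type.
//They are marked inline so that Column.h can be included from more than one source file.
template<>
inline void DataColumn<int>::add(Object* o){
    values.push_back(o->i);
}

template<>
inline void DataColumn<std::wstring>::add(Object* o){
    values.push_back(o->str);
}

template<>
inline void DataColumn<double>::add(Object* o){
    values.push_back(o->d);
}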


The Column interface provides 3 methods: add(Object*), dump() & size(). All derived classes need to implement how to add an Object* to a column, how to print the contents of a Column instance and how to return the number of items (of the same data type) that are contained in the Column. The Column class is an interface & it does not store any state for the Column itself; it just represents what a generic Column should look like & which minimum operations a derived class has to provide in order to become a Column.
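
As a small illustration of using such an interface polymorphically, here is a sketch with hypothetical, trimmed-down classes (Series and TypedSeries<T> stand in for Column and DataColumn<T>; totalCells is an illustration helper). The function that sums the sizes never learns the element type of any column.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down version of the Column interface.
struct Series {
    virtual ~Series() {}
    virtual std::size_t size() const = 0;
};

// Hypothetical, trimmed-down version of DataColumn<T>.
template <typename T>
class TypedSeries : public Series {
    std::vector<T> m_Values;
public:
    explicit TypedSeries(std::vector<T> v) : m_Values(std::move(v)) {}
    std::size_t size() const override { return m_Values.size(); }
};

// Sums the sizes through the base class only.
std::size_t totalCells(const std::vector<std::unique_ptr<Series>>& all) {
    std::size_t total = 0;
    for (const auto& s : all) total += s->size();
    return total;
}

void seriesDemo() {
    std::vector<std::unique_ptr<Series>> all;
    all.push_back(std::unique_ptr<Series>(
        new TypedSeries<int>(std::vector<int>{1, 2, 3})));
    all.push_back(std::unique_ptr<Series>(
        new TypedSeries<std::wstring>(std::vector<std::wstring>{L"a", L"b"})));
    assert(totalCells(all) == 5);  // 3 integers + 2 wide strings
}
```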

4. Template method class specialization
The DataColumn class that extends the Column interface is a template class that takes as template argument T the data type of the elements that it will store. The class contains a vector as private member; that is where the Column elements are stored. Besides the three virtual methods that were inherited from the Column interface, the DataColumn class also provides the method add(std::vector<T>), allowing us to store the Column elements in our vector. This method is used by the Table class for storing simple Columns.

In order to store data retrieved from an Object*, three methods with template specialization are offered. Each method knows which member of the Object* to access in order to get its value from the struct & store it in its vector, depending on the Data Column’s template specialization. Thus, DataColumn<double>::add(Object*) knows that the d struct member contains the double value and that this member needs to be accessed in order to retrieve & store the value in its std::vector values. In the same way, DataColumn<std::wstring>::add(Object*) knows that the str member needs to be accessed in order to retrieve & store the wstring value. Finally, the same also applies to DataColumn<int>::add(Object*), which accesses the member i.

By providing full template specializations for each supported data type, we let the compiler select, for each instantiation of DataColumn, the matching add(Object*) body at compile time. Instead of writing different full implementations of the derived classes DataColumn<std::wstring>, DataColumn<double> & DataColumn<int>, we just provide a different method implementation per specialization type, keeping things as generic & compact as possible.
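
Stripped of the inheritance part, full specialization of a single member function looks like the sketch below. Record and Picker are hypothetical names chosen for illustration; the member is declared once in the class template and gets one body per supported type.

```cpp
#include <cassert>
#include <string>

// Hypothetical record with one field per supported type, similar in spirit to Object.
struct Record {
    int i;
    double d;
    std::string s;
};

template <typename T>
class Picker {
public:
    // Declared for every T, defined only through the specializations below.
    T pick(const Record& r) const;
};

// Full specializations of the member function: one body per supported type.
// `inline` keeps the definitions safe to place in a header.
template <>
inline int Picker<int>::pick(const Record& r) const { return r.i; }

template <>
inline double Picker<double>::pick(const Record& r) const { return r.d; }

template <>
inline std::string Picker<std::string>::pick(const Record& r) const { return r.s; }

void pickerDemo() {
    Record r{7, 2.5, "seven"};
    assert(Picker<int>().pick(r) == 7);
    assert(Picker<double>().pick(r) == 2.5);
    assert(Picker<std::string>().pick(r) == "seven");
}
```

Calling pick() on a Picker instantiated with a type that has no specialization (say, float) compiles but fails at link time, because no body exists for it.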

5. Client Code
After having explained the whole story, let’s see how we could use our Table class in order to store some data.

//templated_inheritance.cpp
#include "Table.h"

int main(){
    //Table with wstring Value Column
    Table stringColumn(L"WStringValueColumn");
    stringColumn.add<int>({1,2,3,4});
    stringColumn.add<double>({1.1,2.2,3.1,4.1});
    stringColumn.add<std::wstring>({L"Tzertzelos Blog", L"serving",L"the underground",L"since 2013!!"});

    Object o;
    o.t = Object::ObjectType::STRING;
    o.str = L"Cult of Luna";
    stringColumn.addValue(&o);
    stringColumn.dump();

    //Table with double Value Column
    Table numericColumn(L"DoubleValueColumn");
    numericColumn.add<int>({5,6,7,8});
    numericColumn.add<double>({1.12,2.22,3.12,4.12});
    numericColumn.add<std::wstring>({L"Tzertzelos", L"B",L"LO",L"G"});

    Object o2;
    o2.t = Object::ObjectType::DOUBLE;
    o2.d = 1.3334;
    numericColumn.addValue(&o2);

    o2.d = 1.3335;
    numericColumn.addValue(&o2);
    numericColumn.dump();


    Table nullTable(L"Null");
    nullTable.dump();
    return 0;
}


This client code instantiates 3 tables: the first table’s special Value Column has wstring data type, the second table’s Value Column has double data type & the last table is empty.

6. Compile & Run
Download the code from here on your Linux box, then just type “make all” in order to compile it. This builds the out executable. Run it by typing “./out”.


The output will be:

gclkaze@gclkaze-VirtualBox:~/Desktop/Projects/C++/TemplatedInheritance$ ./out
Table WStringValueColumn => 4 X 4
Cells =>13
Column #1
* 1
* 2
* 3
* 4

Column #2
* 1.1
* 2.2
* 3.1
* 4.1

Column #3
* Tzertzelos Blog
* serving
* the underground
* since 2013!!

Column #4
* Cult of Luna

Table DoubleValueColumn => 4 X 4
Cells =>14
Column #1
* 5
* 6
* 7
* 8

Column #2
* 1.12
* 2.22
* 3.12
* 4.12

Column #3
* Tzertzelos
* B
* LO
* G

Column #4
* 1.3334
* 1.3335

Table Null => 0 X Cells =>0

7. Things learned
With the previous exercise, we got an idea about how to:

  • use class templates
  • use member method template specialization
  • use unique pointers
  • use the Factory Design Pattern
  • combine polymorphism with templates.

Enjoy & drop a comment/question/suggestion for improvement if any!
Cheers,
Kazeone

8. References

I. Generic Data Table source in GitHub

II. Template specialization

III. Factory Design Pattern

IV. std::unique_ptr

V. RAII

VI. C++ STL


Project Mono: C++ to C# pointer passing in Linux

Hey there,

today’s assignment is to pass C++ object pointers to a C# dll and back, while running in Linux. What? Linux? Yes! You read it right! By using the Mono runtime, you are able to use C# libraries from a C++ program, passing objects & callbacks, setting & retrieving data and so on. The only things you need are an embedded Mono (its logo is shown above and it depicts a gorilla, yeap), C++ and your C# dll, and that is all. Mono will allow you to control the way that your C++ program interacts with the dll, by calling different exposed dll functions from your C++ program.

Why exchange data between C++ and C#?

Software reusability could be one reason. Imagine that you have a C# application that implements a basic GUI and an engine, which is responsible for all calculations that are performed by the program, based on the user input. At some point, you figure that your application’s performance could be better and that some of your calculations could be optimized. After optimizing your C# code, you decide that your program could be more performant if it were implemented in a lower level language, such as C or C++. However, you wouldn’t like to implement both the GUI and the engine subsystems from scratch. Indeed, the GUI could be reused while the engine could be reimplemented in C++. You could wrap all your GUI operations in a DLL, implement the engine in C++, glue Mono between them in order to exchange information and you are ready to go!

Platform availability could be another reason. A specific library may be available only on some platforms/languages (C#, Windows) that are not your project’s native ones (C++, Linux). Instead of reinventing the whole functionality in C++ & Linux, why not just fetch the C# dll, wrap it up in another C# dll (that you will write) and then expose all the API/exported functions to your C++ programs/environment? This is a way better solution, allowing you to:

  • reuse code,
  • treat the C# dll functionalities as a black box,
  • limit the amount of new code to just the code that sets up Mono & performs calls through Mono.

Assignment Description

Create a C++ program in Linux that passes a C++ object pointer (Container*) to a C# method, which fills up the object with data that were produced in the C# side. So basically, you will need something like this:

[Figure: diagram of the call flow between the C++ program, the shared library, the Mono runtime and the C# DLL]

One nice question that could come to mind is: why is there a shared library (*.so) and not just one executable that calls the Mono runtime and the DLL? Why do we need the shared library in the middle? Because we decided that only the shared library implements the actual store data function, which is also exposed to the DLL. After calculating the data, the DLL calls the shared library’s exported store function, passing along the pointer that it received during the initial Mono call from the shared library to the DLL.

The C# side does not know the actual data structure implementation of the container that was passed to it; actually, it doesn’t even know that an object pointer was passed to it in the first place. Why is that? Because we decided to pass the pointer as an unsigned integer of 64 bits (uint64_t)! The C# side just needs to call the exposed store data function and pass the pointer-integer and the data into the C++ function. The exposed C++ function will convert the integer back to a pointer (Container*), add the data to the container and return. When the initial call from the C++ shared library to the C# dll returns, the container object on the C++ side is filled with the data that the C# side just generated. Sounds interesting & challenging enough? Let’s see how it can be done!
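
Before involving Mono at all, the pointer-to-integer round trip can be tried out on its own. The sketch below uses hypothetical names (Bag, toHandle, storeInBag) and assumes, like the article, that a pointer fits into 64 bits; the static_assert states that assumption explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container, similar in spirit to the article's Container class.
struct Bag {
    std::vector<int> values;
};

// The round trip relies on a pointer fitting into 64 bits.
static_assert(sizeof(void*) <= sizeof(std::uint64_t),
              "pointer must fit in uint64_t");

// C++ -> integer: this is the value that is handed to the foreign side.
std::uint64_t toHandle(Bag* bag) {
    return reinterpret_cast<std::uint64_t>(bag);
}

// Callback in the style of storeResults: integer -> pointer, then use it.
extern "C" void storeInBag(std::uint64_t handle, int value) {
    Bag* bag = reinterpret_cast<Bag*>(handle);
    bag->values.push_back(value);
}

void handleDemo() {
    Bag bag;
    std::uint64_t handle = toHandle(&bag);  // all the other side ever sees
    storeInBag(handle, 4);
    storeInBag(handle, 2);
    assert(bag.values.size() == 2);
    assert(bag.values[0] == 4 && bag.values[1] == 2);
}
```

The integer is only meaningful inside the process that created it and only while the object is alive; the foreign side must treat it as an opaque token.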

Requirements

Windows

Linux

  • C++ 11
  • Mono runtime: apt-get install mono-devel
  • Makefile

 

Demo Time

1. The DLL (C# .Net side)

Our DLL contains one class, the class Program that contains three functions:

  • superCalculation, the actual functionality that we would like to reuse from C++ through Mono to C#
  • storeResults, the function that is exposed through our C++ shared library and knows how to restore the actual C++ object pointer and store the data that we received from a call to the superCalculation method
  • process, the entry point method that is called from the C++ side and triggers the calculation and the data storing scheme

using System;
using System.Collections.Generic;
using RGiesecke.DllExport;
using System.Runtime.InteropServices;

namespace TestDLL
{
    class Program
    {
        public static List<int> superCalculation()
        {
            //Actual name of the function should be: "dateNowToIntegerList"

            List<int> result = new List<int>();
            string dateString = DateTime.Now.ToString();
            for(int i = 0;i<dateString.Length;i++)
            {
                if (Char.IsNumber(dateString[i]) == false) continue;
                result.Add(Int32.Parse( dateString[i].ToString() ));
            }

            return result;
        }

        [DllImport("./Kazelib.so", EntryPoint = "storeResults")]
        public extern static void storeResults(System.UInt64 container, int value);
        
        [DllExport("process", CallingConvention = CallingConvention.Cdecl)]
        public static void process(System.UInt64 container)
        {
            List<int> results = superCalculation();
            for (int i = 0; i < results.Count; i++)
            {
                storeResults(container, results[i]);
            }
        }
    }
}

 

By using the DllImport attribute, you declare that the implementation of the storeResults method will be provided from somewhere outside the DLL. By setting the name of the shared library and the EntryPoint string, you map the method to a symbol of an external library, so that its implementation can be located at the right place: inside the shared library “Kazelib.so”, at the entry point “storeResults”.

On the other hand, by using the DllExport attribute before the declaration of the method process, you declare that this method should be publicly exposed to the outside world, allowing other programs that load the TestDLL to call the process method. The DllExport attribute is available through the RGiesecke module that we fetched earlier through NuGet.

After creating a new project in Visual Studio that contains the code above, build it. You will get a nicely packed dll. Now copy the dll to your Linux box, in a new directory, where our Mono project will live.

2. The shared library (C++ Linux)

Our shared library consists of one header file (Kazelib.h) and one implementation file (Kazelib.cpp).

2.1 The header (Kazelib.h)

The header contains the implementation of the Container class, which is the object that will store the results that we retrieved by calling Mono and C#.

#include <vector>
#include <cstdint>
#include <iostream>

class Container {
    private:
        std::vector<int> m_Values;
    public:
        Container(){}
        
        void storeValue(int value)
        {
        	m_Values.push_back(value); 
        }

        void dump()
        {
        	for(auto &i : m_Values)
        	{
        		std::cout << i << std::endl;
        	}
        }
};

//Entry point for C++
extern "C" void process(void);

//Used by C# to store the results
extern "C" void storeResults(uint64_t container,int value);

The header exposes two functions, the process and the storeResults public functions. The first will be exposed to our client program to call; the user program after loading Kazelib.so, will call the function process in order to trigger the whole function call sequence. The storeResults function is exposed to the C# universe, allowing us to call it from the C# side. As parameters, storeResults will accept the container object pointer-integer and an integer value that will be stored in the instance of the container object.

2.2 The implementation (Kazelib.cpp)

The logic written in the implementation file knows what the Container object looks like (it includes Kazelib.h) and knows how to set up Mono and invoke the exposed C# method in order to fill the Container object. Kazelib’s entry point is the function process, which is publicly exposed for anyone that would like to use Kazelib.so. A call to this function will call the private function processThroughMono, which performs the following steps:

  • initialize Mono runtime by loading our dll by name
  • load the appropriate assembly from the dll
  • search the image of the assembly for the MonoMethod object process, located in namespace <TestDLL>, class <Program>, method name <process> (the “TestDLL.Program:process” string)
  • instantiate a Container object
  • reinterpret cast its pointer to a 64-bit unsigned integer
  • place the address of that integer in the void* argument array that will be passed to the C# side
  • call process through Mono
  • close Mono runtime

Tip: during the lifetime of a program, the function mono_jit_init should be called only once. This means that you cannot turn Mono on, turn it off and then turn it on again: a second call in the same process will fail.

#include "Kazelib.h"
#include <mono/jit/jit.h>
#include <mono/metadata/assembly.h>
#include <assert.h>
#include <string>
#include <mono/metadata/debug-helpers.h>

static void processThroughMono(std::string& dll)
{    
    //Initialize mono runtime
    MonoDomain* domain = mono_jit_init (dll.c_str());
    MonoAssembly *assembly;

    assembly = mono_domain_assembly_open (domain, dll.c_str());
    assert(assembly);

    MonoImage *image =  mono_assembly_get_image  (assembly);
    assert(image);
    MonoMethodDesc* desc = mono_method_desc_new ("TestDLL.Program:process", true);
    assert(desc);

    MonoMethod* method = mono_method_desc_search_in_image (desc, image);
    assert(method);

    //Create our container
    Container* c = new Container();

    //Cast to uint64
    uint64_t ptr = reinterpret_cast<uint64_t>(c);

    //Fill it as an argument before the mono method invocation
    void* args[1];
    args[0] = &ptr;

    //Invoke C# code
    mono_runtime_invoke (method, nullptr, args, nullptr);

    //Clean mono runtime
    mono_jit_cleanup (domain);

    //We did it!
    c->dump();

    //Free the container, it is not needed anymore
    delete c;
}

void process()
{
    std::cout << "process()!!" << std::endl;
    std::string dll("test.dll");
    processThroughMono(dll);
}

void storeResults(uint64_t container,int value)
{
    Container* c = reinterpret_cast<Container*>(container);
    c->storeValue(value);
}

The implementation of the function that is exposed to the C# universe (storeResults) casts the integer back to an object pointer and then stores the integer value that was passed along in that object. The DLL will call it as many times as it needs in order to store all integer results of the superCalculation C# call.
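
Since mono_jit_init may run only once per process, one way to protect a library entry point that can be called several times is to route the initialisation through a function-local static. The sketch below does not call Mono at all: startRuntime is a hypothetical stand-in that merely counts its invocations, and ensureRuntimeStarted is an illustration name.

```cpp
#include <cassert>

namespace {
int g_startCalls = 0;   // how many times the stand-in was really executed

// Hypothetical stand-in for a once-per-process initialisation
// (for example, starting an embedded runtime).
bool startRuntime() {
    ++g_startCalls;
    return true;
}
}

// Every caller goes through this function. The initialiser of a
// function-local static runs on the first call only (and is
// thread-safe since C++11).
bool ensureRuntimeStarted() {
    static const bool started = startRuntime();
    return started;
}

void startOnceDemo() {
    ensureRuntimeStarted();
    ensureRuntimeStarted();
    ensureRuntimeStarted();
    assert(g_startCalls == 1);
}
```

This only guards the start-up; as the tip says, once the runtime has been cleaned up it cannot be started again, so the clean-up would have to be postponed until the process is about to exit.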

3. C++ Client Code (main.cpp)

The client code loads the shared library (dlopen), looks up the process function by name (dlsym) and then triggers the whole function call sequence by calling it. Then it unloads (dlclose) the shared library and returns.

#include <cstdio>
#include <dlfcn.h>
#include <string>

bool process(const std::string &lib, const std::string &function)
{
    //Load the shared library
    void *handle = dlopen (lib.c_str(), RTLD_LAZY);
    if (!handle) {
        fputs (dlerror(), stderr);
        return false;
    }

    //Clear any stale error, then look up the entry point by name
    dlerror();
    void (*entry)(void) = (void (*)(void)) dlsym(handle, function.c_str());
    const char *error = dlerror();
    if (error != nullptr)  {
        fputs(error, stderr);
        dlclose(handle);
        return false;
    }
    entry();

    dlclose(handle);
    return true;
}

int main(int argc, char** argv)
{
    std::string lib("./Kazelib.so");
    std::string function("process");
    //The call is kept outside of assert(): with NDEBUG defined, assert() would drop it
    return process(lib, function) ? 0 : 1;
}

4. Compile (Makefile)

The Makefile consists of the following commands:

all:
	g++ -std=c++11 -Wall -fPIC -O2 -c Kazelib.cpp Kazelib.h  `pkg-config --cflags --libs mono-2`
	g++ -std=c++11 -shared -o Kazelib.so Kazelib.o   `pkg-config --cflags --libs mono-2`
	g++ -std=c++11 main.cpp -ldl

The “-ldl” switch links our executable against the dynamic linking library, allowing the linker to find the symbols of dlopen, dlsym and dlclose. The `pkg-config --cflags --libs mono-2` part of the command provides the compiler & linker flags that are needed to find & link against Mono’s runtime library, allowing us to use any Mono symbol.

5. Run

gclkaze@tzertzelos:~/Desktop/Projects/Tzertzelos/Mono$ make all
g++ -std=c++11 -Wall -fPIC -O2 -c Kazelib.cpp Kazelib.h  `pkg-config --cflags --libs mono-2`
g++ -std=c++11 -shared -o Kazelib.so Kazelib.o   `pkg-config --cflags --libs mono-2`
g++ -std=c++11 main.cpp -ldl
gclkaze@tzertzelos:~/Desktop/Projects/Tzertzelos/Mono$ ./a.out
process()!!
5
1
5
2
0
1
6
1
1
0
7
2
5
gclkaze@tzertzelos:~/Desktop/Projects/Tzertzelos/Mono$

 

Yes, we did it! We were able to pass our new C++ container object through Mono to C#, fill it with data & print its contents on the C++ side by calling the dump method. As you may have noticed, the actual “super calculation” performed on the C# side consists of converting the current date & time into a list of integers; thus, every time you execute the client program, the Container object will be filled with different numbers. Judging by its digits, the output above was produced on the 15th of May 2016, at 11:07:25.
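
To make the printed numbers easier to read, here is a C++ rendering of the same digit extraction. It is only a sketch: digitsOf is a hypothetical helper, and the date string in the demo is an assumption, since the exact text produced by DateTime.Now.ToString() depends on the culture settings of the machine.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Collects every decimal digit of `text` as an int, skipping separators.
std::vector<int> digitsOf(const std::string& text) {
    std::vector<int> digits;
    for (unsigned char ch : text) {
        if (std::isdigit(ch)) digits.push_back(ch - '0');
    }
    return digits;
}

void digitsDemo() {
    // Assumed date string; its digits match the 13 numbers printed above.
    const std::vector<int> expected{5, 1, 5, 2, 0, 1, 6, 1, 1, 0, 7, 2, 5};
    assert(digitsOf("5/15/2016 11:07:25") == expected);
}
```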

Summary of what we have learned

I hope I gave you an idea about the following:

  • what is Mono & how can we use it in Linux
  • load shared library and call a function from it
  • compile shared library in Linux
  • the DllImport & DllExport attributes
  • cool usage of reinterpret cast
  • export functions from C++ to C#
  • export functions from C# to C++
  • how to pass pointers from C++ to C# and back to C++

In the Linx section below, you will find a github link that points to the actual repo that contains all source files that were shown during this demo.

As you read, this post was quite colossal in terms of information, although I tried to keep it as short as possible. Based on all of this information, I could easily think of & write at least 5 articles on different subjects encountered in this post. I tried to focus on the high-level goal and its sequence of high-level steps and not on the lower-level ones, such as detailed information about the “MonoImage” and “MonoMethodDesc” objects. I am sure you will find what you need by clicking the attached links.

Now, go & reuse some code by using Mono in Linux!

Kaze

Linx

Send email from gmail with Python

Hey,

today’s assignment is to send a text email to an arbitrary recipient by using an existing gmail account. We will send the email by using the blog’s favorite scripting language, Python. For those of you who wonder if we could attach an image or an arbitrary file to an email & send it with the main content: yes, we can, but in this article I would like to focus simply on text emails, without attachments.

Requirements

  • An existing gmail account
  • Python’s S(imple)M(ail)T(ransfer)P(rotocol)lib module. smtplib is already installed with Python, thus you don’t need to “pip search” & “pip install” it.

How

With the following script, we can send an email to any number of recipients (“tom@gmail.com”) through our existing gmail account (“kaze@gmail.com”). We need to build our email by filling the email content string (the variable message), then log on to the gmail server through the server’s smtp daemon, send the message & finally close the connection with the smtp daemon of the gmail server. We use the smtp protocol because gmail’s servers (to which we need to log in, in order to send the email) run an smtp daemon-service, allowing them to “listen” to smtp email requests.

import smtplib


def send_email(user, pwd, recipient, subject, body):
    gmail_user = user
    gmail_pwd = pwd
    FROM = user
    TO = recipient if isinstance(recipient, list) else [recipient]
    SUBJECT = subject
    TEXT = body

    # Prepare actual message: the headers, an empty line, then the body
    message = (
        "From: %s\r\n"
        "To: %s\r\n"
        "Content-type: text/html\r\n"
        "MIME-Version: 1.0\r\n"
        "Subject: %s\r\n"
        "\r\n"
        "%s"
    ) % (FROM, ", ".join(TO), SUBJECT, TEXT)

    try:
        # Connect over SSL, log in, send & close the connection
        server = smtplib.SMTP_SSL("smtp.gmail.com")
        server.login(gmail_user, gmail_pwd)
        server.sendmail(FROM, TO, message)
        server.close()
        return True
    except Exception as e:
        print(repr(e))
        return False


if __name__ == '__main__':
    send_email('kaze@gmail.com',
               'TzertzelosCrew.666.',
               ['tom@gmail.com'],
               'Not a spam!',
               'You listen,i code!')

 

Gmail Login Failure

In case the login() call fails with the following error

Please log in via your web browser and then try again

visit: https://www.google.com/settings/security/lesssecureapps

The page says: “Some apps and devices use less secure sign-in technology, which makes your account more vulnerable. You can turn off access for these apps, which we recommend, or turn on access if you want to use them despite the risks”

and click on “Turn On” (if you want Google to allow you to send the email through your app/script).

At the moment, I have no clue how to send an email without turning that account feature “On”. Consider this as an assignment for yourself 🙂

Why send emails?

Because you would like to get notified by one of your programs in case of

  • an abnormal behavior,
  • a task completion,
  • a data change,

in general, in case specific conditions have been met inside your code & something interesting just happened, while you are away from the console that executes your precious script. By using smtplib through the previous function, you could just fill the subject & content depending on the occasion inside your code.
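As a rough sketch of that idea (it is not part of the original script), the helper below runs a task and reports its completion or its failure through a send callable. The names run_and_notify, task, send & task_name are made up for this example; send is only assumed to accept a subject and a body, so a thin wrapper around send_email would fit.

```python
import traceback

def run_and_notify(task, send, task_name="task"):
    """Run task() and report the outcome through send(subject, body).

    Hypothetical helper: 'send' is any callable taking (subject, body).
    Returns the task's result, or None if the task raised an exception.
    """
    try:
        result = task()
    except Exception:
        # Abnormal behavior: mail the full traceback
        send("[FAILED] %s" % task_name, traceback.format_exc())
        return None
    # Task completion: mail the result
    send("[DONE] %s" % task_name, "Result: %r" % (result,))
    return result

if __name__ == '__main__':
    # Stand-in sender that only prints; swap it for a send_email wrapper
    def fake_send(subject, body):
        print("%s | %s" % (subject, body))

    run_and_notify(lambda: 6 * 7, fake_send, "multiplication")
```

To plug in the function from this post, send could be something like lambda s, b: send_email(user, pwd, recipient, s, b), with your own credentials.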

Now, go & spam yourselves.

kazeone

 

Code Generation with C preprocessor

In this article, I will present a quite neat technique that utilizes the C preprocessor‘s power in order to generate the content of a data structure at compile time, based on the content of a generated text file.

Before showing & explaining the demo code, it would be nice first to share some background information regarding the technique & its objectives. The latest project that we were assigned at work is the implementation of a compiler, which will compile a text file containing source code of a specific language. I won’t dive into detail about the language’s syntax; all I can say is that it is an imperative language, such as C.

In order to have a compiler, the first thing you need is to be able to read characters from an input stream & parse that input. The input text should conform to the rules of the grammar of that specific language. In order to write the grammar of their language, a compiler maker could use a combination of Lex/Bison/Yacc or Antlr4. In our case, we selected Antlr4, a quite modern, fast & neat parser generator. Antlr accepts as input an Antlr4 grammar file & generates code that can parse input text that conforms to the grammar that we specified in our grammar file. The generated code’s programming language could vary based on your preference; Antlr4 can emit lexer, parser, listener and visitor class files in C#, Python, Java & Javascript. In addition, it generates a token file, containing all different tokens that are accepted by our grammar’s rules. A token is a word that consists of one or more characters and is accepted by our parser. For instance, if our grammar contains a multiplication rule such as the following:

MULTIPLICATION: '*';

the token file will contain the following line:

MULTIPLICATION = 1

which means that the token '*' has been assigned the rule label MULTIPLICATION with token id equal to 1, because our grammar consists of just one rule, the multiplication rule. If we decide later to extend our grammar with an additional rule, let’s say the division rule, we will add the new rule

DIVISION: '/';

and rerun Antlr. The generated token file will now contain two tokens,

MULTIPLICATION = 1
DIVISION = 2

Each time we extend our language with new rules, new tokens are generated. We need a way to handle these dynamically generated tokens; if someone removes a rule from the grammar by mistake, the syntactic analysis part won’t be in sync with the semantic analysis part anymore, and that is not good at all.

Imagine now that you would like to encapsulate all different tokens in an enum class named TokenType, which will be a member of our Token class. You would like this enum class to be populated with all different tokens, each time Antlr parses the grammar file & generates a token file. The enum class will contain exactly the tokens contained in the token file. How could someone do that in just a few lines of C code, without parsing anything? Well, by using the C preprocessor of course!

Demo

1. Preparation – Convert the token file

We revisited our small grammar containing only a multiplication & a division rule, added some more rules and ended up with the following rule set for our demo grammar file (it is not valid Antlr4 code, just a demo).

ADDITION : '+';
DEDUCTION : '-';
MULTIPLICATION : '*';
DIVISION : '/';
LEFT_PARENTHESIS : '(';
RIGHT_PARENTHESIS : ')';
INCREMENT : '++';
DECREMENT : '--';
POINTER_ARROW : '->';

From this grammar file, Antlr generated the following token file

ADDITION = 1
DEDUCTION = 2
MULTIPLICATION = 3
DIVISION = 4
LEFT_PARENTHESIS = 5
RIGHT_PARENTHESIS = 6
INCREMENT = 7
DECREMENT = 8
POINTER_ARROW = 9

In order to make our technique work, we need to convert our token file into the following format:

TOKEN(ADDITION,1)
TOKEN(DEDUCTION,2)
TOKEN(MULTIPLICATION,3)
TOKEN(DIVISION,4)
TOKEN(LEFT_PARENTHESIS,5)
TOKEN(RIGHT_PARENTHESIS,6)
TOKEN(INCREMENT,7)
TOKEN(DECREMENT,8)
TOKEN(POINTER_ARROW,9)

You will see later why we needed to perform this conversion. The converted token file will be named tokens.inc
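The conversion itself is not shown in this post, so here is a minimal sketch of how it could be scripted. The function names convert_tokens & convert_file and the file name demo.tokens are assumptions made for this example; only tokens.inc comes from the article. Lines that do not look like NAME = number (quoted literals, for instance) are simply skipped.

```python
import re

# Matches lines such as "ADDITION = 1" (spaces around '=' are optional)
TOKEN_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\d+)\s*$")

def convert_tokens(lines):
    """Turn 'NAME = id' lines into 'TOKEN(NAME,id)' lines."""
    converted = []
    for line in lines:
        match = TOKEN_LINE.match(line)
        if match:
            converted.append("TOKEN(%s,%s)" % (match.group(1), match.group(2)))
    return converted

def convert_file(src, dst):
    """Read the token file 'src' and write the converted lines to 'dst'."""
    with open(src) as fin:
        converted = convert_tokens(fin)
    with open(dst, "w") as fout:
        fout.write("\n".join(converted) + "\n")

if __name__ == '__main__':
    # Hypothetical file names: convert_file("demo.tokens", "tokens.inc")
    print("\n".join(convert_tokens(["ADDITION = 1", "POINTER_ARROW = 9"])))
```

Running such a script as a build step, right after Antlr, would keep tokens.inc in step with the grammar.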

2. Populate the token types

Our Token class contains the enum class TokenType, which enumerates all tokens known to our parser. This is possible by including the tokens.inc file inside the body of the enum, after defining TOKEN as a function-style macro with #define. For each TOKEN(a,b) statement of the tokens.inc file, a new enumerator is added to our enum class TokenType; each label (a) of the enum is initialized with the value of its token (b). In the end, we discard the TOKEN definition. Of course, the name of the macro could be whatever; I just used the word TOKEN for convenience.

#include <iostream>

class Token {
public:
    enum class TokenType : int {
        #ifndef TOKEN
            // Each TOKEN(name,value) line of tokens.inc expands to "name = value,"
            #define TOKEN(name,value) name = value,
            #include "tokens.inc"
            // Discard the definition once the enumerators have been generated
            #undef TOKEN
        #endif
        LAST  // takes the value of the last token + 1
    };
private:
    TokenType m_Type;
public:
    Token(TokenType type) : m_Type(type) {}
    int getNumberOfTokenTypes() const { return static_cast<int>(TokenType::LAST) - 1; }
};

int main(int argc, char** argv)
{
    Token token(Token::TokenType::MULTIPLICATION);
    std::cout << token.getNumberOfTokenTypes() << std::endl;
}

Another nice thing that I wanted to show is the LAST token type that I have added after the “initialization list” of the token types. Since it takes the value of the last token plus one and the token ids start from 1, it tells us the number of different available token types. The cpp file instantiates a multiplication token and prints the number of available types (Token::TokenType::LAST - 1). Note that if the token type MULTIPLICATION is for some reason not included in the tokens.inc file, the cpp program won’t compile.

In our case & by using the previous grammar, by compiling (g++ -std=c++11) & running the cpp executable, we get:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/C++/PreprocessorPower$ ./a.out
9

If I remove the last rule (POINTER_ARROW), regenerate tokens.inc, then recompile & run the cpp program, we will get:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/C++/PreprocessorPower$ ./a.out
8

The number of available types changes based on the content of the tokens.inc file, which is generated from the most recent Antlr4-powered grammar.

I found it quite powerful to be able to generate code content in such an easy way, using just a conditional include statement and a define preprocessor directive; the C preprocessor alone gives us such a neat & concise solution.

Hope you enjoyed & learned a thing or two,

if you already knew, even better,

kazeone

Links

Antlr4 : ANother Tool for Language Recognition

Yacc : Yet Another Compiler-Compiler

Bison : The YACC-compatible Parser Generator

Lex : A Lexical Analyzer Generator

C preprocessor

enum class (C++11)

Syntactic analysis(Compiler Construction)

Semantic Analysis (Compiler Construction)

Music

Cult Of Luna – Salvation

Color clustering – Part #1: Clustering by color relevance

Hi all,

in a previous post, I promised you an article about color clustering & gif creation. After around half a year, I decided to refactor the code a bit & tell you more about it. However, I wouldn’t like to tire you with a huge article about the whole process. Thus, I have divided it into three parts, which make up the Color Clustering Epic.

The parts are the following:

  • Part 1: Color Clustering by color relevance
  • Part 2: Image generation by color aggregation
  • Part 3: Gif generation

Let’s dive into it!

Color Clustering by color relevance

But what exactly do I call Color Clustering? It is a way to group relevant colors by examining their actual pixel populations in a target image and their rgb relevance, where a pixel’s color is specified by the values of red, green & blue (0 <= value <= 255). The rgb relevance of one color to another can be found by:

  • finding the Euclidean distance between the rgb components of one color with another
  • comparing the distance with one fixed offset number in order to determine if the colors are relevant.

If the offset number is quite small, then the number of groups will be quite big. For instance, consider the following image, which is quite “monolithic” color-wise:

[Image: 27118-2-1353337757]

This image consists mainly of red, black, white & some blue. Let’s take two “almost” red pixels; one from the center of the image, and one from the upper right. We decide upon a small offset equal to 4. Well, the first pixel (p) has values r=220, g=8, b=10 and the second one (q) r=240, g=40, b=10. By applying the Euclidean distance

$$d(p, q) = \sqrt{(r_p - r_q)^2 + (g_p - g_q)^2 + (b_p - b_q)^2}$$

their distance is ~37.7, and 37.7 > 4 (the offset value). Although both pixels are mainly red, by picking such a small offset we decide that these two red pixels are irrelevant, when they should be relevant. Thus, we should increase the value of our offset in order to capture relevant colors, but without overdoing it. If we pick a quite big offset (1000 for example), all colors will be marked as relevant to the first color encountered in the image, since the distance between two rgb colors can never exceed ~441.7 (black to white). In my example, I have assigned to the offset the value 100.

This was the basic idea, and it is all you need to grasp in order to understand what I am showing in the rest of this article.
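To make the numbers above easy to check, here is a small sketch of the relevance test. The function names rgb_distance & are_relevant are made up for this example and do not appear in the code of this post.

```python
from math import sqrt

def rgb_distance(p, q):
    """Euclidean distance between two (r, g, b) tuples."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def are_relevant(p, q, offset):
    """Two colors are relevant when their distance does not exceed the offset."""
    return rgb_distance(p, q) <= offset

p = (220, 8, 10)   # "almost" red pixel from the center
q = (240, 40, 10)  # "almost" red pixel from the upper right

print(round(rgb_distance(p, q), 2))  # 37.74
print(are_relevant(p, q, 4))         # False: the offset is too small
print(are_relevant(p, q, 100))       # True
```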

Setup

The code has been developed on Ubuntu Linux, and the dependencies are the following

  • Python 2.7
  • PIL Python module for reading & writing pixel values from/to image

Code

1. Read image file

from PIL import Image

def image_pixels(file):
    img = Image.open(file)
    pixels = img.load()  # pixel access object: not a list, nor is it list()'able
    width, height = img.size

    # Collect the value tuple of every pixel, column by column
    all_pixels = []
    for x in range(width):
        for y in range(height):
            cpixel = pixels[x, y]
            all_pixels.append(cpixel)
    return all_pixels, img.size

We use PIL’s Image module in order to read the pixel value tuples & store them in a list.

2. Build color table

def color_counter(pixels):
    T = {}      # color table: 'r,g,b' => amount of pixels with that color
    colors = 0  # amount of different colors
    total = 0   # amount of pixels processed
    for pixel in pixels:
        # Keep r,g,b only; the alpha value of RGBA pixels is ignored
        r, g, b = pixel[:3]
        key = '%s,%s,%s' % (r, g, b)
        if key in T:
            T[key] += 1
        else:
            T[key] = 1
            colors += 1
        total += 1
    print('Different: %d Amount of pixels: %d' % (colors, total))
    assert len(pixels) == total
    return T, colors, total

By having the pixel tuple list, we can build a color table, a dictionary that has as keys all unique colors encountered in the image and as values the population of pixels of each associated color.
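For comparison, the same kind of table can be sketched with collections.Counter from the standard library. This is a swapped-in alternative and not the code used in this post; the name color_table and the four sample pixels are made up for the example.

```python
from collections import Counter

def color_table(pixels):
    """Map each 'r,g,b' key to the amount of pixels that have that color."""
    # p[:3] drops the alpha value of RGBA pixels
    return Counter('%d,%d,%d' % tuple(p[:3]) for p in pixels)

pixels = [(255, 30, 0), (255, 30, 0), (0, 4, 0), (255, 30, 0, 255)]
table = color_table(pixels)

print(table['255,30,0'])    # 3
print(len(table))           # 2 different colors
print(sum(table.values()))  # 4 pixels in total
```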

3. Color Clustering

By having obtained the color table, we know all colors that are contained in the target image and also the amount of pixels of each color. We are ready to compute the Euclidean distance between the colors contained in the color table and group them into color clusters.

import copy
from math import sqrt

# transfer_absorbed_colors, json_out & get_filename are helper functions
# that are not listed in this part

def cluster_relevant_colors(table,offset,img_name,size):
    substitute = copy.deepcopy(table)
    keys = list(table.keys())

    # Holds key => [values]: 'color' => [absorbed color1, absorbed color2, absorbed color3].
    # So we know which color was absorbed by another one
    absorbed = {}
    for i in range(0,len(keys)):
        key = keys[i]
        if key not in substitute:
            continue
        r1,g1,b1 = [int(c) for c in key.split(',')]
        on_absorb = False
        for j in range(i+1,len(keys)):
            key2 = keys[j]
            if key not in substitute:
                break
            if key2 not in substitute:
                continue
            # Find euclidean distance between these two colors,
            # to check if they are relevant with respect to their rgb components
            r2,g2,b2 = [int(c) for c in key2.split(',')]
            dr,dg,db = abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)
            ediff = sqrt( (dr*dr) + (dg*dg) + (db*db) )
            if ediff <= offset:
                on_absorb = True
                # Transfer pixel populations + remove the weak color
                # from the table (it has been absorbed)
                if substitute[key] >= substitute[key2]:
                    absorbed = transfer_absorbed_colors(key2,key,absorbed)
                    substitute[key] += substitute[key2]
                    del substitute[key2]
                else:
                    absorbed = transfer_absorbed_colors(key,key2,absorbed)
                    substitute[key2] += substitute[key]
                    del substitute[key]
                    break
        # Current color was not involved in any absorption
        if on_absorb == False:
            if key not in absorbed:
                absorbed[key] = []

    # Output the absorbed table & the remaining colors
    json_out(absorbed, name = get_filename ( img_name,'json') )
    json_out(substitute, name = get_filename ( img_name,'clusterjson') )

 

The first step is to create a copy of the original color table, on which we will operate (substitute). Then, for each color, we iterate over all following color keys & calculate the Euclidean distance between the current color (key) and the color encountered during the iteration (key2). If the distance is smaller than or equal to the provided offset, one of the colors is absorbed by the other. In order to decide which one, we check the populations of both colors in our table. We assume that the color with the larger pixel population is the dominant color. The dominant color absorbs the second color and its population. In the end, the substitute color table contains as keys all dominant colors, which are quite different from each other, and as values the total amount of pixels of each color, while the absorbed color table contains all dominant colors along with the colors that were absorbed by them.

{
    "252,156,105": 34951,
    "50,68,88": 92838,
    "255,30,0": 187878,
    "171,44,131": 3198,
    "0,4,0": 339535,
    "100,126,175": 505,
    "104,9,0": 136452,
    "156,196,248": 20,
    "179,113,17": 34,
    "255,214,190": 4588,
    "255,86,177": 1
}

This json depicts the contents of the substitute dict, which contains all dominant colors along with their populations. As we can see, the majority of the image’s pixels are black (0,4,0: 339535 pixels), bordeaux (104,9,0: 136452 pixels) and red (255,30,0: 187878 pixels), as expected. All variations of the dominant colors have been absorbed by them.
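To watch the absorption happen on something small, here is a simplified, self-contained sketch of the same greedy idea. It is not the function listed above: the name absorb_colors, the tuple keys and the four-color table are assumptions made for this example, and the bookkeeping of absorbed colors is done inline instead of through the unlisted helper.

```python
from math import sqrt

def absorb_colors(table, offset):
    """Greedy sketch: 'table' maps (r, g, b) tuples to pixel populations."""
    clusters = dict(table)  # surviving (dominant) colors => population
    absorbed = {}           # dominant color => colors it has absorbed
    keys = list(table)
    for i, key in enumerate(keys):
        if key not in clusters:
            continue  # already absorbed by an earlier color
        for key2 in keys[i + 1:]:
            if key2 not in clusters:
                continue
            dist = sqrt(sum((a - b) ** 2 for a, b in zip(key, key2)))
            if dist > offset:
                continue  # irrelevant colors
            # The color with the larger population is the dominant one
            if clusters[key] >= clusters[key2]:
                winner, loser = key, key2
            else:
                winner, loser = key2, key
            clusters[winner] += clusters.pop(loser)
            # The winner also inherits whatever the loser had absorbed
            absorbed.setdefault(winner, []).extend([loser] + absorbed.pop(loser, []))
            if loser == key:
                break  # the current color is gone
    return clusters, absorbed

table = {(255, 30, 0): 100, (250, 40, 10): 20, (0, 4, 0): 300, (10, 10, 10): 5}
clusters, absorbed = absorb_colors(table, 100)
print(clusters)  # the two reds merge (120 pixels), the two blacks merge (305 pixels)
print(absorbed)
```

Note that the total amount of pixels stays the same before & after the clustering; only the amount of colors shrinks.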

The contents of the absorbed & substitute tables will be used in the next phase/part of the epic. For now, I won’t provide the complete code until the last & final chapter of the Color Clustering epic.

However, I will provide some gifs in order to show you what you could generate after the end of the Color Clustering epic. Of course, I do not own any of the initial artwork that was used to generate the gifs.

Hope you enjoyed,

kazeone

 

Appetizers

baroness_blue

Baroness – Blue Record

 

pelican_australasia_aggr

Pelican – Australasia

vol_4_aggr

Black Sabbath – Vol. 4

om_agios_aggr

Om – Advaitic Songs

( We do not own any of the artwork that was used to generate the presented gifs.)

C/C++ function call from Python

Hey there,

in a previous post, I explained how someone could set up things in a C++ class in order to load a Python module & call a function through that class. Today, I am going to demonstrate the exact opposite! How could someone actually call a C function from a Python script? This is the objective of the current mission!

Setup

The code has been developed on Ubuntu Linux, and the dependencies are the following

  • C / C++ 11
  • Python 2.7

In Windows, instead of building shared library (*.so) files, you will build DLLs.

Preparation

1. Pack all functions in a shared library (*.so)

In our hypothetical scenario, we would like to use a version of Libc‘s fgetws() where the size of the buffer that receives the input data from the keyboard/standard input could vary. Our function, kgets, should be contained in a shared library that the Python program can load dynamically in order to call it.

1.1 The header file klib.h

extern "C" wchar_t * kgets(int);

1.2 The implementation file klib.cpp

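#include <cstdio>   // stdin
#include <cwchar>   // fgetws
#include "klib.h"

wchar_t * kgets(int size)
{
    // The buffer is handed over to the caller and is never freed in this demo
    wchar_t *b = new wchar_t[size];
    // fgetws reads at most size-1 wide characters and returns NULL on EOF/error
    return fgetws(b,size,stdin);
}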

Notice that in the declaration of kgets in klib.h, by adding extern “C”, we give kgets C linkage. By seeing that declaration, the compiler won’t mangle the exported function’s name; the plain name kgets will be available to the linker & to any client that looks it up in the library, such as a C program or our Python script.

We compile them through a Makefile:

all:
        g++ -std=c++11 -Wall -fPIC -O2 -c klib.cpp klib.h
        g++ -std=c++11 -shared -o klib.so klib.o
clean:
        rm -f *.so
        rm -f *.o

 

And then we type:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ make all
g++ -std=c++11 -Wall -fPIC -O2 -c klib.cpp klib.h
g++ -std=c++11 -shared -o klib.so klib.o
gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ 

 

And our klib.so containing our almighty kgets is ready to be loaded!

2. Load the shared library through Python & call the function

Our Python script uses the ctypes Python module, which actually performs the whole thing. It provides library loading & data type conversion utilities, allowing us first to load the library, then to describe the function that resides inside it & finally to call it. Once we describe what (exactly) the function takes as arguments & what data type it returns, ctypes sets things up in order to call the function. Of course, the description/prototype of the function on the Python side should match one-to-one the exported C function on the C side; ctypes cannot verify that for us, and a wrong description leads to garbage values or a crash. What ctypes does check are the arguments that we pass: if they cannot be converted to the declared argument types, our python script will get an exception before the exported C function is called.
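The describe-then-call workflow can be tried without building anything, by borrowing two functions from the C standard library. This is only a sketch under the assumption of a POSIX system where ctypes.util.find_library("c") can locate the C library; the variable names are made up for the example.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system; CDLL(None) falls back to the symbols already
# loaded in the running process if the library name cannot be found
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe size_t strlen(const char *)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# Describe int abs(int)
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

print(strlen(b"Tzertzelos"))  # 10
print(c_abs(-7))              # 7

try:
    # A string cannot be converted to c_int: rejected on the Python side
    c_abs("seven")
except ctypes.ArgumentError as e:
    print("rejected before the call: %s" % e)
```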

Let’s see the client.py script:

# -*- coding: utf-8 -*-
import ctypes as c

# Load the shared library
klib = c.cdll.LoadLibrary("./klib.so")
# Load function
kgets = klib.kgets
# Prepare return value & argument
kgets.restype = c.c_wchar_p
kgets.argtypes = [c.c_int]
# Read a line of at most 19 wide characters (a buffer of 20) from the
# standard input. Skip the new line in the end ('\n')
res = kgets(20)[:-1]
print('kgets says: %s' % res)

 

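(*ctypes.cdll.LoadLibrary is not limited to *.so files: on Windows the same call loads DLLs that export functions with the cdecl calling convention, while ctypes.windll is meant for stdcall DLLs.)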

Execute it:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/PythonC++$ python client.py 
Tzertzelos Blog!
kgets says: Tzertzelos Blog!

It works!

We were able to:

  • compile fast C functions in a shared library (.so)
  • write a Python script, which by utilizing the power of the ctypes module was able to load our library & call one exported C function

Now go gain some high performance, by packing your C functions into shared libraries & using Python to call them.

Cheers,

kazeone

 

Linx

Static, Shared Dynamic and Loadable Linux Libraries

Shared Libraries

ctypes — A foreign function library for Python

 

Musix

Oranssi Pazuzu – Värähtelijä

Redis data structure change notifications

Hi all,

everyone knows Redis, right? Redis is an open source (BSD licensed) data structure server with pub/sub features, allowing applications to subscribe to Redis channels & be notified when a channel update occurs, while other apps (on the other side) generate the updates by publishing them to particular Redis channels.

In this post, I am assuming that you are already familiar with Redis, that you know how to code a script that publishes to/subscribes to a channel, & that you know how to manipulate some of Redis’ data structures, such as Hashes, Sets & Lists.

We are interested in receiving data structure updates in order to know that something has changed & if our app is interested in that particular change, then it should do something useful with it. In addition, we would like our app to be interested in only one specific data structure of redis, one specific structure that holds data of interest for our app.

Redis supports pubsub for channels, but can a data structure have its own channel? Yes, by using Redis Keyspace Notifications, available from version 2.8.0. When a data structure changes, a Redis event is generated & propagated through the channel. Your app just needs to listen to that channel in order to be notified about the event.

The name of that particular channel is formed as: __keyspace@0__:<data structure name> (where 0 is the index of the Redis database). Thus, if your app is interested in a specific list named “registeredArtists“, then it should subscribe & listen to the channel __keyspace@0__:registeredArtists.
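One detail worth knowing before the demo: as far as I know, keyspace notifications are disabled by default and have to be switched on through the notify-keyspace-events setting. The sketch below builds the channel name & enables the notifications before subscribing. The function names keyspace_channel & subscribe_to_key are made up for this example, and client is assumed to be a redis-py client such as StrictRedis (it is passed in, so the block itself does not need a server).

```python
def keyspace_channel(key, db=0):
    """Name of the keyspace notification channel of 'key' in database 'db'."""
    return "__keyspace@%d__:%s" % (db, key)

def subscribe_to_key(client, key, handler, db=0):
    """Enable keyspace notifications & subscribe 'handler' to the key's channel.

    Assumption: 'client' is a redis-py client (e.g. StrictRedis) that is
    allowed to change the server configuration.
    """
    # "K" enables keyspace events, "A" selects all event classes
    client.config_set("notify-keyspace-events", "KA")
    pubsub = client.pubsub()
    pubsub.subscribe(**{keyspace_channel(key, db): handler})
    return pubsub

print(keyspace_channel("registeredArtists"))  # __keyspace@0__:registeredArtists
```

The same setting can also be changed from redis-cli with CONFIG SET notify-keyspace-events KA.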

Demo

Requires:

  • linux
  • redis
  • Python
  • python-pip redis module

1.1 Client

In order to show you the functionality, i prepared a small Python demo for it:

from redis import StrictRedis
from time import sleep
from sys import argv

def show_msg(msg):
    # Handler called for every message published on the subscribed channel
    print('Latest List Operation => %s' % msg)

def listen_for_data_structure_operations(dt_key, host='localhost'):
    redis = StrictRedis(host=host)
    s = redis.pubsub()
    s.subscribe(**{('__keyspace@0__:%s' % dt_key): show_msg})

    while True:
        # Channel messages go to show_msg; other messages (such as the
        # subscribe confirmation) are returned here
        msg = s.get_message()
        if msg:
            print(msg)
        sleep(1)

if __name__ == '__main__':
    if len(argv) > 1:
        key = argv[1]
    else:
        key = 'alist'
    listen_for_data_structure_operations(key)

 

If you run the code with the name of a redis data structure as argument, your script will subscribe to the channel of changes/operations applied to that particular data structure. The data structure does not necessarily need to exist in order to subscribe to it.

gclkaze@tzertzelos:~/Tzertzelos/Redis$ python dtUpdateTest.py registeredArtists
{'pattern': None, 'type': 'subscribe', 'channel': '__keyspace@0__:registeredArtists', 'data': 1L}

1.2 Server

Now, use the terminal to connect to redis, and push an element to the list:

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/Redis$ redis-cli
127.0.0.1:6379> LPUSH registeredArtists "Motorhead"
(integer) 1

OK, our new list named registeredArtists has length equal to 1, and its only element is the string Motorhead.

1.3 Client-side update

The notification generated after the LPUSH in the server should be visible in the client side!

gclkaze@tzertzelos:~/Tzertzelos/Scriptz/Redis$ python dtUpdateTest.py registeredArtists
{'pattern': None, 'type': 'subscribe', 'channel': '__keyspace@0__:registeredArtists', 'data': 1L}
Latest List Operation => {'pattern': None, 'type': 'message', 'channel': '__keyspace@0__:registeredArtists', 'data': 'lpush'}

Yea! Our client was able to capture the event generated by the LPUSH that was applied to the list that resides in the server.

Redis’ pubsub feature really simplifies the implementation of a push mechanism between the server & the app, which is actually a core feature of the system.

Now go & experiment with Redis!

kazeone

Linx

An introduction to Redis data types and abstractions

Redis Commands

Redis Keyspace Notifications

Musix

Truckfighters – Gravity X (2005)