MYSQL statement generation from object


The last one and a half year, my main project at work has to do with various sports data, different leagues, different teams, different matches, different countries and tons & tons of statistical information about teams & players that i’ve never heard before. Importing large amount of data of uknown structure every day to my super-dooper MYSQL database has become my daily routine. The process goes like that: fetch uknown data from a source through a SOAP call (or other means), examine the data structure, check if the data can be distributed to an existing database table, if not, create a table with the specified attributes, add some unique keys that will differentiate each row from the others aaaaaannnnndddd fill it with stuff!! Yeah! And pretty much this is it regarding data fetching. Although, one day, i got a request to fetch and store data that had to do with detailed team statistics, but they were soooooooo detailed that i almost cried after observing the received object through SOAP. The object was actually a huge data set, where all attributes couldn’t make any sense to me, because i am not a sports guy (OK i am, but not a friend of the most popular ones) and every value was a double number. We are talking about one object that had 200+ (?!!) attributes. So, i was thinking, OK, i am gonna store it and stuff, but someone needs to write the FUCKING CREATE TABLE/INSERT STATEMENTS HERE! OMG! This would take me 3 days, only to write & parse correctly the MYSQL queries, plus after the vast amount of MYSQL programming errors, i would turn blind. So i thought “Wait a minute! My friend here is gonna do it for me! I just need to describe him what i want to do!”.

MYSQL create table statement generation

How? Well, let’s start with the most hardcore one, the generation of a MYSQL create table, by having as input an object received through a SOAP call of a known & trusted source/web service. If it isn’t trusted, then through the following method, someone can trick you to generate table that holds weird information such as binaries, bank accounts, stolen passwords, monitored ips, whatever he wants, so let’s be precise to who is trusted and who is not. Let’s start by downloading the data and storing them in an object variable in our program. The following UML shows the steps i followed, in order to be able to generate a nice & safe MYSQL .

create_table


After ensuring that our attribute lists are filled with the correct keys, we pass the initial dictionary (our fetched object) along the not_null, index , auto_incr and primary_key lists to the MYSQL create table statement generation function to do the magic.

import datetime
import math
def generate_create_table_from_dict(d,db_name,db_table,autoincrement_id=True,index=None,primary_key=None,not_null=False):
    if not d:
        return 
    print ‘Generate create table from dict’
    #print d
    keys = d.keys()
    Q = ‘CREATE TABLE ‘ + db_name + ‘.’ + db_table + ‘ (‘
    if autoincrement_id == True:
        Q += ‘id INT NOT NULL AUTO_INCREMENT,’    for k in keys:
         if isinstance(d[k],float) == True:
             if not_null:
                 if k in not_null:
                     Q += k +‘ DOUBLE NOT NULL,’
                 else:
                     Q += k +‘ DOUBLE,’
         elif isinstance(d[k],int) == True:
             if not_null:
                 if k in not_null:
                     Q += k +‘ INT NOT NULL,’
                 else:
                     Q += k +‘ INT,’
         elif isinstance(d[k],string) == True:
             if not_null:
                 if k in not_null:
                     Q += k +‘ TEXT NOT NULL,’ # Disadvantage: why blob and not VARCHAR(k) -> save some space?
                 else:
                     Q += k +‘ TEXT,’
         elif isinstance(d[k],datetime.datetime) == True:
             if not_null:
                 if k in not_null:
                     Q += k +‘ DATETIME NOT NULL,’
                 else:
                     Q += k +‘ DATETIME,’
         elif isinstance(d[k],datetime.date) == True:
             if not_null:
                 if k in not_null:
                     Q += k +‘ DATE NOT NULL,’
                 else:
                     Q += k +‘ DATE,’    if primary_key:
        Q += ‘ PRIMARY KEY (‘+primary_key+‘)’
    if not index:
        Q += ‘);’
    else:
        Q += ‘,’
        Q += ‘UNIQUE KEY(‘+ ‘,’.join(index)+‘));’    #print Q
    return Q


The above script constructs a create table statement by taking into account the content of passed attributes list. Although, be careful before executing the sql query, check that everything looks fine, the unique key components are there and everything have been assigned the correct data type. Otherwise, you will suffer during the whole process and you will need to perform the whole sequence of steps again before trying to fetch & store your fresh data. In addition, the previous generation function should be supervized and executed only once! After all, your table has been created & set correctly in your database, right? We won’t call the previous call again in our code while fetching new data, it is not needed.

MYSQL insert statement generation
In order to generate the insert statement, we will need the list of the table’s attributes and an additional list of attributes which are actually the unique key or unique key components that are ensuring that each row is unique with respect to the key or combination of keys. The other row attributes can be updated normally, in case an object with the matched key properties is fetched through the SOAP call.
insert_stmnt

OK, we are set. Now let’s pass everything through the following code generation function!

import datetime
import math
def generate_insert_statement_from_dict(d,db_name,db_table,not_on_update=None):
    if not d:
        return 
    print ‘generate insert stmnt from dict’
    keys = d.keys()    Q = ‘INSERT INTO ‘ + db_name + ‘.’ + db_table + ‘ (‘
    PARAMS = ‘ VALUES ( ‘
    for i in xrange(0,len(keys)):
        if i == len(keys)  1:
            Q += keys[i]+‘)’
            PARAMS += ‘%s)’
        else:
            Q += keys[i]+‘,’
            PARAMS += ‘%s,’    if not not_on_update:
        return (Q+PARAMS),tuple(d[keys[i]] for i in xrange(0,len(keys)))
    else:
        ON_DUPLICATE = ” ON DUPLICATE KEY UPDATE “
        for i in xrange(0,len(keys)):
            if keys[i] in not_on_update:
                continue
            if i == len(keys)  1:
                ON_DUPLICATE += keys[i]+‘ = VALUES (‘+keys[i]+‘);’
            else:
                ON_DUPLICATE += keys[i]+‘ = VALUES (‘+keys[i]+‘),’     
         t = list(d[keys[i]] for i in xrange(0,len(keys)))    
        for i in xrange(0,len(t)):
            #print t[i],type(t[i])
            if math.isnan(t[i]) == True:
                t[i] = 0
        t = tuple(t)
        return (Q+PARAMS+ON_DUPLICATE,t)


After receiving the results (the MYSQL insert query), triple check if the keys are there and are not updated. Then pass it to MYSQL through command line to see if the commit was successful. The previous function can be called every time new data of the specified data structure arrives in you data link layer. Otherwise, after the statement generation, you could assign the whole string to a variable, which would be publicly available/global during runtime, allowing you to fill it with the received values and then execute it as a query to the MYSQL database, without having to generate the statement over and over again for every new object of that specific type. In any case, it saved me hours and hours of writing, debugging, trying to parse these huge objects to MYSQL queries and as it seems, it is going to save me additional hours in the future as the demand of rapid data structure analysis & storage grows.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s