Python source to HTML translator for WordPress


Yesterday I was planning to start writing an article about code generation, more specifically about MySQL code generation triggered by trusted input that contains unknown data and data types. To present the idea, though, I needed to show you some actual scripts (in Python) so that you can follow what I am saying. Yeah, I could do that by showing you only some control flows in UML, but fortunately for you, I am a practical guy! I wanted my Python code to look nice in HTML, and I am a bit too lazy to do all the styling manually. But wait a minute! My friend here is going to do it for me: it converts Python source code into nice HTML, taking the language's constructs and special notation into account.

Here is the script that converts Python code to HTML. It writes the result to a file called "out.html", so you can just copy and paste it into WordPress or any other page; I used it to produce the listing of its own source code that follows.
The script takes only one argument: the name (absolute path) of the Python source file that you want to convert to HTML (logical :)).

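# -*- coding: utf-8 -*-
import sys
from sys import argv

# Characters escaped by html_escape() inside Python string literals
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

if len(argv) != 2:
    print('usage: %s <python source file>' % argv[0])
    sys.exit(1)

script = ''
print(argv[1])
try:
    fp = open(argv[1], 'r')
    lines = fp.readlines()
    fp.close()
    script = ''.join(lines)
except Exception as e:
    print(repr(e))
    sys.exit(1)

def html_escape(text):
    return ''.join(html_escape_table.get(c, c) for c in text)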
#BEGINNING OF THE SCRIPT
simple_keywords = ['if', 'else:', 'not', 'and', 'or', 'return', 'print', 'elif', 'def', 'for', 'in', 'while', 'do',
                   'continue', 'as', 'except', 'try:', 'from', 'import', 'pass', 'break']
string_context = ["'", '"']
comment_keywords = ["#"]
comment_color = "#8A4B08"
background_style = ['<div style="font-size: 9px;background-color:black;color:white;font-family: courier new, courier, monospace;">', '</div>']
comment_style = ['<span style="font-family: courier new, courier, monospace; color: %s">' % comment_color, '</span>']
keyword_style = ['<span style="font-family: courier new, courier, monospace; color: #00BFFF; padding: .667em .917em;">', '</span>']
function_name_style = ['<span style="font-family: courier new, courier, monospace; color: #0000FF;">', '</span>']
true_false_style = ['<span style="font-family: courier new, courier, monospace; color: purple;">', '</span>']
string_style = ['<span style="font-family: courier new, courier, monospace; color: #FF0040;">', '</span>']
function_sign_style = ['<span style="font-family: courier new, courier, monospace; color: #088A4B;">', '</span>']
# Temporary markers placed around function names in the first pass and swapped
# for function_sign_style at the end. The original marker strings were lost when
# the post was published; these two are placeholder values (any tag-like string
# starting with '<' and free of quotes works).
function_sign_headers = ['<fsign>', '</fsign>']
content = script
buf = ''
i = 0
quotes = ["'", '"']

# First pass: wrap the name that follows every "def " in the temporary markers
while i < len(content):
    if (content[i] == ' ' and content[i-3:i] == 'def' and content[i-4] not in quotes
            and content[i+1:i+2] not in quotes):  # Mark function signature start/end
        for j in range(i + 1, len(content)):
            if content[j] == '(':
                injection = function_sign_headers[0] + content[i:j] + function_sign_headers[1]
                buf += injection
                i = j - 1  # the i += 1 below resumes at '('
                break
        else:
            buf += content[i]  # no '(' follows: keep the space
    else:
        buf += content[i]
    i += 1

content = buf
#content = content.replace('\t', '  ')
content = content.replace(' ', '&nbsp;')  # keep indentation in HTML
content = content.replace('\n', '<br/>')

# Second pass: highlight keywords outside string literals, one keyword at a time
for sk in simple_keywords:
    buf = ''
    i = 0
    t = ''
    on_string = False

    while i < len(content):
        if content[i] in quotes and not on_string:  # ENTERS
            t = content[i]
            on_string = True
            buf += content[i]
            i += 1
            continue
        elif content[i] in quotes and on_string and content[i] == t:  # GOES OUT
            buf += content[i]
            i += 1
            on_string = False
            t = '-'
            continue
        elif content[i] in quotes and on_string and content[i] != t:  # STILL IN
            buf += content[i]
            i += 1
            continue
        elif content[i] not in quotes and on_string:  # ON STRING
            buf += content[i]
            i += 1
            continue
        else:
            assert content[i] not in quotes
            assert not on_string

            l = len(sk)
            end = i + l
            # A keyword must be followed by '&' (of &nbsp;) or '<' (of <br/>) ...
            if content[i:end] == sk and end < len(content) and content[end] in ['&', '<']:
                # ... and preceded by ';' (of &nbsp;) or '>' (of <br/> or a tag), unless it starts the file
                if i != 0 and content[i-1] not in [';', '>']:
                    buf += content[i]
                    i += 1
                    continue
                buf += keyword_style[0] + sk + keyword_style[1]
                i += l
                continue
            else:
                buf += content[i]
        i += 1
    content = buf
    buf = ''

# Third pass: style string literals and comments
buf = ''

i = 0
on_string = False

while i < len(content):
    # Skip the quotes of HTML attributes (style="...") inserted by the keyword pass
    if content[i] == '"' and not on_string and content[i-1] != '=' and content[i+1:i+2] != '>':  # String style handling
        buf += string_style[0] + content[i]
        on_string = True
        for j in range(i + 1, len(content)):
            if content[j] != '"':
                buf += html_escape(content[j])
            else:
                buf += content[j] + string_style[1]
                i = j
                on_string = False
                break
        else:  # unterminated string: close the span and stop
            buf += string_style[1]
            i = len(content)

    elif content[i] == "'" and not on_string:  # String style handling
        buf += string_style[0] + content[i]
        on_string = True
        for j in range(i + 1, len(content)):
            if content[j] != "'":
                buf += html_escape(content[j])
            else:
                buf += content[j] + string_style[1]
                i = j
                on_string = False
                break
        else:  # unterminated string: close the span and stop
            buf += string_style[1]
            i = len(content)

    elif content[i] == '#' and not on_string:  # Comment style handling
        # Treat '#' as a comment only if no quote sits between it and the previous
        # '>' (line break or tag) and the next '<' (line break or tag)
        ok = False
        ok2 = False
        k = i
        while k >= 0:
            if content[k] in quotes:
                break
            if content[k] == '>':
                ok = True
                break
            k -= 1
            if k <= 0:
                ok = True
        k = i
        while k < len(content):
            if content[k] in quotes:
                break
            if content[k] == '<':
                ok2 = True
                break
            k += 1
        if not ok or not ok2:
            buf += content[i]
            i += 1
            continue
        b = ''
        j = i
        buf += comment_style[0] + content[i]
        j += 1
        while j < len(content):
            if content[j:j+5] == '<br/>':
                buf += comment_style[1]
                i = j - 1
                break
            else:
                b += content[j]
                # A keyword span inside the comment: override its color with the comment color
                if b.endswith('color:'):
                    while content[j] != ';':
                        j += 1
                    buf += ': %s' % comment_color
                buf += content[j]
            j += 1
        else:  # comment runs to the end of the file
            buf += comment_style[1]
            i = j

    else:
        buf += content[i]
    i += 1

content = buf
content = content.replace('True', true_false_style[0] + 'True' + true_false_style[1])  # True/False style handling
content = content.replace('False', true_false_style[0] + 'False' + true_false_style[1])
content = content.replace(function_sign_headers[0], function_sign_style[0])  # Finish the function signature styling job
content = content.replace(function_sign_headers[1], function_sign_style[1])
content = background_style[0] + content + background_style[1]  # Finish background styling

fp = open('out.html', 'w')
fp.write('<html>')
fp.write(content)
fp.write('</html>')
fp.close()
print('the end')
print(content)


The only thing I failed to convert correctly was the following table:

html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

so make sure it sits at the beginning of the file, since html_escape() depends on it.
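To see what that table does on its own, here is a small standalone sketch of html_escape() next to the standard library's html.escape(); the sample string is just an illustration. One detail worth noticing: the table leaves & untouched, which matters for this script because by the time string literals are escaped, the spaces inside them have already been turned into &nbsp; entities. html.escape() would rewrite those as &amp;nbsp; and it also spells the single quote as &#x27; instead of &apos;.

```python
import html

# Same four entries as the table above: what the converter escapes inside string literals
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}


def html_escape(text):
    # Replace each character that has an entry in the table, keep the rest as-is
    return "".join(html_escape_table.get(c, c) for c in text)


sample = "'<br/>'&nbsp;"
print(html_escape(sample))  # &apos;&lt;br/&gt;&apos;&nbsp;
print(html.escape(sample))  # &#x27;&lt;br/&gt;&#x27;&amp;nbsp;
```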
The most difficult part was, of course, string handling: detecting whether you are inside a double-quoted ("...") string, a single-quoted ('...') string, or HTML code such as the styling information at the top of the script. To escape HTML characters or not to escape them, to replace whitespace here, there, why, am I still inside a Python or an HTML string???
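A different route, swapped in here rather than taken from the script above: Python's standard tokenize module already knows where string literals and comments start and end, so the "am I still inside a string?" bookkeeping goes away. The sketch below is a minimal illustration under that assumption; the function name highlight() and the color constants are hypothetical (the colors are borrowed from the script's styles). It wraps the output in a pre element instead of converting spaces and newlines, expects complete Python source (tokenize may raise tokenize.TokenError on incomplete input), uses the running interpreter's keyword list (so print is not a keyword under Python 3), and does not try to handle f-strings.

```python
import io
import keyword
import tokenize
from html import escape

# Hypothetical colors, borrowed from the converter's keyword/string/comment styles
KEYWORD_COLOR = "#00BFFF"
STRING_COLOR = "#FF0040"
COMMENT_COLOR = "#8A4B08"


def highlight(source):
    """Return `source` as HTML, coloring keywords, strings and comments."""
    # Absolute offset of the start of each line; tokenize reports (row, col)
    starts = [0]
    for line in io.StringIO(source).readlines():
        starts.append(starts[-1] + len(line))

    def offset(pos):
        row, col = pos
        return starts[row - 1] + col

    out = []
    done = 0  # everything before this offset has already been emitted
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        begin = max(offset(tok.start), done)
        end = offset(tok.end)
        out.append(escape(source[done:begin]))  # whitespace between tokens
        text = source[begin:end]
        if tok.type == tokenize.STRING:
            color = STRING_COLOR
        elif tok.type == tokenize.COMMENT:
            color = COMMENT_COLOR
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            color = KEYWORD_COLOR
        else:
            color = None
        if color and text:
            out.append('<span style="color: %s">%s</span>' % (color, escape(text)))
        else:
            out.append(escape(text))
        done = max(done, end)
    out.append(escape(source[done:]))
    return "<pre>%s</pre>" % "".join(out)


print(highlight("if x:  # check\n    s = 'a<b'\n"))
```

Because every character between tokens is copied through unchanged, indentation and line breaks survive inside the pre element; whether the WordPress editor keeps a pasted pre block exactly as-is is worth checking on your own setup.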

STOP THE MADNESS!
Yeah, I could say that the code is totally ninja and not monkey-proof, but in the end it helped me a lot. So just take it, share it, improve it, do whatever you like. Check the comments too; they will be helpful. I am going to keep doing the same for as long as Tzertzelos' project is running, sharing cool stuff and crazy ideas with you!
Cheers \m/


P.S. I passed my tool its own source code as the parameter. Pap's paradox is no more!
