[nycphp-talk] Domain Name Compression Tables

Gary A. Mort
Thu Sep 27 13:19:17 EDT 2012

     Got an oddball question and since it tracks back to programming for
websites, figured this would be the place to ask: 

  I was wondering if anyone here knows of some standardized lookup tables
for Domain Name Compression. I'm storing a tuple of information used to
generate a TOTP token (the TOTP key is stored and indexed by the domain name
and username it applies to). The storage limit is, for practical purposes,
64 bytes.

  If I SHA-1 the data, each item comes out at a neat 20 bytes; three items
take 60 bytes, which leaves me 4 bytes to play with.

  So I've added a format byte for each piece of data, and I'll think of
something to use the last byte for (probably a rough running counter of the
number of times the data has been accessed/used).
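To make the 64-byte budget concrete, here is a minimal C sketch of one record. It assumes the tuple is three fields (TOTP key, domain, username), each stored as one format byte plus a 20-byte payload, with the last byte kept as the spare usage counter. The struct and field names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record layout: 3 * (1 format byte + 20 payload bytes) + 1
 * spare byte = 64 bytes. An SHA-1 digest is exactly 20 bytes. */
#define FIELD_PAYLOAD_BYTES 20

struct record_field {
    uint8_t format;                        /* format code, e.g. 0x01 = SHA-1 */
    uint8_t payload[FIELD_PAYLOAD_BYTES];  /* digest or left-justified text */
};

struct totp_record {
    struct record_field key;     /* TOTP key */
    struct record_field domain;  /* domain name the key applies to */
    struct record_field user;    /* username the key applies to */
    uint8_t use_counter;         /* spare byte: rough access counter */
};

/* Every member is a byte or byte array, so mainstream compilers add no
 * padding; fail the build if one ever does. */
static_assert(sizeof(struct totp_record) == 64, "record must be 64 bytes");
```

Checking the size at compile time means the same header can be shared between the chip firmware and any host-side tooling without the layout silently drifting.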

  For formatting, I'm currently using:
    0x01: SHA-1
    0x02: raw ASCII, left-justified
    0x03: packed ASCII, left-justified
    0x04: raw UTF-8
    0x05: reserved for some way of packing UTF-8
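The post doesn't define "packed ASCII". One common reading is 7 bits per character, which fits up to 22 characters into a 20-byte payload (22 × 7 = 154 ≤ 160 bits). The C sketch below assumes that reading. `pack_ascii7` and `unpack_ascii7` are hypothetical names. Unused trailing bits are left at zero, so the field stays left-justified and decoding stops at the first all-zero character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack 7-bit ASCII text into out[0..cap), most significant bit first, and
 * zero-fill the rest of the field. Returns 0 on success, or -1 if the text
 * has a byte >= 0x80 or doesn't fit (20 bytes hold at most 22 chars). */
int pack_ascii7(const char *text, uint8_t *out, size_t cap)
{
    size_t len = strlen(text);
    if (len * 7 > cap * 8)
        return -1;
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)text[i] & 0x80)
            return -1;

    memset(out, 0, cap);  /* unused trailing bits decode as NUL */
    size_t bit = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t c = (uint8_t)text[i];
        for (int b = 6; b >= 0; b--, bit++)
            if ((c >> b) & 1)
                out[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
    }
    return 0;
}

/* Unpack a zero-padded field of cap bytes. text needs room for
 * cap * 8 / 7 + 1 chars; decoding stops at the first all-zero septet. */
void unpack_ascii7(const uint8_t *in, size_t cap, char *text)
{
    size_t count = cap * 8 / 7;
    size_t bit = 0;
    for (size_t i = 0; i < count; i++) {
        uint8_t c = 0;
        for (int b = 0; b < 7; b++, bit++)
            c = (uint8_t)((c << 1) | ((in[bit / 8] >> (7 - bit % 8)) & 1));
        text[i] = (char)c;
        if (c == 0)
            return;
    }
    text[count] = '\0';
}
```

For a 20-byte field, the decode buffer needs 23 chars: 22 characters plus the terminator.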

  With the above format, the data can be stored, when possible, in a form
where it can be retrieved and reviewed. If the data is too long, SHA-1 is
the fallback, so the data can still be USED even if it can't be read back.
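The fallback rule can be written as a small selection function. This is a sketch that assumes a 20-byte payload and the 7-bit reading of "packed ASCII". The enum names and `choose_format` are hypothetical. It only checks the high bit of each byte and does not validate UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Format codes from the list above (0x05 is still reserved). */
enum field_format {
    FMT_SHA1         = 0x01,
    FMT_RAW_ASCII    = 0x02,
    FMT_PACKED_ASCII = 0x03,
    FMT_RAW_UTF8     = 0x04
};

#define FIELD_PAYLOAD_BYTES 20

/* Pick the most readable format that still fits the payload, falling back
 * to SHA-1 (usable, but not reversible for display). */
enum field_format choose_format(const char *text)
{
    size_t len = strlen(text);
    int ascii = 1;
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)text[i] & 0x80) {
            ascii = 0;
            break;
        }

    if (ascii && len <= FIELD_PAYLOAD_BYTES)
        return FMT_RAW_ASCII;
    if (ascii && len * 7 <= FIELD_PAYLOAD_BYTES * 8)  /* up to 22 chars */
        return FMT_PACKED_ASCII;
    if (!ascii && len <= FIELD_PAYLOAD_BYTES)
        return FMT_RAW_UTF8;
    return FMT_SHA1;
}
```

The order of the checks encodes the preference: raw text first (cheapest to decode), then packed text, and SHA-1 only when nothing readable fits.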

  Now, I'd like to extend the ability to keep data in a readable format.
For domains, for example, there are only 316 top-level domains at the
moment, so some pseudocode in PHP to handle it could be:

    // $domains: the list of TLDs from the IANA file
    $myTLD = 'com';
    $tldIndex = array_search($myTLD, $domains); // 55 using the IANA file as of today
    if ($tldIndex !== false) {
        // a TLD label which is 16 bits long and starts with a 1 bit always
        // means table lookup; the index stays below 512, so it needs at
        // most 9 of the 15 remaining bits
        $tldLabel = pack('n', 0x8000 | $tldIndex); // 2 bytes, big-endian
    } else {
        // assuming ASCII TLDs here, the highest value is lowercase z,
        // decimal 122, binary 01111010, i.e. the first bit is ALWAYS 0
        $tldLabel = $myTLD;
    }

  Note: I'm not actually going through the pain of encoding/decoding this
in PHP. Writing it in PHP just helps me plot out the function before
translating it to Javascript and C. So, using DNS Name Notation for
compression, I can at least encode some longer domains in a manner that can
be reversed for user display.
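Translated to C and extended from the TLD to the whole name, the idea might look like the sketch below. It writes labels in DNS name notation: each label is prefixed by its length, which is at most 63, so the length byte's top bit is always 0. A known TLD is replaced by the 2-byte `1xxxxxxx xxxxxxxx` table reference from the PHP above. Real DNS compression pointers work differently: they set the top two bits to `11` and point at an offset inside the same message, whereas the reference here indexes a static table. The four-entry `TLDS` table and the function names are hypothetical placeholders for a table built from the IANA list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLD table; a real one would be generated from the IANA list. */
static const char *const TLDS[] = { "com", "net", "org", "uk" };
#define TLD_COUNT (sizeof TLDS / sizeof TLDS[0])

static int tld_index(const char *label, size_t len)
{
    for (size_t i = 0; i < TLD_COUNT; i++)
        if (strlen(TLDS[i]) == len && memcmp(TLDS[i], label, len) == 0)
            return (int)i;
    return -1;
}

/* Encode as length-prefixed labels. A TLD found in the table becomes a
 * 2-byte reference with the top bit set, which also ends the name; otherwise
 * the name ends with a 0x00 root label. Returns bytes written, or -1. */
int encode_domain(const char *name, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    for (;;) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);
        if (len == 0 || len > 63)
            return -1;                        /* empty or oversized label */
        int idx = dot ? -1 : tld_index(name, len);
        if (idx >= 0) {
            if (pos + 2 > cap) return -1;
            out[pos++] = (uint8_t)(0x80 | (idx >> 8));
            out[pos++] = (uint8_t)(idx & 0xFF);
            return (int)pos;
        }
        if (pos + 1 + len > cap) return -1;
        out[pos++] = (uint8_t)len;
        memcpy(out + pos, name, len);
        pos += len;
        if (!dot) break;
        name = dot + 1;
    }
    if (pos + 1 > cap) return -1;
    out[pos++] = 0;                           /* root label */
    return (int)pos;
}

/* Reverse the encoding for display. Returns 0 on success, -1 on error. */
int decode_domain(const uint8_t *in, size_t in_len, char *out, size_t cap)
{
    size_t pos = 0, o = 0;
    if (cap == 0) return -1;
    for (;;) {
        if (pos >= in_len) return -1;         /* input ended mid-name */
        uint8_t b = in[pos++];
        if (b == 0) break;                    /* root label ends the name */
        const char *label;
        size_t len;
        int is_ref = (b & 0x80) != 0;
        if (is_ref) {                         /* 2-byte TLD table reference */
            if (pos >= in_len) return -1;
            size_t idx = ((size_t)(b & 0x7F) << 8) | in[pos++];
            if (idx >= TLD_COUNT) return -1;
            label = TLDS[idx];
            len = strlen(label);
        } else {                              /* b is a label length */
            if (b > 63 || b > in_len - pos) return -1;
            label = (const char *)in + pos;
            len = b;
            pos += b;
        }
        size_t sep = o ? 1 : 0;
        if (o + sep + len + 1 > cap) return -1;  /* keep room for the NUL */
        if (sep) out[o++] = '.';
        memcpy(out + o, label, len);
        o += len;
        if (is_ref) break;                    /* a reference ends the name */
    }
    out[o] = '\0';
    return 0;
}
```

With this table, `www.example.com` encodes to 14 bytes (`03 www 07 example 80 00`). By contrast, `example.co` keeps its TLD as a plain label and ends with a `00` root byte.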

  It strikes me that this is done often enough that there should be a
standard lookup table for common domain label entries that all applications
can share.

  I realize that the dictionary could be included with the data, but for
this purpose that doesn't make a lot of sense. I can only store and update
three 64-byte chunks, whereas I have 2k of program space available on the
chip, which should give me over 1k of space for static lookup storage since
the TOTP calculation function isn't too memory intensive.
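Since the table would live in program space rather than in the 64-byte chunks, one compact option is a single `const` blob of length-prefixed entries instead of an array of string pointers. This trades each pointer and NUL terminator for one length byte. The sketch assumes that layout. `LABEL_BLOB`, `blob_find`, `blob_get` and the six sample entries are hypothetical. On typical microcontroller toolchains `const` data ends up in flash rather than RAM, but it is worth confirming in the linker map for the actual target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary: each entry is a length byte followed by the
 * label ("\3com" = 3, 'c', 'o', 'm'). The literal's implicit trailing NUL
 * doubles as a zero-length end marker. Beware an octal escape followed by
 * a digit. */
static const char LABEL_BLOB[] =
    "\3com" "\3net" "\3org" "\3www" "\4mail" "\3ftp";

/* Return the index of label in the blob, or -1 if absent. */
int blob_find(const char *label, size_t len)
{
    int idx = 0;
    for (const char *p = LABEL_BLOB; *p; p += 1 + (unsigned char)*p, idx++)
        if ((size_t)(unsigned char)*p == len && memcmp(p + 1, label, len) == 0)
            return idx;
    return -1;
}

/* Copy entry idx into out as a C string. Returns its length, or -1. */
int blob_get(int idx, char *out, size_t cap)
{
    const char *p = LABEL_BLOB;
    for (; *p && idx > 0; idx--)
        p += 1 + (unsigned char)*p;           /* skip length byte + label */
    if (idx < 0 || !*p)
        return -1;
    size_t len = (unsigned char)*p;
    if (len + 1 > cap)
        return -1;
    memcpy(out, p + 1, len);
    out[len] = '\0';
    return (int)len;
}
```

Lookups are linear scans. For a few hundred short entries, checked once per login, that is likely acceptable on a small chip.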

  For the curious, I'm implementing a very simple TOTP keytool on a
Launchpad. Chrome has a nice experimental feature to access the COM port on
the computer, so it's child's play to take the existing code snippets and
create a PHP plugin to allow TOTP logons and a supporting Chrome extension
to auto-detect and enter the stored passwords.
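For reference, the parts of TOTP that surround the HMAC are small. RFC 6238 feeds HOTP an 8-byte big-endian counter equal to the number of 30-second steps since the Unix epoch. RFC 4226's dynamic truncation then turns a 20-byte HMAC-SHA-1 result into a 6-digit code. The C sketch below shows only those two steps and assumes the HMAC-SHA-1 itself comes from elsewhere. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 6238 moving factor: 30-second time steps since the Unix epoch,
 * written as an 8-byte big-endian counter for HOTP. */
void totp_counter(uint64_t unix_time, uint8_t counter[8])
{
    uint64_t t = unix_time / 30;
    for (int i = 7; i >= 0; i--) {
        counter[i] = (uint8_t)(t & 0xFF);
        t >>= 8;
    }
}

/* RFC 4226 dynamic truncation: the low nibble of the last byte picks a
 * 4-byte window; mask the sign bit and reduce to 6 decimal digits. */
uint32_t hotp_truncate(const uint8_t mac[20])
{
    unsigned offset = mac[19] & 0x0F;         /* 0..15, so offset+3 <= 18 */
    uint32_t bin = ((uint32_t)(mac[offset] & 0x7F) << 24)
                 | ((uint32_t)mac[offset + 1] << 16)
                 | ((uint32_t)mac[offset + 2] << 8)
                 |  (uint32_t)mac[offset + 3];
    return bin % 1000000u;
}
```

With RFC 4226's first test vector (HMAC `cc93cf18…e4b0` for counter 0), `hotp_truncate` yields 755224. The memory cost of these two steps is a few bytes of stack.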

  Plus it's a lot of fun figuring out how to merge my love of Website
Engineering with my love of electronics. I'm currently working on a
Javascript and PHP library to fiddle with the IOIO, which is a hell of a lot
harder because there I have to figure out its existing protocol and actually
convert the commands into packed byte arrays inside Javascript and PHP. So
far I've done some coding at last week's JoomlaDay, and I was only able to
get some one-offs done in order to access the thing remotely and toggle the
LED.
For the curious, my actual line count of written code is rather low, since
pySerial and pyWebsockets already had most of what I needed. Now I need to
write some Javascript and PHP classes to use as an actual library that
speaks the protocol.

  Thanks for your time. -Gary
