Coding Horror
programming and human factors
by Jeff Atwood

October 06, 2005

Equipping our ASCII Armor

On one of our e-commerce web sites, we needed a unique transaction ID to pass to a third party reporting tool on the checkout pages. We already had a GUID on the page for internal use. And you know how much we love GUIDs!

22da5537-de54-459d-9b33-f40f2101143b

A GUID is 128 bits, or 16 bytes. And the third party can accept 20 bytes.

This seems workable until you realize that those 20 bytes have to be represented as a plain text string to be transmitted via HTTP in a form post or querystring.

So the question is, how do we represent a 128-bit integer in a plain text string that fits in 20 characters? In other words, we need to equip our ASCII Armor.

There's a Guid.ToByteArray() method which returns an array of 16 bytes (0-255). So we could just use ASCII values 0-255 to represent each byte, right? But wait a minute. ASCII 13 is carriage return! And good luck sending ASCII 0 (aka null) to anyone. Hmm.

We're forced to use only printable ASCII characters. Which means we'll have to use more bytes to represent the same data; it's unavoidable. Let's experiment with a few forms of ASCII armor and see how close we can get.
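To make the problem concrete, here is a minimal sketch in Python (not the VB.NET used in this post, purely for illustration) that takes the example GUID shown earlier and lists the bytes falling outside the printable range 32-126. It assumes Python's `bytes_le` gives the same byte order as Guid.ToByteArray().

```python
import uuid

# Example GUID from earlier in the post.
g = uuid.UUID("22da5537-de54-459d-9b33-f40f2101143b")

# bytes_le stores the first three fields little-endian
# (assumed here to match the layout of Guid.ToByteArray()).
raw = g.bytes_le

# Anything outside 32..126 is not a printable ASCII character.
unprintable = [b for b in raw if not 32 <= b <= 126]

print(len(raw), "bytes,", len(unprintable), "of them unprintable")
print(" ".join(f"{b:02X}" for b in unprintable))
```

For this particular GUID, half of the 16 bytes cannot be sent as plain text, including control characters such as 01 and 0F.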

A Hex encoded GUID..

' Format each of the 16 bytes as two uppercase hex digits
Dim g As Guid = Guid.NewGuid()
Dim sb As New Text.StringBuilder()
For Each b As Byte In g.ToByteArray()
  sb.Append(b.ToString("X2"))
Next
Console.WriteLine(sb.ToString())

.. uses ASCII values 0-9, A-F and results in a 32 byte string:

EBB7EF914C29A6459A34EDCB61EB8C8F
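For comparison, the same hex encoding can be sketched in Python. This is an illustration only, applied to the fixed example GUID from the top of the post rather than a new random one, and again assuming `bytes_le` mirrors Guid.ToByteArray().

```python
import uuid

# Fixed example GUID so the result is reproducible.
g = uuid.UUID("22da5537-de54-459d-9b33-f40f2101143b")
raw = g.bytes_le  # assumed equivalent of Guid.ToByteArray()

# Two hex digits per byte: 16 bytes become 32 characters.
hex_text = raw.hex().upper()

print(hex_text)
print(len(hex_text), "characters")
```

Every byte costs two characters, so hex doubles the size no matter what the data looks like.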

A UUEncoded GUID..

Dim u As New UUEncode
Dim g As Guid = Guid.NewGuid
Dim s As String
s = u.Encode(g.ToByteArray)
Console.WriteLine(s)

.. uses ASCII values 32-95 (decimal), with ` (96) standing in for the space character, and results in a 25 byte string:

0@-_;,9X-@D2BT\V!0V$/TP``
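A rough Python equivalent, using the standard library's `binascii.b2a_uu` instead of the UUEncode class above (whose source is not shown here), suggests where the 25 comes from: one length character plus four characters for every three bytes, with the input padded up to 18 bytes. The `backtick=True` option needs Python 3.7 or later.

```python
import binascii
import uuid

g = uuid.UUID("22da5537-de54-459d-9b33-f40f2101143b")
raw = g.bytes_le  # assumed equivalent of Guid.ToByteArray()

# b2a_uu emits one uuencoded line: length char, data, newline.
# backtick=True writes zero as '`' instead of a space.
line = binascii.b2a_uu(raw, backtick=True).rstrip(b"\n")

print(line.decode("ascii"))
print(len(line), "characters")
```

The leading "0" is the length character (32 + 16), which is why the sample output above starts with it as well.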

A Base64 encoded GUID..

Dim g As Guid = Guid.NewGuid
Dim s As String
s = Convert.ToBase64String(g.ToByteArray)
Console.WriteLine(s)

.. uses ASCII values a-z, A-Z, 0-9, plus + and /, and results in a 22 byte* string:

7v26IM9P2kmVepd7ZxuXyQ==
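Here is a small Python sketch of the same idea, showing that the encoded form is 24 characters with its trailing "==" and 22 without it, and that the "==" can be put back before decoding. The variable names are made up for this example.

```python
import base64
import uuid

g = uuid.UUID("22da5537-de54-459d-9b33-f40f2101143b")
raw = g.bytes_le  # assumed equivalent of Guid.ToByteArray()

padded = base64.b64encode(raw).decode("ascii")  # 24 chars, ends in "=="
trimmed = padded.rstrip("=")                    # 22 chars of actual data

# Restore the "=" characters so the length is a multiple of 4 again.
restored = trimmed + "=" * (-len(trimmed) % 4)
decoded = base64.b64decode(restored)

print(len(padded), len(trimmed), decoded == raw)
```

If the string is going into a querystring, note that + and / have special meaning there; Python's `base64.urlsafe_b64encode` swaps them for - and _.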

An ASCII85 encoded GUID...

Dim a As New Ascii85
Dim g As Guid = Guid.NewGuid
Dim s As String
s = a.Encode(g.ToByteArray)
Console.WriteLine(s)

.. uses ASCII values 33-117 (decimal) and results in a 20 byte string:

[Rb*hlkkXVW+q4s(YSF0
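Python happens to ship an Ascii85 encoder in its standard library, so the result can be sketched without a custom class. This uses `base64.a85encode`, not the Ascii85 class from this post, and the two are only assumed to agree on the basic scheme of five characters per four bytes.

```python
import base64
import uuid

g = uuid.UUID("22da5537-de54-459d-9b33-f40f2101143b")
raw = g.bytes_le  # assumed equivalent of Guid.ToByteArray()

# Four bytes become five characters, so 16 bytes become 20.
# (A group of four zero bytes would shrink to a single 'z'.)
encoded = base64.a85encode(raw)
decoded = base64.a85decode(encoded)

print(encoded.decode("ascii"))
print(len(encoded), "characters, round trip ok:", decoded == raw)
```

Passing `adobe=True` would wrap the output in the <~ and ~> markers, which costs four more characters and would blow the 20 character budget.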

So it is possible to fit a complete GUID in 20 printable ASCII characters using the latest and greatest ASCII Armor. But just barely!
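The "just barely" can be checked with a little arithmetic: an alphabet of N characters needs the smallest length n where N^n is at least 2^128. The Python sketch below computes that lower bound with exact integer math. The function name `min_chars` is invented for this example, and the 95 row is a hypothetical encoding that uses every printable ASCII character.

```python
def min_chars(alphabet_size: int, bits: int = 128) -> int:
    """Smallest n such that alphabet_size ** n >= 2 ** bits."""
    n = 0
    capacity = 1
    while capacity < 2 ** bits:
        capacity *= alphabet_size
        n += 1
    return n

alphabets = [("hex", 16), ("Base64", 64), ("ASCII85", 85),
             ("all 95 printable", 95)]

for name, size in alphabets:
    print(f"{name:>16}: {min_chars(size)} characters")
```

Hex, unpadded Base64, and ASCII85 all land exactly on their lower bounds of 32, 22, and 20. Even a full 95 character alphabet would still need 20 characters for 128 bits, so there is nothing left to squeeze out.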

In the process of writing this entry, I couldn't find any C# or VB.NET implementations of ASCII85, so I wrote one. I'll have source code up for that shortly.

* The trailing "==" in Base64 is padding rather than data, so it should not count towards the character total.

Posted by Jeff Atwood

Comments

Nice work! I never even HEARD of ASCII85. Learn something new every day.

"Base64 ought to be enough for anybody."

Haacked on October 7, 2005 04:24 PM

"ASCII values 0-255" eh? Repeat after me: ASCII is a 7-bit encoding. ASCII is a 7-bit encoding.

Indeed, the ASCII spec only defines character values 32-126 - these 95 values are the only valid ASCII values. Anything else isn't ASCII.

However, given 20 ASCII characters, that is still about 131.397 bits of information. So you've still got over 3 bits to spare after your GUID! Just enough space to store a fairly small number.

Ian Griffiths on October 8, 2005 05:48 AM

Nice touch updating the Wikipedia entry for ASCII85. Of course, this would all be easier if we'd all use base 85, but no one will listen to me.

Jon Galloway on October 9, 2005 02:36 AM

Ian, thanks for the clarification as always. Just when I thought I was a computer "scientist"..

Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be some good reason that Adobe chose to use just 85 of the 95 possible printable ASCII characters, but I can't think of what that could be right now..

Jeff Atwood on October 9, 2005 02:58 AM

> Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be
> some good reason that Adobe chose to use just 85 of the 95 possible
> printable ASCII characters, but I can't think of what that could be
> right now..

Seems to me that the need for certain special case characters would preclude them from using the full 95 characters. I know from reading the wiki entry you linked that at least <, ~, >, and z are generally off limits.

Am I off base?

P.S. This is my first comment here. I love your blog though - truly a pleasure to read. It has been a staple of my Google homepage for quite some time :-)

Matt Mousseau on August 22, 2007 03:22 PM

Aww geez...your comment system stripped the less than and greater than symbols out.

"that at least , and z are generally off limits." is supposed to read "that at least (less than), (greater than) and z are generally off limits."

Can I use html encoded versions? Test: > and <

Matt Mousseau on August 22, 2007 03:26 PM

We use Base64 for processing a PDF file. It strips out every carriage return ("0D") that is not followed by a linefeed ("0A").

Can anyone tell me what can be done to fix this? The end result is that the PDF is corrupted and will not open.
I can see that the carriage returns are missing by viewing the file in a binary viewer. I also find that in some places where there is a linefeed, carriage return, linefeed sequence, an extra carriage return is inserted at the beginning.

Please help

vixmus on July 21, 2008 12:21 AM

Base64 may be less efficient in terms of space, but it's far more efficient in terms of speed. Division is a slow operation (it becomes noticeable when done with frequency). Obviously when transmitting data over the network, this could be relevant... unless you remember if you have such a restriction, it means either you're using XML (which is both slow and verbose) or you're dealing with an old, old legacy system (which is slow and you have no say on the verboseness).

I work with handheld development... and both space and speed are frequent issues. However, in such a situation, I'd stick to base64 or maybe even use hex encoding.

Ekevoo on October 5, 2008 09:49 PM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.