On one of our e-commerce web sites, we needed a unique transaction ID to pass to a third party reporting tool on the checkout pages. We already had a GUID on the page for internal use. And you know how much we love GUIDs!
22da5537-de54-459d-9b33-f40f2101143b
A GUID is 128 bits, or 16 bytes. And the third party can accept 20 bytes.
This seems workable until you realize that those 20 bytes have to be represented as a plain text string to be transmitted via HTTP in a form post or querystring.
So the question is, how do we represent a 128-bit integer in a plain text string that fits in 20 characters? In other words, we need to equip our ASCII Armor.
There's a Guid.ToByteArray() method which returns an array of 16 bytes (0-255). So we could just use ASCII values 0-255 to represent each byte, right? But wait a minute. ASCII 13 is carriage return! And good luck sending ASCII 0 (aka null) to anyone. Hmm.
We're forced to use only printable ASCII characters. Which means we'll have to use more bytes to represent the same data; it's unavoidable. Let's experiment with a few forms of ASCII armor and see how close we can get.
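Before experimenting, it helps to know how short any armor could possibly be. The sketch below is not from the original code; it assumes every character is drawn from an alphabet of a fixed size and counts the fewest characters needed to cover all 2^128 GUID values. The names `ArmorLengthSketch` and `MinChars` are made up for illustration, and it needs a reference to System.Numerics (.NET 4 or later).

```vbnet
Imports System
Imports System.Numerics

Module ArmorLengthSketch
    ' Smallest number of characters, drawn from an alphabet of the given size,
    ' that can tell apart every possible value of the given bit width.
    Function MinChars(alphabetSize As Integer, bits As Integer) As Integer
        Dim target As BigInteger = BigInteger.Pow(New BigInteger(2), bits)
        Dim capacity As BigInteger = BigInteger.One
        Dim count As Integer = 0
        While capacity < target
            ' Each extra character multiplies the number of distinct strings.
            capacity = BigInteger.Multiply(capacity, New BigInteger(alphabetSize))
            count += 1
        End While
        Return count
    End Function

    Sub Main()
        For Each size As Integer In New Integer() {16, 64, 85, 95}
            Console.WriteLine("{0} symbols: {1} characters", size, MinChars(size, 128))
        Next
    End Sub
End Module
```

Under those assumptions this prints 32, 22, 20 and 20 characters for alphabets of 16, 64, 85 and 95 symbols, so 20 characters is the floor for a GUID even if all 95 printable ASCII characters are used.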
A Hex encoded GUID..
Dim g As Guid = Guid.NewGuid
Dim sb As New Text.StringBuilder
For Each b As Byte In g.ToByteArray
sb.Append(String.Format("{0:X2}", b))
Next
Console.WriteLine(sb.ToString)
.. uses ASCII values 0-9, A-F and results in a 32 byte string:
EBB7EF914C29A6459A34EDCB61EB8C8F
A UUEncoded GUID..
Dim u As New UUEncode
Dim g As Guid = Guid.NewGuid
Dim s As String
s = u.Encode(g.ToByteArray)
Console.WriteLine(s)
.. uses ASCII values 32-95 (decimal) and results in a 25 byte string:
0@-_;,9X-@D2BT\V!0V$/TP``
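The UUEncode class used above is not part of the .NET Framework. As a rough idea of what such an encoder might do, here is a minimal sketch of a single uuencoded line. It is an assumption, not the class used for this post: the names `UuencodeSketch`, `UuChar` and `EncodeLine` are invented, it leaves out the begin/end framing, and it writes a zero value as a backtick (ASCII 96) instead of a space, which many encoders do.

```vbnet
Imports System
Imports System.Text

Module UuencodeSketch
    ' Maps a 6-bit value to a printable character. Zero becomes a backtick
    ' (ASCII 96) instead of a space, a common variation.
    Function UuChar(sixBits As Integer) As Char
        If sixBits = 0 Then Return "`"c
        Return ChrW(sixBits + 32)
    End Function

    ' Encodes up to 45 bytes as one uuencoded line, without begin/end framing.
    Function EncodeLine(data As Byte()) As String
        If data.Length > 45 Then
            Throw New ArgumentException("A single line holds at most 45 bytes.")
        End If

        Dim sb As New StringBuilder()
        sb.Append(UuChar(data.Length))   ' first character carries the byte count

        For i As Integer = 0 To data.Length - 1 Step 3
            ' Collect three bytes into 24 bits; bytes past the end count as zero.
            Dim triple As Integer = 0
            For k As Integer = 0 To 2
                triple <<= 8
                If i + k < data.Length Then triple = triple Or data(i + k)
            Next

            ' Emit the 24 bits as four 6-bit characters.
            For shift As Integer = 18 To 0 Step -6
                sb.Append(UuChar((triple >> shift) And 63))
            Next
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim s As String = EncodeLine(Guid.NewGuid().ToByteArray())
        Console.WriteLine(s)
        Console.WriteLine(s.Length)
    End Sub
End Module
```

For a 16 byte GUID that is one length character plus six groups of four characters, which accounts for the 25 byte total.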
A Base64 encoded GUID..
Dim g As Guid = Guid.NewGuid
Dim s As String
s = Convert.ToBase64String(g.ToByteArray)
Console.WriteLine(s)
.. uses ASCII values a-z, A-Z, 0-9, + and / and results in a 22 byte* string:
7v26IM9P2kmVepd7ZxuXyQ==
An ASCII85 encoded GUID...
Dim a As New Ascii85
Dim g As Guid = Guid.NewGuid
Dim s As String
s = a.Encode(g.ToByteArray)
Console.WriteLine(s)
.. uses ASCII values 33-117 (decimal) and results in a 20 byte string:
[Rb*hlkkXVW+q4s(YSF0
So it is possible to fit a complete GUID in 20 printable ASCII characters using the latest and greatest ASCII Armor. But just barely!
In the process of writing this entry, I couldn't find any C# or VB.NET implementations of ASCII85, so I wrote one. I'll have source code up for that shortly.
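Until then, here is a minimal sketch of the core idea. It is not the implementation mentioned above, and the names `Ascii85Sketch` and `Encode` are invented for illustration. It assumes the input length is a multiple of 4 bytes (true for a GUID), and it skips the <~ and ~> delimiters, the "z" shortcut for all-zero groups, and partial final groups. The trick is that four bytes hold 256^4 = 4,294,967,296 values, which just fits in five base-85 digits because 85^5 = 4,437,053,125.

```vbnet
Imports System
Imports System.Text

Module Ascii85Sketch
    ' Encodes data whose length is a multiple of 4 bytes (a GUID is 16).
    ' No delimiters, no "z" shortcut, no partial final group.
    Function Encode(data As Byte()) As String
        If data.Length Mod 4 <> 0 Then
            Throw New ArgumentException("This sketch needs a multiple of 4 bytes.")
        End If

        Dim sb As New StringBuilder()
        For i As Integer = 0 To data.Length - 1 Step 4
            ' Pack four bytes into one 32-bit number, most significant byte first.
            Dim value As UInteger = 0
            For k As Integer = 0 To 3
                value = (value << 8) Or data(i + k)
            Next

            ' Write it as five base-85 digits, offset by 33 so "!" is digit zero.
            Dim digits(4) As Char
            For j As Integer = 4 To 0 Step -1
                digits(j) = ChrW(CInt(value Mod 85UI) + 33)
                value \= 85UI
            Next
            sb.Append(digits)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim s As String = Encode(Guid.NewGuid().ToByteArray())
        Console.WriteLine(s)
        Console.WriteLine(s.Length)   ' four groups of five characters = 20
    End Sub
End Module
```

Because this version never emits "z", a GUID always comes out as exactly 20 characters.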
* The trailing "==" in Base64 is padding, not data. It can be dropped when the length is known in advance, so it does not count towards the character total.
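To make the footnote concrete, here is a small sketch of dropping the padding and putting it back before decoding. The module and function names are made up, and it assumes the input is always a 16 byte GUID, so the encoded form always ends in exactly two "=" characters.

```vbnet
Imports System

Module Base64GuidSketch
    ' 16 bytes always encode to 22 data characters followed by "==".
    Function Encode(g As Guid) As String
        Return Convert.ToBase64String(g.ToByteArray()).TrimEnd("="c)
    End Function

    ' Restore the two padding characters before handing the text to the decoder.
    Function Decode(s As String) As Guid
        Return New Guid(Convert.FromBase64String(s & "=="))
    End Function

    Sub Main()
        Dim g As Guid = Guid.NewGuid()
        Dim s As String = Encode(g)
        Console.WriteLine(s)
        Console.WriteLine(s.Length)            ' 22
        Console.WriteLine(Decode(s).Equals(g)) ' True
    End Sub
End Module
```

One caveat: the result can contain + and /, which need URL encoding before they go into a querystring.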
Nice work! I never even HEARD of ASCII85. Learn something new every day.
"Base64 ought to be enough for anybody."
Haacked on October 7, 2005 5:24 AM
"ASCII values 0-255" eh? Repeat after me: ASCII is a 7-bit encoding. ASCII is a 7-bit encoding.
Indeed, the ASCII spec only defines character values 32-126 - these 95 values are the only valid ASCII values. Anything else isn't ASCII.
However, given 20 ASCII characters, that is still about 131.397 bits of information. So you've still got over 3 bits to spare after your GUID! Just enough space to store a fairly small number.
Ian Griffiths on October 8, 2005 6:48 AM
Ian, thanks for the clarification as always. Just when I thought I was a computer "scientist"..
Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be some good reason that Adobe chose to use just 85 of the 95 possible printable ASCII characters, but I can't think of what that could be right now..
Jeff Atwood on October 9, 2005 3:58 AM
"Jon, I'm waiting for the inevitable BASE95 or ASCII95. There must be some good reason that Adobe chose to use just 85 of the 95 possible printable ASCII characters, but I can't think of what that could be right now.."
Seems to me that the need for certain special case characters would preclude them from using the full 95 characters. I know from reading the wiki entry you linked that at least , ~, , and z are generally off limits.
Am I off base?
P.S. This is my first comment here, I love your blog though - truly a pleasure to read. It has been a staple of my google homepage for quite some time :-)
Matt Mousseau on August 22, 2007 4:22 AM
Aww geez...your comment system stripped the less than and greater than symbols out.
"that at least , and z are generally off limits." is supposed to read "that at least (less than), (greater than) and z are generally off limits."
Can I use html encoded versions? Test: and
Matt Mousseau on August 22, 2007 4:26 AM
We use base64 for processing the PDF file. It strips off all the carriage returns (0D) that are not followed by a linefeed (0A).
Please can anyone tell me what can be done to fix this issue, as the end result is that the PDF is corrupted and will not open.
I can figure out that the carriage returns are missing by viewing it with a binary viewer and also find that in some places where there is linefeed carriage return linefeed...an extra carriage return is inserted in the beginning.
Please help
vixmus on July 21, 2008 1:21 PM
Base64 may be less efficient in terms of space, but it's far more efficient in terms of speed. Division is a slow operation (it becomes noticeable when done with frequency). Obviously when transmitting data over the network, this could be relevant... unless you remember if you have such a restriction, it means either you're using XML (which is both slow and verbose) or you're dealing with an old, old legacy system (which is slow and you have no say on the verboseness).
I work with handheld development... and both space and speed are frequent issues. However, in such a situation, I'd stick to base64 or maybe even use hex encoding.
Ekevoo on October 5, 2008 10:49 AM
A very simple solution to this problem is to use a base 32 rather than base 16 representation to cut the length down from 32 characters to 16 characters. Hexadecimal encoding uses chars [0-9A-F] for an alphabet size of 16 to represent 4 byte runs of a binary stream.
You can represent 8 byte runs of the same binary stream using chars [A-Za-z]. This will cut the ascii representation in half. It would be fairly straight forward to implement such an encoding, simply use the same logic you would to convert to hexadecimal but substitute the larger alphabet for the representation and cut the stream into 8 byte rather than 4 byte chunks.
Steve Owens on May 28, 2009 9:24 AM
Nice touch updating the wikipedia entry for ascii85. Of course, this would all be easier if we'd all use base 85, but no one will listen to me.
Jon Galloway on February 6, 2010 9:46 PM
The reason there's not an ascii95 is because 85^5 (5 bytes of encoded string) is only a little bigger than 256^4 (4 bytes of unencoded string), which makes it a particularly effective encoding for blocks of such a small size. In fact, you have to get up to 21 bytes before ascii95 would be an improvement on ascii85 (21 unencoded bytes -> 26 ascii95, but 27 ascii85). By that point, the numbers are rather too large to deal with reasonably - we're barely at 64-bit computers, let alone 168 bits!
Since ascii95 wouldn't really be a feasible improvement on ascii85, we may as well just use 85 and then have those 10 characters for whatever we want, like 'z' and '~'.
(Plus we don't want to use space, so it would really only be ascii94. That doesn't change the math, though.)
I'm not sure what that Base85 is based off of, but it happens that there are 85 chars that can be used as un-escaped content in XML.
Bcsd on August 6, 2010 12:58 PM
Because with base85 five ASCII characters represent exactly four bytes of binary data (32 bits).
alexanderpas on August 31, 2010 3:13 AM