Coding Horror
programming and human factors
by Jeff Atwood

July 31, 2009

The Paper Data Storage Option

As programmers, we regularly work with text encodings. But there's another sort of encoding at work here, one we process so often and so rapidly that it's invisible to us, and we forget about it. I'm talking about visual encoding -- translating the visual glyphs of the alphabet you're reading right now. The alphabet is no different than any other optical machine readable input, except the machines are us.

But how efficient is the alphabet at encoding information on a page? Consider some of the alternatives -- different visual representations of data you could print on a page, or display on a monitor:

5081 punch card
up to 80 alphanumeric characters

Maxicode
up to 93 alphanumeric characters

Data Matrix
up to 2,335 alphanumeric characters

QR Code
up to 4,296 alphanumeric characters

Aztec Code
up to 3,067 alphanumeric characters

High Capacity Color Barcode
varies by number of colors and density; up to 3,500 characters per square inch

Printed page
about 10,000 characters per page

Paper the way we typically use it is criminally inefficient. It has a ton of wasted data storage space. That's where programs like PaperBack come in:

PaperBack is a free application that allows you to back up your precious files on ordinary paper in the form of oversized bitmaps. If you have a good laser printer with the 600 dpi resolution, you can save up to 500,000 bytes of uncompressed data on a single sheet.

You may ask - why? Why, for heaven's sake, do I need to make paper backups, if there are so many alternative possibilities like CD-R's, DVD±R's, memory sticks, flash cards, hard disks, streaming tapes, ZIP drives, network storage, magneto-optical cartridges, and even 8-inch double-sided floppy disks formatted for DEC PDP-11? The answer is simple: you don't. However, by looking on CD or magnetic tape, you are not able to tell whether your data is readable or not. You must insert your medium into the drive, if you even have one, and try to read it.

Paper is different. Do you remember punched cards? For years, cards were the main storage medium for the source code. I agree that 100K+ programs were... inconvenient, but hey, only real programmers dared to write applications that large. And used cards were good as notepads, too. Punched tapes were also common. And even the most weird encodings, like CDC or EBCDIC, were readable by humans (I mean, by real programmers).

Of course, bitmaps produced by PaperBack are also human-readable (with the small help of any decent microscope). I'm joking. What you need is a scanner attached to your PC.

PaperBack, like many of the other visual encodings listed above, includes provisions for:

  • compression -- to increase the amount of data stored in a given area.
  • redundancy -- in case part of the image becomes damaged or is otherwise unreadable.
  • encryption -- to prevent the image from being readable by anyone except the intended recipient.
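Two of these provisions can be made concrete with a minimal Python sketch of compression plus block-level redundancy. This is not PaperBack's actual format. The block size, the group size, and the encode/decode names are hypothetical, and encryption is left out. The sketch compresses with zlib, cuts the result into fixed-size blocks, and adds one XOR parity block per group. That parity block can rebuild any single damaged block in its group. Real barcode formats generally use stronger error-correcting codes such as Reed-Solomon.

```python
import zlib
from functools import reduce

BLOCK_SIZE = 64  # hypothetical: bytes per printed block
GROUP_SIZE = 4   # hypothetical: data blocks protected by each parity block


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data):
    """Compress, cut into fixed-size blocks, and add one XOR parity block per group."""
    packed = zlib.compress(data, 9)
    # Zero-pad so every block has the same length; decode() trims the padding.
    padded = packed + b"\x00" * (-len(packed) % BLOCK_SIZE)
    blocks = [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]
    groups = []
    for start in range(0, len(blocks), GROUP_SIZE):
        group = blocks[start:start + GROUP_SIZE]
        groups.append((group, xor_blocks(group)))
    return len(packed), groups


def decode(packed_len, groups):
    """Rebuild at most one missing (None) block per group, then decompress."""
    blocks = []
    for group, parity in groups:
        missing = [i for i, block in enumerate(group) if block is None]
        if len(missing) > 1:
            raise ValueError("two or more damaged blocks in one group; cannot recover")
        if missing:
            survivors = [block for block in group if block is not None]
            group = list(group)
            # XOR of the parity with every surviving block leaves exactly the lost block.
            group[missing[0]] = xor_blocks(survivors + [parity])
        blocks.extend(group)
    return zlib.decompress(b"".join(blocks)[:packed_len])
```

To simulate a smudge, set one entry of groups[0][0] to None. decode(packed_len, groups) still returns the original bytes. If you knock out a second block in the same group, decode raises ValueError. Smaller groups survive more damage at the cost of more parity, which means more paper.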

Sure, it's still paper, but the digital "alphabet" you're putting on that paper is a far more sophisticated way to store the underlying data than traditional ASCII text.

This may all seem a bit fanciful, since the alphabet is about all we poor human machines can reasonably deal with, at least without the assistance of a computer and scanner. But there is at least one legitimate use for this stuff: the trusted paper key. There's even software for this purpose, PaperKey:

The goal with paper is not secure storage. There are countless ways to store something securely. A paper backup also isn't a replacement for the usual machine readable (tape, CD-R, DVD-R, etc) backups, but rather as an if-all-else-fails method of restoring a key. Most of the storage media in use today do not have particularly good long-term (measured in years to decades) retention of data. If and when the CD-R and/or tape cassette and/or USB key and/or hard drive the secret key is stored on becomes unusable, the paper copy can be used to restore the secret key.

For paper, on the other hand, to claim it will last for 100 years is not even vaguely impressive. High-quality paper with good ink regularly lasts many hundreds of years even under less than optimal conditions.

Another bonus is that ink on paper is readable by humans. Not all backup methods will be readable 50 years later, so even if you have the backup, you can't easily buy a drive to read it. I doubt this will happen anytime soon with CD-R as there are just so many of them out there, but the storage industry is littered with old now-dead ways of storing data.
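PaperKey's idea is to print only the essential secret bytes, as text a person could retype if scanning fails. A Python sketch of that general idea follows. It uses a hypothetical layout: numbered lines of hex, each ending with a checksum of that line, so a typo is caught on the line where it happened. PaperKey's real output format differs in its details. The function names and the use of zlib's CRC-32 are assumptions for illustration.

```python
import zlib


def to_paper_lines(secret, width=16):
    """Render bytes as numbered hex lines, each ending with a CRC-32 of that line's bytes."""
    lines = []
    for n, start in enumerate(range(0, len(secret), width), 1):
        chunk = secret[start:start + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{n:3d}: {hex_part} {zlib.crc32(chunk):08X}")
    return lines


def from_paper_lines(lines):
    """Parse retyped lines, checking each line's checksum so typos are caught early."""
    out = bytearray()
    for line in lines:
        label, _, rest = line.partition(":")
        fields = rest.split()
        chunk = bytes.fromhex("".join(fields[:-1]))
        if zlib.crc32(chunk) != int(fields[-1], 16):
            raise ValueError(f"checksum mismatch on line {label.strip()}; check for a typo")
        out += chunk
    return bytes(out)
```

A per-line checksum costs a few extra characters per line. In exchange, a person restoring from paper learns immediately which line to re-read, instead of discovering at the end that the whole key is wrong.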

Computer encoding formats and data storage schemes come and go. This is why so much archival material survives best in the simplest possible formats, like unadorned ASCII. Depending on what your goals are, a combination of simple digital encoding and the good old boring, reliable, really really old school technology of physical paper can still make sense.

Posted by Jeff Atwood
Comments

Paper backups... there's an app for that.

Evan Meagher on July 31, 2009 2:48 PM

Frist!

Frist! on July 31, 2009 2:51 PM

I wish someone would make a whole disk (like disaster recovery) version of paperbak...

I would love to say, see that ream of paper... yeah, that's my backup.

Or, in actuality, having a single PDF that contained everything would be supremely awesome.

Hurricane on July 31, 2009 2:54 PM

The reason you don't use paper is because the acid in the paper makes everything disappear (relatively) quickly.

w on July 31, 2009 2:55 PM

There's a coding error on the IBM JCL punched card: there should be no space between "SLINK," and "TESTPGM=".

Erica on July 31, 2009 2:55 PM

That's pretty cool information. Thanks Jeff! I never knew you could store so much on one of those image maps. Crazy!

Joe Mo on July 31, 2009 2:56 PM

This assumes either the software or the computer system is usable in a hundred years...

David Roberts on July 31, 2009 3:11 PM

I've been wondering how efficient paper-based backup could be. That's pretty impressive. ~Half a megabyte of uncompressed data is nothing to sneeze at. You could squeeze a whole lot of 7-Zip-compressed plain-text documents into that space; easily an entire book on a single page. With formatting, things get larger, but c'mon. An entire book on a single page.

The problem of course is making sure it's still readable in a couple hundred years, but if you use decent paper and slap on a few cover pages or so documenting your algorithms I don't see why it would be an issue.

James on July 31, 2009 3:37 PM

Paper that lasts a century is easy enough to make or buy, but it's definitely not the common photocopy paper you use by the ream every day. That stuff is acidic and the ink is also fragile, so it's a race between the paper eating the ink and the ink decaying on its own. Give it 20 years and you won't be able to move it without it falling apart either.

So as long as you're using a high quality acid-free archive quality paper, your laser printer is in good condition (especially the seal roller) and you store the paper in a way that minimises exposure to air and moisture (making a thick stack is good), you should be fine for 100 years.

Personally, I'd be printing the coding scheme in human-readable text on every 100th page, and making very sure that each page is independently readable. So backing up your huge pr0n collection... not so much.

Moz on July 31, 2009 3:39 PM

The post reminds me of the opening chapters of Code by Charles Petzold (http://www.amazon.co.uk/Code-Language-DV-Undefined-Charles-Petzold/dp/0735611319/ref=sr_1_1?ie=UTF8&s=books&qid=1249079981&sr=8-1)

Chris on July 31, 2009 3:40 PM

Well, the intro was pretty cool. Human OCR is pretty awesome.

Storing data on paper... All kinds of problems: 1. Expensive. 2. Volatile (Paper that lasts needs to be bound, which adds more to #1). 3. Storage Space (as in actual room for the paper). 4. Security Problems.

Also, what terrible storage space! 500K per piece of paper? Let's see, to store 1GB I would need... 2,000 sheets of paper. To back up my server, I need 2 MILLION SHEETS OF PAPER (at least).

Although, that may be fun for some spy type stuff.

Practicality on July 31, 2009 3:50 PM
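The sheet counts in the comment above check out. Here is a minimal Python sketch. It assumes PaperBack's best-case figure of 500,000 bytes per sheet, decimal units (1 GB = 10^9 bytes), and a hypothetical 1 TB server.

```python
BYTES_PER_SHEET = 500_000  # PaperBack's stated figure for a 600 dpi laser printer


def sheets_needed(total_bytes, bytes_per_sheet=BYTES_PER_SHEET):
    """Sheets required, rounding a partly filled last sheet up to a whole one."""
    return (total_bytes + bytes_per_sheet - 1) // bytes_per_sheet


for label, size in [("1 GB", 10**9), ("4 GB DVD", 4 * 10**9), ("1 TB", 10**12)]:
    print(f"{label}: {sheets_needed(size):,} sheets")
```

This gives 2,000 sheets for 1 GB, 8,000 for a 4 GB DVD, and 2,000,000 for 1 TB. Treat these as lower bounds. A sheet printed with more redundancy, or on a lower-resolution printer, holds less.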

Actually, the longevity is likely better than you'd think. My company just went through 27 years of project archive boxes to reduce their volume. The oldest boxes had numerous program listings printed by dot matrix on "ordinary" grade line printer paper, and had been stored in "ordinary" grade banker's boxes. There was some fading and some yellowing, but by and large everything that came out of the oldest boxes was about as legible as it had been when stored.

The biggest issue is longevity of the file format itself. It's probably best to store things in lowest-common-denominator formats. Plain text where possible, for instance. Otherwise, formats that are widely documented such as JPEG and PDF are probably safe enough. For a less well known format (such as that collection of digital negatives in Nikon raw files) it might be a good idea to include a source kit for a program that can read the file.

RBerteig on July 31, 2009 4:00 PM

Unless I'm being daft, wouldn't a page hold less than half a MB?

Gr33n3gg on July 31, 2009 4:10 PM

"Or in actuallity having a single pdf that contained everything would be supremely awesome"

no comment.

Acrobat on July 31, 2009 4:10 PM

Oops. The rats ate my backup.

David A. Lessnau on July 31, 2009 4:20 PM

Treekiller!!

Peter on July 31, 2009 4:41 PM

This reminds me of the Rosetta Project run by the Long Now Foundation.

When it came time to make a long lasting archive of all of their data about human languages, they ended up going with essentially high precision bronze-age technology. Words engraved in metal.

They see your PaperBack and raise you StoneBack.

More on the Rosetta Project here if you are into that kind of thing.
http://rosettaproject.org/disk/concept/

Tim Maly on July 31, 2009 4:46 PM

Twibright Labs, which developed the RONJA (Reasonable Optical Near Joint Access) open-source system for open-space optical networking, has created a similar open-source system called OPTAR for storing data on paper. The system stores 200KB on an A4 page, with error correction. They provide the following reasons for the technology:
- Long-life storage. They point out that microfiche panels (which could be used to store quite a bit of data via an imagesetter) have an estimated life of 500 years in air conditioning, much longer than common data storage formats.
- Legal requirements. The law requires that certain kinds of records be kept on paper (for example, notary journals and financial reports per Sarbanes-Oxley). OPTAR satisfies those requirements while storing data in a directly machine-readable format.
- Inclusion of digital information on printed materials. The example of ringtones printed on paper is given, where a cellular phone camera could read the data.

Archival life and legal compliance seem the most compelling reasons, particularly the applications in satisfying corporate compliance law without sacrificing computer readability of paper records that can serve as backup.

They also mention usage in IP-over-avian-carrier implementations, per RFC1149 and RFC2549.

J. B. Crawford on July 31, 2009 5:08 PM

I like big butts and I cannot lie!

Wade-0 on July 31, 2009 5:22 PM

Wonderful! In 100 years your great-grandkids can find your paper backups of your never-published memoirs in the attic! And if they're very lucky they'll remember something about you using this PaperBack program to generate them. Now all they have to do is load the program on their quantum computers!

Let's see, there's an old CD of that program here, but its data layer degraded before their parents were born. Oh good, there's also a paper backup of the program.... in PaperBack format.

Hm. Maybe your PaperBacked memoirs can be hung on the wall as antique art instead.....

rfunk on July 31, 2009 5:51 PM

"Oh good, there's also a paper backup of the program.... in PaperBack format."

BWAAAAAAAAAAAAAAHAHAHAHAHAHAHAHAH. I've been there xD.

Anonymous on July 31, 2009 6:18 PM

"Another bonus is that ink on paper is readable by humans. Not all backup methods will be readable 50 years later, so even if you have the backup, you can't easily buy a drive to read it. I doubt this will happen anytime soon with CD-R as there are just so many of them out there, but the storage industry is littered with old now-dead ways of storing data."

Yet in 50 years no one will know how to read the 50-year-old so-called compressed paper alphabet, and we will be back to square one.

JF on July 31, 2009 7:02 PM

That's actually really interesting, I'm going to look for a Linux compatible paper storage solution.

Zoasterboy on July 31, 2009 7:34 PM

Google realized this and has launched Gmail Paper.

http://mail.google.com/mail/help/paper/more.html

I just signed up. Can't wait for that first ream to come in!

Paul on July 31, 2009 8:03 PM

I'm guessing you could probably dump an ASCII version of the encoding algorithm onto a couple of sides of A4

Dan on July 31, 2009 11:48 PM

I've been interested in paper-based backup solutions for a while. I find it interesting that we can still use punched cards and tape from 40 years ago without any problem, assuming we have working readers, but most original magnetic storage media have degraded so much that we can't read them at all.

One other thought on this solution. I agree that we have to be careful about what format the documents are stored in. We also need to make sure that we don't assume 8-bit encoding, or ASCII, or anything else we take for granted and could so easily overlook.

Lawrence Woodman on July 31, 2009 11:52 PM
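Lawrence's point about not assuming an encoding is easy to demonstrate. In this Python sketch the same stored bytes are decoded under three assumptions. The wrong guess that matters most fails silently rather than loudly, which is exactly the trap for an archive read decades later.

```python
# Text stored as UTF-8 bytes, then read back by someone who has to guess the encoding.
original = "naïve café"
stored = original.encode("utf-8")

right_guess = stored.decode("utf-8")    # matches the original
wrong_guess = stored.decode("latin-1")  # no error, just garbled text: "naÃ¯ve cafÃ©"

try:
    stored.decode("ascii")              # the strictest guess fails outright
except UnicodeDecodeError as err:
    print("ASCII cannot represent byte", hex(stored[err.start]))

print(right_guess)
print(wrong_guess)
```

One low-tech mitigation is to print the encoding's name in plain ASCII at the top of every page. A reader who has lost everything else still learns how to interpret the rest.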

I just scanned and restored a file that I saved to paper at my office this afternoon. It works like a charm, at least at 200 dpi.

One advantage of optical storage at this scale as opposed to a CD/DVD is that it obviously contains information. The dot pattern is a modulated gray that under only a little magnification reveals structure. It would be obvious to an archeologist that there was information to be recovered.

That said, storing a Rosetta stone in the form of source code as plain text on paper would certainly be a good idea. PaperBack is under the GPL, so that's possible. It's also written in C++, so the language is likely to still be well enough known.

RBerteig on August 1, 2009 12:15 AM

Paper also has the advantage that humans can use nothing but their eyes to determine the degradation of their backup. You can look at a sheet of paper and say "This is still very good, it can easily last a few years longer" or "Wow, this is about to fade, we better renew the backup". With a CD or DVD, you can't even tell with your eyes whether it's still good or already broken.

J. Stoever on August 1, 2009 2:26 AM

I think you're way out where you shouldn't be. Isn't this a case of programmers being so much programmers they get things backwards now?

BmB on August 1, 2009 2:54 AM

Actually, this is something that national libraries are bound to look into.

In most countries a national library is charged to preserve all books, newspapers and magazines that have been published in the country, forever (In the US the Library of Congress fulfills a similar role, as far as I know).

Nowadays this has been expanded to include digital materials -- websites that are located in the country and suchlike. Also, to increase usability, virtually every national library has a program to digitize existing paper books.

There are obviously many problems in dealing with the ever-increasing volume of data, even more so in digital form. For web harvesting, for example, your best bet can only be to carefully select and harvest snapshots of what you deem relevant in your country's web.

However, for national libraries, the problem of storing digital materials for many hundreds of years -- which is nothing unheard of for books, indeed -- is particularly expensive to solve.

Preserving paper is something libraries are good at already, so something that lessens the upkeep (particularly energy) costs for preserving digitized materials is quite welcome.

Emils on August 1, 2009 2:58 AM

That there are more efficient encodings than alphabetic is hardly news - the computer screen in front of me has over two million pixels, enough to represent around 300KB of data in even the least efficient encoding. Certainly, you ought to be able to manage a few megabytes on the higher DPI of paper.

But really, why? We already use those efficient encodings to store gigabytes, or even terabytes, on magnetic or optical media, either of which takes up vastly less space than its paper equivalent. Far easier to achieve redundancy, too - magnetic drives may not last as long as paper, but they're a hell of a lot easier to clone before they fail...

Simon on August 1, 2009 4:31 AM
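Simon's numbers are roughly right for a one-bit-per-dot ceiling. The Python sketch below assumes a 1920x1200 screen (about 2.3 million pixels) and an A4 sheet at 600 dpi. It stores one bit per pixel or printed dot and spends no space on error correction or alignment marks. The function names are hypothetical.

```python
MM_PER_INCH = 25.4


def dots(length_mm, dpi):
    """Whole dots that fit along one edge at the given resolution."""
    return int(length_mm / MM_PER_INCH * dpi)


def raw_capacity_bytes(width_dots, height_dots):
    """Upper bound at one bit per pixel or dot, with zero overhead."""
    return width_dots * height_dots // 8


screen = raw_capacity_bytes(1920, 1200)
a4 = raw_capacity_bytes(dots(210, 600), dots(297, 600))
print(f"screen: {screen:,} bytes, A4 at 600 dpi: {a4:,} bytes")
```

The screen comes out at 288,000 bytes and the A4 sheet at 4,349,300 bytes, roughly 4.3 MB. PaperBack's claimed 500,000 bytes per sheet is about 11% of that ceiling. Presumably the difference goes into dots large enough to survive printing and scanning, plus redundancy, though this sketch models neither.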

May not be so obvious, but it's got to be a joke.

Same problems today as they were yesterday, which result in trying to go back to old solutions. PaperBack is probably just a very conscious ironic statement.

Particularly elaborate, it seems, as it apparently also demonstrates how willing some people are to be misled by all the lights. They tout it as cool and useful and ignore such important concepts as Green Technology. Or they fail to realize that its storage capacity is not even close to that of today's storage media. The surfaces of two DVD-ROMs, for instance, fit comfortably on an A4 page. Let's be modest and assume only a DVD-4. That's close to 10GB, as opposed to the A4 page's 500KB.

Mario on August 1, 2009 5:03 AM

The problem is longevity. We have warehouses of tapes from the NASA Apollo missions that are completely unreadable. What makes you think the DVD format is so special that it will still be around in 30 years? When's the last time you saw an 8-inch floppy drive at CompUSA? Paper has two things going for it that no digital technology can match:

So look at the facts:

1. We know that paper can last a long time, under the right conditions, (and even under not so ideal conditions), because, well, we have some really old paper.
2. Most of our digital technology is really new. The bits of digital technology that are not new are completely useless nowadays to all but the rarest experts on the technology, who have the luck to get some of the old machines working (which is not always straightforward, as they may need parts that are no longer manufactured).
3. You toiled years on that program, or that data, or that novel, you want it to still be around in 30 years or what? Look at what's still around from 30 years ago, and make your choice.


Breton on August 1, 2009 5:34 AM

Replication of data is the answer - not paper. Replication in format across multiple distribution mediums, down through the years, is the real long term storage strategy we must be moving towards.

Paper didn't really exist as a feasible way to keep information prior to the printing press. Today we can store impossible amounts of data in cheap and effective ways that would impress even dear old Gutenberg.

We need to look at improving the quality of how we electronically save data so that it may be archived effectively for a long period of time. Keeping the data 'live' (i.e., on active, current media) seems to be the standard approach.

The answers are already appearing with cloud computing and other services that can be used for this type of operation. The company is paid to provide a service. It is the company's business to migrate data from an obsolete file format to a usable later version, based on the ongoing importance of the information, i.e., whether someone is willing to pay.

These are the modern day printers, busily making copies of information that is still deemed important to someone, somewhere for a fee.

Jeff's cute idea is irrelevant and outdated. Sure, look to the past for many things to help guide us in the future (e.g. morals, principles, lessons of past failure, strategies for governance etc) but not technology. We haven't reached the computing end game yet and going back to paper is a complete cop out.

Vince on August 1, 2009 5:37 AM

Sorry, got my wires crossed a bit.

Paper's got two things going for it: Longevity, and Visibility. You can see what's printed on the paper. You can't see the orientation of the iron atoms glued to a disc of plastic, or on the surface of a platter. Nor can you see the burnt-away bits of coating on the surface of a CD-R, which does not even have the advantage of being pressed in metal as a manufactured CD or DVD is.

Breton on August 1, 2009 5:38 AM

@Vince: Tell that to the thousands of GeoCities customers. I really have absolutely no faith in the longevity of my data in the cloud, and what you suggest in terms of replication is really like relying on a dead man's switch to prevent the launch of a nuclear missile.

We need passive systems that can retain data without intervention. There's really a finite amount of time that we can devote to retaining data in the way you suggest. We can probably even come up with some mathematical formulation, charting the growth in the amount of data that needs to be preserved against the amount of time it will take to replicate it. How quickly will the former outpace the latter?

Breton on August 1, 2009 5:42 AM
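Breton's question can be sketched as a small model. It rests on the purely illustrative assumption that both quantities grow exponentially:

- D(t) is the volume of data worth preserving, with growth rate g.
- R(t) is replication throughput in bytes per year, with growth rate r.
- T(t) is the time one complete copy pass takes.
- L is a hypothetical refresh interval set by media lifetime.

Copying stops keeping up at t*, when a full pass takes longer than the media last.

```latex
D(t) = D_0\, e^{g t}, \qquad R(t) = R_0\, e^{r t}

T(t) = \frac{D(t)}{R(t)} = \frac{D_0}{R_0}\, e^{(g - r)\, t}

T(t^\ast) = L \;\Longrightarrow\; t^\ast = \frac{\ln\!\left( L R_0 / D_0 \right)}{g - r}, \qquad g > r,\; L R_0 > D_0
```

If r is at least g, a full pass never gets longer and there is no crossover. If g exceeds r, the crossover arrives sooner the thinner today's margin L R_0 / D_0 is. None of the parameters here come from real measurements.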

I'm not sure what is meant by longevity. Modern DVDs produced with Super Cyanine (Taiyo Yuden's) or metal-stabilized cyanine (TDK's) dyes have lifespans of approximately 70 years. That is far longer than the lifespan of the technologies used to read these formats. Even the 30-year lifespan of cheap DVDs almost guarantees that the disc will outlast support for its format.

Similarly, the PaperBack solution offers high media longevity, but no promise of media support. In fact, it presents the huge problem of also depending on the encoder/decoder still being available in 100 years, probably demanding a legacy system just for the purposes of a restore.

And then there is also the huge archiving problem it presents. Let's not be shy here: you would need 8,000 A4 pages to match the capacity of a single 4GB DVD. And then you need to ask, how long will it take to scan 8,000 pages so I can have my restore?

A marginal gain in longevity does not make all the obvious problems irrelevant. So I maintain that this is just a joke.

Mario on August 1, 2009 6:17 AM

Right at the top of the page it says "Olly, the author of OllyDbg, presents his new open source joke"

Maybe everyone is taking it too seriously?

Todd on August 1, 2009 6:35 AM

I don't get the point of all of this... And no, I don't think our alphabet is inefficient; it's simply tuned for fast loading. Have "phun" reading data saved that way!

MaxDZ8 on August 1, 2009 6:43 AM

Ah, paper. The perfect and longest lasting storage medium!

Well, I certainly know that Adobe and Monotype keep all their fonts on punched paper tape (with instructions on how to read the tape written in pencil at the top)! Highly efficient, and you can easily build a paper tape reader with nothing more than basic workshop skills. Knowing the speed and reading method (always simple), it is possible to reconstruct the binary file.

I also know that in the '90s they stored all the source code for applications such as Photoshop and PageMaker on paper tape as well (don't know if this is still the case).

Makes a lot of sense. I have some inherited punched paper tape data from 40 years ago and it's still in perfect condition. I have the reader (mechanically very fast, if a bit noisy as it rattles through) and it still works. An RS232 port provides the output. I can read the data into a PC and write the raw data into a file on disk, or process it further directly as a data stream.

There's a lot more of this kind of thing going on still than you might realise.

GregF on August 1, 2009 7:08 AM
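GregF's point that a tape can be decoded once you know the reading method is easy to illustrate. The Python sketch below makes several assumptions:

- The tape is 8-channel and transcribed row by row as strings, with "o" for a hole and "." for no hole.
- The sprocket column is omitted.
- The leftmost channel is the most significant bit.

Real readers differ in channel order and may use a parity channel. Those conventions are exactly what you would need to write down alongside the tape. The decode_tape name is hypothetical.

```python
def decode_tape(rows, hole="o"):
    """Convert 8-channel tape rows (hole = 1, anything else = 0) into bytes."""
    out = bytearray()
    for row in rows:
        if len(row) != 8:
            raise ValueError(f"expected 8 channels, got {row!r}")
        value = 0
        for mark in row:  # assumed convention: leftmost channel is the high bit
            value = (value << 1) | (mark == hole)
        out.append(value)
    return bytes(out)


print(decode_tape([".o..o...", ".o..o..o"]))  # b'HI'
```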

I think the more important question is, what information would I like to store for 50 or 100 years? Programs that are 20 years old are already out of date and basically useless. The important programs will be left running and will therefore remain accessible.

Anders on August 1, 2009 7:21 AM

"Modern DVDs produced with super cyanide (Tayto's) or metal-stabilized Cyanine (TDK's) have lifespans of approximately 70 years."

According to that Rosetta Project site, paper can last for thousands of years.

"Similarly, the paperback solution faces a high media longevity, but no promise of media support. In fact, it presents the huge problem of being also dependent on the encoder/decoder availability in 100 years, probably demanding a legacy system just for the purposes of restore." not a problem to sneeze at for sure. But the source code is available, and can be bundled with the data. NASA recently made available the assembler source code for its apollo guidance computers. If you set your mind to it, you could easily write an emulator to run it, or hand port it to some modern language.

"And then there is also the huge archiving problem it presents. Let's not be shy here, you would need 8,000 A4 pages to match the capacity of a single 4GB DVD. And then you then need to ask, how long will it take to scan 8,000 pages so I can have my restore?"

Do you need your porn collection in 1000 years? I'm perhaps being overly modest, but I think I would struggle finding even 500MBs of stuff that I would want to last for that long. It would likely be source code, writing, and possibly drawings/paintings that are stored on paper in original form anyway. It's not that much of an issue for me to personally curate and maintain my own data while I'm still alive, but I don't expect anyone to care after I'm dead.

In any case, I don't think libraries will be discarding their microfiche machines in favor of DVDs any time soon. Well, not the clever ones anyway.


Breton on August 1, 2009 7:31 AM

Not sure what your argument is Breton. If it is making a case for paper backup, I have news for you. We invented computers.

Mario on August 1, 2009 8:01 AM

Well, I'm not sure I really have much of an argument. Merely an open question: How much of our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark age? The longevity of the media we're currently storing our personal histories in is unproven, because they have existed for less than a century. We simply don't know which parts of it will last. However, we already know from experience which parts certainly WON'T last: Magnetic records. We have found the diaries of people who lived in the year 500. In 1,500 years' time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?

Breton on August 1, 2009 8:11 AM

I can't recall anyone referring to CDC Display Code as "CDC" - it was always "Display Code." Other 6-bit encodings like BCL ("Burroughs Common Language") generally did go by their acronyms though.

There is still a TON of stuff around in EBCDIC though. No, not your Facebook ramblings, but stuff like your bank accounts, driving record, criminal history, ... And EBCDIC is far from an IBM-mainframe-only encoding, but people don't seem to know it. Heck, Windows still supports EBCDIC, at least for transcoding purposes, via Kernel32 no less (see MultiByteToWideChar with code page 37 for one case).

ASCII was considered an interchange format, not something native for normal internal use.

DevMan on August 1, 2009 8:14 AM
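DevMan's point is easy to check from Python, which ships a codec for EBCDIC code page 037 under the name cp037. That is the same code page number he mentions for MultiByteToWideChar. This sketch uses Python's codec, not the Windows API.

```python
# Round-trip text through EBCDIC code page 037 using Python's built-in codec.
text = "HELLO 2009"
ebcdic = text.encode("cp037")

print(ebcdic.hex(" "))                 # letters and digits get high byte values, unlike ASCII
print(ebcdic.decode("cp037") == text)  # True: lossless round trip
print(ebcdic.decode("latin-1"))        # the same bytes misread as Latin-1 come out as gibberish
```

In EBCDIC, "H" is 0xC8 and the digits run from 0xF0 to 0xF9. A tool that silently assumes ASCII or Latin-1 will therefore garble such records rather than reject them.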

"Well, I'm not sure I really have much of an argument. Merely an open question: How much about our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark ages? The longevity of the mediums we're currently storing our personal histories into is unproven, because it has only existed for less than a century. We simply don't know which parts of it will last. However, we already know from experience which parts certainly WON'T last: Magnetic records. We have found the diaries of people who lived in the year 500. in 1500 years time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?"

Interesting questions indeed. I don't think anyone has an answer. However, the (allow me) "computer revolution" is not much different from the printing-press revolution, when a new information storage medium was invented. I'd say it is safe to assume that as we continue down this path, we will constantly reformulate our current technology, bringing new and better ways to archive information. We do not, however, tend to go back, especially because information density keeps growing and old methods become incompatible with that growth. It's hard to beat stone carving when it comes to longevity. It's conceivable that a robot and a piece of software could laser-carve more information on a limestone the size of my Post-it than this PaperBack thing could on 10,000 A4 pages. Still... we don't see many of those around.

Storage media for long-term backup purposes are meant to be kept in protected environments that usually extend their life far beyond the stated material properties under normal conditions. In very good conditions, information on a DVD could last centuries. But even disregarding that possibility, it is widely accepted, common practice to convert all backups to the new media when a new, better technology is invented and there is a port to it. Much like paper is being converted to digital format.

Not the other way around, eh.

Mario on August 1, 2009 8:50 AM

"But even disregarding that possibility, it is widely acceptable and a common practice that when new better technology is invented and there is a port for that technology, all backups are converted into the new media. Much like paper has been being converted to digital format."

You mean it's a widely accepted belief that this happens? I think this is a vast overstatement of the reality of the situation. For instance, there are piles of music available on vinyl records that are not available on CD, and never will be, because they're not popular enough at this moment of transition. What if we change our minds later, and decide that music actually is interesting? Will it be too late to save it? Many of the records that will be lost will be rather interesting historical records of music in a particular time, as it was performed then.

There are piles and piles of films that were made in the 20th century, which require some serious funding and effort to be preserved. So much so that a film must be demonstrated to have genuine historical and cultural value before it can be selected for preservation. We still haven't recovered the entirety of the movie "Metropolis". There are sections missing.

I think the ease with which you are able to copy a file on a computer has the effect of distorting your perspective about how difficult it actually is to preserve what's important. How difficult it is to even decide what's important. Some of the most tragic losses have happened because it didn't seem important at the time.

Breton on August 1, 2009 9:05 AM

Oh damn, I haven't even mentioned the effect of DRM, or hard drive crashes. We're already hearing about people losing valuable family photos to technical mishaps. You might comment that it's their fault for not backing up- But it is just much harder to crash a paper photo album. They don't have a tendency to spontaneously combust every 3-4 years like hard drives do.

Breton on August 1, 2009 9:10 AM

We are talking about backups, not preservation of originals. That's the only way to guarantee the preservation of information far beyond the point where the original gets lost. As we speak, a European consortium of museums is undertaking a project (I believe it's called Europe 2.0) to back up their entire contents in digital format.

PaperBack is almost certainly an intentional joke. Look at the pictures in Jeff's original article. The last picture is a step backwards from the first. It's irony at its best.

Mario on August 1, 2009 9:15 AM

Is any of this getting through to you? What good are backups if they don't last as long as the originals? We can spend thousands, perhaps millions of dollars to create a digital archive that is nothing but dust in 50 years, or needs another million dollar effort. Are magnetic disks really the best archival medium we can think of? Have we really thought the problem through?

Paper might sound like a joke to someone with digital myopia, but I'm not laughing.

Breton on August 1, 2009 9:33 AM

> Is any of this getting through to you? What good are backups if they don't last as long as the originals?

I don't think you understand the purpose of a digital backup, nor are you familiar with its standard procedures.

Sorry, but I need to be somewhere else.

Mario on August 1, 2009 10:05 AM

PaperBack also sounds like the only encoding mechanism that can be severely compromised by fly poop, dust, fingerprints etc.

Most of the encoding techniques are hardware-based and just feed the information they read to software. A software-based solution requires that the software continue to be maintained across many, many OS changes, etc.

I'll bet the original barcode scanners still work; you could probably get original punch-card readers on eBay, or even build your own. You can EASILY read 100-year-old documents.

Getting software that is even 10 years old to work can often be a challenge.

You might want to calculate that into the value of a software based solution.

Oh, and a page full of static might be harder to recognize as valuable information later on.

Xepol on August 1, 2009 11:21 AM

Funny quote from PaperBack site...

"Actual version is for Windows only, but it's free and open source, and there is nothing that prevents you from porting PaperBack to Linux or Mac, and the chances are good that it still will work under Windows XXXP or Trillenium Edition."

XXXP, Trillenium Edition, nice.

CAPTCHA: essayist helped

Craig on August 1, 2009 11:38 AM

People are talking about archiving on DVD here, but I've got several high-quality DVDs from a couple of years ago that already can't be read. No way I'd trust DVDs for anything long-term. To me they're a crap storage system that we'll be lucky to leave behind.

Chris S on August 1, 2009 12:30 PM

> I like big butts and I cannot lie!

LOL, I was waiting for someone to notice this. Thank you Wade-O!

Jeff Atwood on August 1, 2009 12:37 PM

So how much paper would it take to back up... the internet?

Manu on August 1, 2009 3:37 PM

Jeff celebrates April 1st four months late, on August 1st.

Dennis Gorelik on August 1, 2009 3:49 PM

This is a _bad_ idea. Have you ever printed a stack of paper and come back to it 3 years later? All the ink has started sticking to the page above and when you pull it apart it looks pretty bad. I can't imagine doing this!

Doug on August 1, 2009 6:48 PM

Then there's restoring, and what use is a backup if you cannot restore it? They say to just use a regular scanner, but based on my experience with OCR, the error rate would be rather high.
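On the restore-error worry: a common way to survive a few unreadable spots is to add redundancy so a damaged block can be rebuilt from the others. Below is a minimal sketch of single-block XOR parity in Python. It is a generic illustration, not PaperBack's actual on-paper format; the block contents, group layout and the `add_parity`/`recover` helpers are all hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one XOR parity block."""
    return blocks + [xor_blocks(blocks)]


def recover(group: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one unreadable block (None) and return the data blocks."""
    missing = [i for i, block in enumerate(group) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one block per group")
    group = list(group)
    if missing:
        # XOR of all surviving blocks (parity included) equals the lost block.
        survivors = [block for block in group if block is not None]
        group[missing[0]] = xor_blocks(survivors)
    return group[:-1]  # drop the parity block


data = [b"PAPE", b"RBAC", b"K!!!"]  # hypothetical 4-byte blocks
group = add_parity(data)
group[1] = None                     # pretend a smudge made block 1 unreadable
print(recover(group) == data)       # True
```

With one parity block per group, any one damaged block per group can be rebuilt; stronger codes such as Reed-Solomon spend more overhead to survive more damage.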

will on August 2, 2009 2:14 AM

Wow, this was boring

JPH on August 2, 2009 5:28 AM

> This is a _bad_ idea. Have you ever printed a stack of paper and come back to
> it 3 years later? All the ink has started sticking to the page above and when
> you pull it apart it looks pretty bad. I can't imagine doing this!

This problem was solved decades (if not centuries) ago. Air is what deteriorates paper, and if you prevent air from reaching it, the paper is preserved. Today we laminate, but for many people simple sheet protectors are good enough.

I think, right now, this is more practical for the "future civilization" use case of long-term data archives. For it to be useful to a future civilization, we'd have to provide instructions for encoding/decoding, preserve them along with the data, and assume this future civilization has certain technologies. The only things I can think of that might be worth preserving for future generations involve text passages, and we already have standards for expressing ideas in text, so it may not be worth storing barcodes at all. Still, it seems like it'd be easy to put a few thousand sheets of paper and a few dozen pages describing how to decode them in an airtight, fire-resistant box. Compare this to storing a couple of DVDs in an environmentally controlled container along with instructions for how to build a DVD drive. Of course, for short-term backups of things we back up frequently, like payroll data, it seems ridiculous to rely on a technology with such a long lead time on restoration. DVD probably won't become unusable in the next decade or so, so if that's the lifetime you need, there's no need to look for another solution.

Owen on August 2, 2009 1:21 PM

I have an idea for an application: it will control a huge printer that sprays mineral paint over the wall of a cave, forming a human-readable backup of your data.

Version 1.0 will aim to store up to 500 MB per cave and your data will be safe for at least 32,000 years, or your money happily refunded!

Eugene on August 2, 2009 8:54 PM

Is there a world-wide standard specification for this paper-based compression/encryption system? Like Microsoft's 'open' xml?

What about i18n?

Sam on August 3, 2009 12:09 AM

This post made me think of this:

Storing hundreds of gigabytes of data on a sheet of paper - http://www.arabnews.com/?page=4&section=0&article=88962&d=18&m=11&y=2006

It sounds too good to be true, and it is - http://arstechnica.com/hardware/news/2006/11/8288.ars

Jesper on August 3, 2009 1:18 AM

As said before, April 1st was months ago.

Herman on August 3, 2009 1:51 AM

Hmm... so how about combining the PaperBack encoding with a MakerBot-like device and create OpenRosettaStone? PlasticBack FTW!

oliver on August 3, 2009 4:44 AM

>There's a coding error on the IBM JCL punched card: should be no
>space between
>SLINK,
>and
>TESTPGM=

>Erica on July 31, 2009 2:55 PM


Not necessarily an error, that could just be a comment.

Cory A. on August 3, 2009 5:42 AM

I've noticed a few complaints about acid in paper... Very recently I went looking for "archival" paper and found that almost any laser paper sold at my local Staples or wherever is acid-free at this point. As far as I can tell, the obsession with acid in printing paper is rather out of date.

Shmork on August 3, 2009 6:17 AM

Can you provide the stackoverflow datadump in this format? :D

Brian on August 3, 2009 8:22 AM

Made me think of Futurama (S02E03) oddly:

BENDER: "They must just wanna see that episode. Let's find a tape and give it to 'em."

AMY: "There aren't any copies left."

FARNSWORTH: "No, there wouldn't be. Most videotapes from that era were damaged in 2443 during the Second Coming of Jesus."

rwaggie on August 3, 2009 8:36 AM

PaperBack reminds me of a pair of programs I had on my Amiga back in the day of 300 baud modems and dot matrix printers.

You would run the program (I forget the name) and open the file you wanted to send to your buddy. It would encode the file to something like data matrix, then print it all out on the printer.

You would take all the sheets and mail them to your friend who would run the second program that would scan everything back in and decode it to the original file.

I'm pretty sure that's the original way PGP made it over to Europe back when you couldn't export it by disk or modem. A nice little loophole in the law. ;)

Paul Davis on August 3, 2009 8:46 AM

If you want to archive to a medium that will never deteriorate, TwinkyBack is the only way to go.

EBGreen on August 3, 2009 9:41 AM

This is one of the stupidest ideas I've heard in a long time...

K|O|GI on August 3, 2009 10:09 AM

Sweet.

I made a change to one word. Please back it up again.

Cordially,
Your paper supplier

AC on August 3, 2009 10:37 AM

My company uses XML to archive stuff; we assume it will still be readable 50 years from now. When you store stuff in an RDBMS like MS SQL or MySQL, you are far less safe.

Theo on August 3, 2009 1:50 PM

Apr 1 was....months ago?

I suggest backing up your data by smearing feces on a wall. You would be surprised at the compression levels you can achieve. I'll bet your life's work can be represented as a single shit smear.

Anon on August 3, 2009 8:01 PM

Man this is a nice little program, here I've gone and made a Wikipedia page for it: http://en.wikipedia.org/wiki/Paperback_%28software%29

Warll on August 3, 2009 11:51 PM

How does paperback deal with aged paper? Faded ink? Torn or curled pages?

We need some kind of permanent, cheap, long term storage, like some kind of sci-fi crystal storage device. That, or DVDs that are readable after 10 years.

Jamie on August 4, 2009 4:38 AM

Is it just me or do those things look like Conway's Game of Life?

Ryan on August 4, 2009 7:41 AM

You idiots, don't waste trees. How often have you actually *needed* to retrieve anything that you have backed up? If you just can't live without your digital pleasures, you should probably re-evaluate your life.

Mike Judge on August 4, 2009 8:19 AM

First you need to decide what it is that needs long-term archiving. No one sensible will whiz off 1e6 sheets to save /tmp on a daily basis.

Really important things that people might care about 50, 100, 250 years from now, well, we often call those 'books' and publish them en masse redundantly distributed in an unencoded format.

It's of course impossible to know what will be important "in the future", but hey, that's not a new problem, and it's outside the scope of the storage medium (the "save it all" argument is silly if you look at all the cache and buffer crap).


We forget that "data" isn't the problem; "information" is. For truly long-term storage you need an editorial mindset to choose wisely what might remain readable long-term. E.g., for NASA space probes, rather than assuming 800 bpi tape would last forever, summaries of the data sets would have been nice to have now, even though at the time they would have known those were just a subset.


tomic on August 4, 2009 1:39 PM

An Indian student seems to have used a similar idea to compress a huge amount of data a few years back. Searching Google gave me the following results:

http://en.wikipedia.org/wiki/Rainbow_Storage


Very interesting to read:
http://digg.com/tech_news/Indian_student_develops_paper_based_storage_system
http://www.theregister.co.uk/2006/11/23/rvd_system/

carlin on August 4, 2009 1:57 PM

@Chris S:

DVDs and even older CDs have to be kept in a dark(er) place, preferably inside a plastic case (or on a spindle). Just recently, I found a batch of backup CDs from 1998-1999 and they were in perfect working order. I had some no-name ones, and a couple of Verbatim/BASF.

I also had a look at some CDs/DVDs that have been on the CD shelf and they do show some yellowing, although they still work.

Media is very sensitive to (extended exposure to) UV radiation and should be handled/kept accordingly.

captcha: In tops

securityhorror on August 4, 2009 2:35 PM

RE "I like big butts and I cannot lie!"

That's what is in each of the bar codes!

Philip on August 4, 2009 7:03 PM

Real men don't do backups. Real men use the internet.

sunfire on August 5, 2009 4:42 AM

The character counts that you quoted for Data Matrix, QR Code, and Aztec Code seemed surprisingly large until I realized that you quoted the *maximum* number of characters that can be stored in those formats. The images that you show are much smaller than those needed to store the maximum number of characters. The formats are in fact approximately 1 bit per little square, as one would expect.
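A rough check of the bits-per-square point, using the largest QR Code symbol (version 40, lowest error-correction level L). The figures used here (177 × 177 modules, 2,956 data codewords, and alphanumeric mode costing a 4-bit mode indicator, a 13-bit character count, then 11 bits per character pair) are from the QR Code specification as I recall it and are worth double-checking; the helper names are hypothetical.

```python
def qr_side(version: int) -> int:
    """Modules per side of a QR symbol: 21 for version 1, plus 4 per version."""
    return 4 * version + 17


def max_alphanumeric(data_bits: int, count_bits: int = 13) -> int:
    """Largest alphanumeric message that fits in data_bits.

    Alphanumeric mode spends 4 bits on the mode indicator, count_bits on the
    character count, 11 bits per pair of characters and 6 for an odd one out.
    """
    n = 0
    while 4 + count_bits + 11 * ((n + 1) // 2) + 6 * ((n + 1) % 2) <= data_bits:
        n += 1
    return n


version = 40
modules = qr_side(version) ** 2  # every square in the symbol
data_bits = 2956 * 8             # version 40-L: 2,956 data codewords of 8 bits
print(modules, data_bits, round(data_bits / modules, 3))  # 31329 23648 0.755
print(max_alphanumeric(data_bits))                        # 4296
```

So at maximum capacity each square carries roughly three quarters of a bit of payload; the rest of the symbol goes to finder and alignment patterns and error correction.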

Michael Haggerty on August 5, 2009 6:56 AM

Since I'm too lazy to go check if nobody else is suggesting it, may I just add: if you're going that far, why not back up to microfiche? With the pixel-codes? I've seen the whole King James bible (in text!) stored on a piece of microfiche the size of a largish postage stamp.

Penguin Pete on August 5, 2009 1:36 PM

Who guarantees that these applications will be ported to *all* future systems and platforms?

Manuel on August 5, 2009 1:54 PM

This is an interesting technology for the > 20 year storage market.

For data such as Emacs, which is widely distributed and hence will probably never "die", archival isn't very useful - people will be using it until something revolutionary comes along.

For data such as, for example, the original video feed from the Moon landers, that should be put in a very platform-independent, long-lasting format readable by everyone for centuries. Quality paper is a great option for the several-hundred-year range.

Paul Nathan on August 5, 2009 2:04 PM

I tried PaperBack, and was somewhat disappointed. First, forget your inkjet. Laser printer is a must. Second, ability to recover data from damaged paper is rather low...

Azarien on August 5, 2009 3:42 PM

If you want something digital but even more enduring than ASCII, check out "Arecibo Ascii" (USCII). I made an online converter to demonstrate the idea:

http://hostilefork.com/demos/uscii-5x7-english-c0/encode.html

"While ASCII uses small arbitrary values (e.g. 65 for 'A', 66 for 'B'), USCII encodings use much larger values (e.g. 15621226033 for 'A', 16400753439 for 'B'). These unusual numbers were chosen because they mathematically contain bitmaps of the symbols they represent! Using a technique modeled after SETI's Arecibo Message, semiprimes are employed to suggest the two-dimensional decoding of these bit patterns."
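The decoding idea in the quote can be sketched like this, assuming each glyph is a 5 × 7 bitmap stored most-significant bit first, five bits per row. The bit count 35 = 5 × 7 is the semiprime that hints at the layout, much as the Arecibo message's 1,679 bits factor only as 23 × 73. The helper names are hypothetical; the real USCII scheme is described on the linked page.

```python
def prime_factor_pairs(n: int) -> list[tuple[int, int]]:
    """Return (a, b) with a <= b, a * b == n and both factors prime."""
    def is_prime(k: int) -> bool:
        return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    return [(a, n // a) for a in range(2, int(n ** 0.5) + 1)
            if n % a == 0 and is_prime(a) and is_prime(n // a)]


def decode_glyph(code: int, width: int = 5, height: int = 7) -> list[str]:
    """Unpack a width*height bitmap stored MSB first, width bits per row."""
    bits = format(code, f"0{width * height}b")  # zero-pad to the full bitmap
    return ["".join("#" if b == "1" else "." for b in bits[r * width:(r + 1) * width])
            for r in range(height)]


print(prime_factor_pairs(35))          # [(5, 7)]
for row in decode_glyph(15621226033):  # the USCII value quoted for 'A'
    print(row)                         # rows of an 'A': .###. / #...# ... / #####
```

Under that assumed layout the value quoted for 'A' does come out as an A: a pointed top, two sides, and a full crossbar on the fifth row.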

But perhaps I should find better things to do with my free time. :)

Hostile Fork on August 5, 2009 4:28 PM

I just wanted to add two more points:

1. Thank you for returning to a topic that is more technical and less esoteric. I’ve been agitating for this for a number of posts. This is a great discussion point and you've added info many will find useful.
2. If it wasn’t for paper backup, I would have been in the poop recently.

I was working on an important report and had just plugged in the printer to print the final version when my computer released its smoke. For those who don’t know, computers aren’t based on electricity, they are based on smoke, and when it escapes they no longer work. Whatever happened blew every device connected via USB, including the backup USB device still in the USB port. I had my laptop on me, but the MacBook Air didn’t have a DVD/CD drive, and with my USB backup toast I thought I was doomed.

Luckily I had printed a draft. I was able to scan in the report and via OCR recover the entire last draft I had done. I re-did the corrections and just met the submission time. It didn’t even require re-formatting, it picked up everything.

Without paper back I would have been STUFFED!!!

Philip on August 5, 2009 5:52 PM

Instead of using merely black and white dots, why not use several colors? I'm thinking RGB + B/W. So instead of storing the info in binary form, you can have a base-5 system. The base-5 string of 0s, 1s, 2s, 3s, and 4s that comes out of reading the paper can then be converted to a more conventional binary string of 0s and 1s. Wouldn't we be able to store loads more and make paper backups actually viable?

It's getting kind of late here, but if anyone's interested (and I might be), I'll do the rough guesstimating math tomorrow.
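A rough version of that math: a dot that can take one of five colors carries log2(5) ≈ 2.32 bits instead of 1, so at the same dot size a page would hold about 2.3 times as much, assuming the printer and scanner can reliably tell the five colors apart (which is the hard part). The sketch below converts bytes to base-5 digits and back; `to_base5`/`from_base5` and the sentinel-byte trick are illustrative choices, not part of any existing tool.

```python
import math


def to_base5(data: bytes) -> list[int]:
    """Encode bytes as base-5 digits, one digit per colored dot (0..4)."""
    # Prefix a 0x01 sentinel byte so leading zero bytes survive the round trip.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, d = divmod(n, 5)
        digits.append(d)
    return digits[::-1]  # most significant digit first


def from_base5(digits: list[int]) -> bytes:
    """Decode digits produced by to_base5 back into the original bytes."""
    n = 0
    for d in digits:
        n = n * 5 + d
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


print(round(math.log2(5), 2))  # 2.32 bits per five-color dot, vs 1 for black/white
payload = b"\x00\x00hello"
dots = to_base5(payload)
print(from_base5(dots) == payload)  # True
```

Treating the whole file as one big integer is only practical for small inputs; a real encoder would convert fixed-size chunks independently so that one misread dot cannot corrupt everything.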

Andrew Szeto on August 5, 2009 11:35 PM

In response to myself, I just read the Rainbow Storage link. Yep, turns out my idea was already taken. Darn.

Andrew Szeto on August 5, 2009 11:38 PM

This should be a joke; it makes no sense if you can't get the PaperBack software running again after 100 years. If I had a way to preserve the PaperBack software itself, I would certainly prefer to use it to store my important files instead.

Moosty on August 6, 2009 5:28 AM

Am I the only one who sees a possibility here to steal sensitive data this way and smuggle it past security checks?

Carlos on August 6, 2009 7:30 AM

A tiny bit late to the party. Dan covered this months ago: http://www.dansdata.com/gz094.htm

James on August 6, 2009 11:10 AM

Given that laser printer emissions are toxic, it's a good idea to take a pass on this technique.

John on August 8, 2009 3:58 AM

...And photocopiers for that matter. Similar technology.

John on August 8, 2009 4:09 AM

I find this a very interesting conversation actually. My grandmother, bless her uncommunicative mouth, recently passed away leaving us with a nice steamer trunk full of family memorabilia. Almost all of it was useless because there is no oral history to go along with it. So we have hundreds of wonderful old photos of people we will now never ever know. And these photos from the early part of the 1900’s are fading as well. So in less than a single generation every memory of my family is now gone. Storage techniques are exactly like this. When was the last time you pulled out your old family pictures from the late 50’s or early 60’s? Got any Super 8 film movies or VHS tapes of your family and friends or important milestones in your history? While it is important to have records of bank statements and credit records, once you die and your contemporaries join you in the permanent dirt bath, your life history will be gone in a very short period of time. Particularly if you are depending on modern technology to preserve your family’s history.

Craig on August 10, 2009 8:49 AM

And we wonder why people refer to us as geeks... As the Rosetta stone has been around for a very long time, perhaps we should back up our data onto stone tablets. I mean, that's the point of a backup: longevity... I'm not sure what bothers me more: that someone wrote a program to do this, or that someone actually uses said program.

caloggins on August 10, 2009 4:16 PM

@term paper writing

Hello spam bot!

Practicality on August 13, 2009 6:41 AM

Why did you delete my comment about the previous comment being spam, but not delete the actual spam?

Practicality on August 13, 2009 6:49 AM

It might not be at all practical, but it certainly is interesting, and pretty damn awesome

Michael on August 13, 2009 1:27 PM

"There's a coding error on the IBM JCL punched card: should be no space between SLINK,and TESTPGM="

If you weren't female and this wasn't inappropriate, I'd tell you to stop talking dirty to us.

Seekr on August 22, 2009 6:28 PM

The key lies on who does the translation.

It is the translator who makes the translation an efficient tool or not. Whether it is a punch card or an alphabet does not matter. My point is that the translation may involve a highly abstract concept that only the translator (i.e., a human being) can convey. Machines are not yet capable of this level of abstraction.

JJ on September 1, 2009 11:39 AM

As a few have stated, it wouldn't be very useful for backing up my MP3 collection, but half a megabyte of data isn't without its uses. It's easy for people to forget these days that a single character is still just a single byte, and ~500,000 characters can contain a lot of important information. I can think of a few applications for this. I need to hang on to tax records, receipts, and bank statements for about 5 years, well before any significant paper or ink decay sets in, I would think. Obviously I wouldn't consider these a replacement for digital backups, but I do think that if you give me a 500 KB file and a piece of paper, I'll be far more likely to locate the piece of paper five years later.

Besides, it's a single piece of paper. If my tax software had a button "Print paper backup" I'd say, "Why not?"
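Putting numbers on this: using PaperBack's quoted best case of 500,000 bytes per sheet (600 dpi laser printer, uncompressed), the sketch below estimates how many sheets a text file needs. One caveat on "one character, one byte": that holds for plain ASCII, but in UTF-8 a non-ASCII character takes two to four bytes (é takes two), so the sketch counts encoded bytes rather than characters. `sheets_needed` is a hypothetical helper.

```python
import math

# Best case quoted for PaperBack: 600 dpi laser printer, uncompressed data.
BYTES_PER_SHEET = 500_000


def sheets_needed(text: str, bytes_per_sheet: int = BYTES_PER_SHEET) -> int:
    """Estimate sheets for a text by its UTF-8 size; at least one sheet."""
    size = len(text.encode("utf-8"))
    return max(1, math.ceil(size / bytes_per_sheet))


print(sheets_needed("a" * 500_000))  # 1: ASCII is one byte per character
print(sheets_needed("é" * 300_000))  # 2: 600,000 bytes in UTF-8
```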

Spencer Ruport on September 1, 2009 12:23 PM


Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.