July 31, 2009
As programmers, we regularly work with text encodings. But there's another sort of encoding at work here, one we process so often and so rapidly that it's invisible to us, and we forget about it. I'm talking about visual encoding -- translating the visual glyphs of the alphabet you're reading right now. The alphabet is no different than any other optical machine readable input, except the machines are us.
But how efficient is the alphabet at encoding information on a page? Consider some of the alternatives -- different visual representations of data you could print on a page, or display on a monitor:
- IBM 5081 punch card: up to 80 alphanumeric characters
- MaxiCode: up to 93 alphanumeric characters
- Data Matrix: up to 2,335 alphanumeric characters
- QR Code: up to 4,296 alphanumeric characters
- Aztec Code: up to 3,067 alphanumeric characters
- High Capacity Color Barcode: varies by number of colors and density; up to 3,500 characters per square inch
- Printed page of text: about 10,000 characters per page
Paper the way we typically use it is criminally inefficient. It has a ton of wasted data storage space. That's where programs like PaperBack come in:
PaperBack is a free application that allows you to back up your precious files on ordinary paper in the form of oversized bitmaps. If you have a good laser printer with the 600 dpi resolution, you can save up to 500,000 bytes of uncompressed data on a single sheet.
You may ask - why? Why, for heaven's sake, do I need to make paper backups, if there are so many alternative possibilities like CD-R's, DVD±R's, memory sticks, flash cards, hard disks, streaming tapes, ZIP drives, network storage, magneto-optical cartridges, and even 8-inch double-sided floppy disks formatted for DEC PDP-11? The answer is simple: you don't. However, by looking on CD or magnetic tape, you are not able to tell whether your data is readable or not. You must insert your medium into the drive, if you even have one, and try to read it.
Paper is different. Do you remember punched cards? For years, cards were the main storage medium for the source code. I agree that 100K+ programs were... inconvenient, but hey, only real programmers dared to write applications that large. And used cards were good as notepads, too. Punched tapes were also common. And even the most weird encodings, like CDC or EBCDIC, were readable by humans (I mean, by real programmers).
Of course, bitmaps produced by PaperBack are also human-readable (with the small help of any decent microscope). I'm joking. What you need is a scanner attached to your PC.
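For a sense of scale on the 500,000-byte figure quoted above, here is a rough back-of-the-envelope sketch. It assumes a US Letter sheet (8.5 x 11 inches) printed edge to edge, which is my assumption and not something the PaperBack description states, and it only compares raw printer dots against the quoted capacity. It says nothing about how PaperBack actually lays out its bitmap.

```python
# Rough arithmetic only: raw printer dots on a page vs. the quoted capacity.
# Page size and edge-to-edge printing are assumptions, not PaperBack specifics.

def raw_dots(width_in: float, height_in: float, dpi: int) -> int:
    """Number of addressable printer dots on a page of the given size."""
    return int(width_in * dpi) * int(height_in * dpi)


def dots_per_bit(width_in: float, height_in: float, dpi: int, payload_bytes: int) -> float:
    """How many raw dots are available for each bit of stored payload."""
    return raw_dots(width_in, height_in, dpi) / (payload_bytes * 8)


print(raw_dots(8.5, 11, 600))                         # 33660000
print(round(dots_per_bit(8.5, 11, 600, 500_000), 1))  # 8.4
```

Under those assumptions the quoted capacity works out to roughly eight printer dots per stored bit, headroom that margins, cells larger than a single dot, and redundancy could plausibly use up.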
PaperBack, like many of the other visual encodings listed above, includes provisions for:
- compression -- to increase the amount of data stored in a given area.
- redundancy -- in case part of the image becomes damaged or is otherwise unreadable.
- encryption -- to prevent the image from being readable by anyone except the intended recipient.
Sure, it's still paper, but the digital "alphabet" you're putting on that paper is a far more sophisticated way to store the underlying data than traditional ASCII text.
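To make the first two provisions concrete, here is a minimal sketch of compression plus redundancy using only the Python standard library. It is not PaperBack's scheme: the single XOR parity block is a deliberately simple stand-in for real error-correcting codes, the function names and the 16-byte block size are made up for illustration, and encryption is left out because the standard library has no cipher.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, block_size: int = 16):
    """Compress, split into fixed-size blocks, and append one XOR parity block."""
    packed = zlib.compress(data)
    # Prefix the compressed length so the zero padding can be stripped on decode.
    framed = len(packed).to_bytes(4, "big") + packed
    framed += b"\x00" * (-len(framed) % block_size)
    blocks = [framed[i:i + block_size] for i in range(0, len(framed), block_size)]
    return blocks + [xor_blocks(blocks)]


def decode(blocks):
    """Rebuild at most one missing block (marked None) from parity, then decompress."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only recover a single lost block")
    if missing:
        present = [block for block in blocks if block is not None]
        blocks = list(blocks)
        # XOR of every surviving block (parity included) equals the lost block.
        blocks[missing[0]] = xor_blocks(present)
    framed = b"".join(blocks[:-1])  # drop the parity block
    length = int.from_bytes(framed[:4], "big")
    return zlib.decompress(framed[4:4 + length])


message = b"paper backups " * 20
blocks = encode(message)
blocks[1] = None  # pretend one block was smudged beyond recognition
assert decode(blocks) == message
```

One parity block survives the loss of exactly one block; tolerating more damage needs a stronger code, which is the trade-off any such format has to make between capacity and robustness.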
This may all seem a bit fanciful, since the alphabet is about all us poor human machines can reasonably deal with, at least without the assistance of a computer and scanner. But there is at least one legitimate use for this stuff: the trusted paper key. There's even software for this purpose, PaperKey:
The goal with paper is not secure storage. There are countless ways to store something securely. A paper backup also isn't a replacement for the usual machine readable (tape, CD-R, DVD-R, etc) backups, but rather as an if-all-else-fails method of restoring a key. Most of the storage media in use today do not have particularly good long-term (measured in years to decades) retention of data. If and when the CD-R and/or tape cassette and/or USB key and/or hard drive the secret key is stored on becomes unusable, the paper copy can be used to restore the secret key.
For paper, on the other hand, to claim it will last for 100 years is not even vaguely impressive. High-quality paper with good ink regularly lasts many hundreds of years even under less than optimal conditions.
Another bonus is that ink on paper is readable by humans. Not all backup methods will be readable 50 years later, so even if you have the backup, you can't easily buy a drive to read it. I doubt this will happen anytime soon with CD-R as there are just so many of them out there, but the storage industry is littered with old now-dead ways of storing data.
Computer encoding formats and data storage schemes come and go. This is why so much archival material survives best in the simplest possible formats, like unadorned ASCII. Depending on what your goals are, a combination of simple digital encoding and the good old boring, reliable, really really old school technology of physical paper can still make sense.
Posted by Jeff Atwood
Paper backups... there's an app for that.
I wish someone would make a whole disk (like disaster recovery) version of PaperBack...
I would love to say, see that ream of paper... yeah, that's my backup.
Or in actuallity having a single pdf that contained everything would be supremely awesome
The reason you don't use paper is because the acid in the paper makes everything disappear (relatively) quickly.
There's a coding error on the IBM JCL punched card: should be no space between
That's pretty cool information. Thanks Jeff! I never knew you could store so much on one of those image maps. Crazy!
This assumes either the software or the computer system is useable in a hundred years...
I've been wondering how efficient paper-based backup could be. That's pretty impressive. ~Half a megabyte of uncompressed data is nothing to sneeze at. You could squeeze a whole lot of 7zip compressed plain-text documents into that space; easily an entire book on a single page. With formatting things get larger, but c'mon. An entire book on a single page.
The problem of course is making sure it's still readable in a couple hundred years, but if you use decent paper and slap on a few cover pages or so documenting your algorithms I don't see why it would be an issue.
Paper that lasts a century is easy enough to make or buy, but it's definitely not the common photocopy paper you use by the ream every day. That stuff is acidic and the ink is also fragile, so it's a race between the paper eating the ink and the ink decaying on its own. Give it 20 years and you won't be able to move it without it falling apart either.
So as long as you're using a high quality acid-free archive quality paper, your laser printer is in good condition (especially the seal roller) and you store the paper in a way that minimises exposure to air and moisture (making a thick stack is good), you should be fine for 100 years.
Personally, I'd be printing the coding scheme in human-readable text on every 100th page, and making very sure that each page is independently readable. So backing up your huge pr0n collection... not so much.
Well, the intro was pretty cool. Human OCR is pretty awesome.
Storing data on paper... All kinds of problems: 1. Expensive. 2. Volatile (Paper that lasts needs to be bound, which adds more to #1). 3. Storage Space (as in actual room for the paper). 4. Security Problems.
Also, what terrible storage space! 500k per piece of paper? Let's see, to store 1GB I would need... 2,000 sheets of paper. To back up my server, I need, 2 MILLION SHEETS OF PAPER (at least).
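The arithmetic in this comment is easy to check. The sketch below assumes the 500,000 bytes per sheet quoted for PaperBack and decimal units (1 GB = 10^9 bytes); reading "my server" as roughly 1 TB is my inference from the 2 million figure, not something the commenter states.

```python
BYTES_PER_SHEET = 500_000  # capacity quoted for PaperBack at 600 dpi


def sheets_needed(total_bytes: int, bytes_per_sheet: int = BYTES_PER_SHEET) -> int:
    """Sheets required to hold total_bytes, rounding up to whole sheets."""
    return -(-total_bytes // bytes_per_sheet)  # integer ceiling division


print(sheets_needed(10**9))   # 1 GB -> 2000
print(sheets_needed(10**12))  # 1 TB (assumed server size) -> 2000000
```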
Although, that may be fun for some spy type stuff.
Actually, the longevity is likely better than you'd think. My company just went through 27 years of project archive boxes to reduce their volume. The oldest boxes had numerous program listings printed by dot matrix on "ordinary" grade line printer paper, and had been stored in "ordinary" grade banker's boxes. There was some fading and some yellowing, but by and large everything that came out of the oldest boxes was about as legible as it had been when stored.
The biggest issue is longevity of the file format itself. It's probably best to store things in lowest-common-denominator formats. Plain text where possible, for instance. Otherwise, formats that are widely documented such as JPEG and PDF are probably safe enough. For a less well known format (such as that collection of digital negatives in Nikon raw files) it might be a good idea to include a source kit to a program that can read the file.
Unless I'm being daft, but a page would equal less than half a MB?
"Or in actuallity having a single pdf that contained everything would be supremely awesome"
Oops. The rats ate my backup.
This reminds me of the Rosetta Project run by the Long Now Foundation.
When it came time to make a long lasting archive of all of their data about human languages, they ended up going with essentially high precision bronze-age technology. Words engraved in metal.
They see your PaperBack and raise you StoneBack.
More on the Rosetta project here if you are in to that kind of thing.
Twibright Labs, which developed the RONJA (Reasonable Optical Near Joint Access) open-source system for open-space optical networking, has created a similarly open-source system called OPTAR for storing data on paper. The system stores 200KB on an A4 page, with error correction. They provide the following reasons for the technology:
- Long-life storage. They point out that microfiche panels (which could be used to store quite a bit of data via an imagesetter) have an estimated life of 500 years in air conditioning, much longer than common data storage formats.
- Legal requirements. The law requires that certain kinds of records be kept on paper (for example, notary journals and financial reports per Sarbanes-Oxley). OPTAR satisfies those requirements while storing data in a directly machine-readable format.
- Inclusion of digital information on printed materials. The example of ringtones printed on paper is given, where a cellular phone camera could read the data.
Archival life and legal compliance seem the most compelling reasons, particularly the applications in satisfying corporate compliance law without sacrificing computer readability of paper records that can serve as backup.
They also mention usage in IP-over-avian-carrier implementations, per RFC1149 and RFC2549.
I like big butts and I cannot lie!
Wonderful! In 100 years your great-grandkids can find your paper backups of your never-published memoirs in the attic! And if they're very lucky they'll remember something about you using this PaperBack program to generate them. Now all they have to do is load the program on their quantum computers!
Let's see, there's an old CD of that program here, but its data layer degraded before their parents were born. Oh good, there's also a paper backup of the program.... in PaperBack format.
Hm. Maybe your PaperBacked memoirs can be hung on the wall as antique art instead.....
"Oh good, there's also a paper backup of the program.... in PaperBack format."
BWAAAAAAAAAAAAAAHAHAHAHAHAHAHAHAH. I've been there xD.
"Another bonus is that ink on paper is readable by humans. Not all backup methods will be readable 50 years later, so even if you have the backup, you can't easily buy a drive to read it. I doubt this will happen anytime soon with CD-R as there are just so many of them out there, but the storage industry is littered with old now-dead ways of storing data."
Yet in 50 years no one will know how to read the 50-year-old so-called compressed paper alphabet, and we will be back to square one.
That's actually really interesting, I'm going to look for a Linux compatible paper storage solution.
I'm guessing you could probably dump an ASCII version of the encoding algorithm onto a couple of sides of A4
I've been interested in paper-based backup solutions for a while. I find it interesting that we can still use punched card and tape from 40 years ago without any problem, assuming we have working readers, but most original magnetic storage media has degraded so much that we can't read it at all.
One other thought with this solution. I agree that we have to be careful what format the documents are stored in. We also need to make sure that we don't assume 8-bit encoding, or ASCII, or anything else that could so easily be overlooked, that we take for granted.
Paper also has the advantage that humans can use nothing but their eyes to determine the degradation of their backup. You can look at a sheet of paper and say "This is still very good, it can easily last a few years longer" or "Wow, this is about to fade, we better renew the backup". With a CD or DVD, you can't even tell with your eyes whether it's still good or already broken.
I think you're way out where you shouldn't be. Isn't this a case of programmers being so much programmers they get things backwards now?
Actually, this is something that national libraries are bound to look into.
In most countries a national library is charged to preserve all books, newspapers and magazines that have been published in the country, forever (In the US the Library of Congress fulfills a similar role, as far as I know).
Nowadays this has been expanded to include digital materials -- websites that are located in the country and suchlike. Also, in order to increase the usability, there are programs in virtually any national library, to digitize existing paper books.
There are obviously many problems dealing with ever increasing volume of data, even more so in digital form, where your best bet for web harvesting, for example, can only be to carefully select and harvest some snapshots of what you deem relevant of your country's web.
However, for national libraries, the problem of storing digital material for many hundreds of years -- which is nothing unheard of for books, indeed -- is particularly expensive to solve.
Preserving paper is something libraries are good at already, so something that lessens the upkeep (particularly energy) costs for preserving digitized materials is quite welcome.
So how much paper would it take to back up... the internet?
Jeff celebrates April 1st 4 months later on August 1st.
May not be so obvious, but it's got to be a joke.
Same problems today as they were yesterday, which result in trying to go back to old solutions. PaperBack is probably just a very conscious ironic statement.
Particularly elaborate it seems, as it apparently also demonstrates how some people are willing to be misguided by all the lights, tout it as cool and useful, and ignore such important concepts as Green Technology, or fail to realize the storage capacity is not even close to comparable to today's data storage media. The surface of 2 DVD-ROMs, for instance, fits perfectly on top of an A4 page. Let's be modest and assume only a DVD-4. That's close to 10 GB as opposed to the A4 page's 500 KB.
The problem is longevity. We have warehouses of tapes from the NASA Apollo missions that are completely unreadable. What makes you think the DVD format is so special that it will still be around in 30 years? When's the last time you saw an 8-inch floppy drive at COMP-USA? Paper has two things going for it that no digital technology can match:
So look at the facts:
1. We know that paper can last a long time, under the right conditions, (and even under not so ideal conditions), because, well, we have some really old paper.
2. Most of our digital technology is really new. The bits of digital technology that are not new, are completely useless nowadays to all but the most rare of experts on the technology, and the luck to get some of the old machines working. (which is not so straightforward, always, as they may need parts that are no longer manufactured)
3. You toiled years on that program, or that data, or that novel, you want it to still be around in 30 years or what? Look at what's still around from 30 years ago, and make your choice.
Replication of data is the answer - not paper. Replication in format across multiple distribution mediums, down through the years, is the real long term storage strategy we must be moving towards.
Paper didn't really exist as a feasible way to keep information prior to the printing press. Today we can store impossible amounts of data in cheap and effective ways that would impress even dear old Gutenberg.
We need to look at improving the quality of how we electronically save data so that it may be archived effectively for a long period of time. Keeping the data 'live' (i.e. on an active medium which is current) seems to be the standard approach.
The answers are already appearing with cloud computing and other services that can be used for this type of operation. The company is paid to provide a service and it is the business of the company to transfer data that is archaic in an obsolete file format to a usable later version based on the ongoing importance of the information i.e. someone who is willing to pay.
These are the modern day printers, busily making copies of information that is still deemed important to someone, somewhere for a fee.
Jeff's cute idea is irrelevant and outdated. Sure, look to the past for many things to help guide us in the future (e.g. morals, principles, lessons of past failure, strategies for governance etc) but not technology. We haven't reached the computing end game yet and going back to paper is a complete cop out.
Sorry, got my wires crossed a bit.
Paper's got two things going for it: longevity and visibility. You can see what's printed on the paper. You can't see the orientation of the iron atoms glued to a disc of plastic, or on the surface of a platter, or in the burnt away bits of coating on the surface of a 'cd-r', which does not even have the advantage of being pressed in metal as a manufactured CD or DVD has.
@vince. Tell that to the thousands of geocities customers. I really have absolutely no faith in the longevity of my data in the cloud, and what you suggest in terms of replication is really like relying on a dead man's switch to prevent the launch of a nuclear missile.
We need passive systems that can retain data without intervention. There's really a finite amount of time that we can devote to retaining data in the way that you suggest. We can probably even come up with some mathematical formulation, charting the growth in the amount of data which needs to be preserved against the amount of time it will take to replicate it. How quickly will the former outpace the latter?
I'm not sure what is meant with longevity. Modern DVDs produced with super cyanide (Tayto's) or metal-stabilized Cyanine (TDK's) have lifespans of approximately 70 years. Far larger than the lifespan of the technologies used to read these formats. Even the 30 years lifespan of cheap DVDs is an almost guarantee of a larger lifespan than that of their format support.
Similarly, the paperback solution faces a high media longevity, but no promise of media support. In fact, it presents the huge problem of being also dependent on the encoder/decoder availability in 100 years, probably demanding a legacy system just for the purposes of restore.
And then there is also the huge archiving problem it presents. Let's not be shy here, you would need 8,000 A4 pages to match the capacity of a single 4GB DVD. And then you then need to ask, how long will it take to scan 8,000 pages so I can have my restore?
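The restore-time question can be put into rough numbers. This sketch takes the 8,000-page figure from the comment (4 GB at 500,000 bytes per sheet) and multiplies it by a scan rate; the 30 seconds per sheet is purely a hypothetical value chosen for illustration, so substitute whatever your scanner actually manages.

```python
def restore_estimate(total_bytes: int,
                     bytes_per_sheet: int = 500_000,
                     seconds_per_sheet: float = 30.0):
    """Return (sheets, hours) to scan a paper backup back in.

    seconds_per_sheet is a hypothetical scan rate, not a measured one.
    """
    sheets = -(-total_bytes // bytes_per_sheet)  # integer ceiling division
    hours = sheets * seconds_per_sheet / 3600
    return sheets, hours


sheets, hours = restore_estimate(4 * 10**9)  # one 4 GB DVD
print(sheets, round(hours, 1))               # 8000 66.7
```

At that assumed rate a full DVD's worth of paper is nearly three days of continuous scanning, before any decoding time is counted.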
A marginal gain in longevity does not immediately make all the obvious problems, irrelevant. So, I keep sustaining this is just a joke.
Right at the top of the page it says "Olly, the author of OllyDbg, presents his new open source joke"
Maybe everyone is taking it too seriously?
I don't get the point of all of this... And no, I don't think our alphabet is inefficient; it's simply tuned for fast loading. Have "phun" reading data saved that way!
This is a _bad_ idea. Have you ever printed a stack of paper and come back to it 3 years later? All the ink has started sticking to the page above and when you pull it apart it looks pretty bad. I can't imagine doing this!
Ah, paper. The perfect and longest lasting storage medium!
Well, I certainly know that Adobe and Monotype keep all their fonts on punched paper tape (with instructions on how to read the tape written in pencil at the top)! Highly efficient and you can easily build a paper tape reader with nothing more than basic workshop skills. Knowing the speed and reading method (always simple) it is possible to reconstruct the binary file.
I also know that in the 90's they stored all the source code for applications such as Photoshop and Pagemaker on paper tape as well (don't know if this is still the case).
Makes a lot of sense. I have some inherited punched paper tape data from 40 years ago and it's still in perfect condition. I have the reader (mechanically very fast if a bit noisy as it rattles through) and it still works. An RS232 port provides the output. I can read the data into a PC and write the raw data into a file on disk or process it further directly as a data stream.
There's a lot more of this kind of thing going on still than you might realise.
I think the more important question is, what information would I like to store for 50 or 100 years? Programs that are 20 years old are already out of date and basically useless. The important programs will be left running and therefore remain accessible.
"Modern DVDs produced with super cyanide (Tayto's) or metal-stabilized Cyanine (TDK's) have lifespans of approximately 70 years."
According to that rosetta project site, paper can last for thousands of years.
"Similarly, the paperback solution faces a high media longevity, but no promise of media support. In fact, it presents the huge problem of being also dependent on the encoder/decoder availability in 100 years, probably demanding a legacy system just for the purposes of restore." not a problem to sneeze at for sure. But the source code is available, and can be bundled with the data. NASA recently made available the assembler source code for its apollo guidance computers. If you set your mind to it, you could easily write an emulator to run it, or hand port it to some modern language.
"And then there is also the huge archiving problem it presents. Let's not be shy here, you would need 8,000 A4 pages to match the capacity of a single 4GB DVD. And then you then need to ask, how long will it take to scan 8,000 pages so I can have my restore?"
Do you need your porn collection in 1000 years? I'm perhaps being overly modest, but I think I would struggle finding even 500MBs of stuff that I would want to last for that long. It would likely be source code, writing, and possibly drawings/paintings that are stored on paper in original form anyway. It's not that much of an issue for me to personally curate and maintain my own data while I'm still alive, but I don't expect anyone to care after I'm dead.
In any case, I don't think libraries will be discarding their microfiche machine in favor of DVDs any time soon. Well, not the clever ones anyway.
Not sure what your argument is Breton. If it is making a case for paper backup, I have news for you. We invented computers.
Well, I'm not sure I really have much of an argument. Merely an open question: How much about our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark ages? The longevity of the mediums we're currently storing our personal histories into is unproven, because it has only existed for less than a century. We simply don't know which parts of it will last. However, we already know from experience which parts certainly WON'T last: Magnetic records. We have found the diaries of people who lived in the year 500. in 1500 years time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?
I can't recall anyone referring to CDC Display Code as "CDC" - it was always "Display Code." Other 6-bit encodings like BCL ("Burroughs Common Language") generally did go by their acronyms though.
There is still a TON of stuff around in EBCDIC though. No, not your Facebook ramblings but stuff like your bank accounts, driving record, criminal history, ... and EBCDIC is far from an IBM mainframe only encoding but people don't seem to know it. Heck, Windows supports EBCDIC yet at least for transcoding purposes, via Kernel32 no less (see MultiByteToWideChar with codepage 37 for one case).
ASCII was considered an interchange format, not something native for normal internal use.
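EBCDIC transcoding is still within easy reach outside Windows too. As a small illustration, Python's standard library ships a codec for the same code page 37 mentioned above, under the name `cp037`. The helper names below are made up for this sketch, and other EBCDIC variants use different tables.

```python
def to_ebcdic(text: str) -> bytes:
    """Encode text as EBCDIC code page 37 using Python's built-in cp037 codec."""
    return text.encode("cp037")


def from_ebcdic(raw: bytes) -> str:
    """Decode EBCDIC code page 37 bytes back into text."""
    return raw.decode("cp037")


word = "HELLO"
print(to_ebcdic(word).hex())       # c8c5d3d3d6
print(word.encode("ascii").hex())  # 48454c4c4f
print(from_ebcdic(to_ebcdic(word)))  # HELLO
```

The same five letters come out as completely different byte values in the two encodings, which is why an archive needs to record which encoding it used.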
"Well, I'm not sure I really have much of an argument. Merely an open question: How much about our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark ages? The longevity of the mediums we're currently storing our personal histories into is unproven, because it has only existed for less than a century. We simply don't know which parts of it will last. However, we already know from experience which parts certainly WON'T last: Magnetic records. We have found the diaries of people who lived in the year 500. in 1500 years time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?"
Interesting questions indeed. I don't think anyone has an answer. However the, allow me, "computer revolution" is not much different from the printing press revolution, when a new information storage medium was invented. I'd say it is safe to assume as we continue this path we will constantly reformulate our current technology, bringing new and better ways to archive information. We do not tend however to go back. Especially because information density keeps growing and old methods become incompatible with that growth. It's hard to beat stone carving when it comes to longevity, and it's conceivable to think a robot and a piece of software could laser carve more information on a limestone the size of my post-it than this PaperBack thing could on 10,000 A4 pages. Still... we don't see many of those around.
Storage media for long-term backup purposes is meant to be kept in protected environments that usually extends their life far beyond the announced material properties under normal conditions. In very good conditions, information on a DVD could last centuries. But even disregarding that possibility, it is widely acceptable and a common practice that when new better technology is invented and there is a port for that technology, all backups are converted into the new media. Much like paper has been being converted to digital format.
Not that other way around, eh.
"But even disregarding that possibility, it is widely acceptable and a common practice that when new better technology is invented and there is a port for that technology, all backups are converted into the new media. Much like paper has been being converted to digital format."
You mean it's a widely accepted belief that this happens? I think this is a vast overstatement of the reality of the situation. For instance, there's piles of music that is available on vinyl records, and not available on CD, and never will be, because it's not popular enough at this moment of transition. What if we change our minds later, and decide that music actually is interesting? Will it be too late to save it? Many of the records that will be lost will be rather interesting historical records of music in a particular time, as it was performed then.
There are piles and piles of films that were made in the 20th century, which require some serious funding and effort to be preserved. So much so that a film must be demonstrated to have genuine historical and cultural value before it can be selected for preservation. We still haven't recovered the entirety of the movie "Metropolis". There are sections missing.
I think the ease with which you are able to copy a file on a computer has the effect of distorting your perspective about how difficult it actually is to preserve what's important. How difficult it is to even decide what's important. Some of the most tragic losses have happened because it didn't seem important at the time.
Oh damn, I haven't even mentioned the effect of DRM, or hard drive crashes. We're already hearing about people losing valuable family photos to technical mishaps. You might comment that it's their fault for not backing up- But it is just much harder to crash a paper photo album. They don't have a tendency to spontaneously combust every 3-4 years like hard drives do.
We are talking of backups. Not preservation of originals. That's the only way to guarantee the preservation of information far beyond the point the original gets lost. As we speak, a European consortium of museums is undertaking a project (I believe called Europe 2.0) for backing up their entire contents in digital format.
PaperBack is almost certainly an intentional joke. Look at the pictures on the original article by Jeff. The last picture is a step backwards from the first. It's irony at its best.
Is any of this getting through to you? What good are backups if they don't last as long as the originals? We can spend thousands, perhaps millions of dollars to create a digital archive that is nothing but dust in 50 years, or needs another million dollar effort. Are magnetic disks really the best archival medium we can think of? Have we really thought the problem through?
Paper might sound like a joke to someone with digital myopia, but I'm not laughing.
> Is any of this getting through to you? What good are backups if they don't last as long as the originals?
I don't think you understand the purpose of a digital backup, nor are you familiar with its standard procedures.
Sorry, but I need to be somewhere else.
PaperBack also sounds like the only encoding mechanism that can be severely compromised by fly poop, dust, fingerprints etc.
Most of the encoding techniques are hardware based and just feed the encoded information to software to use. A software based solution requires that the software continue to be maintained over the many many OS changes etc.
I'll bet the original barcode scanners still work, you could probably get original punch-card readers on EBay - or even build your own. You can EASILY read 100 year old documents.
Getting software that is even 10 years old to work can often be a challenge.
You might want to calculate that into the value of a software based solution.
Oh, and a page full of static might be harder to recognize as valuable information later on.
Funny quote from PaperBack site...
"Actual version is for Windows only, but it's free and open source, and there is nothing that prevents you from porting PaperBack to Linux or Mac, and the chances are good that it still will work under Windows XXXP or Trillenium Edition."
XXXP, Trillenium Edition, nice.
I just scanned and restored a file that I saved to paper at my office this afternoon. It works like a charm, at least at 200 dpi.
One advantage of optical storage at this scale as opposed to a CD/DVD is that it obviously contains information. The dot pattern is a modulated gray that under only a little magnification reveals structure. It would be obvious to an archeologist that there was information to be recovered.
That said, storage of a Rosetta stone in the form of source code as plain text on paper (PaperBack is under GPL so it's possible, and in C++ so it's likely to be well enough known to have existed) would certainly be a good idea.
People are talking about archiving on DVD here but I've got several high quality DVDs from a couple years ago that already can't be read due to problems. No way I'd trust DVDs for anything long term. To me they're a crap storage system that we'll be lucky to leave behind.
> I like big butts and I cannot lie!
LOL, I was waiting for someone to notice this. Thank you Wade-O!
> This is a _bad_ idea. Have you ever printed a stack of paper and come back to
> it 3 years later? All the ink has started sticking to the page above and when
> you pull it apart it looks pretty bad. I can't imagine doing this!
This problem was solved decades (if not centuries) ago. Air is what deteriorates paper, and if you prevent air from reaching it, the paper is preserved. Today we laminate, but for many people simple sheet protectors are good enough.
I think, right now, this is more practical for the "future civilization" use case for long-term archives of data. For it to be useful to a future civilization, we'd have to provide instructions for encoding/decoding and preserve them along with it, and assume this future civilization has certain technologies. The only things I can think of that might be worth preserving for future generations involve text passages, and we already have standards for expressing ideas using text, so it may not be worth storing barcodes at all. Still, it seems like it'd be easy to put a few thousand sheets of paper and a few dozen pages describing how to decode them in an airtight, fire-resistant box. Compare this to storing a couple of DVDs in an environmentally controlled container along with instructions for how to build a DVD drive. Of course, for short-term backups of things we back up frequently, like payroll data, it seems ridiculous to rely on a technology that would have such a long lead time on restoration. DVD is probably not going to be unusable in the next decade or so, so if that's your time horizon there's no need to look for another solution.
My concern is the restoring part: what use is a backup if you cannot restore it? They say to use just a regular scanner, but based on my experience with OCR, the error rate for this would be rather high.
I have this idea of an application: it will control a huge printer that will spray the mineral paint over the wall of a cave, forming a human-readable backup of your data.
Version 1.0 will aim to store up to 500 MB per cave and your data will be safe for at least 32,000 years, or your money happily refunded!
My company uses XML to archive stuff; we assume it will still be readable 50 years from now. When you store stuff in an RDBMS like MS SQL or MySQL, you are far less safe.
As said before, April 1st was months ago.
Hmm... so how about combining the PaperBack encoding with a MakerBot-like device and create OpenRosettaStone? PlasticBack FTW!
>There's a coding error on the IBM JCL punched card: should be no
>Erica on July 31, 2009 2:55 PM
Not necessarily an error, that could just be a comment.
I've noticed a few complaints about acid in paper... very recently I went looking for "archival" paper and found that almost any laser paper sold at my local Staples or wherever was acid-free at this point. As far as I can tell, the obsession with acid in your printing paper is rather out of date.
Can you provide the stackoverflow datadump in this format? :D
Made me think of Futurama (S02E03) oddly:
BENDER: "They must just wanna see that episode. Let's find a tape and give it to 'em."
AMY: "There aren't any copies left."
FARNSWORTH: "No, there wouldn't be. Most videotapes from that era were damaged in 2443 during the Second Coming of Jesus."
PaperBack reminds me of a pair of programs I had on my Amiga back in the day of 300 baud modems and dot matrix printers.
You would run the program (I forget the name) and open the file you wanted to send to your buddy. It would encode the file to something like data matrix, then print it all out on the printer.
You would take all the sheets and mail them to your friend who would run the second program that would scan everything back in and decode it to the original file.
I'm pretty sure that's the original way PGP made it over to Europe back when you couldn't export it by disk or modem. A nice little loophole in the law. ;)
If you want to archive to a medium that will never deteriorate, TwinkyBack is the only way to go.
This is one of the stupidest ideas I've heard in a long time...
I made a change to one word. Please back it up again.
Your paper supplier
Is there a world-wide standard specification for this paper-based compression/encryption system? Like Microsoft's 'open' xml?
What about i18n?
First you need to decide what it is that needs long-term archiving. No one sensible will whiz off 1e6 sheets to save /tmp on a daily basis.
Really important things that people might care about 50, 100, 250 years from now, well, we often call those 'books' and publish them en masse redundantly distributed in an unencoded format.
It's of course impossible to know what will be important "in the future" but hey, that's not a new problem and it's outside the scope of the storage medium (the "save it all" arg is silly if you look at all the cache and buffer crap).
We forget that "data" isn't the problem, but "information" is. For truly long-term storage you need to use an editorial mindset to try to choose wisely what might be readable long term. E.g., for NASA space probes, rather than assuming 800 bpi tape would last forever, summaries of the data sets would have been nice to have now, even though at the time they would have known it's just a subset.
DVDs and even older CDs have to be kept in a dark(er) place and preferably inside a plastic case (or on a spindle). Just recently, I found a batch of backup CDs from 1998-1999 and they were in perfect working order. I had some no-name ones, and a couple of Verbatim/BASF.
I also had a look at some CDs/DVDs that have been on the CD shelf and they do show some yellowing, although they still work.
Media is very sensitive to (extended exposure to) UV radiation and should be handled/kept accordingly.
How does PaperBack deal with aged paper? Faded ink? Torn or curled pages?
We need some kind of permanent, cheap, long term storage, like some kind of sci-fi crystal storage device. That, or DVDs that are readable after 10 years.
RE "I like big butts and I cannot lie!"
That's what is in each of the bar codes!
Is it just me or do those things look like Conway's Game of Life?
You idiots, don't waste trees. How often have you actually *needed* to retrieve anything that you have backed up? If you just can't live without your digital pleasures, you should probably re-evaluate your life.
Since I'm too lazy to go check whether anybody else has suggested it, may I just add: if you're going that far, why not back up to microfiche? With the pixel-codes? I've seen the whole King James Bible (in text!) stored on a piece of microfiche the size of a largish postage stamp.
Who guarantees that these applications will be ported to *all* future systems and platforms?
This is an interesting technology for the > 20 year storage market.
For data such as emacs, which is widely distributed and hence will probably never "die", archival isn't very useful - people will be using it until something revolutionary comes along.
For data such as (for example) the original video feed from the Moon landers, that should be put in a very platform-independent, long-lasting, readable-to-all-for-centuries format. Quality paper is a great option for the several-hundred-year range.
I tried PaperBack, and was somewhat disappointed. First, forget your inkjet. Laser printer is a must. Second, ability to recover data from damaged paper is rather low...
If you want something digital but even more enduring than ASCII, check out "Arecibo Ascii" (USCII). I made an online converter to demonstrate the idea:
"While ASCII uses small arbitrary values (e.g. 65 for 'A', 66 for 'B'), USCII encodings use much larger values (e.g. 15621226033 for 'A', 16400753439 for 'B'). These unusual numbers were chosen because they mathematically contain bitmaps of the symbols they represent! Using a technique modeled after SETI's Arecibo Message, semiprimes are employed to suggest the two-dimensional decoding of these bit patterns."
But perhaps I should find better things to do with my free time. :)
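To make the semiprime idea in the quote above concrete: a bit count that is the product of exactly two primes can only be laid out as a rectangle in two orientations, which is the hint the Arecibo Message (1,679 bits, 23 by 73) relies on. Here is a minimal Python sketch of just that step. The function name `grid_shapes` is made up for illustration, and this is not the converter's actual code or the exact USCII layout.

```python
def grid_shapes(bit_count: int) -> list[tuple[int, int]]:
    """Return the (width, height) grids that a semiprime bit count allows.

    Returns an empty list when bit_count is not a product of exactly two
    primes, because then the 2-D layout is ambiguous or impossible.
    """
    factors = []
    n = bit_count
    d = 2
    # Plain trial division; fine for the small bit counts involved here.
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    if len(factors) != 2:
        return []
    p, q = factors
    # Two primes give two orientations (only one when p == q).
    return sorted({(p, q), (q, p)})


print(grid_shapes(1679))  # bit count of the Arecibo Message
print(grid_shapes(35))    # a 5 by 7 character cell
print(grid_shapes(36))    # not a semiprime, so no unique grid
```

A decoder with no other instructions would still have to try both orientations and see which one looks like a picture.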
Real men don't do backups. Real men use the internet.
I just wanted to add two more points:
1. Thank you for returning to a topic that is more technical and less esoteric. I’ve been agitating for this for a number of posts. This is a great discussion point and you've added info many will find useful.
2. If it wasn’t for paper backup, I would have been in the poop recently.
I was working on an important report and I had just plugged in the printer to print the final version when my computer released its smoke. For those who don’t know, computers aren’t based on electricity, they are based on smoke and when it escapes they no longer work. Whatever happened blew every device connected via USB, including the backup USB device still in the USB port. I had my laptop on me, but the MacBook Air didn’t have a DVD/CD drive and with my USB backup toast I thought I was doomed.
Luckily I had printed a draft. I was able to scan in the report and via OCR recover the entire last draft I had done. I re-did the corrections and just met the submission time. It didn’t even require re-formatting, it picked up everything.
Without paper back I would have been STUFFED!!!
The character counts that you quoted for Data Matrix, QR Code, and Aztec Code seemed surprisingly large until I realized that you quoted the *maximum* number of characters that can be stored in those formats. The images that you show are much smaller than those needed to store the maximum number of characters. The formats are in fact approximately 1 bit per little square, as one would expect.
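A quick sanity check of the "approximately 1 bit per little square" estimate, using QR Code as the example. The sketch assumes the standard figures: a version 40 symbol (the largest) is 177 modules on a side, and alphanumeric mode packs two characters into 11 bits. It ignores mode and length headers, so treat the result as approximate.

```python
def qr_side(version: int) -> int:
    """Modules per side of a QR Code symbol, versions 1 through 40."""
    return 17 + 4 * version


def alphanumeric_bits(char_count: int) -> int:
    """Payload bits in QR alphanumeric mode: 11 bits per pair, 6 for a leftover."""
    return 11 * (char_count // 2) + 6 * (char_count % 2)


MAX_ALNUM_CHARS = 4296  # the QR Code maximum quoted in the article

modules = qr_side(40) ** 2
payload = alphanumeric_bits(MAX_ALNUM_CHARS)
ratio = payload / modules

print(f"{payload} payload bits over {modules} modules = {ratio:.2f} bits per module")
```

That works out to roughly three quarters of a bit of payload per square, with the remaining squares going to finder patterns, timing patterns, and error correction.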
Instead of using merely black and white dots, why not use several colors? I'm thinking RGB + B/W. So instead of storing the info in binary form, you can have a base-5 system. The base-5 string of 0s, 1s, 2s, 3s, and 4s that comes out as a result of reading the paper can then be converted to a more conventional binary string of 0s and 1s. Wouldn't we be able to store loads more and make paper backups actually viable?
It's getting kind of late here, but if anyone's interested (and I might be), I'll do the rough guesstimating math tomorrow.
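For what it's worth, the rough math for the five-color idea can be sketched in a few lines of Python. The function names here are made up for illustration, and the sketch assumes an ideal printer and scanner that never confuse two colors, which is the hard part in practice.

```python
import math


def bits_per_dot(colors: int) -> float:
    """Ideal information carried by one dot that can take `colors` values."""
    return math.log2(colors)


def base5_to_bytes(digits: str) -> bytes:
    """Convert a string of base-5 digits (0-4) read off the page into bytes.

    Leading zero digits carry no value and are lost, so a real format
    would also need to record the original length.
    """
    value = int(digits, 5)
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


print(f"black/white: {bits_per_dot(2):.2f} bits per dot")
print(f"five colors: {bits_per_dot(5):.2f} bits per dot")
print(base5_to_bytes("1013"))  # 1*125 + 0*25 + 1*5 + 3 = 133 = 0x85
```

So five colors give a bit more than twice the density of black and white dots at the same dot size: a real gain, but not orders of magnitude.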
In response to myself, I just read the Rainbow Storage link. Yep, turns out my idea was already taken. Darn.
nice article thanks for sharing, wish you continued success
This should be a joke, it does not make any sense if you can't restore PaperBack software after 100 years. If I have a chance to store PaperBack software, I would certainly prefer to store my important files, not the PaperBack software.
Am I the only one who sees here a possibility to steal sensitive data this way and smuggle it past security checks?
Given that laser printer emissions are toxic, it's a good idea to take a pass on this technique.
...And photocopiers for that matter. Similar technology.
And we wonder why people refer to us as geeks... As the Rosetta stone has been around for a very long time, perhaps we should back up our data onto stone tablets. I mean, that's the point of a backup: longevity... I'm not sure what bothers me more: that someone wrote a program to do this, or that someone actually uses said program.
I find this a very interesting conversation actually. My Grandmother, bless her uncommunicative mouth, recently passed away leaving us with a nice steamer trunk full of family memorabilia. Almost all of it was useless because there is no oral history to go along with it. So we have hundreds of wonderful old photos of people we will now never ever know. And these photos from the early part of the 1900’s are fading as well. So in less than a single generation every memory of my family is now gone. Storage techniques are exactly like this. When was the last time you pulled out your old family pictures from the late 50’s or early 60’s? Got any Super 8 film movies or VHS tapes of your family and friends or important milestones in your history? While it is important to have records of bank statements and credit records, once you die and your contemporaries join you in the permanent dirt bath, your life history will be gone in a very short period of time. Particularly if you are depending on modern technology to preserve your family's history.