Adobe's Portable Document Format is so advanced it makes you wonder why anyone bothers with primitive HTML. It's a completely vector-based layout format, both display and resolution independent. With PDF, you sacrifice almost nothing compared to traditional book and magazine layouts except the obvious limitation of resolution. Here's Kevin Kelly extolling the virtues of PDFs:
A PDF is able to retain the highly evolved grammar, design and syntax that one thousand years of bookmaking has attained. Because of the idiosyncratic way web browsers work, designers do not have full control of what you as a reader see on the web. The web page, including its fonts, fonts sizes, and placement of material and size of the window, partly depends on the viewer's preferences. In my experience as a reader, a web designer, and a book designer, the reading experience on paper -- and PDFs -- is much more refined and elegant. As a publisher and designer I can direct the flow of attention with better tools (font choices, rules, lines, columns) and better control. The benefit to me as a reader is that this sophisticated design translates into increased clarity, smoothness, comprehension, and enjoyment.
But I have a problem with PDF files.
I'm not the first person to note the usability problems of PDF, but I consider this a classic case of worse is better. The advantages of PDF rarely outweigh the many disadvantages compared to plain old HTML. I suppose relying on PDF was more defensible in 2001, when browser printing support was notoriously poor, and HTML layout was not well understood. But it's 2008. I'm surprised how many authors still reach for the safety blanket of PDF when they and their audience would be much better served with modern HTML.
The other problem with PDFs is a bit more subtle. A PDF is not merely a PDF; it's a statement. An implicit protest against the terrible limitations of the HTML used by the unwashed masses. PDF content yearns to be free of the constraints of common HTML-- this content, you see, signifies something:
It seems that the PDF format signifies something now, and it's something more than just user inconvenience. In addition to requiring the user to shift mental modes, ("I'm seeing something designed as a PDF now, this must be serious information...") the requirement that a document either be downloaded or viewed in a context that's radically different from standard web pages seems like a subtle assertion of authority by a document's creator. The decision to switch from standard HTML to PDF isn't arbitrary, but it isn't based on technical requirements either. It's based on the value that an author wants to assign to the work, and it benefits from the still-prevalent, though rapidly fading, consensus that print work is somehow more inherently valuable and authoritative than web pages and other online content.
The massive inconvenience of PDF for the user rarely outweighs the minor HTML injustices righted through the power of PDF layout. Consider Kevin Kelly's own True Films 3.0 PDF:
Kevin went to the trouble of packaging this content up as a PDF, even adding Adobe's brand new support for contextual PDF advertising. All in the name of better formatting. But I don't see any advanced formatting here! Everything in that PDF would render perfectly as HTML. And it'd be better as HTML: easier to hyperlink and search, more accessible to a wide audience, and it would certainly generate greater advertising revenue through the existing web ad ecosystem.
I don't dispute Mr. Kelly's taste in movies for a second. And I worship at the altar of his Cool Tools. But I'll never understand how the founding editor of Wired could fall prey to such shallow PDF elitism-- completely missing the obvious and inherent power of the world's HTML common denominator.
PDFs rock when it comes to high-res professional quality printing, but are useless as a web device. The packaging argument makes no sense. Why would you need to package and send a web page? That's what HYPERLINKS are for. And hyperlinking is even better than packaging, because the information in the package is only a copy, and may become deprecated, while the HTML source will always be the most current copy.
Mattkins on January 3, 2008 8:03 AM
It's a data black hole. That's the biggest killer.
We have plenty of customers who want to send us, in PDF, POs, even asset lists they want us to track, and expect us to be able to use that data. Just try to extract a multi-page table out of a PDF! People are using this universally as a container, when in fact what it contains is no longer data but a picture of the data.
This theoretical discussion about the needs of the printing industry versus the web is fine for us, but it's wholly lost on the actual users out there, in a land where my wife is queen of her department's computers because she actually knows how to zip files. To these folks, PDF is the de facto way that you extract content to transmit to others. If I had a dollar for every time I've been sent a PDF of a website, rather than just a URL...
Add to that the fact that they just don't work on my PocketPC, where I do a lot of my reading.
Surely in some cases it really is important to preserve content, and it is useful there. It's also useful to type into forms. But considering the ineptness with which some people are creating the content (e.g., columns too wide to scan with minimal eye movement), the freedom afforded by reflowing as browsers do is valuable.
Chris Wuestefeld on January 3, 2008 8:04 AM
I have to admit I have a vote for PDFs (at least for now) cuz I find them very convenient to download. Some HTML pages are just painful to download and maintain as a whole. A single PDF is easier to save and print. Many websites just fail to provide a printer-friendly page.
Michael on January 3, 2008 8:08 AM
"...more defensible in 2001, when browser printing support was notoriously poor..."
Looking at stats from various websites, you see an alarming share of users still stuck with IE 6 (like 70%). IE 6 couldn't even autofit a web page to one sheet of paper; you get 2, or worse, 4.
I think that the PDF format would be a lot more useful
if Adobe would strip out the 99% of the engine that you
don't need when loading it, or at least make it a
load-on-demand operation.
Google "acrobat lite" or something similar, it's already been done (it's a small app written by a guy in the UK that unplugs the 99% of Acrobat that's useless bloat).
Alternatively, just run Foxit.
Dave on January 3, 2008 8:12 AM
Sorry Jeff, I don't think you get why people use PDF. It's not because of the layout. It's because they expect someone to print the document and web browsers are impossibly lame when it comes to printing pages. The reason is obvious: pages are formatted to be viewed in browsers, not printed on paper, and it is the rare designer who comes even close to making a separate set of print settings in their CSS which look good (and even then you still are at the mercy of whatever default header/footer options the browser puts on, and even then you have no possible chance of real fidelity in the output).
As all designers know, HTML is meant to be different than print. It is dynamic, and it looks different from machine to machine. It is a different paradigm than the print world. That's all well and good, but sometimes you want to design things for the print world. You want things to look the same to everyone. You don't want to worry about shit getting out of alignment or some bit of text getting orphaned on the last page. Even modern browsers print most pages like shit (from a design point of view), something which is usually the fault of the designer but is something that's going to be a problem in general with browser printing.
PDF is about the print world. It's not meant for the web. The web simply becomes a distribution point for PDFs.
Why does Paul N. Edwards have that page as a PDF on his website? Because he expects people to print it. Because he's a professor (a damned interesting one, too--_The Closed World_ is a fantastically interesting book). Because he distributes it to his students. Because he wants to make sure when he says, "look at page 4," that they all have the same page 4. Because students love loopholes ("Oh, well MY copy of that didn't include that clause, so you can't hold it against me!"). Because academics like fidelity--reliable reproducibility is the cornerstone of any academic work (or, as they would say in Edwards' field--which happens to be mine as well--immutable mobiles are necessary for knowledge generation across areas as geographically large as the modern world).
It's true that some people might stupidly be using PDFs as a replacement for HTML, but I suspect they are in the minority here, and any attempt to understand why people use PDFs via this lens is going to be misleading.
Shmork on January 3, 2008 8:24 AM
I have to agree with your article Jeff. I think that HTML is great and while PDF may excel in some areas this is more a call to continue refining HTML rather than to jump ship to PDF.
David Mackey on January 3, 2008 8:31 AM
By the way -- MY biggest problem with PDFs is that they are very hard to implement in software when it comes to READING them. If you are not 100% happy with the free drop-in ActiveX component (and there's a lot not to be happy about), then you're stuck with either licensing a third-party reader (at ridiculous prices) or trying to re-code a viewer from scratch that can handle the entire format and all of its backwards-compatible variants.
I think Reader has improved a lot in the last version but it still doesn't do a number of things that would be useful in my line of work. For example, it is incredibly hard to take notes in a separate file while viewing a PDF on the screen. As an academic, in a world where most academic work of the past is being scanned as PDFs on sites like JSTOR, this is something that I need to do practically every day. My fellow graduate students are CONSTANTLY needing to read PDFs and take notes in separate files (the "in-line" notes preferred by Acrobat are not all compatible with how most academics use notes and they have a lousy interface).
If the format was a bit more flexible--if there were better code snippets out there that would allow it to be easily used--it would not be hard at all to whip up a custom PDF viewer that worked for my purposes. But it isn't, and the only large codebase for viewing PDFs out there is for xpdf, which is massive and only in C++. I don't have the programming chops to adapt such a thing easily. Imagine how the format could be opened up, though, if people made simple libraries for VB.NET, PHP, RB, etc., that allowed one to raster pages and extract text reliably! Well, I know that I'd put it to good use, anyway...
Shmork on January 3, 2008 8:32 AM
John S said:
1) Embedded font
Again, how do embedded fonts help the reader? As I see it, a reader cannot choose a bigger font to ease reading (only zoom, which is quite nasty for multicolumn docs).
2) Whole document including images is one file. 1 document = 1 file. A "document" that is a directory of html file(s) and images feels unwieldy.
Whole HTML document including images is one URL. 1 Document = 1 URL.
I, as a reader, never see the files and directories. Why should I?
3) The text in the images is anti-aliased according to my preferences. If the diagrams in this document were inline html images, I would be at the mercy of whatever AA settings the author used when creating the image.
Exactly the same in HTML. If the OS has antialiasing, your text will be antialiased to your settings (damn IE not supporting SVG). If the author has put text on an image (and I've seen this on PDFs too) you are limited to the AA setting the author chose exactly the same.
4) Zooming retains layout.
Layout is not content. I don't want to see the layout, I want to see the content. And as others have said, zooming a multicolumn layout is very uncomfortable for the reader.
5) Annotation/comments (again all within the one file), side-by-side viewing, quick rotation of the page, other viewing-based features...
Blog comments (again all in one URL); you can open as many browser windows as you like to view pages side-by-side; quick rotation of the screen; and the viewing-based features live outside the content because they don't belong with the content.
---
Graham Stewart said:
What if you want to package up multiple pages (e.g. a manual) and make them available to everyone that visits your site?
Well, you set the manual on a webpage. If the users are browsing your site, they sure have a web browser, and it costs the same to publish (on Internet) a PDF or a webpage.
OTOH, if you want your users to print out the manual (or to have a copy to pass on to offliners), you could convert those HTML files to PDF on the fly for your users. There are lots of tools to do that in whichever language you are using for your webpage, be it php, java, asp, python, ruby, etc... (and here you can be totally independent of the browser quirks, so you just have to find the way to transform the page to something that will print fine).
---
And regarding the comments about citations and page numbers: DAMN, this is what the ANCHOR ('a href...') is for. You make the reference directly with a hyperlink (don't forget what the '#' means in a URL); there's no need for page and paragraph numbers that burden the user with searching by hand, when you have already done the search and can give him an easier way.
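To illustrate the '#' point, here is a minimal Python sketch of how a URL splits into the document address and the fragment that names a spot inside it. The URL and the anchor name `clause-4` are made up for the example.

```python
from urllib.parse import urldefrag

# Hypothetical citation: the part after '#' names an anchor inside the page.
citation = "http://example.com/syllabus.html#clause-4"

# urldefrag separates the document address from the in-page anchor.
page, fragment = urldefrag(citation)
print(page)      # the document to fetch
print(fragment)  # the anchor to jump to within it
```

The fragment is resolved by the reader's browser after the page loads, which is why a link like this can stand in for "see page 4, paragraph 2".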
The only drawback to this is how volatile the web has become. This is something forced onto us by people who don't understand the medium: ISPs provide such a lousy service with webpages that we usually need separate hosting, and the idea Tim Berners-Lee had about people publishing research papers online is not sustainable anymore (well, maybe for works in progress). Even so, free publishing platforms make permalinking a bit easier.
maeghith on January 3, 2008 8:50 AM
I use PDFs only in cases where I am unsure what software people have access to. Mostly in cases where people have to see the information as it has been designed.
Browsers vary. Desktop software varies. Fonts vary. PDF solves all that, and that's the only time I think it makes sense. However, the biggest reason I don't use PDF even when I should is filesize.
But it's nice to know that even if someone doesn't have a PDF viewer (which I imagine is rare), they can download one for FREE and see what I need. The only inconvenience I'm putting on people is the download/install of freeware. That I can accept. Locking people out by sending them a format for which they must purchase software or convert the data first is just mean.
PDF is a great cross-platform solution to communicate with the assurance your readers are seeing things exactly as you want. The trick is knowing when things have to be seen as-designed and when they are acceptable otherwise (which are usually faster and more convenient).
Morning Toast on January 3, 2008 9:02 AM
The real problem with PDFs in a web context or for local usability is search. You'll find that often ligatures are not searchable in a PDF document. This makes the terms invisible to the world (i.e. Google), at least until someone fixes this problem.
Here's a real example from a blog entry I did on the mess created for search by PDFs: http://acl.ldc.upenn.edu/P/P06/P06-2051.pdf
Now try searching for the word 'fulfills'. It follows the phrase 'which fully' on page 397 (7/8 in the PDF page numbering).
The reason you can't see it is that 'fi' is what's known as a ligature, and in the interest of prettiness, PDF treats it as a unit, so it doesn't match the sequence of two characters 'fi'.
At least in January 2008, PDFs are nice for printing and layout, but lousy for search.
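For what it's worth, the ligature mismatch can be sketched in a few lines of Python. This assumes the text extracted from the PDF contains the Unicode ligature character U+FB01, which is only one of the ways a PDF can encode the glyph; the helper name `searchable` is made up for the example.

```python
import unicodedata

def searchable(text):
    """Fold compatibility characters such as the 'fi' ligature into plain letters."""
    return unicodedata.normalize("NFKC", text)

# Assumed extraction result: U+FB01 is the single-character 'fi' ligature.
extracted = "which fully ful\ufb01lls"

print("fulfills" in extracted)              # False
print("fulfills" in searchable(extracted))  # True
```

An indexer that normalizes extracted text this way would match the two-character sequence 'fi' again. It does nothing for PDFs whose fonts map the ligature glyph to no usable character at all.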
Bob Carpenter on January 3, 2008 9:04 AM
I'd say it is a question of control over flow, attention, ads display, and restrictions of copying, printing and viewing. PDF gives more control to publisher and HTML gives more control to reader. Those who want to control readers, use PDF, those who want to cooperate, use HTML.
Iggi on January 3, 2008 9:05 AM
Maybe an idea for an article: How does your website show up on paper? A lot of blog-sites show up horribly (not this site, though).
About PDF: it's horrible for programmers. You even have to do your own line-breaking (PDF doesn't know anything about soft line breaks).
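To give an idea of what "doing your own line-breaking" means, here is a rough Python sketch of a greedy line breaker of the kind a PDF-generating program needs before it can place text. It assumes every character has the same width (`char_width`), which real fonts do not; the function name and parameters are invented for the example.

```python
def break_lines(text, max_width, char_width=1.0):
    """Greedily pack words into lines no wider than max_width.

    Simplifying assumption: every character is char_width wide.
    A word longer than max_width is left on a line of its own.
    """
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if not current or len(candidate) * char_width <= max_width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

print(break_lines("the quick brown fox jumps", 10))
```

With HTML the browser does this for you at whatever width the window happens to be; with PDF the result is baked into the file.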
Doekman on January 3, 2008 9:09 AM
As was mentioned earlier, use the right tool for the job. PDFs generated by marketing for brochures and such can just as easily be uploaded and linked to on a web-site, rather than duplicating the information in HTML and CSS.
I rarely print to a printer, but use PDF Creator to print off information from Excel, Word, etc and then can share the PDF; they'll see what I see. It's especially helpful with people like my parents who don't have Word or Excel.
Jack on January 3, 2008 9:10 AM
Hi Jeff,
I read your blog daily and like it a lot. Thanks for your effort in keeping such a nice blog.
I usually agree with your points of view, but not this time.
I agree that a lot of people use PDF in the wrong way, and using it in a pure web context should be avoided. The Jakob Nielsen article has influenced my view since it was written, and I still agree with bits of it. But Adobe has come a long way with PDF, and I think it is a nice replacement for PostScript (its main purpose).
HTML and PDF address totally different objectives and should not be compared.
HTML is good for web publishing, with good searchability in web environments.
In my humble opinion PDF should be used mainly for distribution of material to be printed, or document digital archiving.
Most operating systems have good native PDF support. Even Windows is getting better, and it is easy to get a good PDF reader nowadays.
You should not mix things up. PDF and HTML have different objectives.
Of course web masters should know these differences, and that is really the problem.
Sebastio arata on January 3, 2008 9:14 AM
IE (at least IE6, dunno about IE7) does a lousy job in printing oversized web pages. It chops off content and can't resize.
A workaround is to offer a pdf version for printing purposes.
Note: I have no time nor desire to design another version of the same page just so that it prints properly in IE.
Those four problems you cite just aren't a problem on the Mac:
1. In OS X, Safari handles the PDF itself, so no need to link to an "out-of-browser experience".
2. In OS X, everyone has Safari installed. If you don't like viewing PDFs in Safari, Preview is even nicer. And everyone has Preview installed, too. See http://watchingapple.com/2007/05/tip-opening-safari-pdfs-in-preview/ for more details.
3. The layout control PDF offers is far superior to HTML, and quite exact. While I will agree that some people resort too quickly to PDF, it's a judgment call to say whether it's "mind-blowingly" better and whether HTML+CSS requires "no aesthetic loss at all", but PDF is certainly attractive and pleasant to use.
On my Mac, I smile whenever someone offers a PDF; in my experience, most Windows users frown for the platform limitations you've already cited.
4. Your argument for "one version of the content" would just as easily favor PDF. Plenty of content originates in more traditional, non-Web authoring tools, after all.
On the Mac, you can easily print any document to PDF with no loss in fidelity. Print, upload, link--you're done. Converting documents of any complexity at all to HTML would be significantly more difficult.
Graphic designers largely piss me off.
There's nothing like making something that's "super elegant" but annoying and then throwing a bitch fit when someone says, "Can you just make this so that the text is without all this other ... stuff?"
As most have said: it's print media vs. screen media.
I run Vista and I like it. No problem loading software to view PDFs. But I don't like to HAVE to save/download something to view it as a PDF when the information therein isn't meant to be sent to a printer for mass duplication.
N on January 3, 2008 9:32 AM
The problem stems mostly from Adobe which is on far too many machines. It is bloated and eventually causes other non-pdf problems. I think if you had a simple pdf viewer as the default, many of the problems discussed would just go away.
fxp on January 3, 2008 9:40 AM
For some reason I much prefer to read very big documents in PDF rather than HTML. I think that big documents without external links have a good home in PDF files; small ones should be fine in plain text.
loki.jf on January 3, 2008 9:47 AM
The real problem with HTML vs. PDF is the difficulty with which layout is defined, not how suited either format is to it. You can do pretty much anything you can do in a PDF in HTML, but the problem is HTML is not really designed for static layout for a fixed page size - it's purposefully designed to be renderable in many different ways.
A well written HTML page will lay out well in everything from Lynx to IE7 to mobile browsers to FF2.x, both on the screen and in print form. That's because HTML only gives hints as to layout, which should be chosen such that if the browser chooses to ignore them or change the page size or the available fonts or use a keypad instead of a mouse the information is still reasonably presented. PDFs take the opposite approach and define exactly every aspect of rendering.
That's why HTML templates are so popular. You do the hard part, the layout, once. Test it in everything, make sure it prints okay etc, and then wedge the content into it. Wired.com still uses a FIXED WIDTH LAYOUT ffs. The reason I made my browser window 1280 pixels wide is that I want to see content presented over 1280 pixels. With HTML I am partly designing the layout.
PDFs are generally rubbish on screen anyway. Unless you have a nice 24" monitor you will either have to view them scaled down or only part of a page at a time, when they are obviously designed to be viewed in their entirety like a book.
MoJo on January 3, 2008 9:53 AM
It's funny that two of the arguments made in favor of PDF seem to be 1) PDFs are Portable and provide readers a precisely controlled experience on all platforms and 2) PDFs work waaaaay better on (insert favorite platform here).
Western Infidels on January 3, 2008 9:55 AM
And it'd be better as HTML: easier to hyperlink and search, more accessible to a wide audience, and it would certainly generate greater advertising revenue through the existing web ad ecosystem. [...] But I'll never understand how the founding editor of Wired could fall prey to such shallow PDF elitism-- completely missing the obvious and inherent power of the world's HTML common denominator.
This text implies that the content is only available as PDF, but Kevin Kelly's truefilms content *has* been, and continues to be, available as HTML:
http://www.kk.org/truefilms/index.php
It's difficult to see why you're calling it "elitism" when he's simply making his content available in both HTML and PDF.
I have no trouble with PDFs, just the Adobe reader.
They're lovely here, viewed with Evince Document Viewer (GNOME) on Linux.
I did a Google for "alternative PDF viewer downloads" and this looked good: http://www.foxitsoftware.com/pdf/rd_intro.php
I've trained myself just not to open PDFs when I'm on windows though - I just wait till I'm back with Linux! :-)
Ben on January 3, 2008 9:59 AM
I prefer PDFs for computer science papers, which I generally print and read offline.
joe on January 3, 2008 10:15 AM
Wow, this is the first time I've seen a post at CH and shook my head in disappointment. Tell me you're just saying this to incite debate/drive traffic to your site?
I agree with Joe Chin and others. I don't understand your assertion about "massive inconvenience". It's a lot easier to download a PDF that's meant to be a book than it is to download a website, especially for normal (non-techie) users. Kelly made the perfect choice for the book. Is his blog in PDF format? No? Is his eBook? So what?!? This is just inane.
The PDF format isn't good for online reading, but beats HTML for content that is primarily meant for downloading (such as technical manuals/books).
Providing a printer-friendly format that the reader can choose to print to PDF (which is a lot better than saving a bunch of separate html/css/gif/jpeg/png files to disk) works almost as well, but few sites I frequent have the sense to do this.
"Do we really want to maintain two different versions of the same content?"
Don't modern content management systems generate both versions from the same source anyway?
The only problem with PDF as a representation is that browser integration isn't seamless when moving from PDF to HTML. If PDF was treated more as a first-class citizen in a browser, there wouldn't be this other-worldly experience.
PDFs can contain hyperlinks and can also be professionally typeset; arguably they are *better* than HTML. If you were serving TeX documents and could send different settings macros for screen, letter, A4, handheld - it would be better than what we have now. For those who have never written a TeX document, the basics are easy and the complex is possible.
If you had a PDF "web-browser" which dealt mostly with PDF content and then had to switch out to an HTML viewer to occasionally look at ugly HTML pages, your tune would be the same, only you'd be cursing HTML instead of PDF.
TeX is a professional typesetting language which can largely separate content from presentation, and it existed long before HTML was invented. If we had used some of the ideas from TeX with a more uniform syntax such as s-expressions we may have avoided all the headaches of HTML/CSS/JavaScript which we have lived with. Yeah, that's the ticket... TeX + Scheme = PDF :-)
Of course, there are several lines being blurred now that browsers are being used more as a thick client for *applications* rather than just to view *documents*. Neither TeX nor HTML, for that matter, was really meant to be dynamically changed on the fly and re-rendered. However, if TeX syntax had been adjusted to sexps, it could be manipulated more easily than XML; Lisp has certainly proven that.
On the other hand, moving in the opposite direction -
Read more of Ted Nelson - http://ted.hyperland.com/buyin.txt
"Markup must not be embedded. Hierarchies and files must not be part of the mental structure of documents. Links must go both ways. All these fundamental errors of the Web must be repaired. But the geeks have tried to lock the door behind them to make nothing else possible."
But, all this has been talked about before.. extensively. Someone else will be cursing PDF a year from now and this discussion will happen again.
But these "representations" that we are talking about are distinct from the actual "resource" which are behind the scenes. I should be able to ask for either a PDF or HTML representation of a particular resource from a server. That's what content negotiation is all about.
That resource could be stored as a text file, TeX document, HTML, signed and encrypted, English or Chinese... but for future generations, it should be easy to transform into something else.
Binary PDFs are *not* easy to transform into a different format - and that is their single biggest failing as a resource format.
So, after all that, I'm saying that PDF is great as a representation, and terrible as a resource format. HTML is decent as both a representation and for storage as an underlying resource.
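As a rough illustration of the content negotiation idea, here is a simplified Python sketch of a server choosing between an HTML and a PDF representation based on the client's `Accept` header. It is not a complete implementation of the HTTP rules (it ignores media-type parameters other than `q` and does not rank specific types above wildcards), and the function names are invented for the example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def choose_representation(accept_header, available):
    """Pick the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None

representations = ["text/html", "application/pdf"]
print(choose_representation("application/pdf, text/html;q=0.8", representations))
```

The same resource URL can then answer a browser with HTML and a print-oriented client with PDF, which is the resource versus representation split described above.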
But in response to your article, modern web-browsers are tuned to work with HTML as a representation, and that is the only reason why PDFs are out of place. If browsers handled PDFs as easily as they did jpegs (we don't need to launch a picture-viewer do we?), there would be no problem.
Albin on January 3, 2008 10:32 AM
My 2 cents - http://mvark.blogspot.com/2008/01/how-to-mimic-google-searchgmails-view.html
mvark on January 3, 2008 10:34 AM
@rustyvz:
"PDF allows restrictions that you cannot get with HTML or PDF. You can:
- lock a file with a password
- prevent content from being copied
- prevent printing the document"
No, you can't do any of those except the first (password encryption), and the encryption is too weak to work. You obviously can't "prevent content from being copied"; information doesn't work that way.
Here's a fun page explaining the similar "restrictions" put on fonts in the bad old days. Don't miss the haiku at the end!
http://www.andrew.cmu.edu/user/twm/embed/dmca.html
Professor Tom's talk of digitally signing PDFs put me in mind of this other fun page, although it's not relevant to the security of PDFs, but rather to the security of MD5.
http://www.cits.rub.de/MD5Collisions/
Foxit really does take the edge off PDF reading. It's not ideal, but leaps and bounds better than Adobe's ...erm... thing.
While I don't think PDFs are the entire answer, I really would like HTML to go away and never be seen again.
For all the flexibility, and the browsers that tout an Acid2 pass, the very existence of Acid2 points to something wrong to me.
For an explicitly formatted page, I'd even be happy with a format that had a list of [thing] along with an affine matrix.
[thing] could be any visual element the viewer knows about (individual glyphs, video, images, chickens)
It'd be inefficient on space but the solution for that could just be a compression layer that's aware of the format. It could guess the next matrix for a series of glyphs quite easily. Compressing zero rotation would be obvious. Any browser using a method like this would have the compression/decompression layer as a black box they don't care about.
The net result would be a browser that could render any page exactly with an almost trivial rendering engine.
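A toy Python sketch of that idea: a page is just a list of things, each paired with a 2D affine matrix, and the "rendering engine" only has to apply the matrix. The class names, the six-number matrix layout, and the sample coordinates are assumptions made for the example, not part of any real format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Affine:
    """2D affine transform: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x, y):
        return (self.a * x + self.c * y + self.e,
                self.b * x + self.d * y + self.f)

@dataclass(frozen=True)
class Placed:
    thing: str      # any visual element the viewer knows about
    matrix: Affine  # where and how to draw it

def origins(page):
    """Return where each thing's local origin (0, 0) lands on the page."""
    return [(p.thing, p.matrix.apply(0.0, 0.0)) for p in page]

# Two glyphs with zero rotation, differing only in their x offset.
page = [
    Placed("glyph:H", Affine(1, 0, 0, 1, 72, 700)),
    Placed("glyph:i", Affine(1, 0, 0, 1, 80, 700)),
]
print(origins(page))
```

Consecutive glyphs on a line differ only in their translation, which is the regularity a format-aware compression layer could exploit.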
Lerc on January 3, 2008 11:15 AM
@Tom Clancy: In older versions of OS X, yes, but in Leopard, when using Safari, it opens inline in the browser window, with the context menu option of opening it separately in Preview if you want.
nexusprime on January 3, 2008 11:36 AM
For a pretty clear statement of why KK produced PDF ebooks, read here:
http://www.kk.org/cooltools/archives/002537.php
I have to say I agree with him on most parts. Although the concept of PDFs containing advertising truly sucks.
Roddy on January 3, 2008 11:39 AM
The quotation at the head of this article says it all, really:
Because of the idiosyncratic way web browsers work,
designers do not have full control of what you as a
reader see on the web. The web page, including its
fonts, fonts sizes, and placement of material and
size of the window, partly depends on the viewer's
preferences.
Well, I've got news for you, Kevin Kelly: /I'm/ the one trying to read it, not you. I know better than you do what size font is legible on my screen, and how big my screen is.
Yes, I agree with other commenters, PDF is a good delivery format for stuff that's intended only to be printed. But it sucks big time for anything meant to be read on the screen.
jpsa on January 3, 2008 11:39 AM
The experience of opening a book and having everything laid out clearly is something that can't be surpassed. It provides all kinds of layout possibilities, you know, if that's more important to you than the content.
But it also isn't replicated by PDFs, at least not on-screen. Getting the same view of a document on screen means shrinking it to fit on my display, which is where resolution becomes a problem. After all, not all of us have enormous monitors. And who cares about layout when the text is illegible anyway? Apparently due to this limitation (particularly on handheld devices) it's also possible to format PDFs so that they can reflow without shrinking the text to unreadable sizes. Of course this totally defeats the purpose of the format.
At least I know the HTML will (almost) always fit on my screen without sacrificing readability. I actually LIKE the fact that the page can be dynamically sized to suit my needs. In my opinion this feature of HTML, which is often cited as its greatest weakness when compared to PDFs, is actually its greatest strength.
Sure, PDFs are useful if you're designing primarily for print. But onscreen, they're really for two kinds of people:
1. Outmoded designers who only know how to design for print (are there any of these left?)
2. Vindictive designers who think their design is more important than the content they're presenting and want to use PDFs as a medium to strike out at the web for cutting into their turf.
Admittedly, there is one place where PDFs are really useful. If the fonts are vitally important (perhaps the document uses fonts with characters that your readers are unlikely to have, say anything not in Unicode), or if the text and images are strongly integrated, then HTML simply may not suffice.
WurdBendur on January 3, 2008 11:53 AM
@John S:
" Really? Then why is this "packaged" as a PDF?
http://www.si.umich.edu/~pne/PDF/howtoread.pdf
How does this "packaging" help me, the reader?
1) Embedded font
2) Whole document including images is one file. 1 document = 1 file. A "document" that is a directory of html file(s) and images feels unwieldy.
3) The text in the images is anti-aliased according to my preferences. If the diagrams in this document were inline html images, I would be at the mercy of whatever AA settings the author used when creating the image.
4) Zooming retains layout.
5) Annotation/comments (again all within the one file), side-by-side viewing, quick rotation of the page, other viewing-based features...
These are just things for this simple document you linked."
Embedded font: okay. And that's important?
Document as a single file. In a modern OS, a folder can be treated as a single file (Package or Archive) as well. Windows hasn't caught on there yet, but as long as the PDF proponents are saying the only problem with PDF is that IE/Adobe treat it like crap then I can claim the only problem with packages is that Windows treats them like crap :) Also, you can very easily print an HTML page as a PDF document when/if you ever need to send it around as a single file. I routinely archive web purchase receipts and invoices by printing them to PDF straight from my browser; having them displayed as PDFs originally doesn't save me any time or effort.
Anti-aliased: I'm not sure of the specific point here, other perhaps than that overlay text will anti-alias on images when that happens at render time instead of at generation time. You can do the same in HTML, except for IE's poor support for such. Still, generally speaking, text superimposed over images is the least of my worries when reading content. How the primary content renders (it is generally given in paragraph form, whether the delivery mechanism is PDF or HTML) is far more important. And, there, HTML does a damned fine job of allowing the user's anti-aliasing preferences to be obeyed.
Zooming retains layout. Okay, zooming can retain layout in HTML as well, OR zooming can retain content. You have a choice (and the default is retaining content). Not sure how inflexibility is a win for the reader. For that matter, I'm not sure how it's a win for the designer either.
Annotation/comments. If you're talking about the user annotating the file, that's a clear case for PDF. If I, as a user, want to be able to easily annotate a file without getting my hands dirty, I'll print it to PDF (using my preferred font, anti-aliasing, and page sizing, thank you very much) and annotate from there. If you, as a designer, however, want to embed annotations in the document, it seems HTML's facilities for dynamic content are far more advanced than PDF's.
Side-by-side viewing: Huh? Requiring horizontal scrolling is a horrendous inconvenience of PDF. If you're interested in how it looks *on paper* I can see side-by-side viewing as a benefit (again, though, with an HTML source document this is easily obtained by printing to PDF).
Quick rotation of the page: Again, huh? I don't get your meaning here. I haven't ever seen PDFs which can be rotated to landscape from portrait, whereas that is child's play in HTML. Or, are you saying rotation of the page *and content* is a boon in PDF? Why the hell would you want to read text sideways? Again, if I as a reader want this, I can print my HTML doc to PDF and rotate it however I like.
The long and short of it is this: if you put a document out in HTML, it can easily be printed to PDF by anyone using a modern OS or a crappy OS but with a decent set of tools, *when and if* they need the "viewing-based features" of PDF. If you put it out in PDF, you lock in your particular preferences, your particular screen size and resolution, your particular paper size and orientation. These things can't be "undone" by the reader.
Now, a few semi-points in PDF's favor:
1. It does vector graphics better than HTML today (when will IE support VML or the like?) This means that HTML designers must "lock in" the display resolution of graphics often long before they are displayed, which is just as bad as PDF designers locking in all those things listed above.
2. Mathematical equations are clumsy in HTML (generally unsupported out of the box, and so get rendered as images and fall into the case above). PDF's not really much better in this respect, but since it allows the rendered image to be vector-based instead of raster-based, and further to include actual characters where such make sense, it allows for resizing, selection, etc.
3. You can cross-reference by page number instead of by section (although way back when, before either PDF or HTML, I was taught that page number cross-referencing was bad form and one should always favor section number/name references in any semi-structured document).
4. It is easier to generate not-obviously-ugly-and-bloated PDF from Word than it is to generate not-obviously-ugly-and-bloated HTML from Word. This isn't so much a feature of Word as it is a feature of the inability of most people to look "inside" a PDF document to see how atrociously it has been constructed by the PDF printer. Still, though, generate a 200k PDF from Word and people understand a lot better than if you generate a 150k HTML file which they can plainly see only has 2k of actual content.
5. PDFs allow for DRM (inasmuch as the particular reader obeys the DRM settings). If that is your business, then the choice is obvious. For the vast majority of PDFs I encounter on the web, though, DRM isn't a consideration and isn't employed at all.
@Bob Carpenter:
"Here's a real example from a blog entry I did on the mess created for search by PDFs: http://acl.ldc.upenn.edu/P/P06/P06-2051.pdf
Now try searching for the word 'fulfills'. It follows the phrase 'which fully' on page 397 (7/8 in the PDF page numbering).
The reason you can't see it is that 'fi' is what's known as a ligature, and in the interest of prettiness, PDF treats it as a unit, so it doesn't match the sequence of two characters 'fi'.
At least in January 2008, PDFs are nice for printing and layout, but lousy for search."
Safari found "fulfills" just fine. FYI. It seems someone has already fixed this problem.
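For what it's worth, the ligature mismatch described in the quote is easy to work around on the search side. A minimal sketch in Python, assuming the text extracted from the PDF carries the single "fi" ligature code point (U+FB01) rather than the two letters; whether that particular paper is encoded this way is an assumption, and the helper names are made up for illustration:

```python
import unicodedata


def normalize_for_search(text):
    # NFKC folds compatibility characters such as the U+FB01 ligature
    # into their plain-letter equivalents ("f" followed by "i").
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack, needle):
    # Compare both sides in normalized form so ligatures cannot hide a match.
    return normalize_for_search(needle) in normalize_for_search(haystack)


# Hypothetical extracted text with a ligature in "fulfills".
extracted = "a system which fully ful\ufb01lls the requirements"

print("fulfills" in extracted)          # naive substring search
print(contains(extracted, "fulfills"))  # search after normalization
```

A viewer that normalizes before matching would find the word either way, which is consistent with some readers succeeding where others fail.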
Tom Dibble on January 3, 2008 12:08 PMmy experience is that a number of small (especially non-profits) Website owners create their original documents in something like Word or even something obscure. They want to make the documents available on the Web but don't have the resources to turn them into full-on HTML, so they resort to the next best choice: PDFs.
mingle-mangle on January 3, 2008 12:13 PM
Probably the new GNU PDF project will be so flexible that a Firefox plugin could be readily cached into memory and display content in a quick, unjarring way. With Adobe it seems like there is a pretty dramatic resizing of an external window into the browser area, or it's just external to the browser's direct control altogether.
The one thing PDF has going for it, is guaranteed layout... something that HTML doesn't achieve because of cross-browser issues.
Or at least, that's what you'd think. That only holds true for Adobe Reader; I've had some PDFs not work properly with Foxit.
But still, PDF is usually a better way to present paginated information you'd want to print than HTML.
engtech on January 3, 2008 12:59 PM"The web page, including its fonts, fonts sizes, and placement of material and size of the window, partly depends on the viewer's preferences."
Which IMHO is quite right. If the author wants the reader to read his|her content.
The simplest example is waiting until age or something else results in your eyesight deteriorating to the point where 8 point is no longer usable. Then user preference becomes important. If I can't read it, I'll move on, no matter how wonderful your typesetting.
Dave Pawson on January 4, 2008 1:27 AMAs I have pointed out to many undergraduates ... use the correct tool for any job.
PDF renderers don't have the same set of problems that web browsers have; they introduce other problems. But what are PDFs good for?
I mostly write documents in TeX, and LaTeX. Why do I do that? Because I should not have to care how my text looks. I should say "my title is this" and something should render it to something that looks great.
I have used Word extensively in the past. I know that there is a nice, safe feeling with a WYSIWYG interface, but it comes at a cost. How many of us have fought that chart/picture/image/clipart into exactly the right place in the document, only to press space and have it move? TeX and LaTeX remove this hassle: you say "I want this picture with this caption" and it places it in the best available position.
LaTeX renders best to PDF, and it's there that you have the answer ... PDFs were designed to be a printer and publisher standard - a more human friendly version of postscript.
As an author, I don't necessarily want everyone to be able to copy-paste my work. I don't have the option of preventing copy-paste in a plain text standard like [X]HTML.
PDF still has a place in this world - it just might be the case that it's not the right tool for your job.
Lecturer on January 4, 2008 3:34 AMUsually I don't give a rat's ass about the designer's layout so I prefer html. Browsers are the problem. It's a pain to change styles - it should be one click and for god's sake give me a matrix for changing colors. A built in grep tool to search a subtree while in file:/// would be nice. As for packaging, whats wrong with html.tar.gz? It would be nice if firefox would open a tgz of html docs since the whole point of a browser is to reduce everything to a single mouse click. It's a little hard to do one thing well when mousing is your one thing but now I'm getting philosophical. Cheers, fellow pdf-hater.
Sam on January 4, 2008 5:02 AM
PDFs are all about printing. The site I engineer makes extensive use of charting and my customers demand high quality, low bandwidth, and printable reports. Printing using the browser is a joke. My site is a highly dynamic reporting site and the PDF also acts as an excellent archiving tool to snapshot a point in time and to snapshot the report criteria and customization the user spent so much time on. Saving a complicated web app as an HTML page is also a joke. My solution with all of those features is to create the charts in SVG format (XML) and dump your objects to XML, hit it with an XSLT to create XSL-FO, and then use a 3rd party tool to create a PDF from that. Once you get your head wrapped around XSLT and XSL-FO it’s quite elegant. My tools of choice for doing this are ASP.Net, Dundas Ent Charting, AltSoft Xml2PDF and Altova Stylevision or Stylus XML Studio, and Telerik. Granted these tools add up to about $5000+ but hey if you want to play you gotta pay.
Using the internet requires the user at minimum to have a modern browser, Flash, and PDF plugins. It's just the way it is.
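For anyone who hasn't seen XSL-FO, here is a rough idea of what the intermediate document in that pipeline looks like. This is only a sketch in Python, not the ASP.Net setup described above: it skips the XSLT step and builds the FO tree directly with the standard library, and the function name, page size, and report fields are all made up for illustration. Turning the result into a PDF still needs a separate FO processor.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    # Qualify a tag name with the XSL-FO namespace.
    return f"{{{FO_NS}}}{tag}"


def report_to_fo(title, rows):
    # Hypothetical helper: one A4 page master, one flow, one block per row.
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "report",
        "page-height": "297mm",
        "page-width": "210mm",
    })
    ET.SubElement(page, fo("region-body"), {"margin": "20mm"})

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "report"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    heading = ET.SubElement(flow, fo("block"),
                            {"font-size": "16pt", "font-weight": "bold"})
    heading.text = title
    for label, value in rows:
        line = ET.SubElement(flow, fo("block"))
        line.text = f"{label}: {value}"

    return ET.tostring(root, encoding="unicode")


print(report_to_fo("Quarterly report", [("Revenue", 1200), ("Costs", 800)]))
```

The point of the design is that layout lives in the page master and block properties, so the same data dump can be re-rendered with a different stylesheet without touching the application.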
Travis Johnson on January 4, 2008 6:38 AM
The central criterion for choosing an output format is the reader's information transfer needs.
A secondary criterion, at least for some content, is the ability to either view or print the information. PDFs excel at the second: printing a PDF produces much nicer output than printing HTML.
But otherwise, PDFs are too often bloated. Adobe's attempts to turn PDFs into another multimedia "experience" do nothing to address the problem.
Finally, PDFs would be much more palatable, I think, if one wasn't so generally dependent on Adobe's hideously bloated software to read them. Seriously -- why should reading a PDF require a 25MB or greater download?
Riley on January 4, 2008 6:40 AMI agree that HTML is much much better for new, custom documents designed for the web.
However, I find that usually PDFs are used on the web because the website is displaying a document originally created in MS Word and then converted to PDF for easier viewing on operating systems other than Windows.
Although HTML would still be better, the output of saving a Word document as HTML sucks.
Dave on January 4, 2008 7:05 AMWhen I was in university, lots of my Comp. Sci. courses (and a few non-comp sci courses) offered downloadable course notes, lecture slides, assignments and old tests in PDF format.
Many people have already pointed out the advantages:
- PDFs are self-contained
- They can be saved offline easily
- The fixed layout is great for printing
Of course, the fixed layout sucks for online browsing. And to me, one of the worst "features" of Adobe Reader is "fast web" view. On by default, the feature causes the browser plugin to open PDF files immediately, even if they haven't loaded completely. Unfortunately, if your PDF file is quite large (say 5-10 megs), it will take a few minutes to load completely, even on a high speed connection. Woe to you if you happen to use the text search function. Your browser will lock up as the PDF plugin waits for the rest of the file to load, in order to search the entire document.
Even if they've fixed this problem in the latest version of Adobe Reader, I don't care. This is an example of horrible design, and IIRC, it's existed in at least two versions (5 and 6). IMO, any version later than 6 is horribly bloated anyway.
And I agree that PDFs are misused in a lot of contexts. I've seen sequences of pictures packed in a PDF. No formatting, just 1 picture per page. What if you want to extract the images and save them in their original format? Well, the latest version of Adobe Reader has *removed* the "image toolbar" functionality that used to let you do this.
PDFs are decent for (offline) manuals and e-books, IMO. I've seen sites which offer manuals in both PDF and HTML format. This is a good compromise.
Will on January 4, 2008 8:28 AMI hate to point this out, but PDFs aren't intended for layout and user interaction
Case-in-point: Paperless offices and digitalization of document assets.
Nowadays, everyone is scanning in their documents, records et-al, and storing them using document management software in the form of multi-page tiffs or PDFs.
When you see a technology you don't understand, rather than just criticizing it like you're doing here, why don't you try accepting the fact that it has reached the point it has for a reason (even if you don't know what it is)?
CptBongue on January 4, 2008 9:43 AMI completely agree with you Jeff, more isn't automatically better. Part of the problem is that the user/document creator doesn't necessarily understand the alternative options available, at least I hope so. I do technical support -- for programmers, not end users (keep this in mind).
It blows my mind how often I get an email that states something like "the attached Word document explains my problem", then I open the attached Word document and it's half a page of plain text. Now how did the person sending me this email come to the conclusion that it would be better to send me a Word document than simply type the same information into the body of the email?
The other thing I get all the time is a 4 MB Word doc that contains a screenshot that could have been sent as a 50 kB .jpg image file.
Again, this is from programmers, people who use computers professionally every day and should know what options are available to them.
Dennis on January 4, 2008 9:57 AMHey, for those of you looking for bookmarking in Adobe Reader, check this out: http://korayem.net/post/2007/12/Adobe-Reader-Tip-Open-a-PDFs-Last-Viewed-Page.aspx
Luke on January 4, 2008 10:10 AMHey, for those of you looking for bookmarking in Adobe Reader, check this out: http://korayem.net/post/2007/12/Adobe-Reader-Tip-Open-a-PDFs-Last-Viewed-Page.aspx
Thanks Paul. I will look into this.
Bill on January 4, 2008 10:13 AM
All of you need to educate yourself on Document Authoring. Ever wonder how some technical documents you read are available as HTML or PDF?
They probably used DocBook (industry standard amongst technical writers), LaTeX, or some other form to create documents that separate content from presentation (e.g. HTML is content, CSS is presentation).
So you write all your content expressed in some markup language, and let the presentation layer do all the formatting, layout, etc. to render to HTML or PDF.
There seem to be two types of people in this comment thread: those who are graphic or layout designers (or at least loyal to their cause), and those who are web programmers (or who fall into ranks behind them). The designers want to make sure the content they have painstakingly designed to look good and flow properly and be digestible to the audience.
This is a Good Thing, at least often enough to be mentioned. Jeff, you have mentioned the importance of good design (especially visual) so many times on this blog I'm flabbergasted that you throw in unconditionally with the "I don't give a rat's ass about the designer's layout" bunch. Surely you can understand the desire, and sometimes the need, to ensure that certain visual information is presented just so. If not, go talk to some print comics authors. Webcomics to this day consist generally of solid images, so the exacting layout can be immutable.
Next is the portability argument, which belies the web programmers' intentions: talking of whether Google can read something, 1 URL = 1 document, .mht or html.tar.gz, OSX's handling versus IE+Adobe's, et cetera ad nauseam completely misses the point.
Portable. Document.
I work in an office where half the people need me to teach them how to attach something in an email. We don't have a server to publish. We don't have a domain name we're willing to pay for eternally, nor a fancy content management system that stores everything in DocBook XML. Our index is a binder and a crappy internal site designed by monkeys with no proper search function. PDFs are a lifesaver in that environment. You can save them, print them straight off the intranet, toss them around in emails, back them up by shoving them on an external hard drive because god knows management doesn't have a proper backup system in place. They'll stay the same, and because PDF is now an ISO standard, I'm going to be able to read them like I can still read ASCII.
The HTML/print/CMS/CSS/anti-aliasing/whatever argument misses the point. Servers aren't permanent. URLs aren't permanent. A good chunk of things are needed offline, and a surprising number of things need to be as easy as possible to be punted around by the ignorant masses. PDF works better for that. It's not the pretty, shiny web 2.x usually discussed here, but it's the web a lot of people still live in.
Ray on January 4, 2008 10:23 AM@Shmork,
The two major open-source codebases out there for reading PDFs are xpdf and Ghostscript. Yes, xpdf is a bunch of C++, but if you don't like that you can use derivatives of it as outside commands, rather than linking directly. That's the way almost everyone uses Ghostscript, which can convert PDF (and PostScript) to any raster format (or back to PS/PDF).
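Calling Ghostscript as an outside command, as described above, can be sketched in a few lines of Python. This assumes the executable is installed and named `gs` (on Windows it is usually something like `gswin64c`); the helper names, the output pattern, and the default resolution are illustrative choices, not anything Ghostscript requires.

```python
import subprocess


def build_gs_command(pdf_path, output_pattern, dpi=150):
    # -sDEVICE picks the raster output format (24-bit PNG here),
    # -r sets the resolution, and -o names the output file(s) and
    # makes Ghostscript run in batch mode without pausing.
    return [
        "gs",
        "-sDEVICE=png16m",
        f"-r{dpi}",
        "-o", output_pattern,
        pdf_path,
    ]


def pdf_to_png(pdf_path, output_pattern="page-%03d.png", dpi=150):
    # Runs Ghostscript as an external process; raises if it fails
    # or if the executable is not on the PATH.
    subprocess.run(build_gs_command(pdf_path, output_pattern, dpi), check=True)


print(" ".join(build_gs_command("input.pdf", "page-%03d.png")))
```

Keeping the command construction separate from the call makes it easy to inspect or log what will be run before anything touches the file system.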
Paul wins the thread.
the real problem is that PDF is commonly misused in completely inappropriate contexts.
Paul Coddington on January 3, 2008 02:53 AM
It's not about saying PDF is "better" than HTML, or vice versa. It's a question of picking the right medium for your message, which is really just a corollary to the first principle of effective communication: know your audience (and cater to their needs).
I think I can answer Mike Shaffer's question ("Can Google's spider crawl through PDFs?") with the comment I wanted to make. Yes, it seems they can, because there's that wonderful feature of Google that allows you to view (many, not all) PDFs as HTML instead. I think the fact that this feature was made says something about the massive inconvenience of PDFs.
Matt McVickar on January 4, 2008 12:08 PMWhy PDFs "suck":
(1) Size. A single page of 'text' can inflate well beyond a quarter of a megabyte. A complex document can shoot into the tens of megabytes. An HTML page that looks virtually identical is often one tenth of the size or less.
(2) "Non-Web". PDFs virtually always violate the utility of the web as a hyperlinked medium. They don't cleanly integrate with the rest of the web because they are fundamentally standalone documents. Yes, there are extensions to let them do hypertext kind of things, but hardly anyone actually uses those features.
(3) Nearly unmodifiable. Once you have created a PDF it is close to 'written in stone'.
Why PDFs "rock":
(1) _ANYONE_ can create one. A person makes whatever they want in their favorite program (Word, Photoshop, whatever) and "print" it as a PDF. Upload, link, and they are done.
(2) They look "exactly" like what the person who created it wanted.
Fundamentally, PDFs do what _non-technical people_ want: Enable them to publish documents to the web while knowing little to nothing about the web. On the short term they care only that it lets them publish their information to the web with the least effort.
Long term, of course, PDFs make life hell for site maintainers.
Benjamin Franz on January 6, 2008 8:38 AMPDFs don't look anything like what the person who created
them wanted to, because they're displayed on my computer
Why PDFs Suck Alasdair King Page 1
Why PDFs Suck Alasdair King Page 2
screen, which isn't multiple A4 pieces of paper. So
instead PDFs are slow, jerkily-scrolling mess that take
Why PDFs Suck Alasdair King Page 2
Why PDFs Suck Alasdair King Page 3
an age to load and break the back shortcut - and who would
tolerate a web page that takes thirty seconds to load text?
Why PDFs Suck Alasdair King Page 3
Why PDFs Suck Alasdair King Page 4
- and can't be searched easily and won't zoom and the text
won't wrap when you resize the font and they use a font you
Why PDFs Suck Alasdair King Page 4
Why PDFs Suck Alasdair King Page 5
can't read very well because you have a print impairment
but most of all I hate the way they break the page into
Why PDFs Suck Alasdair King Page 5
Why PDFs Suck Alasdair King Page 6
sections that make no sense for anyone trying to read the
damn thing.
Why PDFs Suck Alasdair King Page 6
I'll agree with you Jeff when: IE6 is safe and sound in its well deserved grave. As far as Kevin goes I'll think out loud, I don't know much about the guy but it seems like he comes from the print arena. If that is the case he can probably create the example you show above in about 5 min in Quark. Compare that with the horrors of getting the CSS to work in all browsers, it's a science. No wonder Kevin prefers PDF at this point.
However, the future looks bright at the moment. IE7 is not too quirky and it will probably only get better with version 8.
Henrik Sarvell on January 7, 2008 7:23 AMAs noted many times above:
=====
"Here's a real example from a blog entry I did on the mess created for search by PDFs: http://acl.ldc.upenn.edu/P/P06/P06-2051.pdf
Now try searching for the word 'fulfills'. It follows the phrase 'which fully' on page 397 (7/8 in the PDF page numbering).
=====
Preview.app finds this effortlessly on my Mac. That Adobe Reader 8 and other programs do not is an issue with Reader and those other programs, not with the PDF file format.
John Gruver on January 7, 2008 7:47 AMI agree with the reply about making a statement:-
I want my document content to be visible to non-MS-Word users, even though I may have created the document using the software.
Also, I download and keep copies of PDF documents (think IBM 'Redbooks') for reference and think they are really good for this. Using HTML directories, or even .mht archives, is not so good when I want to look through the document.
On that thought, many of the open source docs directories are awful to read through in comparison to a well-indexed PDF document.
Also agree that PDF should not be used as a generic form of web content, only to be used where it is applicable...
Rut the Nut on January 7, 2008 9:54 AMWhile PDF is overkill for web design and many presentation types, it is absolutely essential when the presentation must be exact.
My system produces PDF insurance documents which have been reviewed by the legal department. The reason for the legal department's review is simple: these documents are legally binding and as such *must* be exact. I can't risk errant interpretation of CSS creating "unexpected consequences", so we create the PDF documents that provide exact replicas of the paper documents the clients (and their clients) expect and in fact require.
So while I agree that PDF is overkill for many purposes, it has significant purposes left. (Another is pixel perfect typesetting for 3000 DPI imagesetters; a task I had with a prior client).
Wesley Shephard on January 7, 2008 10:36 AMWhat he calls idiosyncrasies are actually advantages. HTML is a markup language, not a layout language. It's intentionally different from PDF.
I can reflow an HTML page to fit the size of my iPhone's screen. I can't do that with a PDF, at least not easily.
LKM on January 8, 2008 7:21 AMIt's odd all the pro-PDF comments here. I'm profoundly surprised that there are people that like PDFs. I myself cannot stand them and recently wrote a blog post (referencing your article), containing a couple additional things that are worth hating.
I'll agree with the others: a strength of HTML is that it puts its display in the hands of the user, not the editor. I can't believe some people think that's a disadvantage. Another thing is I don't know why so many people are so concerned with it printing. I don't know about anyone else, but I hardly ever print (even when some site tells me to be sure to print off the page, I rarely go to the trouble).
Keith on January 10, 2008 9:29 AMWell, I read from A to Z on this page.
No one has to open a PDF reader to view a PDF file when it is linked in HTML code. Use Opera and the PDF will be opened in the same window like any other link.
I am using PDF every day 'cause I work in prepress. And, believe me, PDF rules when compared to .ps, .tiff, .jpg, etc. And from Acrobat Pro with some plug-ins I don't need any other software to edit a PDF file, change colors or make any similar interventions on the file.
In graphics design and prepress PDF rules and indeed is standard. As far as web is concerned, I don't know, and don't want to know.
Keith and all others that don't understand PDF,
PDF is for PRINT MEDIA.
HTML is for SCREEN MEDIA.
Trying to apply a format to a domain it was never intended for will never work 100%. This is why whining about reading PDF's on a computer screen because they don't auto-flow like web pages is beyond retarded.
Do you understand yet?
Mike on January 15, 2008 12:11 PMI hate PDF. It always takes too long to load, and sometimes doesn't even work.
Also, I find it funny how everyone is pro-HTML, but you're not allowed to have HTML in your comment.
I like reading PDF document. And, it is very easy to understand PDF format, so we can make a simple PDF creator quickly.
Catherine on January 17, 2008 6:08 AMDid you ask Kevin Kelly why he used PDF?
Scott on March 13, 2008 9:28 AMLet's compare apples to apples, or HTML/CSS duo with PDF. PDFs have a huge advantage when it comes to producing a document that will look the same on screen and on paper. You can use *any* software to create the text/layout and then simply "print" to the PDF driver and you are done.
With HTML/CSS, you have to descend into the snake pit to fit a round peg into a square hole. This is so much of a distraction for me that I refuse to do it. I'd rather focus on producing content than fight CSS. I am using several freely available PDF drivers (Primo PDF and PDF995), both of which work like a charm.
Printing and line breaks are another matter. Even this blog does not print well. Have you tried it? It is a mess. Somehow, Ads and crapware on pages always prints fully on the page and the content I really want gets cut off.
You can also take a look at the crappy new layout of MSDN Magazine (msdnmag.com) What a friggin disaster. When you click on "print", your printout won't include snippets of code because there is, apparently, a bug in CSS stylesheet. They are working on it.
SamG on May 7, 2008 9:25 AMI completely agree with your assessment. PDF's suck royally.
My biggest complaint is that they are hidden time bombs. I uninstalled the reader because I accidentally clicked on a 10mb pdf.. after 10 minutes of loading (probably grossly exaggerated) I simply gave up and determined it wasn't worth it.
It's just like Flash in my opinion. If it's in Flash, then it's not worth the hassle to get. If I have to install a program to view it, forget it. HTML can do it all at no one's expense.
I know this post is ancient, but I just stumbled across it, and there is one big advantage of PDF that nobody has really talked about.
Scalable images!
As an electrical engineer, I access technical information in the form of drawings every day. I've yet to see anything that works as well as PDF/Postscript for drawings.
Images in HTML pages are almost always bitmaps. They're either small, fast, and totally useless for details, or huge, slow, and only barely adequate for detail. I know there are some scalable image formats out there, but PDF is still the easiest and most portable way to put vector images on line.
An example from my own website: http://jmkasunich.dyndns.org/pics/spindle-NMTB-30.pdf is only 33Kbytes, but can be zoomed and examined in extreme detail without getting pixelated.
The drawing above is really very simple compared to most engineering drawings - I only used it because all the better examples I have at my fingertips are my employer's proprietary information. Most of the drawings I create and use are printed at 11x17, and would be difficult to read even on paper at 8.5x11. On screen? Forget it. You can get an overview, or you can zoom in, but it is impossible to do both at the same time. Paper is so much better. Maybe my opinion will change when we have 20-inch diagonal monitors with 300 dpi resolution, but I doubt it. I can flip through a 10 page schematic far faster than the on-screen version.
The above example is a case where the only thing in the pdf is an image, and I'm sure there are scalable image formats that would work (but are probably not as widely supported as pdf). But in many cases, multiple scalable images are combined with text. Datasheets like http://www.analog.com/static/imported-files/data_sheets/AD7190.pdf are a good example. If Analog Devices started publishing that datasheet in HTML instead of PDF, I'd be pissed, and I wouldn't be the only one.
John on November 7, 2008 11:23 AMAs a person working A LOT with PDF, this really was a great discussion to read through... :-)
For those of you that don't know that much about the roots of the PDF format and its connection to Postscript I recommend you to read through below pdf/postscript resources:
http://www.prepressure.com/postscript/basics/history/4
http://en.wikipedia.org/wiki/Portable_Document_Format#PostScript
http://www.inkguides.com/history-of-postscript.asp
I think PDF is a standard that is readable regardless of the system or the source format of the text. Just convert anything to PDF, like a Word document or a web document, and it can be viewed like a good old paper document. But the same content can be generated as an HTML web page too. The web page is part of the web site, though, not a distinct document. I don't think everything needs to be a distinct document, because lots of content can be read directly from a web page.
Don on February 6, 2010 10:19 PM
PDFs are invaluable for material where well-defined page numbers are necessary for reference, citation, or discussion purposes. Academic and legal documents are two examples of where this is necessary. The 'how to read' document you linked above is most probably a case of this. HTML+CSS can't yet produce documents with page numbers that remain the same regardless of how a document is viewed, be it on screen or on paper. Until HTML+CSS can do this, PDF has a valid role even for documents where all other formatting is within the realm of standard HTML+CSS.
anon on February 6, 2010 10:19 PM
We entertained the notion -- very briefly -- of using HTML as the layout mechanism for our financial reports. Unfortunately, the lack of reliable page breaking, column alignment across spanning pages, and footnotes, plus simply not being able to use 100% of the paper consistently and reliably, trashed that notion.
And since they're financial reports, you really can't fuck around with "implementation defined" rendering of layout. No sir. There can't be alternate versions floating around because someone used Opera and columns 11-15 didn't print, or Mozilla 1.5 and the minus signs are hidden behind a table edge, but IE's rendering looked fine.
These reports are intended to be printed, but also optionally viewed online. PDF viewers are universally crap, and Adobe's more so than anyone else's except maybe Ghostview. But PDF is the right tool for some jobs.
Clinton P on February 6, 2010 10:19 PM
Foxit is fast, but it is terrible at rendering text (I'm looking at a 21" Trinitron). Apart from the loading times I was quite fond of Adobe Reader 7 and 8. They kept changing the user interface for no reason though.
Preview.app is where it's at for PDFs. Although I haven't quite got used to the Leopard version yet compared to the Tiger one, where I actually quite liked the fact the drawer hung off the side of the window.
I think the problem is that most people just use PDFs because they can, not for good technical reasons. In fact my big gripe is that people often use more than one column of text to a page, which is very annoying to read on a computer screen because you scroll down to read the first column, scroll back to the top of the page, then scroll back down again to read the second column. If PDF readers could reflow text it would be fantastic.
John Ferguson on February 6, 2010 10:19 PM
GOOD PDF USES...
When one needs only to convey textual information, then yes, HTML should probably be used. But there are many things that PDF can simply do that HTML cannot.
Mathematics and symbol-heavy information. Sure, MathML may one day help solve this, but it's not practical to use yet. Example paper by Einstein:
http://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf
Display of graphically-rich information. PDF is not just for controlling layout and fonts, it is a full graphics engine ideal for those needing to convey very dense visual information where not everything should be in strict lines and columns of text.
Example for airport meteorological forecasting:
http://reportlab.org/docs/provencio.pdf
Accurate reproduction of graphical information. This may also include things that need exact reproduction, color calibration, etc. so as not to distort information. Example from Edward Tufte:
http://www.itee.adfa.edu.au/coursework/ZITE8140/tufte.pdf
Huge document sizes. Unlike HTML and other markup languages, PDF is not a serialized format. It is random-access. Thus it is ideal for handling very large documents of many thousands of pages in size. Properly written PDF software does not need to hold much of a PDF file in memory at any one time, nor does it need to read from the first byte and work toward the end. An example is the 9-11 Commission report, ~600 pages long (with proper browser integration you should be able to jump to and view any page before the whole document has even finished downloading):
http://www.9-11commission.gov/report/911Report.pdf
Legal publications. Anything that needs to be heavily cross-referenced, including by page numbers, etc. or must be preserved in its pristine "published" format for legal or long-term archival reasons is ideal in PDF. That most of the US Government publishes its official legal material in PDF is a good thing. An excerpt from the Federal Register:
http://edocket.access.gpo.gov/2008/pdf/07-6280.pdf
Vector artwork, especially for maps and cartography where precision is important. SVG can almost compete here, but it's still not as ubiquitous as PDF. Example, maps of the US congressional districts for the Palm Beach, Florida area:
http://nationalatlas.gov/printable/images/pdf/congdist/FL22_110.pdf
And perhaps many other uses uniquely suited to PDF.
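The random-access point in the "Huge document sizes" item above can be made concrete. A classic PDF ends with a `startxref` keyword giving the byte offset of a cross-reference table, and that table lists the byte offset of every object, so a reader can seek straight to what it needs. Below is a minimal Python sketch, assuming an old-style uncompressed xref table (cross-reference streams and incremental updates are not handled). The helper names `build_demo_pdf` and `read_xref_offsets` are made up for illustration, and the demo bytes are a synthetic stand-in, not a viewable document.

```python
import re


def build_demo_pdf() -> bytes:
    """Build a tiny synthetic byte string laid out like a classic PDF (illustration only)."""
    body = b"%PDF-1.4\n"
    offsets = []
    for num, content in [(1, b"<< /Type /Catalog >>"), (2, b"<< /Type /Pages >>")]:
        offsets.append(len(body))  # remember where this object starts
        body += b"%d 0 obj\n%s\nendobj\n" % (num, content)
    xref_pos = len(body)
    # One subsection covering objects 0..2; object 0 is the conventional free entry.
    body += b"xref\n0 3\n0000000000 65535 f \n"
    for off in offsets:
        body += b"%010d 00000 n \n" % off
    body += b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return body


def read_xref_offsets(data: bytes) -> dict:
    """Map object number -> byte offset by starting from the END of the file."""
    tail = data[-1024:]  # only the tail is needed to find the table
    pos = tail.rfind(b"startxref")
    if pos < 0:
        raise ValueError("startxref not found")
    match = re.match(rb"startxref\s+(\d+)", tail[pos:])
    if match is None:
        raise ValueError("malformed startxref")
    lines = data[int(match.group(1)):].splitlines()
    if not lines or lines[0].strip() != b"xref":
        raise ValueError("expected a classic xref table")
    offsets = {}
    i = 1
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not header[0].isdigit():
            break  # reached the 'trailer' keyword
        first, count = int(header[0]), int(header[1])
        for k in range(count):
            offset, _generation, flag = lines[i + 1 + k].split()
            if flag == b"n":  # 'n' marks an in-use object, 'f' a free one
                offsets[first + k] = int(offset)
        i += 1 + count
    return offsets


demo = build_demo_pdf()
table = read_xref_offsets(demo)
# Jump directly to object 2 without scanning anything before it.
print(demo[table[2]:table[2] + 7])
```

With a real file, the same idea would apply to a seekable file handle or to HTTP range requests instead of an in-memory byte string, which is what lets a viewer show one page of a very large document without reading the rest.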
That being said, in general if it can be done with HTML, then it probably should be. PDF itself isn't bad. But common uses of it can be, and in particular, certain software implementing PDF (ahem, Acrobat) can also be bad. But don't confuse that with PDF in general.
Is it just me or do you also think Adobe interfaces generally suck?
Commenting on a year-old post is almost always sure to yield no reply-gratification, but I'd like to relay an experience...
Here in Denmark, the government has laid out some goals for how the lesser governmental bodies might leverage technology, and most have failed miserably...
Case in point is my complaint to the municipality a few years ago, that the current email handling was hopelessly outdated, and that the insistence on publishing some vital documents in the Word (-something) format was in direct contradiction of the goals.
Give it two weeks, and an email arrives in my inbox, signed by a mayor's aide, with no text but an attachment in an old Word format, that basically states that they try their best. Said Word file had no less than 4 macros that OpenOffice complained about (and they were poorly written at that as well).
I cannot accept that public officials send word processor documents instead of presentation documents, if they even find it necessary, but to ship a Word document that relied on poorly written macros, in response to a complaint about not complying with open standards, just blows my mind.
Clearly, for any change to be made, we have to acknowledge that the people working for the powers that be are, in all likelihood, just ordinary people who have no clue whatsoever with regard to anything computer-based...
Sune on September 20, 2010 2:09 PM
The comments to this entry are closed.