The Trouble with PDFs

January 1, 2008

Adobe's Portable Document Format is so advanced it makes you wonder why anyone bothers with primitive HTML. It's a completely vector-based layout format, both display and resolution independent. With PDF, you sacrifice almost nothing compared to traditional book and magazine layouts except the obvious limitation of resolution. Here's Kevin Kelly extolling the virtues of PDFs:

A PDF is able to retain the highly evolved grammar, design and syntax that one thousand years of bookmaking has attained. Because of the idiosyncratic way web browsers work, designers do not have full control of what you as a reader see on the web. The web page, including its fonts, font sizes, and placement of material and size of the window, partly depends on the viewer's preferences. In my experience as a reader, a web designer, and a book designer, the reading experience on paper -- and PDFs -- is much more refined and elegant. As a publisher and designer I can direct the flow of attention with better tools (font choices, rules, lines, columns) and better control. The benefit to me as a reader is that this sophisticated design translates into increased clarity, smoothness, comprehension, and enjoyment.

But I have a problem with PDF files.

  1. Every time I link to a PDF, I have to tag the link (pdf) to indicate that the hyperlink will whisk you away, not to another web page as you might expect, but to a strange, otherworldly out-of-browser experience.
  2. Links to PDF files assume the user has a PDF viewer installed. Do they? And how will the link be handled? As in situ navigation, presenting the user with a weird new set of PDF controls? Or as an undesirable popup window? Browser support for PDF is so weird there are entire PDF add-ons to deal with it.
  3. The layout better be mind-blowingly good to justify the use of the PDF format. For most of the PDFs I encounter, the information could have been presented in HTML and CSS markup with almost no aesthetic loss at all. The "refined, elegant, sophisticated design" offered by PDF is often wasted.
  4. You might argue that PDFs make sense as a secondary, print-optimized version of existing HTML content. But why not stick to one version of the content? Why repeat ourselves? Do we really want to maintain two different versions of the same content?
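
The tagging in item 1 is mechanical enough to automate. As a minimal sketch, assuming links are written as plain anchors with double-quoted URLs, a small Python function could append the (pdf) marker for me. The name tag_pdf_links and the regular expression are purely illustrative, not something any blogging tool actually ships:

```python
import re

# Hypothetical helper: matches an <a> tag whose double-quoted href ends in
# ".pdf" (optionally followed by a query string or fragment).
PDF_LINK = re.compile(
    r'(<a\b[^>]*\bhref="[^"]*\.pdf(?:[?#][^"]*)?"[^>]*>)(.*?)(</a>)',
    re.IGNORECASE | re.DOTALL,
)


def tag_pdf_links(html: str) -> str:
    """Append ' (pdf)' to the visible text of every link that points at a PDF."""

    def add_tag(match: re.Match) -> str:
        opening, text, closing = match.groups()
        if text.rstrip().lower().endswith("(pdf)"):
            return match.group(0)  # already tagged, leave it alone
        return f"{opening}{text} (pdf){closing}"

    return PDF_LINK.sub(add_tag, html)
```

A regular expression is a blunt instrument for HTML; a real parser would be the safer choice for anything beyond simple, well-formed anchors.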

I'm not the first person to note the usability problems of PDF, but I consider this a classic case of worse is better. The advantages of PDF rarely outweigh the many disadvantages compared to plain old HTML. I suppose relying on PDF was more defensible in 2001, when browser printing support was notoriously poor, and HTML layout was not well understood. But it's 2008. I'm surprised how many authors still reach for the safety blanket of PDF when they and their audience would be much better served with modern HTML.

The other problem with PDFs is a bit more subtle. A PDF is not merely a PDF; it's a statement. An implicit protest against the terrible limitations of the HTML used by the unwashed masses. PDF content yearns to be free of the constraints of common HTML-- this content, you see, signifies something:

It seems that the PDF format signifies something now, and it's something more than just user inconvenience. In addition to requiring the user to shift mental modes, ("I'm seeing something designed as a PDF now, this must be serious information...") the requirement that a document either be downloaded or viewed in a context that's radically different from standard web pages seems like a subtle assertion of authority by a document's creator. The decision to switch from standard HTML to PDF isn't arbitrary, but it isn't based on technical requirements either. It's based on the value that an author wants to assign to the work, and it benefits from the still-prevalent, though rapidly fading, consensus that print work is somehow more inherently valuable and authoritative than web pages and other online content.

The massive inconvenience of PDF for the user rarely outweighs the minor HTML injustices righted through the power of PDF layout. Consider Kevin Kelly's own True Films 3.0 PDF:

True Films 3.0 PDF screenshot

Kevin went to the trouble of packaging this content up as a PDF, even adding Adobe's brand new support for contextual PDF advertising. All in the name of better formatting. But I don't see any advanced formatting here! Everything in that PDF would render perfectly as HTML. And it'd be better as HTML: easier to hyperlink and search, more accessible to a wide audience, and it would certainly generate greater advertising revenue through the existing web ad ecosystem.

I don't dispute Mr. Kelly's taste in movies for a second. And I worship at the altar of his Cool Tools. But I'll never understand how the founding editor of Wired could fall prey to such shallow PDF elitism-- completely missing the obvious and inherent power of the world's HTML common denominator.

Posted by Jeff Atwood
179 Comments

PDFs make sense in some circumstances.

pwb on January 3, 2008 1:12 AM

PDF is a packaging format. HTML is not. That's the reason people use PDF, not because of the better layout.

Stephan Schmidt on January 3, 2008 1:33 AM

Really? Then why is this "packaged" as a PDF?

http://www.si.umich.edu/~pne/PDF/howtoread.pdf

How does this "packaging" help me, the reader?

Jeff Atwood on January 3, 2008 1:37 AM

Not hating or anything but today is the 3rd and it says January 1st.
Anyway Jeff I have to agree with you, PDFs are great, but unnecessary, there's no reason why a competent developer couldn't just build the HTML properly.

Arron on January 3, 2008 1:46 AM

I mainly use PDF for read-only email attachments.

On websites it is useful to save it locally for viewing later (which for web pages is harder) but yet again I generally agree with this blog - good work.

John Kilmister on January 3, 2008 1:49 AM

My main problem with PDF files is their forced page size. HTML/CSS is very good at resizing content to the available window size for good screen readability, whereas PDF's almost perfect replication of paper layout invariably results in either viewing complete pages at a smaller than comfortable zoom level, or scrolling awkwardly in two dimensions to read the content.

Malcolm on January 3, 2008 1:51 AM

What about

http://www.kk.org/cooltools/archives/002538.php

as an example for packaging?

Stephan Schmidt on January 3, 2008 1:54 AM

i do not think the problem is the pdf itself - as stephan points out its self-contained structure is a huge plus in my eyes. the problem is the poor integration of pdfs in current browsers

you can watch movies, etc directly in your browser window. why not do this for pdfs too?

the fixed layout of a pdf IS the huge plus compared to html! have you ever assembled a report in an html table ( lets say some statistics ) and printed this out from different browsers to different printers? no chance. line breaks are different, page breaks are different -- the whole structure of the document is lost. using pdfs is the only solution you have.

-- as said before. not the pdf itself is the problem. on the contrary. only the poor integration is the problem

tobi on January 3, 2008 1:55 AM

One of the main reasons I disliked PDF used to be what you mentioned: on "PDF pages", Ctrl+W didn't have the same functionality as in normal pages.

J. Stoever on January 3, 2008 1:57 AM

The absolute best use of PDF files is, in my opinion, cross platform compatibility. I used OpenOffice on Ubuntu in college and my professor used Microsoft Word.

Needless to say, despite saving files in the "compatibility" mode, they would not render correctly when opened with MS Word. Tables were screwed up and the formatting was weird.

PDF files were the only way to go. That way I could be sure that the professor was seeing the *exact* same thing as I was.

Erlingur on January 3, 2008 2:05 AM

hi,

i do not want to start a flamewar here ... but i think the problem you have with pdfs is a pure windows problem. have you ever looked at the neat integration of OS X and also linux desktops nowadays? there a pdf is handled just as well as any html page or pure text file. with full text search etc etc

-john

john on January 3, 2008 2:05 AM

you will always find examples like the one jeff linked - there it absolutely makes no sense to use a pdf as a normal html file does the job too. but i am sure there are also examples out there where a pdf would have been the right choice ...

tobi on January 3, 2008 2:10 AM

PDF's are good for reference, and because they are contained within a single file. They are good for manuals, ebooks, articles people may refer to for medical research, etc
They are easy to send around, especially since they display pretty much identically on all platforms.

The inconvenience of PDFs is justified when the content is static, and someone may want to refer back to it often, without having to worry about a web-server vanishing off the internet.

I'd think of it more as a book than a web-page.

dbr on January 3, 2008 2:18 AM

@john - I'm not sure what you're getting at. Adobe's Windows PDF reader also has "full text search etc etc".

The problem I have with PDFs is that they are restricted as if they are pages from a book. HTML isn't like that, HTML pages accept that the web is not a book and that things work differently. As others have mentioned it is tedious to have to navigate a PDF by moving left and right as well as up and down.

Mike on January 3, 2008 2:22 AM

I often browse the web on my Wii or my mobile. HTML is by and large resizeable (some sites better than others, some browsers better than others but fundamentally things tend to work well) PDF simply isn't. It isn't designed to be. This is for good reason, the intent was to produce something that looks the *same everywhere* which is fine for many applications but not the general provision of information...

Matt on January 3, 2008 2:22 AM

How do you embed a font in an HTML page? You can't. Default browser fonts suck. Times, Arial, even Georgia all look terrible at print resolution. And more specialized fonts for mathematical typesetting are usually not even available. Never mind that HTML rendering doesn't do kerning or properly adjusted lines.

How do you put footnotes and sidebars on HTML pages so that they stay in the visual vicinity of the text they are referring to? You can't, not without fixing the layout in such a way that you might as well use PDF in the first place.

Chris Nahr on January 3, 2008 2:23 AM

I've encountered an employer's website where they hyperlink between PDFs - that's normally fine, except they insist on using relative links, which means it'll only work if Adobe Reader is configured to run as a browser plugin. As soon as I reset things to my preferred mode of working (PDFs open outside of the browser), all the links broke. The employer response was not to fix the problem, but to force me into the 'one work mode suits all' model. Bah!

Kevin Haines on January 3, 2008 2:23 AM

PDF is great for read-only documents, that for whatever reason may not necessarily be resident on a web accessible location.

Especially if they're going to be printed out. The HTML printing story is still quite commonly dreadful, I'm afraid.

I also have a sneaking suspicion the typical information worker who isn't in a programming related field finds the concept of sending one PDF file to someone via email easier, if the source is something like a Word document or something that can be printed to the PDF printer.

The Adobe PDF tools on Windows are uniformly crap though, my biggest annoyance is the entire browser instance locking up while loading the PDF. Hello, 1998 called, they want you to not keep the UI thread busy and prevent you from doing anything else with the browser while it loads, or randomly stop responding to paint requests for whatever reason.

OS X does this so much better. (Evince on Linux still has some rendering speed "issues", shall we say).

nexusprime on January 3, 2008 2:28 AM

But let's face it: To produce HTML, you need to be an advanced techie user. You can't just create good nice looking HTML (which is viewable in Safari, Firefox, IE, ...) with no knowledge. There are very few HTML applications around, which help producing HTML, but on the other side: Using PDF, you can basically design and write your content with ANY application out there (Word, PowerPoint, Pages, OpenOffice, ...) and the output does not need to be "optimized" for one browser or the other.

I agree CSS and stuff is getting more standardized across all browsers, but there are still plenty of tweaks needed (e.g. setting IE to strict, otherwise it will fall back to poor old rendering mode). Also embedding fonts (fonts are the other 50% of a good design!) is not that easy with HTML, as well as hyphenation.

So to speak, the good old Bible, which Gutenberg printed a long, long time ago, offers more sophistication than HTML, but could easily be transferred to PDF.

just my 2ct :-)

kusmi on January 3, 2008 2:29 AM

C'mon Jeff, PDFs are ok. Give up on Windows Vista's Crap PDF support and move onto OS X, pdfs here are "just another file". :)

A PDF is better when you want to make sure that the other party receives the exact file. I haven't had issues with PDFs for ages, but.. I run OS X.

OK, this was just a "troll" ;)

HTML Print s*** big time. Happy New Year!

Martin on January 3, 2008 2:34 AM

I'm working on a web application and I'm using pdf to print some dynamically generated information. The development using pdf allowed me some very quick wins, but now I'm feeling the pain of trying to get more out of the pdf document and I'm wishing I had stuck to HTML.

Having said that, the end user is more comfortable with a pdf document to save on their PC than an HTML file and on this basis alone, I use pdf.

I was far from being blown away with the structure of the pdf document and working with it programmatically. This comes down to the tools that are available and the fact it is yet another technology to get my brain around.

Brian on January 3, 2008 2:36 AM

I've never liked PDF, mainly because Adobe are so greedy with it. For the longest time you couldn't make them for free or edit them for free. The Acrobat Reader tool has always annoyed me, it's slow, the interface takes a bit of getting used to and it lacks all kinds of things you normally get in text viewers/editors like a caret or the ability to view the document in a form other than pages... like continuous text.

Personally if it's going on the web I am more likely to generate a PDF with anything awkward in it (maths symbols, funny diagrams etc...) then screenshot it, cut it out, save a .png/.jpg and then include that in some HTML document... I can appreciate the layout arguments if you intend the document for high-quality print... but that's not what the internet is for in general, it's for providing information in easy to handle formats. What I like is a nice webpage that fills my screen on its own and doesn't waste my space with excess toolbars and by rendering everything on top of rendered paper...

Jheriko on January 3, 2008 2:38 AM

When you view a PDF on screen, you're using a computer to view a document that was really *designed for viewing on paper*.

When you try and print an HTML document, you're using paper to view a document *designed for viewing by computer*. I have NEVER seen a non-trivial HTML document that's designed to print well.

Unsurprisingly, computers do a better job of emulating paper than vice-versa...

Roddy on January 3, 2008 2:39 AM

Oh, and the worst thing about PDFs? Acrobat Reader...

And the best? Foxit!

Roddy on January 3, 2008 2:42 AM

PDFs have a lot of advantages. The most common is production. Say what you will about it but when I create a word file for others to read it's either sent as a PDF or RTF. I could create the HTML equivalent but why go through the hassle of copy pasting, recreating stylesheets, converting embedded images to links, and on and on.

You're right that PDFs aren't meant to replace websites. And they shouldn't. But they excel at being archives for static information. One thing I wished the Wayback machine and Internet Archive did was archive websites as PDFs. Usually, you'll get text but the images and stylesheet are most likely busted.

PDFs and HTML serve different purposes. IANAL, but legally a PDF will stand up in court as a valid document, a web page that anyone can change in a moment will not.

I do agree that Adobe acrobat is junk. But I found better alternatives in Foxit and SumatraPDF.

Joe Chin on January 3, 2008 2:49 AM

I find the concept of page layout to be wasted if you are viewing it on a monitor, which is 99% of the time for me.

On a monitor, a PDF is completely suboptimal because you are effectively viewing a zoomed image rather than selectable wrapped text with screen-optimized fonts. You either find yourself looking at a page half-panned off the screen or "greek text" or both.

Even when printing, the PDF is often useless (no matter how carefully laid out a Foolscap page is, it will not print on A4 with satisfactory results).*

Sure, there are legitimate uses for PDF where it is useful, but web pages are not one of them. Although, I admit this is a bit like complaining that a skateboard is not a good cheesegrater: the real problem is that PDF is commonly misused in completely inappropriate contexts.

* In Australia, Foolscap/Letter are rarely used and A4 is considered standard. However, most PDFs encountered tend to be on US websites.

Paul Coddington on January 3, 2008 2:53 AM

Much of the impetus to use PDFs comes from the idea that I, as content creator, should have total control over how you, a mere reader, see my screen/page. We see the same idea in the use of CSS and other fancy tricks in HTML pages.

This is old-style thinking, which makes plenty of sense in world of print publishing. But it totally ignores those readers
- using a different screen resolution than the creator
- with vision problems
- with text-based displays
etc., etc.

Content SHOULD be able to adapt itself to different environments, and that's what basic HTML does so well. The fact that browser output to plain old paper was and is so lousy is a problem of the browsers, not HTML.

A. Lloyd Flanagan on January 3, 2008 2:57 AM

PDFs don't reflow text.
Usually I can't comfortably read them on iPhone.
I can't read them on OperaMini at all.
Sometimes even on 19" monitor to get readable text I have to zoom in so much, that it requires me to scroll horizontally.
With HTML+CSS I can break stiff layout to suit *my* needs. I don't need your (designers) layout.

kL on January 3, 2008 2:58 AM

Really? Then why is this "packaged" as a PDF?

http://www.si.umich.edu/~pne/PDF/howtoread.pdf

How does this "packaging" help me, the reader?

1) Embedded font
2) Whole document including images is one file. 1 document = 1 file. A "document" that is a directory of html file(s) and images feels unwieldy.
3) The text in the images is anti-aliased according to my preferences. If the diagrams in this document were inline html images, I would be at the mercy of whatever AA settings the author used when creating the image.
4) Zooming retains layout.
5) Annotation/comments (again all within the one file), side-by-side viewing, quick rotation of the page, other viewing-based features...

These are just things for this simple document you linked.

John S on January 3, 2008 2:59 AM

I work on software that can (among other printing targets) directly generate PDF documents. And I gotta say - I hate PDFs.

The key advantage PDF has over HTML is that it retains the 'look and feel' of a printed document. But at the same time, PDF files *cannot* be consistently printed accurately because the PDF specification itself does not allow the document author to control how the PDF is printed - only what it looks like when viewed on-screen.

In short, the ONLY thing that PDF documents are good for is book-emulation, which is better served by just using HTML.

JeffB on January 3, 2008 3:02 AM

I think the main problem with HTML/CSS is the lack of a proper archive/book format. How do you supply your manual/specs to customers who may want to read it offline?
An entire tree of files isn't very neat, not to mention the fact that IE will pop up scary looking security warnings about local files.
Microsoft provides "Web Archive" (*.mht) format to represent a page as a single file, but this is still only half a solution and is not supported in all browsers.

As for HTML not printing well:

CSS2 does actually offer some support for printed (aka "paged") media.
@page, page-break-before, page-break-after, orphans, widows etc
See http://www.w3.org/TR/REC-CSS2/page.html

Unfortunately CSS2 only became a w3c recommendation in 1998. Browser suppliers have only had ten years, so support isn't very good yet. Perhaps by 2018 we'll be able to use it.

For now it would be good if web developers would at least provide a media="print" stylesheet which makes some effort to turn off menus, adverts and other elements that are pointless on the page.
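
As a rough illustration of that last point, here is a sketch of how one could check whether a page declares a print stylesheet at all. It uses Python's standard html.parser module; the class and function names (PrintStylesheetFinder, find_print_stylesheets) are made up for this example, and it only looks at link tags, not at @media print rules inside a stylesheet:

```python
from html.parser import HTMLParser


class PrintStylesheetFinder(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> whose media list includes print."""

    def __init__(self):
        super().__init__()
        self.print_sheets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise to "".
        attributes = {name: (value or "") for name, value in attrs}
        rel = attributes.get("rel", "").lower().split()
        media = [m.strip() for m in attributes.get("media", "").lower().split(",")]
        if "stylesheet" in rel and "print" in media:
            self.print_sheets.append(attributes.get("href", ""))


def find_print_stylesheets(html):
    """Return the hrefs of the print stylesheets declared in the given HTML text."""
    finder = PrintStylesheetFinder()
    finder.feed(html)
    return finder.print_sheets
```

An empty list from find_print_stylesheets means the page declares no dedicated print stylesheet through link tags, which is the case being complained about here.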

And no, providing a link to a "Print Friendly" version is NOT a better solution.

Graham Stewart on January 3, 2008 3:17 AM

Yeah, this is pretty much the very old "page designers" vs. "web designers." The web is the web, a book is a book. Just because we use metaphors to describe things in computers doesn't mean that our applications would actually be better if they looked and worked exactly like the metaphor.

I find web pages much, much easier to use than PDFs, because web pages are the web. I find PDFs handy to...print. Because they're print media, really.

You can put a cat in the oven, but that don't make it a biscuit!

-Max

Max Kanat-Alexander on January 3, 2008 3:18 AM

..so html and css is so well supported in outlook?? .. *CRY*

People tend to copypaste documents into outlook, because everybody is using outlook right? I hope that people will stop pirating office so we can get some real open standards. Because millions of people are using office without paying for it, but everybody is using it?

PDF is great, but I never understood the need for a pdf reader integrated in the browser. PDF files are and should be handled as documents, something you download to read or print.

HTML is a markup language handled by Web Browsers. HTML is good for simple documents and simple applications(talking layout + UI). HTML is great and with scripts and stylesheet it can do wonders. But since it is so badly handled, I think it is a good idea to keep the usage simple.


Peter Palludan on January 3, 2008 3:22 AM

For me, PDFs are great for things which are really, actually going to be printed on paper, especially in standard formats like booklets, etc. But that's about all they're good for. PDF is also a good format to release something _which was already a book_, I'd say, as the thought will already have gone into the design and layout work.

I dread web content being delivered in PDF format, especially as (when I'm not on the Mac, which has great PDF support built in) it normally means using some form of terrible creepware/bloatware/crashware.

And yes, I'm looking sternly in Adobe's direction here, because their bloody PDF reader keeps on trying to download products I don't want whenever it updates itself in a vain attempt to stop itself from crashing my browser every five minutes, or clinging on to its process for dear life such that I normally have to terminate it from the Task Manager at the end of the day if I've so much as glanced at a PDF document on the web in the morning...

Matt Gibson on January 3, 2008 3:24 AM

a strange, otherworldly out-of-browser experience.

Yes, a curious domain where your work looks exactly how you want it to look, and not as rendered by this version of that browser with those settings and those plugins and cookie options and whatever else. Who wants to go there?

Dave on January 3, 2008 3:33 AM

i hate the top-down scrolling feature through all the pages with PDFs, it totally takes away the feeling of reading a document. CHMs do a better job than PDFs.

gogole on January 3, 2008 3:35 AM

I hate PDF format files. PDF is totally anti-web. And I avoid sites offering information in this format.

Georgi D. on January 3, 2008 3:43 AM

I hate pdfs for reports. Some departments in my corp use pdf for showing reports so that their charts will look nice, but this only leads to trouble when you need to get the data back out. Due to the formatting not being easy to convert into anything else, there's lots of hand entering of data.

hammer on January 3, 2008 3:46 AM

Please write articles that don't suck.

Patrick P on January 3, 2008 3:48 AM

Wrote my thesis in latex pdf.

Sure beats writing it in html.

Carra on January 3, 2008 3:56 AM

While your Coding Horror PDF example makes sense, I also think you forget a few things. PDF is self contained, a container format like AVI if you will.

When ODF/OOXML or whichever breaks through as the de-facto standard, it will be the same story but probably with better usability, as browsers will be able to render these files natively and who knows, the whole HTML/CSS mumbo jumbo could be starting to become history - as mere mortals (my dad for instance) could maintain their own content.

I actually went the other way around recently. I used to have to maintain two versions of documentation for my project, a web based version (renderable from within a Java JTextArea) and a PDF for bundling with the software and for printing. Since Sun recently open sourced a PDF-renderer I am now in the process of converging to only a PDF file. That makes a lot of sense considering that going the other way, printing from HTML, is a bad enough experience to justify its own article on Coding Horror. ;)

Casper Bang on January 3, 2008 4:12 AM

Jeff, Jeff, Jeff.

Wikipedia.

I re-read "Burn Rate" just before Christmas and don't remember Kevin Kelly. Sure enough, Louis Rossetto is listed as the founding editor, not KK.
http://www.wired.com/services/staff?staff=Magazine

Justin on January 3, 2008 4:15 AM

Joe Chin: I could create the HTML equivalent but why go through the hassle of copy pasting, recreating stylesheets, converting embedded images to links, and on and on.

Not to mention the slew of bogus HTML tags crapping up the document. It's a horror to do this with anything in Office because the HTML is downright filthy. When I export something I'd like to have clean markup. It's actually more sane to create in HTML first, then let Word do the job of conversion since few people care about the structure of Word files.

Book idiosyncrasies? No thanks, scrolling vertically is easier.

Rob Janssen on January 3, 2008 4:27 AM

I also use PDF format to make a statement. However, my statement is that I'm not locked in to using MS Word!

Allan I. on January 3, 2008 4:28 AM

I am in agreement with Mr. Flanagan above: the major motivation for content creators to publish PDF is control. They want control over fonts and layout and width; this is all paper-think foisted upon the web.

I published similar thoughts in 2005:
http://iamacamera.org/default.aspx?section=design/web%20standardsid=15
and a follow-up after Joe Clark challenged my statements
http://iamacamera.org/default.aspx?section=design/web%20standardsid=16

Carl on January 3, 2008 4:34 AM

I'm working on a project right now where, for a POP system, we need to generate reports. In this case we are generating them from a client app, and they need to be redistributable - users should be able to save them to a location on their hard drive, and e-mail them to other employees in the company.

We had 4 options: MS Word format, HTML, plaintext, and PDF.

We went with PDF. It's packaged, and easy for users to transfer via e-mail - it doesn't lock anyone into using MS Office, and it allows us more stylistic control than plaintext.

PDF has a proper place in the world, but the web ecosystem isn't it.

Luke on January 3, 2008 4:39 AM

I second Malcolm. PDFs like that are horrible to read on a computer.

pauldwaite on January 3, 2008 4:53 AM

Customers want pdf, just like they want grids instead of lists, and IE support instead of standard-compliant web sites/applications, just like... oh the list goes on and on.

I heard that pdfs are immutable, and that's why they want them (but you know, pdf writers exist out there...).

No no, in the year 2008 there's no *Good Reason* for this pdf fetish that some customers out there have.

F.O.R. on January 3, 2008 5:04 AM

I only really use PDF files when I want to take some piece of reading material on my smart phone (Motorola Q9m). Other than that, I think for the most part they are useless, and quite frankly, a lazy way to distribute information.

Zak on January 3, 2008 5:10 AM

Jeff,

"The massive inconvenience of PDF for the user..." (bolded, no less)? I click on a link and my PDF reader opens up. That's "massive inconvenience?" But, I guess you're talking about the context shift of going from browser to reader. But, then again, you are constantly talking about how you always have dozens of windows open at once. Those context shifts aren't inconvenient? Yet, one of those windows being a PDF reader is?

Now, having said that, ever since Adobe forced Microsoft to remove PDF support from Office 2007, I've been rooting for the death of PDF (and I used to own Adobe stock). Death to PDF! Long live XPS!

David A. Lessnau on January 3, 2008 5:15 AM

The choice to use PDF is almost always a choice to sacrifice ease of use for speed of production. I can print anything to a PDF and post it up online. I don't even need to be a webhead to do it. Generating nice HTML takes far more time than generating a PDF, ~especially~ when you want to post something that was designed for print layout. The process of converting even a simple print brochure for the web is time consuming and requires technical expertise. Hitting the "Print to PDF" (or better yet, just posting the PDF that the designer has already supplied) is so much easier.

That said, PDF implementation on the web sucks. HTML sucks too, but the tool we use to navigate the web (the browser) was built from the ground up to display HTML. Until the browser natively supports PDF (ha!), or we start using acrobat to browse the web (ha!), PDF and HTML will continue to fight with each other.

Patrick Stephens on January 3, 2008 5:15 AM

There is something that everyone is missing (having said that, I hope it is true... i skimmed over the last 2/3 of the posts, ulp). PDFs were not (as far as I know) invented as an alternative to HTML. They were created to fill a need in the printing industry.

Without getting into sordid details, until the advent of the PDF file, the only (reasonable) alternative for sending layouts to the printer was using Quark files. In the graphics (for print) field, there are 2 kinds of people; Quark people, and Adobe people (who now use Indesign, and used to use pagemaker). Quark makes quark files, indesign makes PDFs.

Once the printers got their hands on the PDF format, there was no going back. Now Quark users have to deal with the hassles of exporting a properly formatted PDF file.

The thing that makes PDF so handy is, the very fact that this discussion exists! I find it not only fascinating, but strangely vitalizing to read a debate about the usefulness of the format, from people who have nothing to do with the printing industry! How far we have come, indeed.

The Portable Document Format is exactly that. Not only can it integrate nearly flawless compression, but it embeds fonts... very very handy for the printing industry. It is useful for the Web... hell, it is useful for nearly anything.

What's that? Your design needs to be seen in print, distributed on CD, and used on the website? Well, you CAN have the art department make up a Quark file, redesign it for a help file and further redesign it into an HTML page... or, you can just make a PDF file and all your logistical artwork difficulties are solved.

So, anything that you see online as a PDF file, was probably also printed somewhere.

(oops, "Joe Chin on January 3, 2008 02:49 AM" seems to have touched on these thoughts briefly, I should have read more before I suppose)

Now that being said, I must agree with the consensus here. Too much has been crammed into the format, and the readers are getting out of control. There are too many features; they are getting in the way of its actual usefulness.

The downside of it has been well explored in previous posts, I don't need to be TOO repetitive...

The PLUS side is, there are many PDF readers out there, using the adobe one is unnecessary.

Philip Snelgrove on January 3, 2008 5:17 AM

Seems like the standard thread here is that PDF's are horrible to read/view on a computer. Yep, they are. And (depending on your browser plugin) they are a mode shift for the web surfer...and that is usually a bad thing.

But....

They do allow mere mortals to produce content. I work with a number of luddites (and I mean that with all due respect...) that could never, I mean never, produce anything remotely as "complicated" as an HTML page. Word is the pinnacle of their capabilities. So I'm left either converting their work into HTML or enabling them. Now they're set up with a PDF converter, that they "print" to, creating a PDF. Now that file can be published to the site, and as someone pointed out, it's a legal document. Less work for me, they get their goofy fonts (don't know how many times I've tried to explain why we can't use "BalloonLetterz" font on our website). It does create a different browsing experience, but I believe that's an Acrobat plugin issue...Safari on my Mac just opens it on another tab, just like an HTML page.

Can Google spiders crawl the content in a PDF?

As usual, good thought provoking post from Atwood!

Mike Shaffer on January 3, 2008 5:18 AM

There's a huge car crash of people with different outlooks here and yet we're all managing to scream past each other

People who are design-oriented (esp. print design) tend to love PDF because they know it will always look the same. Tech geeks tend to hate it because they don't care about things like embedded fonts (even though good design makes things better). There's nothing wrong with PDF in and of itself, but it feels like 90% of the times it's used on the web are times when HTML + CSS would be a better choice for all of the reasons Jeff mentioned. This will only be more true as CSS2 and CSS3 print support allows for things like better handling of page breaks, page and chapter numbering, etc.

A few random points:

What are the Mac users crowing about? I have a couple machines running OSX. PDFs pop up in Preview and I have to know to swap over to that. How is that any less jarring than having a browser plugin open in a new tab? Windows can handle PDF files just fine; it's the experience of opening them that's a problem.

Adobe != PDF. PDF is an open standard, so I doubt Adobe forced MS to take support for PDF out of Office. That sounds like an Urban Legend.

Tom Clancy on January 3, 2008 5:19 AM

So the trouble with PDFs is that people are trying to use them to work around the limitations of HTML? That sounds like the trouble with web designers to me. PDF is as good as anything for what it was created for: documents.

Larry Coleman on January 3, 2008 5:20 AM

I've seen a lot of crappy, unreadable HTML on the web. I can't recall an unreadable PDF.

PaulG. on January 3, 2008 5:25 AM

My annoyance with PDFs is that when I'm reading a 600+ page ebook, there is no bookmarking feature in Foxit or Adobe Reader. (If I'm wrong, PLEASE please correct me!)

Bill on January 3, 2008 5:28 AM

y do u antiamerican idiots try to put down a great american corporetion like ADOBE? When MIKE HUCKABEE is pres i hope we can outlaw these kinds of "opinion" blogs

sallary ross on January 3, 2008 5:32 AM

I AM A GODFEARING AMERICAN

sallary ross on January 3, 2008 5:33 AM

"Book idiosyncrasies? No thanks, scrolling vertically is easier."
"PDF is as good as anything for what it was created for: documents."

The last time I checked, documents were flipped through page by page; as virtual documents, PDFs should implement this. And in my opinion, scrolling vertically through a PDF is disorienting.

gogole on January 3, 2008 5:47 AM

The trouble with PDF is that companies are demanding it. Especially those who are converting old paper documentation, statements and bills to the web.

Instead of putting forth the cash to convert paper documentation and statements to HTML, they convert it to PDF. PDF "is" the paper document. It contains all of the print messages, images, and logos. It looks and feels exactly the same. Large print shops and document authoring tools generate PDF just as they generate the files they send to the large batch printing machines. Those large Xerox and AFP files can be converted into PDF with a little work and the right toolset.

If an end user wants to save their paper documents today they simply take the document and stick it in a filing cabinet. What if you decide to turn off paper and go all electronic? How do I save my document? The company doesn't want to front the coin to store years of documents. Do I click for a printable version and save the HTML on my disk? No. It's not a good solution. Do I click on a PDF and say save? Yes. It is the same document as the one I received in the mail (without the watermark on the print stock). It's also much simpler, a single file, as opposed to an HTML document with a folder containing copies of a bunch of images. It's probably also easier for the customer support people to handle viewing a PDF rather than navigating them through a web page I've saved.

Long and short of it, there are reasons why when I go to Fidelity.com, I can view my statement online in HTML, but when I go to print the document, I get a PDF that looks the same as the one I got in the mail. There are reasons why my Columbia Gas bill is presented in PDF instead of HTML. And they are generally usability, legal and project funding reasons.

Mike Cornell on January 3, 2008 5:47 AM

I just noticed: no ad at the end of this post?

CK on January 3, 2008 5:54 AM

Packaging? You can save a web page as an .mht file in Internet Exploder if you like. There, it's packaged. You can then put this on your website and link to it.

PDF files? My, inside they look a lot like an old PostScript file [but compressed], which was designed for sending output to printers.
And I really loved the "benefit" of not being able to easily copy text from a PDF file.

Nchantim on January 3, 2008 5:55 AM

We need them PDF's to download eBooks from eMule and restrict them to 1 file, so we don't have to deal with archives.
You guys are such n00bs :D

The Lucifer Principle on January 3, 2008 6:03 AM

I think that PDF and HTML cover two different domains. One is for print, the other is for the web. HTML just doesn't print nicely and with a PDF I can embed fonts to make it look exactly the way I want it to.

Akira on January 3, 2008 6:08 AM

It's incredible how many are missing the point. Maybe it would be good to remind everyone that PDF derives from PostScript, which is a language for printer rendering -- that's also why the quick and dirty way to make a PDF file out of any other format is to have a virtual printer (e.g., Primo PDF).

If you want to distribute a document as if you were sending it by snail mail, and if it is crucial to keep the page layout, PDF is perfect (as the above example of the student sending an essay to his teacher). However, if that document is going to be reformatted (as when a writer sends a manuscript to a publisher for DTP composing), PDF just doesn't work (unless it is some kind of vanity publishing, but I digress...)

(More or less) the same idea should be applied for the web: if you want to distribute a finished paper, where layout is crucial, PDF is OK (not perfect, but OK). If you want to have some form of interaction (even if it is just clicking), a light footprint on resources, and content over presentation (while maintaining the branding of a website and the flow of navigation), PDF is a stupid idea.

Jeff is right to the point. Please don't think of screen media and interactive media as an extension of written media.

(OH, and by the way: of course the Kindle should have PDF support; although electronic, it is mainly a BOOK reader)

Jorge Rosa on January 3, 2008 6:11 AM

The annoying thing with PDF's is that you can't just "copy/paste" text as you would from a MS Word document. They insert a new row character at the end of every line instead of simply wrapping the damn line.
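
For anyone stuck with this after pasting, a minimal sketch of undoing those hard line breaks in Python. It assumes a blank line marks a real paragraph break and every other line break is just wrapping; the function name `unwrap` is made up for illustration, and words hyphenated across lines are left alone.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line separates paragraphs; any other
    newline is a wrapping artifact from the copy/paste.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Drop empty fragments and glue the remaining lines with single spaces.
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


pasted = "The quick brown\nfox jumps over\nthe lazy dog.\n\nSecond paragraph\nhere."
print(unwrap(pasted))
```

It will not help with multi-column layouts or tables, where the copied reading order itself is scrambled.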

The Lucifer Principle on January 3, 2008 6:11 AM

The large amount of clueless comments on this post is telling about what the problem with PDF's is: People have no clue what it's for.

Here is it in big bold letters: PRINTING.

A PDF is guaranteed to print the same no matter where it's printed. In fact, it's the JAVA of printing: Write once, Print Everywhere!

Matias Nino on January 3, 2008 6:13 AM

"relying on PDF was more defensible in 2001, when browser printing support was notoriously poor"

Browser printing support is still pretty poor in Firefox. My standard workflow for printing web pages is to try and print them in Firefox (my default browser). If that fails (which happens quite a bit), I switch to MSIE, which has a much higher hit rate. If that also fails, I use MSIE to print to PDF Writer and then print the PDF, which almost always works.

So there's an argument in favour of PDFs: It's what you use to bypass broken browser printing.

Dave on January 3, 2008 6:13 AM

What is often forgotten: visually challenged people have problems with PDF files, while in browsers they can use their preferred font and size.
That's one of the basic ideas of HTML: let the user select his preferences, don't force your preferences on him.

bignell on January 3, 2008 6:15 AM

"Links to PDF files assume the user has a PDF viewer installed."

Even worse are the sites that want to make sure you know what you are doing and check for the existence of Adobe Reader, completely ignoring the existence of alternate products.
The better of the worst at least just let you continue and open the doc in the default program; others force you to download the file.

Hartmut on January 3, 2008 6:18 AM

Ditto Bill - no BOOKMARK feature!? Just a single, solitary "I'm up to here" button would have been welcome, but noooo...

TristanK on January 3, 2008 6:22 AM

"The last time i checked documents were flipped through page by page ,as virtual documents PDF's should implement this."

I'm using Adobe Reader and I can "flip" through pages just fine.
Just change the "View-Page Display" option to "Single Page", or use the Page Up/Down buttons on your keyboard, or switch to Full Screen view.

"Packaging? You can save a web page as an .mht file in Internet Exploder if you like."

Yup, and that will only work for ONE page and the result will only be usable in Internet Explorer.
What if you want to package up multiple pages (e.g. a manual) and make them available to everyone that visits your site?

Graham Stewart on January 3, 2008 6:25 AM

I must say Jeff, usually I'm in complete agreement with you, but I think there's a place for PDFs, and that should be when a document needs to be printed or protected (as others have already stated).

You are completely correct with the document included.. That's not for print, it reads like a webpage anyway. Out with the PDF!

Shannon on January 3, 2008 6:31 AM

I agree with the essential point of this blog that we should try to stick with one standard. I would like to point out two very good uses of PDF's on the web. I used to work for a company that dealt with content management, in this arena a read-only PDF is the perfect way to deliver a consistent read-only format in both the web and desktop environments. The second is for forms that cannot be submitted electronically. For instance there are still some companies that require you to fill out a form and sign it and either mail or fax it in to get what they are offering. From a user perspective it's a lot easier to click on a link to a PDF in that case.

Arcond on January 3, 2008 6:37 AM

Hey Now Jeff,
I really like your statement 'it (HTML) would certainly generate greater advertising revenue through the existing web ad ecosystem'. I also wonder about your thoughts on Adobe vs. Foxit: which is the better reader? Maybe you could post 'The PDF Reader Showdown', similar to the Browser Showdown ( http://www.codinghorror.com/blog/archives/001023.html ).
Coding Horror Fan,
Catto

Catto on January 3, 2008 6:40 AM

PDFs are ok for things like Mathematics, or for longer items like books that you want to view offline.

Also, for some items, it makes previously published material available without the burden of reformatting, like catalogs and datasheets for electronics parts.

For "normal" web pages, I have to agree it's a mistake, and the people who do it seem like control freaks to some degree.

Just a dot on January 3, 2008 6:44 AM

My OS of choice http://plan9.bell-labs.com/plan9/ has a PDF viewer but no web browser.

Stick that up your chops :)

maht on January 3, 2008 6:45 AM

"I'm not sure what you're getting at. Adobe's Windows PDF reader also has "full text search etc etc"."

Try using PDFs on a Mac to see what the OP meant. Under OS X, PDFs are a first-class citizen. Under Windows they're a crudely bolted-on afterthought. (Just for the record, I'm a Windows user, but I have to say that OS X really has PDF support sorted out).

Dave on January 3, 2008 6:50 AM

PDF for me is when I need one of two things: It needs to print the way it looks; or I need to make sure that no one changes it (e.g. Contracts). Otherwise PDF isn't useful at all....

Though the form filling API is wonderful if you need standardized forms (e.g. IRS.gov).

Shawn Wildermuth on January 3, 2008 6:53 AM

I used to see PDFs as the speed bump of the Web. In many ways they're like an unexpected cul-de-sac though too.

I detest them. In most cases, even printed, you have a pile of inaccessible information, though that seems to be more the product of poor organization combined with the nature of paper documents.

Generally I'd rather they just go away, for many of the reasons already given by others.

hmm on January 3, 2008 6:53 AM

@Graham

I was being a bit ignorant about the page flipping thing. Thanks for the correction, though. Come to think of it, PDFs being immutable makes the format reliable for archiving documents.

gogole on January 3, 2008 6:56 AM

PDF is a document file format. It definitely wasn't made to be an interactive web presentation format. In the same way, HTML is not nearly sophisticated enough to do professional publishing; and it shouldn't be, because it's not a publishing format. In fact, HTML is a super-simplified application of SGML (a full-fledged document markup language). Comparing PDF and HTML is like comparing apples and oranges (or MIDI and WAV files). Before I was a programmer I worked for Xerox, scanning in thousands of manuals and documents for different companies. The PDF format is perfect for that. When they'd ask for 50 copies of manual xyz, I would pull it out of the archives and have it start printing in seconds. The PDFs can embed all kinds of metadata and printer-specific info in them that makes it possible to do production printing from PDFs. I can agree that most web content should NOT be PDF, but I don't think HTML is a good substitute for a good document format.

Justin on January 3, 2008 6:59 AM

Jeff,

I agree with you entirely if you frame your argument to say that if we're just giving consumers information, HTML/CSS is a better choice because even if there are minor formatting issues, they still get all of the information. But you forget the whole point of PDFs: to have a platform independent method of guaranteeing that your document will render the SAME EXACT WAY on all computers.

This was a HUGE deal not too many years ago with all of these *nixes and Macs and PCs running around. In a way, this still is a big issue. You have to remember that PDF is a publisher's tool; a layout editor's tool. The point of PDF is to preserve the presentation. Does the general populace need an exact representation of your printed brochure for your tourist attraction? No, they shouldn't need it when they're on your website unless you have incompetent web developers. But if the brochure that you just laid out needs to go to your boss for approval for print to put in your building's lobby, then hell yeah PDF makes sense.

Also, consider this: with the advent of digital signatures, PDF gives businesses a secure way to look at proposals, add comments, sign when approved and note what needs to be noted without giving everyone in the process the means to edit the document to their liking. Can this be accomplished with a website? I dare say no.

This is the old Mac vs. PC argument: at the end of the day both OSes are just tools. The tasks that you most often perform will dictate which platform you run. Personal preference comes into play at the intersection of common activities such as surfing the web. But editing video on any kind of a professional basis? You're killing me to say that you prefer Windows Movie Maker to Final Cut Pro.

Professor Tom on January 3, 2008 7:04 AM

Being in the content management business, PDFs frequently violate the DRY principle from an information architecture standpoint. They are maintained separately. Cutting edge firms are beginning to utilize document management/content management systems that are capable of outputting multiple formats of the same data, but we're YEARS away from it being standard practice.

Because PDFs have a high level of creation overhead, they are born outside of the normal content flow of an organization. Even though the executives and designers love them, they are the neglected stepchild of the communications department, who generally prefers web and email communications.

Personally, I see a PDF and I think: This is not the latest information, it's not going to be accurate, and it's not going to be detailed enough to provide me the information I want.

Rick Cabral on January 3, 2008 7:04 AM

I also find that organizations that emphasize PDFs as a communications medium are sensitive to the high cost of print design, and are attempting to squeeze more value out of a print project.

Rick Cabral on January 3, 2008 7:06 AM

Jeff, I do not agree at all on your piece of writing, but I have trouble explaining why..

First of all, PDF was never meant (and is not commonly used) as a replacement for HTML. If you do so, I'd say you are using it for all the wrong reasons. Second of all, Adobe provides customers (or should I say consumers) with a lot more than just Acrobat to show their content on the web! (I like to refer to FLEX, since it is so closely related to PDF and Adobe itself, and because it is a 'new' and hip technology).

Also, like most of Adobe's products, Acrobat is cross platform (as far as it goes) which is good thing. Considering that HTML renders differently on every browser on every system on every machine in every country in every language on every resolution..

My point is; PDF for the web is extra. It was never meant to be. It's a publishing thing. Try shipping a HTML page to your publisher!

MiRAGe on January 3, 2008 7:10 AM

Also, PDF allows content creators control over their content, while allowing the content to be passed around.

Take for instance this page. For someone else to read it, I either have to give them the URL, print it out, or save as something like a MHT.

But how do they KNOW that the content is the original? Could I have saved the file locally, edited to change the point of view expressed, and then provided the copy to them?

PDF allows restrictions that you cannot get with HTML or MHT. You can:
- lock a file with a password
- prevent content from being copied
- prevent printing the document
... and so on ...

Nasty eBook publishers know these tricks. So do 'whitepaper' authors who lock their whitepapers to prevent modification, to prevent corrupting their statements.

Oh, and MHT (built into IE and available as an add-on in FF) will save the CURRENT page only. A PDF is more like a Word document, as it can have many pages, a table of contents, and a vast array of other features that are not dependent on the capabilities (or quirks) of browsers.

That being said, programming code to do automatic layout in PDF format, where position can be expressed in hundredths of a mm, is a pain in the... you know...

rustyvz on January 3, 2008 7:26 AM

I think that the PDF format would be a lot more useful if Adobe would strip out the 99% of the engine that you don't need when loading it, or at least make it a load-on-demand operation. Waiting 15 seconds to load the viewer each time just plain sucks.

And have you tried to maintain many PDF files for common changes? If PDFs had an "Iframe" or "include" type construct you could really make them more useful.

Chris Chubb on January 3, 2008 7:27 AM

I tend to revile Adobe entirely because of Acrobat. As it is today, it is entirely too damn slow for the web, and the promotional offers that Adobe pummels us with and the clumsy, giant software updates irk me to no end. I only download .pdf files and then review them outside of my browser; otherwise I can't switch tabs in Firefox without waiting for it to load, by which time I have entirely lost my patience, and then who knows, I might start killing processes almost at random. I'm too wise for that: I know it is always Acrobat lagging my browser, so I do my best to avoid PDFs in my personal surfing.

I see your point, and I'm sure Adobe wishes we would all use .pdf instead of HTML, but I think fillable secure forms, white papers, and scanned documents are served well by .pdf format, but we need some optional software with which to view and interact with, as well as edit them. I know there are open source alternatives, but I have yet to try them. Interesting read, thanks all.

Jeremy Anderson on January 3, 2008 7:29 AM

Chris Chubb: for fast loading times use Foxit PDF Reader

The Lucifer Principle on January 3, 2008 7:33 AM

Having read through a pile of further comments I'll emphasize my previous comment even more: The majority of perceived PDF suckage is due to its awful implementation under Windows. If it were handled the way Apple does it (and again, I'm saying this as a Windows user and not an OS X fanboy), the perception of PDFs would be far less negative.

Dave on January 3, 2008 7:42 AM

I think you're missing one key point for PDFs. They're savable. If I create something, I can make it into a PDF, and save it offline. I can easily see two cases where this would be better than an HTML version.

1) That will allow me to send it to other people, that may not have access to it otherwise (say it is located on a company intranet, and while the person I'm sending it to has a business reason to see that information, they don't have access to the intranet).

2) A document that is to be used as reference. I may want to save that document so I can use it when I'm not connected to the internet (while I know you can view HTML while not connected, I'm assuming that an HTML version would be displayed on a website, and thus no connection would mean no ability to view). It also means the user can maintain a copy, and not have to worry about the site removing it and they no longer have access to it.

Just my $.02

mjmcinto on January 3, 2008 7:43 AM

While I agree with Jeff on the inconvenience, and often overkill, of the PDF format, I do agree with Stephen Schmidt on PDF's packaging purposes. Just try saving most any web page from the internet and send it to someone via email (and I don't mean a link to the content). PDF is very good for document management this way. However, I think that Internet Explorer's "Web Archive" format, as mentioned by Graham Stewart, is better for most content currently packaged as PDF. Currently stuck in "Proposed Standard" status since 1999, RFC 2557 (a.k.a. MIME HTML), seems to be the missing functionality that should be standardized in all modern web browsers.

http://tools.ietf.org/html/rfc2557
http://en.wikipedia.org/wiki/MHTML
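
For the curious, the packaging idea in RFC 2557 is small enough to sketch with Python's standard `email` package: a `multipart/related` message whose parts each carry a `Content-Location` header naming the URL they stand in for. This is a rough illustration only. The function names and the `example.invalid` URLs are made up, the image bytes are a placeholder, and whether any particular browser opens the result is not something this sketch guarantees.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_archive(pages: dict[str, str], images: dict[str, bytes]) -> bytes:
    """Bundle HTML pages and PNG images into one multipart/related message."""
    archive = MIMEMultipart("related", type="text/html")
    for url, html in pages.items():
        part = MIMEText(html, "html", "utf-8")
        part["Content-Location"] = url  # the URL this part stands in for
        archive.attach(part)
    for url, data in images.items():
        # The subtype is given explicitly so no image-type sniffing is needed.
        part = MIMEImage(data, _subtype="png")
        part["Content-Location"] = url
        archive.attach(part)
    return archive.as_bytes()


def list_locations(raw: bytes) -> list[str]:
    """Read the archive back and list each part's Content-Location."""
    return [part["Content-Location"] for part in message_from_bytes(raw).get_payload()]


raw = build_archive(
    {
        "http://example.invalid/ch1.html": "<h1>Chapter 1</h1><img src='logo.png'>",
        "http://example.invalid/ch2.html": "<h1>Chapter 2</h1>",
    },
    {"http://example.invalid/logo.png": b"\x89PNG\r\n\x1a\n"},  # placeholder bytes
)
print(list_locations(raw))
```

Nothing in the container itself limits it to a single page; the sketch packs two pages and an image into one file.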

Craig Boland on January 3, 2008 7:45 AM

I'm with Jeff on this one. PDF has many virtues, but friendliness to the web is not one of them. Too damn slow, even with Foxit Reader.

Andy C on January 3, 2008 7:52 AM

I think the biggest issue here is to remember that PDFs generally occur as a convenient medium for multiple paths of distribution. Sure, someone could send a link to a website, but in most cases it's more effective having an actual file, specifically in a business environment. I don't think people intend to USE PDF as a replacement for HTML, I think it just comes across that way. Plus, it's much easier to convert text from another "medium" and just throw it up on the web as opposed to having to create an HTML document from it. Not to mention I can scan and print to PDF; it's quick and dirty but efficient.

Not Quite on January 3, 2008 7:55 AM

The old Acrobat from ~2000/2001 wasn't bad, but they reached a point where adding every possible functionality was more important than small, speedy, and not annoying. I don't want to use PDF anymore and I avoid the pdf links like they have plague.

Printing a web page from a browser is one of the worst experiences I've gone through as a web developer. The features that make browsers good for any content, anywhere, contradict the precision needed for form printing.

Stephen on January 3, 2008 7:57 AM

First of all, I agree with what many people have said -- use the right tool for the job, PDF is ideal for documents to be printed, PDF unifies a document in a way that HTML can't, Windows PDF integration sucks compared to Mac and Linux (specifically KDE and probably Gnome), Adobe's PDF reader is increasingly big and annoying, etc.

But many have mentioned "immutability" of PDF, and there I must partially disagree. While Adobe certainly promotes that idea, it's not entirely true. A PDF can be loaded into a PDF editor, or converted into something that can be loaded into a vector graphics editor. A password-protected document can still be copied as a file, as well as screenshotted (is that a verb?). A PDF reader doesn't have to obey a "don't print this" directive. And as I recall it's even become possible to create a PDF with arbitrary content that still passes an MD5 digital signature validation.

Certainly it's hard to violate the protections (to varying degrees depending on the specific protection you're depending on), but to my knowledge the only PDF protection that can't be broken (yet) is the password protection to read it. Most of the protections people rely on in PDF won't stop someone determined to get through them.
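
Since the format's own protections can be sidestepped, one format-independent way to check that a copy is the one the author published is to compare a cryptographic digest. A minimal sketch, assuming the author publishes a SHA-256 digest somewhere you trust; SHA-256 is my choice here (not something PDF mandates), and the function names are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """True if the data hashes to the digest the author published."""
    return sha256_hex(data) == published_hex.strip().lower()


original = b"%PDF-1.4 pretend this is the whole file"
published = sha256_hex(original)  # what the author would post alongside the file

print(matches_published_digest(original, published))         # unchanged copy
print(matches_published_digest(original + b" ", published))  # edited copy
```

This only shows the file is byte-for-byte the same as the one that was hashed; it says nothing about who produced it, which is what a real signature is for.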

Rob Funk on January 3, 2008 8:00 AM

Page numbers. Try reading the XML specs when printed. Without hyperlinks, page numbers are essential.

This is easy to solve. Two ways:
1. force page-breaks; OR
2. hyperlink serialization: hyperlinks, when printed, are annotated with the page number of the target, if it's in the same document. We could add a switch, to enable/disable this behaviour, so it happens for the page of contents, but not for every term.

Brendan Macmillan on January 3, 2008 8:00 AM

@Graham Stewart: "... [.mht files] will only work for ONE page and the result will only be usable in Internet Explorer."

Actually, Opera supports .mht files natively, and Firefox can read and write them as well, through plugins. AFAIK MHTML isn't limited to encapsulating a single page. That's just what current web browsers save.

It's not such a big deal to .zip up a tree full of HTML and related resources anyway. Bruce Eckel has been distributing his books this way for years. It works fine. For reading and using a book from a computer screen, I think it works far better than a PDF would.

Western Infidels on January 3, 2008 8:02 AM
