Convert PDF to HTML5

How to convert PDF to HTML5

Converting PDF content to HTML5 for flipbooks or otherwise can be done in a number of ways. This tutorial will explain how to peek under the hood in order to understand how some providers try to convert your content.


The 'let's fake it until we make it' approach

Some publishers claim to convert your content to HTML5 but are not actually doing so; instead, they are just bitmapping your pages. What does this mean? It means that each page in your PDF is converted into a high-resolution JPEG or PNG image, which is then shown on the website. You can tell this is what they do if you zoom far enough into a block of text: the text eventually starts to blur, with images and text blending together in a bit of a pixelated mess (hello 1998!). Not only does it look bad, it also loads slowly.

How do you ultimately know if this is what's going on? Try zooming deep into a page with your browser and check whether the text stays sharp. If you're not afraid of HTML, peek into the source code of the publication, where you should see a reference to an image for each page. Open that image in a new tab and you'll see whether the text has been rendered together with the images or not.
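If you'd rather not read the markup by eye, the check above can be roughly automated. The sketch below is only a heuristic and is not something any provider ships: the function names and the 50-characters-per-image threshold are assumptions made for illustration. It counts image tags and compares them to the amount of real text left in the markup.

```javascript
// Heuristic sketch: a publication whose markup is almost nothing but <img>
// tags, with little or no real text, has probably been bitmapped.
// summarizeMarkup, looksBitmapped and the threshold are hypothetical.
function summarizeMarkup(html) {
  const imageCount = (html.match(/<img\b/gi) || []).length;
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // ignore script bodies
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // ignore style bodies
    .replace(/<[^>]+>/g, ' ')                    // strip remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  return { imageCount, textLength: text.length };
}

function looksBitmapped(html, minTextPerImage = 50) {
  const { imageCount, textLength } = summarizeMarkup(html);
  if (imageCount === 0) return false; // no page images at all
  return textLength / imageCount < minTextPerImage;
}
```

You could paste the publication's source into a string and pass it to `looksBitmapped`. Treat the answer as a hint only: a canvas-based viewer contains little text in its markup too, so zooming in is still the real test.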

The middle way - The Canvas

HTML5 offers a number of ways of rendering text and images. While most web pages render images and HTML5 elements as objects, you can also use a canvas, which is closer to what a painter does when drawing a picture on a canvas. Text and images are drawn on the canvas at a resolution that suits the viewer's device and zoom level, so that they always appear sharp. The images and text are thus separate before they are drawn, but are blended together as the page is shown. So what's good about this approach?

It stays sharp no matter how deep the zoom is or what size the device is

It's very accurate because of the pixel precision of the canvas

It can be combined with various 3D rendering methods
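To make the "drawn at a resolution that suits the zoom level" idea concrete, here is a minimal sketch of how a page could be redrawn on a canvas whenever the zoom changes. It is not FlowPaper's actual renderer; the `page` object shape (`width`, `height`, `images`, `texts`) and both function names are assumptions for illustration. The canvas calls themselves (`getContext`, `setTransform`, `drawImage`, `fillText`) are standard.

```javascript
// Size the canvas backing store for the current zoom and device pixel ratio.
function backingStoreSize(cssWidth, cssHeight, zoom, pixelRatio) {
  const scale = zoom * pixelRatio;
  return {
    width: Math.round(cssWidth * scale),
    height: Math.round(cssHeight * scale),
    scale,
  };
}

// Redraw a page. `page` is a hypothetical structure holding images and text
// runs separately, positioned in unscaled page coordinates.
function drawPage(canvas, page, zoom, pixelRatio) {
  const size = backingStoreSize(page.width, page.height, zoom, pixelRatio);
  canvas.width = size.width;   // resizing also clears the canvas
  canvas.height = size.height;
  const ctx = canvas.getContext('2d');
  // Scale all subsequent drawing so page coordinates map to device pixels.
  ctx.setTransform(size.scale, 0, 0, size.scale, 0, 0);
  for (const img of page.images) {
    ctx.drawImage(img.source, img.x, img.y, img.width, img.height);
  }
  for (const run of page.texts) {
    ctx.font = run.font;       // e.g. '12px serif'
    ctx.fillText(run.value, run.x, run.y);
  }
  return size;
}
```

In a browser you would typically call something like `drawPage(canvas, page, zoom, window.devicePixelRatio)` each time the zoom changes. Because the text is rasterized again at the new scale rather than stretched, it stays crisp.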

The Zine template, which can be seen in the picture, uses this approach when you use FlowPaper to convert your documents into HTML5. The result is a sharp flipbook that can be rendered with 3D effects in a captivating manner.

All the way - Real HTML5

Some conversion tools, such as FlowPaper Elements, take the conversion all the way and convert the text and fonts inside a PDF into real HTML5 elements, with their corresponding headers. Why is this sometimes a good option? It's good for three main reasons:

Google loves real HTML5 as it makes indexing the content easier

It always stays sharp on screen no matter the zoom level and screen/device size

It reads well in screen readers for people with visual impairments

If you want to convert your documents to HTML5 within FlowPaper, then use the 'Elements' template when importing your PDF document. You can see an example of a flipbook created using the 'Elements' style in the picture to the right.
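To illustrate what "real HTML5 elements with their corresponding headers" can look like, here is a small sketch that maps extracted text runs to heading and paragraph tags by font size. This is an assumed, simplified rule and not how FlowPaper Elements actually decides; the run shape (`text`, `fontSize`) and the 1.5x threshold are made up for the example.

```javascript
// Escape characters that have special meaning in HTML text content.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical rule: text much larger than the body size becomes <h1>,
// slightly larger text becomes <h2>, everything else becomes <p>.
function runsToHtml(runs, bodySize) {
  return runs
    .map((run) => {
      let tag = 'p';
      if (run.fontSize >= bodySize * 1.5) tag = 'h1';
      else if (run.fontSize > bodySize) tag = 'h2';
      return `<${tag}>${escapeHtml(run.text)}</${tag}>`;
    })
    .join('\n');
}
```

Markup like this is what search engines and screen readers can work with directly, since the headings and paragraphs are real elements rather than pixels.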

Converting large documents using the command line from PDF to HTML5

FlowPaper supports splitting PDF documents into one file per page and loading only the visible pages, which reduces bandwidth consumption and load time for your visitors. If you are splitting the document manually, we typically recommend a tool like PDFTK. Our Desktop Publisher will do this work automatically for you from your desktop if you prefer that. Use the following command to split the document manually using PDFTK:

pdftk.exe Paper.pdf burst output Paper_%1d.pdf compress

You will also need to supply a JSON file to FlowPaper when using this mode. This gives FlowPaper the ability to search the document even when it has not been fully downloaded. A JSON file can be created from your PDF with the following command using PDF2JSON:

pdf2json.exe Paper.pdf -enc UTF-8 -compress -split 10 Paper.pdf_%.js
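If you convert documents regularly, the two commands above can be wrapped in a small Node.js script. This is only a sketch: it assumes `pdftk.exe` and `pdf2json.exe` are on your PATH (on other platforms you would likely pass different executable names), and the helper names are hypothetical. The arguments are exactly the ones shown above, with the file name filled in.

```javascript
// Build the argument lists for the two commands shown above.
// splitValue is passed straight through to pdf2json's -split option.
function buildSplitCommands(pdfFile, splitValue = 10, tools = {}) {
  const pdftk = tools.pdftk || 'pdftk.exe';
  const pdf2json = tools.pdf2json || 'pdf2json.exe';
  const base = pdfFile.replace(/\.pdf$/i, ''); // 'Paper.pdf' -> 'Paper'
  return [
    { tool: pdftk, args: [pdfFile, 'burst', 'output', `${base}_%1d.pdf`, 'compress'] },
    {
      tool: pdf2json,
      args: [pdfFile, '-enc', 'UTF-8', '-compress', '-split', String(splitValue), `${pdfFile}_%.js`],
    },
  ];
}

// Run both commands in order; throws if either tool fails or is missing.
function runSplit(pdfFile, splitValue, tools) {
  const { execFileSync } = require('child_process');
  for (const { tool, args } of buildSplitCommands(pdfFile, splitValue, tools)) {
    execFileSync(tool, args, { stdio: 'inherit' });
  }
}
```

Calling `runSplit('Paper.pdf')` from the folder containing the PDF should produce the same files as running the two commands by hand.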

To configure your viewer with these new files, you must adjust your PDFFile and JSONFile parameters. The following example uses the files created above to load them in split mode:

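  $('#documentViewer').FlowPaperViewer(
  { config : {
    // File name pattern for the per-page PDF files created by pdftk above
    PDFFile : 'pdf/Paper_[*,2].pdf',
    // File name pattern for the JSON files created by pdf2json above
    JSONFile : 'pdf/Paper.pdf_{page}.js',
    RenderingOrder : 'html5'
  }});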