11 Replies Latest reply on Sep 19, 2011 7:12 AM by ssl829

    EPUBs and page numbering


      There are frequent questions about the number of pages shown for e-books compared with paper books. This is an interesting side-effect of Adobe's attempt to provide page numbering in an e-book format (EPUB) that doesn't really do pages.


      The text in an EPUB file basically consists of a collection of HTML files, just like Web pages. As on the Web, each text file is one long page that you scroll down (or up) through. How the text of the e-book is broken up into separate files varies, but commonly each file contains one chapter. It's recommended that no file be over about 1/4 megabyte, so that reading devices don't run out of memory trying to handle a big file.
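      A quick way to check that guideline is to list the content files in an EPUB whose size exceeds the limit. This is only a sketch. It assumes the limit applies to the uncompressed size and treats 1/4 megabyte as 256 KiB. The function name `oversized_content_files` and the extension list are illustrative choices, not part of any EPUB tool.

```python
import zipfile

# Assumption: "about 1/4 megabyte" is read as 256 KiB of uncompressed text.
MAX_CONTENT_BYTES = 256 * 1024
CONTENT_EXTENSIONS = (".html", ".htm", ".xhtml")  # illustrative, not exhaustive


def oversized_content_files(epub_path, limit=MAX_CONTENT_BYTES):
    """Return names of HTML content files whose uncompressed size exceeds `limit`."""
    with zipfile.ZipFile(epub_path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.filename.lower().endswith(CONTENT_EXTENSIONS)
            and info.file_size > limit  # file_size is the uncompressed size
        ]
```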


      When your reader transitions from one of those "chapter" files to the next, there is a page break. The earlier file might finish near the top of the screen, leaving a bunch of white space. This works very nicely if each file contains a chapter: the new chapter heading appears at the top of the next screen. There can also be a delay while the reader opens the next file. For this reason, putting each book page into a separate file is not recommended (and rarely done).


      So… a typical EPUB ebook might have a couple dozen "chapter" files in it. One for each chapter, plus a few for pieces like the front cover, front matter, etc. But the number of pages shown is a whole lot more than a couple of dozen, right?


      Here's the deal: Adobe "synthesizes" the page count and page numbering. Adobe reader software looks at the size of each of those HTML files and counts one page for each 1K (1024) bytes in size, rounded up. So if the file is 114 bytes, that's one page. 1024 bytes, one page. 1025 bytes, two pages. Now, what it's looking at is the actual size of the file inside the EPUB, and an EPUB is just a ZIP file with a specific layout. Since it's a ZIP file, the contained HTML files are compressed. Adobe uses the compressed size, not the actual number of bytes in the original HTML file. Anyway, it adds up all of those page counts and that's the number of "pages" in the e-book.
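      The counting rule above can be sketched in Python with the standard `zipfile` module, which reports each entry's compressed size as `ZipInfo.compress_size`. This is a reconstruction from the description in this post, not Adobe's code. It ignores the DRM adjustment mentioned below, and it assumes the caller already knows which entries are the "chapter" files (in a real EPUB you would take them from the spine in the OPF file). The function names are hypothetical.

```python
import math
import zipfile

PAGE_BYTES = 1024  # one synthesized page per 1K of compressed data


def pages_for_entry(compressed_size):
    """Page count for one content file: compressed bytes / 1024, rounded up."""
    return math.ceil(compressed_size / PAGE_BYTES)


def synthesized_page_count(epub_path, content_names):
    """Sum the per-file page counts for the given content files (no DRM adjustment)."""
    with zipfile.ZipFile(epub_path) as zf:
        return sum(pages_for_entry(zf.getinfo(name).compress_size) for name in content_names)
```

      For example, compressed sizes of 114, 1024 and 1025 bytes give 1, 1 and 2 pages.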


      Within each "chapter" file, the pages are divided up evenly. So a "chapter" file whose compressed size is 1025 bytes gets two pages. Let's say that after it's uncompressed, there are 2425 characters in that "chapter" file. The first page will be 1212 characters long, and the second will be 1213 long.
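      Dividing one "chapter" file's characters into that many pages might look like this. The split follows the example above (2425 characters over two pages gives 1212 and 1213). Which page gets the leftover character is an assumption, chosen to match that example; `split_evenly` is a hypothetical name.

```python
def split_evenly(text, pages):
    """Split text into `pages` runs whose lengths differ by at most one character.

    Assumption: leftover characters go to the later pages, which matches
    the 1212/1213 example above.
    """
    base, extra = divmod(len(text), pages)
    sizes = [base] * (pages - extra) + [base + 1] * extra
    chunks, start = [], 0
    for size in sizes:
        chunks.append(text[start:start + size])
        start += size
    return chunks
```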


      Notice that because the page count is based on the compressed size, the actual number of characters per page is not really predictable. It also varies between "chapter" files, since the compression ratios may be different. It'll probably be somewhere around 2000 characters, or 350 words, plus or minus a bunch. (The character counts include the HTML markup in addition to the actual visible text.)
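      To see how much the characters-per-page figure moves between "chapter" files, you can divide each file's uncompressed length by its synthesized page count. A sketch, assuming the content files are UTF-8 (the usual case for EPUB) and that an empty file still counts as one page; `chars_per_page` is a hypothetical helper name.

```python
import math
import zipfile


def chars_per_page(epub_path, name):
    """Average characters (markup included) per synthesized page for one content file."""
    with zipfile.ZipFile(epub_path) as zf:
        info = zf.getinfo(name)
        text = zf.read(name).decode("utf-8")  # assumption: content is UTF-8
    # Assumption: an empty file still counts as one page (avoids dividing by zero).
    pages = max(1, math.ceil(info.compress_size / 1024))
    return len(text) / pages
```

      Highly repetitive text compresses well, so each page holds many characters; varied text compresses less, so the same number of characters is spread over more pages.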


      By the way, if DRM has been applied to the compressed "chapter" file, Adobe will attempt to account for any DRM overhead before calculating the number of "pages" in that file. This way the number of "pages" is (we hope) the same for DRMed and un-DRMed versions of the same e-book.


      Keep in mind that this is just how Adobe does it. Since NOOK uses Adobe Reader Mobile for EPUBs, and Adobe Digital Editions and the various NOOK apps also use EPUB software from Adobe, they all agree as to "page" numbering. That's Adobe's plan: the page numbers don't mean anything you can put your finger on, but for a given e-book they are constant from one platform to another.


      P.S. Adobe offered an EPUB extension called a "page map" which allowed specific page numbers to be applied to HTML anchor tags. However, that extension is non-standard EPUB and is generally discouraged. The NCX file inside the EPUB (essentially the table of contents file) also has the ability to tie page numbers to specific locations within the EPUB text, but pretty much no readers pay any attention to that. So for now, Adobe's synthesized "page" numbering is about all we've got for EPUB.
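      For comparison, the NCX page list mentioned above is a `<pageList>` of `<pageTarget>` entries, each with a label and a `<content src="...">` pointing into the text. A minimal sketch of reading those pairs with Python's `xml.etree.ElementTree`, assuming the standard NCX namespace. As noted above, most readers ignore this data, so being able to extract it doesn't mean a device will display it.

```python
import xml.etree.ElementTree as ET

NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_page_list(ncx_xml):
    """Return (page label, content src) pairs from an NCX <pageList>, in document order."""
    root = ET.fromstring(ncx_xml)
    pairs = []
    for target in root.findall("ncx:pageList/ncx:pageTarget", NCX):
        label = target.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX)
        content = target.find("ncx:content", NCX)
        src = content.get("src", "") if content is not None else ""
        pairs.append((label.strip(), src))
    return pairs
```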

        • Re: EPUBs and page numbering

          Very interesting - thank you for researching that for us.


          I had assumed, incorrectly, that the ePub creation software provided a page map.


          • EPUBs and page numbering - summarized

            You know, for all of that blathering I did, it'd probably be easier to summarize the results.


            • The page numbers that we see in EPUB are not true pages. They're a numbering system invented and used by Adobe.
            • The page numbers are calculated based on data size.
            • The amount of text in EPUB pages varies from page to page, but most pages are somewhere between 300 and 400 words.
            • Each "chapter" file within the EPUB starts on a new page. There is a visible page break at the end of a chapter file when you're reading the e-book. Adobe assigns a new page number to the start of the chapter file.
            • The starting and ending points, and therefore the size, of any given page in any given EPUB e-book won't change as long as the file doesn't change.
            • All e-book rendering software that does Adobe page numbering should calculate the same number of pages and the same page starting and ending points. All NOOK software uses Adobe rendering software, and of course Adobe Digital Editions also does, so they all should agree on page numbering.


              • Re: EPUBs and page numbering - summarized


                So, if I understand all of the above, any version of "Ya Ya's Hypothetical Novel" should have the same number of pages, and any version of "Little Women" (my favorite book ever) will also have the same number of pages.
                It is not safe to assume that the proportional difference in pages between YYHN and LW reflects a proportional number of words, however, because the margin of error of that is +/- 30%.
                Am I on the right track here, or am I completely confused? (It's been too many years since my math minor, and it's the end of a Friday...)


                  • Re: EPUBs and page numbering - summarized

                    No, the page numbers won't necessarily be the same for different versions. The calculations are on a file-by-file basis. Any copy of the original file should have the same page numbers, but even recreating the file by a Calibre EPUB-to-EPUB conversion can result in a shift.


                    However, for a given text with no changes in the material (from your example, Little Women), the differences should generally be small. Here are some things that contribute to this variability:


                    1. How the text is broken up into "chapter" files. Each chapter file always starts on a new page.
                    2. How much ZIP compression is applied to each chapter file.
                    3. How close to a multiple of 1024 each compressed chapter file ends up being.
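                    Item 2 is easy to demonstrate: compressing the same text with different settings gives different compressed sizes, and so different page counts. This sketch uses zlib compression levels as a stand-in for whatever different EPUB tools actually do. It produces raw DEFLATE data (`wbits=-15`), the form used inside ZIP files; the function names are illustrative.

```python
import math
import zlib


def deflated_size(data, level):
    """Size of raw DEFLATE output, the form ZIP uses for compressed entries."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # wbits=-15: no zlib header/trailer
    return len(comp.compress(data) + comp.flush())


def pages_at_level(data, level):
    """Synthesized page count for the same text compressed at a given zlib level."""
    return math.ceil(deflated_size(data, level) / 1024)
```

                    Level 0 stores the text uncompressed, so it yields far more pages than level 9 for the same chapter. Real conversion tools usually differ by much less than that, but even a few bytes matter when a file is close to a multiple of 1024 (item 3).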


                    There are some hidden parts of the text itself that affect character-counting, and thus where the pages start and end within a chapter file:


                    1. The HTML markup that's applied. The HTML tags are counted for the purposes of page numbering.
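                    2. Multi-byte characters. Some of the most common multi-byte characters are curly quotes (“ and ” instead of "), apostrophes (’ instead of '), dashes (– and —), and the ellipsis (…). Each of these counts as one character when a file is divided into pages, but takes three bytes in UTF-8, so it generally weighs more than a plain character in the compressed size that the page count comes from.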
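                    Both effects can be seen by counting the same snippet three ways: visible text, characters including tags, and UTF-8 bytes. A rough sketch, assuming UTF-8 content; the regular expression is a crude tag stripper for illustration, not a real HTML parser, and `character_counts` is a hypothetical name.

```python
import re


def character_counts(html):
    """Count an HTML snippet as visible text, as characters with tags, and as UTF-8 bytes."""
    visible = re.sub(r"<[^>]+>", "", html)  # crude tag stripper, illustration only
    return {
        "visible_chars": len(visible),
        "chars_with_markup": len(html),  # what the page split counts, per the posts above
        "utf8_bytes": len(html.encode("utf-8")),  # curly quotes, dashes, ellipsis take 3 bytes each
    }
```

                    For `<p>“Hello”</p>`, that gives 7 visible characters, 14 characters with markup, and 18 bytes.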


                • Re: EPUBs and page numbering

                  I recently used PubIt! to publish my first two novels. The instructions said I could just upload Word files and that it would accept them. I found out that if I wanted a page break before each new chapter, I had to insert a section break at the end of the previous one.


                  I shared my books on the forum and was surprised when a reader "complained" about how short the book was. I attempted to view the book myself on my own Nook and found that the page count was extremely low. I had to click the page advance 8 times before the counter at the bottom actually rose to 2/242.  Since "242" seemed a very low page count, I figured I needed a better yardstick to measure how many "Nook pages" I actually had.


                  This approach deals directly with the "word count" issue mentioned earlier and balances out long and short words. I counted the number of words per page [using the Amasis font at the small size] and found an average of 150. Then I divided the total word count from each original Word document by that: 94,378 / 150 ≈ 629 Nook pages and 90,863 / 150 ≈ 606 Nook pages.
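                  The arithmetic above, as a tiny helper. The 150 words-per-screen figure is the measurement from this post (Amasis, small font); any other font, size, or device would need its own measurement, and the function name is just illustrative.

```python
def estimated_screen_pages(word_count, words_per_screen=150):
    """Rough on-screen page estimate from a word count.

    150 words per screen is the average measured in this post (Amasis, small
    font); other fonts, sizes, or devices will give different results.
    """
    return round(word_count / words_per_screen)
```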


                  If any of you know how to make the Word docs correctly translate the page numbering, then I'm all ears. But I do like the idea of starting to post the number of words in the synopsis of an e-book so the public knows what they are getting beforehand! [even if we just say "94K words" or "91K words" using my 2 books as examples.] I'm going to make that change to both pages later today!



                  • Re: EPUBs and page numbering

                    I have a Sony reader, and I just bought the Nook Color, which I really like so far. My question: I have a lot of books on my Sony and am trying to get them onto my Nook. I see some EPUBs, but it looks like most of them are BBeB. What the heck is that, and can I convert them or use them on the Nook? Thanks, I hope I made myself clear.

                    • Re: EPUBs and page numbering

                      I've never really cared about page numbers on my Nook.  I don't pay attention to them.


                      But I'm now taking an English lit class online and am being assigned page numbers to read. I'm having trouble getting a breakdown of chapters and am not sure where I am supposed to be. Having an idea of how page numbers match up on my Nook would help, especially given that B&N heavily pushes Nook Study to college students. I'm just returning to college after 20 years, but I certainly remember being assigned page numbers for reading assignments, and in the 2 classes I'm taking now, that is the case.