Doug_Pardee
Posts: 5,522
Kudos: 4,015
Registered: ‎03-09-2010

EPUBs and page numbering

There are frequent questions about the number of pages shown for e-books compared with paper books. This is an interesting side-effect of Adobe's attempt to provide page numbering in an e-book format (EPUB) that doesn't really do pages.

 

The text in an EPUB file basically consists of a collection of HTML files, just like Web pages. Just like on the Web, each text file is one huge page that you scroll down (or up) through. How the text of the e-book is broken up into separate files varies, but commonly each file contains one chapter. It's recommended that no file be over about 1/4 Megabyte, to keep readers from running out of memory from trying to handle a big file.

 

When your reader transitions from one of those "chapter" files to the next, there is a page break. The earlier file might finish near the top of the screen, leaving a bunch of white space. This works very nicely if each file contains a chapter—the new chapter heading appears at the top of the next screen. There also can be a delay while the reader opens up the next file. For this reason, it's not recommended (and rarely done) to put each book page into a separate file.

 

So… a typical EPUB ebook might have a couple dozen "chapter" files in it. One for each chapter, plus a few for pieces like the front cover, front matter, etc. But the number of pages shown is a whole lot more than a couple of dozen, right?

 

Here's the deal: Adobe "synthesizes" the page count and page numbering. Adobe reader software looks at the size of each of those HTML files and counts one page for each 1K (1024) bytes in size, rounded up. So if the file is 114 bytes, that's one page. 1024 bytes, one page. 1025 bytes, two pages. Now, what it's looking at is the actual size of the file inside the EPUB, and an EPUB is just a ZIP file with a specific layout. Since it's a ZIP file, the contained HTML files are compressed. Adobe uses the compressed size, not the actual number of bytes in the original HTML file. Anyway, it adds up all of those page counts and that's the number of "pages" in the e-book.
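For anyone who wants to try the arithmetic, here is a rough Python sketch of the counting rule as described above. It is not Adobe's code. The function names are made up for illustration, content files are picked by file extension (an assumption; a real reader would follow the reading order in the EPUB's package file), and no DRM adjustment is attempted.

```python
import io
import zipfile

PAGE_BYTES = 1024  # one synthesized "page" per 1K of compressed data, per the post


def pages_for_size(compressed_size: int) -> int:
    """Round the compressed size up to a whole number of 1024-byte pages."""
    return -(-compressed_size // PAGE_BYTES)


def synthesized_page_count(epub, extensions=(".html", ".xhtml", ".htm")) -> int:
    """Add up the per-file page counts for the content files in an EPUB (ZIP).

    Assumption: content files are recognized by extension only.
    """
    with zipfile.ZipFile(epub) as zf:
        return sum(
            pages_for_size(info.compress_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(extensions)
        )


# The sizes used in the post: 114, 1024, and 1025 bytes.
print(pages_for_size(114), pages_for_size(1024), pages_for_size(1025))  # 1 1 2

# Demo: build a tiny ZIP in memory with no compression (ZIP_STORED),
# so each file's compressed size is exactly its length.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("mimetype", b"application/epub+zip")  # not a content file
    zf.writestr("ch1.xhtml", b"x" * 114)
    zf.writestr("ch2.xhtml", b"x" * 1024)
    zf.writestr("ch3.xhtml", b"x" * 1025)
buf.seek(0)
print(synthesized_page_count(buf))  # 1 + 1 + 2 = 4
```

Pass a path to a real EPUB instead of `buf` to get a number to compare with what your reader shows. Expect differences if the book has DRM or unusual file extensions.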

 

Within each "chapter" file, the pages are divided up evenly. So a "chapter" file that was 1025 bytes is two pages, right? Let's say that after it's uncompressed, there are 2425 characters in that "chapter" file. The first page will be 1212 characters long, and the second will be 1213 long.
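Here is a small Python sketch of that even split. The rounding rule (each page starts at the rounded-down proportional offset) is my assumption, chosen because it reproduces the 1212/1213 example; the helper name is made up.

```python
def page_lengths(char_count: int, pages: int) -> list[int]:
    """Split char_count characters into `pages` nearly equal pages."""
    # Page i starts at floor(i * char_count / pages). This rounding is an assumption.
    starts = [i * char_count // pages for i in range(pages + 1)]
    return [starts[i + 1] - starts[i] for i in range(pages)]


print(page_lengths(2425, 2))  # [1212, 1213]
```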

 

Notice that because the page count is based on the compressed size, the actual number of characters per page is not really predictable. It also varies between "chapter" files, since the compression ratios may be different. It'll probably be somewhere around 2000 characters, or 350 words, plus or minus a bunch. (The character counts include the HTML markup in addition to the actual visible text.)

 

By the way, if DRM has been applied to the compressed "chapter" file, Adobe will attempt to account for any DRM overhead before calculating the number of "pages" in that file. This way the number of "pages" is (we hope) the same for DRMed and un-DRMed versions of the same e-book.

 

Keep in mind that this is just how Adobe does it. Since NOOK uses Adobe Reader Mobile for EPUBs, and Adobe Digital Editions and the various NOOK apps also use EPUB software from Adobe, they all agree as to "page" numbering. That's Adobe's plan: the page numbers don't mean anything you can put your finger on, but for a given e-book they are constant from one platform to another.

 

P.S. Adobe offered an EPUB extension called a "page map" which allowed specific page numbers to be applied to HTML anchor tags. However, that extension is non-standard EPUB and is generally discouraged. The NCX file inside the EPUB (essentially the table of contents file) also has the ability to tie page numbers to specific locations within the EPUB text, but pretty much no readers pay any attention to that. So for now, Adobe's synthesized "page" numbering is about all we've got for EPUB.

Inspired Bibliophile
LarryOnLI
Posts: 2,001
Registered: ‎01-04-2010
0 Kudos

Re: EPUBs and page numbering

Very interesting - thank you for researching that for us.

 

I had assumed, incorrectly, that the ePub creation software provided a page map.

 

Doug_Pardee
Posts: 5,522
Kudos: 4,015
Registered: ‎03-09-2010

EPUBs and page numbering - summarized

You know, for all of that blathering I did, it'd probably be easier to summarize the results.

 

  • The page numbers that we see in EPUB are not true pages. They're a numbering system invented and used by Adobe.
  • The page numbers are calculated based on data size.
  • The amount of text in EPUB pages varies from page to page, but most pages are somewhere between 300 and 400 words.
  • Each "chapter" file within the EPUB starts on a new page. There is a visible page break at the end of a chapter file when you're reading the e-book. Adobe assigns a new page number to the start of the chapter file.
  • The starting and ending points, and therefore the size, of any given page in any given EPUB e-book won't change as long as the file doesn't change.
  • All e-book rendering software that does Adobe page numbering should calculate the same number of pages and the same page starting and ending points. All NOOK software uses Adobe rendering software, and of course Adobe Digital Editions also does, so they all should agree on page numbering.

 

Distinguished Scribe
Ya_Ya
Posts: 3,334
Registered: ‎09-29-2010
0 Kudos

Re: EPUBs and page numbering - summarized

 

So, if I understand all of the above, any version of  "Ya Ya's Hypothetical Novel" should have the same number of pages, and any version of "Little Women" (my favorite book ever) will also have the same number of pages.  
It is not safe to assume that the proportional difference in pages between YYHN and LW reflects a proportional number of words, however, because the margin of error of that is +/- 30%.
Am I on the right track here, or am I completely confused?  (It's been too many years since my math minor, and, it's the end of a Friday...)

 

Doug_Pardee
Posts: 5,522
Kudos: 4,015
Registered: ‎03-09-2010
0 Kudos

Re: EPUBs and page numbering - summarized

No, the page numbers won't necessarily be the same for different versions. The calculations are on a file-by-file basis. Any copy of the original file should have the same page numbers, but even recreating the file by a Calibre EPUB-to-EPUB conversion can result in a shift.

 

However, for a given text with no changes in the material (from your example, Little Women), the differences should generally be small. Here are some things that contribute to this variability:

 

  1. How the text is broken up into "chapter" files. Each chapter file always starts on a new page.
  2. How much ZIP compression is applied to each chapter file.
  3. How close to a multiple of 1024 each compressed chapter file ends up being.
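A quick Python sketch of those three factors, using the 1K-per-page rule from the first post. The sample text and helper name are made up, and `zlib.compress` only stands in for ZIP compression here (a ZIP stores raw deflate data without the small zlib wrapper), so treat the numbers as illustrative.

```python
import zlib


def pages_for_size(compressed_size: int) -> int:
    """One page per 1024 compressed bytes, rounded up."""
    return -(-compressed_size // 1024)


# Factors 1 and 3: the same 2048 compressed bytes as one file vs. two files.
print(pages_for_size(2048))                         # 2
print(pages_for_size(1000) + pages_for_size(1048))  # 1 + 2 = 3

# Factor 2: the same text with no compression vs. maximum compression.
text = b"It was a dark and stormy night. " * 1000  # 32,000 bytes, very repetitive
stored = zlib.compress(text, 0)  # level 0: data is stored, not compressed
packed = zlib.compress(text, 9)  # level 9: maximum compression
print(pages_for_size(len(stored)), pages_for_size(len(packed)))
```

Real book text is far less repetitive than this sample, so the swing between compression settings is much smaller in practice.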

 

There are some hidden parts of the text itself that affect character-counting, and thus where the pages start and end within a chapter file:

 

  1. The HTML markup that's applied. The HTML tags are counted for the purposes of page numbering.
  2. Multi-byte characters. Some of the most common multi-byte characters are curly quotes (“ and ” instead of "), apostrophes (’ instead of '), dashes (– and —), and the ellipsis (…). These count as one character but compress more like they're two.
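To see the character-versus-byte difference for yourself, here is a short Python sketch. It assumes the chapter file is encoded as UTF-8 and only shows sizes before compression; how the bytes compress is a separate matter. The helper name is made up.

```python
def char_and_byte_counts(ch: str) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for a piece of text."""
    return len(ch), len(ch.encode("utf-8"))


# Plain ASCII punctuation next to its typographic counterparts.
for ch in ['"', "“", "”", "'", "’", "–", "—", "…"]:
    chars, size = char_and_byte_counts(ch)
    print(repr(ch), chars, "character,", size, "byte(s) in UTF-8")
```

The straight quote and apostrophe are one byte each; the curly quotes, dashes, and ellipsis are each one character but three bytes in UTF-8.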

 

Distinguished Scribe
Ya_Ya
Posts: 3,334
Registered: ‎09-29-2010
0 Kudos

Re: EPUBs and page numbering - summarized

At the end of the day, ePub "page numbers" aren't a reliable gauge of how long the book really is?

 

We want a word count.

Wordsmith
wordsandmelodies
Posts: 355
Registered: ‎08-07-2010
0 Kudos

Re: EPUBs and page numbering - summarized

So why don't they start giving us word counts, which OUGHT to be static, when we purchase the book?  No matter what they do, the words can't change.  It should be a pretty reliable indication of book length.  I don't care if they use an artificial means to determine pages remaining in the lower right hand corner.

Distinguished Correspondent
Saint_Nookolas
Posts: 153
Registered: ‎04-22-2010
0 Kudos

Re: EPUBs and page numbering - summarized

OK, Doug.  Here's the question I've been dying to ask someone for several months now: 

 

Due to variable page length displays, how does one properly cite text from an e-book?  

 

Has anyone seen any APA or other professional writing recommendations in this regard?  I don't know if they've caught up with technology yet, so if anyone knows otherwise or has any interim suggestions, I'd be interested in hearing them. 

 

Thanks!

Wordsmith
wordsandmelodies
Posts: 355
Registered: ‎08-07-2010
0 Kudos

Re: EPUBs and page numbering - summarized

Well, being a lazy student, I would have found a paper source, because I know how to cite that, and I had a couple English profs who would have gleefully jumped at the opportunity to mark off for the citations.  Probably each time I cited it "wrong".  *grins*  It's all much more humorous now that it's in the past.  But...  If pagination isn't stable and certain, you have to move to something that is.  Chapters and paragraphs are immutable, and I think you would need to reference them.  But I would still expect markings on my paper to the effect of "You couldn't find a paper source?" because that would make citations and references so much easier.

Doug_Pardee
Posts: 5,522
Kudos: 4,015
Registered: ‎03-09-2010
0 Kudos

Citing e-books

Saint_Nookolas wrote:

 

Due to variable page length displays, how does one properly cite text from an e-book?  

 

Has anyone seen any APA or other professional writing recommendations in this regard?


As far as I can tell, APA's recommendation is to cite chapter (or heading) and paragraph numbers in the in-line citation.

 

Their example seems to be for a web page and uses a heading rather than a chapter number:

According to Smith (1997), ... (Mind over Matter section, para. 6).

 

APA's recommendations for electronic books in the reference list are, in decreasing order of preference:

  • cite a print version instead (if practical),
  • cite the "retrieved from" URL if the e-book is free online, or
  • cite the "available from" URL otherwise.

Their examples:

De Huff, E. W. (n.d.). Taytay’s tales: Traditional Pueblo Indian tales. Retrieved from http:...

 

Davis, J. (n.d.). Familiar birdsongs of the Northwest. Available from http:...

If there is a (difficult to obtain) associated print work, cite the publication date of the print work as the date of the e-book. [I think APA's date citation rules for e-books are totally inappropriate, but the APA didn't ask me.]

 

 

The above info was extracted from Purdue University's OWL web site. Look there for specific wording.

 

 

I've seen a number of discussions on citing electronic works, including web sites and e-books, but I don't think any kind of consensus has been achieved. Most e-books have chapter numbers, and those are unambiguous. A more specific location would probably have to be specified by searchable text, which is automatically provided in the case of a direct quote.

 

But there's a more fundamental problem: versioning. There's no general way to be assured that the version of the online document being retrieved is the same as the one that was cited. A related problem is that there's no general way to assure that a document will remain available at all, much less at the cited URL.

 

AlanNJ
Posts: 3,722
Topics: 64
Kudos: 1,518
Solutions: 0
Registered: ‎03-09-2010
0 Kudos

Re: Citing e-books

Sure.  Let's get a word count.  Now, how do we differentiate between 2-letter words and 10-letter words?  I think we should count letters to be truly exact.  And let's not forget punctuation marks while we're at it!

►Without order there is chaos◄
Distinguished Correspondent
Saint_Nookolas
Posts: 153
Registered: ‎04-22-2010
0 Kudos

Re: Citing e-books

Thanks for your research, Doug.  

 

It seems we're in agreement about the limitations of APA's recommendations. In the past, I have cited by URL availability for electronic content hosted on-line, but this method didn't seem suitable for actual e-books purchased/downloaded.  

 

DTB print would be most suitable for this purpose, of course, but many of the more obscure works can only be found electronically.  Standardization of pagination then becomes an issue across formats and versions.  Especially with multiple sites/projects/sellers offering various scans of the same work.

 

I'm inclined to concur with citing by chapter/paragraph, but frankly, I think this would be a terribly cumbersome process with an e-reader.  I was just thinking out loud to see if anyone else had any better suggestions.  To my knowledge, there does not seem to be a perfect solution (perhaps that is why APA has not yet addressed this specific issue).

 

Thanks again for your input!

Distinguished Scribe
Ya_Ya
Posts: 3,334
Registered: ‎09-29-2010
0 Kudos

Re: Citing e-books

 


AlanNJ wrote:

Sure.  Let's get a word count.  Now, how do we differentiate between 2-letter words and 10-letter words?  I think we should count letters to be truly exact.  And let's not forget punctuation marks while we're at it!


 

Way back when I took typing in junior high a "word" was 5 characters.  I assume that is still true and that that is how the word count would be/is calculated.

 

I really don't think it's unreasonable to want to know the approximate length of media you are considering purchasing.  Is it a dealbreaker for me?  No, but I, personally, like to know whether I'm buying The Long Walk or LOTR.

AlanNJ
Posts: 3,722
Topics: 64
Kudos: 1,518
Solutions: 0
Registered: ‎03-09-2010
0 Kudos

Re: Citing e-books

To me the value of a book is in the quality not necessarily the quantity of the words.

►Without order there is chaos◄
Wordsmith
wordsandmelodies
Posts: 355
Registered: ‎08-07-2010
0 Kudos

Re: Citing e-books

Lol, I understand what you're saying, AlanNJ, but I'm not sure that I'd pay $10 or $12 for a short story.   It could be a fabulous short story, but I'd still be trying to find it elsewhere.  I did, however, pay $14 for the LOTR trilogy all in one volume.  It was more than I would normally pay for an ebook (or any book), but since it was essentially 3 books in one, I thought "What a deal!"

AlanNJ
Posts: 3,722
Topics: 64
Kudos: 1,518
Solutions: 0
Registered: ‎03-09-2010
0 Kudos

Re: Citing e-books

I absolutely agree.  Quite honestly I cringe at paying more than $9.99 for a full e-book and I'm certainly not going to pay that for a short story no matter how good it is.  But all things created somewhat equal I'm not going to let word-count affect my decision too much either.  I don't want to see publishers getting the idea of charging based on word-count.  It's bad enough what they've done to e-books themselves.  Of course that's why I borrow e-books from the library.

►Without order there is chaos◄
Contributor
RobtDWilson
Posts: 19
Registered: ‎10-08-2010
0 Kudos

Re: EPUBs and page numbering

I recently used PubIt! to publish my first two novels. The instructions said I could just upload Word files and that it would accept them. I found out that if I wanted page breaks for each new chapter I had to insert a section break at the end of the previous one.

 

I shared my books on the forum and was surprised when a reader "complained" about how short the book was. I attempted to view the book myself on my own Nook and found that the page count was extremely low. I had to click the page advance 8 times before the counter at the bottom actually rose to 2/242.  Since "242" seemed a very low page count, I figured I needed a better yardstick to measure how many "Nook pages" I actually had.

 

This solution directly deals with the "word count" issue mentioned earlier and satisfies the need of balancing long and short words. I counted the number of words per page [using Amasis small font] and found an average of 150. Then I took the total word count from the original Word documents: 94,378 / 150 = 629 Nook pages and 90,863 / 150 = 606 Nook pages.
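The same estimate as a short Python sketch, for anyone who wants to plug in their own numbers. The function name is made up, and the 150 words per screen is just the figure counted above for one font setting.

```python
def estimated_pages(word_count: int, words_per_page: int) -> int:
    """Estimate a page count by dividing total words by words per page."""
    return round(word_count / words_per_page)


print(estimated_pages(94_378, 150))  # 629
print(estimated_pages(90_863, 150))  # 606
```

Changing the font face or size changes `words_per_page`, so the result is only good for one display setting.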

 

If any of you know how to make the Word docs correctly translate the page numbering, then I'm all ears. But I do like the idea of starting to post the number of words in the synopsis of an e-book so the public knows what they are getting beforehand! [even if we just say "94K words" or "91K words" using my 2 books as examples.] I'm going to make that change to both pages later today!

 

Blessings,

Doug_Pardee
Posts: 5,522
Kudos: 4,015
Registered: ‎03-09-2010
0 Kudos

Re: EPUBs and page numbering


RobtDWilson wrote:

 

If any of you know how to make the Word docs correctly translate the page numbering, then I'm all ears.


As noted in the first posting in this thread, EPUB e-books—the only kind that PubIt! currently produces—don't have real pages. The page numbering is synthesized by Adobe EPUB-reading software and ends up with somewhere around 300-400 words per "page". That's pretty much in line with typical hardcovers.

 

Because of their smaller page size, mass-market paperbacks need more pages for the same number of words. I just checked a random page in the mass-market paperback copy of Bones and Ashes that I've got sitting on my desk, and it comes in at around 270 real words per page. Here on B&N, the hardcover of The Lost Symbol is listed at 528 pages and the mass-market paperback is at 656 pages, or about 25% more.

 

Let's see... The Girl with the Dragon Tattoo is listed at 465 pages in hardcover, 590 pages in trade paperback (+27%), and 672 pages in mass-market paperback (+45%). The EPUB version comes in at 421 "pages". Again, not far from the hardcover number (9% fewer pages).
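If you want to run the same comparison for other titles, here is the percentage arithmetic as a small Python sketch, using the listed page counts as inputs. The function name is made up and results are rounded to the nearest whole percent.

```python
def percent_vs_baseline(pages: int, baseline: int) -> int:
    """Percentage difference from the baseline page count, rounded to a whole percent."""
    return round((pages - baseline) / baseline * 100)


# The Lost Symbol: mass-market paperback vs. hardcover
print(percent_vs_baseline(656, 528))  # 24

# The Girl with the Dragon Tattoo, each edition vs. the 465-page hardcover
print(percent_vs_baseline(590, 465))  # 27  (trade paperback)
print(percent_vs_baseline(672, 465))  # 45  (mass-market paperback)
print(percent_vs_baseline(421, 465))  # -9  (EPUB "pages")
```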

 

Counting "NOOK pages" is probably a pointless exercise. There are many different combinations of font face and size, and each will give a different answer. Plus, B&N e-books aren't just for NOOKs. They're read on PCs, Macs, iPads, various cellphones, the iPod Touch, and other e-readers like the Pandigital Novel.

 

Contributor
RobtDWilson
Posts: 19
Registered: ‎10-08-2010
0 Kudos

Re: EPUBs and page numbering

Doug,

 

I agree that "Nook Pages" are useless. So B&N PubIt! ought to add a "Word Count" box to the book setup process and include the count on the Product Page for each of the books that use that system. I have added the count to both of my pages.

Bibliophile
bklvr896
Posts: 4,812
Registered: ‎12-31-2009
0 Kudos

Re: EPUBs and page numbering

Maybe it's just me, but the number of words doesn't really tell me anything in most cases.  I have no idea how many words are in a 200-, 400-, or 600-page paperback or HC book, so I have no frame of reference. About the only time this might mean something to me is if it said 500 words or some other ridiculously small amount; then I'd get the idea it was a very short story. But once you get up into the thousands of words, I'm clueless.  Now maybe if I knew how thick the paperback version was, you know, 1/2", 1", 3" thick, that might help me. :-D