4 Replies Latest reply on Apr 1, 2011 7:56 AM by froggles

    Houghton Mifflin to attack e-book scannos - with a script?


      Author Diane Duane reported today that her publisher, Houghton Mifflin Harcourt, will be working to remove the OCR scan bugs in her "Young Wizards" e-books. What strikes me, though, is that they're not going to use proofreaders. Instead, they're going to create a computer script to look for the most common scan errors.


      They figure this should only take them two months. :P


      Okay, I get that Duane's e-books aren't the only ones Houghton will have to fix, and that Houghton is on shaky financial ground. Still... I maintain that automated error scanning is a good aid to human proofreading, but by itself it is unacceptable for professional publication.


        • Re: Houghton Mifflin to attack e-book scannos - with a script?

          Watt r u cumplanein ubout Dug? :D

          • Re: Houghton Mifflin to attack e-book scannos - with a script?

            Did the article say that this would be their only proofreading, or is it just going to be their first proofread (with the second proofread being done by a human)?

              • Re: Houghton Mifflin to attack e-book scannos - with a script?

                The number of "edge cases" any such software would have to cover makes a script-only approach horrendously impractical. I've read books where the first letter in a chapter (or every paragraph, for that matter) is set differently; where the first word, line, sentence, or paragraph is bolded or italicized; where specific characters' dialogue is set in a different font; where dialogue has deliberate misspellings; and so on. Trying to write a script that recognizes and deals with all the variety in the wide world of books is a recipe for a never-ending project.


                Some day there may be a powerful enough set of software tools out there to quickly recognize all the potential trouble areas, flag them for a human reviewer, and offer an easy set of fixes to choose from, which are then applied consistently... but that day is not coming any time soon (certainly not in two months!).

                  • Re: Houghton Mifflin to attack e-book scannos - with a script?

                    I've got old OCR software at work that we've used on scans of poor-quality photocopies (an author wants to revise her work but has no e-copy of the article because it was written in another era). The software picks up everything, and I can choose to ignore each item or make a correction. Some of these texts pose a lot of technical challenges, but working together, human and software can make very rapid progress. Alone, the software's ideas of how to correct some items are laughable. Alone, the human reads right past things because he or she knows how they should read. Together, we're not such a bad team! I really can't imagine approaching the task any other way.
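For anyone curious what the "software flags, human decides" workflow described in this thread might look like, here is a minimal Python sketch. It is not Houghton's script or any real OCR product's behavior. The `COMMON_SCANNOS` table, the `Flag` record, and the `find_scannos` / `apply_decisions` helpers are all hypothetical names invented for illustration, and the four table entries are only assumed examples of typical OCR confusions.

```python
import re
from dataclasses import dataclass

# Hypothetical table of OCR confusions: misread form -> likely intended word.
# Note that "arid" and "modem" are real words, so a blind replace would be wrong.
COMMON_SCANNOS = {
    "tbe": "the",
    "arid": "and",
    "modem": "modern",
    "bom": "born",
}


@dataclass
class Flag:
    line_no: int     # 1-based line number in the text
    start: int       # character offset of the match within that line
    found: str       # the text as scanned
    suggestion: str  # proposed replacement, for the reviewer to accept or reject


def find_scannos(text, table=COMMON_SCANNOS):
    """Return a Flag for every whole-word match of a table entry. Changes nothing."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in table) + r")\b", re.IGNORECASE
    )
    flags = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in pattern.finditer(line):
            found = m.group(0)
            suggestion = table[found.lower()]
            if found[0].isupper():
                # Keep the capitalization of the scanned word.
                suggestion = suggestion.capitalize()
            flags.append(Flag(line_no, m.start(), found, suggestion))
    return flags


def apply_decisions(text, flags, accepted):
    """Apply only the flags whose indexes the human reviewer accepted."""
    lines = text.splitlines()
    # Work right to left so earlier offsets on the same line stay valid.
    order = sorted(
        accepted, key=lambda i: (flags[i].line_no, flags[i].start), reverse=True
    )
    for i in order:
        f = flags[i]
        line = lines[f.line_no - 1]
        lines[f.line_no - 1] = (
            line[: f.start] + f.suggestion + line[f.start + len(f.found):]
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Tbe wizard crossed the arid plain\narid then she opened tbe book."
    flags = find_scannos(sample)
    for i, f in enumerate(flags):
        print(i, f)
    # The reviewer rejects flag 1 ("arid plain" is correct) and accepts the rest.
    print(apply_decisions(sample, flags, accepted={0, 2, 3}))
```

The point of the split is that `find_scannos` never edits the text. It only produces candidates, and `apply_decisions` changes nothing the reviewer did not approve. In the sample, the script alone would turn "the arid plain" into "the and plain", which is exactly the kind of laughable correction mentioned above.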