An interesting interview with a compulsive book uploader and downloader. It has plenty of detail, but I found the following exchange most striking:
TM: How long does it take you to scan a physical book?
TRC: The scanning process takes about 1 hour per 100 scans. Mass market paperbacks can be scanned two pages at a time flat on the scanner bed, while large trades and hardcovers usually need to be scanned one page at a time. I’m sure that some of the more hardcore scanners disassemble the book and run it through an automatic feeder or something, but I prefer the manual approach because I’d like to save the book, and don’t want to invest in the tools. Usually I can scan a book while watching a movie or two.
Once scanned, the output needs to be OCR’d – this is a fairly quick process using a tool like ABBYY FineReader.
The final step is the longest and most grueling. I’ve spent anywhere from 5 to 40 hours proofing the OCR output, depending on the size of the book and the quality of type in the original. This can be done in your OCR tool side-by-side with the scan of the original image or separately in your final output type (RTF, DOC, HTML, etc.). If there are few errors on the first few pages of text, my preference is to proof in RTF; otherwise I do the proof within FineReader itself.
That’s a lot of time.
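To put rough numbers on it, here is a back-of-the-envelope sketch using only the figures quoted above: about 1 hour per 100 scans, one or two pages per scan, and 5 to 40 hours of proofing. The 400-page book length is my own assumption, as is pairing the low proofing figure with the paperback and the high one with the hardcover. OCR time is left out, since the interview only calls it "fairly quick".

```python
import math

# From the interview: roughly 1 hour per 100 scans.
SCANS_PER_HOUR = 100


def scanning_hours(pages, pages_per_scan):
    """Hours spent at the scanner for a book of the given length."""
    scans = math.ceil(pages / pages_per_scan)
    return scans / SCANS_PER_HOUR


def total_hours(pages, pages_per_scan, proofing_hours):
    """Scanning plus proofing; OCR time is ignored (described as quick)."""
    return scanning_hours(pages, pages_per_scan) + proofing_hours


# Hypothetical 400-page book (the page count is an assumption, not from the interview).
paperback_low = total_hours(400, pages_per_scan=2, proofing_hours=5)
hardcover_high = total_hours(400, pages_per_scan=1, proofing_hours=40)

print(f"Best case:  {paperback_low:.0f} hours")
print(f"Worst case: {hardcover_high:.0f} hours")
```

Under these assumptions the total runs from about 7 hours to about 44 hours per book, and in both cases proofing, not scanning, accounts for most of it.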