Brian Lavoie and Lorcan Dempsey's article "Beyond 1923: Characteristics of Potentially In-copyright Print Books in Library Collections" (D-Lib Magazine, November/December 2009) tries to give some tentative answers about the shape of the elephant. From the introduction:
The analysis that follows examines the characteristics of US-published print books, with an emphasis on books that are likely in copyright according to US copyright law. As with our earlier article, the analysis is based on data from the WorldCat database, which represents the aggregated collections of more than 70,000 libraries worldwide. The analysis focuses on three areas: the WorldCat aggregate collection of US-published print books; the subset of this collection published during or after 1923 - i.e., those potentially associated with copyright and/or orphan works issues; and the combined print book collection of three academic research library participants in Google Books - again, with an emphasis on materials that are potentially in copyright.
Lots of detailed tables on dates, authors, genres, and audience level follow. From the conclusion:
This article characterizes the aggregate collection of US-published print books in WorldCat, with a special emphasis on materials published during or after 1923, and therefore either potentially or definitely in copyright. Findings from the analysis indicate that the collection of US-published print books in WorldCat is quite large, encompassing about 15.5 million print books. Nearly two-thirds of these - those published after 1963 - have a high likelihood of being in copyright; less than 15 percent - those published prior to 1923 - are almost certainly in the public domain, with the rest - those published between 1923 and 1963 - potentially in copyright if copyright was renewed. The post-1923 materials collectively account for more than 80 percent, or about 12.6 million, of the US-published print books in WorldCat. It is difficult to predict how many of these print books might be orphan works, but even a small fraction would, in terms of absolute numbers, be considerable, and require a substantial effort to investigate and clear copyright. One study, based on an examination of a random sample of books, estimates a cost of approximately $200 for each title for which digitization and access permissions were obtained.
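To get a feel for the "even a small fraction would be considerable" point, here is a rough back-of-the-envelope sketch using the figures quoted above (about 15.5 million US-published print books, about 12.6 million of them published during or after 1923, roughly $200 per cleared title). The orphan-work percentages are purely hypothetical placeholders, since the article says the real share is hard to predict, and applying the $200 figure to every candidate title is a simplifying assumption of mine, not a claim from the study.

```python
# Figures quoted from the Lavoie and Dempsey conclusion (approximate).
TOTAL_US_PRINT_BOOKS = 15_500_000
POST_1923_BOOKS = 12_600_000
COST_PER_TITLE_USD = 200  # per-title clearance estimate cited in the article


def post_1923_share() -> float:
    """Share of the US-published print books dated 1923 or later."""
    return POST_1923_BOOKS / TOTAL_US_PRINT_BOOKS


def clearance_estimate(orphan_percent: int) -> tuple[int, int]:
    """Return (titles, cost in USD) for a HYPOTHETICAL orphan-work share.

    orphan_percent is an assumed whole-number percentage of the post-1923
    books; it is not a figure from the article.
    """
    titles = POST_1923_BOOKS * orphan_percent // 100
    return titles, titles * COST_PER_TITLE_USD


if __name__ == "__main__":
    print(f"post-1923 share: {post_1923_share():.1%}")
    for pct in (1, 5, 10):  # hypothetical scenarios only
        titles, cost = clearance_estimate(pct)
        print(f"{pct:>2}% orphaned: {titles:,} titles, about ${cost:,}")
```

Under these assumptions, even a 1 percent share works out to 126,000 titles and about $25 million in clearance effort, which is the scale the authors are pointing at.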