This week, we take a closer look at the HathiTrust Digital Library. This collection is likely the most oriented towards academic researchers, largely because it was the product of 13 universities that made up the Committee on Institutional Cooperation (renamed the Big Ten Academic Alliance last year) and the University of California.
The Trust began in 2008, growing out of the digitization of “orphan books” that the Google Books Library Project started in 2004. It now consists of a partnership of 60 research libraries located in Canada, Europe and the U.S. (see www.hathitrust.org/community). The University of Michigan currently provides the infrastructure on which the digital content resides. The collection includes 15 million volumes, of which about half are books. Of those 7.5 million books, 5.8 million are in the public domain.
A fairly detailed piece on this collection appeared in 2015 as a blog post on “The Scholarly Kitchen,” the official blog of the Society for Scholarly Publishing.
I imagine many NSR readers are familiar with the multi-year copyright battle that ensued from Google’s activities. The Authors Guild sued Google in 2005, and the case was not decided until 2013 (in Google’s favor). That decision was appealed and affirmed, again in Google’s favor, in 2015 (see the Wikipedia entry on the lawsuit). Along the way, in 2011, the Guild also sued HathiTrust for massive copyright infringement. Because the Trust began as a spin-off of Google Books, both cases pitted allegations of copyright infringement against the doctrine of fair use. The suit against the Trust was decided (in its favor) in 2014.
Searching is intuitive and straightforward, as might be expected of an e-library designed by librarians. A useful feature for librarians, and others oriented to classification, is Zephir, a bibliographic metadata management system the Library created in 2013. This data export system is available to libraries that partner with the Trust (i.e., pay a fee). The details of bib metadata sharing (which, having dipped my toe into tech services along the way, I found interesting) are at https://www.hathitrust.org/zephir.
The ability to download items in the public domain is restricted to those affiliated with partnering institutions (i.e., students, faculty, and staff). The site states:
Users affiliated with HathiTrust partner institutions are able to download full-PDFs of all public domain works, and works made available in under Creative Commons licenses. Users who are not affiliated with HathiTrust partner institutions can download single-page PDFs of all public domain works, full-PDFs of works made available under Creative Commons licenses, and full-PDFs of public domain works that are not subject to third-party agreements (see “Why isn’t full-PDF download publicly available for all viewable items” here). There is significant overlap of volumes in HathiTrust and Google Book Search and if a book is “full view” in HathiTrust, it is possible that a PDF of the entire book can be downloaded from Google Book Search. Note that logging in through a Friend account does not enable full-PDF download of Google-digitized materials.
In essence, ‘third-party agreements’ limit the ability to download items, either as PDFs or in EPUB format, if one is not affiliated with a partner university. However, among participating institutions, downloading eligible material is straightforward.
The mission of HathiTrust is to contribute to both the preservation and sharing of research and scholarship. In this they have abundantly succeeded; the Trust is now a major source for providing scholarship worldwide. Oh, one last thing: As a reference librarian, I couldn’t let this question go by: What does “hathi” mean? Glad you asked: It’s Hindi for ‘elephant’, and was chosen because of the animal’s prodigious memory. ‘Hathi’ is also the name Kipling gave his elephant character in The Jungle Book (1894) and The Second Jungle Book (1895).
See more of Ari Sigal’s Free Content Alerts here.