The Walt Whitman Archive – Greene 2
By
Mark A. Greene
January 2011
2. What are its weaknesses? What do you wish it would let you do? What changes would you suggest?
Mark A. Greene
Director, American Heritage Center – University of Wyoming
Past President – Society of American Archivists
The weaknesses I have noticed in the Archive are not unique to this site. One problem is the site’s misunderstanding, and thus its improper assertion, of copyrights.[ref]I am a founding member of the Society of American Archivists’ Intellectual Property Working Group.[/ref] Specifically, the site asserts the existence of copyright in the transcripts and facsimiles and, more broadly, in Whitman’s writings. Because transcripts and facsimiles are “slavish copies,” neither is entitled to copyright protection;[ref]The term is from the decision of the U.S. District Court for the Southern District of New York in Bridgeman Art Library Ltd v. Corel Corporation, 1998.[/ref] and, published or unpublished, anything written by Whitman is in the public domain.[ref]See “Conditions of Use,” TWWA. Unpublished works are in the public domain for authors deceased prior to 1940, as are all works published prior to 1923. For an authoritative source, see Peter Hirtle, “Copyright Term and the Public Domain in the United States: 1 January 2010,” online at http://copyright.cornell.edu/resources/publicdomain.cfm.[/ref] Only original contributions of editors are copyrightable at this late date.[ref]Such mistakes are not unique to The Walt Whitman Archive, certainly. Copyright statements for the In the Valley of the Shadow Civil War project (http://valley.lib.virginia.edu/VoS/usingvalley/copyright.html) and the Journals of the Lewis and Clark Expedition site (http://lewisandclarkjournals.unl.edu/read/?_xmlsrc=lc.privacy.xml&_xslsrc=LCstyles.xsl) are much more egregious.[/ref] What distresses me about the inaccuracies is that they suggest there are limits to use of the main site content by researchers, notwithstanding frequent assertions that the purpose of the archive is “to make Whitman’s vast work electronically accessible” to all.[ref]Katherine L. Walter and Kenneth M. Price, “An Online Guide to Walt Whitman’s Dispersed Manuscripts,” TWWA (2004).[/ref] Making the archive accessible but incorrectly claiming limitations on actual use is paradoxical at best.
There are other issues causing me unease, for example the inadequacy of site use data. The site does not regularly report use statistics, and what statistics it does provide (average daily hits in 2007[ref]Ed Folsom, “Database as Genre: The Epic Transformation of Archives,” TWWA (2007).[/ref]) are of dubious value even when current.[ref]Google Analytics: “the number of hits a website receives is not a valid popularity gauge, but rather is an indication of server use and loading” (http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=33026). Better measures are unique visitors or page views, coupled with visits’ duration.[/ref] In addition, there is little comparable data from similar websites (though to their credit one of the site editors did gather some parallel data in 2008).[ref]The data, from the Willa Cather Archive and the Journals of the Lewis and Clark Expedition, is presented in Kenneth M. Price, “Electronic Scholarly Editions,” TWWA (2008).[/ref] Thus this is an endemic weakness of major “archive” sites, a weakness related to a broader lack of interest in the economics and sustainability of such sites.[ref]And of bricks-and-mortar archival operations, though this is beginning to change, as suggested by the questions “Does the level of onsite use of special collections justify the resources being expended? What are the most appropriate measures by which to evaluate use?” in Jackie M. Dooley and Katherine Luce, Taking Our Pulse: The OCLC Research Survey of Special Collections and Archives (OCLC, 2010), 34-36, online at http://www.oclc.org/research/publications/library/2010/2010-11.pdf.[/ref] The greater viability of websites such as The Walt Whitman Archive is ultimately the $64,000 question and the focus of my essay for Part II of this roundtable.
I want to make sure I understand where this concern over site use data is coming from. Is your point that “expensive” sites like The Whitman Archive (expensive to its creators, not its users) need to justify their cost by foregrounding how much use they are getting? In your footnote, you quote Dooley and Luce, “Does the level of onsite use of special collections justify the resources being expended? What are the most appropriate measures by which to evaluate use?” which leads me to believe that this, indeed, is your point. It’s not entirely clear to me, though, that this is your point from the text of paragraph 3 itself.
Whether expensive or inexpensive, I do believe that repositories, analog or digital, must begin to address cost/benefit questions and make some form of such information transparent to their constituents. That is, to respond more directly to Ed’s comment, I do believe archives must begin to “foreground” their use and their costs.
I’d like you to expand on this point a little more. Is the point you’re making that sites like The Whitman Archive have no claim to copyright? That they shouldn’t expect users to properly credit the Archive when they use it to cite a bit of Whitman’s poetry that they found there? I’d like to see a distinction between (or a clarification of) the legal claim that a site like The Whitman Archive has to copyright its intellectual labor and the ethical claim that the Archive has to ask users to be honest in their scholarship, for example, that when they’re citing a passage from the 1855 Leaves of Grass it’s because they read the Folsom and Price electronic edition rather than the actual edition in a library’s special collections. Does that make sense? I’ve always read the “Conditions of Use” page on The Whitman Archive as a request for ethical citation practices rather than as a legal claim to copyright, but that’s probably because I’m not well versed in copyright law. Either way, I think there’s a distinction to be drawn between the legalities of using the material on the Archive and the ethics of it.
My concern is solely with the legal parameters of copyright, not with the ethical/moral requirements for accurate source citation, and I apologize for any confusion I’ve created on that point. Let me be as clear as possible that researchers have a strict obligation to provide citations to the Archive copy of published or unpublished work, whether in or out of copyright. Such a citation should, indeed, include not only the “place” where the copy was accessed (i.e., the Archive), but also, when applicable, the repository where the physical original resides. This is little different from the expectations for citing microfilmed copies, though many of today’s researchers would be too young to have learned those expectations.
Actually the editors of the Whitman Archive don’t see the copyright issues in the way you have said we do. Our first frequently asked question in the “About the Archive” section is:
>Can I have permission to quote a Whitman poem found on your site?
>Whitman’s writings are out of copyright. So, yes, you can quote his words without seeking permission from anyone.
>Editorial introductions, annotations, the biography, and the criticism found on this site are under copyright. You can certainly quote from these items within the “fair use” limits of copyright law. If you wish to make more extensive use of these materials, contact the directors, Ed Folsom <Ed-Folsom@uiowa.edu> or Ken Price <kprice@unlnotes.unl.edu>.
We are not experts in copyright law, and it is possible that we have articulated our position inadequately elsewhere on the site, in particular in the “Conditions of Use” section. Specific pointers for better language would be welcome. With regard to images (of, say, poetry manuscripts), we have had to adhere to the wishes of the repositories owning the originals. Some of them have been quite insistent that in giving us the right to publish an image they are not giving us the right to pass a high resolution image on to other people.
With regard to the transcription of Whitman’s words: we make no claim to copyright on the HTML rendering of Whitman’s words as displayed on the site. Underlying the HTML, as you know, is XML, and that XML is a combination of transcription and interpretive markup. We do regard the XML markup as our intellectual labor and claim ownership of it. Even as we claim the rights to that markup, we distribute it via a Creative Commons license and invite scholars and other interested users to download the markup and change it or improve upon it as they see fit. They are only required to note the original source of their work.
Ken, I must apologize for having overlooked the FAQ and stand corrected on that most important point. Your generosity in placing the XML under a Creative Commons license is something other sites should be encouraged to emulate. –Mark
The editors of the Whitman Archive would love to hear you elaborate on the point you are making here. When we call up site statistics during our annual planning meeting, people are often intrigued by the distribution of users and by spikes in interest at various times. No one has suggested that we make these site statistics public and perhaps doing so might be regarded as a kind of vanity. Nonetheless, I can also see why it might be useful for people wanting to study the impact of electronic editions. One of the issues I’d like to get clarification on is the implication that we have an obligation to provide this information. If so, where does that obligation come from? Do digital projects have an obligation that print publishers do not? For example, if Yale University Press publishes a book in my field, they don’t ordinarily let me know if they have readers in Singapore or if they sell 8, 80, or 80,000 copies.
I’m not sure where to start here. In part the “obligation” comes from the editors’ own expressed concerns about the sustainability of the site. If sustainability is a valid concern, then both costs and benefits must be considered. While the editors of this and other sites might wish to do such considering in private, it seems to me that when a project is sustained by large amounts of public funds, grants and other, there is an obligation to be accountable to the public as well. As to whether digital projects have an obligation that print publishers don’t, I would suggest that the unwillingness or inability to confront such questions has more than a little to do with the demise of so many academic presses in the last 20 years: they were economically unsustainable. Yale’s press is private, and has little obligation to be accountable publicly; I feel differently about U Kansas Press and other publicly funded publishers. If they are selling only 8 copies of some of their books, they’re making poor choices in the allocation of public funds. Yes, I do believe there is an increasing moral obligation to be transparent about the deployment of such funds. But no, I don’t fully walk the walk either. My own repository does publish a budget every year, but we do not yet publish our site statistics, partly because the stats produced by our university’s preferred program are not very useful and we have not yet deployed a different program. We do, however, publish many of our other “production” and “use” statistics: how many collections and cubic feet we process, how many researchers we serve, etc. Website use is merely one more metric.
I wouldn’t say that stats have to be public in the sense of available via the website, but I would say that you should keep traffic stats for a couple of years if possible. We found when we were doing research on use of DH resources for the LAIRAH project that it could be very hard to get access to log data. So we recommended that projects should keep raw logs if possible or, failing that, traffic reports, so that they can be studied if researchers request them. I think that this answers the question of being accountable for public funding, without placing too huge a burden on projects themselves.