Scholarly Citation in a Digital World: Some Thoughts

cyocum on 2009-02-10T20:35:06

Amazon has released the newest version of the Kindle. This event has caused me to re-evaluate the relationship between Humanities scholarship and its most basic and technical part, to wit, citation. Citation is the bedrock upon which scholars in the Humanities (and the Sciences) build and comment on each other's work in publication. It is also the source of much of scholarship's tedium. While thinking about the Kindle and the way in which the system works (it seems that the inner format is a limited form of HTML), I came to the conclusion that citation will become a great battleground in the future of scholarship. The fundamental problem is, simply stated, this: HTML does not guarantee the placement of a particular piece of text anywhere in the document or on the screen.

The foundation of citation is the page number or, more broadly, the concept of the page. This is, of course, taken from the idea of a book. It does not, however, hold within the realm of markup languages. The renderer of a marked-up document is generally allowed great freedom in presenting information on the computer screen. Because this freedom breaks down the idea of a page, and because the Kindle and other ebook readers do not have a standard page size, the technicalities of citation in scholarship will take greater precedence than before.

To forestall much criticism: the idea of the document fragment, as found in the SGML and HTML standards, is not a specific enough tool for this purpose. This is especially true of handcrafted HTML created by a not so technically inclined scholar, who might forget to put the appropriate anchor tags in the proper places in the text. In XHTML, this might be overcome by using the "id" attribute and XPath to create a new form of link that would allow a reader to link directly to a specific paragraph by its "id" attribute. In more general terms, the XML standard XLink, which is sadly not widely implemented, might allow this. As for PDF, since it is an electronic facsimile of a book, the concept of the page is still useful in citation and does not cause much difficulty in this regard.
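As a rough illustration of the "id" plus XPath idea, here is a minimal sketch in Python using the standard library's ElementTree, which supports a limited subset of XPath. The sample document, the ids "p1" and "p2", and the function name resolve_citation are all made up for the example. It assumes the author actually gave every paragraph an "id", which is exactly the part a hurried scholar might forget.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML document in which every paragraph carries an "id".
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p id="p1">Citation is the bedrock of scholarship.</p>
    <p id="p2">The page is a property of the book, not of the text.</p>
  </body>
</html>"""


def resolve_citation(document: str, fragment_id: str) -> str:
    """Return the text of the element whose "id" equals fragment_id."""
    root = ET.fromstring(document)
    # ".//*[@id='...']" matches any descendant element with that id,
    # whatever its tag or namespace.
    node = root.find(f".//*[@id='{fragment_id}']")
    if node is None:
        raise KeyError(f"no element with id {fragment_id!r}")
    return "".join(node.itertext()).strip()


print(resolve_citation(XHTML, "p2"))
```

A citation could then name the document plus "p2" rather than a page number, and it would resolve to the same paragraph however the text is rendered.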

Whither then the citation? While some citation guides have added electronic citation to their standards, they tend to use full URLs and the date accessed (see MLA for an example). This is inadequate when many in the developed world are moving to devices like the Kindle, where the concept of a page becomes much more nebulous. This also affects the citation of electronic resources like arxiv.org, which I feel is the future model for online scholarly journals. While PDF with its page numbers may be satisfactory in the PDF realm, citing these resources will become difficult when scholarly publishers move to more fluid models of text presentation and digital-only publication. One way around this would be the ubiquitous use of DOIs, but this does not solve the underlying issue of how to cite specific parts of the text. In the end, there are no easy answers, but a discussion must take place and a forum must be created where solutions can be proposed.

I would be very happy to hear any ideas about how one might solve this puzzle.


fixed width

gizmo_mathboy on 2009-02-10T20:44:12

I think you might look at how laws are dealt with in a digital world.

I think they are specified with a fixed font/format/line width.

While not citation-based, laws have had to deal with changes in the law and how to track them without the aid of electronic tools (diff/patch/svn/etc).

Now, is this the best way to handle a pile of bits that can be rendered any possible way? Probably not, but you can easily specify a font type, font size and page width with CSS. This should give a repeatable layout that you can then use to cite line 247, word 15, etc.
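A minimal sketch of the line/word idea, assuming the "layout" is nothing more than plain text wrapped at an agreed character width (Python's textwrap stands in for a real renderer with a fixed font and page width). The function name locate and the widths 20 and 30 are made up for the example.

```python
import textwrap

TEXT = ("The foundation of citation is the page number or, even, "
        "the concept of the page.")


def locate(text: str, line_no: int, word_no: int, width: int) -> str:
    """Return the word at (line_no, word_no), both 1-based, after wrapping
    text to a fixed width in characters."""
    lines = textwrap.wrap(text, width=width)
    words = lines[line_no - 1].split()
    return words[word_no - 1]


# The same coordinates resolve to different words under different widths,
# so the agreed width has to be part of the citation.
print(locate(TEXT, 2, 3, width=20))  # the
print(locate(TEXT, 2, 3, width=30))  # number
```

The citation is only repeatable as long as everyone renders with the same width.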

Simple almost always wins out and it is probably easier to implement, too.

While it would be cool to require something like ids, it would be a nightmare to write the spec and then deal with changes and such.

Re:fixed width

cyocum on 2009-02-12T19:47:52

While this might be a way of dealing with it, it seems rather ham-handed. Getting all the publishers to agree to this would be just as hard as updating and maintaining any other standard. If one goes out of line and decides to be different, it all goes out the window.

It's not just the Kindle

brian_d_foy on 2009-02-12T01:25:07

The non-digital world has the same problem. The concepts of folios, pages, lines, and so on are tied to the particular presentation. Change the form, such as from hardcover to paperback, and the numbers change. That's why citations contain so much information: you have to exactly specify not only the information but the presentation.

ISBNs don't even solve this problem because different printings of the same material in the same form can be slightly different.

Re:It's not just the Kindle

cyocum on 2009-02-12T19:50:44

Indeed. I think the problem becomes more pronounced in a digital arena, where it is even more difficult to pin down what kind of abstraction of the material would facilitate citation.