Many archivists refer to Extensible Markup Language (XML) as the “acid-free paper of the digital age” because it is platform-independent and non-proprietary. Nearly all current development in descriptive text markup is XML-based, and far more tools (many of them open-source) are available for handling XML data than were ever available for SGML. Future versions of TEI will be XML-only, and XML search engines are maturing quickly. XML, and more particularly the TEI implementation of XML, has become the de facto standard for serious humanities computing projects. XML allows editors to determine which aspects of a text are of interest to their project and to “tag” them, that is, to label them with markup. For example, at the Whitman Archive, we tag structural features of the manuscripts, such as line breaks and stanza breaks, as well as the revisions that Whitman made to the manuscripts, as when he deleted a word and replaced it with another. Our markup therefore records more information than would be evident to a casual reader. A stylesheet, written in Extensible Stylesheet Language Transformations (XSLT), transforms our XML files into the reader-friendly HTML that users see when they visit our site. A crucial benefit of XML is its flexibility: with XSLT, data entered once can be transformed in various ways for differing outputs. So although some of the information embedded in our tagging is not evident in the current HTML display, we can later revise our stylesheet to include it if we decide that it is valuable to readers.
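The principle of encoding once and displaying in more than one way can be illustrated with a minimal sketch. It is not the Whitman Archive's stylesheet and it does not use XSLT: it stands in for the transformation step with Python's standard-library XML parser. The sample stanza is invented, the TEI namespace is omitted for brevity, and the function names and the `show_revisions` switch are hypothetical. Only the element names (`lg`, `l`, `del`, `add`) come from TEI.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample, not a transcription of any manuscript.
# <lg> = line group (stanza), <l> = line, <del>/<add> = a revision.
SAMPLE = """<lg type="stanza">
  <l>The river runs <del>slow</del><add>swift</add> tonight</l>
  <l>and the banks give way</l>
</lg>"""


def render_inline(elem, show_revisions):
    """Return HTML for the text and child elements of one element."""
    parts = [escape(elem.text or "")]
    for child in elem:
        inner = render_inline(child, show_revisions)
        if child.tag == "del":
            # Deleted words are encoded in the XML but shown
            # only when the revision view is requested.
            if show_revisions:
                parts.append(f"<del>{inner}</del>")
        elif child.tag == "add":
            parts.append(f"<ins>{inner}</ins>" if show_revisions else inner)
        else:
            parts.append(inner)
        # Text that follows the child inside the same parent.
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def to_html(xml_text, show_revisions=False):
    """Turn one encoded stanza into an HTML fragment."""
    root = ET.fromstring(xml_text)
    lines = [render_inline(line, show_revisions) for line in root.iter("l")]
    return '<div class="stanza">' + "<br/>".join(lines) + "</div>"


if __name__ == "__main__":
    print(to_html(SAMPLE))                       # reading view
    print(to_html(SAMPLE, show_revisions=True))  # revision view
```

The same encoded stanza yields a clean reading text in the first call and a view that exposes the deletion and addition in the second. Changing what readers see means changing the rendering rules, not re-entering the data.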
It is well known that modern generalized markup systems, such as SGML and XML, evolved from the ‘presentational’ markup contained in early digital documents intended for printing (Goldfarb, 1996, 1997). XML, whose specification was co-authored by a humanist, Michael Sperberg-McQueen (Bray et al., 1998), is used as a metalanguage to define most of the markup languages in use today.