
Technical Introduction. (Arianna Ciula, Paul Spence)

Document Contents
TEI Encoding Model
Benefits of using XML
Focus on Integration

The ASChart (Anglo-Saxon Charters) project involved research by a team of Anglo-Saxon and Digital Humanities specialists including Alex Burghart (History Department) and Arianna Ciula (Centre for Computing in the Humanities), who led the technical research in the project. The project created an online publication of the set of charters dated earlier than A.D. 900. The project’s main research aim was to explore the extent to which a text encoding approach could provide new ways of interrogating Anglo-Saxon charters.

TEI Encoding Model

The encoding model was developed using a customized version of the P4 Guidelines of the Text Encoding Initiative (TEI). TEI is a major and long-standing (since 1989) international scholarly standards initiative, and its Guidelines offer the potential to develop and establish an ideal standard for encoding text-based materials on medieval charter projects with a number of different objectives. Its flexible and highly customizable grammar seems to allow us to deal with challenges as varied as the representation of diplomatic discourse, the expression of uncertainty and attribution of authority in the interpretation of a source, the description of complex linguistic phenomena, the recording of prosopographical and topographic information, or the encoding of physical aspects of the charters as archival objects.

TEI enables interchange at a general level, but some scholarly communities have found it useful to create customizations specific to particular fields of enquiry in order to facilitate disciplinary and project-based interconnections, and the extensible architecture of the TEI was designed with these kinds of customizations in mind.

The encoding model developed for ASChart covered general metadata about the charters – including a brief description of their contents and information about the archive to which each belonged – as well as more specific information pertaining to the structure and semantics of the charters themselves. The markup included elements of Anglo-Saxon charters that are characteristic of their diplomatic discourse such as invocation, proem, dispositive word, bound, curse, dating clause, promulgation place, witness list, names in general, name of donor and name(s) of beneficiary(ies).
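To make the model more concrete, the following minimal sketch parses a small fragment marked up along these lines and lists its diplomatic components. It is illustrative only: the use of `seg` elements with a `type` attribute, the attribute values, the `name` roles and the placeholder Latin wording are all assumptions made for this example, not the project's actual schema or a transcription of any charter.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attribute values and wording are
# assumptions for illustration, not the ASChart encoding model itself.
SAMPLE = """
<text>
  <body>
    <seg type="invocation">In nomine Domini.</seg>
    <seg type="proem">Quoniam omnia transeunt.</seg>
    <seg type="disposition">Ego <name role="donor">N.</name>
      <seg type="dispositive-word">dono</seg> terram
      <name role="beneficiary">M.</name></seg>
    <seg type="curse">Si quis hoc infringere temptaverit, anathema sit.</seg>
  </body>
</text>
"""


def diplomatic_components(xml_source):
    """Return (type, text) pairs for every typed segment, in document order."""
    root = ET.fromstring(xml_source)
    return [
        # itertext() gathers text from nested elements; split/join
        # collapses the line breaks and indentation of the source.
        (seg.get("type"), " ".join("".join(seg.itertext()).split()))
        for seg in root.iter("seg")
    ]


components = diplomatic_components(SAMPLE)
for kind, text in components:
    print(f"{kind}: {text}")
```

Because nested segments are visited as well, the dispositive word appears both inside its enclosing clause and as a component in its own right.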

Sample of TEI XML file for one of the charters in ASChart.

The result is the digital publication, published in 2005 and available on this website. It not only gives access to the charters themselves by ‘Sawyer number’ (a field-specific identification system, explained in more detail later) and by archive, but also provides six diplomatic indexes, alternative representations of the charters based on the diplomatic markup, links from the diplomatic representations back to the diplomatic indexes, a feedback facility and tightly-bound connections to material from another project, the Prosopography of Anglo-Saxon England (PASE) database.

Graph that summarises the structure of the ASChart web resource.

Benefits of using XML

In building the digital resource, the Centre for Computing in the Humanities (CCH) made use of its own electronic publishing suite, known as xMod, which uses standards-based technologies such as XSLT (eXtensible Stylesheet Language Transformations), a language which enables us to transform the source XML materials into the HTML web pages that underpin the website. Since one of the basic premises of XML is the separation of content from presentation, we are able to produce a potentially unlimited number of different representations from the same source XML. This means that we only need to maintain the information in one place (the principle of single-source publishing), and future iterations of the website could in theory drastically change the presentation of the material without any need to edit the original documents.
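The single-source principle can be sketched briefly. The example below is written in Python rather than XSLT and is not the xMod pipeline; it simply derives two HTML representations from one hypothetical source fragment, whose element names and placeholder wording are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source fragment; the markup and wording are placeholders.
SOURCE = """
<charter id="example">
  <seg type="invocation">In nomine Domini.</seg>
  <seg type="curse">Si quis hoc infringere temptaverit, anathema sit.</seg>
</charter>
"""


def render_plain(root):
    """Reading view: running text with the markup dropped."""
    text = " ".join(seg.text.strip() for seg in root.iter("seg"))
    return f"<p>{escape(text)}</p>"


def render_list(root):
    """Analytical view: one labelled list item per component."""
    items = "".join(
        f"<li><b>{escape(seg.get('type'))}</b>: {escape(seg.text.strip())}</li>"
        for seg in root.iter("seg")
    )
    return f"<ul>{items}</ul>"


root = ET.fromstring(SOURCE)
plain_html = render_plain(root)  # first representation
list_html = render_list(root)    # second representation, same source
```

Changing either rendering function alters only the presentation; the source fragment is never edited.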

These advantages of using XML for digital publication are widely understood. There have also been two further important benefits, more closely related to the project's specific scholarly research objectives.

The first is the production of the indices, where the fact that the texts were marked up at a fine level of detail meant that individual components could be easily extracted, and in some cases grouped into categories in the diplomatic indices. So far the resource includes indices by invocation, proem, dating clause (organised by type), dispositive word, curse, and place of promulgation. In general, these indices are simply ordered alphabetically, but in some cases an intermediate stage of classification of the occurrences of a particular textual component has been considered interesting or necessary in the generation of the diplomatic index.

To provide an example, the index of curses or clauses of warning is also organized by archive, so as to make possible an easy contextualisation of common formulas regarding threats of malediction or anathema and, eventually, to support the study of their evolution and authenticity. On the other hand, the series of dating clauses as expressed in the charters has been collected in an intermediate XML file, where the dating clauses have been associated with different and non-exclusive types of calendar expression, so as to generate an index arranged by type of date: Anno Dionysii, Anno Domini, Days and months, Episcopal Dating, Indiction and Regnal year.
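The two kinds of index just described can be sketched as follows. This is a minimal illustration, assuming the components have already been extracted from the markup into simple records; the charter identifiers, archive names and clause wording are placeholders, and the record layout is hypothetical rather than the project's intermediate XML file.

```python
from collections import defaultdict

# Hypothetical extracted records; identifiers, archives and wording
# are placeholders, not project data.
CURSES = [
    {"charter": "charter-2", "archive": "Archive B", "text": "Si quis autem"},
    {"charter": "charter-1", "archive": "Archive A", "text": "Si quis vero"},
    {"charter": "charter-3", "archive": "Archive A", "text": "Quisquis hoc"},
]

DATING_CLAUSES = [
    {"charter": "charter-1", "types": ["Anno Domini", "Indiction"]},
    {"charter": "charter-2", "types": ["Regnal year"]},
    {"charter": "charter-3", "types": ["Anno Domini"]},
]


def index_by_archive(records):
    """Group clause texts by archive, each group sorted alphabetically."""
    index = defaultdict(list)
    for rec in records:
        index[rec["archive"]].append((rec["text"], rec["charter"]))
    return {archive: sorted(entries) for archive, entries in sorted(index.items())}


def index_by_date_type(records):
    """List charters under every date type they use (types are non-exclusive)."""
    index = defaultdict(list)
    for rec in records:
        for date_type in rec["types"]:
            index[date_type].append(rec["charter"])
    return dict(sorted(index.items()))


curse_index = index_by_archive(CURSES)
date_index = index_by_date_type(DATING_CLAUSES)
```

Because the date types are non-exclusive, a single charter can appear under several headings, as `charter-1` does here.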

Whereas the first benefit relates to the re-organisation of material according to the encoding applied, the second benefit has to do with its visualisation.

Visualisation of the diplomatic discourse

Given the importance of the components chosen for diplomatic analysis and encoding, the project produced three alternative visualisations of the charters themselves, which in one display consisted of discrete sections of the charter being colour-coded (invocations in blue, proems in red etc).2 In many projects, the actual encoding of a text may drive indices and search functions, but semantic interpretation in the markup is not usually highlighted overtly in the final digital publication. In the case of ASChart, a clear visualisation of the encoding is provided, and so the user is able to access some of the interpretative layers in the underlying markup, without having to read the sometimes rather complex XML code behind it.
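A colour-coded display of this kind could be approximated as in the sketch below. Only the blue and red assignments come from the description above; the fallback colour, the source markup and the generated HTML structure are assumptions for illustration and do not reproduce the project's actual interface code.

```python
import xml.etree.ElementTree as ET
from html import escape

# Blue and red follow the description above; the fallback is an assumption.
COLOURS = {"invocation": "blue", "proem": "red"}
DEFAULT_COLOUR = "grey"

# Hypothetical source fragment with placeholder wording.
SOURCE = """
<charter>
  <seg type="invocation">In nomine Domini.</seg>
  <seg type="proem">Quoniam omnia transeunt.</seg>
  <seg type="curse">Anathema sit.</seg>
</charter>
"""


def colour_coded(xml_source):
    """Wrap each typed segment in a span whose colour reflects its type."""
    root = ET.fromstring(xml_source)
    spans = []
    for seg in root.iter("seg"):
        kind = seg.get("type")
        colour = COLOURS.get(kind, DEFAULT_COLOUR)
        # The title attribute lets a reader see the label on hover.
        spans.append(
            f'<span class="{escape(kind)}" style="color: {colour}" '
            f'title="{escape(kind)}">{escape(seg.text.strip())}</span>'
        )
    return "<p>" + " ".join(spans) + "</p>"


html_view = colour_coded(SOURCE)
```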

This visualisation of course responds to what the encoder or the editor has considered to be key structural parts of the text and so deals with a visual representation of selected elements of which the scholar or user may already possess a mental image, or, even more interestingly, it may represent a new and unusual visual representation of the text generated by the encoding process and therefore open to further interrogation. It is interesting - although perhaps not surprising - to note that in describing the process of traditional diplomatic criticism carried out by archivists, Duranti’s words recall the same issues the displayed markup has raised for us:

The diplomatic criticism […] may seem a sterile exercise of identification and “labelling.” However, the exercise itself is the key to an understanding of the action in which the document participates, and of the document itself. The names on the labels are indicators which direct attention to the entities which are relevant to the continuous process of extrapolation by the archivist. The effort of including the elements of real documents into the framework of diplomatic analysis is a necessary prelude to discovery and knowledge. […]
Diplomatics is a mind-set, an approach, a perspective, a systematic way of thinking about archival documents.3

Example of the visualisation of the markup in ASChart.

Focus on Integration

In concluding our description of the markup it is worth saying that the encoding model could certainly have been elaborated further, so as to tag other elements such as locations, attestation types, biblical quotations or allusions, and other stock phrases of interest (see a more comprehensive list).

Moreover, the indices could have been complemented by semantically-aware search facilities which allowed the user to search on different combinations of the elements mentioned above. In addition, the images of the charters, or at least of the single-sheet, could have been included or referred to.

However, the core focus of this pilot project was specifically to show the potential of XML markup for diplomatic analysis and to enrich the resource by establishing connections with different projects built around the same primary sources.

CCH is currently involved in over forty digital projects, which cross a range of humanities disciplines and technical models, and at least five of these projects involve Anglo-Saxon research interests. Even though there is considerable variation in the technologies used, we have aimed to use standards as far as possible in order to make it easier to establish connections between projects. At present, there are links to just one other Anglo-Saxon project called PASE, but we will also describe other project connections that might take advantage of the encoding work carried out on ASChart.

Integration with the Prosopography of Anglo-Saxon England database

At the heart of our research into the possibility of integrating material from different projects lies the desire to make the scholarly research itself as transparent as possible, so that projects may share their findings, allowing users to switch seamlessly from one resource to another, while making clear the boundaries, limitations and contexts of each.

In the ASChart pilot publication there are two different types of connection from the website to another project, and they are both to PASE, the Prosopography of Anglo-Saxon England database. PASE is designed to give "access to structured information on all of the recorded inhabitants of Anglo-Saxon England from the late sixth to the end of the eleventh century" (see the Prosopography of Anglo-Saxon England publication about page; for more details visit the PASE digital strategy page).

We chose PASE for this experiment because the project had just been completed and because it provides a wealth of information about persons who are recorded or referred to in a number of sources, including the Anglo-Saxon charters, by documenting assertions that sources make about these people. This allowed us to provide links from the individual charters in ASChart to the corresponding source information in the PASE database.

We were significantly helped by a robust system of identifiers already used in Anglo-Saxon scholarship, namely the so-called Sawyer numbers, conceived by Peter Sawyer in his Annotated List and Bibliography of 1968 to list Anglo-Saxon charters. Whilst the majority of digital humanities projects dealing with extant documents are required to invent identification systems, we were able to make use of these conventional identifiers (in the format capital “S” plus digits) as a common denominator, and created an automated process to link each charter page in ASChart to the relevant charter page in PASE. There, besides the richness of prosopographical information on each individual mentioned in the corresponding charter, the ordering of witness lists and other relevant details of transactions are available, allowing the end user to go as far as to trace the ownership of estates over time.
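Linking on a shared identifier of this kind can be sketched as follows. The pattern encodes only the format described above (a capital "S" followed by digits), and the URL template is a hypothetical placeholder, since the actual PASE addressing scheme and the project's automated process are not documented here.

```python
import re

# Assumption: identifiers are a capital "S" followed by digits only.
SAWYER_PATTERN = re.compile(r"S\d+")

# Hypothetical URL template; not the real PASE addressing scheme.
PASE_URL_TEMPLATE = "https://example.org/pase/charter/{sawyer}"


def pase_link(sawyer_id):
    """Return a link for a well-formed Sawyer number, or None otherwise."""
    if SAWYER_PATTERN.fullmatch(sawyer_id) is None:
        return None
    return PASE_URL_TEMPLATE.format(sawyer=sawyer_id)


# Malformed identifiers yield None instead of a broken link.
links = {sid: pase_link(sid) for sid in ["S8", "S123", "charter-8"]}
```

Validating the identifier before building the link means that a charter lacking a conventional number is left unlinked rather than pointing to a page that does not exist.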

Example of a charter page in PASE.

Issues of Integration and Interoperability with other digital publications

Other kinds of integration could potentially be achieved, starting from the integration of ASChart with the most natural candidate: the Revised Catalogue of Anglo-Saxon Charters, or Electronic Sawyer. The metadata currently associated with ASChart are relatively simple and should ideally be revised. Therefore, by taking advantage of the wealth of scholarly information that already exists in the Electronic Sawyer project, we could provide scholars with more information about the charters. This includes specifying the system of date ranges for the charters, extending the bibliographical information considerably, and adding associations to the corresponding manuscript descriptions and, eventually, to the images.

This multi-project scenario not only avoids the need to duplicate effort - which would introduce new layers of human error with no tangible benefit – but also allows different projects to focus on their core research objectives while sharing the benefits of other research.

Sample page for the Electronic Sawyer.

Another candidate for integration is The Language of Landscape: Reading the Anglo-Saxon Countryside (LangScape), a project which focuses on Anglo-Saxon charter boundary clauses:

Sample of TEI XML file for one of the charter bounds in ASChart.

Sample of TEI XML file for the same charter bound in LangScape.

LangScape is a project which combines relational database and XML technologies. Text markup is applied - using the new P5 release of TEI - to the boundary clauses, which are in turn linked to a glossary in the database containing linguistic and more general semantic classification. This potentially requires technical improvements to the ASChart encoding model to bring it in line with LangScape and to allow for integration of the rich markup on the two projects.4

We do not intend to go into technical details here, but the P5 release of TEI (November 2007) brings significant improvements for text encoding projects, including a robust and flexible new system which makes it much easier to tailor encoding models to the specific needs of a project, while providing mechanisms to specify allowed content in given text fields via a datatype mechanism. The underlying ODD (One Document Does it all) language also facilitates the creation of document schemas and tailored documentation as part of an integrated process.

The connections could be bidirectional, or could be specifically tuned to reflect the navigational route chosen by the user.

Looking ahead, the integration of material from projects built around Anglo-Saxon sources in general, and around charters in particular, could offer a number of different perspectives on the same historical evidence: a diplomatic approach based on the structural and formulaic components of the charter texts, an archival perspective relying on the source description and its corresponding updated bibliography, a prosopographical perspective on individuals - and, possibly, on the locations - mentioned in the sources, a linguistic and geographical perspective focused on the text analysis of the boundary clauses, and so on.

Indeed, connecting data from different projects not only presents technical challenges, but also involves complex analysis of the relationship between different intellectual models. The fact that the electronic text of a charter that survives in multiple copies does not contain an apparatus or does not include the physical description of the relevant documents may be satisfactory for one project but extremely unsatisfactory for another. Not only can the relevance of different types of data be a problem if not properly contextualized in the strategy of a specific project, but contrasting interpretations of the same data may also be an issue. We know that subsequent editions of the same text bring about changes, raising questions of interpretation and correctness, not to mention the scenarios where new sources lead to new discoveries or where forgeries are uncovered.

In spite of the challenges, we believe that the practice of seeking deeper connections between different types of digital scholarship will be an important part of future research projects. The development of the Charter Encoding Initiative (CEI) belongs to this spirit of broader academic and technical collaboration. Indeed, it is desirable not only to use technical standards but also to develop common frameworks, including markup guidelines and procedures tailored to the needs of a much wider range of medieval charter projects.

Footnotes
1. This is an extract from the paper by Arianna Ciula and Paul Spence, The Anglo-Saxon charters pilot project at the Centre for Computing in the Humanities. Digital Diplomatics (Munich, Ludwig-Maximilians-Universität München - Germany, 28 February - 2 March 2007); see abstract.
2. The visual interface work was carried out by Paul Vetch.
3. L. Duranti, Diplomatics: New Uses for an Old Science (The Scarecrow Press, Inc, Lanham, Maryland, and London, 1998) p.158.
4. The technical aspects of this integration have been explored, implemented and presented in a poster at the TEI Members Meeting in November 2007. See A. CIULA/E. PIERAZZO, Usage of TEI P5 for data interchange between projects: the Anglo-Saxon charters case study, poster accepted at the TEI Members Meeting 2007, TEI@20: 20 Years of Supporting the Digital Humanities, University of Maryland, 1-2 November, 2007. See abstract available.