
Australian Society of Archivists
1999 Conference

Heritage and the Internet - Encoding Context Objects: Using Knowledge to Reduce Risks

Gavan McCarthy


Introduction

The documentation, management and use of contextual information has emerged as one of the critical issues of the World Wide Web in the late twentieth century. Contextual information is that extra, associated, related, assumed and perhaps a priori information or knowledge that is required to meaningfully interpret the content of any given information source. The Web, as it has evolved over the last five or six years, bridges the boundaries of space and time like no other medium yet developed. The printing press of the Middle Ages, the universal postal system of the nineteenth century, the telegraph and telephony systems also originating in the nineteenth century, radio and television in the early to mid twentieth century, all represent important new media that have shaped our information environment. Is the Web unique in the challenges it poses to the credibility and authority of content or is this history repeating itself? Were the same issues raised when those earlier information technologies were introduced? Unfortunately, those fascinating questions are not the topic of this paper, but they do provide the beginnings of a “contextual” framework in which I will develop the content of this presentation.

Context is multi-faceted, dynamic, highly variable and complex beyond our ability to build systems to totally contain it. Given that the effective documentation of context will therefore involve compromise, the questions facing us are:

  • What are the most useful elements? and
  • What do we need to know about them?

This paper is about some of the more pragmatic work, underway at the moment, on the structuring and digital encoding of information surrogates for people and organisations as the defining elements of context. The deeper philosophical aspects and recursive structure of the relationships between content and context, as they apply to all forms of meaningful cultural transmission, form the foundation for this research. Context, like content, is essentially a product of the action of people and it is through the documentation of people (and organisations) that we can establish a usable foundation for the documentation of context more generally.

Archivists and Context

As all archivists know, records, at whatever stage of their existence, are only meaningful if they exist in an accessible contextual framework. Records are created in an active system that is composed of varying levels of structure, internal explication and self-documentation but most importantly they are created in an environment of common understanding and implicit knowledge that establishes the trust on which transactions are based. But how do we, as archivists, capture (document) this framework which is such a complex infusion of assumed knowledge and which, by the very interactions of content and context, is continuously and randomly changing? Certainly part of the challenge for the contemporary archivist is to build documentation processes that can evolve to successfully function through time and across space - systems that can deal with the diachronic and synchronic variation that arises naturally as a product of the passage of time.

The Entity - Relationship Model

The “entity - relationship” model for documenting and managing resources is a conceptual model that has emerged in the literature of various disciplines in recent times and offers the most promising foundation from which such systems may arise.(1) This was neatly applied to archives by Chris Hurley in his recent paper in Archives and Manuscripts:

"Archivists can participate in recordkeeping processes by documenting complex relationships between records and context. Records must be placed in context - in time and place - by fashioning descriptive entities and documenting relationships. This is how we can understand the record and derive evidence, it must be interpreted not by reference to our observation of it in the circumstances obtaining when we access it, but by understanding the circumstances which existed at its creation and changes since. . .The two fundamental issues for discussion concerning archival description are therefore what the descriptive entities should be and what are the relationships we need to show between them."(2)

Figure 1, below, is an example of the entity-relationship model in use to underpin the Resource Description Framework (RDF) which is introduced by Eric Miller as “an infrastructure that enables the encoding, exchange and reuse of structured metadata”.(3)



Figure 1. A graphic example of the entity-relationship model taken from figure 5 in Miller, Eric, ‘An Introduction to the Resource Description Framework’, D-Lib Magazine, May 1998 at http://www.dlib.org/dlib/may98/miller/05miller.html.

Many types of systems have been created over the years to describe records and in general they attempt to capture context in terms of provenance and structure or order. However, it has only been in a few cultures that the documentation of context has been drawn out of the descriptions of records and managed as separate but related entities. In other words, information about provenance and order, within a defined environment, has been used to create a contextual framework that is used to enhance the meaning of a broad range of records - whether they are in custody or not, destroyed or extant or yet to be created. These examples of the entity-relationship model at work, albeit in forms limited by the technology of the day, provide a solid foundation for future research and development. The International Council on Archives recognised the essential importance of this separation of entity documentation from resource documentation through its two standards for archival documentation - ISAD(G) for archival records and ISAAR(CPF) for corporations, persons, and families.(4)
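
The separation of entity documentation from resource documentation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of ISAD(G) or ISAAR(CPF): the class names, fields, identifiers and the "creator" role are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextEntity:
    """A person, organisation or family, documented independently of any record."""
    entity_id: str
    name: str
    kind: str  # illustrative values: "person", "corporate body", "family"


@dataclass(frozen=True)
class Resource:
    """A record or group of records, whatever its custodial status."""
    resource_id: str
    title: str
    status: str  # illustrative values: "in custody", "destroyed", "yet to be created"


@dataclass(frozen=True)
class Relationship:
    """A typed link between a context entity and a resource."""
    entity_id: str
    resource_id: str
    role: str  # illustrative value: "creator"


@dataclass
class ContextFramework:
    """Holds entities, resources and the relationships between them separately."""
    entities: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities[entity.entity_id] = entity

    def add_resource(self, resource):
        self.resources[resource.resource_id] = resource

    def relate(self, entity_id, resource_id, role):
        # Both ends must already be documented before they can be related.
        if entity_id not in self.entities or resource_id not in self.resources:
            raise KeyError("both ends of a relationship must be documented first")
        self.relationships.append(Relationship(entity_id, resource_id, role))

    def resources_for(self, entity_id):
        """Return every resource related to the given entity."""
        return [
            self.resources[rel.resource_id]
            for rel in self.relationships
            if rel.entity_id == entity_id
        ]


# Hypothetical sample data: one entity related to a record that no longer exists.
framework = ContextFramework()
framework.add_entity(ContextEntity("E1", "Example Person", "person"))
framework.add_resource(Resource("R1", "Example papers", "destroyed"))
framework.relate("E1", "R1", "creator")
print([r.title for r in framework.resources_for("E1")])
```

Because the entity is stored apart from the resource, its description remains usable even when the related record has been destroyed or has not yet been created.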

The Australian Opportunity

It could be argued that the Australian archival profession has been a leader in this area. But, with some notable exceptions, we have been slow to optimise the use of our documentation and descriptive practices and this applies especially to our use of the Web.

In recent years, the concerns raised by electronic records and the systems in which they operate, with their high levels of assumed knowledge, low levels of embedded contextual information and rapidly changing technological basis, have been a necessary focus of the record-keeping professions. However, this in turn has led to a narrowing of focus onto the development of standards for the description and management of records. We must remember that this is only part of the story and in the short term provides few archivists with the tools they need now to be archivists rather than current records managers.

The Web Today

The Web is huge and it is growing rapidly in terms of the number of people who have access. However, much of its potential remains largely untapped - especially by archives. It is disappointing to see organisations mounting on the Web electronic versions of their in-house information resources, often hiding them behind database query walls with few or limited general access opportunities. The key content of these resources is usually excluded from the main Web search engines and can only be “discovered” by those that already know where to look for them. Like their paper-based predecessors they usually present a one-way flow of information that inhibits participation and contribution from the broader community of users. They assume prior knowledge and lack the dynamic opportunities to grow, develop and evolve through use.

Carl Lagoze, in examining the nature of the resource discovery process both on and off the Web, noted that it “is a long-term, multi-threaded, and iterative process with complex and dynamic requirements”.(5) His landmark 1997 study “From Static to Dynamic Surrogates: Resource Discovery in the Digital Age” provides a strong argument for the use of the entity-relationship model and for the pressing need to build virtual infrastructure. He observes:

“We do not intend to dismiss the current flock of web indexers as useless. In fact, in the course of the writing this paper we found ourselves using them quite frequently. Making innovative use of IR technology, the indexers are often successful at supporting resource discovery in a framework (the Web and HTTP) that provides little infrastructure support for the service. In fact, even when only marginally successful, the web indexers have a definite role in the resource discovery process.”(6)

More broadly, the Web is used predominantly as a publication or marketing tool with little or no attempt to explore the wider opportunities on offer. The commercialisation of the Web has led to technologies that promote this tendency, as the essence of business is to compete, not to collaborate. Cultural heritage activity, on the other hand, is predicated on collaboration between a wide variety of participants. It is critical that we develop the Web technologies that provide us with the virtual infrastructure and tools to facilitate collaboration both on and off the Web. The Web can be many things, and different things to different people and different communities; no one particular mode needs to define its usage. It is important that the heritage communities look creatively at how they can build the virtual infrastructure they require. The Web is not a passing fad and is something that we can play a major role in shaping.

Again Lagoze reflected:

“We believe, however, that the greatest potential for improvement to networked resource discovery lies in the use of dynamic, or derived, surrogates. Lynch, Michelson, et. al.(7) refer to this capability with the comment ‘...it is important to recognize that the networked information environment offers new opportunities to derive (by extraction or computation) a much richer and more diverse set of surrogates from networked objects than the surrogates that were typically found in the print world.’”(8)

The Yale Initiative

In late 1998, a small group of North American archivists and information technology specialists, with funding from the USA Digital Libraries Federation, organised a weekend meeting at Yale University to look at whether it was technically possible, and indeed worthwhile, to attempt to develop an international standard for the digital encoding of archival authority records.(9) The starting point for discussion was the existing, but underutilised, International Council on Archives standard, ISAAR(CPF) - a standard defined before the implications of the emerging Web were apparent. The other key factor shaping the meeting was the experience of the Encoded Archival Description (EAD) initiative and the use of Standard Generalised Markup Language (SGML) for encoding archival finding aids.(10)

The meeting, composed of North Americans, Europeans and one Australian representative, examined existing online systems that treated context elements as separate entities with relationships to resources. The National Archives of Australia RINSE data(11) and the Australian Science Archive Project Bright Sparcs web resources(12) were examined along with other examples from the USA and Sweden. In summary the meeting agreed that we were looking at an enormous untapped resource, with implications far beyond the preservation of records for archival purposes. It was agreed that an international working group be established from the core participants of the Yale meeting with the aim of working towards a revised standard of ISAAR(CPF), that takes into account the networking opportunities of the Web and formal requirements of SGML and/or Extensible Markup Language (XML) encoding.(13)

However, what was perhaps the most exciting aspect of this meeting was recognising that this was an achievable objective, with a low technological dependency, that could be applied across the breadth of the archival sector. It has more to do with a change in what we do with the contextual data we collect and document than with the complex and, for many, unimplementable metadata schemas for records description. The recent work by the Monash University Recordkeeping Metadata Set project stands out as a watershed in this area and indicates optimism for the future may be warranted.(14)

A Network of Context Entities

In June 1999, the Australian Science and Technology Heritage Centre quietly celebrated five years of Bright Sparcs on the Web. This celebration not only marked five years of continuous use of a stable, surrogate-structured, context-based, database-driven information gateway, built from a large and interactive user group, but also marked the beginning of the first major re-development of Bright Sparcs and the initiation of its sister site Australian Science at Work. Funding for this re-development has come through project-based grants from both the Federal government and the Victorian State government.

At the core of these sites are Hypertext Markup Language (HTML) encoded context entities (people for Bright Sparcs and organisations, societies and other constructs for Australian Science at Work) which are linked, by defined relationships, with other entities and information resources. It is based on a relatively simple conceptual model. It is easy to implement in an uncomplicated environment, but can also become complex very quickly and thus mimic the complexity of real life. The challenge of further research and development is to maintain the simple and accessible foundations while building systems that will allow the complexity to evolve as new data accumulates.

Not only does the ‘entity-relationship’ model provide a useful and workable model for both internal and public information systems, it also provides the conceptual foundation for building a structured network of Web-based information gateways based on linked context entities.(15) A very simple example of how this could work is demonstrated in the Appendix, Figures 2 to 4. These show the Bright Sparcs entity for Phillip Law (Figure 2) and the published hyperlink (or relationship) (Figure 3) to the parallel entry in the National Archives of Australia RINSE database (Figure 4). At this stage there is no reverse link but we hope this is something that will be achieved in the future.
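
The absence of a reverse link is the kind of gap that a network of gateways could detect mechanically. As a rough sketch, assuming each published hyperlink is recorded as a (source, target) pair, the following lists the links that have no counterpart in the other direction. The function name and the identifiers are hypothetical placeholders, not the actual Bright Sparcs or RINSE addresses.

```python
def missing_reverse_links(links):
    """Return the (source, target) pairs that would be needed to reciprocate
    every published link that currently points in one direction only."""
    published = set(links)
    return sorted(
        (target, source)
        for source, target in published
        if (target, source) not in published
    )


# Hypothetical identifiers standing in for entries in two separate gateways.
links = [
    ("gateway-a:entity-1", "gateway-b:entity-1"),  # one-way link only
    ("gateway-a:entity-2", "gateway-b:entity-2"),  # reciprocated below
    ("gateway-b:entity-2", "gateway-a:entity-2"),
]

for source, target in missing_reverse_links(links):
    print(f"no link yet from {source} to {target}")
```

A report of this kind could be exchanged between custodians so that each gateway knows which reciprocal links it has yet to publish.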

The re-development of the systems supporting Bright Sparcs, Australian Science at Work and the History of Australian Science and Technology Bibliography is focused on the expression of the encoded context entities and their relationships in XML. This will enable specific meaning to be given to data elements within an encoded entity and permit a much greater level of control over elements that define an entity in space and time. This in turn opens the door to the exciting analytical and access opportunities offered by data visualisation and graphic representation.
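
To suggest what such an encoding might look like, the sketch below builds a small XML context entity with Python's standard xml.etree.ElementTree module. The element and attribute names (contextEntity, existDates, place, relation) and the sample values are assumptions made for illustration; they are not taken from ISAAR(CPF) or from the Bright Sparcs systems.

```python
import xml.etree.ElementTree as ET


def encode_entity(entity_id, kind, name, exist_from, exist_to, place, relations):
    """Build an XML element for one context entity (hypothetical schema)."""
    root = ET.Element("contextEntity", {"id": entity_id, "type": kind})
    ET.SubElement(root, "name").text = name

    # Elements that locate the entity in time ...
    dates = ET.SubElement(root, "existDates")
    ET.SubElement(dates, "from").text = exist_from
    ET.SubElement(dates, "to").text = exist_to

    # ... and in space.
    ET.SubElement(root, "place").text = place

    # Typed relationships to other entities or information resources.
    rels = ET.SubElement(root, "relations")
    for rel_type, target in relations:
        ET.SubElement(rels, "relation", {"type": rel_type, "target": target})
    return root


# Illustrative values only.
entity = encode_entity(
    "E1", "organisation", "Example Society", "1900", "1950", "Melbourne",
    [("online-source", "gateway-b:entity-1")],
)
xml_text = ET.tostring(entity, encoding="unicode")
print(xml_text)

# Because each element carries a specific meaning, it can be queried directly.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("existDates/from"), parsed.findtext("place"))
```

Once dates and places are held in named elements rather than in free text, they can be extracted and fed to the kind of visualisation tools mentioned above.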

Conclusion

The wide acceptance of the Web and its ability to enable the inter-linking of web spaces and networks provides us with the opportunity to utilise documented (encoded) context entities to build Web-based infrastructure to support cultural heritage activities. Archivists have access to key contextual data, which if systematically and simply encoded could provide the basis for a network of context objects that would underpin a wide variety of functions of local, national and global significance.

The encoding and networking of context objects on the Web has been demonstrated using current Web technologies and has been shown to be a powerful tool for network development and the building of interactive communities that contribute to the depth of information in the system. Bright Sparcs is used extensively each day by a broad range of users from around the world and every week we are contacted by new users with new information and resources to contribute.

The re-development of Bright Sparcs provides the Australian archival community with a research and development opportunity to investigate the potential of the next generation of Web technologies utilising the power of XML encoded objects, but anchored firmly within the framework established by the International Council on Archives through ISAAR(CPF).

Indeed, if the preservation of our cultural heritage is of community concern, in the national interest and of global importance in the building of meaningful lives for ourselves and future generations, then we, as archivists, have a critical role in building contextual frameworks that will minimise the risks of destruction of the essential evidence of that heritage and maximise its ability to be meaningfully interpreted.

Acknowledgments

I would like to thank my colleagues at the Australian Science and Technology Heritage Centre, the University of Melbourne for all their work in helping develop these ideas and turning them into reality. I would especially like to thank Joanne Evans for help in the preparation of this paper.

Appendix


Figure 2. This shows the top section of an HTML encoded Bright Sparcs entry. Relationships to other resources are defined in the ‘Online Sources’, ‘Archival Sources’ and ‘Published Sources’ sections


Figure 3. A link from the ‘Online Sources’ connects this Bright Sparcs entity with its parallel entry in the National Archives of Australia RINSE database.


Figure 4. The top section of National Archives of Australia RINSE entry for Phillip Garth Law that defines him as a ‘Commonwealth Person’.

Footnotes

(1) The best starting point for accessing this literature is through D-Lib Magazine which can be found at http://www.dlib.org/dlib/.

(2) Hurley, Chris, 'The Making and the Keeping of Records: (1) What are Finding Aids For?' Archives and Manuscripts, vol. 26 no. 1, pp. 74 and 75.

(3) Miller, Eric, ‘An Introduction to the Resource Description Framework’, D-Lib Magazine, May 1998 (ISSN 1082-9873) at http://www.dlib.org/dlib/may98/miller/05miller.html.

(4) The International Council on Archives standard for archival authority records, ISAAR(CPF), can be found on the World Wide Web at: http://www.ica.org/cds/isaar_e.html. The related ISAD(G) document can be located through the same site.

(5) Lagoze, Carl, ‘From Static to Dynamic Surrogates: Resource Discovery in the Digital Age’, D-Lib Magazine, June 1997 (ISSN 1082-9873) at http://www.dlib.org/dlib/june97/06lagoze.html.

(6) Lagoze, Carl, ‘From Static to Dynamic Surrogates: Resource Discovery in the Digital Age’, D-Lib Magazine, June 1997 (ISSN 1082-9873), from the section “The Current State of Networked Resource Discovery” at http://www.dlib.org/dlib/june97/06lagoze.html.

(7) Lynch, Clifford, Avra Michelson, Cecilia Preston, and Craig A. Summerhill, CNI White Paper on Networked Information Discovery and Retrieval, Incomplete Draft, at http://www.cni.org/projects/nidr/nidr.html.

(8) Lagoze, Carl, ‘From Static to Dynamic Surrogates: Resource Discovery in the Digital Age’, D-Lib Magazine, June 1997 (ISSN 1082-9873), from the section “Beyond Static Surrogates - Opportunities in Networked Resource Discovery” at http://www.dlib.org/dlib/june97/06lagoze.html.

(9) Information on the ‘Archival Authority Information Meeting’, including papers submitted for discussion and reference, can be found on the Web at http://www.library.yale.edu/~rszary/Authority/agenda.html

(10) Pitti, Daniel V., ‘Encoded Archival Description: The Development of an Encoded Standard for Archival Finding Aids’, American Archivist, vol. 60 Summer 1997, pp. 268-283. Online information about this work can be found at: http://lcweb.loc.gov/ead/

(11) This can be found on the Web at http://www.naa.gov.au/RESEARCH/COLECTDB/colectdb.htm.

(12) This can be found on the Web at http://www.asap.unimelb.edu.au/bsparcs/bsparcshome.htm.

(13) For a more detailed account of the meeting see: McCarthy, Gavan, ‘Engineering Utility: A Visionary Role For Encoded Archival Authority Information In Managing Virtual And Physical Resources’, AusWeb99, Ballina, Australia, April 1999, at http://ausweb.scu.edu.au/aw99/papers/mccarthy/.

(14) Acland, Glenda and Kate Cumming, “SPIRT Recordkeeping Metadata Project” Australian Society of Archivists Conference, Brisbane, July 1999, Archives at Risk: Accountability, Vulnerability and Credibility, in press.

(15) A more detailed development of this idea can be found in: McCarthy, Gavan, ‘Utilizing the Web to Build a Network of Archival Authority Records’ submitted for publication in the International Council on Archives journal Janus, May 1999.

Last updated 13 August 2001.