Australian Society of Archivists
1999 Conference
Heritage and the Internet - Encoding Context
Objects: Using Knowledge to Reduce Risks
Gavan McCarthy
Introduction
The documentation, management and use of contextual information has
emerged as one of the critical issues of the World Wide Web in the
late twentieth century. Contextual information is that extra,
associated, related, assumed and perhaps a priori
information or knowledge that is required to meaningfully interpret
the content of any given information source. The Web, as it has
evolved over the last five or six years, bridges the boundaries of
space and time like no other medium yet developed. The printing press
of the Middle Ages, the universal postal system of the nineteenth
century, the telegraph and telephony systems also originating in the
nineteenth century, radio and television in the early to mid twentieth
century, all represent important new media that have shaped our
information environment. Is the Web unique in the challenges it poses
to the credibility and authority of content or is this history
repeating itself? Were there the same issues being raised with the
introduction of those earlier information technologies? Unfortunately,
those fascinating questions are not the topic of this paper, but they
do provide the beginnings of a “contextual” framework in
which I will develop the content of this presentation.
Context is multi-faceted, dynamic, highly variable and complex
beyond our ability to build systems to totally contain it. Given that
the effective documentation of context will therefore involve
compromise, the questions facing us are:
- What are the most useful elements? and
- What do we need to know about them?
This paper is about some of the more pragmatic work, underway at the
moment, on the structuring and digital encoding of information
surrogates for people and organisations as the defining elements of
context. The deeper philosophical aspects and recursive structure of
the relationships between content and context, as they apply to all
forms of meaningful cultural transmission, form the foundation for
this research. Context, like content, is essentially a product of the
action of people and it is through the documentation of people (and
organisations) that we can establish a usable foundation for the
documentation of context more generally.
Archivists and Context
As all archivists know, records, at whatever stage of their
existence, are only meaningful if they exist in an accessible
contextual framework. Records are created in an active system that is
composed of varying levels of structure, internal explication and
self-documentation but most importantly they are created in an
environment of common understanding and implicit knowledge that
establishes the trust on which transactions are based. But how do we,
as archivists, capture (document) this framework which is such a
complex infusion of assumed knowledge and which, by the very
interactions of content and context, is continuously and randomly
changing? Certainly part of the challenge for the contemporary
archivist is to build documentation processes that can evolve to
successfully function through time and across space - systems that can
deal with the diachronic and synchronic variation that arises
naturally as a product of the passage of time.
The Entity - Relationship Model
The “entity - relationship” model for documenting and
managing resources is a conceptual model that has emerged in the
literature of various disciplines in recent times and offers the most
promising foundation from which such systems may arise.(1)
This was neatly applied to archives by Chris Hurley in his recent
paper in Archives and Manuscripts:
"Archivists can participate in recordkeeping processes
by documenting complex relationships between records and context.
Records must be placed in context - in time and place - by fashioning
descriptive entities and documenting relationships. This is how we can
understand the record and derive evidence, it must be interpreted not
by reference to our observation of it in the circumstances obtaining
when we access it, but by understanding the circumstances which
existed at its creation and changes since. . .The two fundamental
issues for discussion concerning archival description are therefore
what the descriptive entities should be and what are the relationships
we need to show between them."(2)
Figure 1, below, is an example of the entity-relationship model in
use to underpin the Resource Description Framework (RDF) which is
introduced by Eric Miller as “an infrastructure that enables the
encoding, exchange and reuse of structured metadata”.(3)

Figure 1. A graphic example of the
entity-relationship model taken from figure 5 in Eric Miller, ‘An
Introduction to the Resource Description Framework’, D-Lib
Magazine, May 1998 at
http://www.dlib.org/dlib/may98/miller/05miller.html.
Many types of systems have been created over the years to describe
records and in general they attempt to capture context in terms of
provenance and structure or order. However, it has only been in a few
cultures that the documentation of context has been drawn out of the
descriptions of records and managed as separate but related entities.
In other words, information about provenance and order, within a
defined environment, has been used to create a contextual framework
that is used to enhance the meaning of a broad range of records
- whether they are in custody or not, destroyed or extant or yet to be
created. These examples of the entity-relationship model at work,
albeit in forms limited by the technology of the day, provide a solid
foundation for future research and development. The International
Council on Archives recognised the essential importance of this
separation of entity documentation from resource documentation through
its two standards for archival documentation - ISAD(G) for archival
records and ISAAR(CPF) for corporations, persons, and families.(4)
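The separation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the class names, field names and sample data are hypothetical and are not drawn from ISAD(G), ISAAR(CPF) or any existing system. It shows a context entity documented once, on its own, and then related to several record descriptions, including records that are no longer extant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntity:
    """A person, family or corporate body, documented independently of records."""
    entity_id: str
    name: str
    kind: str  # e.g. "person", "corporate body", "family"


@dataclass(frozen=True)
class RecordDescription:
    """A description of records, whether or not they are in custody."""
    record_id: str
    title: str
    status: str  # e.g. "in custody", "destroyed", "not yet created"


@dataclass(frozen=True)
class Relationship:
    """A typed link from a context entity to a record description."""
    entity_id: str
    record_id: str
    relation: str  # e.g. "created", "controlled"


def records_in_context(entity_id, relationships, records):
    """Return (relation, record) pairs for every record linked to the entity."""
    by_id = {record.record_id: record for record in records}
    return [
        (rel.relation, by_id[rel.record_id])
        for rel in relationships
        if rel.entity_id == entity_id
    ]


# Hypothetical sample data, for illustration only.
entity = ContextEntity("E1", "Example Research Station", "corporate body")
records = [
    RecordDescription("R1", "Station log books", "in custody"),
    RecordDescription("R2", "Expedition correspondence", "destroyed"),
    RecordDescription("R3", "Unrelated series", "in custody"),
]
relationships = [
    Relationship("E1", "R1", "created"),
    Relationship("E1", "R2", "created"),
]
linked = records_in_context(entity.entity_id, relationships, records)
```

Because the entity is held apart from the record descriptions, the same entity can give context to the destroyed correspondence as readily as to the log books in custody.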
The Australian Opportunity
It could be argued that the Australian archival profession has been
a leader in this area. But, with some notable exceptions, we have
been slow to optimise the use of our documentation and descriptive
practices and this applies especially to our use of the Web.
In recent years, the concerns raised by electronic records and the
systems in which they operate, with their high levels of assumed
knowledge, low levels of embedded contextual information and rapidly
changing technological basis, have been a necessary focus of the
record-keeping professions. However, this in turn has led to a
narrowing of focus onto the development of standards for the
description and management of records. We must remember that this is
only part of the story and in the short term provides few archivists
with the tools they need now to be archivists rather than current
records managers.
The Web Today
The Web is huge and it is growing rapidly in terms of the number of
people who have access. However, much of its potential remains largely
untapped - especially by archives. It is disappointing to see
organisations mounting on the Web electronic versions of their
in-house information resources, often hiding them behind database
query walls with few or limited general access opportunities. The key
content of these resources is usually excluded from the main Web
search engines and can only be “discovered” by those that
already know where to look for them. Like their paper-based
predecessors they usually present a one-way flow of information that
inhibits participation and contribution from the broader community of
users. They assume prior knowledge and lack the dynamic opportunities
to grow, develop and evolve through use.
Carl Lagoze, in examining the nature of the resource discovery
process both on and off the web noted that it “is a long-term,
multi-threaded, and iterative process with complex and dynamic
requirements”(5). His landmark 1997
study “From Static to Dynamic Surrogates: Resource Discovery in
the Digital Age” provides strong argument for the use of the
entity-relationship model and the imperative need for the building of
virtual infrastructure. He observes:
“We do not intend to dismiss the current flock of web
indexers as useless. In fact, in the course of the writing this paper
we found ourselves using them quite frequently. Making innovative use
of IR technology, the indexers are often successful at supporting
resource discovery in a framework (the Web and HTTP) that provides
little infrastructure support for the service. In fact, even when only
marginally successful, the web indexers have a definite role in the
resource discovery process.”(6)
More broadly the Web is used predominantly as a publication or
marketing tool with little or no attempt to explore the wider
opportunities on offer. The commercialisation of the Web has led to
technologies that promote this tendency, as the essence of business is
to compete, not to collaborate. Cultural heritage activity, on the
other hand, is predicated on collaboration between a wide variety of
participants. It is critical that we develop the Web technologies that
provide us with the virtual infrastructure and tools to facilitate
collaboration both on and off the Web. The Web can be many things and
it can be different things to different people and different
communities and there does not need to be any one particular mode
which defines usage. It is important that the heritage communities
look creatively at how they can build the virtual infrastructure they
require. The Web is not a passing fad and is something that we can
play a major role in shaping.
Again Lagoze reflected:
“We believe, however, that the greatest potential for
improvement to networked resource discovery lies in the use of
dynamic, or derived, surrogates. Lynch, Michelson, et. al.(7)
refer to this capability with the comment ‘...it is important to
recognize that the networked information environment offers new
opportunities to derive (by extraction or computation) a much richer
and more diverse set of surrogates from networked objects than the
surrogates that were typically found in the print world.’”(8)
The Yale Initiative
In late 1998, a small group of North American archivists and
information technology specialists, with funding from the USA Digital
Libraries Federation, organised a weekend meeting at Yale University
to look at whether it was technically possible and indeed worthwhile
attempting to develop an international standard for the digital
encoding of archival authority records.(9)
The starting point for discussion was the existing, but underutilised,
International Council on Archives standard, ISAAR(CPF) - a standard
defined before the implications of the emerging Web were apparent. The
other key factor shaping the meeting was the experience of the Encoded
Archival Description (EAD) initiative and the use of Standard
Generalised Markup Language (SGML) for encoding archival finding aids.(10)
The meeting, composed of North Americans, Europeans and one
Australian representative, examined existing online systems that
treated context elements as separate entities with relationships to
resources. The National Archives of Australia RINSE data(11)
and the Australian Science Archive Project Bright Sparcs web
resources(12) were examined along with
other examples from the USA and Sweden. In summary the meeting agreed
that we were looking at an enormous untapped resource, with
implications far beyond the preservation of records for archival
purposes. It was agreed that an international working group be
established from the core participants of the Yale meeting with the
aim of working towards a revised standard of ISAAR(CPF), that takes
into account the networking opportunities of the Web and formal
requirements of SGML and/or Extensible Markup Language (XML) encoding.(13)
However, what was perhaps the most exciting aspect of this meeting
was recognising that this was an achievable objective, with a low
technological dependency, that could be applied across the breadth of
the archival sector. It has more to do with a change in what we do
with the contextual data we collect and document than with the
complex and, for many, unimplementable metadata schemas for records
description. The recent work by the Monash University Recordkeeping
Metadata Set project stands out as a watershed in this area and
indicates optimism for the future may be warranted.(14)
A Network of Context Entities
In June 1999, the Australian Science and Technology Heritage Centre
quietly celebrated five years of Bright Sparcs on the Web.
This celebration not only marked five years of continuous use of a
stable, surrogate-structured, context-based, database-driven
information gateway, built from a large and interactive user group,
but also marked the beginning of the first major re-development of
Bright Sparcs and the initiation of its sister site Australian
Science at Work. Funding for this re-development has come through
project-based grants from both the Federal government and the
Victorian State government.
At the core of these sites are Hypertext Markup Language (HTML)
encoded context entities (people for Bright Sparcs and
organisations, societies and other constructs for Australian
Science at Work) which are linked, by defined relationships, with
other entities and information resources. It is based on a relatively
simple conceptual model. It is easy to implement in an uncomplicated
environment, but can also become complex very quickly and thus mimic
the complexity of real life. The challenge of further research and
development is to maintain the simple and accessible foundations while
building systems that will allow the complexity to evolve as new data
accumulates.
Not only does the ‘entity-relationship’ model provide a
useful and workable model for both internal and public information
systems, it also provides the conceptual foundation for building a
structured network of Web-based information gateways based on linked
context entities.(15) A very simple
example of how this could work is demonstrated in the Appendix,
Figures 2 to 4. These show the Bright Sparcs entity for
Phillip Law (Figure 2) and the published hyperlink
(or relationship) (Figure 3) to the parallel
entry in the National Archives of Australia RINSE database (Figure
4). At this stage there is no reverse link but we hope this is
something that will be achieved in the future.
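As a rough illustration of what managing such a network might involve, the Python sketch below treats each published hyperlink as a directed pair of entity identifiers and reports the links that have no reciprocal. The identifier strings and the function name are hypothetical, chosen only for this example; they do not reflect the actual addressing used by Bright Sparcs or RINSE.

```python
def missing_reverse_links(links):
    """Return (target, source) pairs for which no reciprocal link is published.

    `links` is an iterable of (source, target) identifier pairs, each
    standing for a hyperlink from one context entity to its parallel entry.
    """
    published = set(links)
    return sorted(
        (target, source)
        for (source, target) in published
        if (target, source) not in published
    )


# Hypothetical identifiers, for illustration only.
links = [
    ("bright-sparcs:law", "rinse:law"),  # a one-way link, as in Figure 3
    ("gateway-a:x", "gateway-b:x"),      # a pair of gateways that link
    ("gateway-b:x", "gateway-a:x"),      # to each other in both directions
]
missing = missing_reverse_links(links)
```

Here `missing` holds the single pair that would need to be published, from the second gateway back to the first, to make the relationship navigable in both directions.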
The re-development of the systems supporting Bright Sparcs,
Australian Science at Work and the History of
Australian Science and Technology Bibliography is focused on
the expression of the encoded context entities and their relationships
in XML. This will enable specific meaning to be given to data elements
within an encoded entity and permit a much greater level of control
over elements that define an entity in space and time. This in turn
opens the door to the exciting analytical and access opportunities
offered by data visualisation and graphic representation.
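A minimal sketch of what an XML-encoded context entity could look like is given below, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`contextEntity`, `existence`, `relationship` and so on) and the sample entity are assumptions made for this illustration; they are not taken from ISAAR(CPF), EAD or the Bright Sparcs re-development itself.

```python
import xml.etree.ElementTree as ET


def encode_entity(entity_id, name, kind, date_from, date_to, place, related):
    """Build an XML element for a context entity (hypothetical element names)."""
    root = ET.Element("contextEntity", {"id": entity_id, "type": kind})
    ET.SubElement(root, "name").text = name
    # Explicit elements locate the entity in time and space.
    ET.SubElement(root, "existence", {"from": date_from, "to": date_to})
    ET.SubElement(root, "place").text = place
    # Each relationship names its type and the identifier it points to.
    for relation, target in related:
        ET.SubElement(root, "relationship", {"type": relation, "target": target})
    return root


# Hypothetical sample entity, for illustration only.
entity = encode_entity(
    entity_id="E1",
    name="Example Research Society",
    kind="organisation",
    date_from="1950",
    date_to="1975",
    place="Melbourne",
    related=[("parallelEntry", "other-gateway:E1"), ("archivalSource", "R1")],
)
xml_text = ET.tostring(entity, encoding="unicode")
parsed = ET.fromstring(xml_text)
```

Because the dates and place are held in named elements rather than in free text, a program can select, sort or plot entities by them, which is the kind of control over space and time referred to above.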
Conclusion
The wide acceptance of the Web and its ability to enable the
inter-linking of web spaces and networks provides us with the
opportunity to utilise documented (encoded) context entities to build
Web-based infrastructure to support cultural heritage activities.
Archivists have access to key contextual data which, if systematically
and simply encoded, could provide the basis for a network of context
objects that would underpin a wide variety of functions of local,
national and global significance.
The encoding and networking of context objects on the Web has been
demonstrated using current Web technologies and has been shown to be a
powerful tool for network development and the building of interactive
communities that contribute to the depth of information in the system.
Bright Sparcs is used extensively each day by a broad range
of users from around the world and every week we are contacted by new
users with new information and resources to contribute.
The re-development of Bright Sparcs provides the
Australian archival community with a research and development
opportunity to investigate the potential of the next generation of Web
technologies utilising the power of XML encoded objects, but anchored
firmly within the framework established by the International Council
on Archives through ISAAR(CPF).
Indeed, if the preservation of our cultural heritage is of community
concern, in the national interest and of global importance in the
building of meaningful lives for ourselves and future generations,
then we, as archivists, have a critical role in building contextual
frameworks that will minimise the risks of destruction of the
essential evidence of that heritage and maximise its ability to be
meaningfully interpreted.
Acknowledgments
I would like to thank my colleagues at the
Australian Science and
Technology Heritage Centre, the University of Melbourne for all
their work in helping develop these ideas and turning them into
reality. I would especially like to thank Joanne Evans for help in the
preparation of this paper.
Appendix
Figure 2. This shows the top section of an HTML
encoded Bright Sparcs entry. Relationships to other
resources are defined in the ‘Online Sources’, ‘Archival
Sources’ and ‘Published Sources’ sections.

Figure 3. A link from the ‘Online Sources’
connects this Bright Sparcs entity with its parallel entry in the
National Archives of Australia RINSE database.

Figure 4. The top section of the National Archives of
Australia RINSE entry for Phillip Garth Law that defines him as a ‘Commonwealth
Person’.
Footnotes
(1) The best starting point for accessing this
literature is through D-Lib Magazine which can be found at
http://www.dlib.org/dlib/.
(2) Hurley, Chris, 'The Making and the Keeping of
Records: (1) What are Finding Aids For?' Archives and Manuscripts,
vol. 26 no. 1, pp. 74 and 75.
(3) Miller, Eric, ‘An Introduction to the
Resource Description Framework’, D-Lib Magazine, May
1998 (ISSN 1082-9873) at
http://www.dlib.org/dlib/may98/miller/05miller.html.
(4) The International Council on Archives standard
for archival authority records, ISAAR(CPF), can be found on the World
Wide Web at: http://www.ica.org/cds/isaar_e.html,
the related ISAD(G) document can be located through the same site.
(5) Lagoze, Carl, ‘From Static to Dynamic
Surrogates: Resource Discovery in the Digital Age’, D-Lib
Magazine, June 1997 (ISSN 1082-9873) at
http://www.dlib.org/dlib/june97/06lagoze.html.
(6) Lagoze, Carl, ‘From Static to Dynamic
Surrogates: Resource Discovery in the Digital Age’, D-Lib
Magazine, June 1997 (ISSN 1082-9873), from the section “The
Current State of Networked Resource Discovery” at
http://www.dlib.org/dlib/june97/06lagoze.html.
(7) Lynch, Clifford, Avra Michelson, Cecilia
Preston, and Craig A. Summerhill, CNI White Paper on Networked
Information Discovery and Retrieval, Incomplete Draft, at
http://www.cni.org/projects/nidr/nidr.html.
(8) Lagoze, Carl, ‘From Static to Dynamic
Surrogates: Resource Discovery in the Digital Age’, D-Lib
Magazine, June 1997 (ISSN 1082-9873), from the section “Beyond
Static Surrogates - Opportunities in Networked Resource Discovery”
at http://www.dlib.org/dlib/june97/06lagoze.html.
(9) Information on the ‘Archival Authority
Information Meeting’, including papers submitted for discussion and
reference, can be found on the Web at
http://www.library.yale.edu/~rszary/Authority/agenda.html
(10) Pitti, Daniel V., ‘Encoded Archival
Description: The Development of an Encoded Standard for Archival
Finding Aids’, American Archivist, vol. 60 Summer 1997,
pp. 268-283. Online information about this work can be found at:
http://lcweb.loc.gov/ead/
(11) This can be found on the Web at
http://www.naa.gov.au/RESEARCH/COLECTDB/colectdb.htm.
(12) This can be found on the Web at
http://www.asap.unimelb.edu.au/bsparcs/bsparcshome.htm.
(13) For a more detailed account of the meeting
see: McCarthy, Gavan, ‘Engineering Utility: A Visionary Role For
Encoded Archival Authority Information In Managing Virtual And
Physical Resources’ AusWeb99, Ballina, Australia April
1999, at http://ausweb.scu.edu.au/aw99/papers/mccarthy/.
(14) Acland, Glenda and Kate Cumming, “SPIRT
Recordkeeping Metadata Project” Australian Society of Archivists
Conference, Brisbane, July 1999, Archives at Risk:
Accountability, Vulnerability and Credibility, in press.
(15) A more detailed development of this idea can
be found in: McCarthy, Gavan, ‘Utilizing the Web to Build a
Network of Archival Authority Records’ submitted for publication
in the International Council on Archives journal Janus, May
1999.