Starting on OBLATES-LOD

I’ve recently posted about the challenge of establishing basic facts about Indian Residential Schools (IRS) in Canada: How many were there? What were their names? When were they open? Who attended them?

I’m convinced that Linked Open Data (LOD) and semantic web technologies will help us overcome this challenge.

OBLATES-LOD applies these technologies to the study of 62 Indian Residential Schools managed by the Oblates of Mary Immaculate (Oblates).

NCTR

The National Centre for Truth and Reconciliation (NCTR) is a vital repository of records about Canada’s IRSs. The NCTR’s online archive runs on Access to Memory (AtoM), an open-source web application that implements various archival descriptive standards, including:

  • General International Standard Archival Description (ISAD(G))
  • International Standard Archival Authority Record for Corporate Bodies, Persons, and Families (ISAAR-CPF)
  • Dublin Core Metadata Element Set, Version 1.1 (DC)

We’re initially interested in two of AtoM’s main entity types:

Authority records

Authority records provide descriptions of the actors (corporate bodies, persons, and families) that interact with archival materials as creators, custodians, subject access points, etc. AtoM includes an edit template based on ISAAR-CPF.

Authority records are linked to archival descriptions in AtoM by events delimited by start/end dates. Through events, one actor can have zero, one, or many relationships to zero, one, or many archival units; and one archival unit can have zero, one, or many relationships to zero, one, or many actors. Event relationships link ISAAR authority files (descriptions of actors) and ISAD(G) records (descriptions of archival materials).
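To make this many-to-many structure concrete, here is a minimal Python sketch. The class and field names (`Actor`, `Description`, `Event`, `event_type`) are hypothetical and do not reproduce AtoM's actual database schema; the point is only that each event is a dated link between one actor and one description, so querying the list of events from either side can return zero, one, or many matches.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Actor:
    """An authority record (ISAAR-CPF): corporate body, person, or family."""
    name: str
    entity_type: str


@dataclass
class Description:
    """An archival description (ISAD(G)) of some unit of material."""
    title: str


@dataclass
class Event:
    """One actor-to-description link, optionally bounded by start/end dates.

    Hypothetical field names for illustration, not AtoM's schema.
    """
    actor: Actor
    description: Description
    event_type: str                 # illustrative values, e.g. "Creation"
    start: Optional[date] = None
    end: Optional[date] = None


def descriptions_of(actor: Actor, events: List[Event]) -> List[Description]:
    """All descriptions linked to an actor (zero, one, or many)."""
    return [e.description for e in events if e.actor is actor]


def actors_of(description: Description, events: List[Event]) -> List[Actor]:
    """All actors linked to a description (zero, one, or many)."""
    return [e.actor for e in events if e.description is description]


# Made-up example data.
school = Actor("Example Residential School", "School")
order = Actor("Example religious order", "Corporate body")
photos = Description("Example photograph series")
registers = Description("Example admission registers")
events = [
    Event(school, photos, "Creation", date(1950, 1, 1), date(1960, 12, 31)),
    Event(order, photos, "Accumulation"),
    Event(school, registers, "Creation"),
]
```

Here one actor (the school) links to two descriptions, and one description (the photographs) links to two actors.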

Archival descriptions

Archival descriptions provide contextual information about archival materials and are arranged into hierarchical levels (fonds, series, files, items, etc.). AtoM’s default archival description edit template contains data elements based on the General International Standard Archival Description (ISAD(G)). Other edit templates include data elements based on Dublin Core (DC).
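The hierarchical arrangement can be pictured as a simple tree. This is an illustrative model only (the `Unit` class and its methods are hypothetical, not AtoM code), showing how a unit's place in the hierarchy can be recovered by walking from it up to its fonds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """One node in an archival hierarchy (fonds > series > file > item)."""
    title: str
    level: str
    children: List["Unit"] = field(default_factory=list)
    # Back-reference to the parent; excluded from repr/eq to avoid recursion.
    parent: Optional["Unit"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Unit") -> "Unit":
        """Attach a child unit and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Level and title of every unit from the top down to this one."""
        node, parts = self, []
        while node is not None:
            parts.append(f"{node.level}: {node.title}")
            node = node.parent
        return parts[::-1]


# Made-up example hierarchy.
fonds = Unit("Example school fonds", "fonds")
series = fonds.add(Unit("Correspondence", "series"))
item = series.add(Unit("Letter, 1952", "file")).add(Unit("Page 1", "item"))
```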

NCTR’s Authority Records

In October 2022, the NCTR’s online archive included Authority Records for 1,531 Entities (up from 1,521 in August 2021). All but five of these Entities were assigned to one of seven mutually exclusive Entity Types; the remaining five are untyped (Table 1).

Table 1. Distribution of Authority Records by Entity Type (NCTR online archive, accessed October 2022).

Entity Type         Count   Percent
Person                949     62.0%
Corporate body        312     20.4%
School                171     11.2%
Event                  90      5.9%
Community               2      0.1%
Family                  1      0.1%
Medical facility        1      0.1%
Untyped                 5      0.3%
Total               1,531    100.0%
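As a quick arithmetic check, the percentages in Table 1 can be recomputed from the counts. This sketch only re-derives numbers already in the table, assuming each share is rounded to one decimal place.

```python
# Counts transcribed from Table 1.
counts = {
    "Person": 949,
    "Corporate body": 312,
    "School": 171,
    "Event": 90,
    "Community": 2,
    "Family": 1,
    "Medical facility": 1,
    "Untyped": 5,
}

total = sum(counts.values())                # all Authority Records
typed = total - counts["Untyped"]           # records with an Entity Type
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}

for entity_type, n in counts.items():
    print(f"{entity_type:<18}{n:>6}{shares[entity_type]:>8.1f}%")
print(f"{'Total':<18}{total:>6}")
```

The rounded shares add up to 100.1% rather than 100.0% purely because of rounding; the exact shares sum to 100%.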

We’d like to identify the Authority Records that relate directly or indirectly to the Oblates.

Our first thought was simply to search the NCTR online archive for the terms “Oblate” and “Oblates”. Unfortunately, these searches yielded only 55 results: 28 Schools and 27 Corporate bodies (Figure 1). (Selecting the “Roman Catholic” filter instead yielded 65 hits, all of them Indian Residential Schools and none of them affiliated organizations such as the Oblates or the Grey Nuns.)

Figure 1. Search results for the term “Oblates” on the NCTR’s online archive.

Our research indicates that the Oblates partnered with many other Catholic organizations to manage at least 62 Indian Residential Schools. Clearly, we need to examine the Authority Records more closely.

The NCTR’s online archive lists its Authority Records across 154 web pages, which link to HTML versions (“view pages” in AtoM parlance) of its 1,531 Authority Records. We published these links.
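Collecting the view-page links from the list pages comes down to pulling `href` values out of anchor tags. The sketch below uses Python's standard `html.parser`. It assumes, hypothetically, that record links can be recognized by a CSS class passed in as `link_class` and that relative links resolve against a placeholder `base_url`; both would need checking against the real pages. Fetching the 154 pages themselves is left out.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects hrefs of <a> tags whose class attribute includes link_class.

    link_class is a hypothetical marker; inspect the real list pages to
    find what actually distinguishes an Authority Record link.
    """

    def __init__(self, base_url: str, link_class: str):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if href and self.link_class in classes:
            # Resolve relative links against the (placeholder) base URL.
            self.links.append(urljoin(self.base_url, href))


def collect_links(pages: List[str], base_url: str, link_class: str) -> List[str]:
    """Links from all list pages, de-duplicated, in first-seen order."""
    seen, ordered = set(), []
    for html in pages:
        parser = LinkCollector(base_url, link_class)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                ordered.append(link)
    return ordered
```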

One of these links, for example, takes us to the view page of the Amos Residential School’s Authority Record (Figure 2). The AtoM user manual explains how to interpret this sort of page, so we won’t go into detail here.

Figure 2. View page of the Amos Residential School’s Authority Record.

However, we do want to highlight that this resource provides two avenues for exploring Authority Records more thoroughly.

First, the HTML code underlying the view pages of Authority Records includes a good deal of semi-structured data. We extracted these data, dealt with some inconsistencies and redundancies (see details), and then published them.
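One way to pull label/value pairs out of a view page is sketched below with Python's standard `html.parser`. It assumes, as a guess about the theme's markup rather than documented AtoM behaviour, that each field is rendered as a `<div class="field">` holding an `<h3>` label followed by the value. The sample labels are modelled on ISAAR-CPF element names for illustration.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class FieldExtractor(HTMLParser):
    """Collects (label, value) pairs from blocks assumed to look like
    <div class="field"><h3>Label</h3><div>Value</div></div>."""

    def __init__(self):
        super().__init__()
        self.fields: List[Tuple[str, str]] = []
        self._depth = 0            # current <div> nesting depth
        self._field_depth = None   # depth at which the open field <div> started
        self._in_label = False
        self._label: List[str] = []
        self._value: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._depth += 1
            classes = (dict(attrs).get("class") or "").split()
            if self._field_depth is None and "field" in classes:
                self._field_depth = self._depth
                self._label, self._value = [], []
        elif tag == "h3" and self._field_depth is not None:
            self._in_label = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_label = False
        elif tag == "div":
            if self._field_depth == self._depth:
                # Closing the field <div>: emit whitespace-normalized text.
                label = " ".join("".join(self._label).split())
                value = " ".join("".join(self._value).split())
                self.fields.append((label, value))
                self._field_depth = None
            self._depth -= 1

    def handle_data(self, data):
        if self._field_depth is not None:
            (self._label if self._in_label else self._value).append(data)


def extract_fields(html: str) -> List[Tuple[str, str]]:
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Value text is concatenated as-is before whitespace is normalized, so adjacent block elements with no whitespace between them would run together; repeatable fields would need explicit handling.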

Second, there is a link labelled “EAC” at the top right-hand side of an Authority Record’s view page (Figure 2). We used this link to export each Authority Record as an Encoded Archival Context (EAC) XML file. (Encoded Archival Context for Corporate Bodies, Persons, and Families (EAC-CPF) is an XML schema compatible with the ISAAR-CPF standard.) We compressed and then published these XML files.
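The exported files can be read with Python's standard `xml.etree.ElementTree`. This sketch assumes the files declare the standard EAC-CPF namespace (`urn:isbn:1-931666-33-4`) and keep names under `identity/nameEntry/part` and resource relations under `relations/resourceRelation`, as the EAC-CPF schema lays them out. The `summarize_eac` function and the `SAMPLE` record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard EAC-CPF namespace; assumed to be what the exports declare.
NS = {"eac": "urn:isbn:1-931666-33-4"}


def summarize_eac(xml_text: str) -> dict:
    """Pull a few fields out of one EAC-CPF record."""
    root = ET.fromstring(xml_text.strip())

    def first_text(path):
        el = root.find(path, NS)
        return el.text.strip() if el is not None and el.text else None

    # One string per nameEntry, joining its <part> elements.
    names = [
        " ".join((p.text or "").strip() for p in entry.findall("eac:part", NS))
        for entry in root.findall("eac:cpfDescription/eac:identity/eac:nameEntry", NS)
    ]

    relations = []
    for rel in root.findall("eac:cpfDescription/eac:relations/eac:resourceRelation", NS):
        entry = rel.find("eac:relationEntry", NS)
        relations.append({
            "type": rel.get("resourceRelationType"),
            "entry": entry.text.strip() if entry is not None and entry.text else None,
        })

    return {
        "record_id": first_text("eac:control/eac:recordId"),
        "entity_type": first_text("eac:cpfDescription/eac:identity/eac:entityType"),
        "names": names,
        "resource_relations": relations,
    }


# A made-up minimal record for testing.
SAMPLE = """
<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control><recordId>example-001</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>corporateBody</entityType>
      <nameEntry><part>Example Residential School</part></nameEntry>
    </identity>
    <relations>
      <resourceRelation resourceRelationType="subjectOf">
        <relationEntry>Example photograph file</relationEntry>
      </resourceRelation>
    </relations>
  </cpfDescription>
</eac-cpf>
"""
```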

We also partitioned the XML files into four files, because some fields can hold multiple entries per Authority Record:

  1. fields with no more than one entry per Authority Record
  2. fields with one or more authorized names and alternative names per Authority Record
  3. fields with one or more access points and access point entries per Authority Record
  4. fields with one or more resource relations, resources, relations, exist dates, and descriptive notes per Authority Record
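This kind of partitioning amounts to flattening each record into "long" tables: one row per record for single-valued fields, and one row per value, keyed by record ID, for repeatable fields. A minimal sketch, assuming a hypothetical nested input shape (the key names below are illustrative, not taken from the actual files):

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def partition(records: List[dict]) -> Tuple[List[Row], List[Row], List[Row], List[Row]]:
    """Split nested records into four flat tables that all carry record_id.

    Input keys (record_id, entity_type, dates, authorized_names,
    alternative_names, access_points, resource_relations) are hypothetical.
    """
    single, names, access, relations = [], [], [], []
    for rec in records:
        rid = rec["record_id"]
        # 1. Fields with at most one value: one row per record.
        single.append({"record_id": rid,
                       "entity_type": rec.get("entity_type"),
                       "dates": rec.get("dates")})
        # 2. Names: one row per name, tagged authorized or alternative.
        for kind in ("authorized", "alternative"):
            for name in rec.get(f"{kind}_names", []):
                names.append({"record_id": rid, "name_type": kind, "name": name})
        # 3. Access points: one row per (type, term) pair.
        for ap_type, term in rec.get("access_points", []):
            access.append({"record_id": rid, "access_point_type": ap_type, "term": term})
        # 4. Resource relations: one row per relation, with its own dates/notes.
        for rel in rec.get("resource_relations", []):
            relations.append({"record_id": rid, **rel})
    return single, names, access, relations


# A made-up record for illustration.
EXAMPLE = [{
    "record_id": "r1",
    "entity_type": "School",
    "dates": "example dates",
    "authorized_names": ["Example School"],
    "alternative_names": ["Example Mission", "Example IRS"],
    "access_points": [("subject", "Oblates"), ("place", "Example Town")],
    "resource_relations": [{"resource": "Example photo series",
                            "relation": "subjectOf", "note": None}],
}]
```

Each table can then be written out with `csv.DictWriter`; keeping `record_id` in every row is what allows the four files to be joined back together.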

These four files provide the basis for identifying Authority Records that relate directly or indirectly to the Oblates.
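As an illustration of how such tables might be used, the sketch below finds records whose names or access-point terms mention a search term (direct matches), then records that share a related resource with one of those matches (a one-hop, indirect link). The heuristic, the function names, and the column names (`record_id`, `name`, `term`, `resource`) are all hypothetical; this is one possible starting point, not necessarily the approach OBLATES-LOD takes.

```python
from typing import Dict, List, Set


def direct_matches(names: List[Dict[str, str]], access: List[Dict[str, str]],
                   pattern: str = "oblat") -> Set[str]:
    """Record IDs whose names or access-point terms contain the pattern."""
    p = pattern.lower()
    hits = {r["record_id"] for r in names if p in r["name"].lower()}
    hits |= {r["record_id"] for r in access if p in r["term"].lower()}
    return hits


def one_hop(relations: List[Dict[str, str]], seeds: Set[str]) -> Set[str]:
    """Record IDs sharing at least one related resource with a seed record."""
    by_resource: Dict[str, Set[str]] = {}
    for r in relations:
        by_resource.setdefault(r["resource"], set()).add(r["record_id"])
    linked: Set[str] = set()
    for ids in by_resource.values():
        if ids & seeds:
            linked |= ids
    return linked - seeds


# Made-up rows for illustration.
NAMES = [
    {"record_id": "a", "name": "Missionary Oblates of Mary Immaculate"},
    {"record_id": "b", "name": "Example Residential School"},
    {"record_id": "c", "name": "Example parish"},
]
ACCESS = [{"record_id": "d", "term": "Oblate missions"}]
RELATIONS = [
    {"record_id": "a", "resource": "res1"},
    {"record_id": "b", "resource": "res1"},
    {"record_id": "c", "resource": "res2"},
    {"record_id": "d", "resource": "res3"},
]
```

Here record `b` never mentions the Oblates but shares a resource with record `a`, which is the kind of indirect relationship a plain keyword search misses.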