Our start on OBLATES-LOD has focused on extracting data from two of AtoM’s main entity types underlying the NCTR’s online archive: Authority Records and Archival Descriptions.
Here we complete three tasks:
- Extracting data from 89 Authority Records related to the Oblates.
- Collecting HTML and XML versions of the Archival Descriptions of resources that relate to these Authority Records.
- Extracting data from these versions of the Archival Descriptions.
Extracting data from Authority Records related to the Oblates
We have described two approaches to collecting data from Authority Records in the NCTR’s online archive to help us to identify resources that relate to the Oblates of Mary Immaculate (Oblates):
- Collecting semi-structured data from the HTML code underlying the view pages of 1,531 Authority Records.
- Exporting these Authority Records as Encoded Archival Context (EAC) XML files.
Five files result that help us identify Authority Records directly or indirectly related to the Oblates:
- fields within the HTML code underlying the view pages of Authority Records.
And, after partitioning the EAC XML files into four files (to address multiple entries per field per Authority Record):
- fields with no more than one entry per Authority Record
- fields with one or more authorized names and alternative names per Authority Record
- fields with one or more access points and access point entries per Authority Record
- fields with one or more resource relations, resources, relations, exist dates, and descriptive notes per Authority Record
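The partitioning step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample record, the `partition` function, and the output column names are all hypothetical, and the EAC-CPF snippet is heavily simplified. It separates fields that occur once per Authority Record from repeating fields (names and resource relations), so that each repeating value gets its own row.

```python
import xml.etree.ElementTree as ET

# EAC-CPF namespace; the sample record below is invented for illustration.
NS = {"eac": "urn:isbn:1-931666-33-4"}

SAMPLE = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control><recordId>example-school</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>corporateBody</entityType>
      <nameEntry><part>Example Residential School</part></nameEntry>
      <nameEntry><part>Example IRS</part></nameEntry>
    </identity>
    <relations>
      <resourceRelation><relationEntry>Example history document</relationEntry></resourceRelation>
      <resourceRelation><relationEntry>Example photograph</relationEntry></resourceRelation>
    </relations>
  </cpfDescription>
</eac-cpf>"""


def partition(xml_text):
    """Split one record into a single-valued row and two repeating-field tables."""
    root = ET.fromstring(xml_text)
    record_id = root.findtext("eac:control/eac:recordId", namespaces=NS)

    # Fields with no more than one entry per Authority Record.
    single = {
        "record_id": record_id,
        "entity_type": root.findtext(".//eac:entityType", namespaces=NS),
    }
    # Repeating fields: one output row per entry, keyed by the record id.
    names = [
        {"record_id": record_id, "name": part.text}
        for part in root.findall(".//eac:nameEntry/eac:part", NS)
    ]
    resources = [
        {"record_id": record_id, "resource": entry.text}
        for entry in root.findall(".//eac:resourceRelation/eac:relationEntry", NS)
    ]
    return single, names, resources


single, names, resources = partition(SAMPLE)
```

Keying every row by the record identifier is what allows the separate files to be joined again later.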
Files 1 & 2
Files 1 & 2 contain single entries for 62 Indian Residential Schools once managed by the Oblates of Mary Immaculate (Oblates) plus entries for 27 Corporate bodies that include the term “Oblates” in the History/biogHist fields. We extracted, merged, reconciled, and published these 89 entries.
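As a rough sketch of the term-based selection described above, the snippet below keeps records whose history text contains the term “Oblates”. The record layout, the `history` field name, and the `mentions_oblates` helper are assumptions made for illustration; the match is case-insensitive, which may differ from how the original selection was done.

```python
def mentions_oblates(record, field="history"):
    """Return True if the record's history text contains the term 'Oblates'."""
    return "oblates" in (record.get(field) or "").lower()


# Invented sample rows; real rows would come from the extracted files.
records = [
    {"record_id": "a", "history": "Founded by the Missionary Oblates of Mary Immaculate."},
    {"record_id": "b", "history": "A federal department."},
    {"record_id": "c", "history": None},  # missing history is treated as no match
]

selected = [r["record_id"] for r in records if mentions_oblates(r)]
```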
Files 3, 4 & 5
Files 3, 4 & 5 contain one or more entries per Authority Record. We extracted and published entries for the 89 Authority Records related to the Oblates.
Archival descriptions provide contextual information about archival materials and are arranged into hierarchical levels (fonds, series, files, items, etc.). AtoM’s default archival description edit template contains data elements based on the General International Standard Archival Description (ISAD(G)). Other edit templates include data elements based on Dublin Core (DC).
File 5 contains 18,682 links to resources in 89 Authority Records related to the Oblates.
Figure 1 shows the view page of the Archival Description of the Amos Residential School’s Bilingual One Page History:
The AtoM user manual discusses how to read this sort of resource – so we won’t go into details.
We collected and compressed the HTML code underlying the view pages of these Archival Descriptions into a ZIP file.
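Compressing collected pages into a single archive might look like the sketch below, which uses Python's standard `zipfile` module. The page slugs, the HTML strings, and the `zip_pages` helper are hypothetical, and the archive is built in memory here only to keep the example self-contained.

```python
import io
import zipfile


def zip_pages(pages):
    """Write each page's HTML into a ZIP archive and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for slug, html in pages.items():
            # One .html member per Archival Description, named by its slug.
            archive.writestr(f"{slug}.html", html)
    return buffer.getvalue()


# Invented sample pages standing in for the collected view pages.
pages = {
    "example-one-page-history": "<html><body>History</body></html>",
    "example-photograph": "<html><body>Photograph</body></html>",
}

data = zip_pages(pages)

# Reopen the archive to confirm which members it contains.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = archive.namelist()
```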
Next, we want to highlight two links at the top right-hand side of the view page of the Archival Description in Figure 1.
The first link – labelled “Dublin Core 1.1 XML” – supports the export of the Archival Description as DC XML. The DC XML export is not hierarchical – that is, child records are not included. We used these links to access both machine-readable versions of the Archival Descriptions related to the Oblates. We compressed the DC XML files for these Archival Descriptions into a ZIP file.
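To illustrate how fields could be read from a DC XML export, here is a minimal sketch. The sample document, its `record` wrapper element, and the `dc_fields` helper are assumptions; only the Dublin Core 1.1 element namespace is standard. The function collects every element in that namespace, whatever the wrapper looks like, and keeps repeated elements as lists.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard Dublin Core 1.1 element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample; the wrapper element of a real export may differ.
SAMPLE_DC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example one page history</dc:title>
  <dc:subject>Example subject A</dc:subject>
  <dc:subject>Example subject B</dc:subject>
  <dc:language>en</dc:language>
</record>"""


def dc_fields(xml_text):
    """Group the text of all Dublin Core elements by element name."""
    fields = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields[name].append((elem.text or "").strip())
    return dict(fields)


fields = dc_fields(SAMPLE_DC)
```

Because the export is flat, a simple name-to-values mapping like this is enough; no parent and child structure needs to be preserved.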
The second link – labelled “EAD 2002 XML” – supports the export of a single Encoded Archival Description (EAD) XML file, created on February 17, 2021, that covers the entire “TRC Document Collection” held by the NCTR.