The NCTR’s online archive lists its Authority Records in a series of web pages, which link to a view page for each record. We extracted a good deal of data from the HTML code underlying these view pages. Unfortunately, these data required considerable processing.
First, we needed to sort hundreds of values of different types that had been entered haphazardly in catch-all fields named “Creator of”, “Creator of 2”, “Creator of 3”, …, “Creator of 13”. Second, many entries in these “Creator of n” fields were tagged with labels that were nearly identical to the names of other (primary) fields (Table 1). We found that every one of these tagged entries in a “Creator of n” field was matched by the entry in the corresponding (primary) field. We therefore considered these tagged entries to be redundant and ultimately discarded them.
Name of primary field | Related label in a “Creator of n” field |
Parallel Forms of Name | Parallel form(s) of name |
Other Forms of Name | Other form(s) of name |
History | History |
Sources | Sources |
Functions Occupations and Activities | Functions, occupations and activities |
Places | Places |
Subject Access Points | Subject Access Points |
Place Access Points | Place Access Points |
Internal Structures Genealogy | Internal structures/genealogy |
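The label matching summarized in Table 1 can be sketched in Python as follows. This is a minimal illustration, not the procedure we actually ran: the helper names (`normalize`, `primary_field_for`, `split_redundant`), the representation of a record as a dictionary, and the representation of “Creator of n” entries as (label, value) pairs are all assumptions made for the example.

```python
import re

# Primary field names, as listed in Table 1.
PRIMARY_FIELDS = [
    "Parallel Forms of Name",
    "Other Forms of Name",
    "History",
    "Sources",
    "Functions Occupations and Activities",
    "Places",
    "Subject Access Points",
    "Place Access Points",
    "Internal Structures Genealogy",
]


def normalize(label):
    """Lowercase, drop parentheses, and turn other punctuation into spaces."""
    label = label.lower().replace("(", "").replace(")", "")
    label = re.sub(r"[^a-z0-9]+", " ", label)
    return " ".join(label.split())


# Lookup from normalized label to the primary field name.
PRIMARY_BY_KEY = {normalize(name): name for name in PRIMARY_FIELDS}


def primary_field_for(label):
    """Return the primary field a label refers to, or None if there is none."""
    return PRIMARY_BY_KEY.get(normalize(label))


def split_redundant(record, creator_entries):
    """Separate entries that merely repeat a primary field from the rest.

    `record` maps primary field names to their values (hypothetical layout);
    `creator_entries` is a list of (label, value) pairs, with an empty label
    for untagged entries.
    """
    kept, dropped = [], []
    for label, value in creator_entries:
        field = primary_field_for(label) if label else None
        if field is not None and record.get(field, "").strip() == value.strip():
            dropped.append((label, value))
        else:
            kept.append((label, value))
    return kept, dropped
```

Under these assumptions, `primary_field_for("Internal structures/genealogy")` resolves to “Internal Structures Genealogy”, and a tagged entry is dropped only when its value also matches the primary field, so any unexpected mismatch would remain visible in the kept list.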
Finally, many of the remaining entries in “Creator of n” fields referenced multiple target entities. Although each of these entries gives the total number of targets, it names no more than ten of them. We had to look elsewhere for a possible way to identify the targets that were not named.
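The gap described above can be quantified with a short check. The sketch below assumes each entry has already been parsed into a stated total and a list of named targets. The `CreatorEntry` class and its field names are hypothetical and do not reflect the archive’s own data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreatorEntry:
    """One parsed “Creator of n” entry (hypothetical structure)."""
    total: int        # number of targets the entry says it has
    named: List[str]  # targets actually named in the entry

    @property
    def missing(self):
        """How many targets are counted in the total but not named."""
        return max(self.total - len(self.named), 0)


def truncated_entries(entries):
    """Return the entries whose stated total exceeds the names they list."""
    return [entry for entry in entries if entry.missing > 0]
```

Summing `missing` over the entries returned by `truncated_entries` would give a rough count of how many targets have to be recovered from some other source.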