Identify, obtain, explore

using NLP to link article and journal records in the NHM library catalogue

Authors

Keywords:

natural language processing, serials cataloguing, linked data

Abstract

This paper addresses the need to link related records in library catalogues, particularly for aggregated volumes and serial articles in physical collections, to improve user access and discoverability. The Natural History Museum (NHM) Library, facing a significant backlog of unlinked article-level records within its physical journal holdings, developed a semi-automated solution. We outline a two-stage pipeline that uses natural language processing (NLP) and record linkage techniques to automate the matching of article (child) records to journal (parent) records. This approach, which relies on title similarity and metadata extraction, achieved 60% accuracy when evaluated against a previously linked dataset. We discuss the challenges that article-level records can present and how such computational methods can optimise metadata workflows, allowing human effort to be redirected to more complex tasks and improving access across vast collections with limited resources.
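As a rough illustration of the kind of two-stage matching the abstract describes, the sketch below first extracts a host journal title from an article record's free-text note and then picks the most similar journal title. It is not the authors' pipeline. It substitutes a regular expression for NLP-based extraction and the standard library's `difflib` for a dedicated fuzzy-matching library, and the function names, the "In: ..." note format, the record identifiers and the 0.85 threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def extract_host_title(note):
    """Stage 1 (assumed note format): pull the journal title out of an 'In: ...' note."""
    match = re.match(r"\s*In:\s*(.+?)(?:\.\s|$)", note, re.IGNORECASE)
    return match.group(1).strip() if match else None


def normalise(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_article(note, journals, threshold=0.85):
    """Stage 2: return (journal_id, score) for the best parent, or (None, score)."""
    host_title = extract_host_title(note)
    if host_title is None:
        return None, 0.0
    best_id, best_score = None, 0.0
    for journal_id, journal_title in journals.items():
        score = similarity(host_title, journal_title)
        if score > best_score:
            best_id, best_score = journal_id, score
    if best_score < threshold:
        # Below the (hypothetical) threshold: leave the record for manual review.
        return None, best_score
    return best_id, best_score


# Hypothetical parent records keyed by made-up identifiers.
journals = {
    "J001": "Annals and Magazine of Natural History",
    "J002": "Journal of Natural History",
}

note = "In: Annals and magazine of natural history. Vol. 12 (1893), p. 45-60"
print(link_article(note, journals))
```

Records that fall below the threshold are returned unlinked rather than forced onto the nearest title, which mirrors the semi-automated idea of leaving harder cases to a cataloguer.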

References

Bachmann, M. (2024) rapidfuzz/RapidFuzz (v3.11.0) [Software]. Zenodo. Available at: https://doi.org/10.5281/zenodo.14509091 [Accessed: 31 May 2025]

Breeding, M. (2007) ‘Next-Generation Library Catalogs: Chapter 1 Introduction’, Library Technology Reports, 43(4). Available at: https://librarytechnology.org/document/18344/ [Accessed: 31 May 2025]

British Film Institute (no date) Simple Search. Available at: https://collections-search.bfi.org.uk/web/search/simple [Accessed: 31 May 2025]

de Bruin, J. (2023) Python Record Linkage Toolkit: A toolkit for record linkage and duplicate detection in Python (v0.16) [Software]. Zenodo. Available at: https://doi.org/10.5281/zenodo.8169000 [Accessed: 31 May 2025]

Ex Libris (no date a) Introduction to Alma Inventory. Available at: https://knowledge.exlibrisgroup.com/Alma/Product_Documentation/010Alma_Online_Help_(English)/040Resource_Management/050Inventory/010Introduction_to_Alma_Inventory# [Accessed: 31 May 2025]

Ex Libris (no date b) Mapping to the Display, Facets, and Search Sections in the Primo VE Record. Available at: https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/020Primo_VE/Primo_VE_(English)/120Other_Configurations/Mapping_to_the_Display%2C_Facets%2C_and_Search_Sections_in_the_Primo_VE_Record#MARC21_and_KORMARC_Resource_Type_Mapping [Accessed: 31 May 2025]

Honnibal, M., Montani, I., Van Landeghem, S., and Boyd, A. (2024) spaCy: Industrial-strength Natural Language Processing in Python (v3.7.5) [Software]. Zenodo. Available at: https://doi.org/10.5281/zenodo.1212303 [Accessed: 31 May 2025]

Library of Congress (2016) Leader (NR). Available at: https://www.loc.gov/marc/bibliographic/bdleader.html [Accessed: 31 May 2025]

Riva, P., Le Bœuf, P., and Žumer, M. (2024) IFLA Library Reference Model: A Conceptual Model for Bibliographic Information. International Federation of Library Associations and Institutions (IFLA). Available at: https://repository.ifla.org/handle/20.500.14598/40.2 [Accessed: 31 May 2025]

Sonawane, C.S. (2017) ‘Library Discovery System: An Integrated Approach to Resource Discovery’, Informatics Studies, 4(3), pp. 27-38.

Published

2025-06-17

Section

Articles