Smart enough to mislead

The functional shortcomings and ethical dilemmas of generative AI use in metadata work

Keywords:

generative AI, metadata creation, metadata enhancement, AI ethics

Abstract

This article critically examines the applicability of generative AI in library metadata creation and cataloguing, arguing that despite growing interest and experimentation, such technologies remain fundamentally unsuited for this domain. Drawing on recent literature, surveys, and institutional case studies, the author demonstrates that generative AI tools consistently produce metadata outputs that are unreliable, inconsistent, and ethically problematic. While machine learning offers potential in specific, supervised metadata functions, generative AI’s reliance on probabilistic outputs, lack of transparency, and tendency to hallucinate undermine the accuracy and reliability essential to cataloguing. The article also explores the broader ethical implications of AI adoption in libraries, including issues of bias, environmental impact, copyright concerns, and labour exploitation. The author argues that fully automated metadata creation using generative AI is neither technically viable nor ethically responsible and instead advocates for cautious, critically informed AI integration, emphasising the continued necessity of human oversight and ethical scrutiny in metadata work.

References

Amram, T., Malamud, R. G. and Hollingsworth, C. (2023) Response to "From ChatGPT to CatGPT". Information Technology and Libraries, 42(4). Available at: https://doi.org/10.5860/ital.v42i4.16983

Bagchi, S. (2023) What is a black box? A computer scientist explains what it means when the inner workings of AIs are hidden. The Conversation, 22 May. Available at: https://theconversation.com/what-is-a-black-box-a-computer-scientist-explains-what-it-means-when-the-inner-workings-of-ais-are-hidden-203888 [Accessed: 24 May 2025]

Baldé, C. P., Kuehr, R., Yamamoto, T., McDonald, R., D’Angelo, E., Althaf, S., Bel, G., Deubzer, O., Fernandez-Cubillo, E., Forti, V., Gray, V., Heart, S., Honda, S., Iattoni, G., Khetriwal, D. S., Luda di Cortemiglia, V., Lobuntsova, Y., Nnorom, I., Pralat, N. and Wagner, M. (2024) Global E-waste Monitor 2024. Geneva/Bonn: International Telecommunication Union and United Nations Institute for Training and Research. Available at: https://ewastemonitor.info/wp-content/uploads/2024/12/GEM_2024_EN_11_NOV-web.pdf [Accessed: 28 May 2025]

Ball, C. (2025) The Unethical Underbelly of AI: A Call for Universities to Take a Stand. UKSG News, 586. Available at: https://www.uksg.org/newsletter/uksg-enews-586/enews-586-editorial/ [Accessed: 21 May 2025]

Barker, C. (2024) Artificial intelligence and the environment: Taking a responsible approach. JISC Artificial Intelligence, 18 September. Available at: https://nationalcentreforai.jiscinvolve.org/wp/2024/09/18/artificial-intelligence-and-the-environment-taking-a-responsible-approach/ [Accessed: 24 May 2025]

Barker, C. (2025) Artificial intelligence and the environment: The current landscape. JISC Artificial Intelligence, 28 March. Available at: https://nationalcentreforai.jiscinvolve.org/wp/2025/03/28/artificial-intelligence-and-the-environment-the-current-landscape/ [Accessed: 24 May 2025]

Barratt, L. and Gambarini, C. (2025) Revealed: Big tech’s new datacentres will take water from the world’s driest areas. The Guardian (Online), 9 April. Available at: https://www.theguardian.com/environment/2025/apr/09/big-tech-datacentres-water [Accessed: 9 April 2025]

Battersby, M. (2024) Academic authors 'shocked' after Taylor & Francis sells access to their research to Microsoft AI. The Bookseller, 19 July. Available at: https://www.thebookseller.com/news/academic-authors-shocked-after-taylor--francis-sells-access-to-their-research-to-microsoft-ai [Accessed: 24 May 2025]

Ben Abacha, A., Yim, W.-W., Fu, Y., Sun, Z., Yetisgen, M., Xia, F. and Lin, T. (2025) MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes. Preprint. Available at: https://doi.org/10.48550/arXiv.2412.19260

Berkowitz, A. E. (2025) “Slow-MO or FOMO”: AI conversations at library conferences. Public Services Quarterly, 21(1), pp. 51-70. Available at: https://doi.org/10.1080/15228959.2024.2442657

Blouin, L. (2023) AI's mysterious ‘black box’ problem, explained. University of Michigan-Dearborn News, 6 March. Available at: https://umdearborn.edu/news/ais-mysterious-black-box-problem-explained [Accessed: 24 May 2025]

Brzustowicz, R. (2023) From ChatGPT to CatGPT: The Implications of Artificial Intelligence on Library Cataloging. Information Technology and Libraries, 42(3). Available at: https://doi.org/10.5860/ital.v42i3.16295

Chen, S. and Li, M. (2024) AI for Cataloging and Metadata Creation: Perspectives and Future Opportunities from Cataloging and Metadata Professionals. Technical Services Quarterly, 41(4), pp. 317-332. Available at: https://doi.org/10.1080/07317131.2024.2394919

Chow, E. H. C., Kao, T. J. and Li, X. (2024) An Experiment with the Use of ChatGPT for LCSH Subject Assignment on Electronic Theses and Dissertations. Cataloging & Classification Quarterly, 62(5), pp. 574-588. Available at: https://doi.org/10.1080/01639374.2024.2394516

Chowdhury, N., Johnson, D., Huang, V., Steinhardt, J. and Schwettmann, S. (2025) Investigating truthfulness in a pre-release o3 model. Available at: https://transluce.org/investigating-o3-truthfulness [Accessed: 24 May 2025]

Constantino, T. (2025) U.S. Copyright Office Shocks Big Tech With AI Fair Use Rebuke. Forbes, 29 May. Available at: https://www.forbes.com/sites/torconstantino/2025/05/29/us-copyright-office-shocks-big-tech-with-ai-fair-use-rebuke/ [Accessed: 31 May 2025]

Cornish, B. and Scott, B. (2025) Identify, Obtain, Explore: using NLP to link article and journal records in the NHM library catalogue. Catalogue & Index, 211, pp. 20-27. Available at: https://journals.cilip.org.uk/catalogue-and-index/article/view/749

Corrado, E. M. (2021) Artificial Intelligence: The Possibilities for Metadata Creation. Technical Services Quarterly, 38(4), pp. 395-405. Available at: https://doi.org/10.1080/07317131.2021.1973797

Crawford, K. (2024) Generative AI’s environmental costs are soaring - and mostly secret. Nature, 626, p. 693. Available at: https://doi.org/10.1038/d41586-024-00478-x

Crownhart, C. (2024) AI will add to the e-waste problem. Here’s what we can do about it. MIT Technology Review, 28 October. Available at: https://www.technologyreview.com/2024/10/28/1106316/ai-e-waste/ [Accessed: 24 May 2025]

Deutsche Nationalbibliothek (2023) Launch of cataloguing machine EMa. Available at: https://jahresbericht.dnb.de/Webs/jahresbericht/EN/2022/Hoehepunkte/Erschliessungsmaschine/erschliessungsmaschine_node.html [Accessed: 24 May 2025]

DeZelar-Tiedman, C. (2023) Response to "From ChatGPT to CatGPT". Information Technology and Libraries, 42(4). Available at: https://doi.org/10.5860/ital.v42i4.16991

Eaton, L. (2024) Research Insights #12: Copyrights and Academia: Scholarly authors are not going to be happy... AI+Edu=Simplified, 23 July. Available at: https://aiedusimplified.substack.com/p/research-insights-12-copyrights-and [Accessed: 24 May 2025]

ExLibris (no date a) AI Bibliographic Records Enrichment. Available at: https://knowledge.exlibrisgroup.com/Content/Knowledge_Articles/Alma/Knowledge_Articles/AI_Bibliographic_Records_Enrichment [Accessed: 31 May 2025]

ExLibris (no date b) AI Metadata Enrichment for Libraries. Available at: https://knowledge.exlibrisgroup.com/Alma/Product_Materials/010Roadmap/AI_Metadata_Enrichment_for_Libraries [Accessed: 31 May 2025]

Floyd, D. (2023) Response to "From ChatGPT to CatGPT". Information Technology and Libraries, 42(4). Available at: https://doi.org/10.5860/ital.v42i4.16995

Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., Yu, T., Zhang, R. and Ahmed, N. K. (2024) Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 50(3), pp. 1097-1179. Available at: https://doi.org/10.1162/coli_a_00524

Goldman Sachs (2024) AI, data centers and the coming US power demand surge. Available at: https://www.goldmansachs.com/pdfs/insights/pages/generational-growth-ai-data-centers-and-the-coming-us-power-surge/report.pdf [Accessed: 24 May 2025]

Golub, K., Suominen, O., Mohammed, A.T., Aagaard, H. and Osterman, O. (2024) Automated Dewey Decimal Classification of Swedish library metadata using Annif software. Journal of Documentation, 80(5), pp. 1057-1079. Available at: https://doi.org/10.1108/JD-01-2022-0026

Google (no date) What is Retrieval-Augmented Generation (RAG)? Available at: https://cloud.google.com/use-cases/retrieval-augmented-generation [Accessed: 3 June 2025]

Hicks, M. T., Humphries, J. and Slater, J. (2024) ChatGPT is bullshit. Ethics and Information Technology, 26(2). Available at: https://doi.org/10.1007/s10676-024-09775-5

Inkinen, J., Lehtinen, M. and Suominen, O. (2025) Annif Users Survey: Understanding Usage and Challenges. National Library of Finland. Available at: https://urn.fi/URN:ISBN:978-952-84-1301-1 [Accessed: 24 May 2025]

Kosinski, M. (no date) What is black box artificial intelligence (AI)? Available at: https://www.ibm.com/think/topics/black-box-ai [Accessed: 31 May 2025]

Library of Congress (no date) Exploring Computational Description. Available at: https://labs.loc.gov/work/experiments/ECD [Accessed: 24 May 2025]

Lowagie, H. (2024) Harnessing Power Apps and AI for Automated Cataloguing: Innovations in Bibliographic Record Creation. Catalogue & Index, 209. Available at: https://journals.cilip.org.uk/catalogue-and-index/article/view/697 [Accessed: 1 May 2025]

Luccioni, A. S., Jernite, Y. and Strubell, E. (2024) Power Hungry Processing: Watts driving the cost of AI deployment?, FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency, Rio de Janeiro, 3-6 June. Available at: https://doi.org/10.1145/3630106.3658542

Luccioni, A. S., Viguier, S. and Ligozat, A.-L. (2023) Estimating the carbon footprint of BLOOM, a 176B parameter language model. Journal of Machine Learning Research, 24(253), pp. 1-15. Available at: https://jmlr.org/papers/v24/23-0069.html [Accessed: 31 May 2025]

Luccioni, S., Trevelin, B. and Mitchell, M. (2024) The Environmental Impacts of AI - Policy Primer. Hugging Face Blog, 3 September. Available at: https://doi.org/10.57967/hf/3004

Metz, C. and Weise, K. (2025) AI is getting more powerful, but its hallucinations are getting worse. New York Times (Online), 5 May. Available at: https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html [Accessed: 31 May 2025]

Mollick, E. (2025a) The End of Search, The Beginning of Research: The first narrow agents are here. One Useful Thing, 3 February. Available at: https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of [Accessed: 6 April 2025]

Mollick, E. (2025b) A new generation of AIs: Claude 3.7 and Grok 3. One Useful Thing, 24 February. Available at: https://www.oneusefulthing.org/p/a-new-generation-of-ais-claude-37 [Accessed: 6 April 2025]

Moulaison-Sandy, H. and Coble, Z. (2024) Leveraging AI in Cataloging: What Works, and Why? Technical Services Quarterly, 41(4), pp. 375-383. Available at: https://doi.org/10.1080/07317131.2024.2394912

Murgia, M., Clark, D., Learner, S., de la Torre Arenas, I., Joiner, S., Hemingway, E. and Hawkins, O. (2023) Generative AI exists because of the transformer. Financial Times, 12 September. Available at: https://ig.ft.com/generative-ai/ [Accessed: 24 May 2025]

National Library of Finland (2025) Annif. Available at: https://annif.org/ [Accessed: 24 May 2025]

OCLC (2025) Implementing AI to further scale and accelerate WorldCat de-duplication. Available at: https://www.oclc.org/en/news/announcements/2025/ai-worldcat-deduplication.html [Accessed: 24 May 2025]

OpenAI (2025) OpenAI o3 and o4-mini System Card. Available at: https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf [Accessed: 1 June 2025]

Poley, C., Uhlmann, S., Busse, F., Jacobs, J.-H., Kähler, M., Nagelschmidt, M. and Schumacher, M. (2025) Automatic Subject Cataloguing at the German National Library. Liber Quarterly, 35. Available at: https://doi.org/10.53377/lq.19422

Program for Cooperative Cataloging (2024) PCC Task Group on Strategic Planning for AI and Machine Learning: Final Report Transmittal & Tracking Sheet. Available at: https://www.loc.gov/aba/pcc/taskgroup/TG-Strategic-Planning-AI-final-report.pdf [Accessed: 21 May 2025]

Reisner, A. (2025) The Unbelievable Scale of AI’s Pirated-Books Problem. The Atlantic (Online), 20 March. Available at: https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/ [Accessed: 28 May 2025]

Resnik, P. (2025) Large Language Models Are Biased Because They Are Large Language Models. To be published in Computational Linguistics [Peer-reviewed accepted version]. Available at: https://doi.org/10.1162/coli_a_00558

Ren, S. and Wierman, A. (2024) The Uneven Distribution of AI’s Environmental Impacts. Harvard Business Review, 15 July. Available at: https://hbr.org/2024/07/the-uneven-distribution-of-ais-environmental-impacts [Accessed: 24 May 2025]

Rosser, C. and Hanegan, M. (2024) Cyborgs and Centaurs, Prophets and Priests: Anywhere Left for Curators? Atla, 17 April. Available at: https://www.atla.com/blog/cyborgs-and-centaurs-prophets-and-priests-anywhere-left-for-curators/ [Accessed: 28 May 2025]

Saccucci, C. and Potter, A. (2024a) Exploring Computational Description: LC Labs Planning Framework in Action, OCLC RLP Webinar, 12 March. Available at: https://www.oclc.org/research/events/2024/ai-planning-framework-in-action.html [Accessed: 15 May 2025]

Saccucci, C. and Potter, A. (2024b) Exploring Machine Learning: A Cataloging Experiment at the Library of Congress, PCC Operations Committee Meeting, 2 May. Available at: https://www.loc.gov/aba/pcc/documents/OpCo-2024/Potter-Saccucci-Machine-Learning.pdf [Accessed: 24 May 2025]

Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S. and Farajtabar, M. (2025) The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Available at: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf [Accessed: 10 June 2025]

Strubell, E., Ganesh, A. and McCallum, A. (2020) Energy and policy considerations for modern deep learning research. Proceedings of the AAAI Conference on Artificial Intelligence, 34(9), pp. 13693-13696. Available at: https://doi.org/10.1609/aaai.v34i09.7123

Suominen, O., Inkinen, J., Virolainen, T., Fürneisen, M., Kinoshita, B. P., Veldhoen, S., Sjöberg, M., Zumstein, P., Neatherway, R. and Lehtinen, M. (2023) Annif (v1.0.0). Zenodo. Available at: https://doi.org/10.5281/zenodo.8262313

Taniguchi, S. (2024) Creating and Evaluating MARC 21 Bibliographic Records Using ChatGPT. Cataloging & Classification Quarterly, 62(5), pp. 527-546. Available at: https://doi.org/10.1080/01639374.2024.2394513

Uhlmann, S. and Grote, C. (2021) Automatic subject indexing with Annif at the German National Library (DNB). Semantic Web in Libraries, Online, 29 November – 3 December. Available at: https://swib.org/swib21/slides/03-02-uhlmann.pdf [Accessed: 24 May 2025]

UN Environment Programme (2024) AI has an environmental problem. Here’s what the world can do about that. Available at: https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about [Accessed: 24 May 2025]

United States Copyright Office (2025) Copyright and artificial intelligence Part 3: Generative AI Training. Pre-publication version. Washington: United States Copyright Office. Available at: https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf [Accessed: 1 June 2025]

Wang, P., Zhang, L.-Y., Tzachor, A. and Chen, W.-Q. (2024) E-waste challenges of generative artificial Intelligence. Nature Computational Science, 4, pp. 818-823. Available at: https://doi.org/10.1038/s43588-024-00712-6

Weinryb-Grohsgal, L., Potter, A. and Saccucci, C. (2024) Could Artificial Intelligence Help Catalog Thousands of Digital Library Books? An Interview with Abigail Potter and Caroline Saccucci. The Signal, 19 November. Available at: https://blogs.loc.gov/thesignal/2024/11/could-artificial-intelligence-help-catalog-thousands-of-digital-library-books-an-interview-with-abigail-potter-and-caroline-saccucci/ [Accessed: 24 May 2025]

Wu, J., Gan, W., Chen, Z., Wan, S. and Yu, P. S. (2023) Multimodal Large Language Models: A Survey. 2023 IEEE International Conference on Big Data, Sorrento, 15-18 December. Available at: https://doi.org/10.1109/BigData59044.2023.10386743

York, E. and Hanegbi, D. (2024) Metadata Enrichment using AI First Glance at Research and Findings. Available at: https://knowledge.exlibrisgroup.com/@api/deki/files/166843/AI_enrichment_-_March_27.pdf?revision=1 [Accessed: 24 May 2025]

Zewe, A. (2025) Explained: Generative AI’s environmental impact. MIT News, 17 January. Available at: https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117 [Accessed: 24 May 2025]

Zhang, M. (2024) Data Center Water Usage: A Comprehensive Guide. Available at: https://dgtlinfra.com/data-center-water-usage/ [Accessed: 24 May 2025]

Published

2025-06-17

Section

Articles