Publication details

Perspectives on Large Language Models for Relevance Judgment

Authored by
Guglielmo Faggioli, Charles L.A. Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, Henning Wachsmuth, Laura Dietz
Abstract

When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments, but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows categorizing different relevance judgment strategies based on how much humans rely on machines. For the extreme point of 'fully automated judgments', we further include a pilot experiment on whether LLM-based relevance judgments correlate with judgments from trained human assessors. We conclude the paper by providing opposing perspectives for and against the use of LLMs for automatic relevance judgments, and a compromise perspective, informed by our analyses of the literature, our preliminary experimental evidence, and our experience as IR researchers.

Organizational unit(s)
Fachgebiet Maschinelle Sprachverarbeitung (Natural Language Processing)
External organization(s)
University of Padua
University of Waterloo
University of Queensland
Friedrich-Schiller-Universität Jena
Spotify
Research Organization of Information and Systems National Institute of Informatics
Universiteit van Amsterdam (UvA)
Universität Leipzig
Bauhaus-Universität Weimar
University of New Hampshire
Type
Paper in conference proceedings
Pages
39-50
Number of pages
12
Publication date
9 August 2023
Publication status
Published
Peer-reviewed
Yes
ASJC Scopus subject areas
Computer Science (miscellaneous), Information Systems
Electronic version(s)
https://doi.org/10.48550/arXiv.2304.09161 (Access: Open)
https://doi.org/10.1145/3578337.3605136 (Access: Open)