Argument Search: Assessing Argument Relevance
- Authors
- Martin Potthast, Lukas Gienapp, Florian Euchner, Nick Heilenkötter, Nico Weidmann, Henning Wachsmuth, Benno Stein, Matthias Hagen
- Abstract
We report on the first user study on assessing argument relevance. Based on a search among more than 300,000 arguments, four standard retrieval models are compared on 40 topics for 20 controversial issues: every issue has one topic with a biased stance and another neutral one. Following TREC, the top results of the different models on a topic were pooled and relevance-judged by one assessor per topic. The assessors also judged the arguments' rhetorical, logical, and dialectical quality, the results of which were cross-referenced with the relevance judgments. Furthermore, the assessors were asked for their personal opinion, and whether it matched the predefined stance of a topic. Among other results, we find that Terrier's implementations of DirichletLM and DPH are on par, significantly outperforming TFIDF and BM25. The judgments of relevance and quality hardly correlate, giving rise to a more diverse set of ranking criteria than relevance alone. We did not measure a significant bias of assessors when their stance is at odds with a topic's stance.
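The judging setup follows the TREC pooling protocol: each retrieval model's top results for a topic are merged into a single pool, and one assessor judges every pooled argument exactly once, regardless of how many runs retrieved it. Below is a minimal sketch of depth-k pooling; the function name, pool depth, and example rankings are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List, Set

def depth_pool(runs: Dict[str, List[str]], depth: int = 20) -> Set[str]:
    """Union of the top-`depth` results of every run (TREC-style pooling).

    `runs` maps a model name (e.g. "BM25") to its ranked list of
    argument ids for one topic; the returned pool is the set of
    arguments a single assessor would judge for that topic.
    """
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

# Hypothetical rankings for one topic; the paper's actual pool depth
# and run data are not stated here.
runs = {
    "TFIDF":       ["a12", "a07", "a33", "a55"],
    "BM25":        ["a07", "a12", "a90", "a33"],
    "DPH":         ["a90", "a55", "a07", "a02"],
    "DirichletLM": ["a90", "a07", "a55", "a12"],
}
print(sorted(depth_pool(runs, depth=3)))  # each argument judged once, not per run
```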
- External organisation(s)
- Universität Leipzig
- Universität Stuttgart
- Universität Bremen
- Karlsruher Institut für Technologie (KIT)
- Universität Paderborn
- Bauhaus-Universität Weimar
- Martin-Luther-Universität Halle-Wittenberg
- Type
- Article in conference proceedings
- Pages
- 1117-1120
- Number of pages
- 4
- Publication date
- 18 July 2019
- Publication status
- Published
- Peer-reviewed
- Yes
- ASJC Scopus subject areas
- Information Systems, Applied Mathematics, Software
- Electronic version(s)
- https://doi.org/10.1145/3331184.3331327 (Access: Closed)