Publication details

Learning Efficient Information Extraction on Heterogeneous Texts

Authored by
Henning Wachsmuth, Benno Stein, Gregor Engels
Abstract

From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.

External organisation(s)
Universität Paderborn
Bauhaus-Universität Weimar
Type
Paper in conference proceedings
Pages
534-542
Number of pages
9
Publication date
10.2013
Publication status
Published
ASJC Scopus subject areas
Artificial intelligence, Software
Electronic version(s)
https://aclanthology.org/I13-1061 (Access: Open)