Constructing Efficient Information Extraction Pipelines

Authors: Henning Wachsmuth, Benno Stein, Gregor Engels
Abstract: Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
External organization(s): Universität Paderborn
Bauhaus-Universität Weimar
Type: Paper in conference proceedings
Pages: 2237-2240
Number of pages: 4
Publication date: October 2011
Publication status: Published
ASJC Scopus subject areas: General Decision Sciences, General Business, Management and Accounting
Electronic version(s): https://doi.org/10.1145/2063576.2063935 (access: closed)
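The abstract's central observation is that the order in which a pipeline's algorithms run determines how much text each of them has to process. The toy sketch below illustrates that effect under simplifying assumptions; it is not the method from the paper. The stage names, per-sentence costs, keep rates and the `REQUIRES` precedence relation are all hypothetical, and the sketch assumes that stages discard sentences independently of one another. It scores every admissible order by its expected cost per sentence and selects the cheapest order by exhaustive search, which is only feasible for a handful of stages.

```python
from itertools import permutations

# Hypothetical stages: name -> (cost per sentence in ms, fraction of sentences kept).
STAGES = {
    "time_recognizer": (2.0, 0.3),
    "money_recognizer": (5.0, 0.5),
    "forecast_classifier": (20.0, 0.2),
}

# Hypothetical input requirements: a stage may only run after its prerequisites.
REQUIRES = {"forecast_classifier": {"time_recognizer"}}


def is_admissible(order, requires):
    """Return True if every stage runs after all stages it depends on."""
    done = set()
    for name in order:
        if not requires.get(name, set()) <= done:
            return False
        done.add(name)
    return True


def expected_cost(order, stages):
    """Expected run-time per input sentence: each stage only sees the sentences
    that all earlier stages kept (independent keep rates assumed)."""
    total, surviving = 0.0, 1.0
    for name in order:
        cost, keep = stages[name]
        total += surviving * cost
        surviving *= keep
    return total


def best_schedule(stages, requires):
    """Exhaustively search all admissible orders for the cheapest one."""
    candidates = [o for o in permutations(stages) if is_admissible(o, requires)]
    return min(candidates, key=lambda o: expected_cost(o, stages))


if __name__ == "__main__":
    best = best_schedule(STAGES, REQUIRES)
    print(best, expected_cost(best, STAGES))
```

With these made-up numbers, the cheapest admissible order costs 6.5 ms per sentence and the most expensive one costs 9.0 ms. The gap arises because the costly classifier runs last in the cheapest order and therefore sees only 15% of the sentences.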

BibTeX

@inproceedings{546437cc6b654257b4d4c722f3a093ea,
title = "Constructing Efficient Information Extraction Pipelines",
abstract = "Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much {"}efficiency potential{"} depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.",
keywords = "information extraction, run-time efficiency",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2011",
month = oct,
doi = "10.1145/2063576.2063935",
language = "English",
isbn = "9781450307178",
pages = "2237--2240",
booktitle = "CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",
note = "20th ACM Conference on Information and Knowledge Management, CIKM'11; Conference date: 24-10-2011 through 28-10-2011",
}
