Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

  • Ziawasch Abedjan
  • Mohammad Mahdavi

External organizations

  • Technische Universität Berlin

Details

Original language: English
Pages (from - to): 1948-1961
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 13
Issue number: 12
Publication status: Published - 1 July 2020

Abstract

Traditional error correction solutions leverage handmade rules or master data to find the correct values. Both are often missing in real-world scenarios. Therefore, it is desirable to additionally learn corrections from a limited number of example repairs. To effectively generalize example repairs, it is necessary to capture the entire context of each erroneous value. A context comprises the value itself, the co-occurring values inside the same tuple, and all values that define the attribute type. Typically, an error corrector based on any one of these kinds of context information follows its own sequence of operations, which is not always easy to integrate with other types of error correctors. In this paper, we present a new error correction system, Baran, which provides a unifying abstraction for integrating multiple error corrector models that can be pretrained and updated in the same way. Because of the holistic nature of our approach, we generate more correction candidates than the state of the art and, because of the underlying context-aware data representation, we achieve high precision. We show that, by pretraining our models based on Wikipedia revisions, our system can further improve its overall precision and recall. In our experiments, Baran significantly outperforms state-of-the-art error correction systems in terms of effectiveness and human involvement, requiring only 20 labeled tuples.
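
The three parts of a context named in the abstract can be illustrated with a small sketch. This is not Baran's implementation: the names `CellContext`, `build_context`, `vicinity_candidates`, and the field names `vicinity` and `domain` are hypothetical, and the toy table is invented for illustration. The sketch only shows how the value itself, the co-occurring values of the same tuple, and the values of the same attribute might be collected for one cell, and how one simple corrector could propose candidates from the co-occurring values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CellContext:
    """Hypothetical container for the three context parts of one cell."""
    value: str        # the (possibly erroneous) value itself
    vicinity: dict    # co-occurring values inside the same tuple
    domain: Counter   # all values of the same attribute, with counts


def build_context(rows, row_index, column):
    """Collect the value, tuple, and attribute context of one cell."""
    row = rows[row_index]
    vicinity = {name: v for name, v in row.items() if name != column}
    domain = Counter(r[column] for r in rows)
    return CellContext(row[column], vicinity, domain)


def vicinity_candidates(rows, row_index, column):
    """Toy corrector: vote for values of `column` found in other tuples
    that share at least one co-occurring value with the target tuple."""
    target = rows[row_index]
    votes = Counter()
    for i, row in enumerate(rows):
        if i == row_index:
            continue
        for name, v in target.items():
            if name != column and row[name] == v:
                votes[row[column]] += 1
    return votes


# Invented toy data: the last tuple contains a misspelled country.
rows = [
    {"city": "Berlin", "country": "Germany"},
    {"city": "Berlin", "country": "Germany"},
    {"city": "Paris", "country": "France"},
    {"city": "Berlin", "country": "Germny"},
]

ctx = build_context(rows, 3, "country")
candidates = vicinity_candidates(rows, 3, "country")
best = candidates.most_common(1)[0][0]
print(ctx.value, "->", best)
```

A system that integrates several correctors would gather candidates from each context type and rank them with a learned model; the vote counter here merely stands in for one such candidate source.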

Cite this

Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning. / Abedjan, Ziawasch; Mahdavi, Mohammad.
In: Proceedings of the VLDB Endowment, Vol. 13, No. 12, 1 July 2020, pp. 1948-1961.

Abedjan, Z & Mahdavi, M 2020, 'Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning', Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 1948-1961. https://doi.org/10.14778/3407790.3407801
Abedjan, Z., & Mahdavi, M. (2020). Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning. Proceedings of the VLDB Endowment, 13(12), 1948-1961. https://doi.org/10.14778/3407790.3407801
Abedjan Z, Mahdavi M. Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning. Proceedings of the VLDB Endowment. 2020 Jul 1;13(12):1948-1961. doi: 10.14778/3407790.3407801
Abedjan, Ziawasch ; Mahdavi, Mohammad. / Baran : Effective Error Correction via a Unified Context Representation and Transfer Learning. In: Proceedings of the VLDB Endowment. 2020 ; Vol. 13, No. 12. pp. 1948-1961.
@article{bf845affa5c8417fabb99756528cc2d0,
title = "Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning",
abstract = "Traditional error correction solutions leverage handmade rules or master data to find the correct values. Both are often missing in real-world scenarios. Therefore, it is desirable to additionally learn corrections from a limited number of example repairs. To effectively generalize example repairs, it is necessary to capture the entire context of each erroneous value. A context comprises the value itself, the co-occurring values inside the same tuple, and all values that define the attribute type. Typically, an error corrector based on any one of these kinds of context information follows its own sequence of operations, which is not always easy to integrate with other types of error correctors. In this paper, we present a new error correction system, Baran, which provides a unifying abstraction for integrating multiple error corrector models that can be pretrained and updated in the same way. Because of the holistic nature of our approach, we generate more correction candidates than the state of the art and, because of the underlying context-aware data representation, we achieve high precision. We show that, by pretraining our models based on Wikipedia revisions, our system can further improve its overall precision and recall. In our experiments, Baran significantly outperforms state-of-the-art error correction systems in terms of effectiveness and human involvement, requiring only 20 labeled tuples.",
author = "Ziawasch Abedjan and Mohammad Mahdavi",
note = "Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.",
year = "2020",
month = jul,
day = "1",
doi = "10.14778/3407790.3407801",
language = "English",
volume = "13",
pages = "1948--1961",
number = "12",

}

TY - JOUR

T1 - Baran

T2 - Effective Error Correction via a Unified Context Representation and Transfer Learning

AU - Abedjan, Ziawasch

AU - Mahdavi, Mohammad

N1 - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.

PY - 2020/7/1

Y1 - 2020/7/1

AB - Traditional error correction solutions leverage handmade rules or master data to find the correct values. Both are often missing in real-world scenarios. Therefore, it is desirable to additionally learn corrections from a limited number of example repairs. To effectively generalize example repairs, it is necessary to capture the entire context of each erroneous value. A context comprises the value itself, the co-occurring values inside the same tuple, and all values that define the attribute type. Typically, an error corrector based on any one of these kinds of context information follows its own sequence of operations, which is not always easy to integrate with other types of error correctors. In this paper, we present a new error correction system, Baran, which provides a unifying abstraction for integrating multiple error corrector models that can be pretrained and updated in the same way. Because of the holistic nature of our approach, we generate more correction candidates than the state of the art and, because of the underlying context-aware data representation, we achieve high precision. We show that, by pretraining our models based on Wikipedia revisions, our system can further improve its overall precision and recall. In our experiments, Baran significantly outperforms state-of-the-art error correction systems in terms of effectiveness and human involvement, requiring only 20 labeled tuples.

UR - http://www.scopus.com/inward/record.url?scp=85091123761&partnerID=8YFLogxK

U2 - 10.14778/3407790.3407801

DO - 10.14778/3407790.3407801

M3 - Article

VL - 13

SP - 1948

EP - 1961

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 12

ER -