MATE: Multi-Attribute Table Extraction

Publication: Contribution to journal, Conference article in journal, Research, Peer-reviewed

Authors

  • Mahdi Esmailoghli
  • Jorge Arnulfo Quiané-Ruiz
  • Ziawasch Abedjan

External organisations

  • Technische Universität Berlin

Details

Original language: English
Article number: 8
Pages (from - to): 1684-1696
Number of pages: 13
Journal: Contemporary Mathematics
Volume: 15
Issue number: 8
Early online date: 22 June 2022
Publication status: Published - 2022
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sept 2022 to 9 Sept 2022

Abstract

A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the existence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system is able to prune up to 1000x more false positives and leads to over 60x faster table discovery in comparison to state-of-the-art.
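The super-key filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's Xash function: it substitutes a generic CRC32-based hash, and the key width (`KEY_BITS`), the number of bits set per value, and all function names are assumptions chosen for illustration. Each cell value maps to a small bit mask, the masks of a row are OR-ed into one super key, and a row can only contain all query values if every bit of the query mask is also set in its super key.

```python
import zlib
from typing import List, Sequence

KEY_BITS = 128  # assumed super-key width, not taken from the paper


def value_mask(value: str, bits_per_value: int = 2) -> int:
    """Map one cell value to a mask with a few bits set (stand-in for Xash)."""
    mask = 0
    for seed in range(bits_per_value):
        h = zlib.crc32(f"{seed}:{value}".encode("utf-8"))
        mask |= 1 << (h % KEY_BITS)
    return mask


def super_key(row: Sequence[str]) -> int:
    """Aggregate the masks of all values in a row with bitwise OR."""
    key = 0
    for value in row:
        key |= value_mask(value)
    return key


def may_contain(row_key: int, query_values: Sequence[str]) -> bool:
    """Cheap filter: False means the row certainly lacks some query value."""
    query_mask = super_key(query_values)
    return (row_key & query_mask) == query_mask


def joinable_rows(rows: Sequence[Sequence[str]],
                  query_values: Sequence[str]) -> List[List[str]]:
    """Prune with the super key first, then verify the survivors exactly."""
    candidates = [r for r in rows if may_contain(super_key(r), query_values)]
    return [list(r) for r in candidates if set(query_values) <= set(r)]


if __name__ == "__main__":
    table = [["Ada", "Lovelace", "1815"], ["Alan", "Turing", "1912"]]
    print(joinable_rows(table, ["Alan", "Turing"]))
```

Like a Bloom filter, this check can let through rows that do not match (false positives) but never rejects a row that does, which is why an exact comparison follows the filter. In a real index the super keys would be computed once and stored, not recomputed per query as in this sketch.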

Cite this

MATE: Multi-Attribute Table Extraction. / Esmailoghli, Mahdi; Quiané-Ruiz, Jorge Arnulfo; Abedjan, Ziawasch.
In: Contemporary Mathematics, Vol. 15, No. 8, 8, 2022, pp. 1684-1696.

Esmailoghli, M, Quiané-Ruiz, JA & Abedjan, Z 2022, 'MATE: Multi-Attribute Table Extraction', Contemporary Mathematics, vol. 15, no. 8, 8, pp. 1684-1696. https://doi.org/10.14778/3529337.3529353
Esmailoghli, M., Quiané-Ruiz, J. A., & Abedjan, Z. (2022). MATE: Multi-Attribute Table Extraction. Contemporary Mathematics, 15(8), 1684-1696. Article 8. https://doi.org/10.14778/3529337.3529353
Esmailoghli M, Quiané-Ruiz JA, Abedjan Z. MATE: Multi-Attribute Table Extraction. Contemporary Mathematics. 2022;15(8):1684-1696. 8. Epub 2022 Jun 22. doi: 10.14778/3529337.3529353
Esmailoghli, Mahdi ; Quiané-Ruiz, Jorge Arnulfo ; Abedjan, Ziawasch. / MATE : Multi-Attribute Table Extraction. In: Contemporary Mathematics. 2022 ; Vol. 15, No. 8. pp. 1684-1696.
@article{9e0ba629a1a14cf29d976c93ebed0277,
title = "MATE: Multi-Attribute Table Extraction",
abstract = "A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the existence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system is able to prune up to 1000x more false positives and leads to over 60x faster table discovery in comparison to state-of-the-art.",
author = "Esmailoghli, Mahdi and Quian{\'e}-Ruiz, Jorge Arnulfo and Abedjan, Ziawasch",
note = "Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445 and the German Ministry for Education and Research as BIFOLD — “Berlin Institute for the Foundations of Learning and Data” (01IS18025A and 01IS18037A). ; 48th International Conference on Very Large Data Bases, VLDB 2022 ; Conference date: 05-09-2022 Through 09-09-2022",
year = "2022",
doi = "10.14778/3529337.3529353",
language = "English",
volume = "15",
pages = "1684--1696",
number = "8",

}

TY  - JOUR
T1  - MATE
T2  - 48th International Conference on Very Large Data Bases, VLDB 2022
AU  - Esmailoghli, Mahdi
AU  - Quiané-Ruiz, Jorge Arnulfo
AU  - Abedjan, Ziawasch
N1  - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445 and the German Ministry for Education and Research as BIFOLD — “Berlin Institute for the Foundations of Learning and Data” (01IS18025A and 01IS18037A).
PY  - 2022
Y1  - 2022
N2  - A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the existence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system is able to prune up to 1000x more false positives and leads to over 60x faster table discovery in comparison to state-of-the-art.
AB  - A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the existence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system is able to prune up to 1000x more false positives and leads to over 60x faster table discovery in comparison to state-of-the-art.
UR  - http://www.scopus.com/inward/record.url?scp=85142525351&partnerID=8YFLogxK
U2  - 10.14778/3529337.3529353
DO  - 10.14778/3529337.3529353
M3  - Conference article
AN  - SCOPUS:85142525351
VL  - 15
SP  - 1684
EP  - 1696
JO  - Contemporary Mathematics
JF  - Contemporary Mathematics
SN  - 0271-4132
IS  - 8
M1  - 8
Y2  - 5 September 2022 through 9 September 2022
ER  -