Details
Original language | English |
---|---|
Article number | 8 |
Pages (from - to) | 1684-1696 |
Number of pages | 13 |
Journal | Contemporary Mathematics |
Volume | 15 |
Issue number | 8 |
Early online date | 22 June 2022 |
Publication status | Published - 2022 |
Event | 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia. Duration: 5 Sept. 2022 → 9 Sept. 2022 |
Abstract
A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow when n-ary keys are present. In this paper, we introduce Mate, a table discovery system built on a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system prunes up to 1000x more false positives and makes table discovery over 60x faster compared to the state of the art.
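The super-key filtering idea from the abstract can be illustrated with a minimal Python sketch. It is not Mate's implementation: `value_signature` is a hypothetical, deliberately simplified bit signature standing in for Xash (whose actual encoding of syntactic features is described in the paper), and `combine`, `may_join`, and `joinable_rows` are illustrative names. Each cell value sets a few bits; a row's super key is the bitwise OR of its cell signatures. A row can contain every value of the query key only if all bits of the query's combined signature are also set in the row's super key, so rows failing that test can be pruned without false negatives.

```python
from functools import reduce

BITS = 64            # hypothetical signature width
LENGTH_BITS = 8      # top bits reserved for a coarse length bucket
CHAR_BITS = BITS - LENGTH_BITS


def value_signature(value: str) -> int:
    """Hypothetical stand-in for Xash: one bit per distinct character
    (bucketed by code point) plus one bit for a capped length bucket."""
    sig = 0
    for ch in set(value):
        sig |= 1 << (ord(ch) % CHAR_BITS)
    sig |= 1 << (CHAR_BITS + min(len(value), LENGTH_BITS - 1))
    return sig


def combine(values) -> int:
    """Aggregate value signatures with bitwise OR into one super key."""
    return reduce(lambda acc, v: acc | value_signature(v), values, 0)


def may_join(query_key: tuple[str, ...], row: list[str]) -> bool:
    """Super-key filter. False: the row certainly misses some key value.
    True: the row may contain all key values (false positives possible)."""
    query_sig = combine(query_key)
    row_super_key = combine(row)  # an index would precompute this per row
    return (query_sig & ~row_super_key) == 0


def joinable_rows(query_key: tuple[str, ...], rows: list[list[str]]) -> list[list[str]]:
    """Prune with the cheap bit test, then verify the survivors exactly."""
    return [
        row for row in rows
        if may_join(query_key, row) and set(query_key) <= set(row)
    ]


rows = [
    ["Berlin", "Germany", "3.6"],
    ["Paris", "France", "2.1"],
    ["Berlin", "USA", "0.1"],
]
print([may_join(("Berlin", "Germany"), r) for r in rows])  # → [True, False, False]
print(joinable_rows(("Berlin", "Germany"), rows))  # → [['Berlin', 'Germany', '3.6']]
```

Because the filter only checks bit containment, hash collisions can still let through rows that lack a key value, which is why surviving rows are verified exactly; the quality of the hash determines how many such false positives remain.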
ASJC Scopus subject areas
- Mathematics (all)
- General Mathematics
Cite this
In: Contemporary Mathematics, Vol. 15, No. 8, 8, 2022, pp. 1684-1696.
Publication: Contribution to journal › Conference article in journal › Research › Peer-reviewed
TY - JOUR
T1 - MATE
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
AU - Esmailoghli, Mahdi
AU - Quiané-Ruiz, Jorge Arnulfo
AU - Abedjan, Ziawasch
N1 - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445 and the German Ministry for Education and Research as BIFOLD — “Berlin Institute for the Foundations of Learning and Data” (01IS18025A and 01IS18037A).
PY - 2022
Y1 - 2022
AB - A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the existence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index that enables n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system is able to prune up to 1000x more false positives and leads to over 60x faster table discovery in comparison to state-of-the-art.
UR - http://www.scopus.com/inward/record.url?scp=85142525351&partnerID=8YFLogxK
U2 - 10.14778/3529337.3529353
DO - 10.14778/3529337.3529353
M3 - Conference article
AN - SCOPUS:85142525351
VL - 15
SP - 1684
EP - 1696
JO - Contemporary Mathematics
JF - Contemporary Mathematics
SN - 0271-4132
IS - 8
M1 - 8
Y2 - 5 September 2022 through 9 September 2022
ER -