Details
Original language | English |
---|---|
Article number | 8 |
Pages (from-to) | 1684-1696 |
Number of pages | 13 |
Journal | Contemporary Mathematics |
Volume | 15 |
Issue number | 8 |
Early online date | 22 Jun 2022 |
Publication status | Published - 2022 |
Event | 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia; Duration: 5 Sept 2022 → 9 Sept 2022 |
Abstract
A core operation in data discovery is to find joinable tables for a given table. Real-world tables include both unary and n-ary join keys. However, existing table discovery systems are optimized for unary joins and are ineffective and slow in the presence of n-ary keys. In this paper, we introduce Mate, a table discovery system that leverages a novel hash-based index to enable n-ary join discovery through a space-efficient super key. We design a filtering layer that uses a novel hash function, Xash. This hash function encodes the syntactic features of all column values and aggregates them into a super key, which allows the system to efficiently prune tables with non-joinable rows. Our join discovery system prunes up to 1000x more false positives and achieves over 60x faster table discovery than the state of the art.
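The super-key filtering idea described in the abstract can be illustrated with a minimal sketch. This is not Xash and not the Mate implementation. It makes three assumptions: a generic SHA-256-based stand-in hash that sets a single bit per value, per-value hashes combined into a row's super key with bitwise OR, and a 128-bit key width. The names `value_hash`, `super_key`, `may_join`, and `prune` are hypothetical.

```python
import hashlib

BITS = 128  # assumed super-key width, chosen only for illustration


def value_hash(value: str, bits: int = BITS) -> int:
    """Stand-in for Xash: map a value to an integer with exactly one bit set."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") % bits
    return 1 << position


def super_key(values) -> int:
    """Aggregate the hashes of all values in a row (assumed: bitwise OR)."""
    key = 0
    for v in values:
        key |= value_hash(str(v))
    return key


def may_join(query_values, row_super_key: int) -> bool:
    """Return False only if the row cannot contain all query key values.

    True means "possibly joinable": false positives can occur,
    false negatives cannot.
    """
    q = super_key(query_values)
    return (q & row_super_key) == q


def prune(query_values, rows):
    """Keep only candidate rows whose super key passes the filter."""
    return [row for row in rows if may_join(query_values, super_key(row))]


if __name__ == "__main__":
    rows = [
        ("Berlin", "Germany", 2022),
        ("Sydney", "Australia", 2022),
    ]
    # An n-ary (here: binary) join key taken from a query table row.
    survivors = prune(("Sydney", "Australia"), rows)
    print(survivors)
```

Rows that fail the check can be discarded without comparing actual values. Rows that pass still need an exact comparison, because different values may set the same bits.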
ASJC Scopus subject areas
- Mathematics(all)
- General Mathematics
Cite this
In: Contemporary Mathematics, Vol. 15, No. 8, 8, 2022, p. 1684-1696.
Research output: Contribution to journal › Conference article › Research › peer review
TY - JOUR
T1 - MATE
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
AU - Esmailoghli, Mahdi
AU - Quiané-Ruiz, Jorge Arnulfo
AU - Abedjan, Ziawasch
N1 - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445 and the German Ministry for Education and Research as BIFOLD — “Berlin Institute for the Foundations of Learning and Data” (01IS18025A and 01IS18037A).
PY - 2022
Y1 - 2022
UR - http://www.scopus.com/inward/record.url?scp=85142525351&partnerID=8YFLogxK
U2 - 10.14778/3529337.3529353
DO - 10.14778/3529337.3529353
M3 - Conference article
AN - SCOPUS:85142525351
VL - 15
SP - 1684
EP - 1696
JO - Contemporary Mathematics
JF - Contemporary Mathematics
SN - 0271-4132
IS - 8
M1 - 8
Y2 - 5 September 2022 through 9 September 2022
ER -