Details
Original language | English |
---|---|
Title of host publication | SIGMOD '23 |
Subtitle | Companion of the 2023 International Conference on Management of Data |
Publisher | Association for Computing Machinery (ACM) |
Pages | 119-122 |
Number of pages | 4 |
ISBN (electronic) | 9781450395076 |
Publication status | Published - 5 June 2023 |
Event | 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States. Duration: 18 June 2023 → 23 June 2023 |
Publication series
Name | Proceedings of the ACM SIGMOD International Conference on Management of Data |
---|---|
ISSN (Print) | 0730-8078 |
Abstract
One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index and an Order Index, our system can efficiently identify tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
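The abstract names a hash-based Super Key Index but does not describe how it works. The Python sketch below shows one generic way such an index can prune rows for multi-column joins. It assumes a simple Bloom-filter-style scheme in which each row's super key is the bitwise OR of per-value hash bitmasks.

This is not the hash function or index layout used by MATE. The names `BITS`, `value_mask`, `super_key` and `may_contain` are hypothetical, and so is the toy data.

```python
import hashlib

# Hypothetical super-key width; not a parameter taken from MATE.
BITS = 128


def value_mask(value: str, k: int = 3) -> int:
    """Map one cell value to a bitmask with up to k bits set (Bloom-filter style)."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % BITS)
    return mask


def super_key(row: list[str]) -> int:
    """OR together the masks of every value in a row."""
    key = 0
    for value in row:
        key |= value_mask(value)
    return key


def may_contain(row_key: int, query_values: list[str]) -> bool:
    """False means the row cannot contain all query values; True means it might."""
    query_key = super_key(query_values)
    return (row_key & query_key) == query_key


# Hypothetical toy table and a two-column join key to look up.
rows = [["Berlin", "Germany", "3.6"], ["Paris", "France", "2.1"]]
index = [super_key(r) for r in rows]
query = ["Paris", "France"]

# Cheap bitwise filter first, then exact verification of the surviving rows.
candidates = [r for r, k in zip(rows, index) if may_contain(k, query)]
verified = [r for r in candidates if set(query) <= set(r)]
```

Because the same hash is applied to rows and queries, the filter never rejects a row that truly contains all query values. Hash collisions can still admit false positives, which is why the surviving candidates are checked against the actual values.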
ASJC Scopus subject areas
- Computer Science (all)
- Software
- Information Systems
Cite this
Demonstrating MATE and COCOA for Data Discovery. / Becktepe, Jannis; Esmailoghli, Mahdi; Koch, Maximilian; Abedjan, Ziawasch. SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. pp. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - Demonstrating MATE and COCOA for Data Discovery
AU - Becktepe, Jannis
AU - Esmailoghli, Mahdi
AU - Koch, Maximilian
AU - Abedjan, Ziawasch
N1 - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.
PY - 2023/6/5
Y1 - 2023/6/5
N2 - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index and an Order Index, our system can efficiently identify tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
AB - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index and an Order Index, our system can efficiently identify tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
KW - data discovery for ML
KW - data integration
KW - index structures
UR - http://www.scopus.com/inward/record.url?scp=85162848351&partnerID=8YFLogxK
U2 - 10.1145/3555041.3589716
DO - 10.1145/3555041.3589716
M3 - Conference contribution
AN - SCOPUS:85162848351
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 119
EP - 122
BT - SIGMOD '23
PB - Association for Computing Machinery (ACM)
T2 - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Y2 - 18 June 2023 through 23 June 2023
ER -