Demonstrating MATE and COCOA for Data Discovery

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Jannis Becktepe
  • Mahdi Esmailoghli
  • Maximilian Koch
  • Ziawasch Abedjan

Details

Original language: English
Title of host publication: SIGMOD '23
Subtitle: Companion of the 2023 International Conference on Management of Data
Publisher: Association for Computing Machinery (ACM)
Pages: 119-122
Number of pages: 4
ISBN (electronic): 9781450395076
Publication status: Published - 5 June 2023
Event: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States
Duration: 18 June 2023 to 23 June 2023

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Abstract

One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
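
The multi-column joinability idea mentioned in the abstract can be illustrated with a toy sketch. The Python below is not the MATE implementation: the 64-bit signature width, the SHA-256-based hashing, and all function names are assumptions chosen for illustration. It only shows the general pattern of a hash-based per-row signature that prunes candidate rows before an exact check. The correlation side (COCOA and the Order Index) is not covered.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64  # assumed signature width; illustrative only


def value_bits(value: str) -> int:
    """Map one cell value to a signature with a single bit set."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:8], "big") % SIG_BITS)


def row_signature(row) -> int:
    """OR the per-value bits of every cell in a row (Bloom-filter-like)."""
    sig = 0
    for cell in row:
        sig |= value_bits(str(cell))
    return sig


def build_index(tables):
    """Inverted index: cell value -> list of (table name, row index, row signature)."""
    index = defaultdict(list)
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            sig = row_signature(row)
            for cell in set(map(str, row)):
                index[cell].append((name, i, sig))
    return index


def joinable_rows(query_keys, tables, index):
    """Count, per table, the (query key, row) pairs where the row holds all key values."""
    counts = defaultdict(int)
    for key in query_keys:
        key = [str(v) for v in key]
        needed = row_signature(key)
        # Probe the index with the first key value only, then filter by signature.
        for name, i, sig in index.get(key[0], []):
            if sig & needed != needed:
                continue  # a required bit is missing, so the row cannot match
            # Signatures can collide, so verify against the actual row.
            if set(key) <= set(map(str, tables[name][i])):
                counts[name] += 1
    return dict(counts)


# Hypothetical miniature "data lake" with two tables.
tables = {
    "cities": [["Berlin", "Germany", 3645000], ["Paris", "France", 2161000]],
    "teams": [["Berlin", "USA", "Eagles"], ["Paris", "France", "PSG"]],
}
index = build_index(tables)
result = joinable_rows([("Berlin", "Germany"), ("Paris", "France")], tables, index)
print(result)
```

The signature check never rejects a true match, because a row that contains every key value also has every key bit set; it can let false candidates through, which is why the exact comparison follows.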

Cite this

Demonstrating MATE and COCOA for Data Discovery. / Becktepe, Jannis; Esmailoghli, Mahdi; Koch, Maximilian et al.
SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. pp. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

Becktepe, J, Esmailoghli, M, Koch, M & Abedjan, Z 2023, Demonstrating MATE and COCOA for Data Discovery. in SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery (ACM), pp. 119-122, 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023, Seattle, United States, 18 June 2023. https://doi.org/10.1145/3555041.3589716
Becktepe, J., Esmailoghli, M., Koch, M., & Abedjan, Z. (2023). Demonstrating MATE and COCOA for Data Discovery. In SIGMOD '23: Companion of the 2023 International Conference on Management of Data (pp. 119-122). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery (ACM). https://doi.org/10.1145/3555041.3589716
Becktepe J, Esmailoghli M, Koch M, Abedjan Z. Demonstrating MATE and COCOA for Data Discovery. In SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM). 2023. pp. 119-122. (Proceedings of the ACM SIGMOD International Conference on Management of Data). doi: 10.1145/3555041.3589716
Becktepe, Jannis ; Esmailoghli, Mahdi ; Koch, Maximilian et al. / Demonstrating MATE and COCOA for Data Discovery. SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. pp. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{a867aa53bde74647b0794c8dde61bca5,
title = "Demonstrating MATE and COCOA for Data Discovery",
abstract = "One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.",
keywords = "data discovery for ML, data integration, index structures",
author = "Jannis Becktepe and Mahdi Esmailoghli and Maximilian Koch and Ziawasch Abedjan",
note = "Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.; 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 ; Conference date: 18-06-2023 Through 23-06-2023",
year = "2023",
month = jun,
day = "5",
doi = "10.1145/3555041.3589716",
language = "English",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery (ACM)",
pages = "119--122",
booktitle = "SIGMOD '23",
address = "United States",

}

TY  - GEN
T1  - Demonstrating MATE and COCOA for Data Discovery
AU  - Becktepe, Jannis
AU  - Esmailoghli, Mahdi
AU  - Koch, Maximilian
AU  - Abedjan, Ziawasch
N1  - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.
PY  - 2023/6/5
Y1  - 2023/6/5
N2  - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
AB  - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
KW  - data discovery for ML
KW  - data integration
KW  - index structures
UR  - http://www.scopus.com/inward/record.url?scp=85162848351&partnerID=8YFLogxK
U2  - 10.1145/3555041.3589716
DO  - 10.1145/3555041.3589716
M3  - Conference contribution
AN  - SCOPUS:85162848351
T3  - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP  - 119
EP  - 122
BT  - SIGMOD '23
PB  - Association for Computing Machinery (ACM)
T2  - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Y2  - 18 June 2023 through 23 June 2023
ER  -