Demonstrating MATE and COCOA for Data Discovery

Research output: Chapter in book/report/conference proceeding - Conference contribution - Research - Peer review

Authors

  • Jannis Becktepe
  • Mahdi Esmailoghli
  • Maximilian Koch
  • Ziawasch Abedjan

Details

Original language: English
Title of host publication: SIGMOD '23
Subtitle of host publication: Companion of the 2023 International Conference on Management of Data
Publisher: Association for Computing Machinery (ACM)
Pages: 119-122
Number of pages: 4
ISBN (electronic): 9781450395076
Publication status: Published - 5 Jun 2023
Event: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States
Duration: 18 Jun 2023 - 23 Jun 2023

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Abstract

One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
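The abstract describes the hash-based Super Key Index only at a high level. The following Python sketch illustrates the general filtering idea under stated assumptions: each row receives a super key formed by OR-ing per-value bit hashes, and a row can contain a multi-column (composite) join key only if its super key covers every bit of the key's own super key. The SHA-256-based hash, `KEY_BITS`, `BITS_PER_VALUE`, and all function names are hypothetical; MATE uses its own hash function (XASH), which is not reproduced here.

```python
import hashlib

# Illustrative parameters (assumptions, not values taken from MATE or MaCo)
KEY_BITS = 128       # width of every super key
BITS_PER_VALUE = 3   # bits set per cell value


def normalize(value: str) -> str:
    """Case- and whitespace-insensitive form of a cell value."""
    return value.strip().lower()


def value_hash(value: str) -> int:
    """Map one cell value to a KEY_BITS-wide integer with a few bits set.

    Generic Bloom-filter-style hash on top of SHA-256; this is NOT MATE's XASH.
    """
    digest = hashlib.sha256(normalize(value).encode("utf-8")).digest()
    bits = 0
    for i in range(BITS_PER_VALUE):
        # Two digest bytes per bit position, reduced to the key width
        position = int.from_bytes(digest[2 * i:2 * i + 2], "big") % KEY_BITS
        bits |= 1 << position
    return bits


def super_key(values) -> int:
    """OR the hashes of all values, e.g. all cells of one row."""
    key = 0
    for value in values:
        key |= value_hash(value)
    return key


def build_index(table: list[list[str]]) -> list[int]:
    """Precompute one super key per row (the 'index' in this sketch)."""
    return [super_key(row) for row in table]


def candidate_rows(index: list[int], query_key: tuple[str, ...]) -> list[int]:
    """Rows whose super key covers every bit of the query's composite key.

    A row containing all query values always passes (no false negatives),
    but a passing row can still be a false positive.
    """
    q = super_key(query_key)
    return [row_id for row_id, key in enumerate(index) if (key & q) == q]


def verified_rows(table, index, query_key) -> list[int]:
    """Exact check on the candidates: every query value occurs in the row."""
    wanted = {normalize(v) for v in query_key}
    return [
        row_id
        for row_id in candidate_rows(index, query_key)
        if wanted <= {normalize(cell) for cell in table[row_id]}
    ]


if __name__ == "__main__":
    table = [
        ["Berlin", "Germany", "3.6"],
        ["Paris", "France", "2.1"],
        ["Berlin", "USA", "0.01"],
    ]
    index = build_index(table)
    print(verified_rows(table, index, ("Berlin", "Germany")))  # [0]
```

Because this kind of filter has no false negatives, only the surviving candidate rows need the exact value check; in this sketch, the key width and the number of bits per value trade index size against the false-positive rate.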

Keywords

    data discovery for ML, data integration, index structures

Cite this

Standard: Demonstrating MATE and COCOA for Data Discovery. / Becktepe, Jannis; Esmailoghli, Mahdi; Koch, Maximilian et al. SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. p. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

Harvard: Becktepe, J, Esmailoghli, M, Koch, M & Abedjan, Z 2023, Demonstrating MATE and COCOA for Data Discovery. in SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery (ACM), pp. 119-122, 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023, Seattle, United States, 18 Jun 2023. https://doi.org/10.1145/3555041.3589716
APA: Becktepe, J., Esmailoghli, M., Koch, M., & Abedjan, Z. (2023). Demonstrating MATE and COCOA for Data Discovery. In SIGMOD '23: Companion of the 2023 International Conference on Management of Data (pp. 119-122). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery (ACM). https://doi.org/10.1145/3555041.3589716
Vancouver: Becktepe J, Esmailoghli M, Koch M, Abedjan Z. Demonstrating MATE and COCOA for Data Discovery. In SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM). 2023. p. 119-122. (Proceedings of the ACM SIGMOD International Conference on Management of Data). doi: 10.1145/3555041.3589716
Author: Becktepe, Jannis ; Esmailoghli, Mahdi ; Koch, Maximilian et al. / Demonstrating MATE and COCOA for Data Discovery. SIGMOD '23: Companion of the 2023 International Conference on Management of Data. Association for Computing Machinery (ACM), 2023. pp. 119-122 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{a867aa53bde74647b0794c8dde61bca5,
title = "Demonstrating {MATE} and {COCOA} for Data Discovery",
abstract = "One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.",
keywords = "data discovery for ML, data integration, index structures",
author = "Jannis Becktepe and Mahdi Esmailoghli and Maximilian Koch and Ziawasch Abedjan",
note = "Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.; 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 ; Conference date: 18-06-2023 Through 23-06-2023",
year = "2023",
month = jun,
day = "5",
doi = "10.1145/3555041.3589716",
language = "English",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery (ACM)",
pages = "119--122",
booktitle = "SIGMOD '23",
address = "United States",

}

TY  - GEN
T1  - Demonstrating MATE and COCOA for Data Discovery
AU  - Becktepe, Jannis
AU  - Esmailoghli, Mahdi
AU  - Koch, Maximilian
AU  - Abedjan, Ziawasch
N1  - Funding Information: This project has been supported by the German Research Foundation (DFG) under grant agreement 387872445.
PY  - 2023/6/5
Y1  - 2023/6/5
N2  - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
AB  - One of the common use cases for data discovery is to enrich a given table with additional columns from related tables inside a data lake. We have recently introduced MATE and COCOA, two systems for joinability discovery and correlation calculation, respectively. By leveraging two novel index structures, a hash-based Super Key Index, and an Order Index, our system is capable of efficiently identifying tables that join on multiple columns and contain relevant features. We show how the data exploration and enrichment process benefits from our index structures by demonstrating MaCo, a unified system on top of open web and large table corpora.
KW  - data discovery for ML
KW  - data integration
KW  - index structures
UR  - http://www.scopus.com/inward/record.url?scp=85162848351&partnerID=8YFLogxK
U2  - 10.1145/3555041.3589716
DO  - 10.1145/3555041.3589716
M3  - Conference contribution
AN  - SCOPUS:85162848351
T3  - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP  - 119
EP  - 122
BT  - SIGMOD '23
PB  - Association for Computing Machinery (ACM)
T2  - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Y2  - 18 June 2023 through 23 June 2023
ER  -