Details
Original language | English |
---|---|
Title of host publication | Web Information Systems Engineering |
Subtitle of host publication | WISE 2024 - 25th International Conference, Proceedings |
Editors | Mahmoud Barhamgi, Hua Wang, Xin Wang |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 467-483 |
Number of pages | 17 |
ISBN (electronic) | 978-981-96-0567-5 |
ISBN (print) | 9789819605668 |
Publication status | Published - 2025 |
Event | 25th International Conference on Web Information Systems Engineering, WISE 2024 - Doha, Qatar Duration: 2 Dec 2024 → 5 Dec 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 15437 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Causal inference is used in various domains such as healthcare, economics, and political science to infer causal effects from observational data where each unit (entity) has different properties. Existing approaches often assume data completeness, and thus exclude all units with incomplete data when performing causal inference, which can lead to inaccurate causal estimates. In addition, existing approaches follow the Closed World Assumption, where facts not present in the database are assumed to be false, limiting the ability to reason under a data incompleteness assumption. Knowledge graphs (KGs) are data structures that represent data in semi-structured formats and model the meaning of data via ontologies. We propose a method, SemMatch, based on KGs to enhance causal inference under a data incompleteness assumption. SemMatch relies on a semantic reasoning process, specified by a set of logical rules over KGs, to infer implicit facts and partially address data incompleteness. Then, SemMatch applies machine learning methods to estimate the importance of properties. Finally, SemMatch employs causal estimation methods that consider property importance, facilitating causal reasoning across units with incomplete data to determine the causal effect. We evaluate SemMatch on synthetic datasets and demonstrate that it achieves a lower mean absolute error (MAE) and a lower square root of the precision in estimation of heterogeneous effect (PEHE) in causal effect estimation than existing state-of-the-art methods. The observed results suggest that accounting for semantic reasoning and including units with incomplete data improves causal estimation accuracy.
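The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: MAE as the mean absolute difference between true and estimated effects, and the square root of PEHE as the root mean squared difference between true and estimated per-unit effects. The abstract does not say whether MAE is computed per unit or on the average effect, so the per-unit form is an assumption here, and the function names and toy values are hypothetical rather than taken from the paper.

```python
import math


def mae(true_effects, estimated_effects):
    """Mean absolute error between true and estimated per-unit effects."""
    if not true_effects or len(true_effects) != len(estimated_effects):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - e) for t, e in zip(true_effects, estimated_effects))
    return total / len(true_effects)


def sqrt_pehe(true_effects, estimated_effects):
    """Square root of PEHE: root mean squared error of per-unit effects."""
    if not true_effects or len(true_effects) != len(estimated_effects):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((t - e) ** 2 for t, e in zip(true_effects, estimated_effects))
    return math.sqrt(total / len(true_effects))


# Hypothetical toy values; on synthetic data the true effects are known.
true_tau = [1.0, 2.0, 3.0, 4.0]
est_tau = [1.5, 2.0, 2.0, 4.5]

print(mae(true_tau, est_tau))        # 0.5
print(sqrt_pehe(true_tau, est_tau))  # sqrt(0.375), about 0.612
```

Both metrics need the true per-unit effects, which is why they are typically reported on synthetic datasets; lower values indicate more accurate estimates.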
Keywords
- Causal Inference, Knowledge Graph, Matching, Semantics
ASJC Scopus subject areas
- Mathematics(all)
- Theoretical Computer Science
- Computer Science(all)
- General Computer Science
Cite this
Web Information Systems Engineering : WISE 2024 - 25th International Conference, Proceedings. ed. / Mahmoud Barhamgi; Hua Wang; Xin Wang. Springer Science and Business Media Deutschland GmbH, 2025. p. 467-483 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15437 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - SemMatch
T2 - 25th International Conference on Web Information Systems Engineering, WISE 2024
AU - Huang, Hao
AU - Vidal, Maria Esther
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Causal inference is used in various domains such as healthcare, economics, and political science to infer causal effects from observational data where each unit (entity) has different properties. Existing approaches often assume data completeness, and thus exclude all units with incomplete data when performing causal inference, which can lead to inaccurate causal estimates. In addition, existing approaches follow the Closed World Assumption, where facts not present in the database are assumed to be false, limiting the ability to reason under a data incompleteness assumption. Knowledge graphs (KGs) are data structures that represent data in semi-structured formats and model the meaning of data via ontologies. We propose a method, SemMatch, based on KGs to enhance causal inference under a data incompleteness assumption. SemMatch relies on a semantic reasoning process, specified by a set of logical rules over KGs, to infer implicit facts and partially address data incompleteness. Then, SemMatch applies machine learning methods to estimate the importance of properties. Finally, SemMatch employs causal estimation methods that consider property importance, facilitating causal reasoning across units with incomplete data to determine the causal effect. We evaluate SemMatch on synthetic datasets and demonstrate that it achieves a lower mean absolute error (MAE) and a lower square root of the precision in estimation of heterogeneous effect (PEHE) in causal effect estimation than existing state-of-the-art methods. The observed results suggest that accounting for semantic reasoning and including units with incomplete data improves causal estimation accuracy.
KW - Causal Inference
KW - Knowledge Graph
KW - Matching
KW - Semantics
UR - http://www.scopus.com/inward/record.url?scp=85211921518&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-0567-5_33
DO - 10.1007/978-981-96-0567-5_33
M3 - Conference contribution
AN - SCOPUS:85211921518
SN - 9789819605668
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 467
EP - 483
BT - Web Information Systems Engineering
A2 - Barhamgi, Mahmoud
A2 - Wang, Hua
A2 - Wang, Xin
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 2 December 2024 through 5 December 2024
ER -