Details
Original language | English |
---|---|
Article number | 111774 |
Number of pages | 10 |
Journal | Knowledge-based systems |
Volume | 294 |
Early online date | 5 Apr 2024 |
Publication status | Published - 21 June 2024 |
Abstract
Machine learning (ML) applications often require large datasets, a requirement that can pose a major challenge in fields where data is sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents disentangled knowledge of system compositionality using Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment that assesses the effectiveness of this knowledge-encoded ML approach against several typical ML methods in scenarios with sparse data input (sampling rates from 1% down to 0.0125%). Our findings reveal three key advantages of this approach over traditional ML methods: 1) it significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) it allows for efficient utilization of data from diverse record collections; 3) it can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites to ensure the successful integration of prior knowledge and ML generalization in target scenarios, and we have open-sourced the code and dataset for community reproduction.
ASJC Scopus subject areas
- Computer Science (all)
- Software
- Business, Management and Accounting (all)
- Management Information Systems
- Decision Sciences (all)
- Information Systems and Management
- Computer Science (all)
- Artificial Intelligence
Cite this
in: Knowledge-based systems, vol. 294, 111774, 21.06.2024.
Publication: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - Utilizing domain knowledge
T2 - Robust machine learning for building energy performance prediction with small, inconsistent datasets
AU - Chen, Xia
AU - Singh, Manav Mahan
AU - Geyer, Philipp
N1 - Funding Information: We acknowledge the support of the German Research Foundation (DFG) in funding the project under grant GE 1652/3-2 in the Research Unit FOR 2363 and under Heisenberg grant GE 1652/4-1.
PY - 2024/6/21
Y1 - 2024/6/21
KW - Building engineering
KW - Component-based machine learning
KW - Compositionality
KW - Data utilization
KW - Model organization
UR - http://www.scopus.com/inward/record.url?scp=85189755232&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111774
DO - 10.1016/j.knosys.2024.111774
M3 - Article
AN - SCOPUS:85189755232
VL - 294
JO - Knowledge-based systems
JF - Knowledge-based systems
SN - 0950-7051
M1 - 111774
ER -