Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets

Xia Chen; Manav Mahan Singh; Philipp Geyer

doi:10.1016/j.knosys.2024.111774

Details

Original language	English
Article number	111774
Number of pages	10
Journal	Knowledge-based systems
Volume	294
Early online date	5 Apr 2024
Publication status	Published - 21 Jun 2024

Abstract

Machine learning (ML) applications often require large datasets, which poses a major challenge in fields where data are sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents disentangled knowledge of system compositionality using Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach in scenarios with sparse data input (sampling rates from 1% down to 0.0125%) and with several typical ML methods. Our findings reveal three key advantages of this approach over traditional ML methods: 1) It significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) It allows for efficient utilization of data from diverse record collections; 3) It can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites for successfully integrating prior knowledge with ML generalization in target scenarios, and we have open-sourced the code and dataset for community reproduction.
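
As an illustration of the compositional idea described in the abstract, the following minimal Python sketch fits one small model per component type from a 1% sample of records and then composes the component predictions according to a building's structure. It is not the authors' implementation: the component types, the synthetic data, the linear heat-flow form (coefficient × area × temperature difference), and all function and variable names are assumptions chosen for this example.

```python
import random


def fit_component(records):
    """Least-squares fit of y = k * x through the origin for one component type."""
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    return sxy / sxx


def predict_building(coeffs, building, delta_t):
    """Sum the component-level predictions following the building's composition."""
    return sum(coeffs[kind] * area * delta_t for kind, area in building)


# Hypothetical "true" coefficients, used only to generate synthetic toy data.
TRUE_U = {"wall": 0.3, "window": 1.2}
SAMPLING_RATE = 0.01  # 1% of the available records

rng = random.Random(0)  # fixed seed so the sketch is reproducible
full = {kind: [] for kind in TRUE_U}
for kind, u in TRUE_U.items():
    for _ in range(10_000):
        area = rng.uniform(5.0, 50.0)      # component area (toy values)
        delta_t = rng.uniform(5.0, 30.0)   # temperature difference (toy values)
        x = area * delta_t
        noise = rng.gauss(1.0, 0.05)       # 5% multiplicative measurement noise
        full[kind].append((x, u * x * noise))

# Draw a sparse sample per component type and fit one model per component.
sample = {
    kind: rng.sample(recs, max(1, round(len(recs) * SAMPLING_RATE)))
    for kind, recs in full.items()
}
coeffs = {kind: fit_component(recs) for kind, recs in sample.items()}

# A hypothetical building described by its components: (type, area).
building = [("wall", 120.0), ("wall", 80.0), ("window", 25.0)]
estimate = predict_building(coeffs, building, delta_t=20.0)
print(coeffs, estimate)
```

Because each component model is fitted only on records of its own type, records gathered from different sources can be pooled per component. That is one possible reading of the abstract's point about using data from diverse record collections; the paper itself should be consulted for the actual method.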

Keywords

Building engineering, Component-based machine learning, Compositionality, Data utilization, Model organization

ASJC Scopus subject areas

Computer Science (all): Software
Business, Management and Accounting (all): Management Information Systems
Decision Sciences (all): Information Systems and Management
Computer Science (all): Artificial Intelligence

Cite this

Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. / Chen, Xia; Singh, Manav Mahan; Geyer, Philipp.
In: Knowledge-based systems, Vol. 294, 111774, 21.06.2024.

Research output: Contribution to journal › Article › Research › peer review

Chen, X, Singh, MM & Geyer, P 2024, 'Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets', Knowledge-based systems, vol. 294, 111774. https://doi.org/10.1016/j.knosys.2024.111774

Chen, X., Singh, M. M., & Geyer, P. (2024). Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. Knowledge-based systems, 294, Article 111774. https://doi.org/10.1016/j.knosys.2024.111774

Chen X, Singh MM, Geyer P. Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. Knowledge-based systems. 2024 Jun 21;294:111774. Epub 2024 Apr 5. doi: 10.1016/j.knosys.2024.111774

Chen, Xia; Singh, Manav Mahan; Geyer, Philipp. / Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. In: Knowledge-based systems. 2024; Vol. 294.

@article{2aacad45582546b992e538b0dad47ca9,
title = "Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets",
abstract = "Machine learning (ML) applications often require large datasets, which poses a major challenge in fields where data are sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents disentangled knowledge of system compositionality using Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach in scenarios with sparse data input (sampling rates from 1% down to 0.0125%) and with several typical ML methods. Our findings reveal three key advantages of this approach over traditional ML methods: 1) It significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) It allows for efficient utilization of data from diverse record collections; 3) It can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites for successfully integrating prior knowledge with ML generalization in target scenarios, and we have open-sourced the code and dataset for community reproduction.",
keywords = "Building engineering, Component-based machine learning, Compositionality, Data utilization, Model organization",
author = "Chen, Xia and Singh, {Manav Mahan} and Geyer, Philipp",
note = "Funding Information: We acknowledge the support of the German Research Foundation (DFG) in funding the project under grant GE 1652/3-2 in the Research Unit FOR 2363 and under Heisenberg grant GE 1652/4-1.",
year = "2024",
month = jun,
day = "21",
doi = "10.1016/j.knosys.2024.111774",
language = "English",
volume = "294",
journal = "Knowledge-based systems",
issn = "0950-7051",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Utilizing domain knowledge
T2  - Robust machine learning for building energy performance prediction with small, inconsistent datasets
AU  - Chen, Xia
AU  - Singh, Manav Mahan
AU  - Geyer, Philipp
N1  - Funding Information: We acknowledge the support of the German Research Foundation (DFG) in funding the project under grant GE 1652/3-2 in the Research Unit FOR 2363 and under Heisenberg grant GE 1652/4-1.
PY  - 2024/6/21
Y1  - 2024/6/21
AB  - Machine learning (ML) applications often require large datasets, which poses a major challenge in fields where data are sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents disentangled knowledge of system compositionality using Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach in scenarios with sparse data input (sampling rates from 1% down to 0.0125%) and with several typical ML methods. Our findings reveal three key advantages of this approach over traditional ML methods: 1) It significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) It allows for efficient utilization of data from diverse record collections; 3) It can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites for successfully integrating prior knowledge with ML generalization in target scenarios, and we have open-sourced the code and dataset for community reproduction.
KW  - Building engineering
KW  - Component-based machine learning
KW  - Compositionality
KW  - Data utilization
KW  - Model organization
UR  - http://www.scopus.com/inward/record.url?scp=85189755232&partnerID=8YFLogxK
U2  - 10.1016/j.knosys.2024.111774
DO  - 10.1016/j.knosys.2024.111774
M3  - Article
AN  - SCOPUS:85189755232
VL  - 294
JO  - Knowledge-based systems
JF  - Knowledge-based systems
SN  - 0950-7051
M1  - 111774
ER  -
