Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets

Xia Chen; Manav Mahan Singh; Philipp Geyer

doi:10.1016/j.knosys.2024.111774

Details

Original language	English
Article number	111774
Number of pages	10
Journal	Knowledge-based systems
Volume	294
Early online date	5 Apr 2024
Publication status	Published - 21 June 2024

Abstract

Machine learning (ML) applications often require large datasets, a requirement that can pose a major challenge in fields where data is sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents disentangled knowledge of system compositionality through Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach against several typical ML methods in scenarios with sparse data input (sampling rates from 1 % down to 0.0125 %). Our findings reveal three key advantages of this approach over traditional ML methods: 1) it significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) it allows for efficient utilization of data from diverse record collections; 3) it can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites to ensure the successful integration of prior knowledge and ML generalization in target scenarios, and we have open-sourced the code and dataset for community reproduction.

ASJC Scopus subject areas

Computer Science (all)
Software
Business, Management and Accounting (all)
Management Information Systems
Decision Sciences (all)
Information Systems and Management
Computer Science (all)
Artificial Intelligence

Cite this

Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. / Chen, Xia; Singh, Manav Mahan; Geyer, Philipp.
In: Knowledge-based systems, Vol. 294, 111774, 21.06.2024.

Publication: Contribution to journal › Article › Research › Peer review

Chen, X, Singh, MM & Geyer, P 2024, 'Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets', Knowledge-based systems, vol. 294, 111774. https://doi.org/10.1016/j.knosys.2024.111774

Chen, X., Singh, M. M., & Geyer, P. (2024). Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. Knowledge-based systems, 294, Article 111774. https://doi.org/10.1016/j.knosys.2024.111774

Chen X, Singh MM, Geyer P. Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets. Knowledge-based systems. 2024 Jun 21;294:111774. Epub 2024 Apr 5. doi: 10.1016/j.knosys.2024.111774

Chen, Xia ; Singh, Manav Mahan ; Geyer, Philipp. / Utilizing domain knowledge : Robust machine learning for building energy performance prediction with small, inconsistent datasets. In: Knowledge-based systems. 2024 ; Vol. 294.

@article{2aacad45582546b992e538b0dad47ca9,

title = "Utilizing domain knowledge: Robust machine learning for building energy performance prediction with small, inconsistent datasets",

abstract = "Machine learning (ML) applications often require large datasets, a requirement that can pose a major challenge in fields where data is sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents a disentangled system compositionality knowledge by the method of Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach in scenarios with sparse data input (1 % - 0.0125 % sampling rate) and several typical ML methods. Our findings reveal three key advantages of this approach over traditional ML methods: 1) It significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) It allows for efficient utilization of data from diverse record collections; 3) It can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites to ensure the successful integration of prior knowledge and ML generalization in target scenarios and open-sourced the code and dataset for community reproduction.",

keywords = "Building engineering, Component-based machine learning, Compositionality, Data utilization, Model organization",

author = "Xia Chen and Singh, {Manav Mahan} and Philipp Geyer",

note = "Funding Information: We acknowledge the German Research Foundation (DFG) support for funding the project under grant GE 1652/3-2 in the Researcher Unit FOR 2363 and under Heisenberg grant GE 1652/4-1.",

year = "2024",

month = jun,

day = "21",

doi = "10.1016/j.knosys.2024.111774",

language = "English",

volume = "294",

journal = "Knowledge-based systems",

issn = "0950-7051",

publisher = "Elsevier",

}

TY - JOUR

T1 - Utilizing domain knowledge

T2 - Robust machine learning for building energy performance prediction with small, inconsistent datasets

AU - Chen, Xia

AU - Singh, Manav Mahan

AU - Geyer, Philipp

N1 - Funding Information: We acknowledge the German Research Foundation (DFG) support for funding the project under grant GE 1652/3-2 in the Researcher Unit FOR 2363 and under Heisenberg grant GE 1652/4-1.

PY - 2024/6/21

Y1 - 2024/6/21

AB - Machine learning (ML) applications often require large datasets, a requirement that can pose a major challenge in fields where data is sparse or inconsistent. To address this issue, we propose a novel approach that combines prior knowledge with data-driven methods to significantly reduce data dependency. This study represents a disentangled system compositionality knowledge by the method of Component-Based Machine Learning (CBML) in the context of energy-efficient building engineering. In this way, CBML incorporates semantic domain knowledge within the structure of a data-driven model. To understand the advantage of CBML, we conducted a case experiment to assess the effectiveness of this knowledge-encoded ML approach in scenarios with sparse data input (1 % - 0.0125 % sampling rate) and several typical ML methods. Our findings reveal three key advantages of this approach over traditional ML methods: 1) It significantly improves the robustness of ML models when dealing with extremely small and inconsistent datasets; 2) It allows for efficient utilization of data from diverse record collections; 3) It can handle incomplete data while maintaining high interpretability and reducing training time. These features offer a promising solution to the challenges associated with deploying data-intensive methods and contribute to more efficient real-world data usage. Additionally, we outline four essential prerequisites to ensure the successful integration of prior knowledge and ML generalization in target scenarios and open-sourced the code and dataset for community reproduction.

KW - Building engineering

KW - Component-based machine learning

KW - Compositionality

KW - Data utilization

KW - Model organization

UR - http://www.scopus.com/inward/record.url?scp=85189755232&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2024.111774

DO - 10.1016/j.knosys.2024.111774

M3 - Article

AN - SCOPUS:85189755232

VL - 294

JO - Knowledge-based systems

JF - Knowledge-based systems

SN - 0950-7051

M1 - 111774

ER -
