Machine learning for early stage building energy prediction: Increment and enrichment

Manav Mahan Singh; Sundaravelpandian Singaravel; Philipp Florian Geyer

doi:10.1016/j.apenergy.2021.117787

Details

Original language	English
Article number	117787
Journal	Applied Energy
Volume	304
Early online date	10 Sept 2021
Publication status	Published - 15 Dec 2021
Externally published	Yes

Abstract

Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.
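
As a reading aid, the percentages in the abstract can be read as relative reductions in root-mean-square error (RMSE) against the model trained on the first dataset. The short Python sketch below assumes that interpretation. The function names and all numeric values are illustrative assumptions and are not taken from the article or its dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical RMSE values, chosen only to illustrate the arithmetic.
    baseline = 10.0          # model trained on the first dataset
    after_increment = 6.2    # more samples from the same data range
    after_enrichment = 4.2   # samples from differently shaped models
    print(relative_reduction(baseline, after_increment))
    print(relative_reduction(baseline, after_enrichment))
```

Under this reading, a larger percentage means the added samples lowered the error on unseen building shapes by more, which is the basis on which the abstract compares increment and enrichment.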

Keywords

Data analysis, Data collection, Energy performance, Generalisation error, Training data

ASJC Scopus subject areas

Engineering(all): Building and Construction
Engineering(all): Mechanical Engineering
Energy(all): General Energy
Environmental Science(all): Management, Monitoring, Policy and Law

Cite this

Machine learning for early stage building energy prediction: Increment and enrichment. / Singh, Manav Mahan; Singaravel, Sundaravelpandian; Geyer, Philipp Florian.
In: Applied Energy, Vol. 304, 117787, 15.12.2021.

Research output: Contribution to journal › Article › Research › peer review

Singh, MM, Singaravel, S & Geyer, PF 2021, 'Machine learning for early stage building energy prediction: Increment and enrichment', Applied Energy, vol. 304, 117787. https://doi.org/10.1016/j.apenergy.2021.117787

Singh, M. M., Singaravel, S., & Geyer, P. F. (2021). Machine learning for early stage building energy prediction: Increment and enrichment. Applied Energy, 304, Article 117787. https://doi.org/10.1016/j.apenergy.2021.117787

Singh MM, Singaravel S, Geyer PF. Machine learning for early stage building energy prediction: Increment and enrichment. Applied Energy. 2021 Dec 15;304:117787. Epub 2021 Sept 10. doi: 10.1016/j.apenergy.2021.117787

Singh, Manav Mahan; Singaravel, Sundaravelpandian; Geyer, Philipp Florian. / Machine learning for early stage building energy prediction: Increment and enrichment. In: Applied Energy. 2021; Vol. 304.

@article{d4d8c291105a48a8ad98375a2e01795b,

title = "Machine learning for early stage building energy prediction: Increment and enrichment",

abstract = "Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58\%, compared to 38\% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.",

keywords = "Data analysis, Data collection, Energy performance, Generalisation error, Training data",

author = "Singh, {Manav Mahan} and Singaravel, Sundaravelpandian and Geyer, {Philipp Florian}",

note = "Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH \& Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].",

year = "2021",

month = dec,

day = "15",

doi = "10.1016/j.apenergy.2021.117787",

language = "English",

volume = "304",

journal = "Applied Energy",

issn = "0306-2619",

publisher = "Elsevier BV",

}

TY - JOUR

T1 - Machine learning for early stage building energy prediction

T2 - Increment and enrichment

AU - Singh, Manav Mahan

AU - Singaravel, Sundaravelpandian

AU - Geyer, Philipp Florian

N1 - Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH & Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].

PY - 2021/12/15

Y1 - 2021/12/15

AB - Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.

KW - Data analysis

KW - Data collection

KW - Energy performance

KW - Generalisation error

KW - Training data

UR - http://www.scopus.com/inward/record.url?scp=85115664490&partnerID=8YFLogxK

U2 - 10.1016/j.apenergy.2021.117787

DO - 10.1016/j.apenergy.2021.117787

M3 - Article

AN - SCOPUS:85115664490

VL - 304

JO - Applied Energy

JF - Applied Energy

SN - 0306-2619

M1 - 117787

ER -

Research@Leibniz University
