Machine learning for early stage building energy prediction: Increment and enrichment

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Manav Mahan Singh
  • Sundaravelpandian Singaravel
  • Philipp Florian Geyer

External Research Organisations

  • KU Leuven
  • Technische Universität Berlin

Details

Original language: English
Article number: 117787
Journal: Applied Energy
Volume: 304
Early online date: 10 Sept 2021
Publication status: Published, 15 Dec 2021
Externally published: Yes

Abstract

Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more: increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.
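The following Python sketch makes the two quantities in the abstract concrete. It computes a root-mean-square error, the relative reduction of that error, and a simple grid-occupancy measure of how much of a normalised design space a set of training samples fills. The occupancy measure, the function names (`rmse`, `relative_reduction`, `grid_coverage`), the bin count and the synthetic 2-D data are illustrative assumptions. They are not the distribution metric or the data used in the article.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_reduction(error_before, error_after):
    """Fractional drop of an error metric, e.g. 0.58 means a 58% reduction."""
    return (error_before - error_after) / error_before


def grid_coverage(samples, lower, upper, bins_per_dim=10):
    """Hypothetical coverage metric (assumption, not the article's method):
    share of equal-sized grid cells in the design space holding at least one sample.
    """
    samples = np.asarray(samples, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Normalise every design parameter to [0, 1] using the design-space bounds.
    unit = (samples - lower) / (upper - lower)
    # Integer cell index per dimension; clipping keeps boundary points inside the grid.
    cells = np.clip((unit * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    occupied = {tuple(row) for row in cells}
    return len(occupied) / bins_per_dim ** samples.shape[1]


# Synthetic 2-D example (hypothetical data): samples confined to one corner of the
# space versus samples spread over the whole space.
rng = np.random.default_rng(0)
clustered = rng.uniform(0.0, 0.3, size=(200, 2))
spread = rng.uniform(0.0, 1.0, size=(200, 2))

print("coverage, clustered:", grid_coverage(clustered, [0, 0], [1, 1]))
print("coverage, spread:   ", grid_coverage(spread, [0, 0], [1, 1]))
# Hypothetical RMSE values, chosen only to show the arithmetic of a 58% reduction.
print("reduction:", relative_reduction(10.0, 4.2))
```

The number of cells grows as `bins_per_dim ** d`, so this grid count only suits a handful of design parameters. It illustrates the idea of comparing how much of the space a dataset occupies. It does not reproduce the article's analysis.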

Keywords

    Data analysis, Data collection, Energy performance, Generalisation error, Training data

Cite this

Machine learning for early stage building energy prediction: Increment and enrichment. / Singh, Manav Mahan; Singaravel, Sundaravelpandian; Geyer, Philipp Florian.
In: Applied Energy, Vol. 304, 117787, 15.12.2021.

Singh MM, Singaravel S, Geyer PF. Machine learning for early stage building energy prediction: Increment and enrichment. Applied Energy. 2021 Dec 15;304:117787. Epub 2021 Sept 10. doi: 10.1016/j.apenergy.2021.117787
Singh, Manav Mahan ; Singaravel, Sundaravelpandian ; Geyer, Philipp Florian. / Machine learning for early stage building energy prediction : Increment and enrichment. In: Applied Energy. 2021 ; Vol. 304.
@article{d4d8c291105a48a8ad98375a2e01795b,
title = "Machine learning for early stage building energy prediction: Increment and enrichment",
abstract = "Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more: increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square error) by 58\%, compared to 38\% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.",
keywords = "Data analysis, Data collection, Energy performance, Generalisation error, Training data",
author = "Singh, {Manav Mahan} and Singaravel, Sundaravelpandian and Geyer, {Philipp Florian}",
note = "Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH \& Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].",
year = "2021",
month = dec,
day = "15",
doi = "10.1016/j.apenergy.2021.117787",
language = "English",
volume = "304",
pages = "117787",
journal = "Applied Energy",
issn = "0306-2619",
publisher = "Elsevier BV",

}

TY  - JOUR
T1  - Machine learning for early stage building energy prediction: Increment and enrichment
AU  - Singh, Manav Mahan
AU  - Singaravel, Sundaravelpandian
AU  - Geyer, Philipp Florian
N1  - Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH & Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].
PY  - 2021/12/15
Y1  - 2021/12/15
AB  - Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more: increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.
KW  - Data analysis
KW  - Data collection
KW  - Energy performance
KW  - Generalisation error
KW  - Training data
UR  - http://www.scopus.com/inward/record.url?scp=85115664490&partnerID=8YFLogxK
U2  - 10.1016/j.apenergy.2021.117787
DO  - 10.1016/j.apenergy.2021.117787
M3  - Article
AN  - SCOPUS:85115664490
VL  - 304
JO  - Applied Energy
JF  - Applied Energy
SN  - 0306-2619
M1  - 117787
ER  - 