Details
Field | Value |
---|---|
Original language | English |
Article number | 117787 |
Journal | Applied Energy |
Volume | 304 |
Early online date | 10 Sept 2021 |
Publication status | Published - 15 Dec 2021 |
Externally published | Yes |
Abstract
Keywords
- Data analysis, Data collection, Energy performance, Generalisation error, Training data
ASJC Scopus subject areas
- Engineering(all): Building and Construction; Mechanical Engineering
- Energy(all): General Energy
- Environmental Science(all): Management, Monitoring, Policy and Law
Cite this
In: Applied Energy, Vol. 304, 117787, 15.12.2021.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Machine learning for early stage building energy prediction
T2 - Increment and enrichment
AU - Singh, Manav Mahan
AU - Singaravel, Sundaravelpandian
AU - Geyer, Philipp Florian
N1 - Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH & Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].
PY - 2021/12/15
Y1 - 2021/12/15
AB - Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models to multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach to collecting new training samples improves generalisation more: increment of samples in a similar data range, or enrichment with samples exhibiting novelty in shape. The first training dataset consists of samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment approach, reducing the generalisation error (root-mean-square error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach with a higher potential to improve generalisation. The developed method will help save expensive data collection resources by focussing on a limited number of samples.
KW - Data analysis
KW - Data collection
KW - Energy performance
KW - Generalisation error
KW - Training data
UR - http://www.scopus.com/inward/record.url?scp=85115664490&partnerID=8YFLogxK
U2 - 10.1016/j.apenergy.2021.117787
DO - 10.1016/j.apenergy.2021.117787
M3 - Article
AN - SCOPUS:85115664490
VL - 304
JO - Applied Energy
JF - Applied Energy
SN - 0306-2619
M1 - 117787
ER -
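The abstract measures generalisation error as RMSE and argues that how well training samples are distributed over the design space indicates how much new data will help. This record does not describe the paper's own distribution metric. The sketch below therefore substitutes a deliberately simple, hypothetical measure: the share of cells in a regular grid over a normalised design space that contain at least one sample. It also shows RMSE and the relative-reduction arithmetic behind figures such as the reported 58% and 38%. This assumes those percentages are reductions relative to the model trained on the first (box-shaped) dataset. The function names (`rmse`, `relative_reduction`, `grid_coverage`) and all sample values are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between reference and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop of an error metric, relative to its initial value."""
    return 100.0 * (before - after) / before


def grid_coverage(samples: Sequence[Sequence[float]],
                  lower: Sequence[float],
                  upper: Sequence[float],
                  bins: int = 10) -> float:
    """Hypothetical space-filling measure (not the paper's metric): share of
    grid cells over the design space [lower, upper] holding >= 1 sample."""
    dims = len(lower)
    occupied = set()
    for x in samples:
        cell = []
        for d in range(dims):
            # Normalise each design variable to [0, 1], then map it to a bin.
            u = (x[d] - lower[d]) / (upper[d] - lower[d])
            # Clamp so values on (or outside) the bounds fall in an edge bin.
            cell.append(max(0, min(int(u * bins), bins - 1)))
        occupied.add(tuple(cell))
    return len(occupied) / bins ** dims


if __name__ == "__main__":
    # Toy two-variable design space; values are illustrative, not from the paper.
    clustered = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.18), (0.18, 0.15)]
    spread = [(0.05, 0.05), (0.35, 0.65), (0.65, 0.35), (0.95, 0.95)]
    print(grid_coverage(clustered, [0.0, 0.0], [1.0, 1.0]))  # 0.01
    print(grid_coverage(spread, [0.0, 0.0], [1.0, 1.0]))     # 0.04
```

With the same number of samples, the clustered set (a stand-in for samples drawn from a single box-shaped BEM) occupies one cell, and the spread set occupies four. A grid count becomes sparse quickly as the number of design variables grows, so it is at best a rough proxy for the distribution analysis the abstract describes.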