Creating a Large-Scale National Residential Building Energy Dataset Using a Two-Stage Machine Learning Approach
Document Type
Conference Proceeding
Publication Date
1-1-2024
Abstract
Buildings account for 40% of total energy demand in the US. Consequently, there is a pressing need for a dataset that provides comprehensive information on the energy consumption of household units in the US. Current practice in large-scale energy simulation may not reflect actual energy consumption patterns. Additionally, existing national building energy datasets, such as the Residential Energy Consumption Survey (RECS), contain a limited number of data points and do not reflect the social aspects of households. This study aimed to create a large-scale national residential building energy dataset using a two-stage machine learning approach that combines two national datasets, the RECS and the American Housing Survey (AHS). The outcome of this study is a large-scale, comprehensive national dataset that contains information about energy consumption in household units as well as their detailed building features. Three machine learning algorithms, artificial neural networks (ANN), random forest (RF), and gradient boosting regression (GBR), were used to develop a data-integration framework. The results showed that RF performed best in predicting end-use energy consumption. Additionally, the predicted energy consumption in the generated large-scale dataset had an accuracy of over 80%. These findings have significant implications for energy-efficient building design and operation.
Publication Source (Journal or Book title)
Construction Research Congress 2024, CRC 2024
First Page
305
Last Page
315
Recommended Citation
Vosoughkhosravi, S., & Jafari, A. (2024). Creating a Large-Scale National Residential Building Energy Dataset Using a Two-Stage Machine Learning Approach. Construction Research Congress 2024, CRC 2024, 2, 305-315. https://doi.org/10.1061/9780784485279.032