Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group,...

Bibliographic Details
Main Author: Hsu, Yuin-Jen David
Other Authors: Massachusetts Institute of Technology. Department of Urban Studies and Planning
Format: Article
Published: Elsevier 2019
Online Access: http://hdl.handle.net/1721.1/120459
https://orcid.org/0000-0003-1108-9656
collection MIT
description Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
Keywords: Cluster-wise regression; Buildings; Energy consumption; Prediction accuracy; Cluster stability; Latent class regression
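The description says cluster stability is measured with the Jaccard coefficient. As a rough illustration only, the sketch below computes that coefficient between clusters found before and after a perturbation. The best-match scoring rule, the function names, and the building ids are assumptions made for this example and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| between two collections of member ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def cluster_stability(original, perturbed):
    """For each original cluster, return its best Jaccard match among the perturbed clusters.

    Assumption: stability is scored by the most similar perturbed cluster.
    """
    return [max(jaccard(c, p) for p in perturbed) for c in original]


# Hypothetical building ids, grouped before and after perturbing the data.
original = [{1, 2, 3, 4}, {5, 6, 7}]
perturbed = [{1, 2, 3}, {4, 5, 6, 7}]
scores = cluster_stability(original, perturbed)
print(scores)
```

A score near 1 means an original cluster reappears almost unchanged after the perturbation; a score near 0 means it dissolved.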
first_indexed 2024-09-23T11:23:39Z
format Article
id mit-1721.1/120459
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T11:23:39Z
record_format dspace
spelling mit-1721.1/120459 2022-10-01T03:21:39Z
Sponsor: United States. Department of Energy (Grant DE-EE0004261)
Dates: 2019-02-14T19:26:40Z; 2015-09; 2015-08; 2019-01-22T15:50:06Z
Type: Article (http://purl.org/eprint/type/JournalArticle)
ISSN: 0306-2619
Citation: Hsu, David. “Comparison of Integrated Clustering Methods for Accurate and Stable Prediction of Building Energy Consumption Data.” Applied Energy 160 (December 2015): 153–163. © 2015 The Author
DOI: http://dx.doi.org/10.1016/j.apenergy.2015.08.126
Journal: Applied Energy
License: Creative Commons Attribution 4.0 International license, https://creativecommons.org/licenses/by/4.0/
File format: application/pdf
Publisher: Elsevier
title Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data