Journal Article
Identifying key variables and interactions in statistical models of building energy consumption using regularization

Statistical models can only be as good as the data put into them. Data on energy consumption, especially on its non-technical aspects, continue to grow, but these variables are often interpreted differently across disciplines, datasets, and contexts. Selecting key variables and interactions is therefore an important step toward more accurate predictions, better interpretation, and the identification of key subgroups for further analysis.

This paper therefore makes two main contributions to the modeling and analysis of energy consumption of buildings. First, it introduces regularization, also known as penalized regression, for the principled selection of variables and interactions. Second, it demonstrates this approach on a comprehensive dataset of energy consumption for commercial office and multifamily buildings in New York City. Using cross-validation, the paper finds that a newly developed method, hierarchical group-lasso regularization, significantly outperforms ridge, lasso, elastic net, and ordinary least squares approaches in prediction accuracy. It also develops a parsimonious model for large New York City buildings and identifies several interactions between technical and non-technical parameters for further analysis, policy development, and targeting. The method is generalizable to other local contexts and is likely to be useful for modeling other sectors of energy consumption as well.
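To make "penalized regression" and a cross-validated comparison concrete, here is a minimal sketch in plain Python. It is not the paper's code or data: it assumes a synthetic dataset with hypothetical predictors x1 to x4 plus all their pairwise interactions, where only x1, x2, and x1*x2 truly matter, and it uses an arbitrary penalty `lam = 0.2` rather than one tuned by cross-validation. OLS, ridge, lasso, and elastic net are all fit by coordinate descent on one elastic-net objective; the hierarchical group-lasso, which additionally ties interactions to their main effects, is not implemented here. All function names (`elastic_net`, `cv_mse`, `make_data`, and so on) are illustrative.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator; returns exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100, tol=1e-6):
    """Cyclic coordinate descent for the penalized least-squares objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    lam=0 gives OLS, alpha=0 ridge, alpha=1 lasso. Expects standardized X, centered y."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb, kept up to date after every coordinate step
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # Correlation of column j with the residual that excludes b_j's contribution.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def fit(X, y, lam, alpha):
    """Standardize features and center y using training data only, then fit."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0 for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    b = elastic_net(Z, [v - y_mean for v in y], lam, alpha)
    return means, sds, y_mean, b

def predict(model, X):
    means, sds, y_mean, b = model
    return [y_mean + sum(b[j] * (row[j] - means[j]) / sds[j] for j in range(len(b))) for row in X]

def cv_mse(X, y, lam, alpha, k=5):
    """k-fold cross-validated mean squared prediction error (interleaved folds)."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        test = set(test_idx)
        train_idx = [i for i in range(n) if i not in test]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], lam, alpha)
        preds = predict(model, [X[i] for i in test_idx])
        total += sum((pred - y[i]) ** 2 for pred, i in zip(preds, test_idx))
    return total / n

def with_interactions(rows):
    """Append all pairwise products x_a * x_c (a < c) as candidate interaction terms."""
    return [list(r) + [r[a] * r[c] for a in range(len(r)) for c in range(a + 1, len(r))] for r in rows]

def make_data(n=120, seed=0):
    """Hypothetical data: y depends only on x1, x2 and the x1*x2 interaction."""
    rng = random.Random(seed)
    raw, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        raw.append(x)
        y.append(2.0 * x[0] - 1.0 * x[1] + 1.5 * x[0] * x[1] + rng.gauss(0.0, 1.0))
    return with_interactions(raw), y

# Hypothetical (lam, alpha) settings; in practice lam would itself be tuned by cross-validation.
MODELS = {"OLS": (0.0, 1.0), "ridge": (0.2, 0.0), "lasso": (0.2, 1.0), "elastic net": (0.2, 0.5)}

def compare(X, y, k=5):
    return {name: cv_mse(X, y, lam, alpha, k) for name, (lam, alpha) in MODELS.items()}

if __name__ == "__main__":
    X, y = make_data()
    for name, mse in compare(X, y).items():
        print(f"{name:12s} CV MSE = {mse:.3f}")
    lasso_b = fit(X, y, 0.2, 1.0)[3]
    print("lasso zeroed columns:", [j for j, c in enumerate(lasso_b) if c == 0.0])
```

With `lam = 0` the coordinate updates reduce to ordinary least squares. The soft-thresholding step is what lets the lasso and elastic net set coefficients exactly to zero, which is the variable-selection behavior that regularization contributes; ridge only shrinks coefficients toward zero.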

Publication Type: Journal Article
Year of Publication: 2015
Authors: Hsu D
Journal: Energy
Volume: 83
Start Page: 144
Date Published: 04/2015
URL: http://www.sciencedirect.com/science/article/pii/S0360544215001590
DOI: 10.1016/j.energy.2015.02.008