This code was written in support of the project "Putting Buildings to Sleep", funded by the University of California Office of the President's Carbon Neutrality Initiative grant.

Research was undertaken by:
- Alan Meier, Lawrence Berkeley National Lab
- Lisa Slaughter, UC Davis Energy Graduate Group
- Alex Sloan, UC Davis Energy Graduate Group

With assistance from:
- Marco Pritoni, Lawrence Berkeley National Lab
- Shrinivasa Upadhyaya, UC Davis
- Kurt Kornbluth, UC Davis
- The Western Cooling and Efficiency Center

The goal of this project is to develop a platform that infers whether a building is vacant based on available sensor data. Examples of these data sources include carbon dioxide levels, electricity demand, humidity, temperature, and the number of active Wi-Fi connections. It is envisioned that this vacancy signal can be used in real time to enact energy savings through automatic equipment shut-downs. See below for the abstract to the thesis manuscript that documents this work.
Inputs:
- A .csv file containing historical data, one row per timestamp. The first column is the timestamp, and subsequent columns hold the raw sensor values at that timestamp. The final column holds the ground truth values at each timestamp. Note that there is currently no handling for sensor data with different timestamps.
- A .csv file containing sensor metadata. Column definitions:
  - "Sensor-Name": String. Name of sensor; must be unique.
  - "Sensor-Type": String. Defines what the sensor is measuring. Options:
    - "carbon dioxide": Air concentration of carbon dioxide.
    - "wifi": Count of active Wi-Fi connections.
    - "elec": Electricity demand.
  - "Update-Frequency": String. Data sample rate in minutes (optional).
  - "Measurement-Units": String. Units attached to sensor measurement.
  - "Data-Access-Type": String. Location information for data retrieval (optional).
  - "Vacancy-Relationship-Type": String. Defines the modeling approach. Options:
    - "logistic": Uses logistic regression to model vacancy.
    - "percentile": Uses the proposed percentile method to model vacancy.
  - "Training-Data-Set": Determines what parts of the training set are used. Options:
    - "full": Use the full training data set.
    - "cherry": Use only times of expected vacancy from the training set (between 12am - 4am).
  - "Data-Retrieval-File-Name": String. Name of data extraction file for this sensor (.py file; omit extension).
  - "Preprocessing-File-Name": String. Name of preprocessing file for this sensor (.py file; omit extension).
  - "Relationship-Builder-File-Name": String. Name of percentile method training file for this sensor (.py file; omit extension).
  - "Std-Dev": Float. Standard deviation of the training set.
  - "Parameter-1": Object. Placeholder for a model coefficient (leave empty to evaluate).
  - "Parameter-2": Object. Placeholder for a model coefficient (leave empty to evaluate).
  - "Parameter-3": Object. Placeholder for a model coefficient (leave empty to evaluate).
  - "Parameter-4": Object. Placeholder for a model coefficient (leave empty to evaluate).

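The allowed values for the sensor metadata columns can be checked before a run. The following is a minimal sketch using only the Python standard library; it is not part of this repository. The function name `validate_sensor_metadata` and the example sensor names are hypothetical, and only the columns with enumerated options (plus the uniqueness of "Sensor-Name") are checked.

```python
import csv
import io

# Allowed values, copied from the column definitions in this README.
SENSOR_TYPES = {"carbon dioxide", "wifi", "elec"}
RELATIONSHIP_TYPES = {"logistic", "percentile"}
TRAINING_SETS = {"full", "cherry"}


def validate_sensor_metadata(csv_file):
    """Return a list of human-readable problems found in the metadata rows.

    Hypothetical helper for illustration; not a function from this code base.
    """
    problems = []
    seen_names = set()
    checks = [
        ("Sensor-Type", SENSOR_TYPES),
        ("Vacancy-Relationship-Type", RELATIONSHIP_TYPES),
        ("Training-Data-Set", TRAINING_SETS),
    ]
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(csv_file), start=2):
        name = (row.get("Sensor-Name") or "").strip()
        if not name:
            problems.append(f"line {line_no}: missing Sensor-Name")
        elif name in seen_names:
            problems.append(f"line {line_no}: duplicate Sensor-Name {name!r}")
        seen_names.add(name)
        for column, allowed in checks:
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append(
                    f"line {line_no}: {column}={value!r} "
                    f"is not one of {sorted(allowed)}"
                )
    return problems


# Invented example rows; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "Sensor-Name,Sensor-Type,Vacancy-Relationship-Type,Training-Data-Set\n"
    "office-co2,carbon dioxide,percentile,cherry\n"
    "office-wifi,wifi,logistic,full\n"
    "office-wifi,wi-fi,percentile,full\n"
)
for problem in validate_sensor_metadata(example):
    print(problem)
```

In this example the third data row is flagged twice: once for reusing a sensor name and once for an unrecognized sensor type.
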
Outputs:
- The original .csv of historical data, with additional columns for the fused probability of vacancy and the probability of vacancy predicted by each sensor stream.
- Various plots generated by preanalysis.py that explore the input data.
- Various plots and metrics generated by postanalysis.py that evaluate model performance.
- Various plots comparing the raw inputs, the intermediate probabilities of vacancy, the fused probability of vacancy, and ground truth over time.

To run the program, call Main() from main.py. Input parameters to this function define how the program will run. See main.py for more information. The code will progress through the data acquisition, data exploration, training, testing, and evaluation phases of modeling.
Abstract: As building systems such as heating, cooling, ventilation, and lighting continue to reduce their energy consumption, the energy use of miscellaneous plug loads becomes a growing concern. Though the efficiency of these devices should be addressed, it is also important to turn them off or place them into low- or no-power modes when no service is being provided by their operation. Current methods of determining vacancy require the time-consuming task of gathering ground truth in order to train an inference model. This study develops, tests, and evaluates a method of inferring vacancy in buildings that uses easily obtainable data during model training. The approach infers vacancy from any numerical building data having a suitable correlation with vacancy patterns using their cumulative distributions during times of expected vacancy. These times are easily extracted from general knowledge of building vacancy patterns. Decision-level sensor fusion allows the use of one or many input data streams, where the use of multiple inputs can improve the quality of the vacancy inference. The proposed method was piloted at an office space in Davis, CA, USA using the following data streams: electricity demand, room carbon dioxide levels, and the number of active connections to the office’s Wi-Fi network. Evaluation is performed by comparing model outputs against ground truth obtained from security camera footage. ROC analysis is performed, and a new comparison, called the CMC curve, is proposed that compares false positive and false negative rates in an ROC-like manner. The proposed method of generating inference curves using root mean square for fusion shows an area under the ROC curve and an area under the CMC curve of 0.960 and 0.041 (where 1 and 0 are the best possible), respectively.
This moderately outperforms logistic regression using any of the applied fusion methods, for which the best area under the ROC curve and area under the CMC curve are 0.955 and 0.045, respectively. This shows that using the proposed method allows for quality vacancy inference while reducing the up-front data requirement inherent to model training.
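
The idea described in the abstract (a cumulative distribution built from times of expected vacancy, followed by root-mean-square fusion) can be illustrated with a short sketch. This is not the code in this repository, and the exact mapping from the cumulative distribution to a probability of vacancy is an assumption made here for illustration: a reading that exceeds most readings seen during expected vacancy is treated as evidence of occupancy. The function names and sample readings are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def build_ecdf(vacant_samples):
    """Return the empirical CDF of readings taken during expected vacancy."""
    ordered = sorted(vacant_samples)
    if not ordered:
        raise ValueError("need at least one sample from the expected-vacancy window")

    def ecdf(value):
        # Fraction of expected-vacancy samples that are <= value.
        return bisect_right(ordered, value) / len(ordered)

    return ecdf


def vacancy_probability(ecdf, value):
    """ASSUMPTION: readings above most vacant-time readings suggest occupancy."""
    return 1.0 - ecdf(value)


def fuse_rms(probabilities):
    """Root-mean-square fusion of per-sensor vacancy probabilities."""
    if not probabilities:
        raise ValueError("need at least one probability to fuse")
    return sqrt(sum(p * p for p in probabilities) / len(probabilities))


# Invented readings from the 12am - 4am window, for illustration only.
co2_ecdf = build_ecdf([400, 405, 410, 415, 420])
wifi_ecdf = build_ecdf([0, 0, 1, 1, 2])

# Compare a busy daytime reading with a quiet night-time reading.
daytime = [vacancy_probability(co2_ecdf, 650), vacancy_probability(wifi_ecdf, 12)]
night = [vacancy_probability(co2_ecdf, 402), vacancy_probability(wifi_ecdf, 0)]
print("daytime fused probability of vacancy:", fuse_rms(daytime))
print("night fused probability of vacancy:", fuse_rms(night))
```

Because the distribution is built only from the expected-vacancy window, no labeled ground truth is needed to train this kind of model, which is the data-requirement advantage the abstract describes.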