QC fixes for residential data transformation scripts #163
base: main
Changes from all commits: 08ff125, e7e4afb, 93cbb6d, d433f00, 60d3047
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,11 +11,19 @@ Before using these scripts make sure: | |
| After the simulation is done, confirm that it finished successfully: each measure subdirectory in the measures folder (e.g. **SWSV001-05 Duct Seal_DMo**) should contain the file _results-summary.csv_ as well as the subfolder **runs**, which holds climate-zone-specific output files. | ||
|
|
||
| ### Post-processing steps for residential measures: | ||
| If preparing a new measure or updating an existing one, create a measure list workbook to define the permutations to be calculated in post-processing. | ||
| 1. Make a copy of the measure list workbook template, `DEER_EnergyPlus_Modelkit_Measure_list_working.xlsx`. | ||
| 2. Permute rows for all combinations of building type, vintage, HVAC type, and pairings of 'PreTechID', 'StdTechID', 'MeasTechID'. For each unique pairing of 'PreTechID', 'StdTechID', 'MeasTechID', enter one unique MeasureID name; the outputs will be labeled by MeasureID rather than by TechID. All rows for a given MeasureID should share the same Normunit option. | ||
| 3. Save your measure list workbook as part of the measure setup documentation. | ||
|
|
||
| Apply the following data transformation steps for each subfolder under your measure (e.g. DMo, MFm-Ex, MFm-New, SFm-Ex, SFm-New): | ||
| 1. Open the provided .py script in the **data transformation** directory that matches the building type (DMo.py, MFm.py, or SFm.py). | ||
| 2. Open the accompanying Excel workbook ***DEER_EnergyPlus_Modelkit_Measure_list_working.xlsx*** and identify the corresponding measure name in column A of the sheet `Measure_list`. | ||
| 3. In line 23 of the Python script (line 26 for the Com script), specify the measure name identified in step 2. For example: `measure_name = 'SWSV001-05 Duct Seal_DMo'` | ||
| 4. In line 33 of the Python script (lines 34 and 35 in the Single Family script; line 40 in the Com script, where it should be set automatically, but double-check), specify the path to the simulation directory starting with the folder Analysis. For example: `path = 'residential measures/SWSV001-05 Duct Seal_DMo'`. For Single Family, if existing vintage, assign the 1975 and 1985 directories to `path_1975` and `path_1985` respectively and leave `path_new` blank; if new vintage, assign the New directory to `path_new`. | ||
| 5. Run the Python script. It should produce three files, ***current_msr_mat.csv***, ***sim_annual.csv***, and ***sim_hourly_wb.csv*** (***sfm_annual.csv*** and ***sfm_hourly_csv*** for Single Family), in the same directory as the script. For better organization, move these files somewhere trackable. Note that they are covered by **gitignore**, but the user can produce them in a local repo and move them to a desirable location after the process is finished. | ||
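After step 5, it can be convenient to verify that all three outputs were actually produced before moving them. The helper below is a hypothetical convenience, not part of the repo scripts; the file names follow step 5 (for Single Family, substitute the `sfm_`-prefixed names).

```python
import os

# Expected outputs of the residential transformation scripts (step 5).
EXPECTED = ['current_msr_mat.csv', 'sim_annual.csv', 'sim_hourly_wb.csv']

def missing_outputs(script_dir):
    """Return the expected output files not yet present in script_dir."""
    return [f for f in EXPECTED
            if not os.path.isfile(os.path.join(script_dir, f))]
```

If the returned list is non-empty, re-run the script before importing the tables in the PostgreSQL steps.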
|
Comment on lines +19 to 24
Contributor (Author): Here we clarify that these steps should be repeated for each cohort subfolder.
||
|
|
||
| Apply the following post-processing steps to compute peak demand, the synthetic building type 'SFm', normalized energy savings, and weighted average data for building type 'Res'. | ||
| 6. If the tables "wts_res_bldg.csv" and "wts_res_hvac.csv" are not consistent with the DEER weights table (DEER.BldgWts), run DEER_weights_extraction.py to extract the most up-to-date weights tables needed for post-processing. Use the most up-to-date tables during the PostgreSQL steps. | ||
| 7. In a PostgreSQL database management tool (such as [pgadmin4](https://www.pgadmin.org/download/)), import the csv files generated in step 5, along with the other csv tables (specifically, "NumBldgs.csv", "NumStor.csv", "peakperspec.csv", "wts_res_bldg.csv", "wts_res_hvac.csv", "FloorArea_2022.csv") provided in the **energy savings** folder. Also run ***ImpactProfiles.sql*** in the PostgreSQL environment to create its corresponding support table. The support tables provided in the **energy savings** folder only need to be imported once. | ||
| 8. From the **energy savings** folder, run the provided .sql queries labelled "R1..", "R2..", etc. in the following order: R1, R2, R3, R4, P1, P2, P3, P4, P5, P6, P7, P8. Run P2.1A and P2.1B after P2 only if processing the Duct Optimization or Duct Seal measure. (For Commercial, a separate set of commercial scripts is provided; use those instead and run them in numerical order.) | ||
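The conditional run order in step 8 can be sketched as a small helper. This is hypothetical scaffolding, not part of the repo; it assumes the duct measures can be identified by the word "Duct" in the measure name.

```python
# Residential post-processing SQL run order (step 8).
BASE_ORDER = ['R1', 'R2', 'R3', 'R4',
              'P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8']

def run_order(measure_name):
    """Return the SQL script order; P2.1A/P2.1B are inserted after P2
    only for the duct measures (Duct Optimization, Duct Seal)."""
    order = list(BASE_ORDER)
    if 'Duct' in measure_name:
        i = order.index('P2') + 1
        order[i:i] = ['P2.1A', 'P2.1B']
    return order
```

For example, `run_order('SWSV001-05 Duct Seal_DMo')` places P2.1A and P2.1B immediately after P2, while other measures run the base order unchanged.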
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ | |
| print(measures) | ||
| #%% | ||
| #Define measure name here | ||
| measure_name = 'Windows' | ||
| measure_name = 'SEER Rated AC HP' | ||
|
|
||
| # %% | ||
| #DMo only script | ||
|
|
@@ -32,8 +32,11 @@ | |
|
|
||
| #12/20/2023 After finishing Com, try to condense Res script so one script takes care of one measure folder? | ||
| #to do: use for loop to loop over each folder, using if-else to process different building types for Res | ||
| #The folder path here is specific to the Windows measure, which has three subfolders ending with Msr1, Msr2, Msr3. Modify the path for each submeasure. | ||
| path = 'residential measures/SWBE011-01 Windows/SWBE011-01 Windows_DMo/SWBE011-01 Windows_DMo_Msr1' | ||
| # For most measures, specify a path to a cohort folder, e.g. | ||
| # path = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_DMo' | ||
| # For SWBE011 Windows, for each subfolder, specify the path to the subfolder and run the script, e.g. | ||
| # path = 'residential measures/SWBE011-01 Windows/SWBE011-01 Windows_DMo/SWBE011-01 Windows_DMo_Msr1'. | ||
| path = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_DMo' | ||
|
|
||
| # %% | ||
| #extract only the 5th portion of the measure group name for expected_att | ||
|
|
@@ -254,9 +257,12 @@ def end_use_rearrange(df_in): | |
| full_path = hrly_path + "/" + split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
| df = pd.read_csv(full_path, low_memory=False) | ||
|
|
||
| #remove trailing spaces on col headers | ||
| df.columns = df.columns.str.rstrip() | ||
|
|
||
| #extract the last column (the total elec hrly profile) | ||
| #if for enduse hourly, then extract the relevant end use column | ||
| extracted_df = pd.DataFrame(df.iloc[:,-1]) | ||
| extracted_df = pd.DataFrame(df['Electricity:Facility [J](Hourly)']) | ||
|
Comment on lines +260 to +265
Contributor (Author): Here, we make hourly data extraction more robust by using the column name (1/3).
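The change described in this comment (stripping trailing spaces from headers, then selecting the total-electricity column by name instead of by position) can be sketched as follows. The fallback to the last column is an added assumption for illustration; the scripts themselves select the named column directly.

```python
import pandas as pd

TOTAL_ELEC_COL = 'Electricity:Facility [J](Hourly)'

def extract_total_elec(df):
    """Strip trailing spaces from column headers, then select the total
    electricity hourly profile by name; fall back to the last column
    (the original behavior) if the named column is absent."""
    df = df.rename(columns=lambda c: c.rstrip())
    if TOTAL_ELEC_COL in df.columns:
        return pd.DataFrame(df[TOTAL_ELEC_COL])
    return pd.DataFrame(df.iloc[:, -1])
```

Selecting by name rather than position keeps the extraction correct even if extra end-use columns are appended after the total-electricity column in instance-var.csv.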
||
|
|
||
| #create the column name based on the permutations | ||
| col_name = split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ | |
| print(measures) | ||
| #%% | ||
| #Define measure name here | ||
| measure_name = 'Windows' | ||
| measure_name = 'SEER Rated AC HP' | ||
|
|
||
| # %% | ||
| #MFm only script | ||
|
|
@@ -30,7 +30,7 @@ | |
| os.chdir("../..") #go up two directories | ||
| print(os.path.abspath(os.curdir)) | ||
|
|
||
| path = 'residential measures/SWBE011-01 Windows\SWBE011-01 Windows_MFm\SWBE011-01 Windows_MFm_Msr1' | ||
| path = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_MFm_New' | ||
| # %% | ||
| #extract only the 5th portion of the measure group name for expected_att | ||
| #split argument 4 means only split 4 times maximum | ||
|
|
@@ -251,9 +251,12 @@ def end_use_rearrange(df_in): | |
| full_path = hrly_path + "/" + split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
| df = pd.read_csv(full_path, low_memory=False) | ||
|
|
||
| #remove trailing spaces on col headers | ||
| df.columns = df.columns.str.rstrip() | ||
|
|
||
| #extract the last column (the total elec hrly profile) | ||
| #if for enduse hourly, then extract the relevant end use column | ||
| extracted_df = pd.DataFrame(df.iloc[:,-1]) | ||
| extracted_df = pd.DataFrame(df['Electricity:Facility [J](Hourly)']) | ||
|
Comment on lines +254 to +259
Contributor (Author): Here, we make hourly data extraction more robust by using the column name (2/3).
||
|
|
||
| #create the column name based on the permutations | ||
| col_name = split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ | |
| print(measures) | ||
| #%% | ||
| #Define measure name here | ||
| measure_name = 'Windows' | ||
| measure_name = 'SEER Rated AC HP' | ||
|
|
||
| # %% | ||
| #SFm only script | ||
|
|
@@ -31,14 +31,18 @@ | |
| print(os.path.abspath(os.curdir)) | ||
|
|
||
| #input the two subdirectory of SFm, one being 1975, the other 1985. If New vintage, input path at path_new and leave other blank. | ||
| path_1975 = 'residential measures/SWBE011-01 Windows/SWBE011-01 Windows_SFm_1975/SWBE011-01 Windows_SFm_1975_Msr1' | ||
| path_1985 = 'residential measures/SWBE011-01 Windows/SWBE011-01 Windows_SFm_1985/SWBE011-01 Windows_SFm_1985_Msr1' | ||
| path_new = '' | ||
|
|
||
| paths = [path_1975, path_1985] | ||
|
|
||
| if path_new != '' : | ||
| path_1975 = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_SFm_1975' | ||
| path_1985 = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_SFm_1985' | ||
| path_new = 'residential measures/SWHC049-08 SEER Rated AC HP/SWHC049-08 SEER Rated AC HP_SFm_New' | ||
|
|
||
| # Select whether to process New or Existing vintage models. | ||
| # The script is not compatible with processing both New and Existing in a single batch. | ||
| MODE_NEW_VINTAGE = False | ||
| if MODE_NEW_VINTAGE: | ||
| paths = [path_new] | ||
| else: | ||
| paths = [path_1975, path_1985] | ||
|
|
||
|
Comment on lines 33 to +45
Contributor (Author): Here we introduce a new variable that triggers distinct logic for New and Existing vintage later on in SFm.py.
||
| # %% | ||
| #extract only the 5th portion of the measure group name for expected_att | ||
| #split argument 4 means only split 4 times maximum | ||
|
|
@@ -222,10 +226,10 @@ def end_use_rearrange(df): | |
| sim_annual_raw = pd.DataFrame() | ||
| for path in paths: | ||
| print(f'processing data in {path}') | ||
| df_raw = pd.read_csv(path+'/'+'/results-summary.csv', usecols=['File Name']) | ||
| df_raw = pd.read_csv(path+'/results-summary.csv', usecols=['File Name']) | ||
| num_runs = len(df_raw['File Name'].dropna().unique()) - 1 | ||
| #Read annual data | ||
| annual_df = pd.read_csv(path+'/'+'/results-summary.csv', nrows=num_runs, skiprows=num_runs+2) | ||
| annual_df = pd.read_csv(path+'/results-summary.csv', nrows=num_runs, skiprows=num_runs+2) | ||
| split_meta_cols_eu = annual_df['File Name'].str.split('/', expand=True) | ||
|
|
||
| #looping over multiple folders/cohort cases, use a list | ||
|
|
@@ -273,9 +277,9 @@ def end_use_rearrange(df): | |
| #extract data per bldgtype-bldghvac-bldgvint group | ||
| hourly_df = pd.DataFrame(index=range(0,8760)) | ||
| #extract num_runs / split_meta_cols_eu | ||
| df_raw = pd.read_csv(path+'/'+'/results-summary.csv', usecols=['File Name']) | ||
| df_raw = pd.read_csv(path+'/results-summary.csv', usecols=['File Name']) | ||
| num_runs = len(df_raw['File Name'].dropna().unique()) - 1 | ||
| annual_df = pd.read_csv(path+'/'+'/results-summary.csv', nrows=num_runs, skiprows=num_runs+2) | ||
| annual_df = pd.read_csv(path+'/results-summary.csv', nrows=num_runs, skiprows=num_runs+2) | ||
| split_meta_cols_eu = annual_df['File Name'].str.split('/', expand=True) | ||
| for i in range(0,num_runs): | ||
| print(f"merging record {i}") | ||
|
|
@@ -284,9 +288,12 @@ def end_use_rearrange(df): | |
| full_path = hrly_path + "/" + split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
| df = pd.read_csv(full_path, low_memory=False) | ||
|
|
||
| #remove trailing spaces on col headers | ||
| df.columns = df.columns.str.rstrip() | ||
|
|
||
| #extract the last column (the total elec hrly profile) | ||
| #if for enduse hourly, then extract the relevant end use column | ||
| extracted_df = pd.DataFrame(df.iloc[:,-1]) | ||
| extracted_df = pd.DataFrame(df['Electricity:Facility [J](Hourly)']) | ||
|
Comment on lines +291 to +296
Contributor (Author): Here, we make hourly data extraction more robust by using the column name (3/3).
||
|
|
||
| #create the column name based on the permutations | ||
| col_name = split_meta_cols_eu.iloc[i][0] + "/" + split_meta_cols_eu.iloc[i][1] + "/" + split_meta_cols_eu.iloc[i][2] + "/instance-var.csv" | ||
|
|
@@ -635,9 +642,14 @@ def end_use_rearrange(df): | |
| cz_vint_dict = cz_vint_dict1 | cz_vint_dict2 | ||
|
|
||
| #%% | ||
| ##BldgVint label correction for NumStor weights | ||
| sim_annual_f['BldgVint'] = sim_annual_f['BldgLoc'].map(cz_vint_dict) | ||
| sim_hourly_final['BldgVint'] = sim_hourly_final['BldgLoc'].map(cz_vint_dict) | ||
| if MODE_NEW_VINTAGE: | ||
| pass | ||
| else: | ||
| ##BldgVint label correction for NumStor weights | ||
| # Intended to be used only for Existing vintage models. | ||
| # This overwrites the BldgVint attribute from the model, regardless of New or Existing. | ||
| sim_annual_f['BldgVint'] = sim_annual_f['BldgLoc'].map(cz_vint_dict) | ||
| sim_hourly_final['BldgVint'] = sim_hourly_final['BldgLoc'].map(cz_vint_dict) | ||
|
Comment on lines 644 to +652
Contributor (Author): Here we prevent triggering the vintage label replacement logic intended for Existing vintage when processing a batch of New vintage models.
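The guard described in this comment can be sketched as a small function. The record shapes and names below are assumptions for illustration; in SFm.py the same logic operates on the `sim_annual_f` and `sim_hourly_final` DataFrames.

```python
def correct_bldg_vint(records, cz_vint_dict, mode_new_vintage):
    """Overwrite BldgVint from the climate-zone/vintage map for Existing
    vintage batches only; New vintage batches keep the model's own label,
    since the map would otherwise mislabel them."""
    if mode_new_vintage:
        return records
    return [{**r, 'BldgVint': cz_vint_dict[r['BldgLoc']]} for r in records]
```

Without the `mode_new_vintage` guard, the NumStor label correction would overwrite the BldgVint attribute regardless of whether the batch was New or Existing.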
||
|
|
||
| # %% | ||
| ##STEP 4: Measure setup file (current_msr_mat.csv) | ||
|
|
||
Here we clarify some of the requirements for entering measure definitions into the measure list workbook.