Empty file removed PerpendicularDist.py
Empty file.
360 changes: 360 additions & 0 deletions data_integration.ipynb
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Script to integrate the rainfall data from CUrW with the attenuation data from Dialog Axiata."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"IMPORTANT: This script is hard-coded for this specific task and depends heavily on a few requirements, so please make sure to follow the exact steps mentioned below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from os import listdir\n",
"from os.path import isfile, join\n",
"from datetime import datetime, timedelta\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"IMPORTANT: Please comment out the call to reduce_to_single_file() if you have already run it once or already have the file with all weather stations."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# folder_path : path to the CML data file.\n",
"# edited_file_path : path to the folder that contains the station-wise weather data.\n",
"# csv_file_rainfall_data_all : path to the file that holds all rainfall data.\n",
"# integrated_file_loc : location of the final integrated file.\n",
"def get_all_files(folder_path, edited_file_path, csv_file_rainfall_data_all, integrated_file_loc):\n",
"    # Read the names of the files in the folder, skipping hidden files\n",
"    onlyfiles = [f for f in listdir(edited_file_path) if (isfile(join(edited_file_path, f)) and (not f.startswith(\".\")))]\n",
"    print(onlyfiles)\n",
"\n",
"    # Append every station file to the combined rainfall file\n",
"    for i in onlyfiles:\n",
"        reduce_to_single_file(edited_file_path, i, csv_file_rainfall_data_all)\n",
"    data_integration(folder_path, csv_file_rainfall_data_all, integrated_file_loc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method below merges all the files into a single file. \n",
"IMPORTANT: This method must be run only once; otherwise it keeps appending the same data to the combined file and the precipitation will be completely wrong. For example, appending the same file twice gives twice the correct precipitation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def reduce_to_single_file(edited_file_path, i, csv_file_rainfall_data_all):\n",
"    df = pd.read_csv(join(edited_file_path, i))\n",
"    # The station name is the file name without the \".csv\" extension\n",
"    df[\"PrecipStation\"] = i[0:-4]\n",
"    # If the combined file does not exist yet, write it with a header\n",
"    if not isfile(csv_file_rainfall_data_all):\n",
"        df.to_csv(csv_file_rainfall_data_all, index=False)\n",
"    else:  # otherwise it exists, so append without writing the header\n",
"        df.to_csv(csv_file_rainfall_data_all, index=False, mode='a', header=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the CML data, a label of 00:00 means the row contains data for the next 15 minutes; that is, we have \n",
"calculated the attenuation for the time period 00:00 - 00:15.\n",
"\n",
"We used that convention here as well.\n",
"If the label says 11:15, then\n",
"the value is the sum of the readings at 11:20, 11:25, and 11:30."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"IMPORTANT: This code requires each file name to be the same as the weather station name used in the CML data file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please uncomment the lines for the weather stations you have."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def data_integration(csv_path_cml, csv_path_rainfall, integrated_file_loc):\n",
" \n",
" cml_data_frame = pd.read_csv(csv_path_cml, index_col=[2,1])\n",
" \n",
" currently_using_stns = [\"Ibattara 2\", \"Orugodawatta\", \"Kottawa North Dharmapala School\", \"Waga\"]\n",
" \n",
"# # creating different dataframes for each weather station\n",
"# ambewela_cml_df = cml_data_frame.loc[\"Ambewela\"]\n",
"# dickOya_cml_df = cml_data_frame.loc[\"Dick oya\"]\n",
"# hingurana_cml_df = cml_data_frame.loc[\"Hingurana\"]\n",
"# jaffna_cml_df = cml_data_frame.loc[\"Jaffna\"]\n",
"# mahapallegama_cml_df = cml_data_frame.loc[\"Mahapallegama\"]\n",
"# malabe_cml_df = cml_data_frame.loc[\"Malabe\"]\n",
"# mulleriyawa_cml_df = cml_data_frame.loc[\"Mulleriyawa\"]\n",
"# mutwal_cml_df = cml_data_frame.loc[\"Mutwal\"]\n",
" ibattara_cml_df = cml_data_frame.loc[\"Ibattara 2\"]\n",
" orugodawatta_cml_df = cml_data_frame.loc[\"Orugodawatta\"]\n",
"# thurstan_college_cml_df = cml_data_frame.loc[\"Thurstan College\"]\n",
"# uduwawala_cml_df = cml_data_frame.loc[\"Uduwawala\"]\n",
"# urumewella_cml_df = cml_data_frame.loc[\"Urumewella\"]\n",
" dhrmapala_scl_cml_df = cml_data_frame.loc[\"Kottawa North Dharmapala School\"]\n",
" waga_cml_df = cml_data_frame.loc[\"Waga\"]\n",
" \n",
" \n",
" weather_stations =['ambewela.csv', 'dick oya.csv', 'hingurana.csv', 'ibattara 2.csv', 'jaffna.csv', \n",
" 'kottawa north dharmapala school.csv', 'mahapallegama.csv', 'malabe.csv',\n",
" 'mulleriyawa.csv', 'mutwal.csv',\n",
" 'orugodawatta.csv', 'thurstan college.csv', 'uduwawala.csv', \n",
" 'urumewella.csv', 'waga.csv']\n",
" \n",
" \n",
" rainfall_data_frame = pd.read_csv(csv_path_rainfall, parse_dates=[\"date_time\"], index_col=[2,0])\n",
" \n",
" #creating different dataframes for each weather station rainfall data\n",
"# ambewela_rainfall_df = rainfall_data_frame.loc[\"Ambewela\"]\n",
"# dickOya_rainfall_df = rainfall_data_frame.loc[\"Dick oya\"]\n",
"# hingurana_rainfall_df = rainfall_data_frame.loc[\"Hingurana\"]\n",
"# jaffna_rainfall_df = rainfall_data_frame.loc[\"Jaffna\"]\n",
"# mahapallegama_rainfall_df = rainfall_data_frame.loc[\"Mahapallegama\"]\n",
"# malabe_rainfall_df = rainfall_data_frame.loc[\"Malabe\"]\n",
"# mulleriyawa_rainfall_df = rainfall_data_frame.loc[\"Mulleriyawa\"]\n",
"# mutwal_rainfall_df = rainfall_data_frame.loc[\"Mutwal\"]\n",
" ibattara_rainfall_df = rainfall_data_frame.loc[\"Ibattara 2\"]\n",
" orugodawatta_rainfall_df = rainfall_data_frame.loc[\"Orugodawatta\"]\n",
"# thurstan_college_rainfall_df = rainfall_data_frame.loc[\"Thurstan College\"]\n",
"# uduwawala_rainfall_df = rainfall_data_frame.loc[\"Uduwawala\"]\n",
"# urumewella_rainfall_df = rainfall_data_frame.loc[\"Urumewella\"]\n",
" dhrmapala_scl_rainfall_df = rainfall_data_frame.loc[\"Kottawa North Dharmapala School\"]\n",
" waga_rainfall_df = rainfall_data_frame.loc[\"Waga\"]\n",
" \n",
"# ambewela_rainfall_df[\"precipitation(mm)\"] = ambewela_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# ambewela_rainfall_df = ambewela_rainfall_df.dropna()\n",
"# ambewela_cml_df = ambewela_cml_df.merge(ambewela_rainfall_df, left_index=True, right_index=True)\n",
" \n",
" \n",
"# dickOya_rainfall_df[\"precipitation(mm)\"] = dickOya_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# dickOya_rainfall_df = dickOya_rainfall_df.dropna()\n",
"# dickOya_cml_df = dickOya_cml_df.merge(dickOya_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# hingurana_rainfall_df[\"precipitation(mm)\"] = hingurana_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# hingurana_rainfall_df = hingurana_rainfall_df.dropna()\n",
"# hingurana_cml_df = hingurana_cml_df.merge(hingurana_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# jaffna_rainfall_df[\"precipitation(mm)\"] = jaffna_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# jaffna_rainfall_df = jaffna_rainfall_df.dropna()\n",
"# jaffna_cml_df = jaffna_cml_df.merge(jaffna_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# mahapallegama_rainfall_df[\"precipitation(mm)\"] = mahapallegama_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# mahapallegama_rainfall_df = mahapallegama_rainfall_df.dropna()\n",
"# mahapallegama_cml_df = mahapallegama_cml_df.merge(mahapallegama_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# malabe_rainfall_df[\"precipitation(mm)\"] = malabe_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# malabe_rainfall_df = malabe_rainfall_df.dropna()\n",
"# malabe_cml_df = malabe_cml_df.merge(malabe_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# mulleriyawa_rainfall_df[\"precipitation(mm)\"] = mulleriyawa_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# mulleriyawa_rainfall_df = mulleriyawa_rainfall_df.dropna()\n",
"# mulleriyawa_cml_df = mulleriyawa_cml_df.merge(mulleriyawa_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# mutwal_rainfall_df[\"precipitation(mm)\"] = mutwal_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# mutwal_rainfall_df = mutwal_rainfall_df.dropna()\n",
"# mutwal_cml_df = mutwal_cml_df.merge(mutwal_rainfall_df, left_index=True, right_index=True)\n",
" \n",
" ibattara_rainfall_df[\"precipitation(mm)\"] = ibattara_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
" ibattara_rainfall_df = ibattara_rainfall_df.dropna()\n",
" ibattara_cml_df = ibattara_cml_df.merge(ibattara_rainfall_df, left_index=True, right_index=True)\n",
" ibattara_cml_df.insert(1,\"PrecipStation\", \"Ibattara 2\")\n",
"\n",
" orugodawatta_rainfall_df[\"precipitation(mm)\"] = orugodawatta_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
" orugodawatta_rainfall_df = orugodawatta_rainfall_df.dropna()\n",
" orugodawatta_cml_df = orugodawatta_cml_df.merge(orugodawatta_rainfall_df, left_index=True, right_index=True)\n",
" orugodawatta_cml_df.insert(1,\"PrecipStation\", \"Orugodawatta\")\n",
" \n",
"# thurstan_college_rainfall_df[\"precipitation(mm)\"] = thurstan_college_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# thurstan_college_rainfall_df = thurstan_college_rainfall_df.dropna()\n",
"# thurstan_college_cml_df = thurstan_college_cml_df.merge(thurstan_college_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# uduwawala_rainfall_df[\"precipitation(mm)\"] = uduwawala_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# uduwawala_rainfall_df = uduwawala_rainfall_df.dropna()\n",
"# uduwawala_cml_df = uduwawala_cml_df.merge(uduwawala_rainfall_df, left_index=True, right_index=True)\n",
" \n",
"# urumewella_rainfall_df[\"precipitation(mm)\"] = urumewella_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
"# urumewella_rainfall_df = urumewella_rainfall_df.dropna()\n",
"# urumewella_cml_df = urumewella_cml_df.merge(urumewella_rainfall_df, left_index=True, right_index=True)\n",
" \n",
" dhrmapala_scl_rainfall_df[\"precipitation(mm)\"] = dhrmapala_scl_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
" dhrmapala_scl_rainfall_df = dhrmapala_scl_rainfall_df.dropna()\n",
" dhrmapala_scl_cml_df = dhrmapala_scl_cml_df.merge(dhrmapala_scl_rainfall_df, left_index=True, right_index=True)\n",
" dhrmapala_scl_cml_df.insert(1,\"PrecipStation\", \"Kottawa North Dharmapala School\")\n",
" \n",
" waga_rainfall_df[\"precipitation(mm)\"] = waga_rainfall_df[\"precipitation(mm)\"].resample(\"15Min\", closed=\"right\", label=\"right\").sum(min_count=3)\n",
" waga_rainfall_df = waga_rainfall_df.dropna()\n",
" waga_cml_df = waga_cml_df.merge(waga_rainfall_df, left_index=True, right_index=True)\n",
" waga_cml_df.insert(1,\"PrecipStation\", \"Waga\")\n",
" \n",
" all_data_frames = [ibattara_cml_df,orugodawatta_cml_df,\n",
"# ambewela_cml_df, dickOya_cml_df, hingurana_cml_df, \n",
"# jaffna_cml_df,\n",
"# mahapallegama_cml_df, \n",
"# malabe_cml_df, mulleriyawa_cml_df, mutwal_cml_df, \n",
"# thurstan_college_cml_df,\n",
"# uduwawala_cml_df, urumewella_cml_df, \n",
" dhrmapala_scl_cml_df,waga_cml_df]\n",
" \n",
" final_data_frame = pd.concat(all_data_frames)\n",
" \n",
"# for df in all_data_frames:\n",
"# if not isfile(integrated_file_loc):\n",
"# df.to_csv(integrated_file_loc)\n",
"# else: # else it exists so append without writing the header\n",
"# df.to_csv(integrated_file_loc, mode='a', header=False)\n",
" \n",
" final_data_frame.to_csv(integrated_file_loc, index_label= \"date_time\")\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Ambewela.csv', 'Dick oya.csv', 'Hingurana.csv', 'Ibattara 2.csv', 'Jaffna.csv', 'Kottawa North Dharmapala School.csv', 'Mahapallegama.csv', 'Malabe.csv', 'Mulleriyawa.csv', 'Mutwal.csv', 'Orugodawatta.csv', 'Thurstan College.csv', 'Uduwawala.csv', 'Urumewella.csv', 'Waga.csv']\n"
]
}
],
"source": [
"get_all_files(\"/media/akalanka/Engineering/Final_Year_Project/1_DATA/CML/Proccessed/Gayan/2018-05-08 to 2018-05-15.csv\",\n",
" \"/media/akalanka/Engineering/Final_Year_Project/1_DATA/RAIN/curw/\", \n",
" \"/media/akalanka/Engineering/Final_Year_Project/1_DATA/RAIN/curw/rainfall_data_all.csv\",\n",
" \"/media/akalanka/Engineering/Final_Year_Project/1_DATA/CML/Proccessed/Gayan/2018-05-08_to_2018-05-15_integrated.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method below creates the classified data.\n",
"\n",
"Here, \n",
"\n",
"* 0 < A < 0.5 \n",
"* 0.5 <= B < 1.0 \n",
"* 1.0 <= C < 2.5 \n",
"* 2.5 <= D < 5.0 \n",
"* 5.0 <= E \n",
"\n",
"\n",
"* A = Small rain\n",
"* B = Average rain\n",
"* C = Above average\n",
"* D = Heavy rain\n",
"* E = Very heavy rain"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"def classify_data(file_path):\n",
" \n",
" classify_data_frame = pd.read_csv(file_path, index_col=[0])\n",
" \n",
"# Bin edges; pd.cut uses right-closed intervals by default: (-1, 0.5], (0.5, 1.0], (1.0, 2.5], (2.5, 5.0], (5.0, 1000]\n",
" bins = [-1.0, 0.5, 1.0, 2.5, 5.0, 1000.0]\n",
" \n",
" group_names = [\"A\", \"B\", \"C\", \"D\", \"E\"]\n",
" \n",
" classify_data_frame[\"class\"] = pd.cut(classify_data_frame[\"precipitation(mm)\"], bins, labels=group_names)\n",
" new_file = file_path[0:-4] + \"_classified.csv\"\n",
" classify_data_frame.to_csv(new_file)\n",
" \n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"classify_data(\"/media/akalanka/Engineering/Final_Year_Project/1_DATA/CML/Proccessed/Gayan/2018-05-08_to_2018-05-15_integrated.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
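A small sketch of the two pandas steps the notebook relies on, in case it helps when checking the conventions described in the markdown cells. It first sums 5-minute readings into 15-minute bins with `resample`, then classifies values with `pd.cut` using the notebook's bin edges. The timestamps and readings are made up for illustration, and the variable names (`rain`, `right`, `left`, `gappy`, `edge_values`) are hypothetical, not taken from the notebook. Two things it makes visible: with `closed="right"` the readings at 11:20, 11:25 and 11:30 land in one bin, which `label="right"` stamps as 11:30 and `label="left"` stamps as 11:15; and `pd.cut` is right-closed by default, so a value of exactly 0.5 falls in class A unless `right=False` is passed. Which behaviour is intended is for the author to confirm.

```python
import pandas as pd

# Hypothetical 5-minute gauge readings in mm (made-up values for illustration).
idx = pd.date_range("2018-05-08 11:05", periods=6, freq="5min")
rain = pd.Series([0.25, 0.5, 1.0, 2.0, 4.0, 8.0], index=idx, name="precipitation(mm)")

# Same call shape as the notebook: each bin is (start, end] and needs 3 readings.
right = rain.resample("15min", closed="right", label="right").sum(min_count=3)
# Same bins, but labelled by the start of the interval instead of the end.
left = rain.resample("15min", closed="right", label="left").sum(min_count=3)

# A bin with a missing reading sums to NaN because of min_count=3.
gappy = rain.drop(pd.Timestamp("2018-05-08 11:25"))
gappy_sum = gappy.resample("15min", closed="right", label="right").sum(min_count=3)

# Classification with the notebook's bin edges and class names.
bins = [-1.0, 0.5, 1.0, 2.5, 5.0, 1000.0]
names = ["A", "B", "C", "D", "E"]
edge_values = pd.Series([0.0, 0.5, 1.0, 5.0, 5.1])
default_classes = pd.cut(edge_values, bins, labels=names)  # right-closed bins
left_closed_classes = pd.cut(edge_values, bins, labels=names, right=False)

print(right)
print(left)
print(gappy_sum)
print(pd.DataFrame({"value": edge_values,
                    "right=True": default_classes,
                    "right=False": left_closed_classes}))
```

If the thresholds in the markdown cell (for example `0.5 <= B < 1.0`) are the intended ones, `right=False` is the variant that matches them at the boundaries.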