-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Output of test.py
Oddities with these particular IMERG files
Model simulated dataset that has been interpolated to the IMERG grids (0.1x0.1). Provided by Jiun-Dar Chern on Discover.
From today's zoom meeting I see that unlike IMERG data these are instantaneous values rather than spanning an time interval.
Well, my general question still stands regarding how to encode IMERG files in terms of time. For example, should each time sample overlap? If not how and by how much to I force a separation in terms of STARE time covers? As I have done it below, they are in principle separated by 1 ms. However, they still report as intersecting. So I must be doing something wrong there.
Mike
IMERG Time Standards
For IMERG a single day consists of 48 half-open intervals centered on the hour or half-hour: ['00:00', '00:30', ... '23:00', '23:30']
The half-open intervals are such that the Lower Bound (LB), Center (CR) and Upper Bound (UB) of the interval are written as (LB CR UB], or equally, LB < CR <= UB
For example (using format HH:MM:SS.ms):
(00:45:00.000 01:00:00.000 01:15:00.000]
(01:15:00.000 01:30:00.000 01:45:00.000]
Reference: Integrated Multi-satellitE Retrievals for GPM (IMERG) Technical Documentation
In practice, I use a 1 millisecond forward-offset to ensure that the LB is half-open or (CR-15m+1ms, CR, CR+15m]
For example,
(00:45:00.001 01:00:00.000 01:15:00.000]
(01:15:00.001 01:30:00.000 01:45:00.000]
Thus, IMERG intervals defined this way are non-overlapping. Except, when I use pystare.temporal_overlap to test this on the TIV covers I find that they do overlap.
Usage Summary:
Basically, I
- For each IMERG file I form three datetimes around a central time to make an half-open interval.
- Convert these to
numpy datetime int64.. - Apply
pystare.from_utc_variable()to get a TIV triplet for the interval. - Apply
pystare.from_temporal_triple()to this to get a TIV cover for the interval. - Use the
hex()orpystare.hex16()(seemingly give the same answer) as the<tcover>part of thechunk_name:
chunk_name format:
<pod-root>/<sid>/<tcover>-<dataset-name-pattern>
where
<sid> is the hex representation of the SID
<tcover> is the hex representation of the TIV covering the temporal range/interval of the granule.
I also see pystare.format_tpod(), pystare.make_tpod_tuple(), so should I be using these or similar instead?
Does this seem correct?
Example based three consecutive files for the PMOD discover data directory:
'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4' => <tcover> = 0x1f90220025001c71
'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4' => <tcover> = 0x1f902207a5001c71
'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4' => <tcover> = 0x1f90221025001c71
Screen dump
halfstep = 0 days 00:15:00, <class 'pandas._libs.tslibs.timedeltas.Timedelta'>
offset = 0 days 00:00:00.001000, <class 'pandas._libs.tslibs.timedeltas.Timedelta'>
imerg_time_res = 28, <class 'int'>
imerg_time_res_arr = [28 28 28], <class 'numpy.int64'>
Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4
t_val = 0.0
t_units = minutes since 2020-01-16 00:00:00
t_begin_date = 20200116
t_begin_time = 0
t_begin_time_str = 0
t_time_increment = 164000
cr_ts_str = 2020-01-16 00:00:00, <class 'str'>
cr_ts = 2020-01-16 00:00:00, <class 'datetime.datetime'>
cr_dt = 2020-01-16 00:00:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
lb_ts = 2020-01-15 23:45:00.001000, <class 'datetime.datetime'>
lb_dt = 2020-01-15 23:45:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
ub_ts = 2020-01-16 00:15:00, <class 'datetime.datetime'>
ub_dt = 2020-01-16 00:15:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
interval_npdt = [datetime.datetime(2020, 1, 15, 23, 45, 0, 1000), datetime.datetime(2020, 1, 16, 0, 0), datetime.datetime(2020, 1, 16, 0, 15)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
interval_npdt_int = [1579131900001, 1579132800000, 1579133700000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
interval_tiv_triple = [2274354625681316977, 2274355195838209137, 2274355211944336497], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
interval_tiv_cover = 2274355195838209137, <class 'numpy.int64'>
2020-01-16 00:00:00
TimeStamps (2020-01-15 23:45:00.001000 <-> 2020-01-16 00:00:00.000000 <-> 2020-01-16 00:15:00.000000)
TIV triple (2274354625681316977 <-> 2274355195838209137 <-> 2274355211944336497)
TIV cover ( 2274355195838209137 )
TIV cover hex ( 0x1f90220025001c71 )
TIV cover bin ( 0b0001111110010000001000100000000000100101000000000001110001110001 )
Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4
t_val = 0.0
t_units = minutes since 2020-01-16 00:30:00
t_begin_date = 20200116
t_begin_time = 3000
t_begin_time_str = 3000
t_time_increment = 164000
cr_ts_str = 2020-01-16 00:30:00, <class 'str'>
cr_ts = 2020-01-16 00:30:00, <class 'datetime.datetime'>
cr_dt = 2020-01-16 00:30:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
lb_ts = 2020-01-16 00:15:00.001000, <class 'datetime.datetime'>
lb_dt = 2020-01-16 00:15:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
ub_ts = 2020-01-16 00:45:00, <class 'datetime.datetime'>
ub_dt = 2020-01-16 00:45:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
interval_npdt = [datetime.datetime(2020, 1, 16, 0, 15, 0, 1000), datetime.datetime(2020, 1, 16, 0, 30), datetime.datetime(2020, 1, 16, 0, 45)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
interval_npdt_int = [1579133700001, 1579134600000, 1579135500000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
interval_tiv_triple = [2274355211944352881, 2274355228050463857, 2274355244156591217], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
interval_tiv_cover = 2274355228050463857, <class 'numpy.int64'>
2020-01-16 00:30:00
TimeStamps (2020-01-16 00:15:00.001000 <-> 2020-01-16 00:30:00.000000 <-> 2020-01-16 00:45:00.000000)
TIV triple (2274355211944352881 <-> 2274355228050463857 <-> 2274355244156591217)
TIV cover ( 2274355228050463857 )
TIV cover hex ( 0x1f902207a5001c71 )
TIV cover bin ( 0b0001111110010000001000100000011110100101000000000001110001110001 )
Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4
t_val = 0.0
t_units = minutes since 2020-01-16 01:00:00
t_begin_date = 20200116
t_begin_time = 10000
t_begin_time_str = 10000
t_time_increment = 164000
cr_ts_str = 2020-01-16 01:00:00, <class 'str'>
cr_ts = 2020-01-16 01:00:00, <class 'datetime.datetime'>
cr_dt = 2020-01-16 01:00:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
lb_ts = 2020-01-16 00:45:00.001000, <class 'datetime.datetime'>
lb_dt = 2020-01-16 00:45:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
ub_ts = 2020-01-16 01:15:00, <class 'datetime.datetime'>
ub_dt = 2020-01-16 01:15:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
interval_npdt = [datetime.datetime(2020, 1, 16, 0, 45, 0, 1000), datetime.datetime(2020, 1, 16, 1, 0), datetime.datetime(2020, 1, 16, 1, 15)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
interval_npdt_int = [1579135500001, 1579136400000, 1579137300000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
interval_tiv_triple = [2274355244156607601, 2274355264557685873, 2274355280663813233], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
interval_tiv_cover = 2274355264557685873, <class 'numpy.int64'>
2020-01-16 01:00:00
TimeStamps (2020-01-16 00:45:00.001000 <-> 2020-01-16 01:00:00.000000 <-> 2020-01-16 01:15:00.000000)
TIV triple (2274355244156607601 <-> 2274355264557685873 <-> 2274355280663813233)
TIV cover ( 2274355264557685873 )
TIV cover hex ( 0x1f90221025001c71 )
TIV cover bin ( 0b0001111110010000001000100001000000100101000000000001110001110001 )
Temporal Overlap between 2020-01-16 00:00:00 and 2020-01-16 00:30:00 = True
Temporal Overlap between 2020-01-16 00:30:00 and 2020-01-16 01:00:00 = True
Temporal Overlap between 2020-01-16 00:00:00 and 2020-01-16 01:00:00 = False
Example code
#! /usr/bin/env python -tt
# -*- coding: utf-8; mode: python -*-
r"""
imergL3
~~~~~~~
"""
# Standard Imports
from datetime import datetime, timezone, timedelta
# Third-Party Imports
import numpy
import pandas
import netCDF4
# STARE Imports
import pystare
import starepandas
# Local Imports
##
# Markup Language Specification (see Google Python Style Guide https://google.github.io/styleguide/pyguide.html)
__docformat__ = "Google en"
# ---
##
# IMEG time interval info
imerg_tstep = 1800000 # 30 minutes in milliseconds
imerg_halfstep = 9e+11 # 15 minutes in nanoseconds
imerg_toffset = 1e+6 # 1 millisecond in nanoseconds
halfstep = pandas.Timedelta(imerg_halfstep, unit ='ns')
offset = pandas.Timedelta(imerg_toffset, unit ='ns')
print(f"halfstep = {halfstep}, {type(halfstep)}")
print(f"offset = {offset}, {type(offset)}")
##
# Determine the STARE time resolution
times = numpy.array([imerg_tstep], dtype=numpy.int64)
time_res_array = pystare.coarsest_resolution_finer_or_equal_ms(times)
imerg_time_res = int(time_res_array[0])
imerg_time_res_arr = numpy.array([imerg_time_res, imerg_time_res, imerg_time_res], dtype=numpy.int64)
print(f"imerg_time_res = {imerg_time_res}, {type(imerg_time_res)}")
print(f"imerg_time_res_arr = {imerg_time_res_arr}, {type(imerg_time_res_arr[0])}")
pod_cover_tivs = []
pod_ts_str = []
for idx, ifile in enumerate(["/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4",
"/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4",
"/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4"]):
##
# Grab time data
print(f"\nReading {ifile}")
nc_file = netCDF4.Dataset(ifile, mode='r', format='NETCDF4')
##
# Pull from netCDF4 into numpy variables
time_var = nc_file.variables['time']
t_val = time_var[:][0]
t_units = time_var.units
t_begin_date = time_var.begin_date
t_begin_time = time_var.begin_time
t_begin_time_str = str(t_begin_time)
len_t_begin_time_str = len(t_begin_time_str)
t_time_increment = time_var.time_increment
print(f"\tt_val = {t_val}")
print(f"\tt_units = {t_units}")
print(f"\tt_begin_date = {t_begin_date}")
print(f"\tt_begin_time = {t_begin_time}")
print(f"\tt_begin_time_str = {t_begin_time_str}")
print(f"\tt_time_increment = {t_time_increment}\n")
##
# Use time units string as the CR datetime.
cr_ts_str = t_units[14:]
print(f"\tcr_ts_str = {cr_ts_str}, {type(cr_ts_str)}")
##
# Form the CR datetime
cr_ts = datetime.strptime(cr_ts_str, '%Y-%m-%d %H:%M:%S')
print(f"\tcr_ts = {cr_ts}, {type(cr_ts)}")
##
# Make it a Pandas TimeStamp
cr_dt = pandas.to_datetime(cr_ts, utc=False, unit='ns')
print(f"\tcr_dt = {cr_dt}, {type(cr_dt)}")
##
# Find the half-open Lower Bound (LB)
lb_ts = cr_ts - halfstep + offset
print(f"\tlb_ts = {lb_ts}, {type(lb_ts)}")
lb_dt = pandas.to_datetime(lb_ts, utc=False, unit='ns')
print(f"\tlb_dt = {lb_dt}, {type(lb_dt)}")
##
# Find the closed Upper Bound (UB)
ub_ts = cr_ts + halfstep
print(f"\tub_ts = {ub_ts}, {type(ub_ts)}")
ub_dt = pandas.to_datetime(ub_ts, utc=False, unit='ns')
print(f"\tub_dt = {ub_dt}, {type(ub_dt)}\n")
##
# Create an interval
interval = [lb_dt, cr_dt, ub_dt]
##
# Convert interval to numpy datetimes
interval_npdt = numpy.array(interval, dtype='datetime64[ms]')
print(f"\tinterval_npdt = {interval_npdt.tolist()}, {type(interval_npdt[0]) = }")
##
# Convert to 64-bit integer representation (based on milliseconds & UTC)
interval_npdt_int = numpy.array(interval, dtype='datetime64[ms]').astype(numpy.int64)
print(f"\tinterval_npdt_int = {interval_npdt_int.tolist()}, {type(interval_npdt_int[0]) = }")
##
# Convert integer representation to TIV triplet
interval_tiv_triple = pystare.from_utc_variable(interval_npdt_int, imerg_time_res_arr, imerg_time_res_arr)
print(f"\tinterval_tiv_triple = {interval_tiv_triple.tolist()}, {type(interval_tiv_triple[0]) = }")
##
# Calculate an interval TIV (cover).
interval_tiv_cover = pystare.from_temporal_triple(interval_tiv_triple)[0]
print(f"\tinterval_tiv_cover = {interval_tiv_cover}, {type(interval_tiv_cover)}")
##
# Summary
print("\n")
print(f"\t{cr_ts_str}")
print(f"\t\tTimeStamps ({interval[0].strftime('%Y-%m-%d %H:%M:%S.%f'):<26} <-> {interval[1].strftime('%Y-%m-%d %H:%M:%S.%f'):^26} <-> {interval[2].strftime('%Y-%m-%d %H:%M:%S.%f'):>26})")
print(f"\t\tTIV triple ({interval_tiv_triple[0]:<26d} <-> {interval_tiv_triple[1]:^26d} <-> {interval_tiv_triple[2]:>26d})")
print(f"\t\tTIV cover ({'':<26s} {interval_tiv_cover:^26d} {'':>26s})")
print(f"\t\tTIV cover hex ({'':<26s} {hex(interval_tiv_cover):^26} {'':>26s})")
print(f"\t\tTIV cover bin ({'':<10s}0b{numpy.binary_repr(interval_tiv_cover, width=64):64s}{'':>12s})")
pod_ts_str.append(cr_ts_str)
pod_cover_tivs.append(interval_tiv_cover)
##
# Compare if overlapping
overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[0]]), numpy.array([pod_cover_tivs[1]]))
print(f"\nTemporal Overlap between {pod_ts_str[0]} and {pod_ts_str[1]} = {bool(overlap[0])}")
overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[1]]), numpy.array([pod_cover_tivs[2]]))
print(f"Temporal Overlap between {pod_ts_str[1]} and {pod_ts_str[2]} = {bool(overlap[0])}")
overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[0]]), numpy.array([pod_cover_tivs[2]]))
print(f"Temporal Overlap between {pod_ts_str[0]} and {pod_ts_str[2]} = {bool(overlap[0])}")