Skip to content

Advise for pod chucking name formation (example). #147

@mbauer288

Description

@mbauer288

Output of test.py

Oddities with these particular IMERG files

Model simulated dataset that has been interpolated to the IMERG grids (0.1x0.1). Provided by Jiun-Dar Chern on Discover.

From today's zoom meeting I see that unlike IMERG data these are instantaneous values rather than spanning an time interval.

Well, my general question still stands regarding how to encode IMERG files in terms of time. For example, should each time sample overlap? If not how and by how much to I force a separation in terms of STARE time covers? As I have done it below, they are in principle separated by 1 ms. However, they still report as intersecting. So I must be doing something wrong there.

Mike

IMERG Time Standards

For IMERG a single day consists of 48 half-open intervals centered on the hour or half-hour: ['00:00', '00:30', ... '23:00', '23:30']

The half-open intervals are such that the Lower Bound (LB), Center (CR) and Upper Bound (UB) of the interval are written as (LB CR UB], or equally, LB < CR <= UB

For example (using format HH:MM:SS.ms):

    (00:45:00.000 01:00:00.000 01:15:00.000]
                                           (01:15:00.000 01:30:00.000 01:45:00.000]

Reference: Integrated Multi-satellitE Retrievals for GPM (IMERG) Technical Documentation

In practice, I use a 1 millisecond forward-offset to ensure that the LB is half-open or (CR-15m+1ms, CR, CR+15m]

For example,

    (00:45:00.001 01:00:00.000 01:15:00.000]
                                           (01:15:00.001 01:30:00.000 01:45:00.000]

Thus, IMERG intervals defined this way are non-overlapping. Except, when I use pystare.temporal_overlap to test this on the TIV covers I find that they do overlap.

Usage Summary:

Basically, I

  1. For each IMERG file I form three datetimes around a central time to make an half-open interval.
  2. Convert these to numpy datetime int64..
  3. Apply pystare.from_utc_variable() to get a TIV triplet for the interval.
  4. Apply pystare.from_temporal_triple() to this to get a TIV cover for the interval.
  5. Use the hex() or pystare.hex16() (seemingly give the same answer) as the <tcover> part of the chunk_name:
    chunk_name format:
        <pod-root>/<sid>/<tcover>-<dataset-name-pattern>
    where
        <sid>    is the hex representation of the SID
        <tcover> is the hex representation of the TIV covering the temporal range/interval of the granule.

I also see pystare.format_tpod(), pystare.make_tpod_tuple(), so should I be using these or similar instead?

Does this seem correct?


Example based three consecutive files for the PMOD discover data directory:

    'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4' => <tcover> = 0x1f90220025001c71
    'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4' => <tcover> = 0x1f902207a5001c71
    'DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4' => <tcover> = 0x1f90221025001c71

Screen dump

halfstep                = 0 days 00:15:00, <class 'pandas._libs.tslibs.timedeltas.Timedelta'>
offset                  = 0 days 00:00:00.001000, <class 'pandas._libs.tslibs.timedeltas.Timedelta'>
imerg_time_res          = 28, <class 'int'>
imerg_time_res_arr      = [28 28 28], <class 'numpy.int64'>

Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4
    t_val               = 0.0
    t_units             = minutes since 2020-01-16 00:00:00
    t_begin_date        = 20200116
    t_begin_time        = 0
    t_begin_time_str    = 0
    t_time_increment    = 164000

    cr_ts_str           = 2020-01-16 00:00:00, <class 'str'>
    cr_ts               = 2020-01-16 00:00:00, <class 'datetime.datetime'>
    cr_dt               = 2020-01-16 00:00:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    lb_ts               = 2020-01-15 23:45:00.001000, <class 'datetime.datetime'>
    lb_dt               = 2020-01-15 23:45:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    ub_ts               = 2020-01-16 00:15:00, <class 'datetime.datetime'>
    ub_dt               = 2020-01-16 00:15:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>

    interval_npdt       = [datetime.datetime(2020, 1, 15, 23, 45, 0, 1000), datetime.datetime(2020, 1, 16, 0, 0), datetime.datetime(2020, 1, 16, 0, 15)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
    interval_npdt_int   = [1579131900001, 1579132800000, 1579133700000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
    interval_tiv_triple = [2274354625681316977, 2274355195838209137, 2274355211944336497], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
    interval_tiv_cover  = 2274355195838209137, <class 'numpy.int64'>

    2020-01-16 00:00:00
        TimeStamps    (2020-01-15 23:45:00.001000 <-> 2020-01-16 00:00:00.000000 <-> 2020-01-16 00:15:00.000000)
        TIV triple    (2274354625681316977        <->    2274355195838209137     <->        2274355211944336497)
        TIV cover     (                                  2274355195838209137                                   )
        TIV cover hex (                                   0x1f90220025001c71                                   )
        TIV cover bin (          0b0001111110010000001000100000000000100101000000000001110001110001            )

Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4
    t_val               = 0.0
    t_units             = minutes since 2020-01-16 00:30:00
    t_begin_date        = 20200116
    t_begin_time        = 3000
    t_begin_time_str    = 3000
    t_time_increment    = 164000

    cr_ts_str           = 2020-01-16 00:30:00, <class 'str'>
    cr_ts               = 2020-01-16 00:30:00, <class 'datetime.datetime'>
    cr_dt               = 2020-01-16 00:30:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    lb_ts               = 2020-01-16 00:15:00.001000, <class 'datetime.datetime'>
    lb_dt               = 2020-01-16 00:15:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    ub_ts               = 2020-01-16 00:45:00, <class 'datetime.datetime'>
    ub_dt               = 2020-01-16 00:45:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>

    interval_npdt       = [datetime.datetime(2020, 1, 16, 0, 15, 0, 1000), datetime.datetime(2020, 1, 16, 0, 30), datetime.datetime(2020, 1, 16, 0, 45)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
    interval_npdt_int   = [1579133700001, 1579134600000, 1579135500000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
    interval_tiv_triple = [2274355211944352881, 2274355228050463857, 2274355244156591217], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
    interval_tiv_cover  = 2274355228050463857, <class 'numpy.int64'>

    2020-01-16 00:30:00
        TimeStamps    (2020-01-16 00:15:00.001000 <-> 2020-01-16 00:30:00.000000 <-> 2020-01-16 00:45:00.000000)
        TIV triple    (2274355211944352881        <->    2274355228050463857     <->        2274355244156591217)
        TIV cover     (                                  2274355228050463857                                   )
        TIV cover hex (                                   0x1f902207a5001c71                                   )
        TIV cover bin (          0b0001111110010000001000100000011110100101000000000001110001110001            )

Reading /Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4
    t_val               = 0.0
    t_units             = minutes since 2020-01-16 01:00:00
    t_begin_date        = 20200116
    t_begin_time        = 10000
    t_begin_time_str    = 10000
    t_time_increment    = 164000

    cr_ts_str           = 2020-01-16 01:00:00, <class 'str'>
    cr_ts               = 2020-01-16 01:00:00, <class 'datetime.datetime'>
    cr_dt               = 2020-01-16 01:00:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    lb_ts               = 2020-01-16 00:45:00.001000, <class 'datetime.datetime'>
    lb_dt               = 2020-01-16 00:45:00.001000, <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    ub_ts               = 2020-01-16 01:15:00, <class 'datetime.datetime'>
    ub_dt               = 2020-01-16 01:15:00, <class 'pandas._libs.tslibs.timestamps.Timestamp'>

    interval_npdt       = [datetime.datetime(2020, 1, 16, 0, 45, 0, 1000), datetime.datetime(2020, 1, 16, 1, 0), datetime.datetime(2020, 1, 16, 1, 15)], type(interval_npdt[0]) = <class 'numpy.datetime64'>
    interval_npdt_int   = [1579135500001, 1579136400000, 1579137300000], type(interval_npdt_int[0]) = <class 'numpy.int64'>
    interval_tiv_triple = [2274355244156607601, 2274355264557685873, 2274355280663813233], type(interval_tiv_triple[0]) = <class 'numpy.int64'>
    interval_tiv_cover  = 2274355264557685873, <class 'numpy.int64'>

    2020-01-16 01:00:00
        TimeStamps    (2020-01-16 00:45:00.001000 <-> 2020-01-16 01:00:00.000000 <-> 2020-01-16 01:15:00.000000)
        TIV triple    (2274355244156607601        <->    2274355264557685873     <->        2274355280663813233)
        TIV cover     (                                  2274355264557685873                                   )
        TIV cover hex (                                   0x1f90221025001c71                                   )
        TIV cover bin (          0b0001111110010000001000100001000000100101000000000001110001110001            )

Temporal Overlap between 2020-01-16 00:00:00 and 2020-01-16 00:30:00 = True
Temporal Overlap between 2020-01-16 00:30:00 and 2020-01-16 01:00:00 = True
Temporal Overlap between 2020-01-16 00:00:00 and 2020-01-16 01:00:00 = False

Example code

#! /usr/bin/env python -tt
# -*- coding: utf-8; mode: python -*-
r"""

imergL3
~~~~~~~
"""
# Standard Imports
from datetime import datetime, timezone, timedelta

# Third-Party Imports
import numpy
import pandas
import netCDF4

# STARE Imports
import pystare
import starepandas

# Local Imports

##
# Markup Language Specification (see Google Python Style Guide https://google.github.io/styleguide/pyguide.html)
__docformat__ = "Google en"
# ---

##
# IMEG time interval info
imerg_tstep = 1800000       # 30 minutes in milliseconds
imerg_halfstep = 9e+11      # 15 minutes in nanoseconds
imerg_toffset = 1e+6        # 1 millisecond in nanoseconds

halfstep = pandas.Timedelta(imerg_halfstep, unit ='ns')
offset = pandas.Timedelta(imerg_toffset, unit ='ns')
print(f"halfstep                = {halfstep}, {type(halfstep)}")
print(f"offset                  = {offset}, {type(offset)}")

##
# Determine the STARE time resolution
times = numpy.array([imerg_tstep], dtype=numpy.int64)
time_res_array = pystare.coarsest_resolution_finer_or_equal_ms(times)
imerg_time_res = int(time_res_array[0])
imerg_time_res_arr = numpy.array([imerg_time_res, imerg_time_res, imerg_time_res], dtype=numpy.int64)
print(f"imerg_time_res          = {imerg_time_res}, {type(imerg_time_res)}")
print(f"imerg_time_res_arr      = {imerg_time_res_arr}, {type(imerg_time_res_arr[0])}")

pod_cover_tivs = []
pod_ts_str = []
for idx, ifile in enumerate(["/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0000z.nc4",
                             "/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0030z.nc4",
                             "/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.prectot.20200116_0100z.nc4"]):
    ##
    # Grab time data
    print(f"\nReading {ifile}")
    nc_file = netCDF4.Dataset(ifile, mode='r', format='NETCDF4')

    ##
    # Pull from netCDF4 into numpy variables
    time_var = nc_file.variables['time']
    t_val = time_var[:][0]
    t_units = time_var.units
    t_begin_date = time_var.begin_date
    t_begin_time = time_var.begin_time
    t_begin_time_str = str(t_begin_time)
    len_t_begin_time_str = len(t_begin_time_str)
    t_time_increment = time_var.time_increment
    print(f"\tt_val               = {t_val}")
    print(f"\tt_units             = {t_units}")
    print(f"\tt_begin_date        = {t_begin_date}")
    print(f"\tt_begin_time        = {t_begin_time}")
    print(f"\tt_begin_time_str    = {t_begin_time_str}")
    print(f"\tt_time_increment    = {t_time_increment}\n")

    ##
    # Use time units string as the CR datetime.
    cr_ts_str = t_units[14:]
    print(f"\tcr_ts_str           = {cr_ts_str}, {type(cr_ts_str)}")

    ##
    # Form the CR datetime
    cr_ts = datetime.strptime(cr_ts_str, '%Y-%m-%d %H:%M:%S')
    print(f"\tcr_ts               = {cr_ts}, {type(cr_ts)}")

    ##
    # Make it a Pandas TimeStamp
    cr_dt = pandas.to_datetime(cr_ts, utc=False, unit='ns')
    print(f"\tcr_dt               = {cr_dt}, {type(cr_dt)}")

    ##
    # Find the half-open Lower Bound (LB)
    lb_ts = cr_ts - halfstep + offset
    print(f"\tlb_ts               = {lb_ts}, {type(lb_ts)}")
    lb_dt = pandas.to_datetime(lb_ts, utc=False, unit='ns')
    print(f"\tlb_dt               = {lb_dt}, {type(lb_dt)}")

    ##
    # Find the closed Upper Bound (UB)
    ub_ts = cr_ts + halfstep
    print(f"\tub_ts               = {ub_ts}, {type(ub_ts)}")
    ub_dt = pandas.to_datetime(ub_ts, utc=False, unit='ns')
    print(f"\tub_dt               = {ub_dt}, {type(ub_dt)}\n")

    ##
    # Create an interval
    interval = [lb_dt, cr_dt, ub_dt]

    ##
    # Convert interval to numpy datetimes
    interval_npdt = numpy.array(interval, dtype='datetime64[ms]')
    print(f"\tinterval_npdt       = {interval_npdt.tolist()}, {type(interval_npdt[0]) = }")

    ##
    # Convert to 64-bit integer representation (based on milliseconds & UTC)
    interval_npdt_int = numpy.array(interval, dtype='datetime64[ms]').astype(numpy.int64)
    print(f"\tinterval_npdt_int   = {interval_npdt_int.tolist()}, {type(interval_npdt_int[0]) = }")

    ##
    # Convert integer representation to TIV triplet
    interval_tiv_triple = pystare.from_utc_variable(interval_npdt_int, imerg_time_res_arr, imerg_time_res_arr)
    print(f"\tinterval_tiv_triple = {interval_tiv_triple.tolist()}, {type(interval_tiv_triple[0]) = }")

    ##
    # Calculate an interval TIV (cover).
    interval_tiv_cover = pystare.from_temporal_triple(interval_tiv_triple)[0]
    print(f"\tinterval_tiv_cover  = {interval_tiv_cover}, {type(interval_tiv_cover)}")

    ##
    # Summary
    print("\n")
    print(f"\t{cr_ts_str}")
    print(f"\t\tTimeStamps    ({interval[0].strftime('%Y-%m-%d %H:%M:%S.%f'):<26} <-> {interval[1].strftime('%Y-%m-%d %H:%M:%S.%f'):^26} <-> {interval[2].strftime('%Y-%m-%d %H:%M:%S.%f'):>26})")
    print(f"\t\tTIV triple    ({interval_tiv_triple[0]:<26d} <-> {interval_tiv_triple[1]:^26d} <-> {interval_tiv_triple[2]:>26d})")
    print(f"\t\tTIV cover     ({'':<26s}     {interval_tiv_cover:^26d}     {'':>26s})")
    print(f"\t\tTIV cover hex ({'':<26s}     {hex(interval_tiv_cover):^26}     {'':>26s})")
    print(f"\t\tTIV cover bin ({'':<10s}0b{numpy.binary_repr(interval_tiv_cover, width=64):64s}{'':>12s})")

    pod_ts_str.append(cr_ts_str)
    pod_cover_tivs.append(interval_tiv_cover)

##
# Compare if overlapping
overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[0]]), numpy.array([pod_cover_tivs[1]]))
print(f"\nTemporal Overlap between {pod_ts_str[0]} and {pod_ts_str[1]} = {bool(overlap[0])}")

overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[1]]), numpy.array([pod_cover_tivs[2]]))
print(f"Temporal Overlap between {pod_ts_str[1]} and {pod_ts_str[2]} = {bool(overlap[0])}")

overlap = pystare.temporal_overlap(numpy.array([pod_cover_tivs[0]]), numpy.array([pod_cover_tivs[2]]))
print(f"Temporal Overlap between {pod_ts_str[0]} and {pod_ts_str[2]} = {bool(overlap[0])}")

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions