how to do many-to-many operations on STAREPandas DFs #151

@mbauer288

The starting point is a pair of STAREPandas DFs restricted to the same timestamp (i.e., contemporaneous via temporal intersection).

From DFs with all timestamps:

import numpy as np

imerg_ts = np.sort(imerg_sdf['timestamp'].unique())
mcms_ts = np.sort(mcms_sdf['timestamp'].unique())
# Merged, unique datetimes (a union, so a given time may be absent from one DF)
merged_ts = np.union1d(imerg_ts, mcms_ts)

##
# Sort by TimeStamp
imerg_sdf_by_ts = imerg_sdf.sort_values(by=["timestamp"])
mcms_sdf_by_ts = mcms_sdf.sort_values(by=["timestamp"])

for aidx, a_time in enumerate(merged_ts):
   ##
   # MCMS subset with just this DTime.
   # reset_index() returns a new DF; inplace=True on a filtered slice can trigger SettingWithCopyWarning.
   mcms_sdf_now = mcms_sdf_by_ts[mcms_sdf_by_ts.timestamp == a_time].reset_index(drop=True)

   ##
   # IMERG subset with just this DTime.
   imerg_sdf_now = imerg_sdf_by_ts[imerg_sdf_by_ts.timestamp == a_time].reset_index(drop=True)

This gives something like the following for each a_time:

 imerg_sdf_now   
        label timestamp   itivs                x y cell_areas tot_area precips tot_precip  sids cover trixels
     0  87    2021-01-10  2275465702582262897  ...                                                        ...
     1  91    2021-01-10  2275465702582262897  ...                                                        ...

 mcms_sdf_now  
        usi                   uci                   timestamp  tivs30               lon lat cslp ctype cinten tinten depth sarea  sa_fill vert_poly_geo verts sids cover trixels
     0  20210109150539835085  20210110000540035625 2021-01-10  2275465702582262897  ...                                                                                      ...
     1  20210109030280028312  20210110000255028312 2021-01-10  2275465702582262897  ...                                                                                      ...
     2  20210109150515029437  20210110000500029687 2021-01-10  2275465702582262897  ...                                                                                      ...
     3  20210109150530500800  20210110000525001062 2021-01-10  2275465702582262897  ...                                                                                      ...
     4  20210108030470001937  20210110000500002687 2021-01-10  2275465702582262897  ...                                                                                      ...
     5  20210107180425030375  20210110000370031000 2021-01-10  2275465702582262897  ...                                                                                      ...
     6  20210106150557234913  20210110000495000125 2021-01-10  2275465702582262897  ...                                                                                      ...
     7  20210109210595025312  20210110000605025437 2021-01-10  2275465702582262897  ...                                                                                      ...
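
A possible alternative for building the per-timestamp subsets, sketched on toy data with plain pandas DataFrames: group each DF by "timestamp" once and keep only the timestamps present in both. The values below are hypothetical stand-ins, and whether groupby() preserves the STAREPandas DF subclass is an assumption that has not been checked here.

```python
import pandas as pd

# Toy stand-ins for imerg_sdf and mcms_sdf (hypothetical values;
# only the 'timestamp' column matters for this sketch).
imerg_sdf = pd.DataFrame({
    "label": [87, 91, 12],
    "timestamp": pd.to_datetime(["2021-01-10", "2021-01-10", "2021-01-11"]),
})
mcms_sdf = pd.DataFrame({
    "uci": ["uci-a", "uci-b", "uci-c"],
    "timestamp": pd.to_datetime(["2021-01-10", "2021-01-10", "2021-01-12"]),
})

# One sub-frame per timestamp, built in a single pass per DF.
imerg_by_ts = {ts: g.reset_index(drop=True) for ts, g in imerg_sdf.groupby("timestamp")}
mcms_by_ts = {ts: g.reset_index(drop=True) for ts, g in mcms_sdf.groupby("timestamp")}

# Temporal intersection: timestamps that occur in both DFs.
common_ts = sorted(imerg_by_ts.keys() & mcms_by_ts.keys())

for a_time in common_ts:
    imerg_sdf_now = imerg_by_ts[a_time]
    mcms_sdf_now = mcms_by_ts[a_time]
    print(a_time, len(imerg_sdf_now), len(mcms_sdf_now))
```

Using the key intersection rather than the union means neither subset is ever empty inside the loop.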

The problem

Property differences:

  • ETC centers are spatially contiguous but possibly nested (center-B may be wholly enclosed within center-A).
  • IMERG features are not always spatially contiguous (i.e., a feature may consist of disjoint pieces), but are never nested or overlapping.

The spatial relationship between contemporaneous ETC centers and IMERG features is thus Many-to-Many:

  • A relationship between sets (dataframes) with two properties:
    1. Members of one set (dataframe row) can potentially link to any member (row) of the other set.
      a. Each ETC center needs to be checked against each IMERG feature.
    2. A member of one set (row) can potentially link to no, one or multiple members (rows) of the other set.
      a. An ETC center may intersect with no, one or many IMERG features, and likewise, an IMERG feature may intersect with no, one or many ETC centers.
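
The two properties above can be illustrated with a toy sketch. The footprints here are plain Python sets of made-up cell ids, not real STARE SIDs, and set overlap stands in for spatial intersection.

```python
from itertools import product

# Hypothetical footprints as sets of cell ids (not real STARE SIDs).
# uci-b is nested inside uci-a; uci-c touches nothing.
etc_centers = {"uci-a": {1, 2, 3, 4}, "uci-b": {2, 3}, "uci-c": {20}}
imerg_features = {87: {3, 4, 5}, 91: {9, 10}}

# Property 1: every ETC center is checked against every IMERG feature.
links = [
    (uci, label)
    for (uci, etc_cells), (label, imerg_cells) in product(
        etc_centers.items(), imerg_features.items()
    )
    if etc_cells & imerg_cells  # non-empty overlap means the pair is linked
]

# Property 2: each member can end up with no, one or many links.
etc_counts = {uci: sum(1 for u, _ in links if u == uci) for uci in etc_centers}
imerg_counts = {lab: sum(1 for _, l in links if l == lab) for lab in imerg_features}

print(links)
print(etc_counts)
print(imerg_counts)
```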

Example solution using placeholder data.

import pandas

mcms_data = {'uci': ["uci-a", "uci-b"], 'vert_poly_geo': ["poly-a", "poly-b"], 'sids': ["sids-a", "sids-b"], 'cover': ["cover-a", "cover-b"], 'trixels': ["trixels-a", "trixels-b"]}
mcms_sdf_now = pandas.DataFrame.from_dict(mcms_data)

imerg_data = {"label": [87, 91], "sids": ["sids-87", "sids-91"], "cover": ["cover-87", "cover-91"], "trixels": ["trixels-87", "trixels-91"]}
imerg_sdf_now = pandas.DataFrame.from_dict(imerg_data)

# Merge so info about each IMERG feature is available for each ETC center (uci).
# how='cross' (pandas >= 1.2) pairs every MCMS row with every IMERG row.
combined = mcms_sdf_now.merge(imerg_sdf_now, how='cross', suffixes=('_mcms', '_imerg'))

 mcms_sdf_now
     uci vert_poly_geo    sids    cover    trixels
0  uci-a        poly-a  sids-a  cover-a  trixels-a
1  uci-b        poly-b  sids-b  cover-b  trixels-b

 imerg_sdf_now
   label     sids     cover     trixels
0     87  sids-87  cover-87  trixels-87
1     91  sids-91  cover-91  trixels-91

 combined
     uci vert_poly_geo sids_mcms cover_mcms trixels_mcms  label sids_imerg cover_imerg trixels_imerg
0  uci-a        poly-a    sids-a    cover-a    trixels-a     87    sids-87    cover-87    trixels-87
1  uci-a        poly-a    sids-a    cover-a    trixels-a     91    sids-91    cover-91    trixels-91
2  uci-b        poly-b    sids-b    cover-b    trixels-b     87    sids-87    cover-87    trixels-87
3  uci-b        poly-b    sids-b    cover-b    trixels-b     91    sids-91    cover-91    trixels-91

Now I can check the combined DF for spatial intersection between the columns "sids_mcms" and "sids_imerg" for each row, storing the intersecting SIDs (if any) in a new column "sids_st". I guess I could also make a "cover_st" and "trixels_st" column based on "sids_st" as well.

Then I can plot the ETC trixels, the full IMERG trixels or the space-time intersecting (st) IMERG trixels as required.

I know how to brute force this using loops and starepandas [stare_intersection(), to_trixels() and stare_dissolve()], but is there a simple set of DF operations to do this last part?
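
One possible shape for that last step, as a minimal sketch rather than a confirmed STAREPandas recipe: apply an intersection function pairwise to the "sids_mcms" and "sids_imerg" columns of the cross-merged DF and store the result in "sids_st". The helper intersect_sids() is hypothetical and only finds exact matches via np.intersect1d; real SIDs encode a resolution level, so a STARE-aware intersection would have to be swapped in. The integer arrays are made-up placeholders.

```python
import numpy as np
import pandas as pd

def intersect_sids(sids_a, sids_b):
    """Hypothetical placeholder: exact-match intersection only.

    Assumption: real STARE index values carry a resolution level, so a true
    spatial intersection needs a STARE-aware routine in place of this one.
    """
    return np.intersect1d(sids_a, sids_b)

# Toy stand-ins with small integer arrays in place of real SIDs.
mcms_sdf_now = pd.DataFrame({
    "uci": ["uci-a", "uci-b"],
    "sids": [np.array([1, 2, 3]), np.array([7, 8])],
})
imerg_sdf_now = pd.DataFrame({
    "label": [87, 91],
    "sids": [np.array([2, 3, 4]), np.array([8, 9])],
})

# Every ETC center paired with every IMERG feature.
combined = mcms_sdf_now.merge(imerg_sdf_now, how="cross", suffixes=("_mcms", "_imerg"))

# Pairwise intersection of the two SID columns, one result per row.
combined["sids_st"] = [
    intersect_sids(a, b)
    for a, b in zip(combined["sids_mcms"], combined["sids_imerg"])
]

# Keep only the (ETC center, IMERG feature) pairs that actually intersect.
combined["n_st"] = combined["sids_st"].map(len)
linked = combined[combined["n_st"] > 0].reset_index(drop=True)
print(linked[["uci", "label", "sids_st"]])
```

The zip-based comprehension still visits every row, but it avoids explicit index bookkeeping; "cover_st" and "trixels_st" columns could be derived from "sids_st" in the same row-wise way.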
