decode_psc duplicates data #18

@JamesMcClung

pscpy/src/pscpy/psc.py

Lines 116 to 121 in 8185aba

data_vars = {}
for var_name in ds:
    if var_name in field_to_component:
        for field, component in field_to_component[var_name].items():  # type: ignore[index]
            data_vars[field] = ds[var_name].isel({f"comp_{var_name}": component})
ds = ds.assign(data_vars)

The code above creates 9 new variables (jx_ec, etc.) from the original jeh variable, but retains jeh. Unfortunately, the new variables aren't views; they are copies. To see this, note the "Size" field printed by the following demo:

import pscpy
import xarray

ds = xarray.load_dataset("path/to/pfd.000000000.bp")

print("Before decode:")
print(ds)

ds = pscpy.decode_psc(ds, ["e", "i"]) # or whatever species names there are

print("After decode:")
print(ds)

When I run this on my data, the first print gives "Size: 18kB", and the second gives "Size: 37kB". When ds = ds.assign(data_vars) is removed, both have a size of 18 kB, but of course, then the latter dataset is missing the decoded variables. I think dropping the original jeh would fix this issue, but if jeh is required for some other reason, then this might be tricky to fix.
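
For reference, here is a minimal sketch of the "drop the original" idea. It is not the actual pscpy implementation: `decode_and_drop` is a hypothetical helper, the dataset is synthetic (a single `jeh` variable with 9 components on a tiny grid), and every field name except `jx_ec` is a placeholder. It only compares `Dataset.nbytes` totals, which I believe is what the "Size" line in the repr reflects.

```python
import numpy as np
import xarray as xr


def decode_and_drop(ds, field_to_component):
    """Hypothetical helper: split packed variables into per-field
    variables, then drop the packed originals."""
    data_vars = {}
    decoded_sources = []
    for var_name in ds:
        if var_name in field_to_component:
            for field, component in field_to_component[var_name].items():
                data_vars[field] = ds[var_name].isel({f"comp_{var_name}": component})
            decoded_sources.append(var_name)
    # Dropping the packed originals means each field is counted only once.
    return ds.assign(data_vars).drop_vars(decoded_sources)


# Synthetic stand-in for a pfd file (assumption: real data has the same layout,
# with a leading comp_<name> dimension).
packed = xr.Dataset({"jeh": (("comp_jeh", "z", "y", "x"), np.zeros((9, 1, 4, 4)))})

# Placeholder mapping: only "jx_ec" comes from the issue; the rest are made up.
names = ["jx_ec"] + [f"field_{i}" for i in range(1, 9)]
mapping = {"jeh": {name: i for i, name in enumerate(names)}}

decoded = decode_and_drop(packed, mapping)
print("packed nbytes: ", packed.nbytes)
print("decoded nbytes:", decoded.nbytes)
```

With this approach the decoded dataset should report the same total as the packed one, at the cost of `jeh` no longer being available to downstream code.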
