
calculate_dataframe has problems when combinations of person_id and other id types are used #398

@baogorek

Description


Consider pulling a micro_df of all children under age 4 from the CPS:

from policyengine_us import Microsimulation
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/cps_2023.h5')
df_person = sim.calculate_dataframe(['person_id', 'age'])
df_person[df_person['age'] < 4]

which yields:

In [4]: df_person[df_person['age'] < 4]
Out[4]: 
            weight  person_id  age
19     9418.161133       9704  2.0
36     4709.080566      12603  0.0
...
50835  4709.080566    8937804  2.0
50847  4709.080566    8940004  2.0

[2084 rows x 3 columns]
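A person-level result should have exactly one row per person. A small sanity check like the hypothetical helper below can flag a result that has silently moved to another entity level. It uses plain pandas; `is_person_level` and the toy frames are illustrative and not part of PolicyEngine. With a real simulation, `n_persons` could come from `len(sim.calculate("person_id"))`, assuming that call returns one value per person.

```python
import pandas as pd


def is_person_level(df: pd.DataFrame, n_persons: int) -> bool:
    """Return True if df has exactly one row per person.

    A person-level frame should have n_persons rows and no repeated person_id.
    """
    return len(df) == n_persons and df["person_id"].is_unique


# Hypothetical toy frames: four people in two households.
people = pd.DataFrame({"person_id": [101, 102, 201, 202], "age": [60, 64, 35, 2]})
per_household = pd.DataFrame(
    {"household_id": [1, 2], "person_id": [203.0, 403.0], "age": [124.0, 37.0]}
)

print(is_person_level(people, n_persons=4))         # True
print(is_person_level(per_household, n_persons=4))  # False
```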

Now, add household_id:

df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'])
df_hh[df_hh['age'] < 4]

The result is an empty DataFrame. To get back to about 90% of the rows (18,896 of 20,655), you have to raise the age threshold to 160:

In [10]: df_hh[df_hh['age'] < 160]
Out[10]: 
            weight  household_id   person_id    age
0      4709.080566            12      2403.0  124.0
1      4709.080566            21      6306.0  137.0
...
20653  4709.080566         89444  17888803.0   50.0
20654  4709.080566         89467  17893403.0   83.0

[18896 rows x 4 columns]

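These ages are sums over each household's members: adding household_id returns one row per household (20,655 rows) instead of one per person, and person_id now comes back as a float that also looks like a per-household sum. If you try to get around this by mapping to person, you hit an error: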

df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'], map_to="person")

leads to:

ValueError: Length of weights (20655) does not match length of DataFrame (50863).
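The numbers are consistent with the summing explanation: 20,655 matches the household row count in the previous output, while 50,863 appears to be the number of people. The pattern can be reproduced with plain pandas on hypothetical toy data. This is only an illustration of the symptom, not PolicyEngine's internal code: summing person values within households removes every row with age < 4, and household weights have one entry per household rather than per person, so they must be broadcast to each member before they can weight a person-level frame.

```python
import pandas as pd

# Hypothetical toy data: four people in two households.
persons = pd.DataFrame({
    "household_id": [1, 1, 2, 2],
    "person_id": [101, 102, 201, 202],
    "age": [60, 64, 35, 2],
})
# One weight per household, indexed by household_id.
household_weight = pd.Series([4709.0, 9418.0], index=[1, 2], name="weight")

# Summing within households mimics the household-level output above:
# person_id and age become per-household totals, so the 2-year-old disappears.
collapsed = persons.groupby("household_id", as_index=False)[["person_id", "age"]].sum()
print(collapsed)
print(collapsed[collapsed["age"] < 4].empty)  # True

# Household weights have 2 entries but there are 4 person rows:
# the same kind of mismatch as in the ValueError above.
print(len(household_weight), len(persons))  # 2 4

# Broadcasting each household's weight onto its members gives one weight per row.
persons["weight"] = persons["household_id"].map(household_weight)
print(persons["weight"].tolist())  # [4709.0, 4709.0, 9418.0, 9418.0]
```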

Workaround

Call sim.calculate for each variable with map_to="person" and assemble the DataFrame yourself:

import pandas as pd
df = pd.DataFrame({
    "household_id": sim.calculate("household_id", map_to="person"),
    "tax_unit_id": sim.calculate("tax_unit_id", map_to="person"),
    "person_id": sim.calculate("person_id", map_to="person"),
    "age": sim.calculate("age", map_to="person")
})
df[df['age'] < 4]
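The workaround generalizes to any list of variables. The sketch below is a hypothetical helper, `person_frame`, that takes anything behaving like `sim.calculate(name, map_to=...)` and checks that every column has the same length before building the frame. `np.asarray` keeps only the values, so any weights attached to the results are not carried along. A stand-in `fake_calculate` lets it run without PolicyEngine.

```python
from typing import Callable, Iterable

import numpy as np
import pandas as pd


def person_frame(calculate: Callable, variables: Iterable[str]) -> pd.DataFrame:
    """Build a person-level DataFrame by mapping every variable to "person".

    `calculate` is assumed to behave like sim.calculate(name, map_to=...).
    """
    columns = {name: np.asarray(calculate(name, map_to="person")) for name in variables}
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    return pd.DataFrame(columns)


# Usage with PolicyEngine (assumes `sim` from above):
# df = person_frame(sim.calculate, ["household_id", "tax_unit_id", "person_id", "age"])
# df[df["age"] < 4]


# Hypothetical stand-in for sim.calculate so the sketch runs on its own.
def fake_calculate(name, map_to=None):
    data = {
        "household_id": [1, 1, 2, 2],
        "person_id": [101, 102, 201, 202],
        "age": [60, 64, 35, 2],
    }
    return data[name]


df = person_frame(fake_calculate, ["household_id", "person_id", "age"])
print(df[df["age"] < 4])
```

Routing every column through the same `map_to="person"` call keeps all columns at one row per person, and the length check reports a mismatch by column name instead of leaving it to a later pandas error.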
