Description
Consider pulling a micro_df of all children under the age of 4 in the CPS:
from policyengine_us import Microsimulation
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/cps_2023.h5')
df_person = sim.calculate_dataframe(['person_id', 'age'])
df_person[df_person['age'] < 4]
which yields:
In [4]: df_person[df_person['age'] < 4]
Out[4]:
weight person_id age
19 9418.161133 9704 2.0
36 4709.080566 12603 0.0
...
50835 4709.080566 8937804 2.0
50847 4709.080566 8940004 2.0
[2084 rows x 3 columns]
Now, add household_id:
df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'])
df_hh[df_hh['age'] < 4]
The result is an empty data frame. To capture about 90% of the rows (18,896 of 20,655), the age cutoff has to be raised to 160:
In [10]: df_hh[df_hh['age'] < 160]
Out[10]:
weight household_id person_id age
0 4709.080566 12 2403.0 124.0
1 4709.080566 21 6306.0 137.0
...
20653 4709.080566 89444 17888803.0 50.0
20654 4709.080566 89467 17893403.0 83.0
[18896 rows x 4 columns]
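These ages are household sums: with household_id in the variable list, the frame comes back with one row per household, and age (and, judging by values such as 2403.0, person_id too) is summed over the household's members. Passing map_to="person" to work around this raises an error: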
df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'], map_to="person")
leads to:
ValueError: Length of weights (20655) does not match length of DataFrame (50863).
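The summing behaviour can be reproduced with plain pandas, independent of policyengine_us. This is only a sketch of what the output above suggests is happening, not the library's actual code path. The rows are made-up toy values (not CPS records), and the person IDs are hypothetical, chosen so that their per-household sums come out as 2403 and 6306.

```python
import pandas as pd

# Toy person-level data (made-up values, not CPS records).
persons = pd.DataFrame({
    "household_id": [12, 12, 21, 21, 21],
    "person_id": [1201, 1202, 2101, 2102, 2103],
    "age": [3.0, 35.0, 2.0, 40.0, 41.0],
})

# Filtering at person level finds the two young children.
children = persons[persons["age"] < 4]

# Summing person-level columns per household mimics the observed result:
# one row per household, with ages and person IDs added together.
by_household = persons.groupby("household_id", as_index=False)[
    ["person_id", "age"]
].sum()

# No household-level "age" is below 4, so the same filter returns nothing.
children_by_household = by_household[by_household["age"] < 4]

print(by_household)
print(len(children), len(children_by_household))
```

The aggregated frame also has fewer rows than the person-level one (2 versus 5 here), which is the same kind of length mismatch the ValueError reports between 20655 and 50863.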
Workaround
Call calculate separately for each variable with map_to="person" and assemble the data frame by hand:
import pandas as pd
df = pd.DataFrame({
"household_id": sim.calculate("household_id", map_to="person"),
"tax_unit_id": sim.calculate("tax_unit_id", map_to="person"),
"person_id": sim.calculate("person_id", map_to="person"),
"age": sim.calculate("age", map_to="person")
})
df[df['age'] < 4]
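If this workaround is needed in several places, it could be wrapped in a small helper. The sketch below assumes only that sim exposes calculate(name, map_to=...) as used above. The names person_frame and StubSim are hypothetical: StubSim is a stand-in with toy data so the example runs without downloading the dataset, and it is not part of policyengine_us.

```python
import pandas as pd


def person_frame(sim, variables):
    """Build a person-level DataFrame with one calculate call per variable."""
    columns = {name: sim.calculate(name, map_to="person") for name in variables}
    df = pd.DataFrame(columns)
    # Each person should appear exactly once if every column is person-level.
    if "person_id" in df.columns and df["person_id"].duplicated().any():
        raise ValueError("person_id is not unique; frame is not person-level")
    return df


class StubSim:
    """Hypothetical stand-in for Microsimulation, for illustration only."""

    _data = {
        "household_id": [12, 12, 21],
        "person_id": [1201, 1202, 2101],
        "age": [3.0, 35.0, 2.0],
    }

    def calculate(self, name, map_to=None):
        # Toy values are already person-level, so map_to is ignored here.
        return pd.Series(self._data[name])


df = person_frame(StubSim(), ["household_id", "person_id", "age"])
young = df[df["age"] < 4]
print(young)
```

With a real Microsimulation, the call would be person_frame(sim, ["household_id", "tax_unit_id", "person_id", "age"]). The uniqueness check is a cheap guard against getting an aggregated frame back without noticing.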