Description
As you know, Stata stores value-labeled data as a vector of integers or doubles (whose values are not necessarily a consecutive sequence starting at 1), together with a `Dict` mapping `Int => String`.
Accessing the string values, which are usually what we care about most, is hard with ReadStat. You have to:
1. Use ReadStat, not StatFiles, to access the internal fields of the Stata file.
2. Construct the `DataFrame` from the `data` and `header` fields.
3. Use the `value_label_dict` field to perform the replacement.
4. Use `get` on the `DataValue` elements of the array.
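The replacement in the last two steps can be sketched in plain Julia as below. This assumes the codes and their label dictionary have already been pulled out of the ReadStat result; `codes` and `labels` are made-up stand-ins, and `missing` is used in place of DataValues' `DataValue` wrapper, so no ReadStat or DataValues calls appear:

```julia
# Hypothetical stand-ins for a value-labeled column and its label dictionary.
# Code 3 deliberately has no label.
codes = Union{Missing, Int}[1, 2, missing, 3]
labels = Dict(1 => "Yes", 2 => "No")

# Map each code to its label, keep `missing` as-is, and fall back to the
# code itself (as a string) when no label is defined for that value.
label_or_code(c) = ismissing(c) ? missing : get(labels, c, string(c))

strings = map(label_or_code, codes)  # Vector{Union{Missing, String}}
```

Note that the integers are gone after this step, which is exactly the loss described below.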
This is not very user-friendly.
There isn't a great solution for this in Julia, as we don't have a `CategoricalArray` equivalent where the underlying dict maps arbitrary types to strings. Converting to a `CategoricalArray` therefore drops the underlying integers, which are useful to keep for interoperability.
haven in R recently changed how this is handled by introducing the `<dbl+lbl>` vector type, though working with it is a bit of a pain (see here).
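One possible shape for such a type, loosely following haven's idea of keeping the values and attaching labels, is sketched below. This is a minimal illustration only: `LabeledVector`, `label`, and `labels_of` are hypothetical names, not an API of ReadStat.jl, StatFiles.jl, or CategoricalArrays.jl, and missing values are not handled:

```julia
# Hypothetical labeled vector: keeps the raw codes and a value => label map.
struct LabeledVector{T} <: AbstractVector{T}
    codes::Vector{T}         # underlying values, kept for interoperability
    labels::Dict{T, String}  # value => label, like a Stata value-label set
end

# Behave like a plain vector of the underlying codes.
Base.IndexStyle(::Type{<:LabeledVector}) = IndexLinear()
Base.size(v::LabeledVector) = size(v.codes)
Base.getindex(v::LabeledVector, i::Int) = v.codes[i]

# Label for element i, falling back to the code itself when unlabeled.
label(v::LabeledVector, i::Int) = get(v.labels, v.codes[i], string(v.codes[i]))

# All labels as strings, for when that is what you need.
labels_of(v::LabeledVector) = [label(v, i) for i in eachindex(v.codes)]

# Usage: code 9 has no label.
v = LabeledVector([1, 2, 9], Dict(1 => "Male", 2 => "Female"))
```

Because `getindex` returns the stored code, numeric operations such as `sum(v)` still see the original integers, while `labels_of(v)` returns `["Male", "Female", "9"]`.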
I can email a dataset with an MWE to anyone who wants more information.