I believe I have a workaround for this, so this is more of a heads-up than something that needs solving right now.
The documentation says that the phenotype can be provided as:
- a list of strings,
- integer binary data,
- numeric continuous data,
- pandas Series, DataFrame or numpy array
I'm using linear regression, and I've got 11 different cancer diagnoses in my dataset. I'm taking the phenotype data from a metadata DataFrame. If I pass it in as a Series, it fails with "Could not understand your pheno_data." In the following, phenotype is a pandas Series containing strings.
methylize.diff_meth_pos(df, phenotype)
ValueError Traceback (most recent call last)
/data/projects/classifiers/src/exploration/methylation/differentialMethylation.ipynb Cell 17 in ()
----> 1 methylize.diff_meth_pos(meth_data, phenotype)
File /data/projects/classifiers/bin/envs/classifiers/lib/python3.10/site-packages/methylize/diff_meth_pos.py:210, in diff_meth_pos(meth_data, pheno_data, regression_method, impute, **kwargs)
208 regression_method = 'linear'
209 else:
--> 210 raise ValueError("Could not understand your pheno_data.")
211 else:
212 raise ValueError(f"pheno_data must be list-like, or if a DataFrame, specify the 'column' to use.")
ValueError: Could not understand your pheno_data.
It won't accept a pandas Series.
It won't accept a list of strings if I convert the series to a list.
It will accept it and run if I map the strings to integers, i.e.:
unique_strings = phenotype.unique()
string_to_int_map = {string: i for i, string in enumerate(unique_strings)}
phenotype = [string_to_int_map[string] for string in phenotype]
results = methylize.diff_meth_pos(df, phenotype)
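The same string-to-integer mapping can probably be written more compactly with `pandas.factorize`, which also returns the unique labels so the integers can be related back to diagnoses afterwards. This is only a sketch of the encoding step: the diagnosis labels below are made-up placeholders, `int_to_string_map` and `phenotype_codes` are illustrative names, and nothing here calls methylize itself.

```python
import pandas as pd

# Hypothetical stand-in for the real phenotype Series of diagnosis strings.
phenotype = pd.Series(["ALL", "AML", "ALL", "NBL", "AML"])

# factorize assigns integer codes in order of first appearance
# and returns the unique labels alongside them.
codes, labels = pd.factorize(phenotype)

# Reverse lookup, so integer codes can be mapped back to diagnosis strings.
int_to_string_map = dict(enumerate(labels))

# Plain Python list of ints, the form that was accepted in the workaround above.
phenotype_codes = codes.tolist()
```

`phenotype_codes` would then take the place of the hand-built integer list when calling `diff_meth_pos`.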
My understanding from the documentation was that methylize would internally map strings to integers, but that doesn't appear to be working, if I've understood it correctly.
Cheers
Ben.