Description
Hello phyr team,
I am interested in replicating the analysis carried out by Vincze et al. 2022 (https://doi.org/10.1038/s41586-021-04224-5) with other explanatory variables, using the phyr library, since this package might allow a better analysis. For my question, I used the data frame that these authors make available (https://github.com/OrsolyaVincze/VinczeEtal2021Nature/blob/main/SupplementaryData.xls).
Vincze et al. (2022) studied a simple measure of cancer mortality risk (CMR) in mammals in relation to variables such as body size and lifespan. Additionally, they had to incorporate phylogenetic information because the species analyzed are not independent of each other.
The data frame contains:
Response variable (CMR) = the ratio between the number of cancer-related deaths (Neoplasia) and the total number of individuals whose postmortem pathological records were entered in the database (knownDeaths).
As a first approach, we thought of the following model:
pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp), data = data, family = "binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE)
However, when trying to fit this model, we ran into some problems.
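To make the response definition concrete, here is a minimal sketch in base R. The three rows are hypothetical toy values, not taken from the supplementary data; only the column names Neoplasia and knownDeaths come from the description above.

```r
# Hypothetical toy rows (illustration only, not the real data)
toy <- data.frame(
  Species     = c("sp_a", "sp_b", "sp_c"),
  Neoplasia   = c(0, 3, 10),    # cancer-related deaths
  knownDeaths = c(25, 60, 40)   # individuals with postmortem records
)

# CMR as a proportion
toy$CMR <- toy$Neoplasia / toy$knownDeaths

# Two-column (successes, failures) response, as used with cbind() in a binomial formula
y <- cbind(successes = toy$Neoplasia,
           failures  = toy$knownDeaths - toy$Neoplasia)

# Each row of y sums back to the number of trials
stopifnot(all(rowSums(y) == toy$knownDeaths))
```

The two-column form keeps the number of trials per species, which the bare proportion loses.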
First of all, I would like to confirm a suspicion: does the phyr library not allow models to be fitted without any random effect? I have not found any library that lets me run a binomial model for proportions that accounts for the phylogenetic information without including a random effect. I ask because a binomial model for proportions may not be overdispersed, in which case I would not need a mixed model for these data.
However, binomial models are known to be prone to overdispersion, so we decided to add an observation-level random effect (OLRE) to account for it. With this term the library runs without complaint, using the following model:
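For readers unfamiliar with the term, an OLRE is simply a factor with one level per row. The sketch below builds one on simulated data and computes a rough Pearson dispersion statistic from a plain `glm`; the simulated values and the names `toy` and `logMass` are assumptions for illustration, and this check ignores phylogeny entirely.

```r
set.seed(1)
n <- 50

# Simulated stand-in data (hypothetical, not the real data set)
toy <- data.frame(
  knownDeaths = sample(20:200, n, replace = TRUE),
  logMass     = rnorm(n)
)
toy$Neoplasia <- rbinom(n, toy$knownDeaths, plogis(-3 + 0.3 * toy$logMass))

# Observation-level random effect: one factor level per row
toy$OLRE <- factor(seq_len(nrow(toy)))

# Plain binomial GLM (no random effects, no phylogeny) for a quick dispersion check
fit <- glm(cbind(Neoplasia, knownDeaths - Neoplasia) ~ logMass,
           data = toy, family = binomial)

# Pearson chi-square divided by residual df; values well above 1 suggest overdispersion
disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(disp)
```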
M1=pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp)+ (1|OLRE), data = data, family = "binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE)
However, this model fits poorly and shows overdispersion caused by an excess of zeros.
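One simple way to see what "an excess of zeros" means is to compare the observed number of zero counts with the number expected under the fitted binomial probabilities. The sketch below does this on simulated data with an intercept-only `glm`; the data and names are hypothetical, and this is not how DHARMa computes its test (DHARMa works from simulations of the fitted model).

```r
set.seed(42)
n <- 40

# Simulated stand-in data (hypothetical)
N <- sample(10:100, n, replace = TRUE)   # trials per species
y <- rbinom(n, N, 0.03)                  # cancer deaths per species

# Intercept-only binomial fit
fit   <- glm(cbind(y, N - y) ~ 1, family = binomial)
p_hat <- fitted(fit)                     # fitted probability for each row

# Expected zeros: sum over rows of P(Y = 0 | N, p_hat)
expected_zeros <- sum(dbinom(0, size = N, prob = p_hat))
observed_zeros <- sum(y == 0)

print(c(observed = observed_zeros, expected = expected_zeros))
```

If the observed count is much larger than the expected one, a zero-inflated likelihood may be worth considering.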



So, as a next step, we decided to try a zero-inflated model, which this library appears to support, using the following model:
M2=pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp)+ (1|OLRE), data = data, family = "zeroinflated.binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE,bayes=TRUE,verbose=TRUE), but I could not get it to run.

I considered entering the response variable as the proportion (CMR), but according to INLA the response has to be an integer, and I have no way to define the weights or the Ntrials for that proportion.
I admit that I am having trouble with the syntax of the response variable in my model.
Could someone help me with the syntax to be able to run the model I need?
Thank you very much for your time and help.
Nicolas, thanks, thanks, thankssssssss........
Attached are the database and the script used.
```r
library(ggplot2)
library(ape)
library(car)
library(phytools)
library(phylolm) # phyloglm
library(phyr)
library(DHARMa)
library(dplyr)

str(data)
data$OLRE <- as.factor(data$OLRE)
data$Species <- as.factor(data$Species)
data$order <- as.factor(data$order)

# Species-specific body mass
data$BodyMass <- (data$MaleMeanMass + data$FemaleMeanMass) / 2

# Phylogeny
phy <- read.nexus("consensus_phylogeny.tre") # consensus vertlife tree
phy <- bind.tip(phy, "Cervus_canadensis", where = which(phy$tip.label == "Cervus_elaphus"),
                edge.length = 0.5, position = 0.5)
phy <- bind.tip(phy, "Gazella_marica", where = which(phy$tip.label == "Gazella_subgutturosa"),
                edge.length = 0.5, position = 0.5)

# First model
M0 <- pglmm(cbind(Neoplasia, Nodeadbycancer) ~ log(BodyMass), data = data, family = "binomial", cov_ranef = list(sp = phy))
# Error: "No random terms specified, use lm or glm instead".
# Is it possible to run a model without random effects?

M1 <- pglmm(cbind(Neoplasia, knownDeaths - Neoplasia) ~ log(BodyMass) + log(lifeexp) + (1 | OLRE), data = data, family = "binomial", cov_ranef = list(sp = phy), add.obs.re = FALSE)
summary(M1)
simulationOut <- simulateResiduals(fittedModel = M1, n = 250, refit = FALSE)
plot(simulationOut) # bad fit
testZeroInflation(simulationOut) # overdispersion due to zeros
testOverdispersion(simulationOut) # dispersion apart from the zeros looks fine

# Family string fixed: the original had a stray leading space (" zeroinflated.binomial")
M2 <- pglmm(cbind(Neoplasia, knownDeaths - Neoplasia) ~ log(BodyMass) + log(lifeexp) + (1 | OLRE), data = data, family = "zeroinflated.binomial", cov_ranef = list(sp = phy), add.obs.re = FALSE, bayes = TRUE)
```
dataIZ.txt