Problem with pglmm Bayes=TRUE model syntax #73

@Flaiba

Hello phyr team,
I am interested in replicating the analysis carried out by Vincze et al. 2022 (https://doi.org/10.1038/s41586-021-04224-5) with other explanatory variables, using the phyr package, since it might allow a better analysis. For my question, I used the data frame that the authors make available (https://github.com/OrsolyaVincze/VinczeEtal2021Nature/blob/main/SupplementaryData.xls).
Vincze et al. (2022) studied a simple measure of cancer mortality risk (CMR) in mammals in relation to variables such as body size and lifespan. Additionally, phylogenetic information had to be incorporated because the species analyzed are not independent.
The data frame contains the response variable (CMR): the ratio between the number of cancer-related deaths (Neoplasia) and the total number of individuals whose postmortem pathological records were entered in the database (knownDeaths).
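As a minimal sketch of how CMR relates to the two count columns (the toy values are made up for illustration and are not taken from the data set; `toy` and `NoNeoplasia` are hypothetical names):

```r
# Toy data for illustration only; not taken from the Vincze et al. data set
toy <- data.frame(
  Species     = c("sp_a", "sp_b", "sp_c"),
  Neoplasia   = c(0, 3, 10),   # cancer-related deaths
  knownDeaths = c(20, 30, 40)  # all deaths with a pathological record
)

# CMR = cancer-related deaths / all deaths with a known record
toy$CMR <- toy$Neoplasia / toy$knownDeaths

# Number of "failures", as needed for a cbind(successes, failures) response
toy$NoNeoplasia <- toy$knownDeaths - toy$Neoplasia

print(toy)
```

The proportion alone loses the number of trials, which is why the models here use the two counts rather than CMR itself.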

As a first approach, we considered the following model:
pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp), data = data, family = "binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE)

However, when trying to fit this model, we ran into some problems.
First of all, I would like to confirm my suspicion: does phyr not allow models without any random effect? I have not found any package that lets me run a binomial model for proportions that accounts for the phylogenetic information without including a random effect. I ask because a binomial model for proportions may not be overdispersed, in which case I would not need a mixed model for these data.
However, binomial models are known to suffer from overdispersion, so we decided to add an observation-level random effect (OLRE) to account for it. With this term, the package runs without problems, using the following model:

M1=pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp)+ (1|OLRE), data = data, family = "binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE)
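For context, a minimal sketch of what an OLRE column usually looks like, assuming it is simply a factor with one level per row (in my data the OLRE column already exists; `toy` is a hypothetical data frame with made-up values):

```r
# Toy data for illustration only
toy <- data.frame(
  Species     = c("sp_a", "sp_b", "sp_c"),
  Neoplasia   = c(0, 3, 10),
  knownDeaths = c(20, 30, 40)
)

# Observation-level random effect: one factor level per observation (row)
toy$OLRE <- factor(seq_len(nrow(toy)))

str(toy$OLRE)
```

Because each row gets its own level, the `(1|OLRE)` term can absorb extra-binomial variation at the observation level.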

However, this model shows a poor fit and overdispersion caused by an excess of zeros.
[Attached images: M1_FIT, M1_Overdis_by zeros, M1_overdisp]

So, as a next step, we decided to fit a zero-inflated model, which this package should allow, using the following model:
M2=pglmm(cbind(Neoplasia, knownDeaths-Neoplasia) ~ log(BodyMass)+log(lifeexp)+ (1|OLRE), data = data, family = "zeroinflated.binomial",cov_ranef = list(sp = phy),add.obs.re = FALSE,bayes=TRUE,verbose=TRUE)
but it does not run:

[image]
I considered entering the response variable as the proportion (CMR), but according to INLA I have to enter it as an integer, and I have no way to define the weights or the Ntrials for that proportion.

I admit that I am having trouble with the syntax of my model, specifically with the response variable.
Could someone help me with the syntax so that I can run the model I need?

Thank you very much for your time and help.
Nicolas, thanks, thanks, thanks!
I have attached the database and the script used.

```r
library(ggplot2)
library(ape)
library(car)
library(phytools)
library(phylolm) # phyloglm
library(phyr)
library(DHARMa)
library(dplyr)

# `data` is the attached data set (dataIZ.txt), assumed to be loaded already
str(data)
data$OLRE <- as.factor(data$OLRE)
data$Species <- as.factor(data$Species)
data$order <- as.factor(data$order)

# Species-specific body mass
data$BodyMass <- (data$MaleMeanMass + data$FemaleMeanMass) / 2

# Phylogeny
phy <- read.nexus("consensus_phylogeny.tre") # consensus vertlife tree
phy <- bind.tip(phy, "Cervus_canadensis",
                where = which(phy$tip.label == "Cervus_elaphus"),
                edge.length = 0.5, position = 0.5)
phy <- bind.tip(phy, "Gazella_marica",
                where = which(phy$tip.label == "Gazella_subgutturosa"),
                edge.length = 0.5, position = 0.5)

# First model
M0 <- pglmm(cbind(Neoplasia, Nodeadbycancer) ~ log(BodyMass), data = data,
            family = "binomial", cov_ranef = list(sp = phy))
# pglmm reports: "No random terms specified, use lm or glm instead".
# Is there a way to run a model without random effects?

# Binomial model with an observation-level random effect (OLRE)
M1 <- pglmm(cbind(Neoplasia, knownDeaths - Neoplasia) ~ log(BodyMass) + log(lifeexp) + (1 | OLRE),
            data = data, family = "binomial",
            cov_ranef = list(sp = phy), add.obs.re = FALSE)
summary(M1)

# DHARMa residual diagnostics for M1
simulationOut <- simulateResiduals(fittedModel = M1, n = 250, refit = FALSE)
plot(simulationOut)                # bad fit
testZeroInflation(simulationOut)   # overdispersion caused by zeros
testOverdispersion(simulationOut)  # overdispersion not caused by zeros is fine

# Zero-inflated binomial model (Bayesian fit via INLA); this is the call that fails
M2 <- pglmm(cbind(Neoplasia, knownDeaths - Neoplasia) ~ log(BodyMass) + log(lifeexp) + (1 | OLRE),
            data = data, family = "zeroinflated.binomial",
            cov_ranef = list(sp = phy), add.obs.re = FALSE, bayes = TRUE)
```
dataIZ.txt
