| Desc | logs |
|---|---|
| maintainer: | Yifan Yang <mailto: yfyang.86@hotmail.com> |
| Version | 0.4-3 |
| Date | 2024-09-19 |
The Kaplan-Meier estimator is very popular in analysis of survival data. However, it is not easy to compute the constrained Kaplan-Meier. Current computational method uses the expectation maximization (EM) algorithm to achieve this, but can be slow at many situations. In this package we give a recursive computational algorithm for the constrained Kaplan-Meier estimator. The constraint is assumed given in linear estimating equations or mean functions.
We also illustrate how this leads to the empirical likelihood ratio test with right censored data and apply it to test non-parametric AFT problem. The proposed has a significant speed advantage over EM algorithm.
This package is written and maintained by Yifan Yang (mailto:yifan.yang@uky.edu), and co-authored by Dr Zhou (http://www.ms.uky.edu/~mai/). The package is released on CRAN (http://cran.r-project.org/web/packages/kmc/).
One can install the development version using
library(devtools);
install_github('kmc', 'yfyang86');Run the following code in R with only one null hypothesis
library(kmc)
x <- c(1, 1.5, 2, 3, 4.2, 5, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3)
d <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)
f <- function(x) {
x - 3.7
}
g = list(f = f)
result = kmc.solve(x, d, g)
print(result)
---------------------------------------------------------------------------------
A Recursive Formula for the Kaplan-Meier Estimator with Constraint
Information:
Number of Constraints: 1
lamda(s): -1.439612
---------------------------------------------------------------------------------
Log-likelihood(Ha) Log-likelihood(H0) -2LLR p-Value(df=1)
Est -17.5198 -17.8273 0.6150 0.4329
---------------------------------------------------------------------------------
If we add another constraint:
> myfun5 <- function(x) {
+ x^2 - 16.5
+ }
> # construnct g as a LIST!
>
> g = list(f1 = f, f2 = myfun5)
> re0 <- kmc.solve(x, d, g)
> re0
---------------------------------------------------------------------------------
A Recursive Formula for the Kaplan-Meier Estimator with Constraint
Information:
Number of Constraints: 2
lamda(s): -0.4148702 -0.1546575
---------------------------------------------------------------------------------
Log-likelihood(Ha) Log-likelihood(H0) -2LLR p-Value(df=2)
Est -17.5198 -17.8345 0.6293 0.7301
---------------------------------------------------------------------------------To make the package more user-friendly, we provide a lite version of the kmc.solve function. It
- ignores experimental options
- always uses
0vector as the initial value - use a faked randomize method to tackle the tie problem
Here is an example, we use the Stanford heart transplant data to illustrate the impact of the initial value on the convergence of the algorithm. The constraint is
library(survival)
stanford5 <- stanford2[!is.na(stanford2$t5), ]
y=log10(stanford5$time)
d <- stanford5$status
g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} ) # \sum I(x_i<2.4) wi = 0.5)
kmc.solvelite(y, d, g)
##################################
## Example (Cont'd): -2LLR Curve
##################################
iters = 100
scale = 8.
starter = 2.7
result = rep(0., 20)
result2 = rep(0., 20)
observe_range = starter + (1:iters)/(iters * scale)
for (i in 1:iters){
g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5})
result[i] = kmc.solve(y, d, g, using.C=TRUE)$`-2LLR`
result2[i] = kmc.solvelite(y, d, g)$`-2LLR`
}
plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')
rect(2.7, 0, 2.735, 1.25, col = rgb(128/256, 0, 0, 0.1), border = NA)
rect(2.735, 0, 2.825, 1.25, col = rgb(0, 128/256, 0, 0.1), border = NA)
points(x = observe_range, result2, xlab = 'time', ylab = '-2LLR', type = 'b', col = 'red')This clearly shows the impact of the initial value on the convergence of the algorithm.
- Black:
kmc.solvewith EM algorithm iter = 5 as the initial lambda value. - Red:
kmc.solvelitewith0as the initial lambda value.
The red area is the region where the kmc.solvelite fails to converge, i.e. the initial value 0 fails.
There are known issues on some scenario when dealing with more than one constraint. According to our simulation, automatic tuning strategy fails under some constraints. One can always use proper initial values like:
- Always uses the
kmc.solvelitefunction to get a proper initial value; - Uses the
kmc.solvefunction with few (saying, 5 would be enough) EM algorithm steps to get a proper initial value; - Randomizes initial values around 0 and optimize it continuously.
Although KMC is much faster than EM among many others, we should point out that the initial value of lambda is hightly unsteady, and the convergence of the algorithm is highly dependent on the initial value.
In current developing version, this package depends on rootSolve::multiroot. In src/surv2.c::R_zeroin2surv(), we provide a Newton method 1D version to solve the root. It is not integrated in the package yet.
If there were two constraints, we could plot a contour plot for the log-likelihood. Typically,
ZZ <- plotkmc2D(re0)This package offers a naive contour plot. One can use ZZ to draw contour plot with the help of ggplot2.
A careful tuning version is
#!/usr/local/bin/Rscript
# file: TESTKMC.R
args <- commandArgs(TRUE)
t1=as.numeric(args[1])
t2=as.numeric(args[2])
library(kmc)
x <- c(1, 1.5, 2, 3, 4.2, 5, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3)
d <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)
f_1 <- function(x) {x - t1}
f_2 <- function(x) {x^2-t2}
g <- list(f1=f_1,f2=f_2);
re0 <- kmc.solve(x, d, g)
ZZ <- plotkmc2D(re0,range0 = c(0.1, .4, 30))This version uses a 30 by 30 grid to construct the contour plot on a iMac2007 2.0Hz Core2 machine and only spend (2s to load R):
time Rscript TESTKMC.R 4.0 18.6
real0m20.202s
user0m18.817s
sys0m0.240sThe speed advantage of KMC algorithm could be used in time consuming analysis such as drawing contour plot. In this real data example, we illustrate the proposed algorithm to analyze the Stanford heart transplants program described in (Miller 1982). There were 157 patients who received transplants collected in the data, among which 55 were still alive and 102 were deceased. Besides, the survival time were scaled by 365.25. We could draw a contour plot of intercept and slope for a AFT model.
library(survival)
LL= 50
beta0 <- 3.218
beta1 <- -0.0145
stanford5 <- stanford2[!is.na(stanford2$t5), ]
beta.grid <- function(x0,range,n0,type="sq",u=5){
n0 = as.double(n0)
if (type=="sq"){
o1 <- c(
-range*(u*(n0:1)^2)/(u*n0^2),0,
range*(u*(1:n0)^2)/(u*n0^2)
)
}else{
if (type=='sqrt'){
o1 <- c(
-range*(u*sqrt(n0:1))/(u*sqrt(n0)),0,
range*(u*sqrt(1:n0))/(u*sqrt(n0)))
}else{
o1=c(
-range*(n0:1)/n0,
0,
range*(1:n0)/n0
)
}
}
return(
x0+o1
);
}
beta.0 <- beta.grid(beta0, 0.05, LL,"l")
beta.1 <- beta.grid(beta1,.003,LL,"l")
set.seed(1234)
y=log10(stanford5$time)+runif(157)/1000
d <- stanford5$status
oy = order(y,-d)
d=d[oy]
y=y[oy]
x=cbind(1,stanford5$age)[oy,]
ZZ=matrix(0,2*LL+1,2*LL+1)
library(kmc)
tic=0
for(jj in 1:(2*LL+1)){
for(ii in 1:(2*LL+1)){
beta=c(beta.0[ii],beta.1[jj])
ZZ[jj,ii]=kmc.bjtest(y,d,x=x,beta=beta,init.st="naive")$"-2LLR"
}
}
ZZ2<-ZZ
ZZ[ZZ<0]=NA ## when KMC.BJTEST fails to converge, it'll return a negative value.
ZZ[ZZ>0.5]=NA
range(ZZ,finite=T) -> zlim
floor.d<-function(x,n=4){floor(x*10^n)/(10^n)}
contour(
y=beta.0,
x=beta.1,
ZZ,
zlim=c(0,1),
levels=unique(floor.d(
beta.grid(x0=mean(zlim),range=diff(zlim)/2, n0=15,type="sqrt",u=10),
4)),
ylab="Intercept",
xlab=expression(beta[Age])
)
The countour plot is
Another one concentrates on two hypothesizes on survival function are considered:
Here,
- Bug fix: rootSolve issue LINK
- Buckley James: Add a
convergetag to indicate the convergence. - Add two uni-tests on
kmc.solveandkmc.bjtest.
- When the initial
lambdais not good, the optimization fails. One may notice there is a negative "LLR" consequently. This is due to theroot solveprocess fails to identify the right branch to searchlambda(p>1 dimensions). Currently, hidden functionskmc_routine5_1dandkmc_routine5_ndcould test this.
Please contact Yifan Yang (mailto:yfyang.86@hotmail.com), or leave feed back on the Github page.





