Kendall's tau for censored data in python


The python function kendall in kendall.py calculates a non-parametric correlation coefficient (Kendall's τ) that measures the strength of correlation for a paired sample of ordinal-level data. Here the data may be partially censored (with either upper or lower limits, but not with mixed upper and lower limits). Kendall's τ can also be used as a statistical test of the null hypothesis that the two variables are uncorrelated.
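
For reference, in the absence of censoring τ is defined from the number of concordant pairs n_c and discordant pairs n_d among all n(n-1)/2 pairs (a textbook definition, independent of this code):

```math
\tau \;=\; \frac{n_c - n_d}{n(n-1)/2}, \qquad -1 \le \tau \le 1 .
```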

The calculation of τ and the p-value follows Isobe, Feigelson & Nelson (1986). The formalism was originally developed in the context of medical science¹ by Brown, Hollander & Korwar (1974). With respect to partial correlations, the formalism is also presented in Akritas & Siebert (1996).
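
To illustrate the formalism, here is a minimal sketch of a Brown, Hollander & Korwar style statistic for data with upper limits. It is not the actual implementation in kendall.py; the helper definite_gt and the boolean censoring flags are illustrative assumptions:

```python
def definite_gt(vi, ci, vj, cj):
    """True if datum i is definitely greater than datum j.

    ci, cj flag upper limits (true value <= quoted value), so a
    definite ordering requires the larger datum to be a detection.
    """
    return (not ci) and (vi > vj)

def tau_censored(x, y, cx, cy):
    """Sketch of a generalized Kendall's tau for upper limits.

    For each pair (i, j), a in {-1, 0, +1} encodes whether x_i is
    definitely greater than, definitely smaller than, or not
    comparable to x_j (b likewise for y); tau is the normalized sum
    of a * b over all pairs.
    """
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (int(definite_gt(x[i], cx[i], x[j], cx[j]))
                 - int(definite_gt(x[j], cx[j], x[i], cx[i])))
            b = (int(definite_gt(y[i], cy[i], y[j], cy[j]))
                 - int(definite_gt(y[j], cy[j], y[i], cy[i])))
            s += a * b
    return s / (0.5 * n * (n - 1))
```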

The p-value calculation requires the distribution and the variance of τ under the null hypothesis. For uncensored data and large enough n, the distribution can be approximated by a normal distribution (see, e.g., Wikipedia). In this case the resulting expression depends only on the sample size². Caution is thus advised regarding the p-values calculated here for small samples of uncensored data; use scipy.stats.kendalltau instead.
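
In the uncensored large-n case the null variance of τ is the textbook Var(τ) = 2(2n+5)/(9n(n-1)), so a two-sided p-value follows from the normal approximation. A sketch of this standard result (not code from this repository):

```python
from math import erf, sqrt

def pvalue_normal_approx(tau, n):
    """Two-sided p-value for Kendall's tau under the null hypothesis,
    using the large-n normal approximation valid for uncensored data."""
    var = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))  # depends on n only
    z = abs(tau) / sqrt(var)
    return 1.0 - erf(z / sqrt(2.0))  # = 2 * (1 - Phi(|z|))
```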

For censored data and large n, the distribution of τ under the null hypothesis is approximately normal as well, but the variance depends on the distribution of the censored values with respect to the sample proportions (Oakes 1982). In practice, an estimate of the variance from the data is thus required. This code follows the approach of Isobe et al. and Brown et al., but more refined approaches exist in the literature. An example developed with astronomical data in mind is given by Akritas, Murphy & LaValley (1995); as of yet, the computation of p-values with this variance estimator is only implemented in R, as part of the package NADA (routine cenken). That formalism also supports simultaneously left- and right-censored data, and a python implementation thus appears desirable³.

Additional functionality is provided by the function tau_conf. This function determines the robustness of the correlation coefficient against the influence of individual data points (via bootstrapping) or against uncertainties in the data (via Monte Carlo sampling). A description of the idea behind these procedures can be found in Curran (2015, arXiv:1411.3816).
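
Schematically, and assuming kendall returns a (τ, p) pair, the two resampling strategies could look like this (a sketch of the idea, not the tau_conf implementation):

```python
import numpy as np
from kendall import kendall  # (tau, p) return values are an assumption

rng = np.random.default_rng(42)

def tau_interval(x, y, x_err, y_err, n_samp=10000,
                 method="montecarlo", p_conf=0.6826):
    """Schematic of the two resampling strategies behind tau_conf."""
    taus = np.empty(n_samp)
    for k in range(n_samp):
        if method == "bootstrap":
            # resample pairs with replacement: sensitivity of tau to
            # individual data points
            idx = rng.integers(0, len(x), len(x))
            taus[k], _ = kendall(x[idx], y[idx])
        else:
            # perturb every datum within its error bar: sensitivity of
            # tau to the measurement uncertainties
            taus[k], _ = kendall(x + rng.normal(0.0, x_err),
                                 y + rng.normal(0.0, y_err))
    # central interval containing a fraction p_conf of the sampled taus
    return np.percentile(taus, [50 * (1 - p_conf), 50 * (1 + p_conf)])
```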

Provided functions

kendall(x, y, censors=None, varcalc="simple", upper=True)

tau_conf(x, y, x_err=None, y_err=None, censors=None, p_conf=0.6826, n_samp=int(1e4), method="montecarlo", varcalc="simple", upper=True)

See the online help of these functions (or the source code) for notes on their usage.
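
A hypothetical usage sketch follows. The return values (a (τ, p) pair for kendall, a (lower, upper) interval for tau_conf) and the censoring convention are assumptions; check them against the docstrings:

```python
import numpy as np
from kendall import kendall, tau_conf

x = np.array([0.8, 1.5, 2.2, 3.0, 3.9, 4.6])
y = np.array([0.4, 0.7, 1.5, 1.3, 2.2, 2.6])
# hypothetical censoring flags; the exact convention (which value marks
# a limit, and which variable it applies to) should be checked in the
# docstring -- with upper=True the limits are treated as upper limits
censors = np.array([1, 1, 0, 1, 1, 1])

tau, p = kendall(x, y, censors=censors, upper=True)  # assumed returns
tau_lo, tau_hi = tau_conf(x, y, x_err=0.1 * x, y_err=0.1 * y,
                          censors=censors, method="montecarlo")
print(f"tau = {tau:.3f}, p = {p:.3g}")
print(f"68% confidence interval on tau: [{tau_lo:.3f}, {tau_hi:.3f}]")
```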

History of this code

A python implementation of the Isobe et al. algorithm was initially written by S. Flury for work presented in Flury et al. (2022). That version assumed the theoretical value of the variance valid for uncensored data and large n. E.C. Herenz modified the code to use the empirical variance calculation described in Isobe et al. (1986) for work presented in Herenz et al. (2025).

Requirements

Acknowledging the use of the code

If your research benefits from this code, please cite Isobe et al. (1986) and include a link to this GitHub repository.

Copyright

The code is released under the GPLv3 license (see LICENSE). Copyright: E.C. Herenz (2024), S. Flury (2023)

Footnotes

  1. Survival time comparison between patients receiving a heart transplant and patients not receiving such treatment.

  2. For small samples of uncensored data the distribution cannot be written down in closed form. It requires the evaluation of all possible permutations of the n pairs under the null hypothesis; the calculation of the p-value then requires computing |τ| for all of these permutations. While some trickery can simplify this calculation, it is not yet implemented here; scipy has provided it since ~2019 and conservatively treats n < 50 as small (R uses n < 60) -- see the resolved issue on github and the snippet after these footnotes. In practice, I found that for n = 15 the critical |τ| values at the p = 0.05 threshold differ by ≈ 10⁻². Critical |τ| values for given p-values are also tabulated in the statistical literature.

  3. PRs are very welcome. The existing code in NADA seems very confusing, and the accompanying book does not shed light on the issue.
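
For completeness, the exact small-sample computation mentioned in footnote 2 is available via scipy's public API (this is not code from this repository):

```python
from scipy.stats import kendalltau

# exact (permutation-based) p-value for a small, uncensored sample
tau, p = kendalltau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], method="exact")
print(f"tau = {tau:.3f}, p = {p:.4f}")
```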
