From 3d17e8a72e91982b186bc069a0506ef35d873b9e Mon Sep 17 00:00:00 2001 From: Espen Hagen <2492641+espenhgn@users.noreply.github.com> Date: Tue, 12 Dec 2023 10:34:26 +0100 Subject: [PATCH] mixer_figures.py combine: numpy.linalg.LinAlgError: singular matrix Fixes #58 --- CHANGELOG.md | 40 +++++++++++++++++++++++++++++++-------- precimed/mixer/figures.py | 2 +- 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2df0e52..47d4dd4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,11 +1,35 @@ -MiXeR v1.2 -========== +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [Unreleased] + +### Updated + +- Updated `CHANGELOG.md` file to use semantic versioning and keepachangelog.com format + +### Added + +### Fixed + +- Fixed rare crash by allowing for singular matrices in call to `scipy.stats.multivariate_normal` in `mixer/figures.py` + +### Changed + +## MiXeR v1.3 + +* ... + +## MiXeR v1.2 * MATLAB code is removed, from now you should run MiXeR via python interface. This is now described in [README](README.md) in the root of MiXeR github repository. * MiXeR no longer provides standard errors on bivariate parameter estimates. Univariate parameter estimates can still, in principle, be calculated with ``--ci-alpha 0.05`` option, but this is turned off default, and is not recommended. Inference about statistical significance of the parameters should be made in context of the model selection criterion: Akaike (AIC) or Bayesian (BIC). -* You are required to update your scripts and use new reference files, as described below. +* You are required to update your scripts and use new reference files, as described below. There is no need to update input files with summary statistics. * Previous MiXeR version is available from git tag [v1.1](https://github.com/precimed/mixer/tree/v1.1). ``git checkout tags/v1.1 -b master`` allows to fetch it, more info [here](https://stackoverflow.com/questions/791959/download-a-specific-tag-with-git). @@ -16,19 +40,19 @@ MiXeR v1.2 All commands have custom defaults, so you no longer have to specify ``--fit-sequence``, ``--kmax``, ``--qq-plots``, ``--power-curves``, etc. See the [README](README.md) file for examples. * MiXeR estimates in bivariate analysis may, in some cases, depend on which of the two traits is selected as ``--trait1``. - Therefore, it may be reasonable to run MiXeR twice, swapping the traits. However, the main reason for such instability + Therefore, it may be reasonable to run MiXeR twice, swapping the traits. However, the main reason for such instability is due to low power in GWAS, and AIC/BIC will also capture such an issue. Therefore, if you observe significant AIC/BIC values, it may not be needed to run the MiXeR model twice. -* ``--plink-ld-bin0`` and ``--frq-file`` arguments are removed. Use ``--ld-file`` argument instead. +* ``--plink-ld-bin0`` and ``--frq-file`` arguments are removed. Use ``--ld-file`` argument instead. You need to download a new reference files for these arguments, links provided in the [README](README.md). Internally, the new references file are available on NIRD ``/LDSR/1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld``. Further, there is an internal reference based on UK Biobank. * ``mixer_figures.py`` now generates more data, including AIC and BIC, and also Dice coefficient for bivariate analysis. It also generates negative log-likelihood plots. * Added support for Intel compiler, with build instructions for SAGA -* Added ``mixer.py perf`` option to evaluate performance of MiXeR cost funciton -* ``mixer.py ld`` has simpler procedure to generate files for ``--ld-file`` argument based on your custom genotype panel +* Added ``mixer.py perf`` option to evaluate performance of MiXeR cost funciton +* ``mixer.py ld`` has simpler procedure to generate files for ``--ld-file`` argument based on your custom genotype panel (plink is no longer required). * ``--z1max`` and ``--z2max``allow to specify thresholds for right-censoring. * MiXeR v1.2 uses precicely the same fit procedure as MiXeR v1.1, however it has a certain changes in how the LD matrix is stored internally, and also in how the cost function is evaluated. Particularly, the bivariate cost function based on sampling approach is now stateless, and it now consuming substentially less memory. The univariate cost function is based on convolution approach, as in v1.1, however it is now coupled with sampling for large z-scores to allow for ``--z1max`` and ``--z2max`` parameters. - + \ No newline at end of file diff --git a/precimed/mixer/figures.py b/precimed/mixer/figures.py index d4855d0..712b0ee 100644 --- a/precimed/mixer/figures.py +++ b/precimed/mixer/figures.py @@ -552,7 +552,7 @@ def execute_combine_parser(args): values = [data['ci'][key]['point_estimate'] for data in data_vec if (key in data['ci'])] if values: results['ci'][key] = {'mean': np.mean(values), 'median':np.median(values), 'std': np.std(values), 'min': np.min(values), 'max': np.max(values)} if values and (key=='rho_beta'): - values = [2 * multivariate_normal([0, 0], [[1, rho_beta], [rho_beta, 1]]).cdf([0, 0]) for rho_beta in values] + values = [2 * multivariate_normal([0, 0], [[1, rho_beta], [rho_beta, 1]], allow_singular=True).cdf([0, 0]) for rho_beta in values] results['ci']['fraction_concordant_within_shared'] = {'mean': np.mean(values), 'median':np.median(values), 'std': np.std(values), 'min': np.min(values), 'max': np.max(values)} if results['analysis'] == 'bivariate':