matrix with no cells #305

@brgew

Hi Ben,

I think that this is a marginal problem.

Anyway, we can have matrices in which all cells are filtered out; that is, every cell has fewer than 100 UMIs, for example.

I use BPCells import_matrix_market() to read the matrix market file, then calculate the column sums and keep only the columns with more than 100 counts. Later I use write_matrix_dir() to store the matrix on disk. When I restore the matrix in R using open_matrix_dir(), R reports that the second dimension is 1 rather than 0, with the column dimnames set to NULL.

library(BPCells)
mat <- import_matrix_market('Keyhole.umi_counts.mtx')

> str(mat)
Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  ..@ dir        : chr "/tmp/xxx/RtmpqwNpT9/matrix_market23d73452b52716"
  ..@ compressed : logi TRUE
  ..@ buffer_size: int 8192
  ..@ type       : chr "uint32_t"
  ..@ dim        : int [1:2] 70038 1178
  ..@ transpose  : logi FALSE
  ..@ dimnames   :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> csums <- colSums(mat)
> mat2 <- mat[,csums>100]

> str(mat2)
Formal class 'MatrixSubset' [package "BPCells"] with 7 slots
  ..@ matrix       :Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  .. .. ..@ dir        : chr "/tmp/xxx/RtmpqwNpT9/matrix_market23d73452b52716"
  .. .. ..@ compressed : logi TRUE
  .. .. ..@ buffer_size: int 8192
  .. .. ..@ type       : chr "uint32_t"
  .. .. ..@ dim        : int [1:2] 70038 1178
  .. .. ..@ transpose  : logi FALSE
  .. .. ..@ dimnames   :List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  ..@ row_selection: int(0) 
  ..@ col_selection: int(0) 
  ..@ zero_dims    : logi [1:2] FALSE TRUE
  ..@ dim          : int [1:2] 70038 0
  ..@ transpose    : logi FALSE
  ..@ dimnames     :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> write_matrix_dir(mat2, 'foo_dir',overwrite=TRUE)
70038 x 1 IterableMatrix object with class MatrixDir

Row names: unknown names
Col names: unknown names

Data type: uint32_t
Storage order: column major

> mat3 <- open_matrix_dir('foo_dir')
> str(mat3)
Formal class 'MatrixDir' [package "BPCells"] with 7 slots
  ..@ dir        : chr "/net/xxx/79/5dc9"| __truncated__
  ..@ compressed : logi TRUE
  ..@ buffer_size: int 8192
  ..@ type       : chr "uint32_t"
  ..@ dim        : int [1:2] 70038 1
  ..@ transpose  : logi FALSE
  ..@ dimnames   :List of 2
  .. ..$ : NULL
  .. ..$ : NULL

> dim(mat3)
[1] 70038     1

The value of 1 creates a problem with a Bioconductor package, which expects a value of 0.

I will try working around this by checking for this condition after the open_matrix_dir() call.
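
One possible shape for such a workaround is sketched below. It is not code from this issue: a reloaded 70038 x 1 matrix cannot be told apart from a genuine single-column matrix by its dimensions alone, so the sketch assumes the dimensions are recorded in a small sidecar file before writing and compared after reloading. The helper names (dim_sidecar, lost_empty_columns, write_matrix_dir_checked, open_matrix_dir_checked) and the ".dim.rds" sidecar file are hypothetical. Only write_matrix_dir(), open_matrix_dir(), dim(), ncol() and the all-FALSE column selection come from the session above.

```r
# Hypothetical helpers, not part of BPCells. BPCells functions are called with
# the BPCells:: prefix, so the package is only needed when the wrappers run.

# Path of a sidecar file (assumption) that stores the dimensions seen before
# writing. It sits next to the matrix directory, not inside it.
dim_sidecar <- function(dir) {
  paste0(sub("/+$", "", dir), ".dim.rds")
}

# TRUE when zero columns were recorded before writing but the reloaded
# matrix reports a non-zero column count.
lost_empty_columns <- function(recorded_dim, reloaded_dim) {
  recorded_dim[2] == 0 && reloaded_dim[2] != 0
}

# Record the true dimensions, then write the matrix as in the session above.
write_matrix_dir_checked <- function(mat, dir) {
  saveRDS(dim(mat), dim_sidecar(dir))
  BPCells::write_matrix_dir(mat, dir, overwrite = TRUE)
}

# Reopen the matrix and restore the zero-column shape if it was lost.
open_matrix_dir_checked <- function(dir) {
  mat <- BPCells::open_matrix_dir(dir)
  recorded_dim <- readRDS(dim_sidecar(dir))
  if (lost_empty_columns(recorded_dim, dim(mat))) {
    # Re-apply an all-FALSE column selection, the same kind of subset that
    # produced the 70038 x 0 MatrixSubset earlier in the session.
    mat <- mat[, rep(FALSE, ncol(mat))]
  }
  mat
}
```

With these in place, write_matrix_dir_checked(mat2, 'foo_dir') followed by open_matrix_dir_checked('foo_dir') would stand in for the two calls in the session above. The sidecar adds a second file to keep in sync, which is the cost of not having to guess whether a single all-zero column is real.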

I appreciate your consideration and thoughts.

Ever grateful,
Brent
