How do I filter the CGmap files to get the ~5.5 million sites described in the paper? #12

@Mansi-Purohit

Hi,

I am trying to create the input data for the TrainPCClocks.R script from the processed data uploaded to GEO (GSE161141), but I am having trouble reproducing the site filtering described in the Rat PCA clock paper. The closest I have gotten is ~4.4 million sites. To get there, I kept sites with coverage >= 10, restricted col1 to chromosomes 1-20, X, and Y, and then kept the sites present in 80% of samples, using col1 and col3 together as the unique identifier for each site.
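
For reference, the filtering described above can be sketched roughly as follows. This is only a minimal illustration of my attempt, not the paper's method. It assumes the usual 8-column, tab-separated, header-less CGmap layout (chromosome in col1, position in col3, total coverage in col8) and treats "80% across samples" as "passes the filters in at least 80% of samples". The function names `sites_passing` and `shared_sites` are hypothetical.

```python
from collections import Counter
import csv
import io

# Assumption: rat chromosomes 1-20 plus X and Y, with or without a "chr" prefix.
ALLOWED_CHROMS = {str(i) for i in range(1, 21)} | {"X", "Y"}


def normalize_chrom(name):
    """Strip an optional 'chr' prefix so 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def sites_passing(handle, min_coverage=10):
    """Return the (chrom, position) keys of one CGmap file that pass the filters.

    Assumed columns: chr, nucleotide, position, context, dinucleotide context,
    methylation level, methylated read count, total coverage.
    """
    keep = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8:
            continue  # skip blank or truncated lines
        chrom = normalize_chrom(row[0])
        if chrom in ALLOWED_CHROMS and int(row[7]) >= min_coverage:
            keep.add((chrom, int(row[2])))
    return keep


def shared_sites(per_sample_sites, min_fraction=0.8):
    """Keep sites that pass the filters in at least min_fraction of samples."""
    counts = Counter()
    n_samples = 0
    for sites in per_sample_sites:
        counts.update(sites)
        n_samples += 1
    return {site for site, c in counts.items() if c >= min_fraction * n_samples}


if __name__ == "__main__":
    # Tiny made-up example; real input would be one open file per sample.
    demo = io.StringIO(
        "chr1\tC\t100\tCG\tCG\t0.5\t5\t10\n"
        "chr1\tC\t200\tCG\tCG\t0.5\t2\t9\n"
    )
    print(sorted(sites_passing(demo)))
```

With the real data, each sample's file would be opened and passed to `sites_passing`, and the resulting sets passed to `shared_sites`. The final count depends on details this sketch only guesses at, such as whether both strands or non-CG contexts are included.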

Is there any more information you can provide on how the CGmap files were filtered, or should be filtered, to arrive at the final 5.5 million sites?

Thanks.
Mansi
