The .agg files are aggregate files where each column represents a cluster in the main loom-file with the same name.
- Age: Age as reported by the clinician in post-conceptional weeks
- barcode: 10X barcode
- CellCyle: Fraction of RNA reads in cell-cycle genes
- CellID: Unique cell identifier in the form of sample:barcode
- Chemistry: The 10X genomics kit used to acquire the cell
- Class: Broad annotation of cell classes (Radial glia, Neuron, Oligo etc)
- ClusterName: Annotated cluster name
- Clusters: Cluster IDs used in the paper, result of subclustering and merging of class subsets
- Clusters_main: Primary clusters derived from analysis prior to subsetting of data
- Donor: Donor IDs (same as Shortname)
- DoubletFinderScore/Flag: Output from DoubletFinder
- Embedding: 2D embedding as used in the paper (TSNE/UMAP both available as well)
- FRIP: Fraction of Reads in Peak (ATAC)
- FRtss: Fraction of reads in TSS (transcription start site) regions
- GA_colsum: summed gene accessibility per cell
- Id: ID from database
- LSI: Latent Semantic Indexing (LSI_b is LSI over bins, LSI_main is LSI after pooling of subsets)
- Method: Either 'atac-seq' or 'rnaXatac'
- mitochondrial: Number of mitochondrial reads (ATAC)
- Name: Library ID
- NBins/NGenes/NPeaks: Number of positive bins/genes/peaks
- passed_filters: Number of fragments (ATAC)
- peak_region_fragments/cutsites: As reported by cellranger-arc
- preClusters: Basic clustering based on binned data used for peak calling
- PseudoAge: Age smoothed over nearest neighbors
- SEX: Sex as determined based on Y-chromosomal reads
- TSNE/UMAP: Embeddings computed before and after (_main) pooling of subsets
- TSS_fragments: Total number of TSS fragments (ATAC)
- Tissue/regions/subregions: Region annotation
- total: Total reads (ATAC)
- TotalUMI: Total number of UMIs (RNA)
- Y: Fraction of Y-chromosomal reads
- Z: Z-score normalization
- Ambiguous: ambiguously mapped reads as returned by velocyto
- Norm: Depth-normalized counts
- Pooled: Each value is pooled from the 10 nearest multiome neighbours (depth-normalized), so ATAC-only cells also have an imputed value here. The base layer is identical to the pooled layer
- Raw: only the measured counts (multiome)
- Spliced: only the spliced counts (velocyto)
- Spliced_pooled: similar to pooled, but only spliced counts
- Unspliced/unspliced_pooled: only the unspliced counts (velocyto)
- '': Base layer contains binarized counts
- 'Counts': Counts per peak
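The CellID convention listed above (sample:barcode) can be illustrated with a minimal Python sketch that splits identifiers back into their sample and barcode parts and groups barcodes by sample. The function names and the example identifiers are hypothetical and are not taken from the dataset; the sketch also assumes that barcodes never contain a colon.

```python
from collections import defaultdict


def split_cell_id(cell_id):
    """Split a CellID of the form 'sample:barcode' into (sample, barcode)."""
    # Split on the last colon; this assumes barcodes never contain ':'.
    sample, sep, barcode = cell_id.rpartition(":")
    if not sep or not sample or not barcode:
        raise ValueError(f"not a sample:barcode identifier: {cell_id!r}")
    return sample, barcode


def barcodes_by_sample(cell_ids):
    """Group barcodes by the sample they came from."""
    groups = defaultdict(list)
    for cell_id in cell_ids:
        sample, barcode = split_cell_id(cell_id)
        groups[sample].append(barcode)
    return dict(groups)


# Hypothetical identifiers, for illustration only.
example_ids = [
    "sampleA:AAACGAAAGTTTGCGT-1",
    "sampleA:AAACGAACAGGCTTAT-1",
    "sampleB:AAACTCGAGCATTCCA-1",
]
grouped = barcodes_by_sample(example_ids)
```

In practice the identifiers would come from the CellID attribute of the loom file rather than from a hard-coded list.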
To download the 10X output for an individual sample, use the command below, replacing {sample} with the name of the sample you need. A list of all sample names can be found in 10X_output, and the metadata giving region, name, sample ID, etc. can be found in Extended data 1.
wget https://storage.googleapis.com/linnarsson-lab-human/ATAC_dev/10X/{sample}
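For fetching several samples from a script, the same URL pattern can be filled in programmatically. The sketch below uses only the Python standard library; the helper names `sample_url` and `download_sample` are hypothetical, and it assumes each sample name is a single path component that resolves to one downloadable file, as the wget command suggests.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the wget command above.
BASE_URL = "https://storage.googleapis.com/linnarsson-lab-human/ATAC_dev/10X"


def sample_url(sample):
    """Return the download URL for one sample name."""
    # Percent-encode the name so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(sample, safe='')}"


def download_sample(sample, dest=None):
    """Download one sample to dest (defaults to the sample name)."""
    target = dest or sample
    # Performs the network request; raises an HTTPError/URLError on failure.
    urllib.request.urlretrieve(sample_url(sample), target)
    return target
```

Calling `download_sample(name)` for each name listed in 10X_output would then save the files into the working directory; nothing is downloaded until the function is called.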
We use the Chromograph pipeline.
Code for making many of the figures is available as Jupyter notebooks. The package versions used to generate these figures are listed in this environment file.
Our gene and transcript annotation is based on the GRCh38.p13 GENCODE v35 primary sequence assembly, as previously described in Emelie Braun et al., 2022 (in review).
We discarded genes or transcripts that overlapped or mapped to the 3’ UTRs of other genes or non-coding RNAs.
From this link you can download the input files to build the cellranger-arc index yourself.
For more information, please see the corresponding GitHub repo.