Skip to content

estimate cell fate potency from single cell RNA-seq data

Notifications You must be signed in to change notification settings

CahanLab/stemfinder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Sep 21, 2024
db8ef0e · Sep 21, 2024

History

26 Commits
Sep 21, 2024
Apr 26, 2023
Jul 18, 2024
Sep 21, 2024
Sep 21, 2024
Jul 11, 2024
Jul 12, 2024
Sep 21, 2024

Repository files navigation

stemFinder vignette

Kathleen Noller 07/11/2024

stemFinder

Single-cell estimation of differentiation time from scRNA-seq data

Setup

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("devtools")
## 
## The downloaded binary packages are in
##  /var/folders/hb/b7nzqfss2_l63s3qz23cqftr0000gp/T//Rtmpa5pZ4q/downloaded_packages
devtools::install_github("cahanlab/stemfinder")
## Skipping install of 'stemFinder' from a github remote, the SHA1 (83a451f7) has not changed since last install.
##   Use `force = TRUE` to force installation
library(stemFinder)
## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## Loading required package: Seurat

## Attaching SeuratObject

## Loading required package: ggplot2

Load query data - Bone marrow from Tabula Muris

Query data should be a Seurat object containing a log-normalized, scaled single-cell gene expression matrix

Query data must have two metadata columns:

Phenotype (character vector of cell type annotations) and Ground_truth (numeric vector of ascending ground truth values denoting extent of differentiation)
Note: example data has already been filtered, normalized, and scaled

Download query data: Tabula Muris bone marrow, 10X platform

adata = readRDS("MurineBoneMarrow10X_GSE109774.rds")
head(adata,2)
##                            orig.ident nCount_RNA nFeature_RNA
## X10X_P7_3_AAACCTGAGCATCATC       X10X      12994         3468
## X10X_P7_3_AAACCTGCAGAGTGTG       X10X       5437         1764
##                                       Phenotype Ground_truth percent.mt
## X10X_P7_3_AAACCTGAGCATCATC Monocyte_progenitors            2          0
## X10X_P7_3_AAACCTGCAGAGTGTG            Monocytes            3          0
##                            percent.ribo    S.Score  G2M.Score Phase
## X10X_P7_3_AAACCTGAGCATCATC     22.62583  0.3360127  0.3821197   G2M
## X10X_P7_3_AAACCTGCAGAGTGTG     24.03899 -0.1894597 -0.3641231    G1

Prepare inputs to stemFinder

# Select input cell cycle gene list
      ## standard input to stemFinder: G2M and S cell cycle genes
      ## G2M and S gene lists are provided for mouse, human, and C. elegans
cell_cycle_genes = c(s_genes_mouse, g2m_genes_mouse)[c(s_genes_mouse, g2m_genes_mouse) %in% rownames(adata)] 
VariableFeatures(adata) = VariableFeatures(adata)[!(VariableFeatures(adata) %in% cell_cycle_genes)] #make sure cell cycle genes are not among highly variable features

# PCA
adata <- RunPCA(adata, verbose = F)
p1 <- ElbowPlot(adata, ndims = 50)

#Select PCs based on elbow plot
pcs = 32

#Perform K nearest neighbors
k = round(sqrt(ncol(adata))) #default value of k parameter
adata = FindNeighbors(adata, dims = 1:pcs, k.param = k, verbose = F)
knn = adata@graphs$RNA_nn #KNN matrix

Run stemFinder

Inputs:

adata: Seurat object containing log-normalized, scaled gene expression data (features x cells)
k: number of nearest neighbors
nn: KNN matrix (cells x cells)
thresh: threshold for binarizing gene expression data (default = 0)
markers: character vector of cell cycle genes present in query data
method: string denoting which method of computing gene expression heterogeneity to use (default: ‘gini’, other: ‘stdev’ and ‘variance’)
adata = run_stemFinder(adata, k = k, nn = knn, thresh = 0, markers = cell_cycle_genes, method = 'gini')

head(adata,5) 
##                            orig.ident nCount_RNA nFeature_RNA
## X10X_P7_3_AAACCTGAGCATCATC       X10X      12994         3468
## X10X_P7_3_AAACCTGCAGAGTGTG       X10X       5437         1764
## X10X_P7_3_AAACCTGGTCGAACAG       X10X       4466         1526
## X10X_P7_3_AAACCTGTCACTTCAT       X10X      23852         4043
## X10X_P7_3_AAACGGGAGAAGGTTT       X10X       4375          977
##                                       Phenotype Ground_truth percent.mt
## X10X_P7_3_AAACCTGAGCATCATC Monocyte_progenitors            2          0
## X10X_P7_3_AAACCTGCAGAGTGTG            Monocytes            3          0
## X10X_P7_3_AAACCTGGTCGAACAG Monocyte_progenitors            2          0
## X10X_P7_3_AAACCTGTCACTTCAT     Stem_Progenitors            1          0
## X10X_P7_3_AAACGGGAGAAGGTTT         Granulocytes            3          0
##                            percent.ribo     S.Score  G2M.Score Phase
## X10X_P7_3_AAACCTGAGCATCATC    22.625827  0.33601275  0.3821197   G2M
## X10X_P7_3_AAACCTGCAGAGTGTG    24.038992 -0.18945969 -0.3641231    G1
## X10X_P7_3_AAACCTGGTCGAACAG    33.631885  0.30172632 -0.1413534     S
## X10X_P7_3_AAACCTGTCACTTCAT    33.104142 -0.01163238 -0.3062905    G1
## X10X_P7_3_AAACGGGAGAAGGTTT     2.537143 -0.15402552 -0.1239491    G1
##                            stemFinder_raw stemFinder
## X10X_P7_3_AAACCTGAGCATCATC      18.654875 0.08664202
## X10X_P7_3_AAACCTGCAGAGTGTG       5.315398 0.73975374
## X10X_P7_3_AAACCTGGTCGAACAG      16.387931 0.19763346
## X10X_P7_3_AAACCTGTCACTTCAT      19.251784 0.05741689
## X10X_P7_3_AAACGGGAGAAGGTTT       2.994946 0.85336496

The following 2 columns are added to metadata:

-Raw stemFinder score (“stemFinder_raw”)
-stemFinder score with directionality corresponding to pseudotime / ground truth (“stemFinder”)

Check against previously-computed stemFinder results on this dataset

sF_scores = read.csv("bmmc_sF_results.csv", row.names = 1)
head(sF_scores,5)
##                            stemFinder_raw stemFinder
## X10X_P7_3_AAACCTGAGCATCATC      18.654875 0.08664202
## X10X_P7_3_AAACCTGCAGAGTGTG       5.315398 0.73975374
## X10X_P7_3_AAACCTGGTCGAACAG      16.387931 0.19763346
## X10X_P7_3_AAACCTGTCACTTCAT      19.251784 0.05741689
## X10X_P7_3_AAACGGGAGAAGGTTT       2.994946 0.85336496

Quantify stemFinder performance relative to ground truth

# Compute stemFinder performance metrics
list_all = compute_performance_single(adata, competitor = F)
## [1] "Single-cell Spearman Correlation, stemFinder: 0.74"
## [1] "AUC, stemFinder: 0.97"
## [1] "Phenotypic Spearman correlation, stemFinder: 0.89"
pct.recov = pct_recover(adata)
## [1] "Percentage of cells with low degree of differentiation recovered by stemFinder: 84.7117794486216"
## [1] "Relative abundance of cells with low degree of differentiation: 11.6428362999708"

Optional: compare stemFinder performance to another method

CytoTRACE and CCAT scores for BMMC query data

#Load pre-computed competitor scores 
comp_scores = read.csv("bmmc_competitor_results.csv", row.names = 1)
head(comp_scores,2)
##                            CytoTRACE      ccat CytoTRACE_invert ccat_invert
## X10X_P7_3_AAACCTGAGCATCATC      2645 0.3818031        0.2281879   0.2388558
## X10X_P7_3_AAACCTGCAGAGTGTG      1520 0.2712764        0.5564634   0.4591965
adata@meta.data = cbind(adata@meta.data, comp_scores) #add to metadata
adata$competitor = adata$ccat_invert #rename desired competitor column 

#Quantify performance
list_all_withcomp = compute_performance_single(adata, competitor = T, comp.inverted = T)
## [1] "Single-cell Spearman Correlation, stemFinder: 0.74"
## [1] "AUC, stemFinder: 0.97"
## [1] "Phenotypic Spearman correlation, stemFinder: 0.89"
print(list_all_withcomp)
## $`stemFinder results`
## Spearman_SingleCell      Spearman_Pheno                 AUC 
##           0.7428144           0.8883756           0.9724200 
## 
## $`Competitor results`
## Spearman_SingleCell      Spearman_Pheno                 AUC 
##           0.6755672           0.7783368           0.9298890

Visualize stemFinder and competitor results

Feature plot
p2 <- FeaturePlot(adata, features = c('Ground_truth','stemFinder','competitor'), cols = c('blue','red'), ncol = 3)

Box plot
p3 <- ggplot(adata@meta.data, aes(x = Ground_truth, y = stemFinder)) + geom_point() + geom_boxplot(aes(group = Ground_truth, color = Ground_truth)) + theme_bw() + ggtitle("stemFinder score vs. Ground truth") + ylab("stemFinder score") + xlab("Ground truth")

About

estimate cell fate potency from single cell RNA-seq data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages