Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seurat2 with Veloctyo.R #16

Open
Jemkon opened this issue Feb 13, 2018 · 31 comments
Open

Seurat2 with Veloctyo.R #16

Jemkon opened this issue Feb 13, 2018 · 31 comments

Comments

@Jemkon
Copy link

Jemkon commented Feb 13, 2018

Hello,

I have tested the velocyto.R package with Pagoda2 as described in tutorials. But I would really like to test it with Seurat2.
How do I create cell embedding, cell clustering, and accurate cell-cell distance matrix using Seurat2?
Many thanks.

@pkharchenko
Copy link
Contributor

This is more of a question for the Seurat list, but we'll try to add an example when we update tutorials.

@Puriney
Copy link
Contributor

Puriney commented Mar 1, 2018

I happened to be a fan of both packages. You would find the answers at the Seurat2's pbmc3K tutorial.

For cell clustering, see the section "Cluster the cells". The function is FindClusters.

For cell embedding, always run RunPCA first and then RunTSNE as the RunTSNE internally relies on the existence of PCA slot. The actual coordinates of the embedding are fetched by seurat_obj@dr$pca@cell.embeddings and seurat_obj@dr$tsne@cell.embeddings for PCA and tSNE, respectively. With the embeddings, you can easily follow velocyto's tutorial to mask the velocity.

For the better cell distance, in principle, you could adapt the below code from velocyto tutorial to Seurat2 object by replacing the r$reductions$PCA with the Seurat2's PCA coordinates. (I did not actually test this suggestion)

cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))

@ramonvidal
Copy link

ramonvidal commented Mar 16, 2018

It worked very well for me using Seurat object in velocyto functions, I just needed to rename the emat and nmat colnames in order to fit with the ones in Seurat Object:

emat <- emat[,rownames(pbmc@meta.data)]; nmat <- nmat[,rownames(pbmc@meta.data)]; 
cluster.label <- pbmc@ident
cell.colors <- pagoda2:::fac2col(cluster.label)
emb <- pbmc@dr$tsne@cell.embeddings
cell.dist <- as.dist(1-armaCor(t(pbmc@dr$cca@cell.embeddings)))

@ramonvidal
Copy link

ramonvidal commented Mar 28, 2018 via email

@Jemkon
Copy link
Author

Jemkon commented Mar 28, 2018

Thanks @ramonvidal.

I was trying to compare and see how pagoda and seurat performs with velocyto. I found the same. The Pagoda2 + velocyto works best compared to Seurat2 + velocyto.

Many thanks for your suggestions.

@charles-bernard
Copy link

I would rahter vote for Seurat2 + velocyto simply because Seurat enables "Diffusion Map" ( See RunDiffusion function ) as a method of dimensionality reduction while Pagoda2 doesn't . I find diffusion maps better at capturing the dynamics of cell trajectories than t-SNE. Therefore, I would most of the time prefer to use a diffusion map embedding ( GetCellEmbeddings(object, reduction.type = "dm") ).

@LineWulff
Copy link

LineWulff commented Oct 10, 2018

Could someone please explain to me how/where I get the emat (spliced count matrix) and nmat (unspliced count matrix) from if I am using a seurat object?
I am trying to use velocyto.R to confirm some of my results.

@jebard
Copy link

jebard commented Oct 12, 2018

@LineWulff - You will need to run the Velocyto command line tools on the dataset first to get the splicing information. That will create the loom file which you can get the emat and nmat. Then you can use the tSNE embeddings from Seurat, with the emat and nmat from the loom file to make your plot... it's a bit tricky.

@x811zou
Copy link

x811zou commented Jan 27, 2019

@ramonvidal How did you make your pbmcY1.rds?

@ramonvidal
Copy link

@ramonvidal How did you make your pbmcY1.rds?

Hi @x811zou , that is a typo, pbmcY1 == pbmc

@chinalex9527
Copy link

@ramonvidal , I was able to create cell embedding and clustering using seurat, but when I rename the emat and nmat colnames by running
emat <- emat[,rownames(Mouse1stArch12@meta.data)]; nmat <- nmat[,rownames(Mouse1stArch12@meta.data)];
It gave me this :
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
invalid character indexing
Do you have any suggestions? Thank you in advance!

@chlee-tabin
Copy link

I may not be following, but "renaming the emat and nmat colnames" should be

colnames(emat) <- rownames(Mouse1stArch12@meta.data)
colnames(nmat) <- rownames(Mouse1stArch12@meta.data)

@chinalex9527
Copy link

@chlee-tabin , sorry I'm completely new to R scripts, I guess my question is in velocity estimation, they used Pagoda2 object for the following command:
emat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)]
How can I modify it with the Seurat object I created?
Thank you!

@LineWulff
Copy link

LineWulff commented Feb 6, 2019

@chinalex9527 , with a Seurat object I did:
emat <- emat[,rownames(seuratobject@meta.data)]; nmat <- nmat[,rownames(seuratobject@meta.data)]
This step will filter out cells from emat and nmat, which are not in the seurat/pagoda object.
I had to rename my emat and nmat columns first though. You have to inspect how the cells are named in both your seurat object (e.g. rownames(seuratobject@meta.data)) and the nmat/emat format.

@jebard
Copy link

jebard commented Feb 6, 2019

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)


# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced


# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@dr$tsne@cell.embeddings

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings)))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object


# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)


 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)
 

@chinalex9527
Copy link

@jebard , thank you so much! The code worked beautifully!

@CodeInTheSkies
Copy link

@jebard , Many thanks for your full code!! Very helpful, since applying RNA velocity does not seem to be straightforward, and a clear pipeline to use it in conjunction with Seurat is missing from the creators, as far as I know.

It will be great if you could please clarify two things:

  1. Towards the end of your code when you try to get the color out of the Seurat TSNE object, the code you gave above doesn't seem to work anymore. I'm using Seurat v2.3.4, R v3.5.0 and ggplot v2_3.1.0. Maybe the slots have changed? I spent some time to explore the slots to find out where the color info is stored, but I haven't had luck so far. Can you please help if you know?

  2. Once we successfully run show.velocity.on.embedding.cor, how exactly should we make the plot? Is the output p1 a ggplot object? Can we use print() pdf() or cowplot::save_plot with it?

Many thanks again, in advance!

@CodeInTheSkies
Copy link

Further, the output of show.velocity.on.embedding.cor is a list. So, basically, I would like to know how best to convert this output to a ggplot, gtable, or a grob.

Thank you!

@jebard
Copy link

jebard commented Mar 27, 2019

@CodeInTheSkies

Went back to the code and was able to get the colors out using this:

gg <- TSNEPlot(myData)
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)

I'm not sure why originally i had it as ggplot$Lpo -- i'll update my code above accordingly.

The output of the command show.velocity.on.embedding.cor should result in a plot being generated. I use Rstudio so it automatically pops up a plot. How I typically export to a file is by:

png("~/my-exported-graph.png")
show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
cell.colors=ac(colors,alpha=0.5),
cex=0.8,arrow.scale=2,show.grid.flow=T,
min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
do.par=F,cell.border.alpha = 0.1,
n.cores=24,main="Cell Velocity")
dev.off()

The reason I have been saving it into the p1 object, is that it can help speed up replotting the figure should you need to make any graphical changes (like arrow size / density) -- to do this, you pass the parameter cc=p1$cc

R version 3.4.2 Seurat_2.3.4 cowplot_0.9.2 ggplot2_3.0.0 velocyto.R_0.6

@MichaelPeibo
Copy link

Hi @jebard

Thanks for very detailed demonstration!
Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates?
I saw this step in this tutorial

@pkharchenko
Copy link
Contributor

pkharchenko commented Apr 26, 2019 via email

@MichaelPeibo
Copy link

Hi @pkharchenko
Thanks for response!
I will try to do filter and without to to if there is a difference.

@Fu-Luo
Copy link

Fu-Luo commented Jun 29, 2019

Hi @jebard

Thanks for your code.
I was not sure about this command:
emb <- myData@dr$tsne@cell.embeddings

What exactly is dr?

When I was trying to run this command:

emb <- lrc8@dr$tsne@cell.embeddings
Error: no slot of name "dr" for this object of class "Seurat"

@MichaelPeibo
Copy link

Hi @MazzzzzeLuo
This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

@GBeattie
Copy link

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

@ArcusGears
Copy link

ArcusGears commented Oct 23, 2019

I'm having some trouble with transferring some of the commands over.
When I try to change the cluster label from the pagoda2 imput to a seurat one I keep throwing an error.

cluster.label <- object@ident
    Error: no slot of name "ident" for this object of class "Seurat"

Shouldn't ident be automatically created when the cluster neighbors are generated? When I ran a DimPlot for UMAP I was able to group it by "ident" as well, so I'm not sure what the problem is.

Also like @GBeattie I am trying to use Seurat v3 rather than v2

@XYZuo
Copy link

XYZuo commented Jul 19, 2020

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.
Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

hI,
Have you solven this problem? I met the same error using Seurat 3.
Thanks!

@Bioinformatics-rookie
Copy link

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)


# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced


# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@dr$tsne@cell.embeddings

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@dr$tsne@cell.embeddings)))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object


# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)


 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)
 

I ran follow your code, and the last step went wrong,
Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option
how can I slove it?

@XYZuo
Copy link

XYZuo commented Sep 18, 2020

Hi @Bioinformatics-rookie ,
I met the same problem before and solved it by installing libopenblas.
I use anaconda so I ran conda install -c conda-forge libopenblas
Hope it can help you.

@Battamama
Copy link

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.
Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

I have the same problem too. Have you guys solved it?

@yanpinlu
Copy link

yanpinlu commented Jul 17, 2022

Hello, I have encountered this problem, how can I solve it, I still do not understand
sheild.seurat <- readRDS("/home/sheild.nodoublet7.16.rds")
sheild$spliced<-sheild$spliced[,rownames(sheild.seurat@meta.data)]
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
invalid character indexing
@ramonvidal

When you run three more samples separately, one of them will have such a problem at this step, and the other two will work correctly. I found that the number of rows of the seurat object with the error sample did not match the number of row with spliced, while the other two samples matched

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests