SCTransform - Loss of genes #27

Closed
TobiTekath opened this issue Jun 5, 2019 · 3 comments
Comments

@TobiTekath

Hi Christoph,

thanks for your great package.

I am using the SCTransform() function from the Seurat package, but as it is just a wrapper for your vst() function I thought I would kindly ask you for help directly.

The problem is that some genes are apparently dropped during the SCTransform calculations (they no longer appear in the SCT assay) and are therefore missing from the downstream analysis.

For example, I have 29801 genes in my RNA assay (counts, data and meta.features) but only 28846 genes in my SCT assay (counts, data and meta.features).
Is this gene loss expected? Unfortunately, some marker genes I am very interested in are among the removed genes. It would be great to get the transformed data for all genes.

My call to SCTransform looks like this:

SCTransform(object = seur_object, vars.to.regress = c("nCount_RNA", "percent.mito"), verbose = T, variable.features.n = 5000)

PS:
I also got some warnings (besides the "iteration limit reached" one):
In dpois(y, mu, log = TRUE) : non-integer x = 0.250000
Is this warning related to my problem?

Best,
Tobi

@ChristophH
Collaborator

Hi Tobi,

The vst function has a parameter min_cells set to 5 by default. This means that genes that are detected in fewer than 5 cells are not considered during normalization and are not part of the output. You could lower this threshold, but chances are high that the negative binomial regression will fail for these genes.
You can always look at the raw counts (instead of the sctransformed values) for genes that are not in the normalized data.
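
To see which genes fall below the threshold before running the normalization, a check along the following lines may help. This is a minimal sketch in base R on a made-up toy matrix (the gene and cell names are hypothetical); it only mirrors the "detected in at least min_cells cells" rule described above and is not the package's own code.

```r
# Toy count matrix: genes in rows, cells in columns (made-up values).
counts <- matrix(
  c(0, 0, 1, 0, 0, 0, 0, 0,   # geneA: detected in 1 cell
    2, 1, 0, 3, 1, 0, 4, 1,   # geneB: detected in 6 cells
    0, 1, 1, 0, 1, 1, 0, 0,   # geneC: detected in 4 cells
    1, 1, 1, 1, 1, 0, 0, 0),  # geneD: detected in 5 cells
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  paste0("cell", 1:8))
)

min_cells <- 5  # default threshold mentioned above

# Number of cells in which each gene has a non-zero count.
cells_per_gene <- rowSums(counts > 0)

# Genes below the threshold would be missing from the normalized output.
dropped <- names(cells_per_gene)[cells_per_gene < min_cells]
kept    <- names(cells_per_gene)[cells_per_gene >= min_cells]

print(cells_per_gene)
print(dropped)

# Assumption: extra arguments to SCTransform() are forwarded to vst(),
# so the threshold could be lowered with something like
#   SCTransform(object = seur_object, min_cells = 1)
# with the caveat above that the regression may fail for such genes.
```

On a real Seurat object the same count would be taken on the RNA count matrix; for a sparse matrix, Matrix::rowSums() would be the appropriate function.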

The warning in dpois suggests that your input has non-integer values (e.g. 0.25). The vst function assumes integer counts as input -- please check your input.
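
One way to check the input for non-integer values is sketched below. It uses base R on a small made-up matrix (names and values are hypothetical) and simply compares each entry with its rounded value.

```r
# Toy matrix with one non-integer entry (0.25), genes in rows, cells in columns.
counts <- matrix(
  c(0, 1, 0.25, 3, 2, 0),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# TRUE wherever a value is not a whole number.
non_integer <- counts != round(counts)

# How many entries are affected, and where they are.
n_bad   <- sum(non_integer)
bad_pos <- which(non_integer, arr.ind = TRUE)

print(n_bad)
print(bad_pos)
```

If n_bad is greater than zero, the matrix is probably not raw UMI counts (for example, it may already have been scaled or corrected) and should be traced back to the raw data before being passed on.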

@TobiTekath
Author

Thanks for the very fast and helpful reply.
I indeed overlooked the min_cells parameter, thanks for pointing that out.

@erzakiev

Hello Christoph, thanks for the excellent package.

What kind of hot water am I getting myself into if I lower the min_cells parameter to, say, 1? Does this potentially hot water become boiling hot if I do this for a relatively large sample (say, 2000 cells, as compared to a typical sample of 100-200 cells), or is it size-independent?

I am interested in retaining as many genes as possible in the corrected UMI slot for my downstream analysis.
