Closed
Description
Describe the issue
Hi, I found that the signatureEnrichment function doesn't assign every sample in the MAF file to a signature.
I found it when running with custom data, and noticed the same problem in your vignettes as well. https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html
Command
> laml@summary
                   ID          summary  Mean Median
 1:        NCBI_Build               37    NA     NA
 2:            Center genome.wustl.edu    NA     NA
 3:           Samples              193    NA     NA
 4:            nGenes             1241    NA     NA
 5:   Frame_Shift_Del               52 0.269      0
 6:   Frame_Shift_Ins               91 0.472      0
 7:      In_Frame_Del               10 0.052      0
 8:      In_Frame_Ins               42 0.218      0
 9: Missense_Mutation             1342 6.953      7
10: Nonsense_Mutation              103 0.534      0
11:       Splice_Site               92 0.477      0
12:             total             1732 8.974      9
laml.se = signatureEnrichment(maf = laml, sig_res = laml.sig)
##
## Signature_1 Signature_2 Signature_3
##          60          65          63
As you can see, laml has 193 samples, but signatureEnrichment assigned only 188 of them (60 + 65 + 63).
I was wondering if I missed something.
Can you help to explain this?
My own dataset has 12 samples, but only 6 were assigned, which I assume is the same problem.
Thank you.
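The mismatch can be checked by summing the per-signature counts and comparing the total with the sample count. A minimal sketch in base R, with the numbers copied from the output above (the variable names are illustrative and not part of maftools):

```r
# Counts printed by signatureEnrichment for the laml example above
assigned <- c(Signature_1 = 60, Signature_2 = 65, Signature_3 = 63)

# Number of samples reported in laml@summary
n_samples <- 193

n_assigned <- sum(assigned)
n_missing <- n_samples - n_assigned

cat("assigned:", n_assigned, "of", n_samples, "- missing:", n_missing, "\n")
```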
Session info
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] maftools_2.4.15
loaded via a namespace (and not attached):
[1] compiler_4.0.2 Matrix_1.2-18 tools_4.0.2 RColorBrewer_1.1-2
[5] survival_3.2-3 R.methodsS3_1.8.1 splines_4.0.2 grid_4.0.2
[9] data.table_1.13.0 R.utils_2.10.1 R.oo_1.24.0 lattice_0.20-41
Activity
ShixiangWang commented on Sep 11, 2020
@alexyfyf This is because samples without any SBS mutations are dropped.
You should check your data with this and report the result.
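The snippet originally posted with this comment is not preserved here. As a sketch of one way to run such a check (not the code that was posted), the following counts single-base substitutions per sample and lists the samples that have none. It assumes the standard MAF columns Tumor_Sample_Barcode and Variant_Type, with SBS variants recorded as SNP; the toy table and sample names are made up.

```r
# Toy MAF-like table; in practice this would be the mutation table of the MAF object
maf_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S4"),
  Variant_Type         = c("SNP", "DEL", "INS", "SNP", "SNP", "DEL"),
  stringsAsFactors = FALSE
)

all_samples <- unique(maf_data$Tumor_Sample_Barcode)

# Samples carrying at least one single-base substitution
snp_samples <- unique(maf_data$Tumor_Sample_Barcode[maf_data$Variant_Type == "SNP"])

# Samples with no SBS at all; their row in a 96-context matrix would be all zeros
no_sbs <- setdiff(all_samples, snp_samples)
print(no_sbs)
```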
PoisonAlien commented on Sep 11, 2020
Hi,
I am guessing mutation load is quite low for your samples? This could lead to exclusion of samples.
alexyfyf commented on Sep 11, 2020
Thank you for the reply. It looks like that's not the cause. I'll post the code here:
Here's my maf object with 12 samples
Here's the tnm also 12*96
Then extract 3 signatures
Finally, enrichment, but only 6 samples assigned
I also found that if I change the number of signatures to 2, signatureEnrichment works correctly and assigns 4 and 8 samples to the two signatures (12 in total). With 3 signatures, as in the example above, something seems to go wrong.
Thank you if you could help.
ShixiangWang commented on Sep 11, 2020
This looks like a bug. Could you post your data for debugging?
PoisonAlien commented on Sep 11, 2020
I also see that you are using a mouse genome. Are these data from mice? I don't think this should affect anything, but let me see if I can reproduce the issue. As @ShixiangWang suggested, it would help a great deal if you could share your tnm object (as an RDS or RData file).
alexyfyf commented on Sep 12, 2020
Yes it's mouse data. I'm not sure how to share the data here.
I put it on Google Drive, and here's the link. Somehow the file extension was lost, so you need to download it, gunzip it, and read it as an RDS file.
Please let me know if you can load the data. Thank you so much.
PoisonAlien commented on Sep 14, 2020
Hi @alexyfyf
Thanks for sharing the file. I can reproduce your issue. I will have a look and let you know soon. Sorry, things are quite busy.
ShixiangWang commented on Sep 15, 2020
@PoisonAlien If you need help, @me at any time
PoisonAlien commented on Sep 15, 2020
Thanks @ShixiangWang for the offer to help. I will definitely keep it in mind.
Hi @alexyfyf,
Two points:
Run estimateSignatures first to get the ideal number of signatures. The plot shows that 6 would be a good number, since the correlation reaches its maximum there.
alexyfyf commented on Sep 16, 2020
@PoisonAlien
Thank you for explaining this. I ran plotCophenetic, and I initially thought I should choose the smallest number that gives a high score, because I thought a larger n would introduce more noise. So I picked 3, which has the second-highest score here. Maybe I don't understand this metric very well.
Do you suggest choosing the n with the highest score?
PoisonAlien commented on Sep 16, 2020
The idea is to look for the point at which the correlation reaches its maximum and then drops significantly. Here it could be 4 or 6. You can run both and then decide on the number; if you think 6 is overkill, use 4. This is always tricky and never black and white. Hope this helps.
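The heuristic described above can be sketched in base R. The cophenetic values below are invented for illustration and are not taken from this dataset. The rule coded here (list every rank after which the correlation drops, then pick the one before the steepest drop) is only one reading of that advice, not a maftools function.

```r
# Hypothetical cophenetic correlation per number of signatures tried (made-up values)
ranks <- 2:7
cophenetic <- c(0.970, 0.985, 0.990, 0.960, 0.992, 0.900)

# Change in correlation when moving from each rank to the next one
change <- diff(cophenetic)

# Ranks after which the correlation drops (candidate choices)
candidates <- ranks[which(change < 0)]

# Rank just before the steepest drop
best_by_drop <- ranks[which.min(change)]

cat("candidates:", candidates, "\n")
cat("steepest drop after n =", best_by_drop, "\n")
```

With several candidates, as here, running the extraction for each and comparing the resulting signatures is still the final step; the code only narrows the choice.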
alexyfyf commented on Sep 17, 2020
Thank you for the explanation.
One more related issue: when I used n = 4 for signature extraction, as suggested, and then ran signatureEnrichment, it returned 13 samples. However, I only have 12 in the maf_sub object in the link.
You probably can find the same if you run my data.
Likely those two bugs are related and both point to the k-means.
PoisonAlien commented on Sep 17, 2020
Yes, that is definitely the case. It's better to skip the function for now.
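If a per-sample grouping is still needed while the function is skipped, one simple substitute is to label each sample by its largest signature contribution. This is a sketch of a different technique, not what signatureEnrichment does internally. The contribution matrix below is made up, and its layout (signatures in rows, samples in columns) is an assumption. Each sample is assigned exactly once, so the counts always add up to the number of samples.

```r
# Made-up contribution matrix: 3 signatures (rows) x 6 samples (columns)
contrib <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.3, 0.4,
    0.6, 0.3, 0.1,
    0.2, 0.2, 0.6,
    0.1, 0.7, 0.2),
  nrow = 3,
  dimnames = list(paste0("Signature_", 1:3), paste0("S", 1:6))
)

# Index of the largest contribution in each column (first one wins on ties)
dominant <- rownames(contrib)[apply(contrib, 2, which.max)]
names(dominant) <- colnames(contrib)

counts <- table(dominant)
print(counts)

# Sanity check: every sample is assigned exactly once
stopifnot(sum(counts) == ncol(contrib))
```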
PoisonAlien commented on Oct 6, 2020
Hello @alexyfyf
Thanks for reporting the issue. I decided to drop the function entirely, since I do not have enough time to fix it. The function will now output a warning message advising against using it for any interpretation, and I will gradually remove it from the package. I apologize for the inconvenience. I am closing the issue for now; please feel free to reopen it if necessary.
alexyfyf commented on Oct 11, 2020
Thank you for letting me know.