Hi,
I used GenomeScope from these two repos:
https://github.com/schatzlab/genomescope
and
https://github.com/tbenavi1/genomescope2.0
and got weird results:
From v1:
From v2:
What could explain these big differences?
The histogram was generated with:
kmc -k21 -t5 -m64 -ci1 -cs10000 @cat.files cat.kmcdb tmp
kmc_tools transform cat.kmcdb histogram cat.kmcdb_k21.hist -cx10000
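As a sanity check on a histogram like the one produced above, here is a minimal Python sketch that reads a two-column histogram (k-mer count, number of distinct k-mers with that count) and computes a naive haploid length as total k-mers divided by k-mer coverage. This is not GenomeScope's model, which fits a mixture of distributions; it is only a rough cross-check. The function names, the whitespace-separated two-column layout, and the `min_count` cutoff for skipping likely error k-mers are assumptions made for illustration, and the coverage passed in is assumed to be per haploid genome copy.

```python
from typing import Iterable, List, Tuple


def parse_histogram(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'count frequency' lines into (count, frequency) pairs.

    Assumes two whitespace-separated integer columns per line;
    lines with fewer than two fields are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def naive_haploid_length(hist: List[Tuple[int, int]],
                         kmercov: float,
                         min_count: int = 5) -> float:
    """Rough haploid length: total k-mers at count >= min_count / kmercov.

    min_count is a hypothetical cutoff that drops low-count k-mers,
    which are mostly sequencing errors. kmercov is assumed to be the
    k-mer coverage of one haploid genome copy.
    """
    total_kmers = sum(count * freq for count, freq in hist if count >= min_count)
    return total_kmers / kmercov
```

To try it on the file from the commands above, one could call `parse_histogram(open("cat.kmcdb_k21.hist"))` and pass the result to `naive_haploid_length` with a coverage value read off the main peak. If the chosen peak is actually the diploid peak rather than the haploid one, the result will be off by a factor of two.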
mschatz commented on Dec 14, 2020
ptranvan commented on Dec 14, 2020
Hello, we don't know much about the genomic architecture of this species, but it should be diploid. I changed "Average k-mer coverage for polyploid genome" to 114 but got the same plot:
GenomeScope version 2.0
input file = user_uploads/va4PdkFoOZtVXygmISbA
output directory = user_data/va4PdkFoOZtVXygmISbA
p = 2
k = 21
initial kmercov estimate = 114
property                 min               max
Homozygous (aa)          0%                100%
Heterozygous (ab)        0%                100%
Genome Haploid Length    251,713,776 bp    251,965,450 bp
Genome Repeat Length     52,929,734 bp     52,982,656 bp
Genome Unique Length     198,784,041 bp    198,982,794 bp
Model Fit                79.394%           92.8582%
Read Error Rate          0.236534%         0.236534%
http://genomescope.org/genomescope2.0/analysis.php?code=va4PdkFoOZtVXygmISbA
mschatz commented on Dec 16, 2020
ViriatoII commented on Aug 19, 2021
I'm also curious about this. GenomeScope 1 gives estimates in line with the literature for my species, while GenomeScope 2 does not (even after multiplying by 2 to account for haploid vs. diploid).
mschatz commented on Aug 20, 2021
ViriatoII commented on Aug 20, 2021
Hi Mike,
That's very kind of you, thank you.
As an example, D. erucoides is estimated to have a haploid genome size of ~500 Mbp, or 1000 Mbp diploid (Lysák et al., 2009).
GenomeScope 1 predicts a haploid length of 435 Mbp, just short of the literature value, as well as a reasonable heterozygosity of 0.65%:
http://qb.cshl.edu/genomescope/analysis.php?code=izm95ZeGs1WxSvj8bidT
The GenomeScope 2 run (even with a maximum coverage of 100,000) predicts a haploid genome length of 212 Mbp and a surprising maximum estimated heterozygosity of 20%:
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=cKhv8wWNKBHMEgf3ilHd
I am applying the same pipeline to 20 species, some of which are tetraploid. I wish I could apply the same parameters to all of them, or at least treat only the tetraploids differently and run just those through GenomeScope 2.
Appreciate the help,
Ricardo
mschatz commented on Aug 20, 2021