Completely different result between Genomescope v1 and v2 #48

Open
@ptranvan

Hi,

I ran GenomeScope from these two repos:

https://github.com/schatzlab/genomescope
and
https://github.com/tbenavi1/genomescope2.0

and got very different results:

From v1:

https://ibb.co/FXPYPgV

From v2:

https://ibb.co/XC7Y5Nx

What could explain these large differences?

The histogram was generated with:

kmc -k21 -t5 -m64 -ci1 -cs10000 @cat.files cat.kmcdb tmp

kmc_tools transform cat.kmcdb histogram cat.kmcdb_k21.hist -cx10000
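Before comparing the two GenomeScope fits, it can help to look at where the main coverage peak of the histogram actually sits. The following is a minimal sketch, assuming the histogram file has two whitespace-separated columns per line (k-mer multiplicity, then number of distinct k-mers). The function names and the `min_multiplicity` cutoff are hypothetical and chosen only for illustration; they are not part of KMC or GenomeScope.

```python
def parse_histogram(text):
    """Parse two-column histogram text into (multiplicity, n_distinct_kmers) pairs."""
    hist = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        hist.append((int(fields[0]), int(fields[1])))
    return hist


def main_peak(hist, min_multiplicity=5):
    """Return the multiplicity with the most distinct k-mers.

    Multiplicities below min_multiplicity are ignored because that part of
    the histogram is usually dominated by sequencing-error k-mers.
    The cutoff of 5 is an arbitrary assumption and may need adjusting.
    """
    candidates = [(n, m) for m, n in hist if m >= min_multiplicity]
    if not candidates:
        raise ValueError("no histogram rows at or above min_multiplicity")
    # max() compares the k-mer count first, so this picks the tallest bin.
    return max(candidates)[1]


# Usage with the file from the commands above (not executed here):
#   with open("cat.kmcdb_k21.hist") as handle:
#       print(main_peak(parse_histogram(handle.read())))
```

The tallest bin is only a rough guide: in a heterozygous diploid the histogram can have two peaks, and which one is taller depends on the heterozygosity, so the value is best checked against the plot by eye.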

mschatz (Contributor) commented on Dec 14, 2020

ptranvan (Author) commented on Dec 14, 2020

Hello, we don't know much about the genomic architecture of this species, but it should be diploid. I changed "Average k-mer coverage for polyploid genome" to 114 but got the same plot:

GenomeScope version 2.0
input file = user_uploads/va4PdkFoOZtVXygmISbA
output directory = user_data/va4PdkFoOZtVXygmISbA
p = 2
k = 21
initial kmercov estimate = 114

property                 min               max
Homozygous (aa)          0%                100%
Heterozygous (ab)        0%                100%
Genome Haploid Length    251,713,776 bp    251,965,450 bp
Genome Repeat Length     52,929,734 bp     52,982,656 bp
Genome Unique Length     198,784,041 bp    198,982,794 bp
Model Fit                79.394%           92.8582%
Read Error Rate          0.236534%         0.236534%

http://genomescope.org/genomescope2.0/analysis.php?code=va4PdkFoOZtVXygmISbA
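Two things in the report above can be checked mechanically. The sketch below uses the values copied from the table; the helper names and the 99-point threshold are hypothetical. It flags a percentage range that spans essentially the whole 0% to 100% interval, which is read here, as an assumption, as a sign that the fit did not constrain that parameter, and it checks that unique plus repeat length adds up to the haploid length.

```python
def parse_bp(text):
    """Convert a string such as '251,713,776 bp' to an integer number of bases."""
    return int(text.replace(",", "").replace("bp", "").strip())


def parse_pct(text):
    """Convert a string such as '92.8582%' to a float."""
    return float(text.strip().rstrip("%"))


# (min, max) values copied from the GenomeScope 2.0 report in this comment.
report = {
    "Heterozygous (ab)": ("0%", "100%"),
    "Genome Haploid Length": ("251,713,776 bp", "251,965,450 bp"),
    "Genome Repeat Length": ("52,929,734 bp", "52,982,656 bp"),
    "Genome Unique Length": ("198,784,041 bp", "198,982,794 bp"),
}


def is_unconstrained(min_pct, max_pct, span=99.0):
    """True when a percentage range covers almost all of 0-100%.

    The 99-point threshold is an arbitrary assumption for this sketch.
    """
    return parse_pct(max_pct) - parse_pct(min_pct) >= span


def length_mismatch(rep, column):
    """Haploid length minus (unique + repeat), in bp, for column 0 (min) or 1 (max)."""
    haploid = parse_bp(rep["Genome Haploid Length"][column])
    unique = parse_bp(rep["Genome Unique Length"][column])
    repeat = parse_bp(rep["Genome Repeat Length"][column])
    return haploid - (unique + repeat)


het_unconstrained = is_unconstrained(*report["Heterozygous (ab)"])
mismatches = [length_mismatch(report, 0), length_mismatch(report, 1)]
```

With these inputs the length columns agree to within a single base, so the lengths are internally consistent, while the heterozygosity range is flagged as unconstrained.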

mschatz (Contributor) commented on Dec 16, 2020

ViriatoII commented on Aug 19, 2021

I'm also curious about this. GenomeScope 1 gives estimates in line with the literature for my species, while GenomeScope 2 does not (even after multiplying by 2 to account for haploid vs. diploid).

mschatz (Contributor) commented on Aug 20, 2021

ViriatoII commented on Aug 20, 2021

Hi Mike,
That's very kind of you, thank you.
As an example, this D. erucoides is estimated to have a haploid genome size of ~500 Mbp, i.e. 1,000 Mbp diploid (Lysák et al., 2009).

GenomeScope 1 predicts a haploid length of 435 Mbp, just short of the literature value, as well as a reasonable heterozygosity of 0.65%:
http://qb.cshl.edu/genomescope/analysis.php?code=izm95ZeGs1WxSvj8bidT

The GenomeScope 2 run (even with the maximum k-mer coverage set to 100,000) predicts a haploid genome length of 212 Mbp and a surprising maximum estimated heterozygosity of 20%.
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=cKhv8wWNKBHMEgf3ilHd

I am applying the same pipeline to 20 species, some of which are tetraploid. I wish I could just apply the same parameters to all of them, or at least treat only the tetraploids differently and run those through GenomeScope 2.

Appreciate the help,
Ricardo
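The figures quoted in this comment can be put side by side with a few lines of arithmetic. This is only a sketch using the rounded numbers from the comment; the variable and function names are hypothetical. A ratio close to 2 between the two estimates is the pattern one might expect if the two fits disagree about which histogram peak corresponds to haploid coverage, but that reading is an assumption and is not confirmed anywhere in this thread.

```python
# Haploid genome size figures quoted in the comment above, in Mbp.
literature_mbp = 500    # ~500 Mbp, Lysak et al., 2009
genomescope1_mbp = 435  # GenomeScope 1 estimate
genomescope2_mbp = 212  # GenomeScope 2 estimate


def fold_difference(a, b):
    """Return a / b, i.e. how many times larger a is than b."""
    if b == 0:
        raise ZeroDivisionError("reference value must be non-zero")
    return a / b


# How far apart the two tools are from each other.
v1_over_v2 = fold_difference(genomescope1_mbp, genomescope2_mbp)

# How each tool compares with the literature value.
v1_vs_literature = fold_difference(genomescope1_mbp, literature_mbp)
v2_vs_literature = fold_difference(genomescope2_mbp, literature_mbp)

print(f"GenomeScope 1 / GenomeScope 2: {v1_over_v2:.2f}")
print(f"GenomeScope 1 / literature:    {v1_vs_literature:.2f}")
print(f"GenomeScope 2 / literature:    {v2_vs_literature:.2f}")
```

Running the same comparison across all 20 species would show whether the roughly twofold gap is systematic or specific to this sample.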

mschatz (Contributor) commented on Aug 20, 2021