GenomeScope2 dramatically underestimates genome size and overestimates heterozygosity #43

Open
@agroppi

I ran GenomeScope2 online with the default parameters.
My starting material is 16 FASTQ files from 16 SMRT Cells (PacBio RS II).
I ran the following commands:

jellyfish count -C -m 21 -s 12000000000 -t 20 ./*.fastq -o myreads.jf
jellyfish histo -t 20 myreads.jf > myreads.histo

The expected genome size is around 220 Mb, and the genome is highly homozygous.

But GenomeScope2 reports:
Property                 min              max
Genome Haploid Length    16,296,648 bp    16,748,017 bp
Homozygous (aa)          95.0821%         95.7105%
Heterozygous (ab)        4.28954%         4.91792%

Full results are here: http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=RAe5oDd95jhgUdK9zZ8f

How is this possible?

Thanks for your help.
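
One possible contributing factor, sketched under stated assumptions: uncorrected long reads carry a high per-base error rate, so most 21-mers drawn from them contain at least one error and the k-mer histogram is dominated by erroneous k-mers. If errors are treated as independent, the fraction of error-free k-mers is (1 - e)^k. The helper name `error_free_kmer_fraction` and both error rates below are illustrative assumptions, not values measured on this dataset.

```shell
# Fraction of k-mers expected to contain no sequencing error, assuming
# independent per-base errors: (1 - e)^k.
# $1 = per-base error rate (assumed), $2 = k-mer length (defaults to 21).
error_free_kmer_fraction() {
  awk -v e="$1" -v k="${2:-21}" 'BEGIN { printf "%.3f\n", (1 - e) ^ k }'
}

error_free_kmer_fraction 0.15   # assumed error rate for raw reads
error_free_kmer_fraction 0.01   # assumed error rate for corrected reads
```

Under these assumed rates, only a few percent of raw-read 21-mers would be error-free, against a large majority for corrected reads, which would be consistent with a model fit that locks onto the wrong coverage peak.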

mschatz (Contributor) commented on Sep 9, 2020

agroppi (Author) commented on Sep 10, 2020

Thanks for your answer.
I dug through the output of the Falcon pipeline I used and found the corrected reads in FASTA format:
my_falcon_directory/1-preads_ovl/db2falcon/preads4falcon.fasta
Running the analysis on these reads worked perfectly:

Property                 min               max
Heterozygous (ab)        0.211677%         0.250628%
Genome Haploid Length    235,933,558 bp    236,368,387 bp
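
The rerun on corrected reads can be sketched roughly as follows. This is a minimal sketch, not the exact commands used: the function name `kmer_histo`, the output prefix, and the initial hash size are assumptions, while the jellyfish flags mirror those in the original post.

```shell
# Count canonical k-mers in a reads file and write the histogram that
# GenomeScope2 takes as input.
# $1 = reads (FASTA or FASTQ), $2 = output prefix,
# $3 = threads (default 20), $4 = k-mer length (default 21).
kmer_histo() {
  local reads="$1"
  local prefix="$2"
  local threads="${3:-20}"
  local k="${4:-21}"

  # -C counts canonical k-mers; -s is the initial hash size (assumed value).
  jellyfish count -C -m "$k" -s 1000000000 -t "$threads" "$reads" -o "$prefix.jf" || return 1

  # Export the k-mer frequency histogram.
  jellyfish histo -t "$threads" "$prefix.jf" > "$prefix.histo"
}
```

For example, `kmer_histo my_falcon_directory/1-preads_ovl/db2falcon/preads4falcon.fasta preads` would write `preads.histo`, which can then be uploaded to GenomeScope2.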

mschatz (Contributor) commented on Sep 10, 2020
Issue #43 · schatzlab/genomescope