findGSE

findGSE is a tool, written in R (code), for estimating the size of heterozygous diploid or homozygous genomes by iteratively fitting k-mer frequencies with a skew normal distribution model. The current version works on Linux and Mac OS X with R version 3.3.1 or above.

To use findGSE, provide a k value and the corresponding k-mer histogram (.histo) file generated from short reads. The file contains two tab-separated columns: the first gives the frequency at which k-mers occur in the reads, and the second gives the number of distinct k-mers occurring at that frequency (example).
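To make the two-column layout concrete, here is a minimal sketch that writes a toy histogram and checks its shape. The file name `toy.histo` and its values are invented for illustration, and the check uses awk's default whitespace splitting, so it accepts tabs or spaces between the columns.

```shell
# Toy histogram with made-up values:
# column 1 = k-mer frequency, column 2 = number of distinct k-mers at that frequency.
printf '1\t5000\n2\t300\n3\t40\n' > toy.histo

# Count lines that are not exactly two non-negative integer fields.
bad_lines=$(awk 'NF != 2 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ { n++ } END { print n + 0 }' toy.histo)
echo "malformed lines: $bad_lines"
```

A non-zero count suggests the file is not a plain two-column histogram and is worth inspecting before it is passed to findGSE.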

Given multiple fastq.gz files, here is a two-step example for counting k-mers with jellyfish:

  zcat *.fastq.gz | jellyfish count /dev/fd/0 -C -o test_21mer -m 21 -t 1 -s 5G
  jellyfish histo -h 3000000 -o test_21mer.histo test_21mer
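Before moving on, it can help to sanity-check the histogram. The sketch below is not part of findGSE; it only sums the two columns of a toy file (`toy.histo`, with made-up values) to report the number of distinct k-mers and the total number of k-mer occurrences. For real data, the same awk command could be pointed at `test_21mer.histo`.

```shell
# Toy histogram with made-up values (frequency, number of distinct k-mers).
printf '1\t5000\n2\t300\n3\t40\n' > toy.histo

# distinct = sum of column 2; total = sum of (frequency * number of distinct k-mers).
summary=$(awk '{ distinct += $2; total += $1 * $2 } END { printf "distinct=%d total=%d", distinct, total }' toy.histo)
echo "$summary"
```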

After generating the .histo file, and assuming findGSE has been installed (INSTALL), run the following in R to estimate the genome size:

  library("findGSE")
  findGSE(histo="test_21mer.histo", sizek=21, outdir="hom_test_21mer")

The result is printed in a form such as "Genome size estimate for test_21mer.histo: 1498918 bp." For more details about the estimation, check the .txt and .pdf files in the output directory.
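If the printed line has been saved from the R console, the numeric estimate can be pulled out for downstream use. This is a small sketch that assumes the line follows exactly the format quoted above; the variable names are illustrative.

```shell
# Sample line copied from the text above; in practice it would come from saved R console output.
line='Genome size estimate for test_21mer.histo: 1498918 bp.'

# Extract the integer that precedes " bp." and convert it to megabases.
size_bp=$(printf '%s\n' "$line" | sed -E 's/.*: ([0-9]+) bp\..*/\1/')
size_mb=$(awk -v bp="$size_bp" 'BEGIN { printf "%.2f", bp / 1000000 }')
echo "$size_bp bp (~$size_mb Mb)"
```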

Two detailed toy examples of genome size estimation, one for heterozygous and one for homozygous genomes, are provided for experimentation.

schneebergerlab/findGSE