gatk4-data-processing

Purpose :

Workflows for processing high-throughput sequencing data for variant discovery with GATK4 and related tools.

processing-for-variant-discovery-gatk4 :

The processing-for-variant-discovery-gatk4 WDL pipeline implements data pre-processing according to the GATK Best Practices. The workflow takes as input an unmapped BAM list file (a text file containing paths to unmapped BAM files) and performs pre-processing tasks such as mapping, marking duplicates, and base recalibration. It produces a single BAM file and its index, suitable for variant discovery analysis with tools such as HaplotypeCaller.
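
As an illustration of the list-file input described above, here is a minimal Python sketch that collects uBAM files from a local directory and writes one path per line. The function name `write_ubam_list`, the directory layout, and the output filename are hypothetical and not part of this repository; only the ".unmapped.bam" suffix comes from the requirements listed in this README. For cloud runs the entries would presumably be bucket URIs rather than local paths, which this sketch does not handle.

```python
from pathlib import Path


def write_ubam_list(ubam_dir, list_path, suffix=".unmapped.bam"):
    """Write one absolute uBAM path per line and return the paths written.

    Hypothetical helper: it only assembles the text file, it does not
    inspect or validate the BAM contents.
    """
    # Collect files whose names end with the shared suffix, in a stable order.
    ubams = sorted(str(p.resolve()) for p in Path(ubam_dir).glob(f"*{suffix}"))
    if not ubams:
        raise FileNotFoundError(f"no files ending in {suffix} under {ubam_dir}")
    # One path per line, with a trailing newline.
    Path(list_path).write_text("\n".join(ubams) + "\n")
    return ubams
```

Calling `write_ubam_list("ubams/", "sample.ubams.list")` would produce a text file whose path can then be supplied as the workflow's uBAM list input.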

If you are starting with FASTQ files, visit the seq-format-conversion repository for workflows that convert FASTQs to unmapped BAMs.
The processing-for-variant-discovery-gatk4 workflow provides quick and general processing for sequence data using the latest releases of GATK. If you are interested in a more elaborate version of this workflow that includes quality control tasks and is routinely tested for validity (useful in production environments), visit the gatk4-genome-processing-pipeline repository.
The BAM output from processing-for-variant-discovery-gatk4 can be used for a variety of other analyses, such as somatic short variant discovery, germline short variant discovery, or germline copy number variant discovery. Visit the GATK Best Practices documentation to determine what to do next with the BAM files.

Requirements/expectations:

Paired-end sequencing data in unmapped BAM (uBAM) format
One or more read groups, one per uBAM file, all belonging to a single sample (SM)
Input uBAM files must additionally comply with the following requirements:
- filenames all have the same suffix (we use ".unmapped.bam")
- files must pass validation by ValidateSamFile
- reads are provided in query-sorted (queryname) order
- all reads must have an RG tag
Reference index files must be in the same directory as the reference FASTA itself (e.g. reference.fasta.fai in the same directory as reference.fasta)
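
Two of the requirements above (a shared filename suffix and a reference index sitting next to the FASTA) can be checked before launching a run. The following Python sketch shows one way to do that; `check_inputs` is a hypothetical helper, not something shipped with this workflow. It does not replace ValidateSamFile and does not inspect sort order or RG tags, which need a tool that actually reads the BAM files.

```python
from pathlib import Path


def check_inputs(ubam_paths, reference_fasta, suffix=".unmapped.bam"):
    """Return a list of human-readable problems; an empty list means both checks passed.

    Hypothetical pre-flight check covering filename-level requirements only.
    """
    problems = []
    # Every uBAM filename is expected to end with the same suffix.
    for p in ubam_paths:
        if not str(p).endswith(suffix):
            problems.append(f"{p}: filename does not end in {suffix}")
    # The .fai index is expected next to the FASTA, e.g. reference.fasta.fai.
    fai = Path(str(reference_fasta) + ".fai")
    if not fai.is_file():
        problems.append(f"{fai}: reference index not found next to the FASTA")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report everything that needs fixing in a single pass.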

Outputs:

A clean BAM file and its index, suitable for variant discovery analyses.

Software version requirements :

GATK 4 or later
BWA 0.7.15-r1140
Picard 2.16.0-SNAPSHOT
Samtools 1.3.1 (using htslib 1.3.1)
Python 2.7
Cromwell version support
- Successfully tested on v59

Important Notes :

Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
For help running workflows on the Google Cloud Platform or locally, please see the tutorial "(How to) Execute Workflows from the gatk-workflows Git Organization".
Please visit the User Guide site for further documentation on our workflows and tools.
Relevant reference and resource bundles can be accessed in the Resource Bundle.

Contact Us :

The following material is provided by the Data Science Platform group at the Broad Institute. Please direct any questions or concerns to one of our forum sites: GATK or Terra.

LICENSING :

Copyright Broad Institute, 2021 | BSD-3
This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.

