Snakemake pipeline for ichorCNA

Gavin Ha edited this page Aug 1, 2018 · 6 revisions

Snakemake workflow for ichorCNA

Description

This workflow runs the ichorCNA pipeline, starting from BAM files and producing the ichorCNA outputs.

Link to pipeline

ichorCNA snakemake

Requirements

Software packages or libraries

Scripts/executables

  1. readCounter (C++ executable; HMMcopy Suite)
  2. runIchorCNA.R

cfDNA sample list

Define the list of cfDNA samples in a YAML file; see config/samples.yaml for an example. The samples field is required and maps each sample name to the path of its BAM file.

samples:
  tumor_sample_1:  /path/to/bam/tumor.bam
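
As a rough illustration of what "the field samples must be provided" means in practice, the sketch below checks a configuration that has already been parsed into a Python dict. The function name `validate_samples` and the `.bam` suffix check are assumptions made for this example; they are not part of the workflow itself.

```python
from pathlib import PurePosixPath


def validate_samples(config):
    """Return the sample-to-BAM mapping, raising ValueError if it is malformed.

    Hypothetical helper: the workflow itself may validate differently.
    """
    samples = config.get("samples")
    if not isinstance(samples, dict) or not samples:
        raise ValueError("config must define a non-empty 'samples' mapping")
    for name, bam in samples.items():
        # Assumption for this sketch: every entry should point to a .bam file.
        if not isinstance(bam, str) or PurePosixPath(bam).suffix != ".bam":
            raise ValueError(f"sample {name!r} does not point to a .bam file: {bam!r}")
    return samples


# Dict equivalent of the YAML example above, once parsed.
config = {"samples": {"tumor_sample_1": "/path/to/bam/tumor.bam"}}
print(validate_samples(config))
```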

Snakefiles

ichorCNA.snakefile

Invoking the full snakemake workflow for ichorCNA

# show commands and workflow
snakemake -s ichorCNA.snakefile -np
# run the workflow locally using 5 cores
snakemake -s ichorCNA.snakefile --cores 5
# run the workflow on qsub using a maximum of 50 jobs. 
# Broad UGER cluster parameters can be set directly in config/cluster.sh. 
snakemake -s ichorCNA.snakefile --cluster-sync "qsub" -j 50 --jobscript config/cluster.sh

Modifying ichorCNA parameters in config.yaml

For hg38, use config_hg38.yaml, which contains paths to the hg38-specific reference files.
In that file, the chromosome naming style is set to UCSC (e.g. "chr1"). Users can change this setting so that the output uses either UCSC or NCBI style. The input files (tumor and normal wigs, ichorCNA_normalPanel, ichorCNA_gcWig, ichorCNA_mapWig, ichorCNA_centromere and ichorCNA_exons) can be in either style, as can the ichorCNA_chrs and ichorCNA_chrTrain settings in the config file.

ichorCNA_genomeStyle: UCSC # sets output chromosome naming style
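
To make the difference between the two naming styles concrete, here is a minimal sketch that converts a chromosome name to UCSC ("chr1") or NCBI ("1") style. The function `to_style` is hypothetical and only handles the autosomes and X/Y; it is not how ichorCNA performs the conversion internally.

```python
def to_style(chrom, style):
    """Convert a chromosome name to 'UCSC' (chr1) or 'NCBI' (1) style.

    Illustrative only: covers names such as 1..22, X and Y.
    """
    # Strip an existing "chr" prefix so either input style is accepted.
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    if style == "UCSC":
        return "chr" + bare
    if style == "NCBI":
        return bare
    raise ValueError(f"unknown genome style: {style!r}")


for name in ["1", "chr22", "X"]:
    print(name, to_style(name, "UCSC"), to_style(name, "NCBI"))
```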

Set the chromosomes and the bin size to analyze. Adjust the bin size to the sequencing coverage, using larger bins for lower coverage. Currently, 1Mb bins are used for 0.1x coverage.

chrs:
  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
binSize:  1000000 # set window size to compute coverage
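
The link between coverage and bin size can be seen with a back-of-the-envelope calculation of how many reads land in each bin. The sketch below assumes 100 bp reads and uniform coverage; both are assumptions chosen for illustration, not values taken from the workflow.

```python
def expected_reads_per_bin(coverage, bin_size, read_length):
    """Approximate read count per bin, assuming uniform coverage."""
    return coverage * bin_size / read_length


READ_LENGTH = 100  # assumed read length in bp (hypothetical)

# Compare the 1Mb bins from the config with smaller 100kb bins at 0.1x.
for bin_size in (1_000_000, 100_000):
    reads = expected_reads_per_bin(0.1, bin_size, READ_LENGTH)
    print(f"binSize={bin_size}: about {round(reads)} reads per bin")
```

Under these assumptions, shrinking the bin tenfold also cuts the reads per bin tenfold, which makes the per-bin coverage estimate noisier at a fixed sequencing depth.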

Include paths to the main ichorCNA R script and to the normal panel used to help normalize the data. The normal panel is optional, but if included, it should correspond to the same bin size.

# included in GitHub repo
ichorCNA_rscript:  ../runIchorCNA.R
# use panel matching same bin size (optional)
ichorCNA_normalPanel: ../../inst/extdata/HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFiltered_median.rds

The GC and mappability wig files must be provided. These files should correspond to the same bin size.

# must use gc wig file corresponding to same binSize (required)
ichorCNA_gcWig: ../../inst/extdata/gc_hg19_1000kb.wig
# must use map wig file corresponding to same binSize (required)
ichorCNA_mapWig:  ../../inst/extdata/map_hg19_1000kb.wig
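
One way to catch a mismatched wig file before running the workflow is to compare the `step` in its first `fixedStep` declaration with binSize. This is a sketch under the assumption that the GC and mappability files are fixedStep wig files; the helper names are hypothetical.

```python
import re


def wig_step(lines):
    """Return the step of the first fixedStep declaration, or None if absent."""
    for line in lines:
        if line.startswith("fixedStep"):
            match = re.search(r"\bstep=(\d+)", line)
            if match:
                return int(match.group(1))
    return None


def matches_bin_size(lines, bin_size):
    """True if the wig's first declared step equals the configured binSize."""
    return wig_step(lines) == bin_size


# Made-up wig content for illustration; a real check would pass an open file.
example = ["fixedStep chrom=1 start=1 step=1000000 span=1000000", "0.41", "0.39"]
print(matches_bin_size(example, 1000000))
```

For a file on disk, the same functions accept an open handle, e.g. `with open(path) as handle: matches_bin_size(handle, 1000000)`.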

Specify the targeted intervals (e.g. exons) and the centromere file. Both are optional.

# use bed file if sample has targeted regions, e.g. exome data (optional)
ichorCNA_exons:  NULL
ichorCNA_centromere:  ../../inst/extdata/GRCh37.p13_centromere_UCSC-gapTable.txt

Various settings for the ichorCNA model parameters. The normal (non-tumor) fraction setting should include several restart values. For cfDNA, the non-tumor fraction tends to be higher, so including higher values is recommended.

ichorCNA_chrs:  c(1:22, \"X\")
# chrs used for training ichorCNA parameters, e.g. tumor fraction.
ichorCNA_chrTrain:  c(1:22)
# non-tumor fraction parameter restart values; higher values should be included for cfDNA
ichorCNA_normal:  c(0.5,0.6,0.7,0.8,0.9,0.95)
# ploidy parameter restart values
ichorCNA_ploidy:  c(2,3)
ichorCNA_estimateNormal:  TRUE
ichorCNA_estimatePloidy:  TRUE
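
To get a feel for how many initializations these restart values imply, the sketch below enumerates every pairing of the normal and ploidy values listed above. It assumes that each (normal, ploidy) pair is tried as a separate starting point; treat that as an assumption about the fitting procedure rather than a statement of how runIchorCNA.R is implemented.

```python
from itertools import product

# Restart values copied from the config above.
normal = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
ploidy = [2, 3]

# Every (normal, ploidy) pairing, assumed here to be one initialization each.
restarts = list(product(normal, ploidy))

print(len(restarts))  # 12
for n, p in restarts[:3]:
    print(f"normal={n}, ploidy={p}")
```

Adding restart values therefore multiplies the number of starting points, so the lists are best kept to values that are plausible for the sample type.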

scStates refers to the subclonal copy number states. Here, the 1-copy (deletion) and 3-copy (gain) subclonal states are included. If you do not wish to model subclonal events, then use ichorCNA_scStates: c() and ichorCNA_estimateClonality: TRUE.

# states to use for subclonal CN
ichorCNA_scStates:  c(1,3)
ichorCNA_estimateClonality: TRUE

Settings for copy number. When coverage is low (e.g. 0.1x) and a large bin size (e.g. 1Mb) is therefore used, homozygous deletions should not be included (i.e. ichorCNA_includeHOMD: FALSE). For higher coverage data (e.g. >10x), modeling homozygous deletions can be turned on.

# set maximum copy number to use
ichorCNA_maxCN:  5
# TRUE/FALSE to include homozygous deletion state
ichorCNA_includeHOMD: FALSE
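
The effect of these two settings on the set of modeled states can be sketched as follows. The example assumes that the clonal states run from 1 copy up to ichorCNA_maxCN, with the 0-copy state added only when homozygous deletions are included; the function is hypothetical and simplifies whatever ichorCNA does internally.

```python
def clonal_states(max_cn, include_homd):
    """List clonal copy-number states; 0 copies only when HOMD is modeled.

    Assumed layout for illustration, not taken from the ichorCNA source.
    """
    start = 0 if include_homd else 1
    return list(range(start, max_cn + 1))


print(clonal_states(5, False))
print(clonal_states(5, True))
```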

Segmentation settings, which adjust the sensitivity to events and control the number of segments.

# higher (e.g. 0.9999999) leads to higher specificity and fewer segments
# lower (e.g. 0.99) leads to higher sensitivity and more segments
ichorCNA_txnE:  0.9999
# higher (e.g. 10000000) leads to higher specificity and fewer segments
# lower (e.g. 100) leads to higher sensitivity and more segments
ichorCNA_txnStrength:  10000
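
A simplified way to see why a higher ichorCNA_txnE gives fewer segments: if txnE is read as the probability of staying in the same state from one bin to the next, then in a plain Markov chain the expected run length is 1 / (1 - txnE) bins. The sketch below computes this for the values mentioned in the comments above. It is an intuition aid only; it ignores the data likelihood and any re-estimation of transition probabilities during fitting.

```python
def expected_run_bins(txn_e):
    """Expected consecutive bins in one state for a self-transition probability.

    Plain Markov-chain approximation (geometric run length); illustrative only.
    """
    if not 0.0 < txn_e < 1.0:
        raise ValueError("txn_e must be strictly between 0 and 1")
    return 1.0 / (1.0 - txn_e)


for txn_e in (0.99, 0.9999, 0.9999999):
    print(f"txnE={txn_e}: about {round(expected_run_bins(txn_e))} bins per run")
```

Under this approximation, the prior alone favors runs of roughly a hundred bins at 0.99 and of millions of bins at 0.9999999, so at higher values a state change is only called when the data support it strongly.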