Easter Release - local alignments for single-cell and scNMT-seq

@FelixKrueger FelixKrueger released this 16 Apr 15:42

Expanding on our observation that single-cell BS-seq, or PBAT libraries in general, can generate chimeric read pairs, a recent publication by Wu et al. described in further detail how intra-fragment chimeras can hinder the efficient alignment of single-cell BS-seq libraries. The authors described a pipeline that runs paired-end alignments first, followed by a second, single-end alignment step that uses local alignments in a bid to improve the mapping of intra-molecular chimeras. To allow this type of improvement for single-cell or PBAT libraries, we have been experimenting with allowing local alignments.

Please note that we still do not recommend using local alignments as a means to magically increase mapping efficiencies (please see here), but we do acknowledge that PBAT/scBS-seq/scNMT-seq are exceptional applications where local alignments might indeed make a difference (there is only so much data to be had from a single cell...).
We have not yet had the time to set more appropriate or more stringent default values for local alignments (suggestions welcome), nor have we investigated whether the methylation extraction will require an additional --ignore flag when a read is found to be soft-clipped (the so-called 'micro-homology domains'). This might be added in the near future.

Bismark

  • Added support for local alignments by introducing the new option --local. This means that the CIGAR operation S (soft-clipping) is now supported

  • fixed typo in option --path_to_bowtie2 (a single missing 2 was preventing the specified path from being accepted)

  • fixed typo in option --no-spliced-alignment in HISAT2 mode

  • fixed missing end-of-line character for unmapped or ambiguous FastQ sequences in paired-end FastQ mode

  • fixed output file naming in --hisat2 and --parallel mode (_hisat2 was missing in --parallel mode). Thanks to @phue for spotting this.

bismark_genome_preparation

  • Added option --large-index to force the generation of LARGE genome indexes. This may be required for indexing extremely large genomes (e.g. the axolotl, at 32 gigabases) in --parallel mode. For more information on why the indexing was failing previously see here
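
As a rough illustration of when a large index comes into play: the Bowtie 2 documentation states that genomes of less than about 4 billion nucleotides get a 'small' index built with 32-bit numbers, and longer genomes get a 'large' index built with 64-bit numbers. The sketch below counts the bases in a FASTA file and compares the total against that limit. It is not part of Bismark; the function names and the exact cutoff of 2**32 - 1 are assumptions chosen for illustration.

```python
import io

# Assumption: the Bowtie 2 manual only says "about 4 billion nucleotides";
# 2**32 - 1 is used here as a stand-in for that 32-bit limit.
SMALL_INDEX_LIMIT = 2**32 - 1


def total_bases(fasta_handle):
    """Sum the sequence lengths of all records in an open FASTA file."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and header lines; everything else is sequence.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def needs_large_index(n_bases, limit=SMALL_INDEX_LIMIT):
    """Return True if a genome of n_bases exceeds the assumed 32-bit limit."""
    return n_bases > limit


# Tiny in-memory FASTA standing in for a real genome file.
demo = io.StringIO(">chr1\nACGT\nAC\n>chr2\nGG\n")
demo_total = total_bases(demo)
```

For a real genome one would pass an open file handle instead of the in-memory example. A genome of 32 gigabases is far above the limit, so `needs_large_index(32_000_000_000)` returns `True`.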

bismark_methylation_extractor

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)

bam2nuc

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)

deduplicate_bismark

  • Now supporting reads containing soft-clipped bases (CIGAR operation S)
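
For readers unfamiliar with soft-clipping, the following is a minimal sketch of what supporting the CIGAR operation S involves for any tool that maps read positions back to the reference. Per the SAM specification, soft-clipped bases are present in the read sequence but are not aligned, and the reported position refers to the first aligned base. This is not Bismark's implementation; the function names are hypothetical.

```python
import re

# Which CIGAR operations consume read (query) bases or reference bases,
# following the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than valid tokens.
    if not tokens or "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in tokens]


def soft_clipped(cigar):
    """Return the number of soft-clipped bases at the (left, right) read ends."""
    # Hard clips (H) may sit outside soft clips; they are not in the read.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def aligned_pairs(pos, cigar):
    """Yield (read_index, reference_position) for aligned bases only.

    pos is the 1-based leftmost mapping position from the SAM record;
    read_index is 0-based and counts soft-clipped bases, which are skipped.
    """
    read_i = 0
    ref_pos = pos
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            for k in range(length):
                yield (read_i + k, ref_pos + k)
        if op in CONSUMES_QUERY:
            read_i += length
        if op in CONSUMES_REFERENCE:
            ref_pos += length


pairs = list(aligned_pairs(100, "3S5M2S"))
clips = soft_clipped("3S5M2S")
```

In this example the first three and last two read bases are soft-clipped, so only read indices 3 to 7 are paired with reference positions 100 to 104. Ignoring the clipped bases in this way is the kind of handling a downstream step needs before it can assign calls to reference coordinates.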