Easter Release - local alignments for single-cell and scNMT-seq
Expanding on our observation that single-cell BS-seq libraries, or PBAT libraries in general, can generate chimeric read pairs, a recent publication by Wu et al. showed in further detail that intra-fragment chimeras can hinder the efficient alignment of single-cell BS-seq libraries. In it, the authors described a pipeline that first performs paired-end alignments, followed by a second, single-end alignment step that uses local alignments in a bid to improve the mapping of intra-molecular chimeras. To allow this type of improvement for single-cell or PBAT libraries, we have been experimenting with local alignments.
Please note that we still do not recommend using local alignments as a means to magically increase mapping efficiencies (please see here), but we do acknowledge that PBAT/scBS-seq/scNMT-seq are exceptional applications where local alignments might indeed make a difference (there is only so much data to be had from a single cell...).
We haven't yet had the time to set more appropriate or stringent default values for local alignments (suggestions welcome), nor have we investigated whether the methylation extraction will require an additional --ignore flag if a read was found to be soft-clipped (the so-called 'micro-homology domains'). This might be added in the near future.
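To get a feel for whether such an --ignore-style adjustment might be needed, one could first look at how many bases are soft-clipped at either end of each alignment. The minimal Python sketch below does that from a SAM CIGAR string; the helper names parse_cigar and soft_clip_lengths are hypothetical and not part of Bismark. Because SAM reports the CIGAR in reference orientation, the trailing clip corresponds to the read's 5' end for reads aligned to the reverse strand.

```python
import re

# One CIGAR element: a length followed by an operation letter, e.g. "5S" or "70M"
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as "5S70M" into [(5, "S"), (70, "M")]."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string.

    Hard clips (H) may sit outside soft clips, so they are dropped before
    looking for an S operation at either end of the alignment.
    """
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing
```

For example, soft_clip_lengths("7S60M3S") returns (7, 3), while a fully end-to-end alignment such as "75M" returns (0, 0).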
Bismark
- Added support for local alignments by introducing the new option --local. This means that the CIGAR operation S (soft-clipping) is now supported.
- Fixed typo in option --path_to_bowtie2 (a single missing 2 was preventing the specified path from being accepted).
- Fixed typo in option --no-spliced-alignment in HISAT2 mode.
- Fixed missing end-of-line character for unmapped or ambiguous FastQ sequences in paired-end FastQ mode.
- Fixed output file naming in --hisat2 and --parallel mode (_hisat2 was missing in --parallel mode). Thanks to @phue for spotting this.
bismark_genome_preparation
- Added option --large-index to force the generation of LARGE genome indexes. This may be required for indexing extremely large genomes (e.g. the axolotl, 32 gigabases) in --parallel mode. For more information on why the indexing was failing previously, see here.
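Bowtie 2 builds a "small" (32-bit) index for references shorter than about 4 billion bases and a "large" (64-bit) index beyond that. For a rough sense of whether a genome is in large-index territory at all, the hypothetical Python helper below (not part of Bismark) sums the sequence length of one or more FASTA files and compares it with a 2^32 cut-off; that cut-off is an approximation, not the exact value bowtie2-build applies.

```python
import gzip

# Approximate size above which Bowtie 2 switches to 64-bit ("large") indexes
LARGE_INDEX_THRESHOLD = 2 ** 32


def genome_length(fasta_paths):
    """Sum the number of sequence characters across (optionally gzipped) FASTA files."""
    total = 0
    for path in fasta_paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for line in handle:
                # Header lines start with ">"; everything else is sequence
                if not line.startswith(">"):
                    total += len(line.strip())
    return total


def needs_large_index(fasta_paths):
    """Return True if the combined genome length reaches the approximate threshold."""
    return genome_length(fasta_paths) >= LARGE_INDEX_THRESHOLD
```

Note that Bismark indexes two bisulfite-converted copies of the genome, each as long as the original, so the length of the original genome is the relevant figure here.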
bismark_methylation_extractor
- Now supporting reads containing soft-clipped bases (CIGAR operation S)
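Supporting soft-clipped reads mainly means that clipped bases must not be assigned a genomic position: they are present in the read sequence but do not consume reference coordinates. The Python sketch below illustrates that bookkeeping for a single alignment. It assumes a per-base call string with one character per base in the SAM SEQ field (soft-clipped bases included), using Bismark-style context letters such as z/Z for CpG; the function name reference_calls is hypothetical and this is not the extractor's actual code.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_calls(pos, cigar, calls):
    """Yield (reference_position, call) for every aligned base of a read.

    pos   -- 1-based leftmost aligned reference position (SAM POS field)
    cigar -- CIGAR string, e.g. "3S5M"
    calls -- per-base call string, one character per base in SEQ
             (assumption: soft-clipped bases are included, as in SEQ)
    """
    ref = pos
    i = 0  # index into calls (and SEQ)
    for length, op in ((int(n), o) for n, o in CIGAR_OP.findall(cigar)):
        if op in "M=X":
            # Aligned bases consume both the read and the reference
            for k in range(length):
                yield ref + k, calls[i + k]
            ref += length
            i += length
        elif op in "SI":
            # Soft clips and insertions consume the read only: no genomic position
            i += length
        elif op in "DN":
            # Deletions and skipped regions consume the reference only
            ref += length
        # H and P consume neither the read nor the reference
    if i != len(calls):
        raise ValueError("call string length does not match CIGAR")
```

For example, with pos=100, cigar "3S5M" and calls "zZ..Z.xh", keeping only calls other than "." yields [(101, 'Z'), (103, 'x'), (104, 'h')]; the two calls inside the soft-clipped region are dropped.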
bam2nuc
- Now supporting reads containing soft-clipped bases (CIGAR operation S)
deduplicate_bismark
- Now supporting reads containing soft-clipped bases (CIGAR operation S)
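For deduplication, soft-clipping matters mostly for reads aligned to the reverse strand: the SAM POS field already points at the leftmost aligned base, but the read's 5' end has to be derived from the CIGAR, and soft-clipped bases must not be counted towards it. The sketch below builds a hypothetical single-end duplicate key along those lines; it is an illustration under these assumptions, not necessarily how deduplicate_bismark computes positions.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Reference bases covered by an alignment: M, =, X, D and N count; S, I, H, P do not."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XDN")


def dedup_key(chrom, pos, cigar, reverse):
    """Hypothetical single-end duplicate key: (chromosome, strand, 5' reference position).

    pos is the SAM POS field, i.e. the leftmost *aligned* base, so a leading
    soft clip is already excluded from it. For reverse-strand reads the 5' end
    is the rightmost aligned base, which is derived from the CIGAR.
    """
    if reverse:
        return chrom, "-", pos + reference_span(cigar) - 1
    return chrom, "+", pos
```

For example, dedup_key("chr1", 100, "5S70M", True) gives ("chr1", "-", 169): the five clipped bases do not shift the computed 5' end.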