asmariyaz23 edited this page Jul 18, 2018 · 5 revisions

DISCASM: Discordant and Unmapped Read De novo Transcriptome Assembly

DISCASM is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). DISCASM extracts reads that map to the reference genome in a discordant fashion, optionally includes reads that do not map to the genome at all, and performs a de novo transcriptome assembly of these reads. DISCASM relies on the output from STAR (as run via STAR-Fusion), and supports de novo transcriptome assembly using Trinity or Oases.

4 ways to get started:

  1. git clone https://github.com/DISCASM/DISCASM.git
  2. Conda installation
  3. Galaxy toolshed installation of the tool
  4. Run this tool via our Trinity CTAT Galaxy Portal.

Running DISCASM

To run DISCASM, you must have the following tools installed:

and optionally

Run STAR-Fusion to align your RNA-Seq data to a reference genome. If you would rather run STAR directly yourself instead of using STAR-Fusion, see the STAR-Fusion documentation for an example STAR command that generates the required outputs.

The outputs generated by STAR should include two files:

  • Aligned.sortedByCoord.out.bam: the read alignments in BAM format.
  • Chimeric.out.junction: a listing of the discordantly mapped reads.
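Before launching an assembly, it can help to confirm that both files are present and non-empty. The following is a minimal sketch, not part of DISCASM itself: the function name check_star_outputs is hypothetical, and it assumes both files sit in a single STAR output directory passed as the first argument (defaulting to the current directory).

```shell
# Hypothetical helper (not part of DISCASM): verify that the two STAR
# outputs named above exist and are non-empty in the given directory.
check_star_outputs() {
    star_dir="${1:-.}"   # assumed location of the STAR outputs
    status=0
    for f in Aligned.sortedByCoord.out.bam Chimeric.out.junction; do
        if [ ! -s "$star_dir/$f" ]; then
            echo "missing or empty: $star_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_star_outputs path/to/star_outdir returns 0 when both files are found and 1 otherwise, so it can guard the DISCASM call in a wrapper script.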

Given these two files and your original RNA-Seq reads, you can run DISCASM like so to assemble discordant and unmapped reads:

DISCASM --chimeric_junctions Chimeric.out.junction \
         --aligned_bam Aligned.sortedByCoord.out.bam \
         --left_fq reads_1.fq.gz --right_fq reads_2.fq.gz \
         --denovo_assembler Trinity \
         --out_dir DISCASM_outdir

Options for the --denovo_assembler parameter include Trinity, Oases, or OasesMultiK.

The OasesMultiK option runs Oases at k-mer values of 19, 23, 27, 31, and 35 and then merges the separate assemblies into a single assembly (e.g., as done by JAFFA).

To run DISCASM on just the discordant reads, omit the --aligned_bam parameter; only those reads identified in the Chimeric.out.junction file will then be assembled.
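As an illustration, a discordant-only run could look like the sketch below: it is the full command shown earlier with --aligned_bam left out. The wrapper function run_discordant_only, the RUN dry-run variable, and the default output directory name DISCASM_discordant_outdir are assumptions made for this example, not part of DISCASM.

```shell
# Hypothetical wrapper: assemble only the discordant reads by leaving out
# --aligned_bam. Input file names follow the full example above.
run_discordant_only() {
    out_dir="${1:-DISCASM_discordant_outdir}"   # assumed output directory name
    # With RUN=echo the command is printed instead of executed (dry run).
    ${RUN:-} DISCASM --chimeric_junctions Chimeric.out.junction \
             --left_fq reads_1.fq.gz --right_fq reads_2.fq.gz \
             --denovo_assembler Trinity \
             --out_dir "$out_dir"
}
```

Setting RUN=echo before calling run_discordant_only prints the command for inspection; leaving RUN unset executes DISCASM directly.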

Fusion Transcript Discovery using DISCASM Assembled Transcripts

To identify candidate fusion transcripts (such as in cancer), you can use GMAP-fusion.

User support

Contact us on our Google group: https://groups.google.com/forum/#!forum/trinity_ctat_users