How to Interpret matchAnnot Output
TomSkelly edited this page Jan 29, 2015
The output of the matchAnnot.py script contains several types of line, each beginning with a tag:
tag | content |
---|---|
isoform: | A mapped isoform (output of IsoSeq). Line shows the isoform name and the genomic start and end coordinates of the alignment. |
cigar: | The cigar string from the SAM file entry for the isoform. |
cl: | A list of the reads-of-insert which were clustered to create the isoform. This information is printed only if a cluster report file is supplied via the --clusters parameter. Each line lists one or more reads from a single SMRTcell, labelled as either full-length or non-FL. The mapping from SMRTcell number to full SMRTcell name is in the summary at the end of the output. |
polyA: | A list of the positions where polyadenylation motifs were found near the 3' end of the isoform. |
gene: | A gene in the annotation file whose position overlaps the aligned isoform. Line shows gene name, its start and end coordinates, and the differences between those and the isoform start and end. |
tr: | An annotated transcript of the gene under consideration. Line shows transcript name, a score, and the exon-to-exon mapping. Each [] grouping in the exon mapping is a list of transcript exons which match the isoform exons (see example below). The transcript scoring system is described below. |
exon: | Details of a single exon match. Shown only for transcripts with score >= 3. Line shows isoform and transcript start and stop coordinates and the delta between them, plus the number of indels found in the alignment (per the cigar string). |
result: | A one-line summary for the isoform, showing the best gene and transcript found, and the resulting score. |
summary: | Bookkeeping information at the end. |
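The `cigar:` and `exon:` lines are closely related. In a spliced alignment, each `N` operation in the CIGAR string is a skipped reference region (an intron), so the string can be cut into exon blocks on the genome. The sketch below uses a hypothetical helper, `exons_from_cigar` (not part of matchAnnot.py), to show one way of deriving exon coordinates and per-exon indel counts from the alignment start and the CIGAR string. It assumes indels are counted as the number of `I` and `D` operations rather than the number of bases; matchAnnot's own counting may differ.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an op code, as defined in the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Ops that advance along the reference (genome).
REF_CONSUMING = set("MDN=X")


def exons_from_cigar(start: int, cigar: str) -> List[Tuple[int, int, int]]:
    """Split a spliced alignment into exon blocks (hypothetical helper).

    start is the 1-based leftmost genomic position (SAM POS). Returns a list of
    (exon_start, exon_end, indel_count) tuples with inclusive coordinates.
    'N' is treated as an intron; each I or D operation counts as one indel
    (an assumption: matchAnnot may count differently).
    """
    exons = []
    pos = start          # next reference position to be consumed
    exon_start = start
    indels = 0
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            # Close the current exon and jump over the intron.
            exons.append((exon_start, pos - 1, indels))
            pos += length
            exon_start = pos
            indels = 0
            continue
        if op in ("I", "D"):
            indels += 1
        if op in REF_CONSUMING:
            pos += length
    exons.append((exon_start, pos - 1, indels))
    return exons
```

For example, `exons_from_cigar(100, "5S50M2I30M200N40M1D10M")` returns `[(100, 179, 1), (380, 430, 1)]`: the soft-clipped bases and the insertion do not move along the genome, while the 1-base deletion lengthens the second exon by one reference position.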
Each annotated transcript of each gene that maps to an IsoSeq cluster is given a score from 0 to 5. The higher the score, the better the individual exons of the cluster match the exons of the annotation. Scores are as follows:
- 5: IsoSeq exons match annotation exons one-for-one. Sizes agree except for leading and trailing edges.
- 4: Like 5, but the leading and trailing edges differ by a larger amount than those of the score-5 transcript found for this gene.
- 3: One-for-one exon match, but sizes of internal exons disagree.
- 2: The best match among all score=1 transcripts.
- 1: Some exons overlap, and the overlaps are one-for-one where they exist.
- 0: Everything else: the isoform overlaps the gene, but there is little or no exon congruence.
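The score ladder can be approximated in code. This is a simplified sketch under stated assumptions, not matchAnnot's actual algorithm: exons are `(start, end)` genomic intervals sorted by position; "sizes agree except for leading and trailing edges" is taken to mean all internal exon boundaries match within a hypothetical tolerance `tol`; and the "best match" among score-1 transcripts is taken to be the one with the most overlapping exon pairs. Scores 4 and 2 depend on the other transcripts of the same gene, so the hypothetical `score_gene` assigns them in a second pass over the per-transcript results from `raw_score`.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def raw_score(iso: List[Exon], tr: List[Exon], tol: int = 0) -> Tuple[int, int]:
    """Per-transcript score before gene-level ranking (hypothetical).

    Returns (score, key). For a 5 the key is the total leading plus trailing
    edge difference; for a 1 it is the number of overlapping exon pairs.
    """
    pairs = [(i, j) for i, a in enumerate(iso)
             for j, b in enumerate(tr) if overlaps(a, b)]
    iso_hits = [i for i, _ in pairs]
    tr_hits = [j for _, j in pairs]
    # Overlaps must be one-for-one: no exon may overlap two exons on the other side.
    if not pairs or len(set(iso_hits)) < len(pairs) or len(set(tr_hits)) < len(pairs):
        return 0, 0
    n = len(iso)
    if n == len(tr) and len(pairs) == n and all(i == j for i, j in pairs):
        # Every exon pairs with its counterpart; ignore only the outer edges.
        internal_ok = all(
            (k == 0 or abs(iso[k][0] - tr[k][0]) <= tol)
            and (k == n - 1 or abs(iso[k][1] - tr[k][1]) <= tol)
            for k in range(n)
        )
        if internal_ok:
            return 5, abs(iso[0][0] - tr[0][0]) + abs(iso[-1][1] - tr[-1][1])
        return 3, 0
    return 1, len(pairs)


def score_gene(iso: List[Exon], transcripts: Dict[str, List[Exon]],
               tol: int = 0) -> Dict[str, int]:
    """Assign final 0-5 scores to all transcripts of one gene (hypothetical)."""
    raw = {name: raw_score(iso, exons, tol) for name, exons in transcripts.items()}
    final = {name: score for name, (score, _) in raw.items()}
    fives = [name for name in raw if raw[name][0] == 5]
    if fives:
        best_edge = min(raw[name][1] for name in fives)
        for name in fives:
            if raw[name][1] > best_edge:  # edges differ more than the best 5
                final[name] = 4
    ones = [name for name in raw if raw[name][0] == 1]
    if ones:
        # Ties go to the first transcript encountered.
        final[max(ones, key=lambda name: raw[name][1])] = 2
    return final
```

For an isoform with exons `[(100, 200), (300, 400), (500, 600)]`, an exact match scores 5, a transcript whose outer edges extend 10 bases further on each side drops to 4, and one whose middle exon ends at 410 instead of 400 scores 3. Transcripts whose edge difference ties the best one all keep 5, following the "larger amount" wording of rule 4.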