Error: Cannot find first coding exon for transcript when building database #230

Closed
@tshalev

Hello,

I am trying to build a database for a tree species (western redcedar). I have a draft genome and some annotations in GFF3 format. When I try to build the database, I get the following error:

Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ....................................................................................................
40000 ....................................................................................................
50000 ............................................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
....java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
29184128:-672-2175, strand: -, id:PAC4GC:47054313, bioType:protein_coding, Protein
5'UTR : 29184128 2067-2175 UTR_5_PRIME 'PAC4GC:47054313.five_prime_UTR.1'
Exons:
29184128:-672--546 'PAC4GC:47054313.exon.2', rank: 3, frame: 2, sequence: cttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
29184128:-200--7 'PAC4GC:47054313.exon.1', rank: 2, frame: ., sequence: tactagtgtaaccctcataatttgcaggctcttctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatg
29184128:37-112 'PAC4GC:47054313.exon.3', rank: 1, frame: 1, sequence: aaaattatcaagcgtggggcttaagggagctctctcaaataaaattggttctctgacagcacttcatactctgtaa
CDS : ctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatgcttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
Protein : LFLQFPLLLFELLTYFGMTVQIEYEDMFWWVMLDFSFHGFPLLWSHKQRCFYPESDELAVGKYSPNKLEQWYRSL*LSLGWGELHKWPHNVT

at org.snpeff.interval.Transcript.getFirstCodingExon(Transcript.java:1136)
at org.snpeff.interval.Transcript.frameCorrectionFirstCodingExon(Transcript.java:909)
at org.snpeff.interval.Transcript.frameCorrection(Transcript.java:878)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.frameCorrection(SnpEffPredictorFactory.java:596)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:545)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)

java.lang.RuntimeException: Error reading file '/mnt/e/tal/Documents/UBC/GSAT/PhD/WRC/GS/wrc/snps/S_lines/filtering_for_pop_gen/new_analysis/snpEff/./data/tpli_3.1/genes.gff'
java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
29184128:-672-2175, strand: -, id:PAC4GC:47054313, bioType:protein_coding, Protein
5'UTR : 29184128 2067-2175 UTR_5_PRIME 'PAC4GC:47054313.five_prime_UTR.1'
Exons:
29184128:-672--546 'PAC4GC:47054313.exon.2', rank: 3, frame: 2, sequence: cttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
29184128:-200--7 'PAC4GC:47054313.exon.1', rank: 2, frame: ., sequence: tactagtgtaaccctcataatttgcaggctcttctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatg
29184128:37-112 'PAC4GC:47054313.exon.3', rank: 1, frame: 1, sequence: aaaattatcaagcgtggggcttaagggagctctctcaaataaaattggttctctgacagcacttcatactctgtaa
CDS : ctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatgcttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
Protein : LFLQFPLLLFELLTYFGMTVQIEYEDMFWWVMLDFSFHGFPLLWSHKQRCFYPESDELAVGKYSPNKLEQWYRSL*LSLGWGELHKWPHNVT

at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)

00:22:17 Logging
00:22:18 Checking for updates...

When I delete the offending transcript from the GFF file, the build just fails on another one. For reference, these are the GFF entries for this transcript:

##gff-version 3
##annot-version v3.1
##species Thuja plicata
29184128 JGI_gene mRNA 38 2176 . - . ID=PAC4GC:47054313;Name=Thpliv31003279m;longest=1;Parent=Thpliv31003279m.g
29184128 JGI_gene exon 1983 2176 . - . ID=PAC4GC:47054313.exon.1;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 1983 2067 . - 0 ID=PAC4GC:47054313.CDS.1;Parent=PAC4GC:47054313
29184128 JGI_gene five_prime_UTR 2068 2176 . - . ID=PAC4GC:47054313.five_prime_UTR.1;Parent=PAC4GC:47054313
29184128 JGI_gene exon 1511 1637 . - . ID=PAC4GC:47054313.exon.2;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 1511 1637 . - 2 ID=PAC4GC:47054313.CDS.2;Parent=PAC4GC:47054313
29184128 JGI_gene exon 38 113 . - . ID=PAC4GC:47054313.exon.3;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 38 113 . - 1 ID=PAC4GC:47054313.CDS.3;Parent=PAC4GC:47054313

Sorry if this is kind of messy, I couldn't figure out how to make the table look better here.

VenithaB commented on Aug 19, 2019

Hi! I'm getting the same error!

Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ............................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
NIGP01000374:-3367-38263, strand: -, id:AAEL023102-RA
5'UTR : NIGP01000374 38195-38263 UTR_5_PRIME 'UTR5_NIGP01000374_38196_38264'
Exons: NIGP01000374:-3367--3191 'EXON_NIGP01000374_38088_38264', rank: 2, frame: .,sequence: tcgcctacaatgctcaactagaaacaattactctaaggcgaaatccatctcacgttccaacctacgaaaatgcaattgaatggcacggtaacgatggctgcctcatctgaaccacccgagcctccacctcgcaatccggacaagatcaatgcatcactcaagcagctagccgaatcg

NIGP01000374:11027-11653 'EXON_NIGP01000374_11028_11654', rank: 1, frame: 0, sequence: aaaacccgttcgctggatacggccaccgataagacaaccgctccggccaccggtgcccgaccattccggcctatcctgtcgctggacaatgcaaagccattaacgaagccattcgaatcatctggaacgcccacgtcggcaccagcctcgtcgtttgccaacagtaacagtaacaacaataacaatggcagcagtcacaacagcagcatggaatcgaattcgaccagcacaaccgggggtccaaactcgggcaccggaaccagtggaagcagcatcagtagttccggtggaggcggaggtggtgacaatggccctgctgctgctgctgctgaactggtgagaggtggttcctcaggtagcggagtaagtccaccgggtgaaggcggtggaatagctggtcaaattggtaacaaattgaactccggtcaacagcagatctcgcccacgcagagtgaaaagagcagcacaggtgggagcaaggagcagtccggtgataattcgggcggcgataacctgttcaagaacggtgtgacagatctaggtgagtcgatagtattgttggtttatttggtaacatgtggaggtggagaattccgtatgaatatgattcatttttcatgatcgtaa

3'UTR : NIGP01000374 11027-11032 UTR_3_PRIME 'UTR3_NIGP01000374_11028_11033'

at org.snpeff.interval.Transcript.getFirstCodingExon(Transcript.java:1136)
at org.snpeff.interval.Transcript.frameCorrectionFirstCodingExon(Transcript.java:909)
at org.snpeff.interval.Transcript.frameCorrection(Transcript.java:878)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.frameCorrection(SnpEffPredictorFactory.java:596)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:545)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)

java.lang.RuntimeException: Error reading file '/home/group_AM/Venitha/installations/snpEff_latest_core/snpEff/./data/AaegL5/genes.gtf'

tshalev commented on Aug 19, 2019

Author

My solution was to not use SnpEff and use Variant Effect Predictor instead.

jiabowang commented on Mar 13, 2020

Hi there,
I have solved this issue.
This error means that some genes are present in the GTF file but not in the FASTA file, so we just have to remove those genes from the GTF file.
For example: sed -i "/ENSBGRT00000033763/d" genes.gtf

That worked for my data: the bin file is now in my dataset folder.
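
Editor's note: the comment above does not say which FASTA file is meant. As a minimal sketch, assuming it refers to annotation records whose sequence name (column 1) has no matching header in the genome FASTA, the following Python lists such names before anything is deleted. The function names and the toy data are hypothetical and are not part of SnpEff.

```python
import io


def fasta_ids(handle):
    """Return the sequence names (first word after '>') found in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_seqids(handle):
    """Return the column-1 sequence names found in a tab-separated GFF3/GTF stream."""
    seqids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqids.add(line.rstrip("\n").split("\t", 1)[0])
    return seqids


def missing_seqids(annotation_handle, fasta_handle):
    """Sequence names used by the annotation but absent from the FASTA."""
    return sorted(annotation_seqids(annotation_handle) - fasta_ids(fasta_handle))


# Toy in-memory data (hypothetical); real use would pass open file handles.
demo_gtf = io.StringIO(
    'chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n'
    'chrX\tsrc\texon\t5\t20\t.\t-\t.\tgene_id "g2";\n'
)
demo_fasta = io.StringIO(">chr1 some description\nACGTACGTAC\n")
print(missing_seqids(demo_gtf, demo_fasta))  # prints ['chrX']
```

If the list is empty, the mismatch described in the comment is probably not the cause for that dataset, and deleting records with sed would only hide the real problem.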

pcingola commented on Aug 10, 2020

Owner

Closing old issues.

fanhuan commented on Jul 17, 2024

I ran into a similar problem; in my case the cause was a 5' UTR located after the start codon in one gene. FYI.
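
Editor's note: a rough way to look for the situation described in the last comment is to compare each transcript's five_prime_UTR features with its CDS features, taking the strand into account. The sketch below is only an illustration under stated assumptions: GFF3 input with tab-separated columns, `five_prime_UTR` and `CDS` feature types, and a single `Parent` per feature. The helper names and the `tx_bad` record are hypothetical; the first transcript reuses the coordinates posted at the top of this issue.

```python
import io
from collections import defaultdict


def parent_of(attributes):
    """Return the Parent value from a GFF3 attribute column, or None."""
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "Parent":
            return value.strip()
    return None


def misplaced_five_prime_utrs(handle):
    """List transcripts whose 5' UTR does not lie entirely upstream of the CDS."""
    cds = defaultdict(list)
    utrs = defaultdict(list)
    strand_of = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        parent = parent_of(cols[8])
        if parent is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if cols[2] == "CDS":
            cds[parent].append((start, end))
            strand_of[parent] = cols[6]
        elif cols[2] == "five_prime_UTR":
            utrs[parent].append((start, end))
    bad = []
    for parent, utr_list in utrs.items():
        if parent not in cds:
            continue
        if strand_of[parent] == "-":
            # Minus strand: translation starts at the highest CDS coordinate.
            coding_start = max(end for _, end in cds[parent])
            wrong = any(start <= coding_start for start, _ in utr_list)
        else:
            # Plus strand: translation starts at the lowest CDS coordinate.
            coding_start = min(start for start, _ in cds[parent])
            wrong = any(end >= coding_start for _, end in utr_list)
        if wrong:
            bad.append(parent)
    return sorted(bad)


def row(ftype, start, end, strand, parent):
    """Build one tab-separated GFF3 line for the demo."""
    return "\t".join(["29184128", "demo", ftype, str(start), str(end),
                      ".", strand, ".", "Parent=" + parent]) + "\n"


tx = "PAC4GC:47054313"
demo = io.StringIO(
    row("CDS", 1983, 2067, "-", tx) + row("five_prime_UTR", 2068, 2176, "-", tx)
    + row("CDS", 1511, 1637, "-", tx) + row("CDS", 38, 113, "-", tx)
    + row("CDS", 100, 200, "+", "tx_bad") + row("five_prime_UTR", 150, 160, "+", "tx_bad")
)
print(misplaced_five_prime_utrs(demo))  # prints ['tx_bad']
```

By this check the transcript from the original report is not flagged, so its failure may have a different cause. GTF input, like the file in the second report, uses a different attribute syntax and would need its own parser.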

          Error: Cannot find first coding exon for transcript when building database · Issue #230 · pcingola/SnpEff