Description
Hello,
I am trying to build a database for a tree species (Western Redcedar). I have a draft genome and some annotations in GFF3 format. When I try to build the database, I get the following error:
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ....................................................................................................
40000 ....................................................................................................
50000 ............................................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
....java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
29184128:-672-2175, strand: -, id:PAC4GC:47054313, bioType:protein_coding, Protein
5'UTR : 29184128 2067-2175 UTR_5_PRIME 'PAC4GC:47054313.five_prime_UTR.1'
Exons:
29184128:-672--546 'PAC4GC:47054313.exon.2', rank: 3, frame: 2, sequence: cttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
29184128:-200--7 'PAC4GC:47054313.exon.1', rank: 2, frame: ., sequence: tactagtgtaaccctcataatttgcaggctcttctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatg
29184128:37-112 'PAC4GC:47054313.exon.3', rank: 1, frame: 1, sequence: aaaattatcaagcgtggggcttaagggagctctctcaaataaaattggttctctgacagcacttcatactctgtaa
CDS : ctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatgcttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
Protein : LFLQFPLLLFELLTYFGMTVQIEYEDMFWWVMLDFSFHGFPLLWSHKQRCFYPESDELAVGKYSPNKLEQWYRSL*LSLGWGELHKWPHNVT
at org.snpeff.interval.Transcript.getFirstCodingExon(Transcript.java:1136)
at org.snpeff.interval.Transcript.frameCorrectionFirstCodingExon(Transcript.java:909)
at org.snpeff.interval.Transcript.frameCorrection(Transcript.java:878)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.frameCorrection(SnpEffPredictorFactory.java:596)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:545)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/mnt/e/tal/Documents/UBC/GSAT/PhD/WRC/GS/wrc/snps/S_lines/filtering_for_pop_gen/new_analysis/snpEff/./data/tpli_3.1/genes.gff'
java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
29184128:-672-2175, strand: -, id:PAC4GC:47054313, bioType:protein_coding, Protein
5'UTR : 29184128 2067-2175 UTR_5_PRIME 'PAC4GC:47054313.five_prime_UTR.1'
Exons:
29184128:-672--546 'PAC4GC:47054313.exon.2', rank: 3, frame: 2, sequence: cttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
29184128:-200--7 'PAC4GC:47054313.exon.1', rank: 2, frame: ., sequence: tactagtgtaaccctcataatttgcaggctcttctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatg
29184128:37-112 'PAC4GC:47054313.exon.3', rank: 1, frame: 1, sequence: aaaattatcaagcgtggggcttaagggagctctctcaaataaaattggttctctgacagcacttcatactctgtaa
CDS : ctttttcttcaattttagccactattactgtttgaactcttaacttattttggcatgacataagttcaaatagaatatgaggactagatgttttggtgggttatgcttgatttttcttttcatggcttccctcttctttggagtcacaaacagcgatgatgcttctaccctgaatctgatgagcttgctgtgggaaaatacagtcccaacaagctggaacagtggtacagatccctgtgactttcactgggatggggtgaactgcacaaatggccgcataacgtcact
Protein : LFLQFPLLLFELLTYFGMTVQIEYEDMFWWVMLDFSFHGFPLLWSHKQRCFYPESDELAVGKYSPNKLEQWYRSL*LSLGWGELHKWPHNVT
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
00:22:17 Logging
00:22:18 Checking for updates...
When I try deleting the offending sequence from the GFF file, it just finds an issue with another one. For reference, the GFF file looks like this for this sequence:
##gff-version 3
##annot-version v3.1
##species Thuja plicata
29184128 JGI_gene mRNA 38 2176 . - . ID=PAC4GC:47054313;Name=Thpliv31003279m;longest=1;Parent=Thpliv31003279m.g
29184128 JGI_gene exon 1983 2176 . - . ID=PAC4GC:47054313.exon.1;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 1983 2067 . - 0 ID=PAC4GC:47054313.CDS.1;Parent=PAC4GC:47054313
29184128 JGI_gene five_prime_UTR 2068 2176 . - . ID=PAC4GC:47054313.five_prime_UTR.1;Parent=PAC4GC:47054313
29184128 JGI_gene exon 1511 1637 . - . ID=PAC4GC:47054313.exon.2;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 1511 1637 . - 2 ID=PAC4GC:47054313.CDS.2;Parent=PAC4GC:47054313
29184128 JGI_gene exon 38 113 . - . ID=PAC4GC:47054313.exon.3;Parent=PAC4GC:47054313
29184128 JGI_gene CDS 38 113 . - 1 ID=PAC4GC:47054313.CDS.3;Parent=PAC4GC:47054313
Sorry if this is kind of messy, I couldn't figure out how to make the table look better here.
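Since the build stops at the "Correcting exons based on frame information" step, one thing that can be checked outside SnpEff is whether the CDS phases (column 8) agree with the CDS lengths. Below is a minimal Python sketch of that check, using the GFF3 phase rule. It assumes a tab-delimited GFF3 file with one `Parent` per CDS row; the function names are made up for illustration and it does not reproduce what SnpEff does internally.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def check_cds_phases(gff_lines):
    """Return (transcript_id, cds_id, expected, found) for each phase mismatch."""
    cds_by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        cds_by_parent[attrs.get("Parent")].append(
            (int(cols[3]), int(cols[4]), cols[6], cols[7], attrs.get("ID"))
        )

    problems = []
    for parent, segments in cds_by_parent.items():
        minus = segments[0][2] == "-"
        # Walk the CDS segments in transcription order (5' to 3').
        segments.sort(key=lambda s: s[0], reverse=minus)
        expected = None  # the first segment's phase is taken as given
        for start, end, _strand, phase, cds_id in segments:
            if phase not in ("0", "1", "2"):
                # A CDS row without a usable phase is reported as well.
                problems.append((parent, cds_id, expected, phase))
                expected = None
                continue
            found = int(phase)
            if expected is not None and found != expected:
                problems.append((parent, cds_id, expected, found))
            length = end - start + 1
            # Phase of the next segment implied by this segment.
            expected = (3 - (length - found) % 3) % 3
    return problems
```

It can be run with something like `check_cds_phases(open("genes.gff"))`. For the three CDS rows shown above (lengths 85, 127 and 76 with phases 0, 2 and 1) it reports no mismatch, so for this transcript the phases at least look self-consistent.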
Activity
VenithaB commented on Aug 19, 2019
Hi! I'm getting the same error!
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons: ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ............................................
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
java.lang.RuntimeException: Error: Cannot find first coding exon for transcript:
NIGP01000374:-3367-38263, strand: -, id:AAEL023102-RA
5'UTR : NIGP01000374 38195-38263 UTR_5_PRIME 'UTR5_NIGP01000374_38196_38264'
Exons: NIGP01000374:-3367--3191 'EXON_NIGP01000374_38088_38264', rank: 2, frame: .,sequence: tcgcctacaatgctcaactagaaacaattactctaaggcgaaatccatctcacgttccaacctacgaaaatgcaattgaatggcacggtaacgatggctgcctcatctgaaccacccgagcctccacctcgcaatccggacaagatcaatgcatcactcaagcagctagccgaatcg
NIGP01000374:11027-11653 'EXON_NIGP01000374_11028_11654', rank: 1, frame: 0, sequence: aaaacccgttcgctggatacggccaccgataagacaaccgctccggccaccggtgcccgaccattccggcctatcctgtcgctggacaatgcaaagccattaacgaagccattcgaatcatctggaacgcccacgtcggcaccagcctcgtcgtttgccaacagtaacagtaacaacaataacaatggcagcagtcacaacagcagcatggaatcgaattcgaccagcacaaccgggggtccaaactcgggcaccggaaccagtggaagcagcatcagtagttccggtggaggcggaggtggtgacaatggccctgctgctgctgctgctgaactggtgagaggtggttcctcaggtagcggagtaagtccaccgggtgaaggcggtggaatagctggtcaaattggtaacaaattgaactccggtcaacagcagatctcgcccacgcagagtgaaaagagcagcacaggtgggagcaaggagcagtccggtgataattcgggcggcgataacctgttcaagaacggtgtgacagatctaggtgagtcgatagtattgttggtttatttggtaacatgtggaggtggagaattccgtatgaatatgattcatttttcatgatcgtaa
3'UTR : NIGP01000374 11027-11032 UTR_3_PRIME 'UTR3_NIGP01000374_11028_11033'
java.lang.RuntimeException: Error reading file '/home/group_AM/Venitha/installations/snpEff_latest_core/snpEff/./data/AaegL5/genes.gtf'
tshalev commented on Aug 19, 2019
My solution was to not use SnpEff and use Variant Effect Predictor instead.
jiabowang commented on Mar 13, 2020
Hi there,
I have solved this issue.
When this error appears, it means that some genes are in the GTF file but not in the FASTA file.
So we just have to remove those genes from the GTF file.
For example: sed -i "/ENSBGRT00000033763/d" genes.gtf
That worked for my data.
The bin file is now in my dataset folder.
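To avoid removing entries one by one, a comparison like the following may help. It is only a sketch of one possible reading of "in the GTF but not in the FASTA", namely that column 1 of the annotation names a sequence that has no matching header in the genome FASTA. The function names are hypothetical, and it assumes tab-delimited GFF/GTF lines and FASTA headers whose first word is the sequence name.

```python
def fasta_ids(fasta_lines):
    """Collect sequence names: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def annotation_seqids(annotation_lines):
    """Collect the distinct values of column 1 from GFF/GTF lines."""
    ids = set()
    for line in annotation_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section of a GFF3 file, not annotation
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_fasta(annotation_lines, fasta_lines):
    """Sequence names used by the annotation that the FASTA does not define."""
    return sorted(annotation_seqids(annotation_lines) - fasta_ids(fasta_lines))
```

For example, `missing_from_fasta(open("genes.gtf"), open("sequences.fa"))` would list the sequence names to look at (the file names here are placeholders). An empty list means the names match, in which case the cause is probably something else.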
pcingola commented on Aug 10, 2020
Closing old issues.
fanhuan commented on Jul 17, 2024
I ran into a similar problem, and it was because the 5' UTR came after the start codon in one gene. FYI.
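A rough way to look for that situation across a whole file is sketched below in Python. It flags transcripts where a 5' UTR is not strictly upstream of every CDS segment, taking strand into account. It assumes tab-delimited GFF3 with the feature types `five_prime_UTR` and `CDS` (as in the original post) and a single `Parent` per row; the function name is made up, and a GTF file would need different type names and attribute parsing.

```python
from collections import defaultdict


def misplaced_five_prime_utrs(gff_lines):
    """Return IDs of transcripts whose 5' UTR is not upstream of all CDS."""
    feats = defaultdict(lambda: {"utr": [], "cds": [], "strand": "+"})
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("five_prime_UTR", "CDS"):
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        rec = feats[attrs.get("Parent")]
        rec["strand"] = cols[6]
        key = "utr" if cols[2] == "five_prime_UTR" else "cds"
        rec[key].append((int(cols[3]), int(cols[4])))

    bad = []
    for parent, rec in feats.items():
        if not rec["utr"] or not rec["cds"]:
            continue  # nothing to compare
        if rec["strand"] == "-":
            # On the minus strand, upstream means higher coordinates.
            ok = min(s for s, _ in rec["utr"]) > max(e for _, e in rec["cds"])
        else:
            ok = max(e for _, e in rec["utr"]) < min(s for s, _ in rec["cds"])
        if not ok:
            bad.append(parent)
    return bad
```

For the transcript in the original post (5' UTR at 2068-2176, CDS ending at 2067, minus strand) this returns nothing, so that record would not be flagged by this particular check.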