thecgs/genblastG_extension

What is genblastG_extension?

genblastG_extension primarily addresses the following issues:

  1. Wraps the genblastG software to enable multithreaded operation, improving computational efficiency.
  2. Corrects the phase of CDS features in the GFF3 output and reconstructs the output into standard GFF3 format.
  3. Among predicted gene models that occupy the same chromosomal position, retains the one with the best score and length to eliminate redundancy.
  4. Provides the diff_gff3.py script, which compares a high-quality gene model set with the gene models predicted by genblastG_extension, facilitating gene family identification tasks.
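
As an illustration of item 1, the sketch below shows one common way to parallelize a single-threaded command-line tool: split the query FASTA into chunks and run one process per chunk from a thread pool. This is not the code used by genblastG_extension.py. The function names `split_fasta` and `run_chunks` and the `build_command` callback are hypothetical, and the actual command line for genblastG is deliberately left to the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text, n_chunks):
    """Split FASTA text into at most n_chunks lists of records (round-robin)."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])          # start a new record at each header
        elif records and line.strip():
            records[-1].append(line)        # sequence line of the current record
    joined = ["\n".join(r) + "\n" for r in records]
    chunks = [joined[i::n_chunks] for i in range(n_chunks)]
    return [c for c in chunks if c]         # drop empty chunks


def run_chunks(chunks, build_command, max_workers=4):
    """Run one external command per chunk and return exit codes in chunk order.

    build_command(index, chunk) is a caller-supplied (hypothetical) function
    that writes the chunk to disk if needed and returns the argument list
    to execute.
    """
    def worker(indexed):
        index, chunk = indexed
        return subprocess.run(build_command(index, chunk), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, enumerate(chunks)))
```

Threads are sufficient here because the heavy work happens in the child processes, not in Python itself.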

Fig. 1. Workflow of genblastG_extension.

Installation:

Python >= 3.5 is required.

The biopython and natsort packages must be installed.

The pipeline is based on the genblastG (v1.38) software and the genblastg_patch patch.

$ git clone git@github.com:thecgs/genblastG_extension.git
$ pip install biopython
$ pip install natsort

Usage:

$ genblastG_extension.py -h
usage: genblastG_extension.py -g str -q str [-p str] [-c float] [-gap] [-e str] [-G int] [-t int] [-h] [-v]

This script is mainly the wrapper of genblastg software.
It can run genblastg in multiple threads, reconstruct the results, and output the standard gff3 file. 
Secondly, it can filter the redundant gene model according to the prediction score and the length of 
the predicted gene to generate the best non redundant gene model.

required arguments:
  -g str, --genome str  A file of genome fasta format.
  -q str, --query str   A file of query protein fasta format.

optional arguments:
  -p str, --prefix str  A prefix of output. default=genblastG_extension
  -c float, --query_cover float
                        minimum query cover (0-1) to report an alignment. defualt=0.8
  -gap, --gap           parameter for blast: Perform gapped alignment. default=False
  -e str, --evalue str  A maximum evalue of report alignments. defualt=1e-5
  -G int, --genetic_code int
                        Genetic code. default=1
  -t int, --thread int  Thread number of single sortware. defualt=16
  -h, --help            Show this help message and exit.
  -v, --version         Show program's version number and exit.

Date:2025/09/24 Author:Guisen Chen Email:thecgs001@foxmail.com

Example:

$ cd ./example

$ ../genblastG_extension.py -g genome.fa -q seq.fa

$ cat genblastG_extension.filtered.gff3 
scaffold292	genBlastG	gene	151014	161317	96.055	+	.	ID=GENE1_NP_001119935.1.1-R1-1-A1;Target=NP_001119935.1.1;
scaffold292	genBlastG	mRNA	151014	161317	.	+	.	ID=mrna.GENE1_NP_001119935.1.1-R1-1-A1;Parent=GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	151014	151347	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	151014	151347	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	159761	159975	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	159761	159975	.	+	2	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	160161	160280	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	160161	160280	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	161186	161317	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	161186	161317	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;

scaffold490	genBlastG	gene	530885	536012	79.581	-	.	ID=GENE2_NP_001092224.1.2-R1-1-A1;Target=NP_001092224.1.2;
scaffold490	genBlastG	mRNA	530885	536012	.	-	.	ID=mrna.GENE2_NP_001092224.1.2-R1-1-A1;Parent=GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	535679	536012	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	535679	536012	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	531679	531896	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	531679	531896	.	-	2	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	531127	531246	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	531127	531246	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	530885	531022	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	530885	531022	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
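
The phase values in column 8 of the CDS lines above follow the GFF3 rule: the phase is the number of bases to skip at the start of a CDS segment to reach the first complete codon, counted in the direction of transcription. The following is a minimal sketch of that rule, assuming the first segment of each transcript starts in phase 0. It is not the code from genblastG_extension.py, and the function name `cds_phases` is hypothetical.

```python
def cds_phases(cds_intervals, strand):
    """Return (start, end, phase) tuples in transcription order.

    cds_intervals: list of 1-based inclusive (start, end) pairs.
    strand: "+" or "-". On the minus strand the segment with the
    highest coordinates is transcribed first.
    """
    ordered = sorted(cds_intervals, reverse=(strand == "-"))
    phase = 0  # assumption: the first segment begins with a complete codon
    result = []
    for start, end in ordered:
        result.append((start, end, phase))
        length = end - start + 1
        # Bases left over after the last complete codon carry into the next segment.
        phase = (3 - (length - phase) % 3) % 3
    return result


# Coordinates taken from the GENE1 and GENE2 CDS lines in the example output.
gene1 = [(151014, 151347), (159761, 159975), (160161, 160280), (161186, 161317)]
gene2 = [(535679, 536012), (531679, 531896), (531127, 531246), (530885, 531022)]
print([p for _, _, p in cds_phases(gene1, "+")])  # [0, 2, 0, 0]
print([p for _, _, p in cds_phases(gene2, "-")])  # [0, 2, 0, 0]
```

Both results agree with the phases reported in the filtered GFF3 above.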

$ cat genblastG_extension.pep.fasta
>mrna.NP_001119935.1-R1-1-A1 Gene=NP_001119935.1-R1-1-A1 Length=266 Position=scaffold292:151014-161317(+)
MILSFCLLVTFILGLSGCLGSYVPCEPCDEKAMSMCPPVPVGCQLVKEPGCGCCLTCALSEGQACGVYTGTCTQGLRCLPRSGEEKPLHALLHGRGVCTNEKGYKPAHPPIDRESREHEDTMTTEITEELQPAKVPLLPKDIVNSKKVHALRKEQKRKLGKQRYMGSPMDYSPLPIDKHEPEFGPCRRKLDGIIQGMKDTSRVMALSLYLPNCDRKGFFKRKQCKPSRGRKRGICWCVDKYGIQLPGTDYSGGDIQCKDLESSNNE*
>mrna.NP_001092224.1-R2-2-A1 Gene=NP_001092224.1-R2-2-A1 Length=266 Position=scaffold490:530885-536012(-)
MLLSVSLLVLPLLSFPGCGSSYVPCEPCDQKAQSMCPPVPMGCQLVKEPGCGCCLTCALEEGQPCGVYTGPCTRGLRCLPKNGEEKPLHALLHGRGVCRNEKLYKLLHPSKDESHDDTLLPVPESMLPQTKVPLYGRDHISSRKVHAMKQAKDRKKQLARLGPASNLDFSPLSLDKMDPEFGPCRRRLDNLIQSMKDTSRVLALSLYIPNCDKKGFFKRKQCKPSRGRKRGICWCVDRFGVKIPGINYAGGDLQCKDLDSSSNSNE*

$ cat genblastG_extension.cds.fasta 
>mrna.NP_001119935.1-R1-1-A1 Gene=NP_001119935.1-R1-1-A1 Length=801 Position=scaffold292:151014-161317(+)
ATGATTCTGAGTTTTTGCCTCTTGGTGACATTTATCTTGGGGCTGTCCGGCTGCTTGGGCTCATACGTGCCGTGCGAGCCTTGTGACGAGAAGGCGATGTCCATGTGCCCTCCGGTCCCGGTCGGATGCCAGCTGGTCAAGGAGCCGGGCTGCGGCTGCTGCCTAACGTGTGCCCTGTCTGAGGGGCAGGCGTGCGGCGTTTACACCGGGACGTGCACCCAGGGCCTGCGCTGCCTGCCGAGGAGCGGGGAGGAGAAACCCCTGCACGCCCTTCTCCACGGCAGGGGAGTGTGCACCAACGAGAAAGGATACAAACCTGCCCACCCGCCCATAGATCGTGAGTCTCGAGAACATGAGGACACCATGACCACAGAGATTACAGAGGAGTTGCAGCCAGCCAAAGTGCCGCTCCTTCCTAAAGACATTGTGAACAGTAAAAAAGTCCATGCGCTGCGCAAGGAGCAAAAGAGGAAGCTGGGCAAGCAGCGCTACATGGGCTCTCCTATGGACTATTCCCCTCTGCCCATCGACAAGCATGAGCCTGAATTTGGTCCATGCAGAAGAAAACTGGATGGGATCATTCAGGGGATGAAGGACACTTCTCGTGTAATGGCTCTGTCTTTGTACCTCCCCAACTGCGACAGAAAAGGATTCTTCAAGCGCAAGCAGTGTAAACCATCTCGCGGCCGCAAACGAGGCATCTGCTGGTGCGTGGACAAGTACGGCATCCAGCTCCCCGGCACAGACTACAGCGGAGGGGACATTCAGTGTAAAGACCTGGAGAGCAGCAACAACGAGTGA
>mrna.NP_001092224.1-R2-2-A1 Gene=NP_001092224.1-R2-2-A1 Length=801 Position=scaffold490:530885-536012(-)
ATGCTGCTGAGTGTTTCCCTCCTGGTGCTCCCCCTGCTTAGCTTCCCCGGCTGCGGCTCGTCGTACGTGCCGTGCGAGCCGTGCGATCAGAAGGCCCAGTCCATGTGCCCGCCGGTGCCGATGGGCTGTCAGCTGGTGAAGGAGCCCGGCTGCGGCTGCTGCCTGACGTGCGCGCTCGAAGAGGGCCAGCCGTGCGGCGTGTACACCGGGCCGTGCACCCGCGGGCTCCGGTGCCTCCCGAAGAACGGCGAGGAGAAGCCGCTGCATGCCCTGCTGCACGGCCGGGGGGTGTGCAGGAACGAGAAGTTGTACAAACTGCTGCATCCGTCAAAAGACGAATCTCACGATGACACCCTGCTGCCCGTCCCTGAGTCAATGCTGCCGCAAACCAAGGTGCCCTTATATGGAAGAGACCACATCAGCAGTCGGAAGGTCCACGCCATGAAGCAAGCCAAGGACCGCAAGAAGCAGCTGGCCAGGTTGGGACCTGCCAGCAACCTGGACTTCTCACCGCTAAGCCTGGATAAAATGGATCCCGAGTTCGGGCCCTGCAGGAGAAGATTGGACAATCTCATCCAGAGCATGAAAGACACCTCTCGGGTCTTGGCTCTCTCTCTGTACATCCCCAACTGTGACAAGAAGGGCTTCTTCAAGCGCAAACAGTGTAAGCCGTCTCGTGGACGAAAAAGGGGCATCTGCTGGTGCGTCGACCGGTTTGGCGTGAAAATCCCAGGCATCAACTACGCCGGCGGAGACCTGCAGTGCAAGGATCTCGACAGCAGCAGCAACAGCAATGAATGA

$ cat genblastG_extension.raw.gff3
scaffold292	genBlastG	gene	151014	161317	96.055	+	.	ID=GENE1_NP_001119935.1.1-R1-1-A1;Target=NP_001119935.1.1;
scaffold292	genBlastG	mRNA	151014	161317	.	+	.	ID=mrna.GENE1_NP_001119935.1.1-R1-1-A1;Parent=GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	151014	151347	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	151014	151347	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	159761	159975	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	159761	159975	.	+	2	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	160161	160280	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	160161	160280	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	exon	161186	161317	.	+	.	ID=exon.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;
scaffold292	genBlastG	CDS	161186	161317	.	+	0	ID=cds.GENE1_NP_001119935.1.1-R1-1-A1;Parent=mrna.GENE1_NP_001119935.1.1-R1-1-A1;

scaffold490	genBlastG	gene	530885	536012	79.581	-	.	ID=GENE2_NP_001092224.1.2-R1-1-A1;Target=NP_001092224.1.2;
scaffold490	genBlastG	mRNA	530885	536012	.	-	.	ID=mrna.GENE2_NP_001092224.1.2-R1-1-A1;Parent=GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	535679	536012	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	535679	536012	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	531679	531896	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	531679	531896	.	-	2	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	531127	531246	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	531127	531246	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	exon	530885	531022	.	-	.	ID=exon.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;
scaffold490	genBlastG	CDS	530885	531022	.	-	0	ID=cds.GENE2_NP_001092224.1.2-R1-1-A1;Parent=mrna.GENE2_NP_001092224.1.2-R1-1-A1;

scaffold292	genBlastG	gene	151014	161317	77.4653	+	.	ID=GENE3_NP_001092224.1.2-R2-2-A1;Target=NP_001092224.1.2;
scaffold292	genBlastG	mRNA	151014	161317	.	+	.	ID=mrna.GENE3_NP_001092224.1.2-R2-2-A1;Parent=GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	exon	151014	151347	.	+	.	ID=exon.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	CDS	151014	151347	.	+	0	ID=cds.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	exon	159761	159975	.	+	.	ID=exon.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	CDS	159761	159975	.	+	2	ID=cds.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	exon	160161	160280	.	+	.	ID=exon.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	CDS	160161	160280	.	+	0	ID=cds.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	exon	161186	161317	.	+	.	ID=exon.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
scaffold292	genBlastG	CDS	161186	161317	.	+	0	ID=cds.GENE3_NP_001092224.1.2-R2-2-A1;Parent=mrna.GENE3_NP_001092224.1.2-R2-2-A1;
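
Comparing the two files, GENE3 appears in the raw output but not in the filtered one: it occupies the same locus as GENE1 on scaffold292 and has a lower score (77.4653 versus 96.055). The sketch below shows one way such a filter could work. It is a guess at the general idea, not the script's actual implementation: it assumes that overlap is tested on the same sequence and strand, and that ties in score are broken by gene length. The function name `filter_redundant` and the dictionary keys are hypothetical.

```python
def filter_redundant(genes):
    """Greedy filter: visit models from best to worst and keep a model
    only if it does not overlap an already kept model on the same
    sequence and strand (assumed criteria)."""
    ranked = sorted(
        genes,
        key=lambda g: (g["score"], g["end"] - g["start"] + 1),
        reverse=True,
    )
    kept = []
    for g in ranked:
        clash = any(
            k["seqid"] == g["seqid"]
            and k["strand"] == g["strand"]
            and g["start"] <= k["end"]
            and k["start"] <= g["end"]
            for k in kept
        )
        if not clash:
            kept.append(g)
    return kept


# Gene lines from the raw GFF3 above, reduced to the fields the filter needs.
raw = [
    {"id": "GENE1", "seqid": "scaffold292", "strand": "+",
     "start": 151014, "end": 161317, "score": 96.055},
    {"id": "GENE2", "seqid": "scaffold490", "strand": "-",
     "start": 530885, "end": 536012, "score": 79.581},
    {"id": "GENE3", "seqid": "scaffold292", "strand": "+",
     "start": 151014, "end": 161317, "score": 77.4653},
]
print([g["id"] for g in filter_redundant(raw)])  # ['GENE1', 'GENE2']
```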
