calculate_Ka_Ks_pipeline

A pipeline used to calculate Ka and Ks

This pipeline uses the gamma-MYN method (a modified version of the Yang-Nielsen method) to estimate Ka and Ks values. MAFFT (L-INS-i) performs the pairwise alignment of protein sequences for each duplicate gene pair.

Authors: Xin Qiao (qiaoxin), Leiting Li (lileiting)
Email: qiaoxinqx2011@126.com

Dependencies

Installation

git clone https://github.com/qiao-xin/Scripts_for_GB.git

Running

Once the required dependencies have been installed, try running this pipeline on the example data:

perl calculate_Ka_Ks_pipe.pl -d data/Ath.cds -g data/Ath.tandem.pairs -o result/Ath.td.kaks

Note: This computation takes a few minutes, so please be patient. CDS sequences (FASTA format) can be downloaded from Phytozome, NCBI, Ensembl Plants, etc. The different modes of duplicated gene pairs for any species of interest can be identified using DupGen_finder and are also available from PlantDGD.

Result Files

1 - Ath.td.kaks.axt

The pairwise-aligned sequences in AXT format, used as the input file for KaKs_Calculator.
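As a quick sanity check on the AXT file, the sketch below reads records and verifies that both aligned sequences in a pair have the same length and that the length is a multiple of three. It is written in Python for illustration and is not part of the pipeline. It assumes each record consists of one name line followed by two sequence lines, with records separated by a blank line. The `read_axt` helper and the `geneA-geneB` record are hypothetical.

```python
def _to_record(block):
    """Validate one record: a name line plus two aligned sequence lines."""
    if len(block) != 3:
        raise ValueError(f"expected 3 lines per record, got {len(block)}: {block[:1]}")
    name, seq1, seq2 = block
    if len(seq1) != len(seq2):
        raise ValueError(f"{name}: aligned sequences differ in length")
    if len(seq1) % 3 != 0:
        raise ValueError(f"{name}: alignment length is not a multiple of 3")
    return (name, seq1, seq2)


def read_axt(lines):
    """Return a list of (name, seq1, seq2) tuples from blank-line-separated records."""
    records = []
    block = []
    for line in lines:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            records.append(_to_record(block))
            block = []
    if block:  # last record may not be followed by a blank line
        records.append(_to_record(block))
    return records


# Hypothetical record, made up for illustration (not taken from the example data).
example = "geneA-geneB\nATGGCT---TAA\nATGGCTGCTTAA\n\n"
records = read_axt(example.splitlines())
```

To check a real file, pass an open file handle instead, e.g. `read_axt(open("result/Ath.td.kaks.axt"))`.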

2 - Ath.td.kaks.KKC.format

The file generated by KaKs_Calculator, containing Ka, Ks, and Ka/Ks values along with other information.

Note: KaKs_Calculator provides comprehensive information estimated from the compared sequences, including the numbers of synonymous and nonsynonymous sites, the numbers of synonymous and nonsynonymous substitutions, GC contents, maximum-likelihood score, and AICC, in addition to the synonymous and nonsynonymous substitution rates and their ratio. Fisher’s exact test for small samples is also applied to assess the validity of the Ka and Ks values calculated by these methods.

3 - Ath.td.kaks

A simplified version of the KaKs_Calculator output file that contains only Ka, Ks, Ka/Ks, and the P-value.

Duplicate 1	Duplicate 2	Ka	Ks	Ka/Ks	P-Value
AT1G01580.1	AT1G01590.1	0.230604	1.32053	0.17463	2.55813e-42
AT1G01660.1	AT1G01670.1	0.518339	2.09868	0.246984	8.39747e-21
AT1G01670.1	AT1G01680.1	0.365625	1.26443	0.289163	7.50956e-11
AT1G02190.1	AT1G02205.3	0.310436	3.41288	0.0909601	1.52263e-75
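
For downstream analysis, the simplified table can be loaded with a few lines of Python. The sketch below is not part of the pipeline. It assumes the file is tab-separated with the header shown above, and it uses the four example rows as inline data. The `read_kaks` and `filter_by_ks` helpers are hypothetical, and the `max_ks` cutoff of 2.0 is an arbitrary illustrative value, not a recommendation from the pipeline.

```python
import csv
import io

# Rows copied from the Ath.td.kaks excerpt above (tab-separated).
SAMPLE = (
    "Duplicate 1\tDuplicate 2\tKa\tKs\tKa/Ks\tP-Value\n"
    "AT1G01580.1\tAT1G01590.1\t0.230604\t1.32053\t0.17463\t2.55813e-42\n"
    "AT1G01660.1\tAT1G01670.1\t0.518339\t2.09868\t0.246984\t8.39747e-21\n"
    "AT1G01670.1\tAT1G01680.1\t0.365625\t1.26443\t0.289163\t7.50956e-11\n"
    "AT1G02190.1\tAT1G02205.3\t0.310436\t3.41288\t0.0909601\t1.52263e-75\n"
)


def read_kaks(handle):
    """Yield one dict per gene pair, converting the numeric columns to float."""
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            yield {
                "pair": (row["Duplicate 1"], row["Duplicate 2"]),
                "ka": float(row["Ka"]),
                "ks": float(row["Ks"]),
                "ka_ks": float(row["Ka/Ks"]),
                "p_value": float(row["P-Value"]),
            }
        except (TypeError, ValueError):
            # Assumption: rows with missing or non-numeric values are skipped.
            continue


def filter_by_ks(records, max_ks=2.0):
    """Keep pairs whose Ks does not exceed max_ks (illustrative cutoff)."""
    return [r for r in records if r["ks"] <= max_ks]


records = list(read_kaks(io.StringIO(SAMPLE)))
kept = filter_by_ks(records, max_ks=2.0)
for r in kept:
    print(r["pair"], r["ka_ks"])
```

To read the real output, replace `io.StringIO(SAMPLE)` with `open("result/Ath.td.kaks", newline="")`.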

4 - Ath.td.kaks.KKC.logfile

The log file generated by KaKs_Calculator:

Method(s): GMYN 
Genetic code: 1-Standard Code
Please wait while reading sequences and calculating...
1 AT1G01580.1-AT1G01590.1	[OK]
2 AT1G01660.1-AT1G01670.1	[OK]
3 AT1G01670.1-AT1G01680.1	[OK]
4 AT1G02190.1-AT1G02205.3	[OK]
5 AT1G02220.1-AT1G02230.1	[OK]
6 AT1G02230.1-AT1G02250.1	[OK]
7 AT1G02300.1-AT1G02305.1	[OK]
8 AT1G02430.1-AT1G02440.1	[OK]
9 AT1G02470.2-AT1G02475.1	[OK]
10 AT1G02520.1-AT1G02530.1	[OK]
...
Mission accomplished. ((Time elapsed: 1:41)
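
With thousands of gene pairs, scanning the log by eye is impractical. The Python sketch below, which is not part of the pipeline, collects the status of each numbered line and lists any pair whose status is not `OK`. It assumes every pair line follows the `index name [STATUS]` layout shown above. Only `[OK]` appears in this excerpt, so which other status strings KaKs_Calculator may print is not assumed here. The `parse_log` helper is hypothetical.

```python
import re

# Lines copied from the log excerpt above.
LOG = """Method(s): GMYN 
Genetic code: 1-Standard Code
Please wait while reading sequences and calculating...
1 AT1G01580.1-AT1G01590.1\t[OK]
2 AT1G01660.1-AT1G01670.1\t[OK]
3 AT1G01670.1-AT1G01680.1\t[OK]
4 AT1G02190.1-AT1G02205.3\t[OK]
...
Mission accomplished. ((Time elapsed: 1:41)
"""

# index, pair name (kept whole, since gene IDs may themselves contain "-"), status
PAIR_LINE = re.compile(r"^(\d+)\s+(\S+)\s+\[([^\]]+)\]\s*$")


def parse_log(text):
    """Return a list of (index, pair_name, status) for every numbered pair line."""
    entries = []
    for line in text.splitlines():
        match = PAIR_LINE.match(line)
        if match:
            index, name, status = match.groups()
            entries.append((int(index), name, status))
    return entries


entries = parse_log(LOG)
not_ok = [e for e in entries if e[2] != "OK"]
print(f"{len(entries)} pairs processed, {len(not_ok)} not OK")
```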

Citation

Qiao X, Li Q, Yin H, Qi K, Li L, Wang R, Zhang S, Paterson AH: Gene duplication and evolution in recurring polyploidization–diploidization cycles in plants. Genome Biology 2019, 20:38.