MaoYafei/GenoDup-Pipeline

Latest commit: adbfcf4 by Yafei, Feb 22, 2019

GenoDup Pipeline

This Python script detects whole-genome duplication (WGD) with the dS-based method, where dS is the number of synonymous substitutions per synonymous site between two genes.

Run the program

  1. Calculate dS values from gene family data (paralog dS values)

python GenoDup.py -s Nuclear_sequence_file -p Protein_sequence_file -g Gene_family_file -n Maximum_number_of_gene_family

  2. Calculate dS values from anchor gene pair data (anchor dS values)

python GenoDup.py -s Nuclear_sequence_file -p Protein_sequence_file -c Gene_pair_file
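
The two invocations above can also be assembled from Python, for example when running the pipeline over several species. The sketch below is only an illustration: `build_genodup_command` and `run_genodup` are hypothetical helpers, not part of GenoDup. It assumes that the two modes (`-g` with `-n`, or `-c`) are used one at a time, as in the commands shown, and that `-n` always accompanies `-g`.

```python
import subprocess


def build_genodup_command(nucleotide_fasta, protein_fasta,
                          gene_family_file=None, max_family_n=None,
                          gene_pair_file=None,
                          python="python", script="GenoDup.py"):
    """Return the argument list for one GenoDup.py run (hypothetical helper)."""
    # Assumption: exactly one mode is chosen per run, mirroring the README commands.
    if (gene_family_file is None) == (gene_pair_file is None):
        raise ValueError("give either a gene family file (-g) or a gene pair file (-c)")

    cmd = [python, script, "-s", nucleotide_fasta, "-p", protein_fasta]
    if gene_family_file is not None:
        # Assumption: -n is always passed together with -g, as in the README command.
        if max_family_n is None:
            raise ValueError("-n is expected together with -g")
        cmd += ["-g", gene_family_file, "-n", str(max_family_n)]
    else:
        if max_family_n is not None:
            raise ValueError("-n is only used with -g")
        cmd += ["-c", gene_pair_file]
    return cmd


def run_genodup(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.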

Input

  1. Nuclear_sequence_file: all nucleotide (coding) sequences in your analysis (FASTA format).
e.g.:
    >gene1
    ATCG
    >gene2
    ATCC
    ...
  2. Protein_sequence_file: all protein sequences in your analysis (FASTA format).
e.g.:
    >gene1
    PAPA
    >gene2
    PAPA
    ...
  3. Gene_family_file: the gene family clusters, one family per line (usually produced by OrthoMCL).
e.g.:
    led1: gene1,gene2,gene3
    led2: gene3,gene4
    ...
  4. Gene_pair_file: one ohnolog pair per line, in two tab-separated columns (can be produced by MCScanX, OrthoMCL, or i-ADHoRe).
e.g.:
    gene1 gene2
    gene3 gene4
    ...
  5. Maximum_number_of_gene_family: the maximum gene family size (number of genes in a family) to analyze; use only with -g (suggested: 5-15).
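
To make the input formats concrete, here is a minimal sketch that reads a gene family file in the format shown above, checks that every gene has a FASTA record, and lists the gene pairs a family would contribute. It is not GenoDup's own code. It assumes that `-n` caps the number of genes per family (inclusive), that all pairwise combinations within a family are compared, and that sequence IDs are the first word after `>`. The function names are hypothetical.

```python
from itertools import combinations


def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def parse_gene_families(lines):
    """Parse 'family_id: gene1,gene2,...' lines into {family_id: [genes]}."""
    families = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything without a family label
        family_id, members = line.split(":", 1)
        genes = [g.strip() for g in members.split(",") if g.strip()]
        families[family_id.strip()] = genes
    return families


def paralog_pairs(families, max_family_size):
    """Yield (family_id, gene_a, gene_b) for families of 2..max_family_size genes."""
    for family_id, genes in families.items():
        # Assumption: the size cutoff is inclusive.
        if 2 <= len(genes) <= max_family_size:
            for gene_a, gene_b in combinations(genes, 2):
                yield family_id, gene_a, gene_b


def genes_missing_sequences(families, known_ids):
    """Return family genes that have no record among the given FASTA IDs."""
    missing = set()
    for genes in families.values():
        missing.update(g for g in genes if g not in known_ids)
    return sorted(missing)
```

A family of k genes yields k(k-1)/2 pairs, so the number of codeml runs grows quickly with family size. This is one practical reason to cap family size.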

Output

  1. pairwise directory: contains the sequences of every gene pair (FASTA format).
  2. PAML_result: contains all codeml output files.
  3. dS_value.txt: contains the dS values calculated by codeml.

Visualization

Rscript plot_Genodup.r
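
In the dS-based method, a WGD is expected to appear as a peak in the distribution of pairwise dS values. The sketch below shows how such a distribution can be binned in Python once the dS values have been loaded as floats. It does not reproduce `plot_Genodup.r`, and it does not assume any particular layout of `dS_value.txt`. The function names, the default of 30 bins, and the upper cutoff of 3.0 are illustrative assumptions.

```python
def ds_histogram(ds_values, n_bins=30, max_ds=3.0):
    """Count dS values in n_bins equal-width bins covering [0, max_ds)."""
    width = max_ds / n_bins
    counts = [0] * n_bins
    for value in ds_values:
        # Assumption: values outside [0, max_ds) are discarded.
        if 0.0 <= value < max_ds:
            index = min(int(value / width), n_bins - 1)
            counts[index] += 1
    return counts


def fullest_bin(counts):
    """Return the index of the bin with the most values (first one on ties)."""
    return max(range(len(counts)), key=counts.__getitem__)
```

The fullest bin is only a rough indicator. The publications listed under Others discuss how to interpret dS distributions more carefully.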

Citations

  1. Mao, Yafei. "GenoDup Pipeline: a tool to detect genome duplication using the dS-based method." PeerJ 7 (2019): e6303.
  2. Abascal, Federico, Rafael Zardoya, and Maximilian J. Telford. "TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations." Nucleic acids research 38.suppl_2 (2010): W7-W13.
  3. Katoh, Kazutaka, and Daron M. Standley. "MAFFT multiple sequence alignment software version 7: improvements in performance and usability." Molecular biology and evolution 30.4 (2013): 772-780.
  4. Yang, Ziheng. "PAML: a program package for phylogenetic analysis by maximum likelihood." Bioinformatics 13.5 (1997): 555-556.

Others

For background on calculating dS values and using them to infer WGD, see the following publications:
  1. Vanneste, Kevin, et al. "Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary." Genome research 24.8 (2014): 1334-1347.
  2. Berthelot, Camille, et al. "The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates." Nature communications 5 (2014): 3657.
  3. Vanneste, Kevin, Yves Van de Peer, and Steven Maere. "Inference of genome duplications from age distributions revisited." Molecular biology and evolution 30.1 (2012): 177-190.
