
MaoYafei/GenoDup-Pipeline


GenoDup Pipeline

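This Python script detects whole-genome duplication (WGD) with the dS-based method, which infers duplication events from the number of synonymous substitutions per synonymous site (dS) between duplicated genes.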

Run the program

  1. Calculate dS values from gene family data (paralog dS values):

python GenoDup.py -s Nuclear_sequence_file -p Protein_sequence_file -g Gene_family_file -n Maximum_number_of_gene_family

  2. Calculate dS values from anchor gene pair data (anchor dS values):

python GenoDup.py -s Nuclear_sequence_file -p Protein_sequence_file -c Gene_pair_file
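In both modes, the dS value of each gene pair is estimated with codeml from PAML. GenoDup.py's exact codeml settings are not documented here, so the following is only a minimal sketch, assuming one codon alignment per gene pair, of a control file for a pairwise codeml run. `runmode = -2` (pairwise comparisons) and `seqtype = 1` (codon data) are standard codeml options; the helper `codeml_ctl_text` and the remaining option values are illustrative assumptions, not taken from GenoDup.py.

```python
def codeml_ctl_text(seqfile: str, outfile: str) -> str:
    """Build a codeml control file for pairwise dN/dS estimation of one gene pair.

    Hypothetical helper: the values below are common choices for pairwise
    codon analyses, not settings taken from GenoDup.py.
    """
    options = [
        ("seqfile", seqfile),  # codon alignment of the two genes
        ("outfile", outfile),  # main codeml result file (dS is reported here)
        ("noisy", 0),
        ("verbose", 0),
        ("runmode", -2),       # -2 = pairwise comparisons between sequences
        ("seqtype", 1),        # 1 = codon sequences
        ("CodonFreq", 2),      # 2 = F3x4 codon frequency model
        ("model", 0),
        ("NSsites", 0),
        ("icode", 0),          # 0 = universal genetic code
        ("fix_kappa", 0),      # estimate the transition/transversion ratio
        ("kappa", 2),          # initial kappa value
        ("fix_omega", 0),      # estimate omega (dN/dS)
        ("omega", 0.4),        # initial omega value
        ("cleandata", 1),      # remove sites with gaps or ambiguity codes
    ]
    return "".join(f"{key} = {value}\n" for key, value in options)
```

To use it, write the returned text to a file such as `pair1.ctl` and run `codeml pair1.ctl`; the file names here are placeholders.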

Input

  1. Nuclear_sequence_file: the nucleotide (coding) sequences of all genes in your analysis (FASTA format).
e.g.:
    >gene1
      ATCG
    >gene2
      ATCC
    ...
  2. Protein_sequence_file: the protein sequences of the same genes (FASTA format), using the same sequence IDs as Nuclear_sequence_file.
e.g.:
    >gene1
      PAPA
    >gene2
      PAPA
    ...
  3. Gene_family_file: gene family clusters, one family per line (usually produced by OrthoMCL).
e.g.:
    led1: gene1,gene2,gene3
    led2: gene3,gene4
    ...
  4. Gene_pair_file: pairs of ohnologs, one pair per line in two tab-separated columns (can be produced by MCScanX, OrthoMCL or i-ADHoRe).
e.g.:
    gene1 gene2
    gene3 gene4
    ...
  5. Maximum_number_of_gene_family: the maximum number of genes in a gene family to be analyzed; only used with -g (suggested: 5-15).
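The two input modes differ in where gene pairs come from. A minimal sketch of turning the Gene_family_file and Gene_pair_file formats shown above into gene pairs, assuming that -n caps the number of genes per family (larger families are skipped) and that every two-gene combination within a family is compared; GenoDup.py may select pairs differently. `parse_families`, `paralog_pairs` and `read_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def parse_families(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines like 'led1: gene1,gene2,gene3' into {family_id: [genes]}."""
    families: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        family_id, _, members = line.partition(":")
        genes = [g.strip() for g in members.split(",") if g.strip()]
        families[family_id.strip()] = genes
    return families


def paralog_pairs(families: Dict[str, List[str]], max_size: int) -> List[Tuple[str, str]]:
    """Return all within-family gene pairs, skipping families larger than max_size."""
    pairs: List[Tuple[str, str]] = []
    for genes in families.values():
        if len(genes) > max_size:
            continue  # assumption: this is how the -n limit is applied
        pairs.extend(combinations(genes, 2))
    return pairs


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse a Gene_pair_file: two gene IDs per line, separated by a tab or spaces."""
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs
```

For the example family file above, `paralog_pairs(parse_families(["led1: gene1,gene2,gene3", "led2: gene3,gene4"]), 5)` yields the pairs (gene1, gene2), (gene1, gene3), (gene2, gene3) and (gene3, gene4). A family with k genes contributes k(k-1)/2 pairs, which is why capping family size keeps the number of codeml runs manageable.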

Output

  1. pairwise directory: the sequences of every gene pair (FASTA format).
  2. PAML_result: all codeml output files.
  3. dS_value.txt: the dS values of all gene pairs, as estimated by codeml.

Visualization

Plot the dS values with the provided R script:

Rscript plot_Genodup.r
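A WGD typically shows up as a peak in the distribution of dS values. For a quick look without R, a rough text histogram can be drawn in Python. This sketch assumes dS_value.txt has one gene pair per line with the dS value in the last tab-separated column (check the actual file layout), and discards values above 5, a saturation cutoff often used in the literature; `load_ds` and `text_histogram` are hypothetical names.

```python
from typing import Iterable, List


def load_ds(lines: Iterable[str], max_ds: float = 5.0) -> List[float]:
    """Collect dS values, assuming the value sits in the last tab-separated column."""
    values: List[float] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            ds = float(fields[-1])
        except ValueError:
            continue  # skip header lines or non-numeric entries
        if 0.0 < ds <= max_ds:  # keep only 0 < dS <= max_ds (drop zero and saturated values)
            values.append(ds)
    return values


def text_histogram(values: List[float], bin_width: float = 0.1, max_ds: float = 5.0) -> List[int]:
    """Count values per bin of width bin_width over (0, max_ds] and print a bar chart."""
    n_bins = int(round(max_ds / bin_width))
    counts = [0] * n_bins
    for ds in values:
        index = min(int(ds / bin_width), n_bins - 1)  # clamp dS == max_ds into the last bin
        counts[index] += 1
    for i, count in enumerate(counts):
        if count:
            print(f"{i * bin_width:4.1f}-{(i + 1) * bin_width:4.1f} {'#' * count}")
    return counts
```

Typical use is `text_histogram(load_ds(open("dS_value.txt")))`. A text chart only hints at peaks; formal WGD inference usually fits mixture models to the distribution, as discussed in the literature listed under Others.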

Citations

 1. Mao, Yafei. "GenoDup Pipeline: a tool to detect genome duplication using the dS-based method." PeerJ 7 (2019): e6303.
 2. Abascal, Federico, Rafael Zardoya, and Maximilian J. Telford. "TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations." Nucleic Acids Research 38.suppl_2 (2010): W7-W13.
 3. Katoh, Kazutaka, and Daron M. Standley. "MAFFT multiple sequence alignment software version 7: improvements in performance and usability." Molecular Biology and Evolution 30.4 (2013): 772-780.
 4. Yang, Ziheng. "PAML: a program package for phylogenetic analysis by maximum likelihood." Bioinformatics 13.5 (1997): 555-556.

Others

For background on how dS values are calculated and used to infer WGD, see:
  1. Vanneste, Kevin, et al. "Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary." Genome Research 24.8 (2014): 1334-1347.
  2. Berthelot, Camille, et al. "The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates." Nature Communications 5 (2014): 3657.
  3. Vanneste, Kevin, Yves Van de Peer, and Steven Maere. "Inference of genome duplications from age distributions revisited." Molecular Biology and Evolution 30.1 (2012): 177-190.
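Studies like these treat dS as a molecular clock: with a constant synonymous substitution rate r (substitutions per synonymous site per year), both copies of a duplicated gene accumulate substitutions independently, so a dS peak corresponds to an approximate age T = dS / (2r). A minimal sketch of that conversion follows; `ds_to_age_mya` is a hypothetical name, and the rate must come from the literature for your lineage (the value used below is a placeholder, not a recommendation).

```python
def ds_to_age_mya(ds: float, rate_per_year: float) -> float:
    """Convert a dS value to an approximate duplication age in million years.

    Assumes a strict molecular clock: T = dS / (2 * r), where r is the
    synonymous substitution rate per site per year. The factor 2 accounts
    for substitutions accumulating independently in both gene copies.
    """
    if ds < 0 or rate_per_year <= 0:
        raise ValueError("dS must be >= 0 and the rate must be positive")
    return ds / (2.0 * rate_per_year) / 1e6
```

For example, `ds_to_age_mya(0.6, 1.0e-8)` gives about 30 million years. Saturation at high dS and rate variation among lineages make such ages rough estimates.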
