Skip to content

03.LASTAL Alignment Pipeline

古郡 まさき edited this page May 30, 2024 · 2 revisions

Whole Genome analysis

The pipeline for genome ananlysis

01.Alignment

Lastal was used to align. Install:

    conda install -c bioconda last

Asuming you have 3 genomes, you should choose a ref FASTA Format genome and 2 query genome:

    lastdb -P8 -uMAM8 myDB $ref
    
    ##first query
    
    last-train -P40 --revsym -E0.05 -C2 myDB $query1 >$sp1.train
    lastal -k32 -p $sp1.train myDB $query1 | last-split -fMAF+ >$sp1.maf
    maf-swap $sp1.maf |last-split |maf-swap |maf-sort > $sp1.filter.maf
    
    ##second query
    
    last-train -P40 --revsym -E0.05 -C2 myDB $query2 >$sp2.train
    lastal -k32 -p $sp2.train myDB $query2 | last-split -fMAF+ >$sp2.maf
    maf-swap $sp2.maf |last-split |maf-swap |maf-sort > $sp2.filter.maf

So you can get 2 MAF FORMAT files.

02.Rename

    perl maf.rename.species.S.pl $sp1.filter.maf $sp1 $sp1.last.maf >$sp1.rename.maf
    perl maf.rename.species.S.pl $sp2.filter.maf $sp2 $sp2.last.maf >$sp2.rename.maf

03.Multiz

    ## multiz合并maf(>3个query according to phylogenetic tree for merging)
    multiz M=1 $sp1.rename.maf $sp2.rename.maf 0 U1 U2 > $sp1-$sp2.ma        

04.MAF2LST

    perl 01.convertMaf2List.pl $sp1-$sp2.maf $sp1-$sp2 $sp_ref $sp1,$sp2

05.Catch gene regions

    perl 02.lst2gene.pl $sp1-$sp2 $gff

06.Uncode Region Filter

The uncode region filter script filter.pl

Usage

    perl filter.pl [fasta] [filter_file]

07.contruct tree + root tree + label tree

    for i in genes/*.filter; do echo -e "iqtree -s $i" >> iqtree.sh; done
    for i in genes/*.treefile; do echo -e "nw_reroot  $i  imhausi > $i.nwk" >> reroot.sh; done
    for i in genes/*.nwk; do echo -e "hyphy label-tree.bf --tree $i --list chr.list --output $i.label" >> label.sh ;done

08.PSG(absrel)

    hyphy absrel --alignment genes/evm.model.Contig11.1.fa.filter \
    --tree genes/evm.model.Contig11.1.fa.filter.treefile.tree.nwk \
    --branches Foreground \
    -output genes/evvm.model.Contig11.1.fa.filter.result

09.PSG(busted)

    hyphy busted --alignment genes/evm.model.Contig11.1.fa.filter \
    --tree genes/evm.model.Contig11.1.fa.filter.treefile.tree.nwk \
    --branches Foreground \
    -output genes/evvm.model.Contig11.1.fa.filter.result

10.RELAX

    hyphy relax --alignment  genes/evm.model.Contig11.1.fa.filter \
    --test Foreground \
    --tree genes/evm.model.Contig11.1.fa.filter.treefile.tree.nwk \
    --output genes/evm.model.Contig11.1.fa.filter.relax

Scripts

Psg_count.pl realx_count.pl are used for extract messages from hyphy output file.

hyphy: https://github.com/veg/hyphy

Clone this wiki locally