Latest commit: 3f0ba04 · Swindler · Oct 27, 2023

other_tools.md
Extending AutoHiC to other assembly software

AutoHiC currently has 3d-dna built into its workflow. If you use another scaffolder such as YaHS, SALSA, or Pin_hic, you can still run AutoHiC on its output by following the steps below. This procedure is still being tested and may have problems; if you run into any, please open an issue or contact me: jzjlab@163.com

Install

Since some other dependencies are needed during use, we recommend using conda to prepare the environment.

conda create -n morehic -c bioconda python=3.6 matlock samtools -y

conda activate morehic
  • Download the conversion script
git clone git@github.com:phasegenomics/juicebox_scripts.git
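
Before moving on, it can help to confirm that the tools used in the later steps are reachable from the active environment. The following is a minimal sketch; `check_tools` is a hypothetical helper name and is not part of AutoHiC or juicebox_scripts.

```shell
# check_tools: hypothetical helper, not part of AutoHiC or juicebox_scripts.
# Prints every command that cannot be found on PATH and returns non-zero
# if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (run inside the morehic environment):
#   check_tools python3 samtools matlock git
```

The tool list in the example simply mirrors the commands used in this document; adjust it to your own setup.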

soft download

If you cannot clone it, you can download it from one of the links below. It is in the other_tools folder, and the file is named juicebox_scripts-master.zip.

| Google Drive (recommended) | Baidu Netdisk (百度网盘) | Quark (夸克) |
| --- | --- | --- |
| Pre-trained model | Pre-trained model | Pre-trained model |

Usage

AutoHiC takes a .hic file and a .assembly file as input, so both must be generated first. Generating them requires the genome FASTA file and the BAM file, both of which come from the assembly software you used.

fasta2assembly

First, generate a .assembly file from the genome file.

# fasta to agp
python3 juicebox_scripts/juicebox_scripts/makeAgpFromFasta.py test.fasta out.agp

# agp to assembly
python3 juicebox_scripts/juicebox_scripts/agp2assembly.py out.agp out.assembly

Adjust the juicebox_scripts path to match where you cloned or unpacked it.
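
As a quick sanity check on the conversion, the number of sequences in the FASTA file can be compared with the number of fragment lines in the resulting .assembly file. This is only a sketch under two assumptions: fragment lines in the .assembly file start with ">" (as FASTA headers do), and the conversion added no gap entries. The helper names `count_headers` and `same_fragment_count` are hypothetical.

```shell
# count_headers / same_fragment_count: hypothetical helpers.
# Assumption: FASTA records and .assembly fragment lines both start with ">",
# and no gap entries were added during the conversion.
count_headers() {
    grep -c '^>' "$1"
}

same_fragment_count() {
    # $1 = FASTA file, $2 = .assembly file
    [ "$(count_headers "$1")" -eq "$(count_headers "$2")" ]
}

# Example:
#   same_fragment_count test.fasta out.assembly || echo "counts differ" >&2
```

A mismatch does not always indicate an error (for example, if the AGP contains gaps), but it is a cheap signal that the files deserve a closer look.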

bam2hic

Use the BAM file to generate the corresponding .hic file. This step requires 3d-dna, which can be obtained from the links in the soft download section above.

  • If you have multiple BAM files, merge them first:
# merge BAM files
samtools merge merged.bam input1.bam input2.bam input3.bam
  • Generate the .hic file:
# this step sometimes crashes on memory
# (out.bam is the BAM file to convert, e.g. merged.bam from the step above)
matlock bam2 juicer out.bam out.links.txt

sort -k2,2 -k6,6 out.links.txt > out.sorted.links.txt

# creates .hic file
bash 3d-dna/visualize/run-assembly-visualizer.sh out.assembly out.sorted.links.txt 
# The path of 3d-dna must be replaced according to the actual situation.
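
Because the visualizer step is slow, it may be worth confirming that the links file is non-empty and really is ordered on the same keys given to `sort` above before running it. The sketch below uses `sort -c`, which only checks ordering without rewriting the file. `links_sorted` is a hypothetical helper name, and the check assumes it runs under the same locale that was used for sorting.

```shell
# links_sorted: hypothetical helper.
# Returns 0 when the file is non-empty and already ordered on columns 2 and 6,
# the same keys passed to `sort` in the step above.
links_sorted() {
    [ -s "$1" ] && sort -c -k2,2 -k6,6 "$1" 2>/dev/null
}

# Example:
#   links_sorted out.sorted.links.txt || echo "links file is empty or unsorted" >&2
```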

The steps above make certain assumptions about the contents of the BAM file. If an error is reported while generating out.links.txt, you can preprocess the BAM file with the following commands and try again:

# this BAM file should represent Hi-C reads mapped against starting contigs!
samtools view -h in.bam |sed '/^[^@]/s/^\(.*\)\/[12]\t/\1\t/'|samtools view -Sb -o out.bam

samtools sort -@ 40 -n out.bam -o out.sorted.bam
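
The `sed` expression above removes a trailing "/1" or "/2" mate suffix from read names while leaving header lines alone. To make that behaviour easier to see, here is an alternative sketch written with `awk` that edits only the first column. It is an illustration, not the command from the original workflow, and `strip_mate_suffix` is a hypothetical name.

```shell
# strip_mate_suffix: hypothetical helper illustrating the clean-up step.
# Reads SAM text on stdin, leaves header lines (starting with "@") untouched,
# and removes a trailing "/1" or "/2" from the read name in column 1 only.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         !/^@/ { sub(/\/[12]$/, "", $1) }
         { print }'
}

# Example:
#   samtools view -h in.bam | strip_mate_suffix | samtools view -b -o out.bam -
```

Restricting the substitution to column 1 avoids touching any other field that happens to contain "/1" or "/2" followed by a tab.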

If you encounter the following error, your BAM file does not match the newly assembled genome. Re-align the Hi-C reads to the new genome and use the resulting BAM file.

temp.scaffolds_FINAL.asm_mnd.txt does not exist or does not contain any reads.
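
One way to look for such a mismatch is to compare the reference names recorded in the BAM header with the sequence names in the genome FASTA. The sketch below assumes the header has been saved as text (for example with `samtools view -H in.bam > header.sam`) and that FASTA names are the first whitespace-delimited token after ">". `missing_contigs` is a hypothetical helper. Matching names do not prove the sequences are identical, so treat an empty result as necessary rather than sufficient.

```shell
# missing_contigs: hypothetical helper.
# $1 = SAM header text (e.g. saved from `samtools view -H in.bam`)
# $2 = genome FASTA
# Prints reference names that appear in the header but not in the FASTA.
missing_contigs() {
    awk -F '\t' '
        NR == FNR {                      # first file: the FASTA
            if ($0 ~ /^>/) {
                name = substr($0, 2)
                sub(/[ \t].*$/, "", name)
                seen[name] = 1
            }
            next
        }
        /^@SQ/ {                         # second file: the SAM header
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    ref = substr($i, 4)
                    if (!(ref in seen)) print ref
                }
        }
    ' "$2" "$1"
}

# Example:
#   samtools view -H in.bam > header.sam
#   missing_contigs header.sam genome.fasta
```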

onehic

The morehic environment created above is not compatible with AutoHiC, so you have to create a separate environment following the AutoHiC documentation.

# clone AutoHiC
git clone https://github.com/Jwindler/AutoHiC.git

# cd AutoHiC
cd AutoHiC

# create AutoHiC env
conda env create -f autohic.yaml

# activate AutoHiC
conda activate autohic

# configure the environment
cd ./src/models/swin

# install dependencies
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/

Now you can use onehic.py to adjust the genome based on the acquired out.assembly and out.hic files.

# Enter the AutoHiC directory (replace the path with your own).
cd /home/ubuntu/AutoHiC

# run onehic
python3.9 onehic.py -hic out.hic -asy out.assembly -autohic /home/ubuntu/AutoHiC -p pretrained.pth -out ./

get new fasta

# activate env
conda activate morehic

# get new fasta
python juicebox_assembly_converter.py -a adjusted.assembly -f genome.fasta

Notes

Since this process is currently in testing, if you have any questions, please feel free to contact me (jzjlab@163.com) and I will be happy to help.

Citations

If you used AutoHiC in your research, please cite us:

AutoHiC: a deep-learning method for automatic and accurate chromosome-level genome assembly
Zijie Jiang, Zhixiang Peng, Yongjiang Luo, Lingzi Bie, Yi Wang

bioRxiv 2023.08.27.555031; doi: https://doi.org/10.1101/2023.08.27.555031