Currently, AutoHiC has 3d-dna built into the process. If you are using YaHS, SALSA, Pin_hic, etc., you can extend it by following the steps below. However, this process is still being tested and may have problems. If you run into any, please open an issue or contact me: jzjlab@163.com
Since some other dependencies are needed during use, we recommend using conda to prepare the environment.
conda create -n morehic -c bioconda python=3.6 matlock samtools -y
conda activate morehic
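Before continuing, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch for a POSIX shell; `check_tools` is a hypothetical helper name used only for illustration and is not part of AutoHiC.

```shell
# check_tools: hypothetical helper, not part of AutoHiC.
# Prints every command that cannot be found and returns 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: run inside the activated morehic environment.
# check_tools matlock samtools python
```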
- Download the conversion script
git clone git@github.com:phasegenomics/juicebox_scripts.git
If you cannot clone it, you can download it from the links below. In the other_tools folder, the file is named juicebox_scripts-master.zip.
Google Drive (recommended) | Baidu Netdisk (百度网盘) | Quark (夸克) |
---|---|---|
Pre-trained model | Pre-trained model | Pre-trained model |
Since AutoHiC requires .hic and .assembly files, we have to generate them first. This requires the genome file and the bam files, both of which come from the assembly software you use.
First, generate the .assembly file from the genome file.
# FASTA to AGP
python3 juicebox_scripts/juicebox_scripts/makeAgpFromFasta.py test.fasta out.agp
# AGP to assembly
python3 juicebox_scripts/juicebox_scripts/agp2assembly.py out.agp out.assembly
Replace the juicebox_scripts path with its actual location on your system.
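A quick sanity check after the conversion is to compare the number of sequences in the FASTA file with the number of contig entries in the .assembly file. The sketch below assumes that contig entries in a .assembly file are the lines starting with ">" and that an AGP built directly from a FASTA file yields one entry per sequence; `count_headers` is a hypothetical helper name.

```shell
# count_headers: hypothetical helper, not part of AutoHiC or juicebox_scripts.
# Counts the lines that start with ">" in the given file.
# In a FASTA file these are sequence headers; in a .assembly file they are
# assumed to be the contig entries.
count_headers() {
    grep -c '^>' "$1" || true   # grep exits non-zero on a count of 0; still print it
}

# Example:
# if [ "$(count_headers test.fasta)" -ne "$(count_headers out.assembly)" ]; then
#     echo "warning: sequence counts differ between FASTA and .assembly" >&2
# fi
```

If the counts differ, it is worth inspecting out.agp before moving on to the .hic step.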
Use the bam file to generate the corresponding .hic file. This step requires 3d-dna, which can be obtained from the soft download link above.
- If you have multiple bam files, you can merge them with the following command
# merge bam
samtools merge merged.bam input1.bam input2.bam input3.bam
- Generate the .hic file
# this step sometimes crashes on memory
matlock bam2 juicer out.bam out.links.txt
sort -k2,2 -k6,6 out.links.txt > out.sorted.links.txt
# creates .hic file
bash 3d-dna/visualize/run-assembly-visualizer.sh out.assembly out.sorted.links.txt
# The path of 3d-dna must be replaced according to the actual situation.
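Because the matlock step can fail partway through, it may be worth checking that the links file exists and is not empty before sorting it. This is a minimal sketch; `require_nonempty` is a hypothetical helper name.

```shell
# require_nonempty: hypothetical helper, not part of AutoHiC.
# Returns 1 with a message if the file is missing or has zero size.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "error: $1 is missing or empty" >&2
        return 1
    fi
}

# Example: only sort when matlock actually produced output.
# require_nonempty out.links.txt && sort -k2,2 -k6,6 out.links.txt > out.sorted.links.txt
```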
The above steps make certain assumptions about the contents of the bam file. If an error is reported while generating the out.links.txt file, you can use the following commands
# this BAM file should represent Hi-C reads mapped against starting contigs!
samtools view -h in.bam |sed '/^[^@]/s/^\(.*\)\/[12]\t/\1\t/'|samtools view -Sb -o out.bam
samtools sort -@ 40 -n out.bam -o out.sorted.bam
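To see what the sed expression in the first command does, it can be run on a single SAM-style line: it removes a trailing /1 or /2 from the read name and leaves header lines (those starting with @) untouched. The sketch below assumes GNU sed, since `\t` in a sed pattern is a GNU extension; `strip_mate_suffix` is a hypothetical wrapper name.

```shell
# strip_mate_suffix: hypothetical wrapper around the sed expression used above.
# Reads SAM text on stdin and removes a "/1" or "/2" suffix from read names.
# Assumes GNU sed (for "\t" inside the pattern).
strip_mate_suffix() {
    sed '/^[^@]/s/^\(.*\)\/[12]\t/\1\t/'
}

# Example with a toy alignment line (fields separated by tabs):
# printf 'read1/1\t99\tctg1\n' | strip_mate_suffix
```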
If you encounter the following error, your bam file does not match the newly assembled genome. You need to re-align the reads to the new genome and use the updated bam file.
temp.scaffolds_FINAL.asm_mnd.txt does not exist or does not contain any reads.
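One way to spot such a mismatch early is to compare the reference names in the bam header with the sequence names in the genome FASTA. The sketch below reads a SAM header on stdin (for example from `samtools view -H`) and prints every @SQ name that has no matching FASTA record. `sq_names_missing_from_fasta` is a hypothetical helper name, and matching names alone do not guarantee that the sequences themselves are identical.

```shell
# sq_names_missing_from_fasta: hypothetical helper, not part of AutoHiC.
# Reads a SAM header on stdin and prints each @SQ sequence name that has
# no matching record in the FASTA file given as the first argument.
sq_names_missing_from_fasta() {
    awk '
        NR == FNR {                      # first input: the FASTA file
            if (/^>/) {
                name = substr($1, 2)     # header up to the first whitespace, without ">"
                seen[name] = 1
            }
            next
        }
        /^@SQ/ {                         # second input: the SAM header on stdin
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    sn = substr($i, 4)   # drop the "SN:" prefix
                    if (!(sn in seen)) print sn
                }
        }
    ' "$1" -
}

# Example:
# samtools view -H in.bam | sq_names_missing_from_fasta genome.fasta
```

Any name printed here suggests the bam file was aligned against a different set of sequences.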
Since the environment AutoHiC requires is incompatible with the morehic environment created above, you have to create a separate environment according to the AutoHiC documentation.
# clone AutoHiC
git clone https://github.com/Jwindler/AutoHiC.git
# cd AutoHiC
cd AutoHiC
# create AutoHiC env
conda env create -f autohic.yaml
# activate AutoHiC
conda activate autohic
# configure the environment
cd ./src/models/swin
# install dependencies
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple/
Now you can use onehic.py to adjust the genome based on the generated out.assembly and out.hic files.
# Enter the AutoHiC directory (replace the path with your own).
cd /home/ubuntu/AutoHiC
# run onehic
python3.9 onehic.py -hic out.hic -asy out.assembly -autohic /home/ubuntu/AutoHiC -p pretrained.pth -out ./
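# switch back to the morehic environment
conda activate morehic
# generate the new FASTA (juicebox_assembly_converter.py ships with juicebox_scripts; adjust its path as needed)
python juicebox_assembly_converter.py -a adjusted.assembly -f genome.fasta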
Since this process is currently in testing, if you have any questions, please feel free to contact me (jzjlab@163.com) and I will be happy to help.
If you used AutoHiC in your research, please cite us:
AutoHiC: a deep-learning method for automatic and accurate chromosome-level genome assembly
Zijie Jiang, Zhixiang Peng, Yongjiang Luo, Lingzi Bie, Yi Wang
bioRxiv 2023.08.27.555031; doi: https://doi.org/10.1101/2023.08.27.555031