Variational autoencoder for metagenomic binning

RasmussenLab/vamb

Vamb

Read the documentation on how to use Vamb here: https://vamb.readthedocs.io/en/latest/

Vamb is a family of metagenomic binners that feed kmer composition and abundance into a variational autoencoder and cluster the embedding to form bins. Its binners perform excellently on multi-sample data, and pretty well on single-sample data.
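
To make "kmer composition" concrete, here is a minimal sketch of how one might compute relative kmer frequencies for a single contig. It is an illustration only and not Vamb's own implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip kmers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every k-mer over ACGT in `sequence`.

    Hypothetical helper for illustration; k-mers containing any other
    character (e.g. N) are ignored.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k possible k-mers so the output has a fixed layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


freqs = kmer_frequencies("ACGTACGTNACGT", k=2)
print(freqs["AC"], freqs["TA"])
```

A fixed-length vector like this, one per contig, is the kind of compositional feature that can be combined with per-sample abundance before being passed to an encoder.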

Programs in Vamb

The Vamb package contains several programs, including three binners:

  • TaxVamb: A semi-supervised binner that uses taxonomy information from e.g. mmseqs taxonomy. TaxVamb produces the best results, but requires that you have run a taxonomic annotation workflow. Link to article.
  • Vamb: The original binner based on variational autoencoders. This has been upgraded significantly since its original release. Vamb strikes a good balance between speed and accuracy. Link to article.
  • Avamb: An obsolete ensemble model based on Vamb and adversarial autoencoders. Avamb's accuracy lies between that of Vamb and TaxVamb, but it is more computationally demanding than either. We don't recommend running Avamb: if you have the compute to run it, you should run TaxVamb instead. See the Avamb README page for more information. Link to article.

And a taxonomy predictor:

  • Taxometer: This tool refines arbitrary taxonomy predictions (e.g. from mmseqs taxonomy) using kmer composition and co-abundance. Link to article.

See also our tool BinBencher.jl for evaluating metagenomic bins when a ground truth is available, e.g. for simulated data or a mock microbiome.

Quickstart

For more details, and for how to run Vamb on an example dataset, see the documentation.

# Assemble your reads, one assembly per sample, e.g. with SPAdes
for sample in 1 2 3; do
    spades.py --meta -1 ${sample}.fw.fq.gz -2 ${sample}.rv.fq.gz -t 24 -m 100 -o asm_${sample};
done

# Concatenate your assemblies, and rename the contigs to the naming scheme
# S{sample}C{original contig name}. This can be done with a script provided by Vamb
# in the vamb/src directory
python src/concatenate.py contigs.fna.gz asm_{1,2,3}/contigs.fasta

# Estimate sample-wise abundance by mapping reads to the contigs.
# Any mapper will do, but we recommend strobealign with the --aemb flag
mkdir aemb
for sample in 1 2 3; do
    strobealign -t 8 --aemb contigs.fna.gz ${sample}.{fw,rv}.fq.gz > aemb/${sample}.tsv;
done

# Create an abundance TSV file from --aemb outputs using the script in vamb/src dir
python src/merge_aemb.py aemb abundance.tsv

# Run Vamb using the contigs and the abundance TSV file
vamb bin default --outdir vambout --fasta contigs.fna.gz --abundance_tsv abundance.tsv
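
The contig naming scheme S{sample}C{original contig name} used in the concatenation step can be illustrated with a short sketch. This is not the code in `src/concatenate.py`; the helper names `rename_contig` and `split_contig_name` are hypothetical, and the parsing assumes the sample identifier itself contains no uppercase `C` (true for the numeric samples 1, 2, 3 in the example above).

```python
def rename_contig(sample: str, original_name: str) -> str:
    """Build a name following the S{sample}C{original contig name} scheme."""
    return f"S{sample}C{original_name}"


def split_contig_name(name: str) -> tuple[str, str]:
    """Recover (sample, original contig name) from a renamed contig.

    Assumption: the sample identifier has no uppercase 'C', so the first
    'C' after the leading 'S' is the separator.
    """
    if not name.startswith("S") or "C" not in name[1:]:
        raise ValueError(f"Name does not follow the S<sample>C<contig> scheme: {name!r}")
    sample, _, original = name[1:].partition("C")
    return sample, original


renamed = rename_contig("2", "NODE_1_length_500_cov_3.2")
print(renamed)
print(split_contig_name(renamed))
```

Embedding the sample in each contig name is what lets contigs from the concatenated FASTA be traced back to the assembly they came from.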