daler/pybedtools

Overview

The BEDTools suite of programs is widely used for genomic interval manipulation or "genome algebra". pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.

See full online documentation, including installation instructions, at https://daler.github.io/pybedtools/.

The GitHub repo is at https://github.com/daler/pybedtools.

Why pybedtools?

Here is an example to get the names of genes that are <5 kb away from intergenic SNPs:

from pybedtools import BedTool

snps = BedTool('snps.bed.gz')  # [1]
genes = BedTool('hg19.gff')    # [1]

intergenic_snps = snps.subtract(genes)                       # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]

for gene in nearby:             # [4]
    if int(gene[-1]) < 5000:    # [4]
        print(gene.name)        # [4]

Useful features shown here include:

  • [1] support for all BEDTools-supported formats (here gzipped BED and GFF)
  • [2] wrapping of all BEDTools programs and arguments (here, subtract and closest, and passing the -d flag to closest)
  • [3] streaming results (like Unix pipes, here specified by stream=True)
  • [4] iterating over results while accessing feature data by index or by attribute (here [-1] and .name)
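To make [3] and [4] concrete without needing BEDTools installed, here is a minimal pure-Python sketch of what index and attribute access mean for one line of closest -d output (9 GFF columns, then the SNP columns, then the distance). The Feature class and iter_features generator are hypothetical names for illustration only; this is not how pybedtools implements its own feature objects, and the sample lines are made up. The name lookup assumes GFF3-style key=value attributes and checks ID, Name, and gene_id, the same keys the shell version below searches for.

```python
import re
from typing import Iterable, Iterator, List


class Feature:
    """Hypothetical wrapper around one tab-delimited line of `closest -d` output.

    Illustration only; this is NOT pybedtools' own feature class.
    """

    # GFF3 attribute keys searched for a name, in this order (an assumption)
    NAME_KEYS = ("ID", "Name", "gene_id")

    def __init__(self, line: str) -> None:
        self.fields: List[str] = line.rstrip("\n").split("\t")

    def __getitem__(self, i: int) -> str:
        # Index access, including negative indices: [-1] is the last column
        return self.fields[i]

    @property
    def name(self) -> str:
        # The 9th GFF column (index 8) holds key=value attributes separated by ';'
        attrs = self.fields[8]
        for key in self.NAME_KEYS:
            match = re.search(rf"(?:^|;)\s*{re.escape(key)}=([^;]+)", attrs)
            if match:
                return match.group(1)
        return ""


def iter_features(lines: Iterable[str]) -> Iterator[Feature]:
    # A generator handles one line at a time instead of loading everything,
    # which is the same idea as stream=True (or a Unix pipe).
    for line in lines:
        if line.strip():
            yield Feature(line)


# Made-up sample lines: 9 GFF columns, 4 BED columns for the SNP, then the distance
sample = [
    "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=geneA;Name=ABC\tchr1\t2500\t2501\trs1\t500\n",
    "chr1\tsrc\tgene\t9000\t9500\t.\t-\t.\tID=geneB\tchr1\t20000\t20001\trs2\t10500\n",
]

for gene in iter_features(sample):
    if int(gene[-1]) < 5000:
        print(gene.name)  # prints geneA
```

The loop at the bottom has the same shape as the pybedtools example: because the distance is always the last column, gene[-1] finds it without counting how many columns the SNP file has.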

In contrast, here is the same analysis using shell scripting. Note that this requires knowledge of Perl, bash, and awk. The run time is identical to that of the pybedtools version above:

snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps

snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))

intersectBed -a $snps -b $genes -v > $intergenic_snps

closestBed -a $genes -b $intergenic_snps -d \
| awk -F'\t' '($'$distance_field' < 5000){print $9;}' \
| perl -ne 'print "$1\n" if m/(?:ID|Name|gene_id)=([^;\n]+)/'

rm $intergenic_snps
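The distance_field arithmetic in the script is needed because awk addresses columns by 1-based position, and closestBed -d writes the gene (-a) columns first, then the SNP (-b) columns, then the distance. The Python sketch below repeats that calculation on made-up lines (distance_field is a hypothetical helper name, not part of pybedtools) and shows that the computed column is always the last one, which is why the pybedtools version can simply use gene[-1] instead of counting columns.

```python
GFF_FIELDS = 9  # a GFF line always has 9 columns (gene_fields=9 in the script)


def distance_field(snp_fields: int, gene_fields: int = GFF_FIELDS) -> int:
    """1-based awk column of the distance that closestBed -d appends.

    Output order: gene (-a) columns, then SNP (-b) columns, then the distance.
    """
    return gene_fields + snp_fields + 1


# Made-up 4-column BED line for a SNP (the script gets this count from awk's NF)
snp_line = "chr1\t2500\t2501\trs1"
snp_fields = len(snp_line.split("\t"))  # 4

# Made-up closestBed -d output line: 9 GFF columns + SNP columns + distance
gene_line = "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=geneA"
out_fields = (gene_line + "\t" + snp_line + "\t500").split("\t")

col = distance_field(snp_fields)
print(col)  # prints 14
# awk's $14 is Python's out_fields[13], which is also simply out_fields[-1]
print(out_fields[col - 1], out_fields[-1])  # prints 500 500
```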

See the Shell script comparison in the documentation for more details, or continue reading the full documentation at https://daler.github.io/pybedtools.

About

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")