chapter-09-working-with-range-data

Ranges Chapter Supplementary Material

Why do we use ranges? The terrific Science Web summarizes: "Most bioinformaticians to be replaced by BEDTools".

A note on the Introduction

The simple linear sequence representation of genomes begins to break down when we consider structural variants like insertions, deletions, translocations, and copy number variants; in these cases, everything is better thought of as a graph. There's exciting new work in representing and working with sequences as graphs, e.g. the preprints Dilthey et al., 2014 and Paten et al., 2014.

S4Vectors

As of writing this chapter, parts of the IRanges package are being split off into the new S4Vectors package. I only discuss these lower-level topics briefly, but if you want more information see the IRanges vignette and the S4Vectors page. I don't link these directly in the book because these links and vignette content may change.

Files

Mus_musculus.GRCm38.75_chr1.gtf.gz contains the chromosome 1 annotations extracted from Mus_musculus.GRCm38.75.gtf.gz, which was downloaded from Ensembl's FTP (ftp://ftp.ensembl.org/pub/release-75/gtf/mus_musculus) on 2014-08-02. The chromosome 1 subset was created with:
```
  gzcat Mus_musculus.GRCm38.75.gtf.gz | egrep "^(1\t|#)" | gzip > Mus_musculus.GRCm38.75_chr1.gtf.gz
```
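The filter above keeps header lines (starting with `#`) and records whose first tab-delimited column is exactly `1`. As a minimal sketch of the same logic in Python, the block below applies an equivalent regular expression to a few made-up GTF-like lines. The function name `filter_chr1` and the toy records are hypothetical and only for illustration; this is not how the file in this directory was produced.

```python
import re

# Matches header lines ("#...") or records whose first column is exactly "1".
# The tab after "1" prevents chromosomes such as "10" or "1x" from matching.
CHR1_OR_HEADER = re.compile(r"^(1\t|#)")

def filter_chr1(lines):
    """Yield only header lines and chromosome 1 records."""
    for line in lines:
        if CHR1_OR_HEADER.match(line):
            yield line

# Toy input (made-up records, for illustration only).
example = [
    "#!toy-header\n",
    "1\tprotein_coding\tgene\t100\t500\t.\t-\t.\tgene_id \"g1\";\n",
    "10\tprotein_coding\tgene\t100\t200\t.\t+\t.\tgene_id \"g2\";\n",
    "X\tprotein_coding\tgene\t300\t400\t.\t+\t.\tgene_id \"g3\";\n",
]
kept = list(filter_chr1(example))
print(len(kept))  # 2
```

To run this on a real compressed file, the lines could be read with `gzip.open(path, "rt")` from the standard library.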
mm10_snp137_chr1_trunc.bed.gz is a randomly sampled, shorter version of mm10_snp137_chr1.bed.gz (since GitHub doesn't play well with large files). mm10_snp137_chr1.bed.gz was downloaded from the UCSC Genome Browser's Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) on 2014-08-01. The truncated file was created with the following command (GNU sort is required for `--random-sort`):
```
  $ gzcat mm10_snp137_chr1.bed.gz | sort --random-sort | head -n 2700000 | gzip > mm10_snp137_chr1_trunc.bed.gz
```
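Shuffling all lines and keeping the first N amounts to drawing N lines at random without replacement (assuming the lines are distinct, since GNU sort's `--random-sort` groups identical keys together). As a rough sketch of the same idea without GNU sort, the block below uses reservoir sampling, a different technique from the shell pipeline that needs only one pass and keeps just `k` lines in memory. The name `reservoir_sample` and the toy BED-like lines are hypothetical.

```python
import random

def reservoir_sample(lines, k, seed=None):
    """Return k lines chosen uniformly at random, without replacement."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Keep the new line with probability k / (i + 1),
            # overwriting a randomly chosen slot in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample

# Toy BED-like input (made-up records, for illustration only).
lines = [f"chr1\t{i}\t{i + 1}\n" for i in range(1000)]
subset = reservoir_sample(lines, k=10, seed=42)
print(len(subset))  # 10
```

Unlike the shell pipeline, the sampled lines here are not emitted in shuffled order; shuffle `subset` afterwards if order matters.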

Mus_musculus.GRCm38_genome.txt is a tab-delimited file of all chromosome lengths from the mm10/GRCm38 genome version. It was created with:

    curl ftp://ftp.ensembl.org/pub/release-75/fasta/mus_musculus/dna/Mus_musculus.GRCm38.75.dna_rm.toplevel.fa.gz \
    | bioawk -c fastx '{print $name"\t"length($seq)}' > Mus_musculus.GRCm38_genome.txt

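The bioawk command above prints each sequence's name and length, separated by a tab. For readers without bioawk, here is a minimal Python sketch of the same computation. It assumes the sequence name is the header text up to the first whitespace; the function name `fasta_lengths` and the toy FASTA records are hypothetical.

```python
import io

def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the name is the header up to the first whitespace.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            # Sequence may span several lines; sum their lengths.
            length += len(line)
    if name is not None:
        yield name, length

# Toy FASTA input (made-up sequences, for illustration only).
toy = io.StringIO(">1 toy description\nACGTN\nACG\n>MT toy description\nAC\n")
lengths = list(fasta_lengths(toy))
for name, length in lengths:
    print(f"{name}\t{length}")
```

For a gzip-compressed FASTA file, the handle could come from `gzip.open(path, "rt")` instead of `io.StringIO`.
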
On Python's 0-Indexing

I use Python and R's indexing to introduce 0- and 1-based range systems. Here's Python's creator Guido van Rossum on why Python uses 0-based indexing.
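
As a small illustration of the two systems, the sketch below shows a 0-based, half-open Python slice and its 1-based, closed equivalent (the convention R uses for indexing). The helper names are hypothetical and exist only for this example.

```python
seq = "ACGTACGT"

# Python slices are 0-based and half-open: [start, end) excludes `end`.
start0, end0 = 2, 5
print(seq[start0:end0])  # GTA
width0 = end0 - start0   # width of a half-open range

def to_one_based_closed(start0, end0):
    """Convert a 0-based, half-open range to a 1-based, closed range."""
    return start0 + 1, end0

def to_zero_based_half_open(start1, end1):
    """Convert a 1-based, closed range to a 0-based, half-open range."""
    return start1 - 1, end1

start1, end1 = to_one_based_closed(start0, end0)
print(start1, end1)          # 3 5
width1 = end1 - start1 + 1   # width of a closed range needs the +1
print(width0 == width1)      # True
```

The same range therefore has different coordinates in each system, which is why range widths are computed as `end - start` in the 0-based, half-open case and `end - start + 1` in the 1-based, closed case.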

Why use `biocLite()`?

Why use biocLite() rather than install.packages()? See this description from Bioconductor.

