satijalab edited this page Aug 17, 2018 · 10 revisions

The Assay class stores single-cell data.

For typical scRNA-seq experiments, a Seurat object will have a single Assay ("RNA"). This assay will also store multiple 'transformations' of the data, including raw counts (@counts slot), normalized data (@data slot), and scaled data for dimensional reduction (@scale.data slot).

For more complex experiments, an object could contain multiple assays. These could include multi-modal data types (CITE-seq antibody-derived tags, ADTs), or imputed/batch-corrected measurements. Each of those assays has the option to store the same data transformations as well.

Slots

Slot            Function
counts          Stores unnormalized data such as raw counts or TPMs
data            Normalized data matrix
scale.data      Scaled data matrix
key             A character string to facilitate looking up features from a specific Assay
var.features    A vector of features identified as variable
meta.features   Feature-level meta data
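As a rough illustration of how these slots are populated, the following sketch builds a small Assay from a made-up count matrix with CreateAssayObject and lists its slots. The toy matrix and the names toy, gene1, and cell1 are assumptions for illustration; they are not part of the PBMC dataset used elsewhere on this page.

```r
library(Seurat)

# Hypothetical toy data: 3 features x 4 cells (not from the PBMC dataset)
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Build an Assay from raw counts; with only counts supplied,
# the data slot starts out as a copy of counts
toy <- CreateAssayObject(counts = counts)

# List the slots defined on the Assay class
slotNames(x = toy)

# Pull the counts and data slots for comparison
counts.slot <- GetAssayData(object = toy, slot = "counts")
data.slot <- GetAssayData(object = toy, slot = "data")
```

The slot argument follows the usage on this page; other Seurat versions may name this argument differently.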

Object Information

Summary information about Assay objects is available through standard R functions. The object's dimensions can be found with the dim, ncol, and nrow functions; cell and feature names can be found with the colnames and rownames functions, respectively, or with the dimnames function.

# The following examples use the RNA assay from the PBMC 3k dataset
> rna
Assay data with 13714 features for 2638 cells
Top 10 variable features:
 PPBP, DOK3, NFE2L2, ARVCF, YPEL2, UBE2D4, FAM210B, CTB-113I20.2, GBGT1,
 GMPPA
# nrow and ncol provide the number of features and cells, respectively
# dim provides both nrow and ncol at the same time
> dim(x = rna)
[1] 13714  2638
# In addition to rownames and colnames, one can use dimnames,
# which returns a two-element list with both rownames and colnames
> head(x = rownames(x = rna))
[1] "AL627309.1"    "AP006222.2"    "RP11-206L10.2" "RP11-206L10.9"
[5] "LINC00115"     "NOC2L"
> head(x = colnames(x = rna))
[1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG"
[5] "AAACCGTGTATGCG" "AAACGCACTGGTAC"
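The transcript above shows dim, rownames, and colnames; nrow, ncol, and dimnames are mentioned but not shown. A minimal sketch of those calls, assuming a made-up toy Assay (the matrix and the name toy are hypothetical, not the PBMC data):

```r
library(Seurat)

# Hypothetical toy data: 3 features x 4 cells
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Number of features (rows) and cells (columns)
n.features <- nrow(x = toy)
n.cells <- ncol(x = toy)

# dimnames returns a list: feature names first, then cell names
names.list <- dimnames(x = toy)
str(object = names.list)
```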

Data Access

Data in an Assay object can be accessed in several ways. Expression data is accessed with the GetAssayData function. Expression data in the data slot can also be pulled with the single [ extract operator. Expression data can be added to any of the counts, data, or scale.data slots with SetAssayData. New data must have the same cells in the same order as the current expression data.

# Slicing data using the single [ extract operator can take
# numeric slices or vectors of row/column names
> rna[1:3, 1:3]
3 x 3 sparse Matrix of class "dgCMatrix"
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC
AL627309.1                 .              .              .
AP006222.2                 .              .              .
RP11-206L10.2              .              .              .
# GetAssayData allows pulling from a specific slot rather than just data
> GetAssayData(object = rna, slot = 'scale.data')[1:3, 1:3]
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC
AL627309.1       -0.06547546    -0.10052277    -0.05804007
AP006222.2       -0.02690776    -0.02820169    -0.04508318
RP11-206L10.2    -0.03596234    -0.17689415    -0.09997719
# SetAssayData example...
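One way the SetAssayData call might look is sketched below. This is an illustration under stated assumptions, not the page author's example: the toy matrix, the name toy, and the hand-rolled log-normalization are all hypothetical, chosen only to produce a matrix with the same features and cells as the Assay.

```r
library(Seurat)

# Hypothetical toy data: 3 features x 4 cells
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Hand-rolled normalization for illustration only: scale each cell to
# 10,000 total counts, then take log(1 + x)
norm <- log1p(x = t(x = t(x = counts) / colSums(x = counts)) * 1e4)

# Replace the data slot; new data keeps the same cells in the same order
toy <- SetAssayData(object = toy, slot = "data", new.data = norm)

# Confirm the data slot now holds the normalized values
GetAssayData(object = toy, slot = "data")[1:3, 1:3]
```

The slot argument is used here as it appears on this page; some Seurat versions may prefer a differently named argument.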

Feature-level meta data can be accessed with the double [[ extract operator. The same operator can also be used to add feature-level meta data. The HVFInfo function serves as a specialized version of the double [[ extract operator, pulling certain columns from the meta data.

# Feature-level meta data is stored as a data frame
# Standard data frame functions work on the meta data data frame
> colnames(x = rna[[]])
[1] "mean"              "dispersion"        "dispersion.scaled"
# HVFInfo pulls mean, dispersion, and dispersion scaled
# Useful for viewing the results of FindVariableFeatures
> head(x = HVFInfo(object = rna))
                     mean dispersion dispersion.scaled
AL627309.1    0.013555659   1.432845        -0.6236875
AP006222.2    0.004695980   1.458631        -0.5728009
RP11-206L10.2 0.005672517   1.325459        -0.8356099
RP11-206L10.9 0.002644177   0.859264        -1.7556304
LINC00115     0.027437275   1.457477        -0.5750770
NOC2L         0.376037723   1.876440        -0.4162432
# One can pull multiple values from the data frame at any time
> head(x = rna[[c('mean', 'dispersion')]])
                     mean dispersion
AL627309.1    0.013555659   1.432845
AP006222.2    0.004695980   1.458631
RP11-206L10.2 0.005672517   1.325459
RP11-206L10.9 0.002644177   0.859264
LINC00115     0.027437275   1.457477
NOC2L         0.376037723   1.876440
# Passing `drop = TRUE` will turn the meta data into a named vector
# with each entry being named for the feature it corresponds to
> head(x = rna[['mean', drop = TRUE]])
   AL627309.1    AP006222.2 RP11-206L10.2 RP11-206L10.9     LINC00115
  0.013555659   0.004695980   0.005672517   0.002644177   0.027437275
        NOC2L
  0.376037723
# Add meta data example
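A minimal sketch of adding feature-level meta data with the double [[ operator, assuming a made-up toy Assay. The column name total.counts and the toy matrix are hypothetical and chosen only for illustration.

```r
library(Seurat)

# Hypothetical toy data: 3 features x 4 cells
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# One value per feature, named by feature so entries line up by name
totals <- rowSums(x = counts)

# Add the values as a new feature-level meta data column
toy[["total.counts"]] <- totals

# The new column now appears in the meta data data frame
feature.meta <- toy[[]]
head(x = feature.meta)
```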

The vector of variable features can be pulled with the VariableFeatures function. VariableFeatures can also set the vector of variable features.

# VariableFeatures both accesses and sets the vector of variable features
> head(x = VariableFeatures(object = rna))
[1] "PPBP"   "DOK3"   "NFE2L2" "ARVCF"  "YPEL2"  "UBE2D4"
# Set variable features example
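Setting the variable features might look like the sketch below. The toy Assay and the choice of gene1 and gene3 are assumptions for illustration; in practice the vector would normally come from FindVariableFeatures.

```r
library(Seurat)

# Hypothetical toy data: 3 features x 4 cells
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Assign a vector of feature names as the variable features
VariableFeatures(object = toy) <- c("gene1", "gene3")

# Read the vector back
var.feats <- VariableFeatures(object = toy)
var.feats
```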

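The key of an Assay can be accessed and set with the Key function. At the Seurat object level, the key serves as a prefix for looking up features from a specific Assay.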

# Key both accesses and sets the key slot for an Assay object
> Key(object = rna)
"rna_"
> Key(object = rna) <- 'myRNA_'
> Key(object = rna)
"myRNA_"
# Pull a feature from the RNA assay on the Seurat level
> head(x = FetchData(object = pbmc, vars.fetch = 'rna_MS4A1'))
               rna_MS4A1
AAACATACAACCAC  0.000000
AAACATTGAGCTAC  2.583047
AAACATTGATCAGC  0.000000
AAACCGTGCTTCCG  0.000000
AAACCGTGTATGCG  0.000000
AAACGCACTGGTAC  0.000000

Methods

Methods for the Assay class can be found with the following:

library(Seurat)
utils::methods(class = 'Assay')
  • [: access expression data from the data slot
  • [[: access feature-level metadata
  • [[<-: add feature-level metadata
  • colMeans: calculate means across columns (cells) of any expression matrix within the Assay
  • colSums: calculate sums across columns (cells) of any expression matrix within the Assay
  • dimnames: get a list with row (feature) and column (cell) names
  • dim: get the number of features (in data) and cells in the Assay
  • GetAssayData: pull one of the expression matrices within the Assay
  • HVFInfo: pull the mean, dispersion, and scaled dispersion from the feature-level metadata
  • Key: get the key assigned to the Assay
  • Key<-: ...
  • merge: ...
  • RenameCells: ...
  • rowMeans: calculate means across rows (features) of any expression matrix within the Assay
  • rowSums: calculate sums across rows (features) of any expression matrix within the Assay
  • SetAssayData: add data to or replace one of the expression matrices within the Assay
  • SubsetData: ...
  • VariableFeatures: pull the names of features designated as variable
  • VariableFeatures<-: assign a vector of features that are considered variable
  • WhichCells: ...
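
The summary methods in this list (rowMeans, rowSums, colMeans, colSums) each operate on one expression matrix within the Assay. A rough sketch of the same calculations, assuming a made-up toy Assay and pulling the matrix explicitly with GetAssayData before applying the Matrix package's functions; this sidesteps the Assay methods themselves, whose exact arguments may differ between Seurat versions.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy data: 3 features x 4 cells
counts <- matrix(
  data = as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Pull the matrix to summarize (here, the counts slot)
mat <- GetAssayData(object = toy, slot = "counts")

# Mean per feature (row) and total per cell (column)
feature.means <- Matrix::rowMeans(x = mat)
cell.sums <- Matrix::colSums(x = mat)
```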