Brian Haas edited this page Mar 29, 2023 · 11 revisions

Running PASA via Docker

If you have Docker installed, you can pull our image from DockerHub, which contains PASA and required software components.

Pull the latest Docker image for PASA like so:

% docker pull pasapipeline/pasapipeline

Given a target genome, transcripts, and PASA run-configuration file, you can run PASA from within Docker like so:

# here, $base_dir corresponds to your working directory that contains your input data.
#   Replace $base_dir with your actual directory name (don't use it as a variable)

%  docker run --rm -it -v /tmp:/tmp -v $base_dir:$base_dir \
     pasapipeline/pasapipeline:latest \
     bash -c 'cd $base_dir && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
                 -c alignAssembly.conf -C -R --ALIGNER gmap -g genome.fa -t transcripts.cdna.fasta '
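
The note about not using $base_dir as a variable applies because the single-quoted bash -c string is never expanded by the host shell. If you would rather keep the directory in a variable, here is a minimal sketch of one way to do it, assuming a POSIX shell on the host. The names run, pasa_align, and DRY_RUN are hypothetical helpers introduced for illustration and are not part of PASA or Docker. By default the script only prints the command so you can review it first.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default here); otherwise run it.
# The printed line is for review only: quoting around the bash -c
# argument is not preserved in the printout.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: $1 is the absolute path of the directory that holds
# alignAssembly.conf, genome.fa and transcripts.cdna.fasta.
pasa_align() {
    base_dir="$1"
    # Double quotes: the host shell substitutes $base_dir before Docker runs.
    # Assumes the path contains no single quotes.
    inner="cd '$base_dir' && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl -c alignAssembly.conf -C -R --ALIGNER gmap -g genome.fa -t transcripts.cdna.fasta"
    run docker run --rm -it \
        -v /tmp:/tmp \
        -v "$base_dir:$base_dir" \
        pasapipeline/pasapipeline:latest \
        bash -c "$inner"
}

# Dry run for the current directory: prints the docker command for review.
pasa_align "$PWD"
```

Once the printed command looks right, DRY_RUN=0 pasa_align /path/to/your/data would execute it.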

As a concrete example, this is the docker command I use in my own environment to run PASA on the provided sample data (the paths follow my project structure):

%   docker run --rm -it \
      -v /tmp:/tmp \
      -v /home/bhaas/GITHUB/pasapipeline/sample_data:/home/bhaas/GITHUB/pasapipeline/sample_data  \
       pasapipeline/pasapipeline:latest \
        bash -c 'cd /home/bhaas/GITHUB/pasapipeline/sample_data \
              && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
              -c sqlite.confs/alignAssembly.config -C -R \
              --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean'

The provided sqlite.confs/alignAssembly.config is set up to use a SQLite database at /tmp/sample_mydb_pasa.sqlite, so with -v /tmp:/tmp the database file is written to /tmp on the host.

Example with test data

If you want to try out the docker run command that @brianjohnhaas suggests above, set things up this way:

  1. Let's say you are in a directory called /home/github_pasa
  2. git clone https://github.com/PASApipeline/PASApipeline.git
  3. you will now see PASApipeline/ under /home/github_pasa/
  4. Before the docker run command, stage a working copy of the sample data:
mkdir -p /home/work/temp
cd /home/work/
cp -r /home/github_pasa/PASApipeline/sample_data .
gunzip /home/work/sample_data/genome_sample.fasta.gz
  5. Note that running the docker run command will add new files and folders to the sample_data directory under /home/work/
  6. Run the docker run command from /home/work, where <docker_image:tag> is either pasapipeline/pasapipeline:latest (pulled from DockerHub) or a custom image built from the Dockerfile at https://github.com/PASApipeline/PASApipeline/tree/master/Docker
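
The staging steps above can be sketched as a small function, which makes it easier to repeat them with different directories. This is only an illustration: stage_sample_data is a hypothetical name, and the refusal to overwrite an existing copy is a choice made here, not something PASA requires.

```shell
#!/bin/sh
# Hypothetical helper: copy sample_data out of a PASApipeline clone into a
# work directory and unpack the genome, mirroring the manual steps.
#   $1  path to the clone, e.g. /home/github_pasa/PASApipeline
#   $2  work directory,    e.g. /home/work
stage_sample_data() {
    clone_dir="$1"
    work_dir="$2"

    if [ ! -d "$clone_dir/sample_data" ]; then
        echo "no sample_data under $clone_dir (did git clone finish?)" >&2
        return 1
    fi
    if [ -e "$work_dir/sample_data" ]; then
        echo "$work_dir/sample_data already exists; not overwriting" >&2
        return 1
    fi

    mkdir -p "$work_dir/temp"                     # later mounted as /tmp
    cp -r "$clone_dir/sample_data" "$work_dir/"   # PASA writes into this copy

    # Unpack the genome only if the compressed file is present.
    if [ -f "$work_dir/sample_data/genome_sample.fasta.gz" ]; then
        gunzip "$work_dir/sample_data/genome_sample.fasta.gz"
    fi
}
```

With the directories used in this example, the call would be stage_sample_data /home/github_pasa/PASApipeline /home/work, followed by cd /home/work before the docker run command.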

PASA/Docker Execution Modes (SQLite or MySQL)

Docker using SQLite

  • To run PASA (align step) with SQLite:
docker run --rm -it \
      -v $PWD/temp:/tmp \
      -v $PWD/sample_data:/home/bhaas/GITHUB/pasapipeline/sample_data  \
       pasapipeline/pasapipeline:latest \
        bash -c '  \
        cd /home/bhaas/GITHUB/pasapipeline/sample_data \
              && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
              -c sqlite.confs/alignAssembly.config -C -R \
              --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean'

MySQL internally within Docker

  • To run PASA (align step) with MySQL running inside the Docker container:
docker run --rm -it \
      -v $PWD/temp:/tmp \
      -v $PWD/sample_data:/home/bhaas/GITHUB/pasapipeline/sample_data  \
       pasapipeline/pasapipeline:latest \
        bash -c 'service mysql start && \
        cd /home/bhaas/GITHUB/pasapipeline/sample_data \
              && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
              -c mysql.confs/alignAssembly.config -C -R \
              --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean'

Typically, you need the same database for the downstream annotation step. Because the commands above use --rm, a MySQL database created inside the container does not persist once the run finishes. You would therefore need to export it after the align step with mysqldump sample_mydb_pasa > sample_mydb_pasa.sql and re-import it before the annotation step with mysql sample_mydb_pasa < sample_mydb_pasa.sql. If you use MySQL within Docker, a workflow manager such as Nextflow (https://www.nextflow.io/) makes it easier to chain these steps. Alternatively, use a local installation of MySQL and connect to it from within the Docker container (see below). The name sample_mydb_pasa comes from the DATABASE field in alignAssembly.config: https://github.com/PASApipeline/PASApipeline/blob/master/sample_data/mysql.confs/alignAssembly.config#L6
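
As a rough sketch of the dump and restore steps, the two strings below are inner scripts that could be passed to bash -c in the docker run commands shown earlier. Several things here are assumptions: that the mysqldump, mysqladmin, and mysql clients inside the image can connect without explicit credentials, that writing the dump into the mounted sample_data directory suits your setup, and that the database has to be created before the import (a plain single-database dump does not contain a CREATE DATABASE statement). The variable names are hypothetical.

```shell
#!/bin/sh
# Database name: the DATABASE field of mysql.confs/alignAssembly.config.
db=sample_mydb_pasa
# Where sample_data is mounted inside the container in the examples above.
data_dir=/home/bhaas/GITHUB/pasapipeline/sample_data

# Align step: start MySQL, run PASA, then dump the database into the
# mounted sample_data directory so the dump outlives the container.
align_and_dump="service mysql start \
 && cd $data_dir \
 && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
      -c mysql.confs/alignAssembly.config -C -R \
      --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean \
 && mysqldump $db > $db.sql"

# Restore step for a fresh container: create the empty database, then
# load the dump. Append your downstream PASA command after the import.
restore="service mysql start \
 && cd $data_dir \
 && mysqladmin create $db \
 && mysql $db < $db.sql"

# Each script would be handed to the container like this (not run here):
#   docker run --rm -it -v "$PWD/temp:/tmp" -v "$PWD/sample_data:$data_dir" \
#       pasapipeline/pasapipeline:latest bash -c "$align_and_dump"
printf '%s\n' "$align_and_dump" "$restore"
```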

Local MySQL outside Docker Container

  • If you have MySQL running on your server and want to connect to it from the Docker container, one way is to mount the MySQL socket into the container, like so:

docker run --rm -it \
      -v $PWD/temp:/tmp \
      -v /var/run/mysqld/mysqld.sock:/var/run/mysqld/mysqld.sock \
      -v $PWD/sample_data:/home/bhaas/GITHUB/pasapipeline/sample_data  \
       pasapipeline/pasapipeline:latest \
        bash -c 'cd /home/bhaas/GITHUB/pasapipeline/sample_data \
              && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl \
              -c mysql.confs/alignAssembly.config -C -R \
              --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean'

If you require a custom pasa_conf/conf.txt to connect to an external MySQL server, set the environment variable PASACONF to the path of that file. PASA will then use it instead of the default conf.txt (which assumes localhost).
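
One way to hand a custom conf.txt to the container is to mount the file and set PASACONF with Docker's -e option. The sketch below assumes the file is mounted read-only at /pasa_custom/conf.txt, a location chosen here purely for illustration. The run and pasa_align_custom_conf helpers and the DRY_RUN switch are likewise hypothetical, and by default the command is only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default here); otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: $1 is the absolute host path of your custom conf.txt.
pasa_align_custom_conf() {
    host_conf="$1"
    # Arbitrary location for the file inside the container (an assumption).
    container_conf=/pasa_custom/conf.txt
    run docker run --rm -it \
        -v "$PWD/temp:/tmp" \
        -v "$PWD/sample_data:/home/bhaas/GITHUB/pasapipeline/sample_data" \
        -v "$host_conf:$container_conf:ro" \
        -e "PASACONF=$container_conf" \
        pasapipeline/pasapipeline:latest \
        bash -c 'cd /home/bhaas/GITHUB/pasapipeline/sample_data && /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl -c mysql.confs/alignAssembly.config -C -R --ALIGNER gmap -g genome_sample.fasta -t all_transcripts.fasta.clean'
}

# Dry run: prints the docker command for a conf.txt in the current directory.
pasa_align_custom_conf "$PWD/my_conf.txt"
```

The host and credentials for the external server would go in your conf.txt itself; this sketch only shows how the file reaches PASA.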

Running PASA via Singularity

A Singularity image for PASA is available at https://data.broadinstitute.org/Trinity/CTAT_SINGULARITY/MISC/PASApipeline/.

Running the singularity image is much like running the docker image above. Software locations within the image are identical, as the singularity image is built directly from the docker one.

The syntax for executing PASA via the singularity image is like so:


singularity exec -B $PWD pasapipeline.simg  /usr/local/src/PASApipeline/Launch_PASA_pipeline.pl  ...remaining options as above ...
