
preprocessing_RNAseq.smk

Snakemake workflow for preprocessing paired-end and single-end bulk RNA-seq data. The layout parameter in the config file controls the mode.

Note

Please make sure that Singularity and Snakemake are installed on your system and that you have cloned the SnakeNgs repository.

Workflow

Paired-end (layout: "paired")

preprocessing_RNAseq.smk rulegraph

The rulegraph was created by snakevision.

  1. Quality control using fastp with the default parameters.
  2. Alignment using STAR with the parameter --outFilterMultimapNmax 1.
  3. Convert the SAM file to a BAM file and sort it using samtools.
  4. Collect metrics using Picard CollectRnaSeqMetrics and CollectInsertSizeMetrics.
  5. Make bigWig files using deepTools bamCoverage with the parameter --binSize 1.
  6. Make summary statistics using MultiQC.

Single-end (layout: "single")

preprocessing_RNAseq_single.smk rulegraph

The rulegraph was created by snakevision.

  1. Quality control using fastp with the default parameters.
  2. Alignment using STAR with the parameter --outFilterMultimapNmax 1.
  3. Convert the SAM file to a BAM file and sort it using samtools.
  4. Collect metrics using Picard CollectRnaSeqMetrics.
  5. Make bigWig files using deepTools bamCoverage.
  6. Make summary statistics using MultiQC.

Usage

snakemake -s /path/to/SnakeNgs/snakefile/preprocessing_RNAseq.smk \
--configfile /path/to/config.yaml \
--cores <int> \
--use-singularity \
--rerun-incomplete

config.yaml should contain the following information:

Paired-end

workdir: path/to/output
samples: ["SRRXXXXXX", "SRRYYYYYY", "SRRZZZZZZ"]
star_index: path/to/star_index
gtf: path/to/reference_transcriptome.gtf
layout: "paired"
  • path/to/output should contain a fastq directory with the following structure:
output/
└── fastq
    ├── SRRXXXXXX_1.fastq.gz
    ├── SRRXXXXXX_2.fastq.gz
    ├── SRRYYYYYY_1.fastq.gz
    ├── SRRYYYYYY_2.fastq.gz
    ├── SRRZZZZZZ_1.fastq.gz
    └── SRRZZZZZZ_2.fastq.gz

Single-end

workdir: path/to/output
samples: ["SRRXXXXXX", "SRRYYYYYY", "SRRZZZZZZ"]
star_index: path/to/star_index
gtf: path/to/reference_transcriptome.gtf
layout: "single"
  • path/to/output should contain a fastq directory with the following structure:
output/
└── fastq
    ├── SRRXXXXXX.fastq.gz
    ├── SRRYYYYYY.fastq.gz
    └── SRRZZZZZZ.fastq.gz
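
Before launching the workflow, it can help to confirm that the fastq directory matches the layout shown above. The following is a minimal sketch, not part of SnakeNgs: the helper names expected_fastq_files and missing_fastq_files are hypothetical, and the file-naming pattern is assumed from the two directory trees in this section.

```python
from pathlib import Path


def expected_fastq_files(workdir, samples, layout):
    """Return the FASTQ paths implied by the directory layouts shown above.

    Hypothetical helper: the naming pattern is taken from the example trees,
    not from the workflow source.
    """
    fastq_dir = Path(workdir) / "fastq"
    if layout == "paired":
        # Two files per sample: <sample>_1.fastq.gz and <sample>_2.fastq.gz
        return [
            fastq_dir / f"{sample}_{mate}.fastq.gz"
            for sample in samples
            for mate in (1, 2)
        ]
    if layout == "single":
        # One file per sample: <sample>.fastq.gz
        return [fastq_dir / f"{sample}.fastq.gz" for sample in samples]
    raise ValueError(f"layout must be 'paired' or 'single', got {layout!r}")


def missing_fastq_files(workdir, samples, layout):
    """Return the expected FASTQ paths that do not exist on disk."""
    return [
        path
        for path in expected_fastq_files(workdir, samples, layout)
        if not path.is_file()
    ]
```

Calling missing_fastq_files with the workdir, samples, and layout values from config.yaml returns an empty list when every expected file is present.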

Common settings

  • path/to/reference_transcriptome.gtf is the reference transcriptome in GTF format (e.g. Homo_sapiens.GRCh38.106.gtf for the human transcriptome).
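
The five config keys listed above can be sanity-checked before running Snakemake. This is a rough sketch under stated assumptions: check_config is a hypothetical helper, the config is assumed to be already loaded into a plain dict, and the workflow itself may or may not perform similar checks.

```python
# Keys taken from the example config.yaml in this section.
REQUIRED_KEYS = ("workdir", "samples", "star_index", "gtf", "layout")
VALID_LAYOUTS = ("paired", "single")


def check_config(config):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper, not part of SnakeNgs.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    layout = config.get("layout")
    if layout is not None and layout not in VALID_LAYOUTS:
        problems.append(f"layout should be 'paired' or 'single', got {layout!r}")

    samples = config.get("samples")
    if samples is not None and (not isinstance(samples, list) or not samples):
        problems.append("samples should be a non-empty list of sample names")

    return problems
```

A dict that mirrors the paired-end example yields no problems, while a config with a misspelled layout or an empty samples list is reported.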

Please refer to the tutorial for more information.

Docker image used in the workflow