
preprocessing_RNAseq_single.smk

Snakemake workflow for preprocessing single-end bulk RNA-seq data.

Note

Please make sure that Singularity and Snakemake are installed on your system and that you have cloned the SnakeNgs repository.

Workflow

preprocessing_RNAseq_single.smk rulegraph

The rulegraph was created by snakevision.

  1. Quality control using fastp with the default parameters.
  2. Alignment using STAR with the parameter --outFilterMultimapNmax 1, so only uniquely mapped reads are kept.
  3. Conversion of the SAM file to a sorted BAM file using samtools.
  4. Collection of RNA-seq metrics using Picard CollectRnaSeqMetrics.
  5. Generation of bigWig files using deepTools bamCoverage.
  6. Aggregation of summary statistics using MultiQC.
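The steps above can be sketched as the shell commands run for one sample. This is a hypothetical illustration, not the rules in preprocessing_RNAseq_single.smk: the output sub-directories (trim/, star/, bam/, qc/, metrics/, bigwig/, multiqc/), the thread count, the use of `--readFilesCommand zcat` for gzipped input, the `ref_flat` annotation file that Picard needs, and `STRAND_SPECIFICITY=NONE` (an unstranded library) are all assumptions.

```python
"""Hypothetical per-sample command sequence; not the workflow's actual rules."""
from pathlib import PurePosixPath


def build_commands(sample: str, workdir: str, star_index: str,
                   ref_flat: str, threads: int = 4) -> list[str]:
    """Return the shell commands for one single-end sample, in pipeline order.

    All output paths are hypothetical. ``ref_flat`` is an assumed refFlat
    annotation file required by Picard; it is not a config key.
    """
    w = PurePosixPath(workdir)
    raw = w / "fastq" / f"{sample}.fastq.gz"
    trimmed = w / "trim" / f"{sample}.fastq.gz"
    star_prefix = w / "star" / f"{sample}_"
    sam = f"{star_prefix}Aligned.out.sam"  # STAR's default SAM output name
    bam = w / "bam" / f"{sample}.bam"
    return [
        # 1. Quality control with fastp defaults; HTML/JSON reports feed MultiQC
        f"fastp -i {raw} -o {trimmed} -h {w}/qc/{sample}.html -j {w}/qc/{sample}.json",
        # 2. Alignment; reads mapping to more than one locus are discarded
        f"STAR --runThreadN {threads} --genomeDir {star_index} "
        f"--readFilesIn {trimmed} --readFilesCommand zcat "
        f"--outFilterMultimapNmax 1 --outFileNamePrefix {star_prefix}",
        # 3. SAM -> coordinate-sorted BAM, then index it for bamCoverage
        f"samtools sort -@ {threads} -o {bam} {sam}",
        f"samtools index {bam}",
        # 4. RNA-seq metrics (unstranded library assumed)
        f"picard CollectRnaSeqMetrics I={bam} O={w}/metrics/{sample}.txt "
        f"REF_FLAT={ref_flat} STRAND_SPECIFICITY=NONE",
        # 5. Coverage track in bigWig format
        f"bamCoverage -b {bam} -o {w}/bigwig/{sample}.bw",
    ]


def multiqc_command(workdir: str) -> str:
    """6. MultiQC runs once over the reports of all samples."""
    w = PurePosixPath(workdir)
    return f"multiqc {w} -o {w}/multiqc"
```

Steps 1 to 5 run independently for every sample, while MultiQC runs once at the end, which is why it is kept in a separate function.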

Usage

snakemake -s /path/to/SnakeNgs/snakefile/preprocessing_RNAseq_single.smk \
--configfile /path/to/config.yaml \
--cores <int> \
--use-singularity \
--rerun-incomplete
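If you launch the workflow from a script, the command above can be built as an argument list. This is a minimal sketch: the helper names `snakemake_argv` and `run_workflow` are hypothetical, and the optional `-n` flag (Snakemake's dry run, which lists the jobs without executing them) is an addition to the documented command.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def snakemake_argv(snakengs_dir: str, configfile: str, cores: int,
                   dry_run: bool = False) -> list[str]:
    """Build the documented snakemake command as an argument list."""
    snakefile = PurePosixPath(snakengs_dir) / "snakefile" / "preprocessing_RNAseq_single.smk"
    argv = [
        "snakemake",
        "-s", str(snakefile),
        "--configfile", configfile,
        "--cores", str(cores),
        "--use-singularity",
        "--rerun-incomplete",
    ]
    if dry_run:
        argv.append("-n")  # dry run: show the planned jobs only
    return argv


def run_workflow(argv: list[str]) -> None:
    """Launch Snakemake; raises CalledProcessError if the workflow fails."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = snakemake_argv("/path/to/SnakeNgs", "/path/to/config.yaml", cores=8, dry_run=True)
    print(shlex.join(argv))  # inspect the exact command before running it
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.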

config.yaml should contain the following keys:

workdir: path/to/output
samples: ["SRRXXXXXX", "SRRYYYYYY", "SRRZZZZZZ"]
star_index: path/to/star_index
gtf: path/to/reference_transcriptome.gtf
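The config file can also be generated from Python without a YAML library. This sketch relies on the fact that JSON-quoted strings and JSON lists are valid YAML values; the function name `write_config` is hypothetical.

```python
import json
from pathlib import Path


def write_config(path, workdir, samples, star_index, gtf):
    """Write a config.yaml with the four keys listed above and return its text.

    Each value is serialized with json.dumps; a quoted string or a
    flow-style list such as ["A", "B"] is also valid YAML.
    """
    entries = {
        "workdir": str(workdir),
        "samples": list(samples),
        "star_index": str(star_index),
        "gtf": str(gtf),
    }
    text = "".join(f"{key}: {json.dumps(value)}\n" for key, value in entries.items())
    Path(path).write_text(text)
    return text
```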
  • path/to/output should contain a fastq directory with the following structure:
output/
└── fastq
    ├── SRRXXXXXX.fastq.gz
    ├── SRRYYYYYY.fastq.gz
    └── SRRZZZZZZ.fastq.gz
  • path/to/star_index is the directory containing the STAR index.

  • path/to/reference_transcriptome.gtf is the reference transcriptome annotation in GTF format (e.g. Homo_sapiens.GRCh38.106.gtf for human).
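Before starting a long run, it can help to check that the inputs match the layout described above. This is a minimal sketch with a hypothetical `check_inputs` helper. It only checks what this page states, plus one heuristic: it looks for genomeParameters.txt, a file STAR writes when it builds an index. That check is an assumption, not a requirement of the workflow.

```python
from pathlib import Path


def check_inputs(workdir, samples, star_index, gtf):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    fastq_dir = Path(workdir) / "fastq"
    # One <workdir>/fastq/<sample>.fastq.gz file is expected per sample
    for sample in samples:
        fq = fastq_dir / f"{sample}.fastq.gz"
        if not fq.is_file():
            problems.append(f"missing FASTQ: {fq}")
    index = Path(star_index)
    if not index.is_dir():
        problems.append(f"STAR index directory not found: {index}")
    elif not (index / "genomeParameters.txt").is_file():
        # Heuristic only: STAR writes this file when it generates an index
        problems.append(f"no genomeParameters.txt in {index}; is this a STAR index?")
    if not Path(gtf).is_file():
        problems.append(f"GTF file not found: {gtf}")
    return problems
```

Collecting every problem before returning, instead of stopping at the first, lets you fix a misnamed sample list or a wrong path in one pass.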

Please refer to the tutorial for more information.

Docker image used in the workflow