preprocessing_CLIPseq.smk
Snakemake workflow for preprocessing CLIP-seq data (HITS-CLIP, iCLIP-seq, etc.).
Note
Make sure that Singularity and Snakemake are installed on your system and that you have cloned the SnakeNgs repository.
Workflow
- Quality control using fastp with the parameters specified in `config.yaml` (`fastp_args`).
- Alignment using STAR with the parameter `--outFilterMultimapNmax` specified in `config.yaml`.
- Convert the SAM file to a BAM file and sort it using samtools.
- Remove duplicates using Picard `MarkDuplicates` with the parameter `--REMOVE_DUPLICATES true`.
- Make bigWig files using deepTools `bamCoverage` with the parameter `--binSize 1`. Optionally, normalize using CPM by setting `normalize_bigwig: true` in `config.yaml`.
- Make summary statistics using MultiQC.
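
The steps above can be pictured as one command line per tool. The following Python sketch only builds and prints those command lines for a single sample; it does not run anything and is not taken from the workflow source. All file names and directories are hypothetical placeholders, single-end gzipped reads are assumed, and `--normalizeUsing CPM` is assumed to be the deepTools option behind `normalize_bigwig`. The config values are the iCLIP-seq examples listed under Config parameters.

```python
import shlex


def build_commands(sample, config):
    """Build the per-sample command lines as argument lists; nothing is executed."""
    # All file names below are hypothetical placeholders, not the workflow's real layout.
    raw = f"fastq/{sample}.fastq.gz"
    trimmed = f"fastp/{sample}.fastq.gz"
    star_prefix = f"star/{sample}_"
    sam = f"{star_prefix}Aligned.out.sam"  # STAR's default SAM name for this prefix
    sorted_bam = f"bam/{sample}.sorted.bam"
    dedup_bam = f"bam/{sample}.dedup.bam"
    metrics = f"bam/{sample}.dedup_metrics.txt"
    bigwig = f"bigwig/{sample}.bw"

    bam_coverage = ["bamCoverage", "-b", dedup_bam, "-o", bigwig, "--binSize", "1"]
    if config["normalize_bigwig"]:
        # Assumed deepTools option for CPM normalization.
        bam_coverage += ["--normalizeUsing", "CPM"]

    return [
        # Quality control; fastp_args is passed through unchanged.
        ["fastp", "-i", raw, "-o", trimmed, *shlex.split(config["fastp_args"])],
        # Alignment with the multimapping limit from the config.
        ["STAR", "--genomeDir", config["star_index"],
         "--readFilesIn", trimmed, "--readFilesCommand", "zcat",
         "--outFilterMultimapNmax", str(config["outFilterMultimapNmax"]),
         "--outFileNamePrefix", star_prefix],
        # SAM to sorted BAM.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Duplicate removal.
        ["picard", "MarkDuplicates", "--INPUT", sorted_bam, "--OUTPUT", dedup_bam,
         "--METRICS_FILE", metrics, "--REMOVE_DUPLICATES", "true"],
        # Coverage track at single-base resolution.
        bam_coverage,
    ]


# Example values for iCLIP-seq.
example_config = {
    "star_index": "/path/to/star_index",
    "fastp_args": "--trim_front1 9 --max_len1 35 --length_required 25",
    "outFilterMultimapNmax": 1,
    "normalize_bigwig": True,
}

commands = build_commands("SRRXXXXXX", example_config)
for argv in commands:
    print(shlex.join(argv))
```

MultiQC is left out because it runs once over all samples rather than per sample. Note also that `bamCoverage` needs an indexed BAM file, so a real run would include an indexing step that this sketch omits.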
Usage
config.yaml should contain the following information:
Config parameters
| Parameter | Description | Example |
|---|---|---|
| `workdir` | Path to the output directory | `/path/to/output` |
| `samples` | List of sample names | `["SRRXXXXXX", "SRRYYYYYY"]` |
| `star_index` | Path to the STAR index directory | `/path/to/star_index` |
| `fastp_args` | Additional arguments for fastp | `"-l 20 -3 --trim_front1 5"` (HITS-CLIP) or `"--trim_front1 9 --max_len1 35 --length_required 25"` (iCLIP-seq) |
| `outFilterMultimapNmax` | Maximum number of loci the read is allowed to map to | `100` (HITS-CLIP) or `1` (iCLIP-seq) |
| `normalize_bigwig` | Whether to normalize bigWig files using CPM | `false` (HITS-CLIP) or `true` (iCLIP-seq) |
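
Snakemake exposes the parsed `config.yaml` as a plain dict named `config`, so the six parameters can be sanity-checked before a long run. The sketch below is a hypothetical helper, not part of the workflow; the expected types are inferred from the example values above, and the sample config uses the HITS-CLIP examples.

```python
# Expected keys and the Python types their YAML values would parse to.
# The types are inferred from the example values, not from the workflow source.
EXPECTED_TYPES = {
    "workdir": str,
    "samples": list,
    "star_index": str,
    "fastp_args": str,
    "outFilterMultimapNmax": int,
    "normalize_bigwig": bool,
}


def check_config(config):
    """Return a list of problems; an empty list means the config looks plausible."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif type(config[key]) is not expected:
            # Exact type comparison so that a bool is not accepted where an int is expected.
            found = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {found}")
    return problems


# Example values for HITS-CLIP.
hits_clip_config = {
    "workdir": "/path/to/output",
    "samples": ["SRRXXXXXX", "SRRYYYYYY"],
    "star_index": "/path/to/star_index",
    "fastp_args": "-l 20 -3 --trim_front1 5",
    "outFilterMultimapNmax": 100,
    "normalize_bigwig": False,
}

print(check_config(hits_clip_config))
```

The check only covers presence and type; it does not verify that the paths exist or that the fastp arguments are valid.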
`/path/to/output` should contain a `fastq` directory with the input FASTQ files.
`/path/to/star_index` is the directory containing the STAR index.