LeafCutter.smk
Snakemake workflow for differential RNA splicing analysis using LeafCutter.
Note
Please make sure that you have installed Singularity and Snakemake on your system and have cloned the SnakeNgs repository.
Workflow
The rulegraph was created by snakevision.
- Extract junction reads using regtools junctions extract.
- Cluster introns using LeafCutter leafcutter_cluster_regtools.py.
- Extract exon information using LeafCutter gtf_to_exons.R.
- Run differential splicing analysis using LeafCutter leafcutter_ds.R.
- Plot splice junctions using LeafCutter ds_plots.R.
- Make annotation codes using LeafCutter gtf2leafcutter.pl.
- Prepare results for visualization in LeafViz using LeafCutter prepare_results.R.
- Classify clusters using LeafCutter classify_clusters.R.
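The first two steps above (junction extraction and intron clustering) can be sketched in Python as plain command builders. This is a minimal sketch, not the rules in LeafCutter.smk: the BAM names, the output prefix, the script path and the parameter values (8, 50, 500000, XS, 30) are hypothetical placeholders, and the mapping of the config parameters described under Usage onto the regtools flags (-a, -m, -M, -s, -o) and the leafcutter_cluster_regtools.py flags (-j, -o, -m, -l) is an assumption.

```python
"""Sketch: build the junction-extraction and intron-clustering commands.
All paths and parameter values are hypothetical placeholders."""
from pathlib import Path


def regtools_extract_cmd(bam: str, junc: str, anchor: int, min_intron: int,
                         max_intron: int, strand: str) -> list[str]:
    # Assumed mapping: minimum_anchor_length -> -a, minimum_intron_length -> -m,
    # maximum_intron_length -> -M, strand -> -s (e.g. "XS" for unstranded data).
    return ["regtools", "junctions", "extract",
            "-a", str(anchor), "-m", str(min_intron), "-M", str(max_intron),
            "-s", strand, "-o", junc, bam]


def cluster_cmd(juncfile_list: str, out_prefix: str, min_reads: int,
                max_intron: int) -> list[str]:
    # leafcutter_cluster_regtools.py takes a text file listing one .junc path per line (-j).
    # Assumed mapping: minimum_reads -> -m, maximum_intron_length -> -l.
    return ["python", "leafcutter_cluster_regtools.py",
            "-j", juncfile_list, "-o", out_prefix,
            "-m", str(min_reads), "-l", str(max_intron)]


def write_juncfile_list(junc_paths: list[str], list_path: str) -> None:
    # One junction file path per line, as expected by the -j option.
    Path(list_path).write_text("".join(p + "\n" for p in junc_paths))


if __name__ == "__main__":
    bams = ["sample1.bam", "sample2.bam"]  # hypothetical input BAM files
    juncs = [Path(b).with_suffix(".junc").name for b in bams]
    for bam, junc in zip(bams, juncs):
        print(" ".join(regtools_extract_cmd(bam, junc, 8, 50, 500000, "XS")))
    write_juncfile_list(juncs, "juncfiles.txt")
    print(" ".join(cluster_cmd("juncfiles.txt", "leafcutter", 30, 500000)))
```

Returning argument lists rather than shell strings keeps each flag and value separate, which makes them easy to pass to subprocess.run or to check in isolation.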
Usage
config.yaml should contain the following information:
- /path/to/output is the directory where the output files will be saved.
- /path/to/experiment_table.tsv is a tab-separated file in the same format as the experiment table used by Shiba.
The group column must be set to either Ref or Alt for every sample. The workflow performs differential splicing analysis between these two groups.
- /path/to/reference_transcriptome.gtf is the reference transcriptome in GTF format (e.g. Homo_sapiens.GRCh38.106.gtf for the human transcriptome).
- minimum_anchor_length is the minimum anchor length for junction reads.
- minimum_intron_length is the minimum intron length for junction reads.
- maximum_intron_length is the maximum intron length for junction reads.
- strand is the strandedness of the BAM file; use XS for unstranded data. Please refer to the regtools documentation for more information.
- minimum_reads is the minimum number of reads required to cluster the introns.
- min_coverage is the minimum coverage required for an intron to be considered.
- min_samples_per_intron is the minimum number of samples required for an intron to be considered.
- min_samples_per_group is the minimum number of samples required for a group to be considered.
- FDR is the false discovery rate threshold for the differential splicing analysis.
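The experiment-table requirements and the differential-splicing thresholds above can be sketched as follows. This is a minimal sketch under stated assumptions, not the workflow's own code: the sample and group column names, the file names, the mapping of min_samples_per_intron, min_samples_per_group and min_coverage onto the leafcutter_ds.R options of the same names, and the use of FDR as a cutoff on a p.adjust column of leafcutter_ds's cluster significance table (with a status column equal to "Success" for tested clusters) are all assumptions.

```python
"""Sketch: validate the experiment table, derive a LeafCutter groups file,
build a leafcutter_ds.R call and apply an FDR cutoff. Names are assumptions."""
import csv
import io

VALID_GROUPS = {"Ref", "Alt"}


def read_groups(tsv_text: str) -> list[tuple[str, str]]:
    # Assumes a header row with 'sample' and 'group' columns (hypothetical names).
    pairs = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        group = row["group"].strip()
        if group not in VALID_GROUPS:
            raise ValueError(f"sample {row['sample']!r}: group must be Ref or Alt, got {group!r}")
        pairs.append((row["sample"].strip(), group))
    # Both groups are needed for a two-group comparison.
    if {g for _, g in pairs} != VALID_GROUPS:
        raise ValueError("experiment table must contain both Ref and Alt samples")
    return pairs


def groups_file_text(pairs: list[tuple[str, str]]) -> str:
    # Groups file: sample<TAB>group per line, written without a header row.
    return "".join(f"{s}\t{g}\n" for s, g in pairs)


def leafcutter_ds_cmd(counts: str, groups_file: str, exon_file: str,
                      out_prefix: str, cfg: dict) -> list[str]:
    # Assumed mapping of config keys onto leafcutter_ds.R options of the same names.
    return ["Rscript", "leafcutter_ds.R",
            f"--exon_file={exon_file}",
            f"--output_prefix={out_prefix}",
            f"--min_samples_per_intron={cfg['min_samples_per_intron']}",
            f"--min_samples_per_group={cfg['min_samples_per_group']}",
            f"--min_coverage={cfg['min_coverage']}",
            counts, groups_file]


def significant_clusters(sig_tsv_text: str, fdr: float) -> list[str]:
    # Assumes 'cluster', 'status' and 'p.adjust' columns; untested clusters are skipped
    # before p.adjust (which may be NA for them) is parsed.
    rows = csv.DictReader(io.StringIO(sig_tsv_text), delimiter="\t")
    return [r["cluster"] for r in rows
            if r["status"] == "Success" and float(r["p.adjust"]) < fdr]


if __name__ == "__main__":
    table = "sample\tbam\tgroup\nS1\tS1.bam\tRef\nS2\tS2.bam\tAlt\n"  # hypothetical table
    pairs = read_groups(table)
    print(groups_file_text(pairs), end="")
    cfg = {"min_samples_per_intron": 5, "min_samples_per_group": 3, "min_coverage": 20}
    print(" ".join(leafcutter_ds_cmd("leafcutter_perind_numers.counts.gz", "groups.txt",
                                     "exons.txt.gz", "leafcutter_ds", cfg)))
```

The values in cfg are illustrative only; in the workflow they would come from config.yaml. Failing early on a group label other than Ref or Alt avoids a less obvious error later in leafcutter_ds.R.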