
footprinting_ATACseq.smk

Snakemake workflow for footprinting analysis of ATAC-seq data using TOBIAS.

Note

Please make sure that Singularity and Snakemake are installed on your system and that you have cloned the SnakeNgs repository.

Workflow

footprinting_ATACseq.smk rulegraph

The rulegraph was created by snakevision.

  1. Merge peak regions from all samples using bedtools merge.
  2. Merge BAM files from all samples using samtools merge.
  3. Correct the bias of ATAC-seq reads in open chromatin using TOBIAS ATACorrect.
  4. Calculate footprint scores from corrected cutsites using TOBIAS FootprintScores.
  5. Estimate differentially bound motifs based on scores, sequence and motifs using TOBIAS BINDetect.
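
As a rough orientation, the five steps above correspond to command lines along the following lines. This is a minimal sketch, not the rules from the Snakefile: the helper `build_commands`, all output file names, and the choice to merge BAM files per group are assumptions made for illustration. Check the workflow source for the options it actually passes.

```python
from typing import Dict, List


def build_commands(bams_by_group: Dict[str, List[str]], peaks: List[str],
                   genome: str, motifs: str, cores: int = 4) -> List[str]:
    """Return illustrative shell commands for the five steps (nothing is executed)."""
    merged_peaks = "merged_peaks.bed"  # hypothetical output name
    cmds = []

    # 1. Merge peak regions from all samples (bedtools merge expects sorted input).
    cmds.append(
        "cat " + " ".join(peaks)
        + f" | cut -f1-3 | sort -k1,1 -k2,2n | bedtools merge -i - > {merged_peaks}"
    )

    footprints = []
    for group, bams in bams_by_group.items():
        # 2. Merge BAM files (assumption: one merged BAM per group).
        #    The merged BAM also needs an index, e.g. via `samtools index`.
        cmds.append(f"samtools merge -@ {cores} {group}.bam " + " ".join(bams))

        # 3. Bias correction; ATACorrect writes <prefix>_corrected.bw into --outdir.
        outdir = f"atacorrect_{group}"
        cmds.append(
            f"TOBIAS ATACorrect --bam {group}.bam --genome {genome} "
            f"--peaks {merged_peaks} --outdir {outdir} --cores {cores}"
        )

        # 4. Footprint scores from the corrected cutsite signal.
        score_bw = f"{group}_footprints.bw"
        cmds.append(
            f"TOBIAS FootprintScores --signal {outdir}/{group}_corrected.bw "
            f"--regions {merged_peaks} --output {score_bw} --cores {cores}"
        )
        footprints.append(score_bw)

    # 5. Differential binding across the groups.
    cmds.append(
        f"TOBIAS BINDetect --motifs {motifs} --signals " + " ".join(footprints)
        + f" --genome {genome} --peaks {merged_peaks} --outdir bindetect"
        + " --cond_names " + " ".join(bams_by_group) + f" --cores {cores}"
    )
    return cmds
```

Printing the returned list for a small input (for example two groups, `Ref` and `Alt`) gives one peak-merging command, three commands per group, and a final BINDetect call.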

Usage

snakemake -s /path/to/SnakeNgs/snakefile/footprinting_ATACseq.smk \
--configfile /path/to/config.yaml \
--cores <int> \
--use-singularity \
--rerun-incomplete

config.yaml should contain the following information:

# general
workdir: /path/to/output
experiment_table: /path/to/experiment_table.tsv

# TOBIAS
genome_fasta: /path/to/genome.fa
cluster_motifs: /path/to/motifs

  • /path/to/output is the directory where the output files will be saved.
  • /path/to/experiment_table.tsv is a tab-separated file with the columns sample, bam, peak and group:
sample  bam peak    group
SRRXXXXXX   /path/to/SRRXXXXXX.sort.rmdup.bam   /path/to/SRRXXXXXX_peaks.narrowPeak Ref
SRRYYYYYY   /path/to/SRRYYYYYY.sort.rmdup.bam   /path/to/SRRYYYYYY_peaks.narrowPeak Ref
SRRZZZZZZ   /path/to/SRRZZZZZZ.sort.rmdup.bam   /path/to/SRRZZZZZZ_peaks.narrowPeak Ref
SRRAAAAAA   /path/to/SRRAAAAAA.sort.rmdup.bam   /path/to/SRRAAAAAA_peaks.narrowPeak Alt
SRRBBBBBB   /path/to/SRRBBBBBB.sort.rmdup.bam   /path/to/SRRBBBBBB_peaks.narrowPeak Alt
SRRCCCCCC   /path/to/SRRCCCCCC.sort.rmdup.bam   /path/to/SRRCCCCCC_peaks.narrowPeak Alt

It is expected that the input BAM files are generated by the preprocessing_ChIPseq.smk workflow and the peak files are generated by the callpeak_ATACseq.smk workflow.

  • genome.fa is the genome FASTA file (e.g. hg38.fa).
  • motifs is a file containing motifs in PFM, JASPAR or MEME format. These motifs are used to scan for binding sites.
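
Before launching the workflow, it can help to sanity-check the experiment table described above. The following is a minimal sketch, assuming only the four columns listed (sample, bam, peak, group); the function name `check_experiment_table` and its `check_files` switch are hypothetical and not part of SnakeNgs.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, List

REQUIRED_COLUMNS = ("sample", "bam", "peak", "group")


def check_experiment_table(path: str, check_files: bool = True) -> Dict[str, List[str]]:
    """Validate a tab-separated experiment table and return sample names per group."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing column(s): {', '.join(missing)}")
        rows = list(reader)

    # Every row must provide a value for each required column.
    for line_no, row in enumerate(rows, start=2):
        empty = [c for c in REQUIRED_COLUMNS if not row[c]]
        if empty:
            raise ValueError(f"line {line_no}: empty field(s): {', '.join(empty)}")

    # Sample names must be unique.
    counts = Counter(row["sample"] for row in rows)
    duplicates = [name for name, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicated sample name(s): {', '.join(duplicates)}")

    groups: Dict[str, List[str]] = {}
    for row in rows:
        if check_files:
            # Optionally confirm that the BAM and peak files exist on disk.
            for column in ("bam", "peak"):
                if not Path(row[column]).is_file():
                    raise FileNotFoundError(
                        f"{row['sample']}: {column} file not found: {row[column]}"
                    )
        groups.setdefault(row["group"], []).append(row["sample"])
    return groups
```

Calling `check_experiment_table("/path/to/experiment_table.tsv")` on the example table would return the sample names keyed by `Ref` and `Alt`, provided the listed files exist; pass `check_files=False` to validate the layout only.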

Docker image used in the workflow