UMI count from single-nucleus RNA-seq
This tutorial walks you through the single-nucleus RNA-seq (snRNA-seq) UMI counting workflow using the kb-nac.smk pipeline. You will download a public 10x Chromium snRNA-seq dataset, prepare reference files, run the pipeline, and inspect the outputs.
By the end of this tutorial, you will have:
- A kallisto index
- Filtered and unfiltered count matrices in h5ad format
- BUS file inspection reports
- A MultiQC summary report
Prerequisites
Before starting, make sure you have the following installed and configured:
- Singularity (≥ 3.7)
- Snakemake (≥ 7.0)
- SnakeNgs repository cloned locally
- ngsfetch for downloading FASTQ files
- ~20 GB of free disk space (for reference genome and output files)
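The checklist above can be verified with a short pre-flight script. This is a minimal sketch, not part of SnakeNgs: it assumes the executables are named `singularity`, `snakemake`, and `ngsfetch` on your `PATH`, and it does not check versions or whether the SnakeNgs repository is cloned.

```python
import shutil
from pathlib import Path

# Assumed executable names; adjust if your installation differs.
REQUIRED_TOOLS = ("singularity", "snakemake", "ngsfetch")
MIN_FREE_GB = 20  # taken from the prerequisites list


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_disk_gb(path="."):
    """Return free space in GB (10**9 bytes) on the filesystem holding `path`."""
    return shutil.disk_usage(Path(path)).free / 1e9


def preflight(workdir=".", tools=REQUIRED_TOOLS, min_free_gb=MIN_FREE_GB):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = [f"{tool} not found on PATH" for tool in missing_tools(tools)]
    free = free_disk_gb(workdir)
    if free < min_free_gb:
        problems.append(
            f"only {free:.1f} GB free in {workdir}, need ~{min_free_gb} GB"
        )
    return problems
```

Calling `preflight()` from the intended working directory returns a list of problems to fix before continuing; an empty list means the tools were found and enough disk space is available.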
1. Download example data
In this tutorial, we use two samples from a study on nonsense-mediated mRNA decay (NMD) in mouse cortical development (GSE295222, BioProject PRJNA1253721; Lin et al., Cell Rep 2026). This dataset profiled E17.5 mouse cortex nuclei using the 10x Genomics Chromium platform.
| Sample | Accessions (2 lanes) | Genotype | Instrument | Layout |
|---|---|---|---|---|
| GSM8943907 (Control) | SRR33238112, SRR33238113 | Upf2 fl/+ | NovaSeq X Plus | Paired-end |
| GSM8943908 (Upf2cKO) | SRR33238110, SRR33238111 | Upf2 fl/fl;Emx1-Cre | NovaSeq X Plus | Paired-end |
Each sample was sequenced across two lanes on an Illumina NovaSeq X Plus.
Download the FASTQ files using ngsfetch.
Note
For 10x Chromium data, _1.fastq.gz typically contains the cell barcode + UMI (R1) and _2.fastq.gz contains the cDNA insert (R2). Verify that your files follow this convention.
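One way to verify the convention is to look at read lengths. The sketch below assumes the barcode read was sequenced at the usual 10x length: 28 bp for Chromium v3 (16 bp cell barcode + 12 bp UMI) or 26 bp for v2 (16 bp + 10 bp). If your R1 was sequenced longer, this check will not match and you should inspect the files manually. The function names are illustrative only.

```python
import gzip
from collections import Counter
from itertools import islice


def read_lengths(fastq_gz, n_reads=1000):
    """Count sequence lengths among the first `n_reads` records of a gzipped FASTQ."""
    lengths = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # Each FASTQ record spans four lines; the sequence is the second one.
        for i, line in enumerate(islice(handle, 4 * n_reads)):
            if i % 4 == 1:
                lengths[len(line.strip())] += 1
    return lengths


def looks_like_barcode_read(fastq_gz, expected_len=28, n_reads=1000):
    """Return True if the most common read length equals `expected_len`."""
    lengths = read_lengths(fastq_gz, n_reads)
    return bool(lengths) and lengths.most_common(1)[0][0] == expected_len
```

For example, `looks_like_barcode_read("SRR33238112_1.fastq.gz")` should return `True` for a v3 library sequenced with a 28 bp R1, while the matching `_2.fastq.gz` file should show a longer, cDNA-length read.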
2. Prepare the experiment table
Create an experiment_table.tsv file that maps sample names to their FASTQ file paths.
Replace /path/to/ with the actual absolute paths on your system.
Note
If your samples were sequenced across multiple lanes, provide comma-separated paths for each lane.
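The comma-separated lane convention can be generated programmatically rather than typed by hand. The sketch below is illustrative: the column names `sample`, `fastq1`, and `fastq2` are placeholders and not the confirmed schema of kb-nac.smk, so check the usage documentation for the header the pipeline expects and pass it via `header`.

```python
import csv
import io


def lane_field(paths):
    """Join per-lane FASTQ paths into one comma-separated field."""
    paths = [str(p) for p in paths]
    if any("," in p or "\t" in p for p in paths):
        raise ValueError("paths must not contain commas or tabs")
    return ",".join(paths)


def experiment_table(samples, header=("sample", "fastq1", "fastq2")):
    """Render a TSV with one row per sample.

    `samples` maps a sample name to a list of (R1, R2) path pairs, one pair
    per lane. The default header names are hypothetical placeholders.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for name, lanes in samples.items():
        r1 = lane_field(r1 for r1, _ in lanes)
        r2 = lane_field(r2 for _, r2 in lanes)
        writer.writerow([name, r1, r2])
    return buffer.getvalue()
```

For the control sample in this tutorial, the lanes would be the `(…_1.fastq.gz, …_2.fastq.gz)` pairs for SRR33238112 and SRR33238113; write the returned string to experiment_table.tsv.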
3. Prepare reference files
The pipeline requires a reference genome FASTA and a GTF annotation file. The kallisto index will be built automatically by the pipeline using kb ref.
Download the mouse reference files from Ensembl.
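As a rough sketch of the download step, the helper below builds Ensembl FTP URLs for the mouse primary assembly and GTF and fetches them. The Ensembl release number is not stated in this tutorial, so `release` is an assumption you must choose yourself, and the helper names are hypothetical. Whether the pipeline accepts gzipped files or needs them decompressed is also not confirmed here.

```python
from pathlib import Path
from urllib.request import urlretrieve

ENSEMBL_BASE = "https://ftp.ensembl.org/pub"


def mouse_reference_urls(release, assembly="GRCm39"):
    """Build genome FASTA and GTF URLs for a given Ensembl release."""
    prefix = f"{ENSEMBL_BASE}/release-{release}"
    return {
        "fasta": (
            f"{prefix}/fasta/mus_musculus/dna/"
            f"Mus_musculus.{assembly}.dna.primary_assembly.fa.gz"
        ),
        "gtf": (
            f"{prefix}/gtf/mus_musculus/"
            f"Mus_musculus.{assembly}.{release}.gtf.gz"
        ),
    }


def download_all(urls, dest_dir="reference"):
    """Download each URL into `dest_dir`, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = {}
    for kind, url in urls.items():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, target)
        paths[kind] = target
    return paths
```

Calling `download_all(mouse_reference_urls(release))` with the release you settled on returns the local paths to reference in the configuration file.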
4. Create the configuration file
Create a config.yaml file in the working directory.
Replace /path/to/ with the actual absolute paths on your system.
The technology parameter specifies the single-cell assay. Common options include:
| Technology | Description |
|---|---|
| 10xv2 | 10x Chromium v2 |
| 10xv3 | 10x Chromium v3 |
| BDWTA | BD Rhapsody |
| INDROPSV3 | inDrops v3 |
| Visium | 10x Visium spatial |
For the full list of supported technologies, see the kb-nac.smk usage documentation.
5. Run the pipeline
Execute the pipeline with Snakemake.
Note
The first step (kb ref) builds the kallisto index, which can take 30–60 minutes and requires ~16 GB of RAM for the mouse genome. The index is built once and reused for all samples.
6. Inspect the outputs
Once the pipeline completes, the output directory contains the files described below.
Key output files
- kb_index/: kallisto index files built by kb ref. These include the index, transcript-to-gene mapping, and cDNA/intron FASTA files.
- kb/*/counts_unfiltered/adata.h5ad: Unfiltered count matrix containing all barcodes, in AnnData h5ad format.
- kb/*/counts_filtered/adata.h5ad: Filtered count matrix containing only cell-associated barcodes (filtered by bustools).
- kb/*/inspect.json: BUS file inspection summary with statistics on the number of reads, barcodes, and UMIs.
- multiqc/multiqc_report.html: Aggregated QC summary report.
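To compare the BUS inspection statistics across samples without opening each file, the per-sample inspect.json reports can be gathered in one pass. The sketch below relies only on the kb/*/inspect.json layout listed above and makes no assumption about the key names inside each report; the function names are illustrative.

```python
import json
from pathlib import Path


def collect_inspect_reports(output_dir):
    """Map each sample directory name to its parsed inspect.json contents."""
    reports = {}
    for path in sorted(Path(output_dir).glob("kb/*/inspect.json")):
        with path.open() as handle:
            reports[path.parent.name] = json.load(handle)
    return reports


def format_reports(reports):
    """Render one tab-separated line per sample and statistic."""
    lines = []
    for sample, stats in reports.items():
        for key, value in stats.items():
            lines.append(f"{sample}\t{key}\t{value}")
    return "\n".join(lines)
```

Running `print(format_reports(collect_inspect_reports("/path/to/output")))` lists every statistic per sample, which makes large differences between samples easy to spot.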
Loading the count matrix in Python
The h5ad output can be loaded directly with Scanpy.
7. Alternative tools
SnakeNgs also provides alternative pipelines for single-cell/nucleus RNA-seq quantification:
- STARsolo — Gene count quantification using STAR's built-in single-cell mode. Produces Cell Ranger-compatible output.
- Cell Ranger — 10x Genomics' official pipeline for gene expression quantification.
8. Summary and next steps
In this tutorial, you ran the kb-nac.smk pipeline to build a kallisto index and quantify UMI counts from single-nucleus RNA-seq data.
For detailed parameter descriptions, see the usage documentation.
The filtered count matrices can be used for downstream analysis with tools such as: