Samtools

Toolkit for interacting with BAM/CRAM files

http://www.htslib.org

Supported commands:

  • stats
  • flagstat
  • idxstats
  • rmdup
  • coverage
  • markdup

idxstats

samtools idxstats prints its results to standard output, so there is no consistent file name, and the output has no header lines, so it cannot be recognised from its contents. As a result, idxstats result files must have the string idxstat somewhere in the filename.
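The naming requirement can be checked before running MultiQC. The following is a minimal sketch, assuming the `*idxstat*` glob listed under "File search patterns" on this page; the helper names `idxstats_output_name` and `looks_like_idxstats` are hypothetical and are not part of MultiQC or samtools.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Filename glob taken from the "File search patterns" section of this page.
IDXSTATS_PATTERN = "*idxstat*"


def idxstats_output_name(bam_path: str) -> str:
    """Hypothetical helper: derive an output name that contains 'idxstat'."""
    return f"{Path(bam_path).stem}.idxstats.txt"


def looks_like_idxstats(filename: str) -> bool:
    """Return True if the filename matches the idxstats glob (case-sensitive)."""
    return fnmatchcase(filename, IDXSTATS_PATTERN)


if __name__ == "__main__":
    name = idxstats_output_name("data/sample1.bam")
    print(name, looks_like_idxstats(name))
    print("sample1.txt", looks_like_idxstats("sample1.txt"))
```

Redirecting the standard output of samtools idxstats to a file named this way should be enough for the file to be picked up by the filename pattern.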

There are a few MultiQC config options that you can add to customise how the idxstats module works. A typical configuration could look as follows:

# Always include these chromosomes in the plot
samtools_idxstats_always:
- X
- Y

# Never include these chromosomes in the plot
samtools_idxstats_ignore:
- MT

# Threshold where chromosomes are ignored in the plot.
# Should be a fraction, default is 0.001 (0.1% of total)
samtools_idxstats_fraction_cutoff: 0.001

# Name of the X and Y chromosomes.
# If not specified, MultiQC will search for any chromosome
# names that look like x, y, chrx or chry (case-insensitive search)
samtools_idxstats_xchr: myXchr
samtools_idxstats_ychr: myYchr
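
To make the interaction of these options concrete, here is a rough Python sketch of one plausible reading of them. It is not MultiQC's actual implementation: the function name `chromosomes_to_plot`, the choice to let the ignore list win over the always list, and the use of mapped read counts as the basis for the fraction are all assumptions made for illustration.

```python
def chromosomes_to_plot(mapped_counts, always=(), ignore=(), fraction_cutoff=0.001):
    """Pick chromosomes to plot from a {name: mapped_read_count} mapping.

    Assumed rules (illustrative only):
      1. names in `ignore` are never plotted,
      2. names in `always` are plotted regardless of their read count,
      3. anything else is plotted only if its share of all mapped reads
         is at least `fraction_cutoff`.
    """
    total = sum(mapped_counts.values())
    keep = []
    for chrom, count in mapped_counts.items():
        if chrom in ignore:
            continue
        if chrom in always:
            keep.append(chrom)
            continue
        if total > 0 and count / total >= fraction_cutoff:
            keep.append(chrom)
    return keep


if __name__ == "__main__":
    # Made-up counts, purely for demonstration.
    counts = {"1": 900_000, "X": 50, "Y": 0, "MT": 99_000, "GL000207.1": 10}
    print(chromosomes_to_plot(counts, always=["X", "Y"], ignore=["MT"]))
```

With the made-up counts above, X and Y are kept despite their low counts, MT is dropped even though it is well above the cutoff, and the small unplaced contig falls below 0.1% of the total.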

coverage

You can include and exclude contigs based on name or pattern.

For example, you could add the following to your MultiQC config file:

samtools_coverage:
  include_contigs:
    - "chr*"
  exclude_contigs:
    - "*_alt"
    - "*_decoy"
    - "*_random"
    - "chrUn*"
    - "HLA*"
    - "chrM"
    - "chrEBV"

Note that exclusion supersedes inclusion for the contig filters.
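
The precedence rule can be illustrated with a short Python sketch. This is an approximation under stated assumptions, not the module's real code: it assumes shell-style glob matching as provided by the standard library `fnmatch` module, and it assumes that an empty include list means "include everything". The function name `keep_contig` is hypothetical.

```python
from fnmatch import fnmatchcase


def keep_contig(name, include_contigs=(), exclude_contigs=()):
    """Return True if a contig should be kept.

    Exclusion is checked first, so a contig matching both lists is dropped.
    Assumption: an empty include list keeps every non-excluded contig.
    """
    if any(fnmatchcase(name, pattern) for pattern in exclude_contigs):
        return False
    if include_contigs:
        return any(fnmatchcase(name, pattern) for pattern in include_contigs)
    return True


if __name__ == "__main__":
    include = ["chr*"]
    exclude = ["*_alt", "*_decoy", "*_random", "chrUn*", "HLA*", "chrM", "chrEBV"]
    for contig in ["chr1", "chr1_KI270706v1_random", "chrM", "chrUn_GL000195v1", "1"]:
        print(contig, keep_contig(contig, include, exclude))
```

Here chr1_KI270706v1_random matches the include pattern chr* but is still dropped, because it also matches the exclude pattern *_random.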

If you want to see what is being excluded, you can set show_excluded_debug_logs to True:

samtools_coverage:
  show_excluded_debug_logs: True

General Statistics Columns

You can customize which metrics from the samtools modules appear in the General Statistics table. For example, to show the percentage of mapped reads and the error rate from the stats module, and to add the number of mapped reads from the flagstat module:

general_stats_columns:
  samtools/stats:
    columns:
      reads_mapped_percent:
        title: "% Mapped"
        description: "% Mapped reads from samtools stats"
        hidden: false
      error_rate:
        title: "Error rate"
        description: "Error rate from samtools stats"
        hidden: false
  samtools/flagstat:
    columns:
      mapped_passed:
        title: "Flagstat Mapped"
        description: "Reads mapped from samtools flagstat"
        hidden: false

Each samtools submodule has its own namespace in the configuration:

  • samtools/stats
  • samtools/flagstat
  • samtools/rmdup
  • samtools/markdup
  • samtools/coverage

File search patterns

samtools/coverage:
  contents: "#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq"
  num_lines: 10
samtools/flagstat:
  contents: in total (QC-passed reads + QC-failed reads)
samtools/idxstats:
  fn: "*idxstat*"
samtools/markdup_json:
  contents:
    - '"COMMAND":'
    - samtools markdup
  num_lines: 10
samtools/markdup_txt:
  contents:
    - "^COMMAND:"
    - samtools markdup
  num_lines: 2
samtools/rmdup:
  contents: "[bam_rmdup"
samtools/stats:
  contents: This file was produced by samtools stats