Mosdepth
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Mosdepth can generate several output files, all with a common prefix and different endings:
- per-base depth ({prefix}.per-base.bed.gz),
- mean per-window depth given a window size ({prefix}.regions.bed.gz, if a window size is provided with --by),
- mean per-region depth given a BED file of regions ({prefix}.regions.bed.gz, if a BED file is provided with --by),
- a distribution of the proportion of bases covered at or above a given threshold, for each chromosome and genome-wide ({prefix}.mosdepth.global.dist.txt and {prefix}.mosdepth.region.dist.txt),
- quantized output that merges adjacent bases as long as they fall in the same coverage bins ({prefix}.quantized.bed.gz),
- threshold output indicating how many bases in each region are covered at the given thresholds ({prefix}.thresholds.bed.gz),
- summary output providing region length, coverage mean, min, and max for each region ({prefix}.mosdepth.summary.txt).
The MultiQC module plots coverage distributions from two kinds of outputs:
- {prefix}.mosdepth.region.dist.txt
- {prefix}.mosdepth.global.dist.txt
The "region" file is used if it exists, otherwise the "global" file. Three figures are plotted:
- Proportion of bases in the reference genome with, at least, a given depth of coverage (cumulative coverage distribution).
- Proportion of bases in the reference genome with a given depth of coverage (absolute coverage distribution).
- Average coverage per contig/chromosome.
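The relationship between the cumulative and absolute distributions can be sketched in a few lines of Python. This is an illustration only, not the module's actual code. It assumes the dist file is tab-separated with three columns (contig or "total", depth, proportion of bases covered at or above that depth); the function names `parse_dist` and `absolute_from_cumulative` are hypothetical.

```python
from collections import defaultdict


def parse_dist(text):
    """Parse dist.txt content into {contig: {depth: cumulative_fraction}}.

    Assumes three tab-separated columns: contig, depth, fraction.
    """
    cumulative = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        contig, depth, fraction = line.split("\t")
        cumulative[contig][int(depth)] = float(fraction)
    return dict(cumulative)


def absolute_from_cumulative(cum):
    """Turn P(depth >= d) into P(depth == d) by differencing.

    Walks depths from highest to lowest and subtracts the cumulative
    value of the next higher listed depth (0.0 above the maximum).
    """
    absolute = {}
    higher = 0.0
    for depth in sorted(cum, reverse=True):
        absolute[depth] = cum[depth] - higher
        higher = cum[depth]
    return absolute


example = "total\t3\t0.25\ntotal\t2\t0.5\ntotal\t1\t0.75\ntotal\t0\t1.0\n"
dist = parse_dist(example)
print(absolute_from_cumulative(dist["total"]))
```

Differencing against the next listed depth, rather than depth + 1, keeps the result valid when some depths are missing from the file.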
The module also reports the percentage of the genome covered at each threshold in the General Stats section. The default thresholds are 1, 5, 10, 30 and 50, which can be customised in the config as follows:
mosdepth_config:
  general_stats_coverage:
    - 10
    - 20
    - 40
    - 200
    - 30000
Mosdepth omits some cumulative coverage values at high coverage thresholds. To work around this, MultiQC uses the next cumulative coverage value available. For example, if 3301x coverage is present but 3300x is missing, and 3300x is requested, the value for 3301x is used. This gives the correct value because the distribution is cumulative and mosdepth only skips a value when no bases have exactly that coverage level.
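A minimal sketch of this fallback, assuming the cumulative distribution has already been loaded into a dict mapping depth to proportion. The helper name `coverage_at_threshold` is hypothetical, and returning 0.0 when no depth at or above the threshold is listed is an assumption of this sketch rather than documented behaviour.

```python
def coverage_at_threshold(cum, threshold):
    """Return the cumulative fraction at `threshold`.

    If the exact depth is missing, fall back to the next higher depth
    that is listed. Returns 0.0 when nothing at or above the threshold
    is listed (an assumption of this sketch).
    """
    candidates = [depth for depth in cum if depth >= threshold]
    if not candidates:
        return 0.0
    return cum[min(candidates)]


# 3300x is missing, so the value listed for 3301x is used instead
cum = {3299: 0.02, 3301: 0.01}
print(coverage_at_threshold(cum, 3300))
```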
For more details, see this comment.
You can also specify which columns are hidden when the report loads (by default, all values are hidden except 30X):
mosdepth_config:
  general_stats_coverage_hidden:
    - 10
    - 20
    - 200
For the per-contig coverage plot, you can include and exclude contigs based on name or pattern.
For example, you could add the following to your MultiQC config file:
mosdepth_config:
  include_contigs:
    - "chr*"
  exclude_contigs:
    - "*_alt"
    - "*_decoy"
    - "*_random"
    - "chrUn*"
    - "HLA*"
    - "chrM"
    - "chrEBV"
Note that exclusion supersedes inclusion for the contig filters.
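The precedence rule can be illustrated with a short Python sketch. It assumes the patterns are shell-style globs matched with the standard library's `fnmatch` module; the helper `keep_contig` is hypothetical and is not the module's own function.

```python
from fnmatch import fnmatchcase


def keep_contig(name, include=None, exclude=None):
    """Decide whether a contig is plotted, given glob patterns.

    Exclusion is checked first, so a contig matching both lists is dropped.
    With no include list, everything that is not excluded is kept.
    """
    if exclude and any(fnmatchcase(name, pattern) for pattern in exclude):
        return False
    if include:
        return any(fnmatchcase(name, pattern) for pattern in include)
    return True


include = ["chr*"]
exclude = ["*_alt", "*_decoy", "*_random", "chrUn*", "HLA*", "chrM", "chrEBV"]
for contig in ["chr1", "chr1_KI270706v1_random", "chrM", "GL000195.1"]:
    print(contig, keep_contig(contig, include, exclude))
```

Here chr1_KI270706v1_random matches the chr* include pattern but is still dropped, because it also matches *_random.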
To further avoid cluttering the plot, the module can exclude contigs with a low relative coverage:
mosdepth_config:
  # Should be a fraction, e.g. 0.001 (exclude contigs with 0.1% coverage of sum of
  # coverages across all contigs)
  perchrom_fraction_cutoff: 0.001
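As a rough illustration of what such a cutoff means, the sketch below drops contigs whose coverage is less than the given fraction of the summed coverage across all contigs. The function name `filter_low_coverage` is hypothetical, and whether the real comparison is strict or inclusive at the boundary is an assumption here.

```python
def filter_low_coverage(contig_coverage, cutoff):
    """Keep contigs whose share of the summed coverage is at least `cutoff`.

    `contig_coverage` maps contig name to average coverage.
    """
    total = sum(contig_coverage.values())
    if total == 0:
        # Nothing to normalise against, so keep everything unchanged
        return dict(contig_coverage)
    return {
        contig: cov
        for contig, cov in contig_coverage.items()
        if cov / total >= cutoff
    }


coverage = {"chr1": 30.0, "chr2": 29.97, "chrUn_example": 0.03}
# chrUn_example holds about 0.05% of the summed coverage, below the 0.1% cutoff
print(filter_low_coverage(coverage, 0.001))
```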
If you want to see what is being excluded, you can set show_excluded_debug_logs to True:
mosdepth_config:
  show_excluded_debug_logs: True
This will then print a debug log message (use multiqc -v) for each excluded contig.
This is disabled by default as there can be very many in some cases.
Besides the {prefix}.mosdepth.global.dist.txt and {prefix}.mosdepth.region.dist.txt
files, the {prefix}.mosdepth.summary.txt file is used for the General Stats table.
The module also plots the relative X/Y chromosome coverage per sample. By default, it looks for chromosomes named X/Y or chrX/chrY, but this can be customised:
mosdepth_config:
  # Name of the X and Y chromosomes. If not specified, MultiQC will search for
  # any chromosome names that look like x, y, chrx or chry (case-insensitive)
  xchr: myXchr
  ychr: myYchr
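The name lookup described in the comment above could look roughly like the following. This is a sketch under stated assumptions: `find_sex_chroms` and `relative_xy` are hypothetical helpers, and normalising X and Y coverage by their sum is only one plausible definition of "relative" coverage, not necessarily the one the module uses.

```python
def find_sex_chroms(contigs, xchr=None, ychr=None):
    """Return the (X, Y) contig names, or None for any that is not found.

    A configured name must match exactly; otherwise names are compared
    case-insensitively against x/chrx and y/chry.
    """
    def pick(configured, candidates):
        if configured is not None:
            return configured if configured in contigs else None
        for name in contigs:
            if name.lower() in candidates:
                return name
        return None

    return pick(xchr, {"x", "chrx"}), pick(ychr, {"y", "chry"})


def relative_xy(coverage, x_name, y_name):
    """Share of X and Y in their combined coverage (assumed normalisation)."""
    total = coverage[x_name] + coverage[y_name]
    if total == 0:
        return 0.0, 0.0
    return coverage[x_name] / total, coverage[y_name] / total


coverage = {"chr1": 31.0, "chrX": 30.0, "chrY": 10.0}
x_name, y_name = find_sex_chroms(list(coverage))
print(x_name, y_name, relative_xy(coverage, x_name, y_name))
```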
File search patterns
mosdepth/global_dist:
  fn: '*.mosdepth.global.dist.txt'
mosdepth/region_dist:
  fn: '*.mosdepth.region.dist.txt'
mosdepth/summary:
  fn: '*.mosdepth.summary.txt'