
Deacon


Search and depletion of FASTA/FASTQ files and streams using accelerated minimizer matching.

https://github.com/bede/deacon

Deacon filters DNA sequences in FASTA/Q files and streams using SIMD-accelerated minimizer comparison against an indexed query. It can either keep matching sequences (search mode, the default) or remove them (depletion mode, -d / --deplete). It was built with panhuman host depletion in mind but is also useful for searching large sequence collections.

This module parses the JSON summary log written by deacon filter when called with -s / --summary, and reports the number of input, kept, and removed sequences and base pairs alongside the filter mode.
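A minimal sketch of the parsing step described above, assuming the summary is a flat JSON object. Only `version`, `seqs_removed` and `bp_removed` are implied by these docs. The keys `deplete`, `seqs_in` and `bp_in` are assumptions about deacon's output, and `parse_deacon_summary` is a hypothetical helper, not the MultiQC module's actual code.

```python
import json


def parse_deacon_summary(text: str) -> dict:
    """Turn one deacon --summary JSON document into a flat report row.

    Hypothetical helper. The keys "deplete", "seqs_in" and "bp_in" are
    assumptions; "version", "seqs_removed" and "bp_removed" come from the docs.
    """
    data = json.loads(text)
    # The module's search pattern expects a version string starting with "deacon".
    if not str(data.get("version", "")).startswith("deacon"):
        raise ValueError("not a deacon summary")
    seqs_in = int(data["seqs_in"])  # assumed key
    bp_in = int(data["bp_in"])  # assumed key
    seqs_removed = int(data["seqs_removed"])
    bp_removed = int(data["bp_removed"])
    return {
        "deplete": bool(data["deplete"]),  # assumed key
        "seqs_in": seqs_in,
        # Kept counts are derived as input minus removed, so no separate
        # "kept" key has to be assumed.
        "seqs_kept": seqs_in - seqs_removed,
        "seqs_removed": seqs_removed,
        "bp_in": bp_in,
        "bp_kept": bp_in - bp_removed,
        "bp_removed": bp_removed,
    }
```

To use it on a real file, pass the file's text, e.g. `parse_deacon_summary(Path("summary.json").read_text())` with `pathlib.Path`. Deriving the kept counts keeps the sketch dependent on as few assumed key names as possible.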

Generating compatible output

The module looks for the JSON summary file produced by the --summary (-s) option:

# Search mode: keep matching reads
deacon filter index.idx reads.fq.gz -o matches.fq.gz -s summary.json

# Depletion mode: remove matching reads (e.g. host depletion)
deacon filter -d panhuman-1.k31w15.idx reads.fq.gz -o depleted.fq.gz -s summary.json
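The two invocations above differ only in the `-d` flag. As a sketch of how a pipeline might assemble them, the hypothetical helper below builds the argument list using only the flags shown above (`filter`, `-d`, `-o`, `-s`); it does not run deacon itself.

```python
def build_filter_command(index: str, reads: str, output: str,
                         summary: str, deplete: bool = False) -> list[str]:
    """Build a `deacon filter` argv that also writes a JSON summary.

    Hypothetical helper; it mirrors the example commands above.
    """
    cmd = ["deacon", "filter"]
    if deplete:
        # Depletion mode: remove matching reads instead of keeping them.
        cmd.append("-d")
    cmd += [index, reads, "-o", output, "-s", summary]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with file paths.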

Interpreting results

seqs_removed and bp_removed count the sequences and base pairs that were filtered out of the output. In depletion mode (Deplete = True) these are the sequences that matched the indexed query, typically host reads to discard; in search mode they are the sequences that did not match, i.e. non-target reads. The Deplete column distinguishes the two modes, so a single report can mix search and depletion samples.
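The mode-dependent meaning of the removed counts can be summarised in a few lines. This is a hypothetical helper for illustration only; it assumes the input and removed counts have already been read from the summary.

```python
def describe_removed(deplete: bool, seqs_removed: int, seqs_in: int) -> str:
    """Describe what the removed sequences represent in each filter mode."""
    if seqs_in <= 0:
        raise ValueError("seqs_in must be positive")
    pct = 100.0 * seqs_removed / seqs_in
    if deplete:
        # Depletion mode: reads that matched the index are discarded.
        what = "matched the index (e.g. host reads) and were discarded"
    else:
        # Search mode: matching reads are kept, so the rest are discarded.
        what = "did not match the index (non-target reads) and were discarded"
    return f"{seqs_removed}/{seqs_in} sequences ({pct:.1f}%) {what}"
```

In both modes the kept count is the input count minus the removed count. Only the interpretation of which reads were removed changes.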

File search patterns

deacon:
  contents: '"version": "deacon'
  fn: '*.json'
  num_lines: 30
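Roughly, the pattern above selects files whose name matches `*.json` and whose first 30 lines contain the string `"version": "deacon`. The sketch below approximates that check with Python's `fnmatch`. It is a simplified stand-in written for illustration, not MultiQC's own file-search code.

```python
import fnmatch
from itertools import islice

# Values copied from the search pattern above.
FN_GLOB = "*.json"
CONTENTS = '"version": "deacon'
NUM_LINES = 30


def looks_like_deacon_summary(filename: str, lines) -> bool:
    """Approximate the pattern: filename glob plus a substring in the first lines."""
    if not fnmatch.fnmatch(filename, FN_GLOB):
        return False
    # Only the first NUM_LINES lines are inspected for the contents string.
    return any(CONTENTS in line for line in islice(lines, NUM_LINES))
```

Because iterating over an open file yields its lines, it can be called as `looks_like_deacon_summary(path.name, fh)` inside a `with open(path) as fh:` block. The `num_lines` limit means the version key must appear near the top of the JSON file for it to be detected.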