SeqKit

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation.

https://bioinf.shenwei.me/seqkit/

Supported commands:

  • stats

The module parses the output of seqkit stats, which reports simple statistics for FASTA/Q files: sequence counts, total length, N50, GC content, and, for FASTQ files, quality metrics.

stats

The seqkit stats command produces tabular output. The default columns are file, format, type, num_seqs, sum_len, min_len, avg_len, and max_len. When run with the --all flag, it also reports Q1, Q2, Q3, sum_gap, N50, Q20(%), Q30(%), AvgQual, and GC(%).

To generate output suitable for MultiQC, run seqkit stats with the --tabular flag:

seqkit stats --all --tabular *.fastq.gz > seqkit_stats.tsv
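For readers who want to inspect the resulting file outside MultiQC, the following is a minimal sketch of how the tab-separated output could be read in Python. It is not the module's own parser. The function name parse_seqkit_stats and the example row are hypothetical, and the sketch assumes only that the first line is a header whose first column is file, as described above.

```python
import csv
import io


def parse_seqkit_stats(text):
    """Parse tab-separated `seqkit stats --tabular` output.

    Returns {file name: {column name: value}}. Numeric-looking values
    are converted to float; everything else is kept as a string.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    results = {}
    for row in reader:
        sample = row.pop("file")  # first column identifies the input file
        parsed = {}
        for key, value in row.items():
            try:
                parsed[key] = float(value)
            except (TypeError, ValueError):
                parsed[key] = value  # e.g. format and type stay as text
        results[sample] = parsed
    return results


# Hypothetical example input (values invented for illustration only).
example = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
    "reads_1.fastq.gz\tFASTQ\tDNA\t1000\t150000\t150\t150.0\t150\n"
)
stats = parse_seqkit_stats(example)
print(stats["reads_1.fastq.gz"]["num_seqs"])
```

Because the columns are looked up by header name rather than position, the same sketch works whether or not the file was produced with --all.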

File search patterns

seqkit/stats:
  contents_re: ^file\s+format\s+type\s+num_seqs\s+sum_len
  num_lines: 1
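To check whether a file would plausibly be picked up by the pattern above, the regular expression can be tried against the file's first line. This is a rough sketch using Python's re module, not MultiQC's actual file-search code. The helper name looks_like_seqkit_stats is hypothetical, and the sketch assumes that num_lines limits the test to the first line of the file.

```python
import re

# The contents_re pattern from the search configuration above.
HEADER_RE = re.compile(r"^file\s+format\s+type\s+num_seqs\s+sum_len")


def looks_like_seqkit_stats(text, num_lines=1):
    """Return True if any of the first `num_lines` lines matches the header pattern."""
    head = text.splitlines()[:num_lines]
    return any(HEADER_RE.search(line) for line in head)


# Tab-separated header, as written with --tabular.
tabular_header = "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\n"
# Space-padded header, as in an aligned table.
padded_header = "file    format  type  num_seqs  sum_len  min_len\n"

print(looks_like_seqkit_stats(tabular_header))
print(looks_like_seqkit_stats(padded_header))
print(looks_like_seqkit_stats("sample\treads\n"))
```

Since the pattern uses \s+ between column names, it accepts both tab-separated and space-padded headers.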