VG

Toolkit to manipulate and analyze graphical genomes, including read alignment

https://github.com/vgteam/vg

The module parses vg stats reports that summarize how the reads in a GAM file aligned to a graphical genome.

vg stats can produce many reports covering different aspects of graphical genomes, including properties of aligned GAM files such as node coverage. This module does not collect those data. Instead, it summarizes the alignment performance of GAM files produced by vg giraffe, using the stdout of the following vg stats command:

$ vg stats -a mapped.gam > sample-stats.txt
$ cat sample-stats.txt
Total alignments: 727413268
Total primary: 727413268
Total secondary: 0
Total aligned: 717826332
Total perfect: 375143620
Total gapless (softclips allowed): 714388968
Total paired: 727413268
Total properly paired: 715400510
Alignment score: mean 129.012, median 132, stdev 31.5973, max 161 (244205781 reads)
Mapping quality: mean 52.8552, median 60, stdev 17.7742, max 60 (589259353 reads)
Insertions: 3901467 bp in 1466045 read events
Deletions: 6759252 bp in 2795331 read events
Substitutions: 281648245 bp in 281648245 read events
Softclips: 11480269152 bp in 252773804 read events
Total time: 291465 seconds
Speed: 2495.71 reads/second

Output produced with any other combination of parameters is not guaranteed to be parseable by this module.
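
To make the expected input format concrete, the following is a minimal Python sketch of how a report like the one above could be turned into a flat dictionary. It is not the module's actual implementation: the function name parse_vg_stats and the output key names (for example "Insertions bp") are hypothetical, and it assumes the line layout shown in the sample.

```python
import re

# Hypothetical helper, not part of vg or of this module: it only assumes
# the "Key: value" layout shown in the sample report above.
SUMMARY_KEYS = ("Alignment score", "Mapping quality")
EVENT_KEYS = ("Insertions", "Deletions", "Substitutions", "Softclips")


def parse_vg_stats(text):
    """Parse the stdout of `vg stats -a` into a flat dict of numbers."""
    data = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not "Key: value"
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in SUMMARY_KEYS:
            # e.g. "mean 129.012, median 132, stdev 31.5973, max 161 (... reads)"
            for name, num in re.findall(r"(mean|median|stdev|max) ([\d.]+)", value):
                data[f"{key} {name}"] = float(num)
        elif key in EVENT_KEYS:
            # e.g. "3901467 bp in 1466045 read events"
            m = re.match(r"(\d+) bp in (\d+) read events", value)
            if m:
                data[f"{key} bp"] = int(m.group(1))
                data[f"{key} events"] = int(m.group(2))
        else:
            # e.g. "727413268", "291465 seconds", "2495.71 reads/second"
            m = re.match(r"[\d.]+", value)
            if m:
                num = m.group(0)
                data[key] = float(num) if "." in num else int(num)
    return data
```

Calling parse_vg_stats(open("sample-stats.txt").read()) on the file created above would yield keys such as "Total aligned" and "Mapping quality mean". Lines that do not follow the layout are skipped silently, which is one reason output from other parameter combinations may not parse usefully.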

The graphical reports are designed to mimic a samtools stats report, including:

  1. A bar chart showing the breakdown of aligned, perfectly aligned, and unaligned reads.
  2. A violin plot for all metrics.
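
As an illustration of the bar chart in item 1, the three categories can be derived from the "Total alignments", "Total aligned", and "Total perfect" lines. The Python sketch below assumes that unaligned reads are the difference between total and aligned alignments, and that perfect alignments are a subset of aligned ones. The function name and category labels are hypothetical and may not match what the module actually plots.

```python
def alignment_breakdown(total, aligned, perfect):
    """Split alignment counts into three non-overlapping bar chart categories.

    Assumes perfect <= aligned <= total, as in the sample report.
    """
    if not 0 <= perfect <= aligned <= total:
        raise ValueError("expected 0 <= perfect <= aligned <= total")
    return {
        "Perfectly aligned": perfect,
        "Aligned (not perfect)": aligned - perfect,
        "Unaligned": total - aligned,
    }


# Counts taken from the sample report above.
counts = alignment_breakdown(727413268, 717826332, 375143620)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Because the categories do not overlap, they sum to the total number of alignments, which is what lets them stack in a single bar per sample.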

File search patterns

vg/stats:
  contents:
    - "Total perfect:"
    - "Total gapless (softclips allowed):"
    - "Total time:"
    - "Speed:"
  num_lines: 30
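
The search pattern above can be read as a simple file check. The Python sketch below illustrates one plausible interpretation: a file matches when every listed string appears somewhere within its first 30 lines. That "all strings required" behaviour is an assumption made for illustration, and the function name looks_like_vg_stats is hypothetical. The real matching logic lives in the reporting framework and may differ.

```python
from itertools import islice

# Strings copied from the search pattern above.
REQUIRED = (
    "Total perfect:",
    "Total gapless (softclips allowed):",
    "Total time:",
    "Speed:",
)


def looks_like_vg_stats(lines, required=REQUIRED, num_lines=30):
    """Return True if every required string occurs in the first num_lines lines.

    Assumption: all strings must be found; this is not confirmed by the source.
    """
    remaining = set(required)
    for line in islice(lines, num_lines):
        # Drop every required string that this line contains.
        remaining = {s for s in remaining if s not in line}
        if not remaining:
            return True
    return False
```

For example, with open("sample-stats.txt") as fh: looks_like_vg_stats(fh) would inspect at most the first 30 lines of the file and stop as soon as all four strings have been seen.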