mgikit

note

Demultiplexes FASTQ files from an MGI sequencing instrument

https://github.com/sagc-bioinformatics/mgikit

Possible mgikit output files are:

  1. Sample stats file ('*.L4.mgikit.sample_stats'): Per-lane sample statistics such as yield, quality scores, and cluster count.
  2. General info file ('*.L4.mgikit.general'): A summary of the sample stats that also includes lane-level stats.
  3. Info file ('*.L4.mgikit.info'): Indexes that matched within the data generated by a specific lane.
  4. Undetermined barcodes file ('*.L4.mgikit.undetermined_barcode'): Barcodes that did not match any sample.
  5. Ambiguous barcodes file ('*.L4.mgikit.ambiguous_barcode'): Barcodes that match multiple samples. This can happen when the mismatch threshold is set high.

Configuration options:

mgikit:
  # Ignore undetermined and ambiguous cases in the report.
  keep_core_samples: false
  # Number of undetermined barcodes to show in the report. Accepts any positive value
  # less than or equal to the number of barcodes in the demultiplexer reports, which is usually 50.
  undetermined_barcode_threshold: 25
  # Generate a brief version of the report. Omits the cluster-per-sample-per-lane reports.
  brief_report: false
  # Number of decimal positions to use for counts in the tables.
  decimal_positions: 2
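
As a rough illustration of the constraints described in the comments above, the following Python sketch merges user-supplied settings with the values shown in the example and checks them. The function name `check_mgikit_config`, the `EXAMPLE_VALUES` dictionary, and the `max_barcodes=50` limit are assumptions made for this sketch (the text only says the limit is "usually 50"); this is not the tool's own validation code.

```python
# Values copied from the configuration example above; treating them as
# fallback values is an assumption of this sketch.
EXAMPLE_VALUES = {
    "keep_core_samples": False,
    "undetermined_barcode_threshold": 25,
    "brief_report": False,
    "decimal_positions": 2,
}


def check_mgikit_config(user_cfg, max_barcodes=50):
    """Merge user settings over the example values and sanity-check them.

    max_barcodes is a hypothetical stand-in for the number of barcodes in
    the demultiplexer reports (described above as "usually 50").
    """
    unknown = set(user_cfg) - set(EXAMPLE_VALUES)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")

    cfg = {**EXAMPLE_VALUES, **user_cfg}

    threshold = cfg["undetermined_barcode_threshold"]
    if not 1 <= threshold <= max_barcodes:
        raise ValueError(
            f"undetermined_barcode_threshold must be between 1 and {max_barcodes}"
        )
    if cfg["decimal_positions"] < 0:
        raise ValueError("decimal_positions must not be negative")
    return cfg
```

For example, `check_mgikit_config({"brief_report": True})` returns the merged settings with `brief_report` enabled, while a threshold of 51 raises a `ValueError` under the assumed limit.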

File search patterns

mgikit/mgi_ambiguous_barcode:
  fn: "*.mgikit.ambiguous_barcode"
mgikit/mgi_general_info:
  fn: "*.mgikit.general"
mgikit/mgi_sample_reads:
  fn: "*.mgikit.info"
mgikit/mgi_sample_stats:
  fn: "*.mgikit.sample_stats"
mgikit/mgi_undetermined_barcode:
  fn: "*.mgikit.undetermined_barcode"
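
To see which pattern a given file would fall under, the glob patterns listed above can be tried with Python's standard `fnmatch` module. This is only a sketch of glob matching on file names: the `classify` helper and the sample file name `FC01.L4.mgikit.info` are hypothetical, and the actual file discovery may apply additional rules.

```python
from fnmatch import fnmatchcase

# Patterns copied from the search-pattern list above.
SEARCH_PATTERNS = {
    "mgikit/mgi_ambiguous_barcode": "*.mgikit.ambiguous_barcode",
    "mgikit/mgi_general_info": "*.mgikit.general",
    "mgikit/mgi_sample_reads": "*.mgikit.info",
    "mgikit/mgi_sample_stats": "*.mgikit.sample_stats",
    "mgikit/mgi_undetermined_barcode": "*.mgikit.undetermined_barcode",
}


def classify(filename):
    """Return the search-pattern key whose glob matches filename, or None."""
    for key, pattern in SEARCH_PATTERNS.items():
        # fnmatchcase matches the whole name, case-sensitively.
        if fnmatchcase(filename, pattern):
            return key
    return None


if __name__ == "__main__":
    # Hypothetical file name following the '*.L4.mgikit.*' naming shown earlier.
    print(classify("FC01.L4.mgikit.info"))
```

Because each pattern must match the entire file name, a name such as `reads.fastq.gz` matches none of them and `classify` returns `None`.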