Using MultiQC within scripts
Even though the primary way to run MultiQC is as a command-line tool, it can also be imported as a Python module to build the report interactively, for example in custom Python scripts or in a Jupyter notebook environment (see an example notebook).
MultiQC provides a set of functions to iteratively parse logs and add sections to a report. All of them are available by importing MultiQC as a module:
import multiqc
Parse logs
Find files that MultiQC recognizes in analysis_dir and parse them, without generating a report.
The parsed data can then be accessed with other functions: list_modules, list_plots, etc.
def parse_logs(*analysis_dir, **kwargs)
Parameters:
analysis_dir: Path(s) to search for files to parse
verbose: Print more information to the console
file_list: Supply a file containing a list of file paths to be searched, one per row
prepend_dirs: Prepend directory to sample names
dirs_depth: Prepend n directories to sample names. Use a negative number to take directories from the start of the path
fn_clean_sample_names: Clean sample names (the default). Set to False to leave them as the full file name
require_logs: Require all explicitly requested modules to have log files. If not, MultiQC will exit with an error
use_filename_as_sample_name: Use the log filename as the sample name
strict: Don't catch exceptions; run additional code checks to help development
quiet: Only show log warnings
no_ansi: Disable coloured log output
profile_runtime: Add analysis of how long MultiQC takes to run to the report
no_version_check: Disable checking the latest MultiQC version on the server
ignore: Ignore analysis files
ignore_samples: Ignore sample names
run_modules: Use only these modules (a list of module names)
exclude_modules: Do not use these modules (a list of module names)
config_files: Specific config files to load, after those in MultiQC dir / home dir / working dir
module_order: Names of modules in order of precedence to show in report
extra_fn_clean_exts: Extra file extensions to clean from sample names
extra_fn_clean_trim: Extra strings to clean from sample names
preserve_module_raw_data: Preserve raw data from modules in the report, besides plots. Useful for later interactive use. Defaults to True. Set to False to save memory.
Examples
Parse logs found in the data directory.
multiqc.parse_logs('data')
Parse logs found in the data/fastp directory, the data/SAMPLE1.cutadapt.log file, and a data_mqc.tsv MultiQC custom content file.
multiqc.parse_logs('data/fastp', 'data/SAMPLE1.cutadapt.log', "data_mqc.tsv")
Parse logs found in the data directory for only the specified modules, and use an additional pattern to clean sample names.
multiqc.parse_logs(
    'data',
    run_modules=["fastp", "spades", "quast", "pangolin"],
    extra_fn_clean_exts=[".unclassified"],
)
Parse logs found in the data directory and run the FastQC module twice for two sets of files, raw and trimmed reads, according to the provided path patterns (see Order of modules for details).
multiqc.parse_logs(
    'data',
    module_order=[
        dict(
            fastqc=dict(
                name="FastQC (trimmed)",
                anchor="fastqc_trimmed",
                path_filters=["*_1_trimmed_fastqc.zip"],
            )
        ),
        dict(
            fastqc=dict(
                name="FastQC (raw)",
                anchor="fastqc_raw",
                path_filters=["*_1_fastqc.zip"],
            )
        ),
    ],
)
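With more than two sets of files, it can be easier to generate the module_order list than to write it out by hand. The sketch below reproduces the dict layout from the example above. It assumes that each file set can be identified by a single path_filters glob and that a label-derived anchor is acceptable. fastqc_module_order is a hypothetical helper, not part of MultiQC:

```python
def fastqc_module_order(runs: dict[str, str]) -> list[dict]:
    """Return module_order entries that run FastQC once per file set (hypothetical helper).

    runs maps a short label (used in the section name and anchor) to a path_filters glob.
    """
    return [
        {
            "fastqc": {
                "name": f"FastQC ({label})",
                "anchor": f"fastqc_{label}",
                "path_filters": [pattern],
            }
        }
        for label, pattern in runs.items()
    ]


# Same two FastQC runs as the hand-written example
module_order = fastqc_module_order(
    {"trimmed": "*_1_trimmed_fastqc.zip", "raw": "*_1_fastqc.zip"}
)
```

You can then pass the result to multiqc.parse_logs('data', module_order=module_order).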
Load JSON dump data
Try to find the multiqc_data.json file generated by a previous MultiQC run in the given directory, and load it into the report.
def parse_data_json(path: str | Path)
Parameters:
path: Path to the directory containing multiqc_data.json, or the path to the file itself.
Example:
multiqc.parse_data_json('multiqc_data/multiqc_data.json')
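parse_data_json accepts either a directory or the JSON file itself, so a script may want to fail early with a clear message when neither exists. This is a minimal standard-library sketch. resolve_data_json is a hypothetical helper, and MultiQC's own lookup logic may differ:

```python
from pathlib import Path


def resolve_data_json(path) -> Path:
    """Return the multiqc_data.json path for a directory or file path (hypothetical helper)."""
    p = Path(path)
    # A directory is assumed to contain the file directly, e.g. multiqc_data/multiqc_data.json
    candidate = p / "multiqc_data.json" if p.is_dir() else p
    if not candidate.is_file():
        raise FileNotFoundError(f"multiqc_data.json not found at {candidate}")
    return candidate
```

For example: multiqc.parse_data_json(resolve_data_json('multiqc_data')).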
List what's loaded
Return a list of the modules that have been loaded, ordered according to the config:
def list_modules() -> list[str]
Return a dict of the names of plots that have been loaded, indexed by module name and section:
def list_plots() -> dict[str, list[str | dict[str, list[str]]]]
Example:
multiqc.list_plots()
{'fastp': ['Filtered Reads',
'Insert Sizes',
{'Sequence Quality': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']},
{'GC Content': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']},
{'N content': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']}]}
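The structure returned by list_plots mixes plain section names with dicts for sections that have several datasets (tabs). The sketch below shows a hypothetical helper, iter_plot_ids, that flattens this structure into (module, section, dataset) tuples. It assumes the shape shown in the example output above:

```python
def iter_plot_ids(plots: dict):
    """Yield (module, section, dataset) tuples from a list_plots()-style dict (hypothetical helper).

    dataset is None for sections listed as plain strings (a single dataset).
    """
    for module, sections in plots.items():
        for entry in sections:
            if isinstance(entry, str):
                yield module, entry, None
            else:
                # Multi-dataset section: {section_name: [dataset labels]}
                for section, datasets in entry.items():
                    for dataset in datasets:
                        yield module, section, dataset


# A trimmed-down copy of the fastp output shown above
example = {
    "fastp": [
        "Filtered Reads",
        {"GC Content": ["Read 1: Before filtering", "Read 1: After filtering"]},
    ]
}
rows = list(iter_plot_ids(example))
```

Each module and section pair can be passed to multiqc.get_plot. The dataset label can be passed as dataset_id to show or save.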
Return a list of the cleaned sample names that have loaded data:
def list_samples() -> list[str]
Example:
multiqc.list_samples()
['SAMPLE1_PE', 'SAMPLE2_PE']
Return a list of the log files found that correspond to the loaded data:
def list_data_sources() -> list[str]
Example:
multiqc.list_data_sources()
['data/SAMPLE1_PE.fastp.json', 'data/SAMPLE2_PE.fastp.json']
Access loaded data
There are several functions to access the data loaded by parse_logs.
Return parsed module data, indexed (if available) by data key, then by sample. The module argument can be either the module name or its anchor:
def get_module_data(module: str = None, sample: str = None, key: str = None) -> dict
The function takes data from report.saved_raw_data, which is populated by self.write_data_file() calls in individual modules.
This data is not necessarily normalized: for example, numbers may be stored as strings or as numbers, depending on how the individual module behaves.
Example:
> multiqc.get_module_data(module="fastp", sample="SAMPLE1_PE")["summary"]
{'fastp_version': '0.23.2',
'sequencing': 'paired end (301 cycles + 301 cycles)',
'before_filtering': {'total_reads': 55442,
'total_bases': 16571632,
'q20_bases': 16267224,
'q30_bases': 15853021,
'gc_content': 0.38526},
'after_filtering': {'total_reads': 48270,
'total_bases': 14363465,
'q20_bases': 14323363,
'q30_bases': 14199841,
'gc_content': 0.383991}}
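Because these values come straight from the module's output, you can compute simple derived metrics directly. This minimal sketch assumes the fastp summary layout shown above. filtering_summary is a hypothetical helper. Each value passes through float() because, as noted above, numbers may be stored as strings:

```python
def filtering_summary(summary: dict) -> dict:
    """Derive simple QC ratios from a fastp "summary" dict (hypothetical helper)."""
    before = summary["before_filtering"]
    after = summary["after_filtering"]
    return {
        # Fraction of reads that survived fastp filtering
        "reads_kept": float(after["total_reads"]) / float(before["total_reads"]),
        # Fraction of bases at Q30 or above after filtering
        "q30_after": float(after["q30_bases"]) / float(after["total_bases"]),
    }


# Values copied from the SAMPLE1_PE example output above
summary = {
    "before_filtering": {"total_reads": 55442, "total_bases": 16571632},
    "after_filtering": {"total_reads": 48270, "total_bases": 14363465, "q30_bases": 14199841},
}
ratios = filtering_summary(summary)
```

In a live session, pass multiqc.get_module_data(module="fastp", sample="SAMPLE1_PE")["summary"] instead of the literal dict.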
Similarly, get_general_stats_data returns parsed General Statistics data, indexed by sample, then by data key. If sample is specified, only data for that sample is returned.
def get_general_stats_data(sample: str = None) -> dict
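To hand General Statistics values to another tool, you can flatten the nested dict into CSV. This standard-library sketch assumes the documented {sample: {data_key: value}} layout and string data keys. general_stats_to_csv is a hypothetical helper:

```python
import csv
import io


def general_stats_to_csv(stats: dict) -> str:
    """Flatten {sample: {key: value}} into CSV text, one row per sample (hypothetical helper)."""
    # Union of keys across samples: not every sample necessarily has every metric
    columns = sorted({key for row in stats.values() for key in row})
    buf = io.StringIO()
    # Missing values are written as empty cells (DictWriter's default restval)
    writer = csv.DictWriter(buf, fieldnames=["sample", *columns])
    writer.writeheader()
    for sample in sorted(stats):
        writer.writerow({"sample": sample, **stats[sample]})
    return buf.getvalue()
```

For example, general_stats_to_csv(multiqc.get_general_stats_data()) returns text that you can write to a file.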
Adding custom content
You can also add a custom section to the report with multiqc.BaseMultiqcModule, either by subclassing it or by creating an instance directly, as in the example below. This can be used to add a custom table or other content.
Example
Create a table (see plotting for more detail) and add it to the report.
import multiqc
from multiqc.plots import table
plot = table.plot(
    data=...,
    headers=...,
    pconfig={
        "id": "my_metrics_table",
        "title": "My metrics",
    },
)
module = multiqc.BaseMultiqcModule(
    name="my-module",
    anchor="custom_data",
)
module.add_section(
    plot=plot,
    name="My metrics",
    anchor="my_metrics_section",
    description=...,
)
multiqc.report.modules.append(module)
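The example above leaves data and headers as placeholders. The sketch below shows one way to build them from tab-separated text. It assumes the common table layout of data as {sample: {column: value}} and headers as {column: {"title": ...}}, and that every non-sample column is numeric. table_inputs_from_tsv and the sample values are hypothetical:

```python
import csv
import io


def table_inputs_from_tsv(tsv_text: str, id_column: str = "sample"):
    """Build (data, headers) dicts for a table plot from TSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    data = {}
    for row in reader:
        sample = row.pop(id_column)
        # Assumes every remaining column holds a number
        data[sample] = {column: float(value) for column, value in row.items()}
    # One header entry per metric column, using the column name as the display title
    headers = {c: {"title": c} for c in reader.fieldnames if c != id_column}
    return data, headers


# Hypothetical example input
data, headers = table_inputs_from_tsv("sample\treads\tgc\nS1\t100\t0.41\nS2\t250\t0.39\n")
```

You can then pass the two dicts as data=data, headers=headers to table.plot.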
Get plot object
Get a plot object for a specific module and section. For a list of available plots, use multiqc.list_plots.
def get_plot(module: str, section: str) -> Plot
Examples
Get plot object for the "GC Content" plot in the "fastp" module.
plot = multiqc.get_plot("fastp", "GC Content")
Get plot object for the "Number of Contigs" plot in the "QUAST" module.
plot = multiqc.get_plot("QUAST", "Number of Contigs")
Show plot
Prepare a plot to be shown in a notebook cell.
class Plot:
    def show(self, dataset_id: int | str = 0, flat=False, **kwargs)
Parameters:
dataset_id: Dataset label or index, for plots with several tabs
flat: Show the plot as a static image without any interactivity
kwargs: Additional arguments passed to the plot
Examples
Create a bar graph and show it in the notebook cell:
from multiqc.plots import bargraph
plot = bargraph.plot(...)
display(plot.show(violin=True))
Get the fastp "GC Content" plot and show it in the notebook cell. Since it has multiple tabs, you can select which tab to show with the dataset_id option (defaults to the first tab):
plot = multiqc.get_plot("fastp", "GC Content")
display(plot.show(dataset_id="Read 2: Before filtering"))
Show Samtools alignment stats as a violin plot, rendered as a flat image without interactivity.
plot = multiqc.get_plot("Flagstat", "Alignment stats")
display(plot.show("Read counts", violin=True, flat=True))
Calling the notebook's built-in display function is optional when the show call is the last line in your cell.
Save plot to file
Similarly, you can save a plot to a file instead of showing it in a notebook.
class Plot:
    def save(self, filename, dataset_id: int | str = 0, **kwargs)
Parameters:
filename: Path to save the plot to
dataset_id: Dataset label or index, for plots with several tabs
kwargs: Additional arguments passed to the plot
Examples:
Save the "Number of Contigs" plot for the QUAST module to a file.
plot = multiqc.get_plot("QUAST", "Number of Contigs")
plot.save("quast_contigs.html")
Save the fastp GC Content plot for the dataset labeled "Read 2: Before filtering" to a static PNG image.
plot = multiqc.get_plot("fastp", "GC Content")
plot.save(
    "fastp_gc_content.png",
    dataset_id="Read 2: Before filtering",
)
Save Samtools alignment stats as a violin plot to a file.
plot = multiqc.get_plot("Flagstat", "Alignment stats")
plot.save(
    "flagstat_alignment_stats.html",
    dataset_id="Read counts",
    violin=True,
)
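To export every loaded plot, you can combine list_plots with get_plot and save. In this minimal sketch, save_all_plots and safe_name are hypothetical helpers and the file naming scheme is arbitrary. get_plot is passed in as an argument so the loop can be exercised without a parsed report:

```python
import re
from pathlib import Path


def safe_name(*parts: str) -> str:
    """Join labels into a lowercase, filesystem-friendly file stem (hypothetical helper)."""
    return "_".join(re.sub(r"[^A-Za-z0-9]+", "_", p).strip("_").lower() for p in parts)


def save_all_plots(plots: dict, get_plot, out_dir, ext: str = "html") -> list[Path]:
    """Save every plot listed in a list_plots()-style dict (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for module, sections in plots.items():
        for entry in sections:
            # Plain strings are single-dataset sections; dicts map a section to its tabs
            if isinstance(entry, str):
                targets = [(entry, None)]
            else:
                targets = [(sec, ds) for sec, datasets in entry.items() for ds in datasets]
            for section, dataset in targets:
                plot = get_plot(module, section)
                labels = [module, section] if dataset is None else [module, section, dataset]
                path = out / f"{safe_name(*labels)}.{ext}"
                if dataset is None:
                    plot.save(str(path))  # default dataset_id (first tab)
                else:
                    plot.save(str(path), dataset_id=dataset)
                written.append(path)
    return written
```

In a session with parsed data, call save_all_plots(multiqc.list_plots(), multiqc.get_plot, "plots").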
Writing report
Render HTML from parsed module data, and write a report along with auxiliary data files to disk.
def write_report(**kwargs)
Parameters:
title: Report title. Printed as the page header and used for the filename if not otherwise specified
report_comment: Custom comment, printed at the top of the report
template: Report template to use
output_dir: Create the report in the specified output directory
filename: Report filename. Use 'stdout' to print to standard out
make_data_dir: Force the parsed data directory to be created
data_format: Output parsed data in a different format
zip_data_dir: Compress the data directory
force: Overwrite an existing report and data directory
make_report: Generate the report HTML. Defaults to True; set to False to only export data and plots
export_plots: Export plots as static images in addition to the report
plots_force_flat: Use only flat plots (static images)
plots_force_interactive: Use only interactive plots (in-browser JavaScript)
strict: Don't catch exceptions; run additional code checks to help development
development: Development mode. Do not compress and minimise JS; export uncompressed plot data
make_pdf: Create a PDF report. Requires Pandoc to be installed
no_megaqc_upload: Don't upload the generated report to MegaQC, even if MegaQC options are found
quiet: Only show log warnings
verbose: Print more information to the console
no_ansi: Disable coloured log output
profile_runtime: Add analysis of how long MultiQC takes to run to the report
no_version_check: Disable checking the latest MultiQC version on the server
run_modules: Use only these modules
exclude_modules: Do not use these modules
config_files: Specific config files to load, after those in MultiQC dir / home dir / working dir
custom_css_files: Custom CSS files to include in the report
module_order: Names of modules in order of precedence to show in report
Example:
multiqc.write_report(
    force=True,
    output_dir="my_multiqc_report",
    title="My Report",
    filename="report.html",
)
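Putting parse_logs and write_report together, a small command-line wrapper might look like the sketch below. The option names and the default output directory are hypothetical. The MultiQC module is injected through the api argument, which defaults to importing multiqc, so the argument handling can be tested on its own:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI options for a hypothetical report-building script."""
    parser = argparse.ArgumentParser(description="Build a MultiQC report from Python.")
    parser.add_argument("paths", nargs="+", help="directories or files to search for logs")
    parser.add_argument("-o", "--output-dir", default="multiqc_out", help="report directory")
    parser.add_argument("-t", "--title", help="report title")
    # Repeatable flag collected into a list, matching the run_modules keyword
    parser.add_argument("-m", "--module", dest="run_modules", action="append")
    return parser


def main(argv=None, api=None) -> None:
    if api is None:
        import multiqc as api  # the real module; tests can pass a stand-in
    args = build_parser().parse_args(argv)
    parse_kwargs = {"run_modules": args.run_modules} if args.run_modules else {}
    api.parse_logs(*args.paths, **parse_kwargs)
    report_kwargs = {"output_dir": args.output_dir, "force": True}
    if args.title:
        report_kwargs["title"] = args.title
    api.write_report(**report_kwargs)
```

To run it as a script, call main() from an if __name__ == "__main__": block.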
Load config from file
Load config on top of the current config from a MultiQC config file.
def load_config(config_file: str | Path)
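load_config expects a path to a MultiQC config file. When the script generates the settings itself, one option is to write them to a temporary file first. This sketch assumes a flat mapping of top-level config keys (title and extra_fn_clean_exts are used for illustration). write_config is hypothetical. Values are serialised with json.dumps because simple JSON strings, numbers, booleans and lists are also valid YAML flow syntax:

```python
import json
import tempfile
from pathlib import Path


def write_config(settings: dict, directory=None) -> Path:
    """Write top-level config keys to a YAML file and return its path (hypothetical helper)."""
    # Use a fresh temporary directory unless one is given; it is not removed afterwards
    target_dir = Path(directory) if directory is not None else Path(tempfile.mkdtemp())
    path = target_dir / "multiqc_config.yaml"
    # json.dumps output for simple values is valid YAML, so no YAML library is needed
    lines = [f"{key}: {json.dumps(value)}" for key, value in settings.items()]
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example: multiqc.load_config(write_config({"title": "My Report", "extra_fn_clean_exts": [".unclassified"]})).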
Reset session
Reset the report to start fresh. Drops all previously parsed and loaded data.
def reset()
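reset is useful when a single script builds several independent reports. This minimal sketch assumes one analysis directory per project. report_per_project is hypothetical. The MultiQC module is injected through api, which defaults to importing multiqc, so the call order can be checked without real data:

```python
from pathlib import Path


def report_per_project(projects: dict[str, str], out_root, api=None) -> list[Path]:
    """Write one MultiQC report per project, starting each from a clean session (hypothetical helper)."""
    if api is None:
        import multiqc as api  # the real module; tests can pass a stand-in
    out_dirs = []
    for name, analysis_dir in projects.items():
        api.reset()  # drop data parsed for the previous project
        api.parse_logs(analysis_dir)
        out_dir = Path(out_root) / name
        api.write_report(output_dir=str(out_dir), title=name, force=True)
        out_dirs.append(out_dir)
    return out_dirs
```

For example: report_per_project({"run1": "data/run1", "run2": "data/run2"}, "reports").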