hicstuff
Hi-C pipeline that generates contact maps from sequencing reads.
The module parses two file types from the hicstuff Hi-C pipeline:
- Pipeline log files (*.log, *.txt), identified by the ## hicstuff: header line. The end-of-run summary stats dictionary feeds the General Statistics table and the Read Fate stacked bar plot.
- Distance law tables (default distance_law.txt, also commonly seen with .tsv extensions), identified by a ## distance_law header. These drive the P(s) contact-probability line graph and its log-log slope plot.
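To make the "log-log slope" of a P(s) curve concrete, here is a minimal sketch of one way such a slope can be computed: a finite difference of log P(s) against log s between neighbouring points. The function name `loglog_slope` and the input values are assumptions made for illustration; the values are a synthetic power law, not hicstuff output, and the module's actual calculation may differ.

```python
import math


def loglog_slope(distances, probabilities):
    """Finite-difference slope of log10 P(s) against log10 s.

    Returns one slope per pair of neighbouring points.
    """
    log_s = [math.log10(s) for s in distances]
    log_p = [math.log10(p) for p in probabilities]
    return [
        (log_p[i + 1] - log_p[i]) / (log_s[i + 1] - log_s[i])
        for i in range(len(log_s) - 1)
    ]


# Synthetic power law P(s) = 1/s, used only to show the calculation.
distances = [1_000, 10_000, 100_000, 1_000_000]
probabilities = [1 / s for s in distances]
slopes = loglog_slope(distances, probabilities)
print(slopes)
```

For a pure power law P(s) = s^a the slope is the exponent a at every distance, so each value here is close to -1. On real data the slope varies with distance, which is what a slope plot makes visible.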
hicstuff writes a summary stats dictionary to the log at the end of each run:
## hicstuff: v3.2.2 log file
## date: 2024-02-16 14:00:23
## enzyme: DpnII,HinfI
## input1: ../tinyMapper/tests/testHiC_R1.fq.gz
## input2: ../tinyMapper/tests/testHiC_R2.fq.gz
## ref: /home/rsg/genomes/S288c/S288c.fa
---
...
2024-02-16,14:00:43 :: INFO :: 77% reads (single ends) mapped with Q >= 30 (154272/200000)
2024-02-16,14:00:44 :: INFO :: 66943 pairs successfully mapped (66.94%)
2024-02-16,14:00:46 :: INFO :: Fetching mapping and pairing stats
2024-02-16,14:00:46 :: INFO :: {'Sample': 'testHiC^CGNT57', 'Total read pairs': 100000, 'Mapped reads': 154272, 'Unmapped reads': 45728, 'Recovered contacts': 66943, 'Final contacts': 66943, 'Removed contacts': 0, 'Filtered out': 0, 'Loops': 0, 'Uncuts': 0, 'Weirds': 0, 'PCR duplicates': 0}
2024-02-16,14:00:46 :: INFO :: Contact map generated after 0h 0m 23s
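The summary stats line in the log above is a Python dictionary literal, so it can be recovered without custom parsing. The following is a minimal sketch of that idea, not the module's actual implementation: the regular expression and the function name `parse_summary_stats` are assumptions introduced here, and the example log is a shortened copy of the one above.

```python
import ast
import re

# Matches an INFO line whose message is a dict literal.
# Illustrative pattern, not taken from the module source.
STATS_LINE = re.compile(r":: INFO :: (\{.*\})\s*$")


def parse_summary_stats(log_text):
    """Return the last summary stats dict found in a hicstuff log, or None."""
    stats = None
    for line in log_text.splitlines():
        match = STATS_LINE.search(line)
        if not match:
            continue
        try:
            # literal_eval only accepts Python literals, so it never runs code.
            candidate = ast.literal_eval(match.group(1))
        except (ValueError, SyntaxError):
            continue  # looked like a dict but was not a valid literal
        if isinstance(candidate, dict):
            stats = candidate
    return stats


example_log = """\
## hicstuff: v3.2.2 log file
2024-02-16,14:00:46 :: INFO :: Fetching mapping and pairing stats
2024-02-16,14:00:46 :: INFO :: {'Sample': 'testHiC^CGNT57', 'Total read pairs': 100000, 'Mapped reads': 154272, 'Unmapped reads': 45728, 'Recovered contacts': 66943, 'Final contacts': 66943, 'Removed contacts': 0, 'Filtered out': 0, 'Loops': 0, 'Uncuts': 0, 'Weirds': 0, 'PCR duplicates': 0}
2024-02-16,14:00:46 :: INFO :: Contact map generated after 0h 0m 23s
"""

stats = parse_summary_stats(example_log)
pct_final = 100 * stats["Final contacts"] / stats["Total read pairs"]
print(stats["Sample"], round(pct_final, 2))  # testHiC^CGNT57 66.94
```

The computed percentage agrees with the "66943 pairs successfully mapped (66.94%)" line in the same log. Using `ast.literal_eval` rather than `eval` keeps the parser safe on untrusted log files.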
File search patterns
hicstuff/distancelaw:
  contents: '## distance_law'
  num_lines: 5
hicstuff/pipeline_stats:
  - contents: '## hicstuff:'
    fn: '*.txt'
    num_lines: 100
  - contents: '## hicstuff:'
    fn: '*.log'
    num_lines: 10
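As a rough illustration of how these patterns select files, the sketch below checks a file name against `fn` and looks for the `contents` string within the first `num_lines` lines. It is a simplified model under those assumptions, not MultiQC's own file search code; the helper `matches` and the file name `run.log` are hypothetical.

```python
import fnmatch

# Pattern values copied from the search patterns above.
PIPELINE_STATS_PATTERNS = [
    {"contents": "## hicstuff:", "fn": "*.txt", "num_lines": 100},
    {"contents": "## hicstuff:", "fn": "*.log", "num_lines": 10},
]
DISTANCE_LAW_PATTERN = {"contents": "## distance_law", "num_lines": 5}


def matches(pattern, filename, lines):
    """Hypothetical helper: True if the file satisfies every key in the pattern."""
    # A pattern without 'fn' accepts any file name.
    if "fn" in pattern and not fnmatch.fnmatch(filename, pattern["fn"]):
        return False
    # Assumption: only the first num_lines lines are inspected.
    head = lines[: pattern.get("num_lines", len(lines))]
    return any(pattern["contents"] in line for line in head)


log_lines = ["## hicstuff: v3.2.2 log file", "## date: 2024-02-16 14:00:23"]
is_pipeline_log = any(
    matches(p, "run.log", log_lines) for p in PIPELINE_STATS_PATTERNS
)
is_distance_law = matches(DISTANCE_LAW_PATTERN, "run.log", log_lines)
print(is_pipeline_log, is_distance_law)  # True False
```

In this model, a list of patterns under one key (as for hicstuff/pipeline_stats) is treated as alternatives: a file is picked up if any one entry matches.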