15p.zip -
Introduced in , the "Deep" method (triggered by the --deep command) is a novel approach to co-compression . How It Works
While a standard .zip file uses the (a combination of LZ77 and Huffman coding) to find repeated patterns within a single stream of data, Genozip’s Deep method is domain-specific. It understands the structure of genomic data, allowing it to achieve compression ratios that general-purpose tools cannot match. 15p.zip
: Containing those same reads mapped to a reference genome. Introduced in , the "Deep" method (triggered by
Because BAM files originate from FASTQ files, they contain nearly identical sequence information, creating a massive that traditional compression (like GZIP or BGZF) treats as separate, redundant data. The "Deep" Compression Method : Containing those same reads mapped to a reference genome
: Despite the aggressive reduction, the process is bit-for-bit lossless. Using the Genozip platform , users can reconstruct the original FASTQ and BAM files exactly as they were. Performance and Impact
The primary bottleneck in modern bioinformatics is the massive storage requirement for Next-Generation Sequencing (NGS) data. Standard workflows typically involve two main file types:
: Instead of compressing FASTQ and BAM files as independent silos, Deep exploits the fact that the BAM file is essentially a reorganized version of the FASTQ data.