BJ-Germline-Variantcalling
Background
BJ-Germline-Variantcalling-Parabricks is a scalable, reproducible bioinformatics pipeline for processing single-cell sequencing data generated with BioSkryb's Whole Genome Amplification. The pipeline takes raw sequencing data as FASTQ (Illumina/Element) or CRAM (Ultima), and performs alignment and duplicate-read removal (Illumina/Element). It calls variants with Google DeepVariant, using a custom model trained on BioSkryb single-cell data, and uses population allele frequencies to improve variant-calling sensitivity.
Pipeline Overview
```mermaid
flowchart LR
%% Colors %%
classDef panel fill:transparent,stroke:#323232,stroke-dasharray:8
classDef panelt fill:transparent,stroke-opacity: 0
classDef black fill:#12294C,stroke:#12294C,stroke-width:2px,color:#fff
classDef blue fill:#20A4F3,stroke:#20A4F3,stroke-width:2px,color:#fff
classDef green fill:#3BCEAC,stroke:#3BCEAC,stroke-width:2px,color:#fff
classDef yellow fill:#ffd166,stroke:#ffd166,stroke-width:2px,color:#fff
classDef pink fill:#ef476f,stroke:#ef476f,stroke-width:2px,color:#fff
classDef orange fill:#f3722c,stroke:#f3722c,stroke-width:2px,color:#fff
classDef red fill:#BB4430,stroke:#BB4430,stroke-width:2px,color:#fff
classDef ming fill:#387780,stroke:#387780,stroke-width:2px,color:#fff
Start((Start)):::black --fastq--> Alignment[Parabricks Align Reads <br/> <br/> Remove Duplicates]
subgraph Map
Alignment:::green
end
subgraph Variant[Variant Calling]
Alignment --> Deepvariant[Deepvariant]:::yellow
end
subgraph Evaluate
Alignment --> Metrics[Alignment Metrics <br/><br/> GC Metrics <br/><br/> Insert Size Metrics <br/><br/> Bam Metrics <br/><br/> HS Metrics - Exome mode ]:::pink
Deepvariant --> vcfeval[Variant Evaluation]:::pink
Alignment --> ado[Allelic Balance - ADO - Benchmarking <br/><br/> BAM-Lorenz Coverage]:::pink
end
subgraph Report
vcfeval --> Mqc[MultiQC Report]:::orange
Metrics --> Mqc
ado --> Mqc
end
Mqc --> End((End)):::black
Map:::panel
Variant:::panel
Evaluate:::panel
Report:::panel
```
The pipeline uses the following steps and tools to perform the analyses:

- Map reads to the reference genome and remove duplicate reads using PARABRICKS FQ2BAM
- Perform variant calling with the GOOGLE DEEPVARIANT caller
- Evaluate metrics using PARABRICKS COLLECTMETRICS, which includes alignment, GC bias, insert size, and coverage metrics
- Evaluate variants with VCFEval to assess analytical performance, and evaluate allelic balance (only supported for Genome in a Bottle samples)
- Evaluate coverage uniformity across genomic regions with BAM-LORENZ-COVERAGE
- Aggregate the metrics across biosamples and tools into an overall pipeline statistics summary using MULTIQC
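To illustrate what a coverage Lorenz curve measures, here is a minimal Python sketch (not the BAM-LORENZ-COVERAGE implementation) that builds a Lorenz curve and a Gini coefficient from per-position read depths. It assumes the depths have already been extracted from a BAM file, for example with `samtools depth -a`; the function names `lorenz_curve` and `gini_coefficient` are hypothetical.

```python
from typing import List, Tuple


def lorenz_curve(depths: List[int]) -> List[Tuple[float, float]]:
    """Return (fraction of positions, fraction of total reads) points.

    Positions are sorted from lowest to highest depth, so perfectly
    uniform coverage traces the diagonal y = x.
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    total = sum(depths)
    if total == 0:
        raise ValueError("total coverage is zero")
    n = len(depths)
    points = [(0.0, 0.0)]
    cumulative = 0
    for i, depth in enumerate(sorted(depths), start=1):
        cumulative += depth
        points.append((i / n, cumulative / total))
    return points


def gini_coefficient(depths: List[int]) -> float:
    """Gini coefficient = 1 - 2 * (area under the Lorenz curve).

    0 means perfectly uniform coverage; larger values mean reads are
    concentrated in fewer positions.
    """
    points = lorenz_curve(depths)
    # Trapezoidal integration of the piecewise-linear Lorenz curve.
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    print(round(gini_coefficient([10, 10, 10, 10]), 3))  # 0.0
    print(round(gini_coefficient([0, 0, 0, 40]), 3))     # 0.75
```

A curve that bows far below the diagonal (high Gini) indicates uneven amplification, which is the property the coverage-uniformity step is meant to surface.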
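The analytical-performance numbers from the variant evaluation step are typically summarized as precision, sensitivity (recall), and F-measure. The sketch below uses the standard definitions as they appear in RTG Tools vcfeval summaries, where matched truth variants and matched calls are counted separately; the `VcfevalCounts` container and helper names are hypothetical, so check the vcfeval documentation for the exact output fields.

```python
from dataclasses import dataclass


@dataclass
class VcfevalCounts:
    """Hypothetical container for counts from a vcfeval-style comparison."""
    tp_baseline: int  # truth variants matched by a call
    tp_call: int      # calls matched to a truth variant
    fp: int           # calls with no matching truth variant
    fn: int           # truth variants with no matching call


def precision(c: VcfevalCounts) -> float:
    denom = c.tp_call + c.fp
    return c.tp_call / denom if denom else 0.0


def sensitivity(c: VcfevalCounts) -> float:
    denom = c.tp_baseline + c.fn
    return c.tp_baseline / denom if denom else 0.0


def f_measure(c: VcfevalCounts) -> float:
    # Harmonic mean of precision and sensitivity.
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    counts = VcfevalCounts(tp_baseline=3, tp_call=3, fp=1, fn=1)
    print(precision(counts), sensitivity(counts), f_measure(counts))  # 0.75 0.75 0.75
```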
Pipeline Parameters
| Parameter Name | Options | Description |
|---|---|---|
| Genome | GRCh38 (default) | Reference genome to use for alignment |
Module Parameters
| Module | Parameter Name | Options | Description |
|---|---|---|---|
| Subsampling | | (default) | Enables downsampling of input reads to a specified read count using SEQTK. |
| Evaluate Variant Calling | | (default) | Benchmarks variant calls against ground-truth variants. Only supported for GIAB samples. |
| Allelic balance (ADO) Benchmarking | | (default) | Evaluates allele coverage at known heterozygous sites. |
| BAM Lorenz Coverage | | (default) | Generates a Lorenz curve from BAM files to assess the uniformity of sequencing coverage across the genome. |
| Mutation Signature Profile | | (default) | Performs mutational signature profiling. |
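Allelic balance and allelic dropout (ADO) at known heterozygous sites can be summarized from per-site reference and alternate read counts. The following Python sketch shows one common way to do this; the pipeline's own definitions are not given here, so the depth and allele-support thresholds (`min_depth`, `min_allele_reads`) and the function name are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def summarize_allelic_balance(
    sites: Iterable[Tuple[int, int]],
    min_depth: int = 10,
    min_allele_reads: int = 1,
) -> Dict[str, float]:
    """Summarize allele coverage at known heterozygous sites.

    Each site is (ref_reads, alt_reads). Sites below min_depth are skipped.
    A site counts as allelic dropout (ADO) when either allele has fewer than
    min_allele_reads supporting reads. Thresholds are illustrative only.
    """
    evaluated = 0
    dropouts = 0
    alt_fraction_sum = 0.0
    for ref_reads, alt_reads in sites:
        depth = ref_reads + alt_reads
        if depth < min_depth:
            continue  # too little coverage to judge balance
        evaluated += 1
        alt_fraction_sum += alt_reads / depth
        if min(ref_reads, alt_reads) < min_allele_reads:
            dropouts += 1
    if evaluated == 0:
        return {"sites_evaluated": 0, "ado_rate": math.nan, "mean_alt_fraction": math.nan}
    return {
        "sites_evaluated": evaluated,
        "ado_rate": dropouts / evaluated,
        "mean_alt_fraction": alt_fraction_sum / evaluated,
    }


if __name__ == "__main__":
    # (5, 3) has depth 8 < 10 and is skipped.
    print(summarize_allelic_balance([(10, 10), (20, 0), (5, 3), (0, 12)]))
```

The ADO rate is reported separately from the mean alternate-allele fraction because dropouts of the reference and alternate alleles can cancel out, leaving the mean near 0.5 even when balance is poor.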
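Downsampling to a fixed read count, as the Subsampling module does with SEQTK, is commonly done with reservoir sampling, which draws a uniform random subset from a stream of unknown length in a single pass. The Python sketch below illustrates the technique (Algorithm R); it is not the pipeline's SEQTK invocation, and `reservoir_sample` is a hypothetical helper. Read pairs are sampled together so mates stay matched, which is the same goal as running seqtk with an identical seed on both read files.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Uniformly sample k records from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # fixed seed gives a reproducible subsample
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    r1 = ["r1_a", "r1_b", "r1_c", "r1_d"]
    r2 = ["r2_a", "r2_b", "r2_c", "r2_d"]
    # Sample (mate1, mate2) tuples so pairing is preserved.
    print(reservoir_sample(zip(r1, r2), k=2, seed=7))
```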
Output files
| Output Directory/File | Notes |
|---|---|
| multiqc/ | Output files containing metrics from various tools, used to create the MultiQC report. The all_metrics_mqc.txt file contains the metrics from the All Metrics section of the MultiQC report found in BaseJumper. |
| PARABRICKS_PRIMARY_WORKFLOW_PARABRICKS_FQ2BAM/ | Biosample-level output containing the aligned reads and their index file. |
| GOOGLE_DEEPVARIANT_WF_DEEPVARIANT_POSTPROCESS/ | Biosample-level output containing the variant calls in VCF format and their index file. |
| execution_info/ | Detailed execution information for all tasks in the pipeline run. |