BJ-Germline-Variantcalling
Background
BJ-Germline-Variantcalling-Parabricks is a scalable and reproducible bioinformatics pipeline for processing single-cell sequencing data generated with BioSkryb's whole genome amplification. The pipeline takes raw sequencing data as FASTQ (Illumina/Element) or CRAM (Ultima) files, aligns the reads, and removes duplicate reads (Illumina/Element). It calls variants with Google DeepVariant, using a custom model trained on BioSkryb single-cell data, and uses population allele frequencies to improve variant-calling sensitivity.
Pipeline Overview
flowchart LR
%% Colors %%
classDef panel fill:transparent,stroke:#323232,stroke-dasharray:8
classDef panelt fill:transparent,stroke-opacity: 0
classDef black fill:#12294C,stroke:#12294C,stroke-width:2px,color:#fff
classDef blue fill:#20A4F3,stroke:#20A4F3,stroke-width:2px,color:#fff
classDef green fill:#3BCEAC,stroke:#3BCEAC,stroke-width:2px,color:#fff
classDef yellow fill:#ffd166,stroke:#ffd166,stroke-width:2px,color:#fff
classDef pink fill:#ef476f,stroke:#ef476f,stroke-width:2px,color:#fff
classDef orange fill:#f3722c,stroke:#f3722c,stroke-width:2px,color:#fff
classDef red fill:#BB4430,stroke:#BB4430,stroke-width:2px,color:#fff
classDef ming fill:#387780,stroke:#387780,stroke-width:2px,color:#fff
Start((Start)):::black --fastq--> Alignment[Parabricks Align Reads <br/> <br/> Remove Duplicates]
subgraph Map
Alignment:::green
end
subgraph Variant[Variant Calling]
Alignment --> Deepvariant[DeepVariant]:::yellow
end
subgraph Evaluate
Alignment --> Metrics[Alignment Metrics <br/><br/> GC Metrics <br/><br/> Insert Size Metrics <br/><br/> BAM Metrics <br/><br/> HS Metrics - Exome mode]:::pink
Deepvariant --> vcfeval[Variant Evaluation]:::pink
Alignment --> ado[Allelic Balance - ADO Benchmarking <br/><br/> BAM Lorenz Coverage]:::pink
end
subgraph Report
vcfeval --> Mqc[MultiQC Report]:::orange
Metrics --> Mqc
ado --> Mqc
end
Mqc --> End((End)):::black
Map:::panel
Variant:::panel
Evaluate:::panel
Report:::panel
The pipeline performs its analyses with the following steps and tools:
- Map reads to the reference genome and remove duplicate reads using PARABRICKS FQ2BAM.
- Call variants with the GOOGLE DEEPVARIANT caller.
- Evaluate alignment, GC bias, insert size, and coverage metrics using PARABRICKS COLLECTMETRICS.
- Evaluate variant calls with VCFEval to assess analytical performance, and evaluate allelic balance (supported only for Genome in a Bottle samples).
- Evaluate coverage uniformity across genomic regions with BAM-LORENZ-COVERAGE.
- Aggregate metrics across biosamples and tools into an overall pipeline statistics summary using MULTIQC.
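VCFEval (part of RTG Tools) summarizes a comparison against a truth set as true-positive, false-positive, and false-negative counts, and derives precision, sensitivity, and F-measure from them. The sketch below shows those formulas in Python. The `VcfevalCounts` class and the function names are hypothetical and are not part of this pipeline or of RTG Tools, and the counts in the usage example are made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass
class VcfevalCounts:
    """Hypothetical container for the counts in a vcfeval-style summary."""

    tp_baseline: int  # truth-set variants matched by a call
    tp_call: int  # calls matched to a truth-set variant
    fp: int  # calls with no matching truth-set variant
    fn: int  # truth-set variants with no matching call


def precision(c: VcfevalCounts) -> float:
    """Fraction of calls that match the truth set."""
    denom = c.tp_call + c.fp
    return c.tp_call / denom if denom else 0.0


def sensitivity(c: VcfevalCounts) -> float:
    """Fraction of truth-set variants recovered by the calls (recall)."""
    denom = c.tp_baseline + c.fn
    return c.tp_baseline / denom if denom else 0.0


def f_measure(c: VcfevalCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, s = precision(c), sensitivity(c)
    return 2 * p * s / (p + s) if (p + s) else 0.0


if __name__ == "__main__":
    # Illustrative counts only, not results from this pipeline.
    counts = VcfevalCounts(tp_baseline=950, tp_call=948, fp=20, fn=50)
    print(f"precision={precision(counts):.4f}")
    print(f"sensitivity={sensitivity(counts):.4f}")
    print(f"F-measure={f_measure(counts):.4f}")
```

Precision uses the call-side true positives and sensitivity uses the baseline-side true positives, because a single truth variant can be matched by a differently represented call.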
Pipeline Parameters
Parameter Name | Options | Description |
---|---|---|
Genome | GRCh38 (default) | Reference genome to use for alignment |
Module Parameters
Module | Parameter Name | Options | Description |
---|---|---|---|
Subsampling | | (default) | Enables downsampling of input reads to a specified read count using SEQTK. |
Evaluate Variant Calling | | (default) | Benchmarks variant calls against ground-truth variants. Supported only for GIAB samples. |
Allelic Balance (ADO) Benchmarking | | (default) | Evaluates allele coverage at known heterozygous sites to assess allelic dropout (ADO). |
BAM Lorenz Coverage | | (default) | Generates a Lorenz curve from BAM files to assess the uniformity of sequencing coverage across the genome. |
Mutation Signature Profile | | (default) | Performs mutational signature profiling. |
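The ADO benchmarking and BAM Lorenz coverage modules both reduce to simple calculations. The following is a minimal sketch, not the pipeline's implementation. It assumes per-site (reference, alternate) read counts at known heterozygous sites and per-bin or per-position depths as input, and the thresholds `min_depth` and `min_allele_reads` are hypothetical illustration values. A site counts as a dropout when it has enough total depth but one allele has too few reads. The Gini coefficient summarizes the Lorenz curve: 0 means perfectly uniform coverage, and values near 1 mean coverage is concentrated in a few bins.

```python
from typing import Iterable, List, Tuple


def allelic_dropout_rate(
    allele_depths: Iterable[Tuple[int, int]],
    min_depth: int = 10,
    min_allele_reads: int = 1,
) -> float:
    """Fraction of evaluable known-heterozygous sites where one allele is missing.

    allele_depths holds (ref_reads, alt_reads) per site. The thresholds are
    hypothetical illustration values, not pipeline defaults.
    """
    evaluable = dropouts = 0
    for ref, alt in allele_depths:
        if ref + alt < min_depth:
            continue  # too shallow to separate dropout from sampling noise
        evaluable += 1
        if min(ref, alt) < min_allele_reads:
            dropouts += 1
    return dropouts / evaluable if evaluable else 0.0


def lorenz_curve(depths: List[float]) -> List[Tuple[float, float]]:
    """Return (cumulative fraction of bins, cumulative fraction of reads) points."""
    if not depths or sum(depths) <= 0:
        raise ValueError("need at least one bin with non-zero coverage")
    values = sorted(depths)  # least-covered bins first
    total = sum(values)
    n = len(values)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(values, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(depths: List[float]) -> float:
    """Gini coefficient: 1 - 2 * area under the Lorenz curve (trapezoid rule)."""
    pts = lorenz_curve(depths)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    sites = [(10, 0), (5, 5), (3, 0), (12, 8)]
    print("ADO rate:", allelic_dropout_rate(sites))
    print("Gini (uniform):", gini([5, 5, 5, 5]))
    print("Gini (skewed):", gini([0, 0, 0, 10]))
```

Because the Lorenz curve is piecewise linear between sorted bins, the trapezoid rule gives its area exactly; for n bins the largest possible Gini value is (n - 1) / n.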
Output files
Output Directory/File | Notes |
---|---|
multiqc/ | Output files containing metrics from the various tools, used to build the MultiQC report (see MultiQC Report Example). all_metrics_mqc.txt contains the metrics shown in the All Metrics section of the MultiQC report in BaseJumper. |
PARABRICKS_PRIMARY_WORKFLOW_PARABRICKS_FQ2BAM/ | Biosample-level output containing the aligned reads and their index file. |
GOOGLE_DEEPVARIANT_WF_DEEPVARIANT_POSTPROCESS/ | Biosample-level output containing the variant calls in VCF format and their index file. |
execution_info/ | Detailed execution information for every task in the pipeline run. |
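To work with all_metrics_mqc.txt outside BaseJumper, a small tab-separated parser is usually enough. This sketch assumes the file follows MultiQC's custom-content convention of a tab-separated table whose optional configuration lines start with `#`, with biosample names in the first column; check the actual file before relying on it. The function name `read_mqc_table` and the column names in the example are hypothetical.

```python
import csv
from typing import Dict


def read_mqc_table(text: str) -> Dict[str, Dict[str, str]]:
    """Parse a tab-separated MultiQC custom-content table into {sample: {column: value}}.

    Assumes '#'-prefixed lines are comments or MultiQC config, the first remaining
    line is the header, and the first column holds the biosample name.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return {}
    table: Dict[str, Dict[str, str]] = {}
    for row in reader:
        # Values stay as strings; convert the columns you need downstream.
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


if __name__ == "__main__":
    # Hypothetical content; real column names come from the pipeline's report.
    example = "# plot_type: 'table'\nsample\tmetric_a\tmetric_b\nS1\t0.95\t12\nS2\t0.91\t15\n"
    for sample, metrics in read_mqc_table(example).items():
        print(sample, metrics)
```

On a real run, pass the file's contents, for example `read_mqc_table(open(path).read())`, where `path` points to all_metrics_mqc.txt in the multiqc output.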