BJ-WGS
Background
The BJ-WGS pipeline is a scalable and reproducible bioinformatics pipeline for processing single-cell sequencing data from ResolveDNA Whole Genome Amplification, as well as other single-cell or bulk sequencing data. It currently supports only human sequencing data but could be extended to other model systems. The pipeline takes raw sequencing data as FASTQ files, aligns the reads, removes duplicate reads, and performs base quality score recalibration (BQSR) before haplotype calling. It also runs the DNAScope variant caller; by default, the DNAScope VCF files are used for variant annotation and all other downstream analyses.
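As a rough illustration of preparing FASTQ inputs, the sketch below groups file names by biosample. It assumes paired-end reads named `<biosample>_R1...` and `<biosample>_R2...`, which is a common convention but is not stated in this documentation; the `group_fastqs` helper and the pattern are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention (not specified by the pipeline documentation):
#   <biosample>_R1.fastq.gz and <biosample>_R2.fastq.gz,
#   optionally with a numeric chunk suffix such as _R1_001.
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def group_fastqs(filenames):
    """Group FASTQ file names by biosample and read number (R1/R2)."""
    groups = defaultdict(dict)
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        groups[match["sample"]]["R" + match["read"]] = name
    return dict(groups)
```

For simplicity the sketch keeps one file per read number, so a biosample split across several lanes would need a list per read instead.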
Pipeline Overview
flowchart LR
%% Colors %%
classDef panel fill:transparent,stroke:#323232,stroke-dasharray:8
classDef panelt fill:transparent,stroke-opacity: 0
classDef black fill:#12294C,stroke:#12294C,stroke-width:2px,color:#fff
classDef blue fill:#20A4F3,stroke:#20A4F3,stroke-width:2px,color:#fff
classDef green fill:#3BCEAC,stroke:#3BCEAC,stroke-width:2px,color:#fff
classDef yellow fill:#ffd166,stroke:#ffd166,stroke-width:2px,color:#fff
classDef pink fill:#ef476f,stroke:#ef476f,stroke-width:2px,color:#fff
classDef orange fill:#f3722c,stroke:#f3722c,stroke-width:2px,color:#fff
classDef red fill:#BB4430,stroke:#BB4430,stroke-width:2px,color:#fff
classDef ming fill:#387780,stroke:#387780,stroke-width:2px,color:#fff
Start((Start)):::black --fastq--> Alignment[Align Reads <br/> <br/> Remove Duplicates <br/> <br/> BQSR]
subgraph Map
Alignment:::green
end
subgraph Variant[Variant Calling]
Alignment --> Haplotyper[Haplotyper]:::yellow
Alignment --> DNAScope[DNAScope]:::yellow
end
subgraph Evaluate
Alignment --> M_Sentieon[Alignment Metrics <br/><br/> GC Metrics <br/><br/> Insert Size Metrics <br/><br/> Coverage Metrics]:::pink
DNAScope --> vcfeval[Variant Evaluation]:::pink
DNAScope --> Annotation[Variant Annotation]:::pink
end
subgraph Report
vcfeval --> Mqc[MultiQC Report]:::orange
Annotation --> Mqc
M_Sentieon --> Mqc
end
Mqc --> End((End)):::black
Map:::panel
Variant:::panel
Evaluate:::panel
Report:::panel
The pipeline uses the following steps and tools to perform the analyses:
- Map reads to the reference genome using SENTIEON BWA MEM
- Remove duplicate reads using SENTIEON DRIVER LOCUSCOLLECTOR and SENTIEON DRIVER DEDUP
- Perform base quality score recalibration (BQSR) using SENTIEON DRIVER BQSR
- Perform variant calling with the HAPLOTYPER caller
- Perform variant calling with the DNAScope caller
- Perform variant annotation with SNPEFF and annotate variants with the COSMIC, ClinVar, and dbSNP databases
- Evaluate metrics using SENTIEON DRIVER METRICS, which includes Alignment, GC Bias, Insert Size, and Coverage metrics
- Evaluate variants with VCFEval to assess analytical performance (only supported for HG001 samples)
- Aggregate the metrics across biosamples and tools into an overall pipeline statistics summary using MULTIQC
Info
HG001 is a GIAB reference sample. For more information, see Genome in a Bottle.
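The ordering implied by the flowchart and step list above can be written as a dependency graph. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the stage names are illustrative labels chosen here, not identifiers used by the pipeline itself.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on, following the flowchart.
# Stage names are illustrative labels, not identifiers used by the pipeline.
STAGE_DEPENDENCIES = {
    "align_dedup_bqsr": set(),
    "haplotyper": {"align_dedup_bqsr"},
    "dnascope": {"align_dedup_bqsr"},
    "metrics": {"align_dedup_bqsr"},
    "vcfeval": {"dnascope"},
    "annotation": {"dnascope"},
    "multiqc": {"vcfeval", "annotation", "metrics"},
}


def execution_order(dependencies):
    """Return one valid order in which the stages could run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STAGE_DEPENDENCIES)
```

Any order returned this way runs alignment first and runs MultiQC only after variant evaluation, annotation, and metrics. Stages with no dependency on each other, such as Haplotyper and DNAScope, could run in parallel.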
Pipeline Parameters
Parameter Name | Options | Description |
---|---|---|
Genome | GRCh38 (default) | Reference genome to use for alignment |
Module Parameters
Module | Parameter Name | Options | Description |
---|---|---|---|
Variant Annotation | | (default) | Perform annotation of genic variants with dbSNP, ClinVar, and COSMIC databases. |
Evaluate Variant Calling | | (default) | Perform benchmarking of variant calling against ground truth variants. Only for HG001/NA12878 samples. |
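To make the effect of the module parameters concrete, here is a small sketch of how the two toggles and the HG001/NA12878 restriction might select optional steps for a biosample. The field names, the `optional_steps` helper, and matching on an exact biosample ID are all assumptions made for illustration. Treating both modules as enabled by default is also an assumption: the table marks a default but does not show the option value.

```python
from dataclasses import dataclass


@dataclass
class ModuleOptions:
    # Hypothetical field names; the documentation only names the modules.
    # Defaulting both to True is an assumption.
    variant_annotation: bool = True
    evaluate_variant_calling: bool = True


# Samples for which benchmarking is supported, per the module table.
BENCHMARK_SAMPLES = {"HG001", "NA12878"}


def optional_steps(biosample_id, options):
    """List the optional downstream steps that would run for one biosample."""
    steps = []
    if options.variant_annotation:
        steps.append("variant_annotation")
    # Benchmarking needs ground truth variants, so it is limited to
    # the supported reference samples (assumed to match by exact ID).
    if options.evaluate_variant_calling and biosample_id in BENCHMARK_SAMPLES:
        steps.append("variant_evaluation")
    return steps
```

Under these assumptions, an ordinary biosample would only be annotated, while an HG001 sample would also be benchmarked.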
Output files
Output Directory/File | Notes |
---|---|
multiqc/ | Output files containing metrics from various tools, used to create the MultiQC report. MultiQC Report Example |
secondary_analyses/ (alignment/, metrics/, variant_calls*/) | alignment/: Biosample-level output containing aligned reads and index file. metrics/: Metrics output from secondary analyses, covering alignment, GC bias, insert size, coverage, and library complexity metrics. The *-pipeline_all_metrics_mqc.txt file contains metrics from the All Metrics section of the MultiQC report found in BaseJumper. variant_calls*/: Biosample-level output containing the variant calls in VCF format, with index files, for the Haplotyper and DNAScope variant callers. |
tertiary_analyses/ (variant_annotation/) | Contains the annotated VCFs for individual biosamples and a multisample variants table in txt and hdf5 file formats. MultiSample Variants Table Example |
execution_info/ | Execution information for the pipeline run. |
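A quick way to sanity-check a finished run is to confirm that the directories listed above exist. The sketch below assumes that `alignment/`, `metrics/`, and `variant_calls*/` sit under `secondary_analyses/` and that `variant_annotation/` sits under `tertiary_analyses/`, as the table suggests; the `missing_outputs` helper is hypothetical.

```python
from pathlib import Path

# Directory patterns taken from the output table. variant_calls* is a glob
# because the documentation lists it with a wildcard suffix.
EXPECTED_OUTPUTS = [
    "multiqc",
    "secondary_analyses/alignment",
    "secondary_analyses/metrics",
    "secondary_analyses/variant_calls*",
    "tertiary_analyses/variant_annotation",
    "execution_info",
]


def missing_outputs(run_dir):
    """Return the expected patterns with no matching directory under run_dir."""
    root = Path(run_dir)
    return [
        pattern
        for pattern in EXPECTED_OUTPUTS
        if not any(path.is_dir() for path in root.glob(pattern))
    ]
```

An empty list means every expected directory was found; anything else names the patterns worth investigating.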