BJ-WES
Background
The BJ-WES pipeline is a scalable and reproducible bioinformatics pipeline for processing whole-exome and targeted-panel sequencing data. It currently supports only human sequencing data but could be extended to other model systems. The pipeline takes raw sequencing data as FASTQ files and runs quality control assessments to evaluate the quality of the library build. It then aligns the reads, removes duplicate reads, recalibrates base quality scores, and calls variants with both the Haplotyper and DNAScope callers. All downstream analyses use the DNAScope variant calls.
The pipeline currently supports the following panels:
- xGen Exome Hyb Panel v2
- TruSight One
- TWIST
Info
Contact BaseJumper support if you would like to see additional panels supported in the BaseJumper platform.
Pipeline Overview
flowchart LR
%% Colors %%
classDef panel fill:transparent,stroke:#323232,stroke-dasharray:8
classDef panelt fill:transparent,stroke-opacity: 0
classDef black fill:#12294C,stroke:#12294C,stroke-width:2px,color:#fff
classDef blue fill:#20A4F3,stroke:#20A4F3,stroke-width:2px,color:#fff
classDef green fill:#3BCEAC,stroke:#3BCEAC,stroke-width:2px,color:#fff
classDef yellow fill:#ffd166,stroke:#ffd166,stroke-width:2px,color:#fff
classDef pink fill:#ef476f,stroke:#ef476f,stroke-width:2px,color:#fff
classDef orange fill:#f3722c,stroke:#f3722c,stroke-width:2px,color:#fff
classDef red fill:#BB4430,stroke:#BB4430,stroke-width:2px,color:#fff
classDef ming fill:#387780,stroke:#387780,stroke-width:2px,color:#fff
Start((Start)):::black --fastq--> Alignment[Align Reads <br/> <br/> Remove Duplicates <br/> <br/> BQSR]
subgraph Map
Alignment:::green
end
subgraph Variant[Variant Calling]
Alignment --> Haplotyper[Haplotyper]:::yellow
Alignment --> DNAScope[DNAScope]:::yellow
end
subgraph Evaluate
Alignment --> M_Sentieon[Alignment Metrics <br/><br/> GC Metrics <br/><br/> Insert Size Metrics <br/><br/> Coverage Metrics <br/><br/> Picard CollectHsMetrics]:::pink
DNAScope --> vcfeval[Variant Evaluation]:::pink
DNAScope --> Annotation[Variant Annotation]:::pink
end
subgraph Report
vcfeval --> Mqc[MultiQC Report]:::orange
Annotation --> Mqc
M_Sentieon --> Mqc
end
Mqc --> End((End)):::black
Map:::panel
Variant:::panel
Evaluate:::panel
Report:::panel
The pipeline uses the following steps and tools to perform the analyses:
- Map reads to the reference genome using SENTIEON BWA MEM
- Remove duplicate reads using SENTIEON DRIVER LOCUSCOLLECTOR and SENTIEON DRIVER DEDUP
- Perform base quality score recalibration (BQSR) using SENTIEON DRIVER BQSR
- Perform variant calling with the HAPLOTYPER caller
- Perform variant calling with the DNAScope caller
- Perform variant annotation with SNPEFF and annotate variants with the COSMIC, ClinVar, and dbSNP databases
- Evaluate metrics using SENTIEON DRIVER METRICS, which includes Alignment, GC Bias, Insert Size, Coverage metrics, and Picard CollectHsMetrics
- Evaluate variants with VCFEval to assess analytical performance (only supported for HG001 samples)
- Aggregate the metrics across biosamples and tools to create an overall pipeline statistics summary using MULTIQC
Info
HG001 is a Genome in a Bottle (GIAB) reference sample. More information is available from the Genome in a Bottle project.
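The ordering of the steps above follows from the edges in the flowchart. As a minimal sketch, the dependencies can be written as a graph and sorted topologically. The step names below are illustrative labels chosen for this example, not identifiers used by the pipeline itself.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of steps it depends on. The edges mirror the
# flowchart; the step names are hypothetical labels for illustration only.
DEPENDENCIES = {
    "align_dedup_bqsr": set(),
    "haplotyper": {"align_dedup_bqsr"},
    "dnascope": {"align_dedup_bqsr"},
    "metrics": {"align_dedup_bqsr"},
    "vcfeval": {"dnascope"},
    "annotation": {"dnascope"},
    "multiqc": {"vcfeval", "annotation", "metrics"},
}


def execution_order(deps):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

Any order returned this way runs alignment first and places the MultiQC report after variant evaluation, annotation, and metrics. Note that only the DNAScope calls feed the downstream steps, which matches the flowchart.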
Pipeline Parameters
Parameter Name | Options | Description |
---|---|---|
Exome/Targeted Panel | xGen Exome Hyb Panel v2 (default), TruSight One, TWIST | Exome/targeted panel used for the sequencing data |
Genome | GRCh38 (default) | Reference genome to use for alignment |
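As a rough illustration of how these options and defaults fit together, the sketch below resolves a parameter selection against the documented values. The function name and dictionary keys are hypothetical; they are not part of the BaseJumper platform.

```python
# Allowed values copied from the Pipeline Parameters table.
# The first entry in each tuple is the documented default.
PANELS = ("xGen Exome Hyb Panel v2", "TruSight One", "TWIST")
GENOMES = ("GRCh38",)


def resolve_parameters(panel=None, genome=None):
    """Fill in defaults and reject values outside the documented options.

    Hypothetical helper for illustration; not a BaseJumper API.
    """
    panel = PANELS[0] if panel is None else panel
    genome = GENOMES[0] if genome is None else genome
    if panel not in PANELS:
        raise ValueError(f"Unsupported panel: {panel!r}")
    if genome not in GENOMES:
        raise ValueError(f"Unsupported genome: {genome!r}")
    return {"panel": panel, "genome": genome}


print(resolve_parameters())
print(resolve_parameters(panel="TWIST"))
```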
Module Parameters
Module | Parameter Name | Options | Description |
---|---|---|---|
Variant Annotation | | (default) | Perform annotation of genic variants with dbSNP, ClinVar, and COSMIC databases. |
Evaluate Variant Calling | | (default) | Perform benchmarking on variant calling based on ground truth variants. Only for HG001/NA12878 samples. |
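The sketch below illustrates how the two module settings interact with the HG001/NA12878 restriction. It assumes a sample is recognized by its name matching one of those two identifiers, which is an assumption made for this example. The function and module labels are hypothetical.

```python
# Samples with ground truth variants, per the table above.
GIAB_EVAL_SAMPLES = {"HG001", "NA12878"}


def modules_to_run(sample_name, annotate=True, evaluate=True):
    """Return the optional modules that apply to one sample.

    Hypothetical helper: assumes the sample name identifies the
    reference sample, which the platform may determine differently.
    """
    modules = []
    if annotate:
        modules.append("variant_annotation")
    # Benchmarking needs ground truth, so it only applies to HG001/NA12878.
    if evaluate and sample_name.upper() in GIAB_EVAL_SAMPLES:
        modules.append("variant_evaluation")
    return modules


print(modules_to_run("HG001"))
print(modules_to_run("patient_01"))
```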
Output Files
Output Directory/File | Notes |
---|---|
multiqc/ | This section includes output files containing metrics from various tools to create a MultiQC report. |
secondary_analyses/ (alignment/, metrics/, variant_calls*/) | alignment/: Biosample-level output containing aligned reads and index file. metrics/: Metrics output from secondary analyses: Alignment, GC bias, Insert Size, Coverage, and library complexity metrics. The *-pipeline_all_metrics_mqc.txt file contains metrics from the All Metrics section of the MultiQC report found in BaseJumper. variant_calls*/: Biosample-level output containing the variant calls in VCF format and index file for the Haplotyper and DNAScope variant callers. |
tertiary_analyses/ (variant_annotation/) | Contains the annotated VCFs for individual biosamples and a multisample variants table in txt and hdf5 file formats. |
execution_info/ | This section includes execution information regarding the pipeline run. |
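To check that a downloaded run contains the directories described above, a small helper like the following could be used. It assumes that alignment/, metrics/, and variant_calls*/ sit directly under secondary_analyses/, which is how the table reads but is not confirmed here. The function name is hypothetical.

```python
from pathlib import Path

# Directories taken from the output table. The nesting under
# secondary_analyses/ and tertiary_analyses/ is an assumption.
EXPECTED_DIRS = (
    "multiqc",
    "secondary_analyses/alignment",
    "secondary_analyses/metrics",
    "tertiary_analyses/variant_annotation",
    "execution_info",
)


def missing_outputs(run_dir):
    """List expected output directories that are absent under run_dir."""
    root = Path(run_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    # variant_calls* is a wildcard in the table, so match it with a glob.
    matches = (root / "secondary_analyses").glob("variant_calls*")
    if not any(p.is_dir() for p in matches):
        missing.append("secondary_analyses/variant_calls*")
    return missing


print(missing_outputs("."))
```

An empty list means every documented directory was found. Any entries returned name the directories to look for.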