BJ-WES

Background

The BJ-WES pipeline is a scalable and reproducible bioinformatics pipeline for processing whole-exome and targeted-panel sequencing data. It currently supports only human sequencing data but could be extended to other model systems. The pipeline takes raw sequencing data as FASTQ files and runs quality control assessments to evaluate the quality of the library build. It then aligns the reads, removes duplicate reads, performs base quality score recalibration (BQSR), and calls variants with both the Haplotyper and DNAScope callers. The DNAScope variant calls are used for all downstream analyses.

The pipeline currently supports the following panels:

xGen Exome Hyb Panel v2
TruSight One
TWIST

Info

Contact BaseJumper support if you would like to see additional panels supported in the BaseJumper platform.

Pipeline Overview

flowchart LR
%% Colors %%
classDef panel fill:transparent,stroke:#323232,stroke-dasharray:8
classDef panelt fill:transparent,stroke-opacity: 0
classDef black fill:#12294C,stroke:#12294C,stroke-width:2px,color:#fff
classDef blue fill:#20A4F3,stroke:#20A4F3,stroke-width:2px,color:#fff
classDef green fill:#3BCEAC,stroke:#3BCEAC,stroke-width:2px,color:#fff
classDef yellow fill:#ffd166,stroke:#ffd166,stroke-width:2px,color:#fff
classDef pink fill:#ef476f,stroke:#ef476f,stroke-width:2px,color:#fff
classDef orange fill:#f3722c,stroke:#f3722c,stroke-width:2px,color:#fff
classDef red fill:#BB4430,stroke:#BB4430,stroke-width:2px,color:#fff
classDef ming fill:#387780,stroke:#387780,stroke-width:2px,color:#fff
    Start((Start)):::black --fastq--> Alignment[Align Reads <br/> <br/> Remove Duplicates <br/> <br/> BQSR]
    subgraph Map
        Alignment:::green
    end
    subgraph Variant[Variant Calling]
        Alignment --> Haplotyper[Haplotyper]:::yellow
        Alignment --> DNAScope[DNAScope]:::yellow
    end
    subgraph Evaluate
        Alignment --> M_Sentieon[Alignment Metrics <br/><br/> GC Metrics <br/><br/> Insert Size Metrics <br/><br/> Coverage Metrics <br/><br/> Picard CollectHsMetrics]:::pink
        DNAScope --> vcfeval[Variant Evaluation]:::pink
        DNAScope --> Annotation[Variant Annotation]:::pink
    end
    subgraph Report
        vcfeval --> Mqc[MultiQC Report]:::orange
        Annotation --> Mqc
        M_Sentieon --> Mqc
    end
    Mqc --> End((End)):::black
    Map:::panel
    Variant:::panel
    Evaluate:::panel
    Report:::panel

The pipeline performs the following steps with the listed tools:

Map reads to the reference genome using SENTIEON BWA MEM
Remove duplicate reads using SENTIEON DRIVER LOCUSCOLLECTOR and SENTIEON DRIVER DEDUP
Perform base quality score recalibration (BQSR) using SENTIEON DRIVER BQSR
Perform variant calling with the HAPLOTYPER caller
Perform variant calling with the DNAScope caller
Annotate variants with SNPEFF and with the COSMIC, ClinVar, and dbSNP databases
Evaluate metrics using SENTIEON DRIVER METRICS, which includes Alignment, GC Bias, Insert Size, Coverage metrics, and Picard CollectHsMetrics
Evaluate variants with VCFEval to assess analytical performance (supported only for HG001 samples)
Aggregate the metrics across biosamples and tools using MULTIQC to create an overall pipeline statistics summary
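
The dependencies between these steps can be sketched as a small directed graph. This is a minimal illustration in Python, not the pipeline's actual implementation: the stage names are hypothetical labels taken from the flowchart, and `graphlib.TopologicalSorter` from the standard library is used only to show one valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical stage labels mirroring the flowchart. Each key maps to the
# set of stages that must finish before that stage can start.
STAGE_DEPENDENCIES = {
    "alignment": set(),                  # align reads, remove duplicates, BQSR
    "haplotyper": {"alignment"},
    "dnascope": {"alignment"},
    "metrics": {"alignment"},
    "variant_evaluation": {"dnascope"},  # HG001 samples only
    "variant_annotation": {"dnascope"},
    "multiqc": {"metrics", "variant_evaluation", "variant_annotation"},
}


def execution_order(dependencies):
    """Return one valid order in which the stages could run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(STAGE_DEPENDENCIES))
```

Stages that share the same prerequisites, such as the two variant callers and the metrics step, have no fixed order relative to each other and could run in parallel.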

Info

HG001 is a Genome in a Bottle (GIAB) reference sample. More information is available from the Genome in a Bottle project.

Pipeline Parameters

Parameter Name	Options	Description
Exome/Targeted Panel	`xGen Exome Hyb Panel v2` (default), `TruSight One`, `TWIST`	Exome or targeted panel used for the sequencing library
Genome	`GRCh38` (default)	Reference genome to use for alignment
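
The options in this table can be expressed as a small validation sketch. The `PipelineParams` class below is hypothetical and is not part of BaseJumper; it only illustrates the allowed values and defaults listed above.

```python
from dataclasses import dataclass

# Option values copied from the Pipeline Parameters table.
SUPPORTED_PANELS = ("xGen Exome Hyb Panel v2", "TruSight One", "TWIST")
SUPPORTED_GENOMES = ("GRCh38",)


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical container for the two pipeline-level parameters."""

    panel: str = SUPPORTED_PANELS[0]    # default panel per the table
    genome: str = SUPPORTED_GENOMES[0]  # default genome per the table

    def __post_init__(self):
        # Reject any value that is not listed in the table.
        if self.panel not in SUPPORTED_PANELS:
            raise ValueError(f"Unsupported panel: {self.panel!r}")
        if self.genome not in SUPPORTED_GENOMES:
            raise ValueError(f"Unsupported genome: {self.genome!r}")


if __name__ == "__main__":
    print(PipelineParams())               # uses both defaults
    print(PipelineParams(panel="TWIST"))  # overrides the panel only
```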

Module Parameters

Module	Parameter Name	Options	Description
Variant Annotation		(default)	Annotate genic variants with the dbSNP, ClinVar, and COSMIC databases.
Evaluate Variant Calling		(default)	Benchmark variant calls against ground-truth variants. Supported only for HG001/NA12878 samples.
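
Benchmarking against ground-truth variants is commonly summarized as precision, recall, and F1, computed from true-positive, false-positive, and false-negative counts. The sketch below shows that arithmetic only. The function name and the example counts are illustrative assumptions, not output from this pipeline or the VCFEval file format.

```python
def benchmark_summary(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from variant comparison counts."""
    called = true_pos + false_pos   # variants reported by the caller
    truth = true_pos + false_neg    # variants present in the truth set
    precision = true_pos / called if called else 0.0
    recall = true_pos / truth if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(benchmark_summary(true_pos=90, false_pos=10, false_neg=30))
```

For example, 90 true positives, 10 false positives, and 30 false negatives give a precision of 0.90 and a recall of 0.75.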

Output files

Output Directory/File	Notes
`multiqc`/	Contains the metrics output files from the various tools that are used to create the MultiQC report.
`secondary_analyses`/ `alignment`/ `metrics`/ `variant_calls*`/	`alignment/`: Biosample-level output containing the aligned reads and index file. `metrics/`: Metrics output from secondary analyses, covering alignment, GC bias, insert size, coverage, and library complexity metrics. The `-pipeline_all_metrics_mqc.txt` file contains metrics from the `All Metrics` section of the MultiQC report found in BaseJumper. `variant_calls/`: Biosample-level output containing the variant calls in VCF format and the index file for the Haplotyper and DNAScope variant callers.
`tertiary_analyses`/ `variant_annotation`/	Contains the annotated VCFs for individual biosamples and a multisample variants table in `txt` and `hdf5` file formats.
`execution_info`/	Contains execution information about the pipeline run.
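
A quick way to confirm that a downloaded results folder matches this layout is sketched below. It assumes the subdirectories are nested as the table suggests (for example `secondary_analyses/alignment`), and `missing_output_dirs` is a hypothetical helper, not a BaseJumper utility.

```python
import tempfile
from pathlib import Path

# Directory patterns taken from the Output files table. The nesting is an
# assumption; the trailing "*" is kept as written in the table.
EXPECTED_OUTPUT_DIRS = (
    "multiqc",
    "secondary_analyses/alignment",
    "secondary_analyses/metrics",
    "secondary_analyses/variant_calls*",
    "tertiary_analyses/variant_annotation",
    "execution_info",
)


def missing_output_dirs(results_root):
    """Return the expected patterns that match no directory under the root."""
    root = Path(results_root)
    return [
        pattern
        for pattern in EXPECTED_OUTPUT_DIRS
        if not any(path.is_dir() for path in root.glob(pattern))
    ]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that contains only `multiqc/`.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "multiqc").mkdir()
        print(missing_output_dirs(tmp))
```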