4.2 BAMDiagnostics

Estimating approximate depth, read length frequencies and mapping quality frequencies

BAMDiagnostics provides a set of read statistics for the input BAM file while taking into account all standard input filters. The output is written to .txt files that summarize the following information:

  • Total number of reads
  • Number of reads that passed filters
  • Number of duplicate reads
  • Average read length
  • Maximum read length
  • Number of proper pairs
  • Average fragment length (only known for paired-end data)
  • Total number of soft-clipped positions
  • Average soft-clipped length
  • Average aligned length
  • Mean sequencing depth across the whole genome
  • Average mapping quality

It also provides histograms which display the distributions of fragment lengths, mapping qualities, read lengths, soft-clipped lengths and aligned lengths. All of this data is written for all read groups combined, as well as for each read group separately.

4.2.1 Input

Required inputs :

--bam Input_bam_file.bam Input BAM file.

Optional inputs :

  • none

Specific Parameters :

--diagnosticsPerChromosome To additionally output per-chromosome summary statistics to the *_diagnostics.txt file. Default = Only per-read group summary statistics are provided (no per-chromosome summary statistics).
--splitMergeInput To create input file for splitMerge. Default = Will not create input file for splitMerge.
--printReferenceLength To print reference lengths of chromosomes to file. Default = Will not print reference lengths of chromosomes to file.
  • See Filter parameters to apply specific filters for bases and reads, and for parsing window settings.

Engine parameters that are common to all tasks can be found here.

4.2.2 Output

*_filterSummary.txt Filter summary for all read groups combined and individual read groups.
*_fragmentLengthHistogram.txt Fragment length counts for all read groups combined and individual read groups.
*_mappingQualityHistogram.txt Mapping quality counts for all read groups combined and individual read groups.
*_readLengthHistogram.txt Read length counts for all read groups combined and individual read groups.
*_softClippedLengthHistogram.txt Soft-clipped length counts for all read groups combined and individual read groups.
*_alignedLengthHistogram.txt Aligned length counts for all read groups combined and individual read groups.
*_diagnostics.txt File containing per-read group summary statistics. It also contains per-chromosome summary statistics when the --diagnosticsPerChromosome parameter is used. This file can be used as the input file for the splitMerge task.
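
After a run, it can be useful to confirm that all of the files listed above were written. The following is a minimal sketch, not part of ATLAS: it assumes that the * in the file names stands for a common output prefix, and the function name missing_outputs and its prefix argument are hypothetical.

```shell
# Hypothetical helper (not part of ATLAS): print every expected
# BAMDiagnostics output file that does not exist for a given prefix.
# Assumption: the "*" in the documented file names is a shared prefix.
missing_outputs() {
    mo_prefix=$1
    for mo_suffix in filterSummary fragmentLengthHistogram \
                     mappingQualityHistogram readLengthHistogram \
                     softClippedLengthHistogram alignedLengthHistogram \
                     diagnostics; do
        # Report the file name only if the file is absent
        [ -e "${mo_prefix}_${mo_suffix}.txt" ] || echo "${mo_prefix}_${mo_suffix}.txt"
    done
}
```

Calling missing_outputs with the prefix used by the run (for example ATLAS_simulations, if the prefix follows the BAM file name, which is an assumption here) prints nothing when all seven files are present.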

4.2.3 Usage Example

#! /bin/bash

# Set atlas path
atlas=$(dirname "$0")/../build/atlas

# Simulate a BAM File
$atlas simulate --logFile simulate.out

# Create read statistics
$atlas BAMDiagnostics --bam ATLAS_simulations.bam --logFile BAMDiagnostics.out
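
The specific parameters from Section 4.2.1 can be combined in the same call. The following is a sketch only: it assumes the switches take no value, as the parameter list suggests, and wraps the call in a hypothetical function run_diagnostics so that the ATLAS path can be overridden through the atlas variable.

```shell
# Path to the ATLAS binary; falls back to "atlas" on the PATH (assumption)
atlas=${atlas:-atlas}

# Hypothetical wrapper: run BAMDiagnostics on the BAM file given as the
# first argument, with per-chromosome statistics and reference lengths
# enabled.
run_diagnostics() {
    "$atlas" BAMDiagnostics --bam "$1" \
        --diagnosticsPerChromosome \
        --printReferenceLength \
        --logFile BAMDiagnostics.out
}
```

With the simulated data from the example above, the call would be run_diagnostics ATLAS_simulations.bam.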