4.2 BAMDiagnostics
Estimating approximate depth, read length frequencies and mapping quality frequencies
BAMDiagnostics provides a set of read statistics for the input BAM file while taking into account all standard input filters. The output is written to .txt files that summarize the following information:
- Total number of reads
- Number of reads that passed filters
- Number of duplicate reads
- Average read length
- Maximum read length
- Number of proper pairs
- Average fragment length (only known for paired-end data)
- Total number of soft-clipped positions
- Average soft-clipped length
- Average aligned length
- Mean sequencing depth across the whole genome
- Average mapping quality
It also provides histograms which display the distributions of fragment lengths, mapping qualities, read lengths, soft-clipped lengths and aligned lengths. All of this data is written for all read groups combined, as well as for each read group separately.
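The averages in the list above are related to the histograms in a simple way: each is a count-weighted mean of the corresponding distribution. The sketch below illustrates this for a histogram stored as text. It assumes a two-column layout (value, then count) with an optional header line. That layout and the helper name `hist_mean` are assumptions made for illustration, not the documented ATLAS file format, so inspect the actual histogram files before adapting it.

```shell
# Hypothetical helper: count-weighted mean of a (value, count) histogram.
# ASSUMPTION: column 1 holds the value (e.g. read length) and column 2 the
# count; lines whose first two fields are not integers (headers) are skipped.
hist_mean() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { sum += $1 * $2; n += $2 }
         END { if (n > 0) printf "%.2f\n", sum / n; else print "NA" }' "$1"
}
```

Calling `hist_mean some_histogram.txt` prints the mean with two decimals, or `NA` if the file holds no counts.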
4.2.1 Input
Required inputs:
--bam Input_bam_file.bam | Input BAM file. |
Optional inputs:
none
Specific parameters:
--diagnosticsPerChromosome | Write per-chromosome data to the *_diagnostics.txt file. Default = only per-read-group summary statistics are provided (no per-chromosome summary statistics). |
--splitMergeInput | Create an input file for splitMerge. Default = no input file for splitMerge is created. |
--printReferenceLength | Print the reference lengths of the chromosomes to file. Default = reference lengths are not printed to file. |
See Filter parameters to apply specific filters for bases, reads and parsing window settings.
Engine parameters that are common to all tasks are documented separately.
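As a minimal sketch of how the task-specific switches combine with the required input, the wrapper below assembles a BAMDiagnostics call. The function name `run_bamdiag`, the `ATLAS` and `DRY_RUN` variables, and the assumption that the executable is called `atlas` are illustrative choices and not part of ATLAS. Only `--task`, `--bam`, `--out`, `--diagnosticsPerChromosome` and `--printReferenceLength` are taken from this section.

```shell
# Hypothetical wrapper around a BAMDiagnostics call.
# $1 = input BAM file, $2 = output prefix passed to --out.
# ASSUMPTIONS: the executable is "atlas" unless $ATLAS is set; when
# $DRY_RUN is non-empty the command is printed instead of executed.
run_bamdiag() {
    ${DRY_RUN:+echo} "${ATLAS:-atlas}" --task BAMDiagnostics \
        --bam "$1" --out "$2" \
        --diagnosticsPerChromosome --printReferenceLength
}
```

Running `DRY_RUN=1 run_bamdiag Input_bam_file.bam sample1` prints the command line for inspection without executing anything.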
4.2.2 Output
*_filterSummary.txt | Filter summary for all read groups combined and individual read groups. |
*_fragmentLengthHistogram.txt | Fragment length counts for all read groups combined and individual read groups. |
*_mappingQualityHistogram.txt | Mapping quality counts for all read groups combined and individual read groups. |
*_readLengthHistogram.txt | Read length counts for all read groups combined and individual read groups. |
*_softClippedLengthHistogram.txt | Length of soft-clipped bases as counts for all read groups combined and individual read groups. |
*_alignedLengthHistogram.txt | Aligned length counts for all read groups combined and individual read groups. |
*_diagnostics.txt | File containing per-read-group summary statistics. Also contains per-chromosome summary statistics when the --diagnosticsPerChromosome parameter is used. This file can be used as the input file for the splitMerge task. |
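After a run, it can be useful to confirm that the files listed above were written. The following sketch prints every expected file that is absent. It assumes that the `*` in each file name stands for the prefix given to `--out`. The helper name `missing_outputs` is hypothetical, and whether every file is produced for every kind of input is not stated here, so treat a reported file as a prompt to check the log rather than as an error.

```shell
# Hypothetical helper: list documented BAMDiagnostics output files that are
# absent for a given output prefix.
# ASSUMPTION: file names are "<prefix>_<suffix>.txt", where <prefix> is the
# value passed to --out.
missing_outputs() {
    local prefix="$1" suffix
    for suffix in filterSummary fragmentLengthHistogram \
                  mappingQualityHistogram readLengthHistogram \
                  softClippedLengthHistogram alignedLengthHistogram \
                  diagnostics; do
        [ -e "${prefix}_${suffix}.txt" ] || echo "${prefix}_${suffix}.txt"
    done
}
```

For example, `missing_outputs simple1` prints nothing when all seven files with the prefix `simple1` exist.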
4.2.3 Usage Example
#! /bin/bash
# `--fixedSeed = N` is needed to have reproducible results in the regression test
. $(dirname $0)/find_atlas
# Simulate data for 17 individuals (single-end reads)
. $(dirname $0)/simulate --type HW --F 0.1 --fixedSeed 0 \
    --sampleSize 17 --chrLength 11111 --fracPoly 1.0 \
    --alpha 2.0 --beta 2.0 --seqType single --seqCycles 101
for i in {1..17}; do
    # Collect the read names of alignment records 241-250 as a blacklist
    samtools view simulate_ind"$i".bam | head -250 | tail -10 | cut -f1 \
        > blacklist_"$i".txt
    # Upper filter bound that grows with the individual's index
    u=$(echo "$i*5" | bc)
    # Run 1: diagnostics with per-chromosome output
    out="simple$i"
    $atlas --task BAMDiagnostics --perChromosome --bam simulate_ind$i.bam \
        --fixedSeed $i --out $out --logFile $out.out 2> $out.eout
    # Run 2: diagnostics with duplicate identification and additional filters
    out="complex$i"
    $atlas --task BAMDiagnostics --identifyDuplicates --bam simulate_ind$i.bam \
        --filterSoftClips --filterMQ 0,$u --blacklist blacklist_$i.txt \
        --filterReadLength 0,$u --filterFragmentLength 0,$u \
        --fixedSeed 1$i --out $out --logFile $out.out 2> $out.eout
done