4.3 downsample

Downsampling a BAM file by removing reads

downsample creates downsampled BAM files that contain a specified fraction of the original number of reads. Several fractions (probabilities) can be specified at once, which creates several downsampled BAM files in a single run. This task considers all reads, including those that do not pass the usual SAM flag filters.
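
As a rough illustration of what keeping reads with a given probability means, the sketch below applies the idea to a plain text stream with one record per line. It is not how ATLAS is implemented: the helper keep_with_prob and the use of awk are assumptions made for illustration, as is the premise that each record is kept independently. Under that premise the number of kept records is only approximately p times the input.

```shell
#!/bin/bash
# Hypothetical helper, not part of ATLAS: keep each input line with probability p.
# usage: keep_with_prob <p> <seed> < input > output
keep_with_prob() {
    local p="$1" seed="$2"
    # srand(seed) makes the draw reproducible; rand() returns a value in [0, 1)
    awk -v p="$p" -v seed="$seed" 'BEGIN { srand(seed) } rand() < p'
}

# Example: keep roughly half of 1000 toy records and count what is left
seq 1 1000 | keep_with_prob 0.5 67 | wc -l
```

A fixed seed plays the same role here as --fixedSeed in the usage example of this section: repeating the call reproduces the same selection.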

4.3.1 Input

Required inputs :

--bam Input_bam_file.bam Input BAM file

Optional inputs :

  • None

Specific parameters :

--prob numeric_value One value or a comma-separated list of fractions of reads to be kept in the downsampled BAM files. numeric_value must be between 0 and 1 (inclusive). Replicates can be obtained by adding the desired number of replicates in curly brackets {} after the value concerned.
--separateReads or --writeN --separateReads downsamples by removing reads; --writeN downsamples by setting bases to N. Default = --writeN
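
The replicate syntax can be sketched as follows, assuming the replicate count is written directly after the value, as in 0.5{3}. That exact spelling is a reading of the description above, not confirmed ATLAS syntax. The helper check_prob is hypothetical; it only checks that each value lies in [0, 1] and does not inspect the replicate count. Quoting the argument keeps the shell from interpreting the braces.

```shell
#!/bin/bash
# Hypothetical helper, not part of ATLAS: validate a --prob argument.
# Accepts comma-separated values in [0, 1], each optionally followed by {n}
# (assumed replicate syntax, e.g. 0.5{3}).
check_prob() {
    local item value
    local IFS=','
    [ -n "$1" ] || return 1
    for item in $1; do
        value="${item%%\{*}"          # strip an optional {n} suffix
        # reject empty or non-numeric values
        if ! printf '%s\n' "$value" | grep -Eq '^[0-9]*\.?[0-9]+$'; then
            return 1
        fi
        # reject values outside [0, 1]
        awk -v v="$value" 'BEGIN { exit !(v >= 0 && v <= 1) }' || return 1
    done
}

prob='0.5{3},0.1'                     # assumed: three replicates at 0.5, one file at 0.1
if check_prob "$prob"; then
    # Print the command instead of running it; simulate.bam is a placeholder input
    echo atlas --task downsample --bam simulate.bam --prob "$prob"
fi
```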

Optional parameters :

--outQual integer_1,integer_2 to constrain the quality scores to the indicated range (inclusive) when writing alignments. Default = uses the full range of quality scores when writing alignments.
--writeBinnedQualities to write Illumina-binned quality scores. Default = writes raw quality scores.

Engine parameters that are common to all tasks can be found here.

4.3.2 Output

*_separated_*Prob*.bam or *_downsampled_*Prob*.bam Downsampled BAM files (downsampled by removing reads) or downsampled BAM files (downsampled by setting bases to N).
*_separated_*Prob*.bam.bai or *_downsampled_*Prob*.bam.bai Index files for downsampled BAM files (downsampled by removing reads) or index files for downsampled BAM files (downsampled by setting bases to N).
*_filterSummary.txt Text file with general filter counts per read group and for all read groups combined.
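
Since every downsampled BAM file should come with an index, a quick check after a run can be sketched as below. The helper check_indexes is hypothetical and not part of ATLAS; it relies only on the file name patterns listed above and assumes the output prefix is the value passed to --out.

```shell
#!/bin/bash
# Hypothetical helper, not part of ATLAS: report downsampled BAM files that lack an index.
# usage: check_indexes <output_prefix>
# returns 1 if any index is missing or if no matching BAM file is found
check_indexes() {
    local prefix="$1" bam found=0 status=0
    for bam in "$prefix"_*Prob*.bam; do
        [ -e "$bam" ] || continue     # the pattern stays literal when nothing matches
        found=1
        if [ ! -e "$bam.bai" ]; then
            echo "missing index: $bam.bai" >&2
            status=1
        fi
    done
    [ "$found" -eq 1 ] || status=1
    return "$status"
}

# Example call with an assumed output prefix
check_indexes downsample_reads && echo "all downsampled BAM files are indexed"
```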

4.3.3 Usage Example

#! /bin/bash

. $(dirname $0)/find_atlas
. $(dirname $0)/simulate --fixedSeed 66

out="downsample_reads"
$atlas --task downsample --bam simulate.bam --prob 0.5,0.3,0.1 \
       --fixedSeed 67 --out $out --logFile $out.out 2> $out.eout

out="downsample_separate"
$atlas --task downsample --separateReads \
       --bam simulate.bam --prob 0.5,0.3,0.1 \
       --fixedSeed 68 --out $out --logFile $out.out 2> $out.eout

out="downsample_bases"
$atlas --task downsample --downsampleBases \
       --bam simulate.bam --prob 0.5,0.3,0.1 \
       --fixedSeed 69 --out $out --logFile $out.out 2> $out.eout

Each call above creates three downsampled BAM files, with 50%, 30% and 10% of the reads in the original BAM file respectively.