4.3 downsample
Downsampling a BAM file by removing reads
The downsample task creates downsampled BAM files that contain a specified proportion of the original number of reads. More than one proportion (probability) can be specified, which allows several downsampled BAM files to be created in a single run. In this task, all reads are considered, even those that do not pass the usual SAM flag filters.
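To make the idea of keeping each read with a given probability concrete, here is a minimal shell sketch. It is not how ATLAS implements the task: the function name keep_with_prob is hypothetical, and it works on SAM text from standard input, always keeping header lines and retaining each alignment line with probability p.

```shell
#!/bin/bash
# Illustrative sketch only (not the ATLAS implementation).
# keep_with_prob P SEED: reads SAM text on stdin, always keeps header
# lines (starting with "@"), and keeps every other line with probability P.
keep_with_prob() {
    awk -v p="$1" -v seed="$2" '
        BEGIN { srand(seed) }   # a fixed seed makes the draw reproducible
        /^@/  { print; next }   # SAM header lines are never dropped
        rand() < p              # rand() lies in [0,1): keep with probability p
    '
}

# Hypothetical usage, assuming samtools is installed:
#   samtools view -h simulate.bam | keep_with_prob 0.5 67 | samtools view -b -o half.bam -
```

Because every read is drawn independently, the number of reads kept only approximates the requested proportion; it is not an exact count.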
4.3.1 Input
Required inputs :
--bam Input_bam_file.bam | Input BAM file |
Optional inputs :
None
Specific parameters :
--prob numeric_value | One value, or a comma-separated list of values, giving the proportion of reads to keep in each downsampled BAM file. Each numeric_value must be between 0 and 1 (inclusive). Replicates can be obtained by adding the desired number of replicates in curly brackets {} after the proportion concerned. |
--separateReads or --writeN | --separateReads downsamples by removing reads; --writeN downsamples by setting bases to N. Default = --writeN |
Optional parameters :
--outQual integer_1,integer_2 | Constrains the quality scores to the indicated range (inclusive) when writing alignments. Default = the full range of quality scores is used. |
--writeBinnedQualities | Writes Illumina-binned quality scores. Default = raw quality scores are written. |
Engine parameters that are common to all tasks can be found here.
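A --prob list can be sanity-checked in the shell before ATLAS is called. The sketch below is a hypothetical helper, not part of ATLAS: the function name check_prob_list is made up for illustration, and the replicate notation 0.5{3} (a value followed by the replicate count in curly brackets) is inferred from the description of --prob and should be checked against your ATLAS version.

```shell
#!/bin/bash
# check_prob_list LIST: hypothetical helper (not part of ATLAS).
# Validates a comma-separated list such as "0.5{3},0.3,0.1": each entry
# must be a number in [0,1], optionally followed by {n} with n >= 1
# (assumed replicate notation). Returns non-zero on the first bad entry.
check_prob_list() {
    [ -n "$1" ] || return 1
    printf '%s\n' "$1" | awk -F',' '
        {
            for (i = 1; i <= NF; i++) {
                entry = $i
                # Strip an optional trailing {n} replicate suffix (n >= 1).
                sub(/[{][1-9][0-9]*[}]$/, "", entry)
                # What remains must be a plain decimal number in [0,1].
                if (entry !~ /^[0-9]*\.?[0-9]+$/ || entry + 0 > 1) {
                    print "invalid --prob entry: " $i > "/dev/stderr"
                    exit 1
                }
            }
        }
    '
}

# Hypothetical usage; quote the list so the curly brackets reach ATLAS untouched:
#   prob="0.5{3},0.3,0.1"
#   check_prob_list "$prob" && $atlas --task downsample --bam simulate.bam --prob "$prob"
```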
4.3.2 Output
*_separated_*Prob*.bam or *_downsampled_*Prob*.bam | Downsampled BAM files (downsampled by removing reads) or downsampled BAM files (downsampled by setting bases to N). |
*_separated_*Prob*.bam.bai or *_downsampled_*Prob*.bam.bai | Index files for the downsampled BAM files (downsampled by removing reads) or index files for the downsampled BAM files (downsampled by setting bases to N). |
*_filterSummary.txt | Text file with general filter counts per read group and for all read groups combined. |
4.3.3 Usage Example
#! /bin/bash
. $(dirname $0)/find_atlas
. $(dirname $0)/simulate --fixedSeed 66
out="downsample_reads"
$atlas --task downsample --bam simulate.bam --prob 0.5,0.3,0.1 \
--fixedSeed 67 --out $out --logFile $out.out 2> $out.eout
out="downsample_separate"
$atlas --task downsample --separateReads \
--bam simulate.bam --prob 0.5,0.3,0.1 \
--fixedSeed 68 --out $out --logFile $out.out 2> $out.eout
out="downsample_bases"
$atlas --task downsample --downsampleBases \
--bam simulate.bam --prob 0.5,0.3,0.1 \
--fixedSeed 69 --out $out --logFile $out.out 2> $out.eout
Each call above creates three downsampled BAM files, one for each value given to --prob, downsampled to 50%, 30% and 10% of the original BAM file respectively.
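One way to check the result of a run that removes reads is to compare read counts before and after. The following is a sketch under stated assumptions: the helper kept_fraction is hypothetical, samtools is assumed to be installed, and the glob in the usage comment assumes the output names follow the *Prob*.bam pattern listed in the Output section, with the --out value as prefix.

```shell
#!/bin/bash
# kept_fraction KEPT TOTAL: hypothetical helper that prints KEPT/TOTAL
# with three decimals, and fails if TOTAL is not positive.
kept_fraction() {
    awk -v k="$1" -v t="$2" 'BEGIN {
        if (t <= 0) exit 1
        printf "%.3f\n", k / t
    }'
}

# Hypothetical check (file names are assumptions, see above):
#   total=$(samtools view -c simulate.bam)
#   for f in downsample_separate*Prob*.bam; do
#       echo "$f: $(kept_fraction "$(samtools view -c "$f")" "$total")"
#   done
```

The printed fractions should be close to the requested --prob values, but not necessarily identical to them. The check is only meaningful when reads are removed, not when bases are set to N.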