If you have multiple sequencing runs, you’ll want to merge your reads into one single set of data for analysis.
We need an alignment contig, the fastq files, and a modified version of the run_align.sh script. - cp final_contigs_300.fasta ~/projects/rana_rapture/fastq/ - Get ~/scripts/01a_run_align_flt.sh or ~/scripts/01b_run_sortbam_flt.sh
bam lists from Newest Sequence DataCreate a bamlist of the A and B .fastq files.
ls *RA* | sed "s/\.fastq//g" > bamlistAls *RB* | sed "s/\.fastq//g" > bamlistBpaste bamlist? > bamlistCreate the index for bwa to work from alignment (if not already completed) - bwa index -a is final_contigs_300.fasta
Time: (~3-4 hours)
Because we are merging files, we need to do the filtering and merge BEFORE we remove dups… so need to modify the run_align.sh script. The 01a version essentially just stops before running the remove duplicates step.
Run the script:
sh 01a_run_align_flt.sh bamlist final_contigs_300.fasta
#!/bin/bash -l
list=$1
ref=$2
wc=$(wc -l ${list} | awk '{print $1}')
x=1
while [ $x -le $wc ]
do
string="sed -n ${x}p ${list}"
str=$($string)
var=$(echo $str | awk -F"\t" '{print $1, $2}')
set -- $var
c1=$1
c2=$2
echo "#!/bin/bash -l
#SBATCH -o slurm_outs/01a_align_flt-%j.out
#SBATCH -J alignflt
bwa mem $ref ${c1}.fastq ${c2}.fastq | samtools view -Sb - | samtools sort - ${c1}.sort
samtools view -f 0x2 -b ${c1}.sort.bam > ${c1}.sort.flt1.bam" > ${c1}.sh
# | samtools rmdup - ${c1}.sort.flt.bam" > ${c1}.sh
sbatch -t 24:00:00 -p med --mem=4G ${c1}.sh
sleep 2
x=$(( $x + 1 ))
done
NOTE: (If already aligned)
If you’ve already aligned previously, you can use the 01b script to pick up the sorted bamfiles and just filter, then use that output for the merge. (use script: 01b_run_sortbam_flt.sh). Notice you don’t need the contig alignment here.
sh 01b_run_sortbam_flt.sh bamlist.flt
#!/bin/bash -l
list=$1
#ref=$2
wc=$(wc -l ${list} | awk '{print $1}')
x=1
while [ $x -le $wc ]
do
string="sed -n ${x}p ${list}"
str=$($string)
var=$(echo $str | awk -F"\t" '{print $1}')
set -- $var
c1=$1
#c2=$2
echo "#!/bin/bash -l
#SBATCH -o slurm_outs/01b_sortbamflt-%j.out
#SBATCH -J sortbamflt
samtools view -f 0x2 -b ${c1} > ${c1}.sort.flt1.bam" > ${c1}.sh
# | samtools rmdup - ${c1}.sort.flt.bam" > ${c1}.sh
sbatch -t 24:00:00 -p med --mem=2G ${c1}.sh
sleep 5
x=$(( $x + 1 ))
doneIf there are node failures, you’ll need to use du -hs to verify.
du -hs *flt2.bam | sort -hs | head -n50