Reference Assembly

Align Individuals to Existing Loci (reference)

If you aready of a reference genome or assembly, you can skip the de Novo steps and do as follows:

Build index of contigs: bwa index final_contigs_300.fasta OR A_model_genome.fasta
- Files will now end in .amb, .ann, etc.
Using all fastq build a list of the fastq RA and RB files into different columns:
- ls *RA* | sed "s/\.fastq//g" > bamlistA
- ls *RB* | sed "s/\.fastq//g" > bamlistB
- paste bamlist? > bamlist
Obtain the run_align.sh script For example copy cp ~millermr/common/run_align.sh ./
- sh run_align.sh bamlist final_contigs_300.fasta
- It’s taking all fastq files (RA and RB) and aligning to the _300.fasta file (or model genome file)
This should create your final bam files

Node Failures and Missing Bams

Sometimes the nodes fail, and you need to find/re-run scripts for certain files. Best option is to find/sort the files that did work, and then grep against the files that didn’t to make a new list, then re-run run_align.sh script with new list.

ls *.flt.bam | sed "s/\.sort\.flt\.bam//g" | grep -f - -v bamlist > bamlist_rerun