There’s plenty of useful information on using the cluster or SLURM here, but we try to add some tidbits below that might be pertinent. Take all with a grain of salt.
There are two ways of approaching all the scripts and programs necessary. Keep them in your main directory in a /scripts folder, or make copies of them in the project/directory you are working in. The latter allows for project-specific changes without changing the original script.
With the large number of files that exist within these pipelines, it’s easy to get lazy, and then get confused about where/what files are. Following the same folder structure across sequencing runs and projects will be really helpful, as will consistent naming conventions. Either way, pick something that works for you and stick with it. One possibility is as follows, but it’s by no means “the way.”
Make a Project Folder
Create a project folder for your organism or project. When naming files/folders, don’t use spaces; separate words using _ or CamelCase. Inside your project folder, you’ll want another folder named for the sequencing run (e.g., SOMMXXX). Then create these subfolders inside:
/raw (where raw sequencing files are linked/copied)
/split (where the unzipped and then split by barcode fastq live)
/split_out (split by plate barcode)
/fastq (where the processed fastq/bams and analyses live)
/align (the aligned bams)
/slurm_outs (for all slurms)
i.e., mine looks something like this: /home/projects/rapture/SOMM163/raw
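The folder layout above can be set up in one go. This is only a sketch: PROJECT and RUN are placeholder values (borrowed from the example path), and the folders are created relative to whatever directory you run it from.

```shell
#!/usr/bin/env bash
# Placeholder names: change PROJECT and RUN to match your own work.
PROJECT="rapture"
RUN="SOMM163"

# Create every subfolder in one pass; -p also creates the parent folders
# and does nothing if a folder already exists.
for sub in raw split split_out fastq align slurm_outs; do
    mkdir -p "$PROJECT/$RUN/$sub"
done

# List the result to confirm the layout.
ls "$PROJECT/$RUN"
```

Because mkdir -p is safe to repeat, the same lines can be rerun for each new sequencing run by changing RUN.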
Logging in to the server requires your username and access.
ssh -p 2022 USERNAME@agri.cse.ucdavis.edu
You can add this command as a shortcut to your .bash_profile by doing something like the following (with your own username, and the host and port you normally log in with), and then you only need to type “farm” to log in.
alias farm='ssh rapeek@farm.cse.ucdavis.edu'
To run a program or task on the server (never on the head node!), you’ll use sbatch ____ or srun. However, there are a few things you’ll need to add:
-t for the time needed to run the program, e.g., for 24 hours:
sbatch -t 24:00:00 ___
-p for the partition, which sets the job’s importance (low, med, high):
sbatch -t time -p high ___
--mem for the memory needed, e.g., for 16 GB:
sbatch --mem=16G -t time -p high ___
Although not always shown below, in our cluster, all sbatch commands must have a time (-t) flag or they will fail to run.
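Instead of typing the flags every time, they can live at the top of the job script as #SBATCH lines. The sketch below writes such a script and submits it only if sbatch is available. The file name example_job.sh and the echo payload are made up for illustration, and sending output to slurm_outs is just one option that matches the folder layout suggested earlier.

```shell
#!/usr/bin/env bash
# Write a minimal batch script. The quoted 'EOF' keeps $(hostname) from
# being expanded now; it runs later, on the compute node.
cat > example_job.sh <<'EOF'
#!/bin/bash
# Time limit (required on our cluster)
#SBATCH -t 24:00:00
# Partition: low, med, or high
#SBATCH -p high
# Memory request
#SBATCH --mem=16G
# Output file; %j is replaced by the job number
#SBATCH -o slurm_outs/slurm-%j.out

echo "Running on $(hostname)"
EOF

# The output folder must exist before the job starts.
mkdir -p slurm_outs

# Check the script for syntax errors without running it.
bash -n example_job.sh && echo "example_job.sh parses OK"

# Submit only where SLURM is installed (i.e., on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
else
    echo "sbatch not found; run this on the cluster"
fi
```

Flags given on the command line still work with a script like this, so sbatch -t 1:00:00 example_job.sh would shorten the time limit for a quick test.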
Cancelling tasks on the cluster can be done with:
scancel [JOBNUMBER]
To check whether your sbatch job is running, type smap -c | grep USERNAME, or to see everything that is running right now on the node you are logged in to, type top.
The output of nearly all server tasks is a “slurm” file. It has helpful information on the job, status, etc. The first place to check when troubleshooting is usually a slurm-____ file. They are numbered by job number, so the most recent file/job will have the highest number.
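Since the files are numbered by job, the newest one can be found by sorting on the number. A small sketch, assuming the default slurm-JOBNUMBER.out file names and a sort that supports -V (GNU sort does); newest_slurm is a made-up helper name.

```shell
#!/usr/bin/env bash
# newest_slurm DIR: print the slurm output file with the highest job number
# in DIR (defaults to the current directory). Prints nothing if none exist.
newest_slurm() {
    local dir="${1:-.}" f
    for f in "$dir"/slurm-*.out; do
        # Skip the unexpanded pattern when no files match.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | sort -V | tail -n 1
}

# Show the end of the most recent slurm file, if there is one.
latest=$(newest_slurm .)
if [ -n "$latest" ]; then
    tail -n 20 "$latest"
else
    echo "no slurm files here"
fi
```

The -V (version) sort matters because a plain alphabetical sort would put slurm-10.out before slurm-9.out.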
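A few useful one-liners:
wc -l (count the lines in a file, e.g., the fastq files)
ls -l *.hash | wc -l (count files in a dir)
du -hs * | sort -hs (size of each file/folder, sorted from smallest to largest)
wc -l rabo_all*mafs | sort -h | awk '{print $1}' (line counts for the matching files, sorted, printing only the counts)
ls -lt *FILE* (list the matching files, newest first)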
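Line counts on fastq files are mostly useful as read counts. A minimal sketch, assuming standard uncompressed fastq with four lines per read; count_reads and the tiny example.fastq are made up for the demonstration.

```shell
#!/usr/bin/env bash
# count_reads FILE: number of reads in an uncompressed fastq file,
# assuming the usual four lines per read.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Made-up two-read fastq, just to show the function working.
printf '%s\n' \
    '@read1' 'ACGTTGCAGG' '+' 'IIIIIIIIII' \
    '@read2' 'TTGCATGCAG' '+' 'IIIIIIIIII' > example.fastq

count_reads example.fastq
```

For a real run, replace example.fastq with your own file name.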
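screen is useful for running srun or viewing other tasks while working in a different screen.
screen (start a new screen)
ctrl + a, then d (detach from the current screen and leave it running)
screen -r (reattach to a detached screen)
screen -d (detach a screen that is still attached somewhere else)
exit (close a screen; type it in the current screen)
screen -ls (list your screens)
screen -r [number] (reattach to a specific screen)
ctrl + a, then ? (help, or key bindings)
While there are many text editors available, we tend to prefer VIM. There’s a whole bunch of resources online (see this webpage). A few tips regarding VIM: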
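:q! (quit without saving)
:wq (save and quit)
:%s: (find and replace across the whole file), e.g., :%s:AMER:../AMER: replaces the first AMER on each line with ../AMER
:7,12s: (the same, but only for lines 7 through 12)
To use a sed-style command within vim to find and replace, type shift + :, then 1,$s/TGCAGG$/TGCAGG.sort.flt.bam/g (search for lines ending with TGCAGG)
For column-wise editing:
Ctrl+v, then mark across the column you want to edit.
Shift+i to insert text at the beginning of the column,
Shift+a to append text,
r to replace highlighted text,
d to delete,
c to change… etc.
After Shift+i or Shift+a, press Esc to apply the text to every marked line.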
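The same find and replace also works outside vim with sed, which is handy when the list is built inside a script. A sketch with made-up sample names; samples.txt and bamlist.txt are hypothetical file names.

```shell
#!/usr/bin/env bash
# Made-up sample names, each ending with the TGCAGG suffix.
printf '%s\n' SOMM163_A01_TGCAGG SOMM163_B01_TGCAGG > samples.txt

# Append .sort.flt.bam to every line that ends with TGCAGG.
# The $ anchors the match to the end of the line, as in the vim command.
sed 's/TGCAGG$/TGCAGG.sort.flt.bam/' samples.txt > bamlist.txt

cat bamlist.txt
```

Writing to a new file leaves the original list untouched, so a wrong pattern is easy to undo.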