Logging in to Griz

Griz has a login node for interacting with the cluster, either by SSH or FTP:

	Login node:	login.gscc.umt.edu

This node can be used to navigate the file system and submit jobs. Do not run jobs directly on this node.

While the login node is available both on and off campus, other resources may require you to be on campus. For off-campus access, be sure to read up on UM's VPN resource.

SSH logins

To log in to Griz with a command line interface to navigate the file system and submit jobs, you will need an SSH client. On macOS, Linux, and newer versions of Windows, an SSH client comes pre-installed. Simply open your terminal and type:

ssh [NetID as username]@[server name]

Accept any host fingerprint prompts, enter your password when prompted, and you should be good to go!

Explicitly, if I wanted to log in to the login node, I would type:

ssh gt156213e@login.gscc.umt.edu
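
If you log in to Griz often, you can save some typing with an entry in your SSH config file (~/.ssh/config on macOS and Linux). A minimal sketch, using my NetID from above (the alias griz is just a name I made up):

	# ~/.ssh/config
	Host griz                          # shortcut alias; pick any name
	    HostName login.gscc.umt.edu    # the actual server
	    User gt156213e                 # your NetID

With this in place, ssh griz is equivalent to the full command above.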

Recommendation

Though newer versions of Windows 10 have an SSH client pre-installed, I still prefer the PuTTY SSH client.

FTP logins

For drag-and-drop file transfers, an FTP client may be preferred. I recommend FileZilla for macOS and Linux and WinSCP for Windows.

Usage is similar regardless of the FTP client chosen. In the GUI, find where to enter your username and the address of the server you want to log in to, then connect. Enter your password when prompted. You should then be able to drag and drop files between your local computer and the remote server.
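
If you just need to move a file or two and would rather stay on the command line, scp transfers files over the same SSH connection as your login. For example, to copy a local file (the file name here is just a placeholder) to your home directory on the login node:

scp some-file.txt gt156213e@login.gscc.umt.edu:~/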

Recommendation

For both SSH and FTP logins, consider setting up public-key authentication. This can provide enhanced security and easier access compared to logging in with a password; in-depth instructions are available online.
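
As a sketch, the basic setup from a macOS or Linux terminal usually looks something like this (accept the defaults when prompted):

	ssh-keygen -t ed25519
	# Generate a key pair in ~/.ssh/; you only need to do this once
	ssh-copy-id gt156213e@login.gscc.umt.edu
	# Append your public key to ~/.ssh/authorized_keys on the server

After this, SSH logins will use the key instead of your password, and many FTP clients can be pointed at the same key.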

Remote Desktop logins

Remote desktop logins can also be set up; please contact one of the administrators to arrange this. I recommend the remote desktop software X2Go.

Using X2Go, you can log in to one of the nodes and interact with the filesystem with your mouse and keyboard, as if you were sitting at the remote node with a monitor. You can even open a terminal within the remote desktop and interact with the SLURM job management software from there.

This remote desktop software will prove useful in many situations. However, while some of you may be more comfortable interacting with the cluster in this fashion, I highly recommend logging in through a command line interface with SSH!

Using screen to manage multiple command line interfaces

Programs like screen and tmux are bash programs that allow you to start new command line interface sessions while already logged in, essentially like opening a new tab in a browser. I have more experience with screen, so that is what I will discuss here. The advantages of this are enormous. First, you can have multiple screens, allowing you to switch between different bash sessions without logging in to the server with multiple terminals. Most importantly, jobs that you start within a screen keep running even after you have logged out! Screens are also accessible from wherever you log in. So, you could start a job in a screen session from the lab, log out, then go home, log in from there, reattach to the screen session, and see the progress of the job. screen has been one of the most helpful tools for me to manage multiple jobs from many locations.

Below I will go through some basics, but the extensive screen docs cover much more.

Starting a screen

To start a screen, simply type:

screen

You are now attached to a screen. This will clear your terminal window and open a new session within the same window you are using. Don't worry, your original login session is still there, just "minimized," for lack of a better term.

Note that some screen instances may bring up a welcome message rather than taking you directly to a blank command prompt. Simply hit enter to dismiss this message.

Detach from a screen

To detach from (or minimize) the current screen session and return to your original login session, use the following key sequence:

ctrl+a, then d

Your screen and any jobs running in it are still there! They have just been detached.

Many screen commands start with the keystroke ctrl+a.

Attaching to a screen session

To attach to or resume a screen session from which you have previously detached, type:

screen -r

Managing multiple screen sessions

There may be cases where you want to run multiple commands in the background on different screens. In this case, you can have multiple screen sessions open. While in your login session (i.e. while not attached to any screens), you can run the screen command as many times as you want to start separate screen sessions. This is like opening many tabs in your browser.

Important!

Try not to start a new screen session while attached to a previous screen session! This can get confusing very quickly.

To view the status of your current screen sessions, type:

screen -ls

This should produce something like the following:

	gregg_thomas@ecae4883a231:~$ screen -ls
	There are screens on:
			3793335.pts-9.ecae4883a231      (03/18/20 17:36:45)     (Detached)
			2016601.mq      (02/05/20 18:37:44)     (Detached)
			3968059.fastq   (01/25/20 20:16:16)     (Detached)
	3 Sockets in /var/run/screen/S-gregg_thomas.

Note that every screen has an ID number and a name in the format [id number].[name]. By default, the names given to screens are not very meaningful. To change the name of a screen session, you can run the following:

screen -S [id number] -X sessionname [desired name]

For instance, if I want to rename the first screen listed above to something more meaningful, I could run:

screen -S 3793335 -X sessionname test

This results in the following when running screen -ls:

	gregg_thomas@ecae4883a231:~$ screen -ls
	There are screens on:
			3793335.test    (03/18/20 17:36:46)     (Detached)
			2016601.mq      (02/05/20 18:37:45)     (Detached)
			3968059.fastq   (01/25/20 20:16:17)     (Detached)
	3 Sockets in /var/run/screen/S-gregg_thomas.

Much better!
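
You can also skip the renaming step entirely by giving a session a meaningful name when you start it, e.g.:

screen -S fastq

This starts a new session named fastq, which will show up under that name in screen -ls.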

With multiple screens running, you can specify the name of the one you want to attach to when resuming:

screen -r test

When you are attached to a screen and wish to switch to another one, simply detach from the current screen with ctrl+a, then d, and then attach to the desired one.

Important!

If you log out or your connection is interrupted while still attached to a screen, you may be unable to resume it when you log back in. In this case, add the detach flag to the resume command:

screen -r -d [screen name]

Using parallel to speed up your workflows

Many times in bioinformatic workflows, we find ourselves running the same program on tens to thousands of samples (e.g., building gene trees, mapping reads from many individuals). Even worse, sometimes the programs we are running are very slow and not multi-threaded. In these cases, it is up to us to speed up the workflow.

GNU parallel can be a big help in these situations.

parallel is an extremely versatile program; here I will only go over the easiest way to run multiple commands in parallel.

01. Writing a Python script to generate commands

Let's say I have thousands of alignment files from which I want to make gene trees. I have all of my alignments in the same directory and want to run RAxML on all of them. I would first write a simple Python script that generates and prints the RAxML command for each alignment file. Here is an example of a basic command generator script for this purpose:

	############################################################
	# This script reads all alignment files in an input directory
	# and generates a basic RAxML command for each.
	############################################################

	import sys, os, datetime, random

	############################################################

	print("#!/bin/bash")
	# I like to print the bash shebang line so the output can be read as a
	# bash script to run commands one at a time if necessary

	print("# Example command generator")
	# A descriptive message about what these commands are

	indir = "/projects/project/aligns/"
	outdir = "/projects/project/gene-trees/"
	# Specify both the location of the input files and the desired
	# location for the output files.

	print("# PYTHON VERSION: " + ".".join(map(str, sys.version_info[:3])))
	print("# Script call: " + " ".join(sys.argv))
	print("# Runtime: " + datetime.datetime.now().strftime("%m/%d/%Y %H:%M:%S"))
	print("# Input directory: " + indir)
	print("# Output directory: " + outdir)
	print("# ----------");
	# Always good to include some runtime info for records

	aa_model = "PROTGAMMAJTTF"
	# The RAxML sequence evolution model for all runs.

	for f in os.listdir(indir):
		if not f.endswith("-aln.fa"):
			continue

		input_file = os.path.join(indir, f)
		# Input file.

		aln_name = os.path.splitext(f)[0]
		# Get the filename without the extension as the alignment name.

		output_directory = os.path.join(outdir, aln_name)
		# Output directory.

		if not os.path.isdir(output_directory):
			os.system("mkdir " + output_directory)
		# If the output directory doesn't exist, create it.

		seed = str(random.randint(1000000,999999999))
		# Generate the starting seed for RAxML.

		####
		raxml_cmd = "raxml -f a" 
		# The RAxML command and the analysis to run.

		raxml_cmd += " -m " + aa_model
		# The sequence evolution model.

		raxml_cmd += " -p " + seed
		# The current starting seed.

		raxml_cmd += " -s " + input_file
		# The current input sequence file.

		raxml_cmd += " -n " + aln_name
		# The name of the job for RAxML.

		raxml_cmd += " -w " + output_directory
		# The location to which RAxML will write output files.
		# These lines build up the RAxML command. Include whatever options you need.
		# This can all be done on one line, but I broke it up for clarity.
		####

		print(raxml_cmd)

Run this script and redirect the output to a file to generate a shell script that can execute all of these commands:

python example_cmd_generator.py > raxml_cmds.sh

Now you have a shell script to run these commands and, importantly, you can use parallel to run them in parallel!
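
Because the generator printed the shebang line and header comments, the output is itself a valid bash script. You can eyeball the first few commands to make sure they look right, or fall back to running them one at a time:

	head raxml_cmds.sh
	# Inspect the header and the first few generated commands
	bash raxml_cmds.sh
	# Runs every command sequentially, with no parallelization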

Note that you can use this sort of script for any type of command you have to run multiple times, not just RAxML. Say I had many samples that I needed to map with BWA; I could simply replace the steps constructing the RAxML command with those appropriate for BWA.

02. Executing commands in a file in parallel with parallel

With a file containing many commands, one per line, like the one we generated above, we can execute those commands in parallel very easily with:

parallel -j 20 < raxml_cmds.sh

Easy! -j 20 indicates we want to run 20 commands at a time; change this to however many cores you have available. < [filename] means we are feeding the lines from that file into the given command as its input.
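
Two options I find handy when testing a run like this are --dry-run, which prints the commands parallel would execute without running them, and --joblog, which records the runtime and exit status of each command (the log file name here is arbitrary):

	parallel --dry-run -j 20 < raxml_cmds.sh
	parallel -j 20 --joblog raxml-jobs.log < raxml_cmds.sh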

Important!

Currently the Griz head and login nodes do not have system-wide installs of GNU parallel, but the compute nodes do, so you will be able to run jobs that call parallel. To debug these workflows, you could download and install GNU parallel in your home directory, install it from conda (recommended), or log in to a compute node interactively (see Running Jobs).

Important!

When submitting jobs that use parallel to a Griz node, the number of tasks should match the number of parallel jobs (-j), not the number of CPUs.
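
As a minimal sketch, a submission script for the RAxML commands above might look something like this (the partition name and time limit are placeholders; set them to whatever is appropriate for your allocation):

	#!/bin/bash
	#SBATCH --job-name=raxml-trees
	#SBATCH --partition=standard    # placeholder; use your actual partition
	#SBATCH --ntasks=20             # matches -j below, not the number of CPUs
	#SBATCH --time=24:00:00         # placeholder time limit

	parallel -j 20 < raxml_cmds.sh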