Installing software

The installation procedure is still being figured out. Most likely, the recommended method will be a local Anaconda installation, from which you can install multiple versions of bioinformatics software from bioconda.

Installing Anaconda

To install Anaconda, download the latest version of the installer at the following link.

Important!

When downloading Anaconda to install on Griz, make sure you select "Linux" at the top. The website tries to guess what you want to install based on your current OS, which may be different from the OS on the cluster.

Previous Anaconda versions can be found in their archive, and official install docs can be found here.

Briefly, once the install file is in the location where you wish to install Anaconda, simply type:

bash [install file].sh

Follow the prompts to complete installation.
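
Before running the installer, it can be worth confirming that the file you downloaded is the Linux build. The following is a minimal sketch, not part of the official install procedure: check_installer is a hypothetical helper, and it assumes the installer's file name contains "Linux", as Anaconda's Linux installers normally do.

```shell
# check_installer FILE
# Hypothetical helper: succeeds only if FILE exists and its name contains
# "Linux" (assumption: Anaconda names its Linux installers this way).
check_installer() {
    installer="$1"
    if [ ! -f "$installer" ]; then
        echo "Installer not found: $installer" >&2
        return 1
    fi
    # Look only at the file name, not at the directories leading to it.
    case "$(basename "$installer")" in
        *Linux*) return 0 ;;
        *)
            echo "This does not look like a Linux installer: $installer" >&2
            return 1
            ;;
    esac
}
```

You could then call check_installer on your downloaded file and run the bash command above only if the check succeeds.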

Anaconda environments

After installing Anaconda, you will have to create and manage Anaconda environments. Environments are powerful because they let you easily install and manage software locally. For instance, you could set up a main environment that is kept up to date with the latest versions of critical software, such as samtools, GATK, IQ-Tree, etc. But if you have a specific workflow that depends on an old version of, say, GATK, you could set up another environment with that version.

Here are the official docs for managing conda environments:

Below I will run through some of the basics.

01. Starting Anaconda

First, every time you log in, you will have to start Anaconda:

source ~/anaconda3/bin/activate

02. Creating an environment

Your command prompt should now have the (base) prefix, indicating you are in the base Anaconda environment. Next you will need to create a new environment. You should name your environment something descriptive like "gatk3-env" or "biomain". For this example, I will call my main environment "biotools". To create the "biotools" environment type:

conda create --name biotools

Some information will print to the screen, and you will be asked to confirm creation of the environment. Confirm and your environment should be created! Note that creating an environment does not activate it, so the command prompt prefix will remain (base) until you start the environment as described in the next step.

In the event that you encounter Permission denied or similar errors when creating an environment, you may provide the full path to the environment you want to create. Because an environment name cannot contain slashes, use the --prefix option in place of --name:

conda create --prefix /path/to/anaconda3/envs/biotools

03. Starting an environment

To start a particular environment, be sure Anaconda is running ((base) is in your prompt), and run:

conda activate [env name]

The command prompt prefix should now be the name of your environment and all the software installed in that environment should be available to you.
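
If you are ever unsure which environment is active, conda env list prints all of your environments and marks the active one with an asterisk. The check can also be scripted, as in this minimal sketch. It relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated; active_env is a hypothetical helper name.

```shell
# active_env: print the name of the active conda environment, if any.
# Relies on CONDA_DEFAULT_ENV, which conda sets on activation.
active_env() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "$CONDA_DEFAULT_ENV"
    else
        echo "no conda environment is active"
    fi
}

active_env
```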

Recommendation

I recommend adding the following commands to your .bash_profile file so they are run automatically every time you log in.

source ~/anaconda3/bin/activate
conda activate [env name]
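
A slightly more defensive variant is sketched below. It only tries to start Anaconda if the activate script actually exists, so logging in does not print errors if the installation is moved or removed. The install path is the one assumed throughout this page, and "biotools" is just the example environment name from above; substitute your own.

```shell
# Guarded version of the two commands above, for ~/.bash_profile.
CONDA_ACTIVATE="$HOME/anaconda3/bin/activate"   # assumed install location
ENV_NAME="biotools"                             # example environment name

if [ -f "$CONDA_ACTIVATE" ]; then
    # Start Anaconda, then switch from (base) to the chosen environment.
    source "$CONDA_ACTIVATE"
    conda activate "$ENV_NAME"
fi
```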

04. Installing software in an environment

To install software in this environment, search for the package you want on bioconda and run the appropriate command. For example, to install samtools:

conda install -c bioconda samtools

Install as much software as you like! It should all be self-contained within this environment.

Note that if you need a specific version of a package installed, you can specify the version number when installing as follows:

conda install -c bioconda samtools=1.9
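
Version pinning combines naturally with separate environments: conda create accepts package specifications, so an environment for an older version can be created and populated in one step. The sketch below is one way to do that, not the only one. The environment name "samtools19-env" is made up for illustration, and the conda-forge channel is added because bioconda's own documentation recommends using it alongside bioconda.

```shell
# Create a dedicated environment holding one pinned package version.
ENV_NAME="samtools19-env"   # hypothetical environment name
PKG_SPEC="samtools=1.9"     # same version pin as in the example above

if command -v conda >/dev/null 2>&1; then
    # --yes skips the confirmation prompt.
    conda create --yes --name "$ENV_NAME" -c conda-forge -c bioconda "$PKG_SPEC"
else
    echo "conda not found; start Anaconda first" >&2
fi
```

Once it has been created, conda activate samtools19-env switches to the new environment.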

05. Exiting an environment

When you are in an environment and wish to exit it, simply type:

conda deactivate

If you are in an environment, this will take you to the (base) Anaconda state. If you are in (base), it will exit Anaconda.

The module system and server-wide installs

There is a module system on Griz, but it seems like this won't be used. If you request something to be installed, instead of server-wide installs, Griz is likely to follow the Carnation protocol. This means you email the admins with your request, and they will set up a separate conda environment specifically for it. This is less than optimal for our purposes.

Building from source or installing binaries yourself

It is of course still possible to install software yourself locally if you wish to forgo Anaconda, but there's no way to guarantee that all dependencies will be installed for a given piece of software.

Installing & Using R

One of the administrators has a self-compiled version of R that has been optimized somehow. He recommends using this for now by running:

source /share/apps/R-3.6.1/runNewR.sh

However, I am uncertain how this will be maintained in the future. R is available as a module, but again I don't know how that will be maintained. Our best solution may be to install R ourselves. Fortunately, this is easy with Anaconda. Be sure you are in the environment in which you want to run R and type:

conda install -c bioconda r

Package installations may be a bit slower and show some bioconda messages/warnings, but they should still work. Let me know if you run into any problems!

Recommendation

It may be helpful to install R in a separate conda environment from your main one.

source ~/anaconda3/bin/activate
conda activate [env name]
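
The activation commands above assume the environment already exists. A separate R environment could be created along the lines of the sketch below. Note the assumptions: "r-env" is a made-up environment name, and the sketch installs the r-base package from the conda-forge channel, which is a different package and channel from the conda install -c bioconda r command shown earlier.

```shell
# Create a separate environment containing only R.
R_ENV_NAME="r-env"   # hypothetical environment name
R_PACKAGE="r-base"   # R as packaged on conda-forge (assumption: swapped in
                     # for the "r" package used earlier on this page)

if command -v conda >/dev/null 2>&1; then
    # --yes skips the confirmation prompt.
    conda create --yes --name "$R_ENV_NAME" -c conda-forge "$R_PACKAGE"
else
    echo "conda not found; start Anaconda first" >&2
fi
```

After that, conda activate r-env makes this R installation available without affecting your main environment.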