6. Installing and using software

We recommend that you install and use the Conda package manager to install software on EOS HPC.

6.1. Why Conda?

The interesting thing about Conda is that it allows you to use separate environments for separate projects. If you have a project where you’ve installed a number of packages for Python or R you might not need to use them on another project, or you might need different versions of those packages: in this case, you can just create separate environments for them instead of installing and uninstalling multiple times. With separate environments you force yourself to make the dependencies for each project explicit which in turn makes it easier for collaborators to run your code and improves reproducibility.

Conda also provides access to thousands of packages used in data science and bioinformatics. These packages can be installed with a single command, so you don’t have to worry about compilers, dependencies, and where to put binaries. Conda is already available to every user of the EOS HPC.

6.2. Using Conda in your work

Conda (running on python 3.7), is already available on EOS as a module. All you have to do in order to use it, is to load the appropriate module

module load miniconda3/mc3-py37

you can also add this command to your .bashrc file, so it loads at every login, or use the following command:

module initadd miniconda3/mc3-py37

to let the modules do that for you.

A module has also been installed with Conda running on Python 3.10. You can activate it in a similar way by running:

module load miniconda3/mc3-py310

6.3. Configuring Conda Environments

Conda can install packages from different channels. This system is similar to repositories in other package managers. Here we’ll add a few channels that are commonly used in bioinformatics:

[myuser@headnode1]$ conda config --add channels defaults
[myuser@headnode1]$ conda config --add channels bioconda
[myuser@headnode1]$ conda config --add channels conda-forge

6.4. Searching for packages

You can easily search for Conda packages through the website anaconda.org or using the conda search command:

[myuser@headnode1]$ conda search rstudio

Remember that the Conda package may not be called in the same way as the exact official name of the software. For example, the Conda package for the tool biobambam2 is just called biobambam, so searching for biobambam2 would not return any results.

6.5. Using environments

Conda comes with a single environment known as the base environment. To activate the base environment, just type:

[myuser@headnode1]$ conda activate
(base) [myuser@headnode1]$

You now have access to the software installed in the base environment.

If we wanted to create a new environment with the newest version of PySAM, we should follow the steps below:

[myuser@headnode1]$ conda create --name amazing-project pysam
Solving environment: done

## Package Plan ##

  environment location: /Users/das/.conda/envs/amazing-project

  added / updated specs:
    - pysam
    - python=3


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pysam-0.15.1               |   py36h0380709_0         2.0 MB  bioconda
    bcftools-1.9               |       h4da6232_0         789 KB  bioconda
    samtools-1.9               |       h8ee4bcc_1         526 KB  bioconda
    setuptools-40.4.3          |           py36_0         556 KB
    certifi-2018.10.15         |           py36_0         138 KB
    libcurl-7.61.1             |       hf30b1f0_0         457 KB
    libffi-3.2.1               |                1          41 KB  bioconda
    htslib-1.9                 |       hc238db4_4         1.2 MB  bioconda
    curl-7.61.1                |       ha441bb4_0         135 KB
    wheel-0.32.2               |           py36_0          35 KB
    libdeflate-1.0             |       h470a237_0          44 KB  bioconda
    bzip2-1.0.6                |       h1de35cc_5         149 KB
    ------------------------------------------------------------
                                           Total:         6.0 MB

The following NEW packages will be INSTALLED:

    bcftools:        1.9-h4da6232_0          bioconda
    bzip2:           1.0.6-h1de35cc_5
    ca-certificates: 2018.03.07-0
    certifi:         2018.10.15-py36_0
    curl:            7.61.1-ha441bb4_0
    htslib:          1.9-hc238db4_4          bioconda
    libcurl:         7.61.1-hf30b1f0_0
    libcxx:          4.0.1-hcfea43d_1
    libcxxabi:       4.0.1-hcfea43d_1
    libdeflate:      1.0-h470a237_0          bioconda
    libedit:         3.1.20170329-hb402a30_2
    libffi:          3.2.1-1                 bioconda
    libssh2:         1.8.0-h322a93b_4
    ncurses:         6.1-h0a44026_0
    openssl:         1.0.2p-h1de35cc_0
    pip:             10.0.1-py36_0
    pysam:           0.15.1-py36h0380709_0   bioconda
    python:          3.6.6-hc167b69_0
    readline:        7.0-h1de35cc_5
    samtools:        1.9-h8ee4bcc_1          bioconda
    setuptools:      40.4.3-py36_0
    sqlite:          3.25.2-ha441bb4_0
    tk:              8.6.8-ha441bb4_0
    wheel:           0.32.2-py36_0
    xz:              5.2.4-h1de35cc_4
    zlib:            1.2.11-hf3cbc9b_2

Proceed ([y]/n)? y


Downloading and Extracting Packages
pysam-0.15.1         | 2.0 MB    | ################################## | 100%
bcftools-1.9         | 789 KB    | ################################## | 100%
samtools-1.9         | 526 KB    | ################################## | 100%
setuptools-40.4.3    | 556 KB    | ################################## | 100%
certifi-2018.10.15   | 138 KB    | ################################## | 100%
libcurl-7.61.1       | 457 KB    | ################################## | 100%
libffi-3.2.1         | 41 KB     | ################################## | 100%
htslib-1.9           | 1.2 MB    | ################################## | 100%
curl-7.61.1          | 135 KB    | ################################## | 100%
wheel-0.32.2         | 35 KB     | ################################## | 100%
libdeflate-1.0       | 44 KB     | ################################## | 100%
bzip2-1.0.6          | 149 KB    | ################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

This gives us a clean environment with just the minimal number of packages necessary to support PySAM. To use the software that was installed in the environment, the environment needs to be activated first:

[myuser@headnode1]$ conda activate amazing-project
(amazing-project) [myuser@headnode1]$ python -c 'import pysam; print(pysam.__version__)'
0.6.0

You will notice that the prompt changed to show you that you’re now in the amazing-project environment.

Conda can install any kind of software, as long as its recipe (i.e. instructions) are available in the conda repositories we are using. This means that your entire setup can be installed through Conda (if all packages are available). For example, you can create an environment with Rstudio, R, and ggplot2 with a single command.

6.6. Command reference

To install software in the currently activated environment:

(amazing-project) [myuser@headnode1]$ conda install PACKAGE-NAME

To remove a software package from the currently activated environment:

(amazing-project) [myuser@headnode1]$ conda remove PACKAGE-NAME

To update a software package in the currently activated environment:

(amazing-project) [myuser@headnode1]$ conda update PACKAGE-NAME

Since Conda keeps track of what you are loading in the environment you created, it will tell you exactly which packages are used in the environment. This is very useful for collaborating with others, since your collaborators can create an exact copy of your environment with a single command.

To export your environment so that others can recreate it:

(amazing-project) [myuser@headnode1]$ conda env export > environment.yml

The environment.yml file contains an exact specification of your environment and the packages installed. You share this with other collaborators, and they will be able to recreate your environment by running:

[myuser@headnode1]$ conda env create -f environment.yml

You can read more about using environments for projects here. There’s also also a cheat sheet with Conda commands available.

6.7. I don’t think I can use Conda because…

6.7.1. A Conda package is not available

If building a custom Conda package is not possible, we recommend using a Singularity image instead.

6.7.2. I’m part of a project that specifies the software I should use

In this case the project should and probably will supply you for either a set of Conda packages or Singularity images. If not, most or all of the software will probably be available through Conda anyway, so you can still set up an environment with the software.

6.8. Using graphical interfaces

In order to use programs with a graphical user interface on EOS HPC you should activate X-forwarding, when connecting to the cluster.

You can use X-forwarding to tunnel individual graphical programs to your local desktop. This works well for many programs, but programs that do fancy graphics or anything animated might not work well.

On Linux you simply need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@eos.unipv.it

Since macOS does not include an X server, you will need to download and install XQuartz on your computer. When installed, reboot the computer. Now, you just need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@eos.unipv.it

On Windows, we recommend that you use MobaXterm which has an integrated X server.

6.9. Available Modules on the Cluster

Before installing something on your own environment, it is always worth checking what has been already installed for everyone on the HPC. This can be done with the following command:

[myuser@headnode1]$ module avail

Which will show the available modules. You can then activate a specific tool by using the following command:

[myuser@headnode1]$ module load NAME

Where NAME corresponds exactly to the name in the list generated with the previous command.