An Introduction to Linux Directories and the PATH Variable

To effectively manage your compute environment in the cloud, you need to understand the Linux directory structure, as well as key environment variables such as the PATH variable. Knowing the default locations for certain files or software, and how your machine locates software using the PATH environment variable, can help you troubleshoot many common errors. This blog post will introduce you to the Linux directory structure and discuss the importance of the PATH variable.

The Linux Directory Structure

Linux file systems have a standard structure that is defined by the Filesystem Hierarchy Standard (FHS), plus some distribution-specific additions. We will go through this structure and the main features of each directory below.

Linux directory tree showing the structure of common directories. Colors represent usage categories as described in the sections below.

The Root Directory: /

The root directory / is the top-level directory, meaning everything on your Linux system is located underneath it. This is why you often see paths and filenames that start with /; it is an unambiguous way of specifying a file location. If you navigate to this directory (cd /) and list its contents (ls), you will see the standard directories, for example:

ubuntu@ip-10-0-X-XYZ:/$ ls
bin		lib32		proc		sys
boot		lib64		root		tmp
dev		lost+found	run		usr
etc		media		sbin		var
home		mnt		snap
lib		opt		srv

We will now have a closer look at each directory and what each is used for.

User Data Directories

Users often store "user data", such as analysis inputs or outputs, or program source packages and other files that have been downloaded from the internet, in their home directory. Alternatively, users may store data on a separate storage drive which they attach to a machine, particularly if they need more space or if the data need to be moved from one machine to another.

/home - User home directories

Contains a home folder for each user, e.g. /home/ubuntu. The user home directory is the default directory that the user will start in upon login/connection to the machine.

/root - Root home directory

The home directory for the root user (instead of /home/root). See this blog post for more information about the root user.

/mnt - Mount directory for storage volumes

A standard location to mount file systems of attached storage volumes (though these can be mounted anywhere on the system if desired). For example, storage volumes that are added during machine creation in RONIN are automatically mounted in /mnt (e.g., /mnt/volume1).
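To check where a file system is mounted and how much space it has, the df command can help. The following is a minimal sketch: it inspects the root directory so that it runs on any machine, and the /mnt/volume1 path in the comment is simply the example mount point from the text, which may not exist on your system.

```shell
# Show size, usage, and mount point of the file system holding the root directory.
df -h /

# The same check for an attached volume. /mnt/volume1 is the example mount
# point from the text and may not exist on your machine.
# df -h /mnt/volume1

# Extract just the mount point (column 6 of the POSIX-format output).
root_mount=$(df -P / | awk 'NR==2 {print $6}')
echo "/ is on the file system mounted at: $root_mount"
```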

Software Directories

Many standard Linux directories contain software that is either required by the operating system, or primarily used by users. Each Linux distribution contains a variety of standard, base software packages that are automatically included on every system. Additional software can be installed by the user in the default software directories via package managers such as apt for Debian-based distributions like Ubuntu, YUM for Red Hat Enterprise Linux-based distributions, or snap, an app store for Linux.

/bin - Essential user executables

Contains the executable programs (executable files) required during booting and standard operation, including many important basic commands, e.g. cat and wc. This directory is called /bin, short for "binary", because historically most of the programs here were compiled binary executables, although you can now find human-readable scripts here as well.

/sbin - System administration executables

Contains executable programs used for system maintenance by whoever administers the computer. Note that many of these commands must be run with sudo, and /sbin may not even be in your PATH unless you are root.

/lib - Essential libraries

Contains libraries (pre-compiled pieces of code that can be reused by software) needed by the executables in the /bin and /sbin directories.

/usr - User executables and data

Contains executables and files used by users, as opposed to executables and files used by the system. For example, non-essential applications (such as your favorite editor) are located inside the /usr/bin directory instead of the /bin directory, and non-essential system administration binaries are located in the /usr/sbin directory instead of the /sbin directory. Libraries for each are located inside the /usr/lib directory. The /usr/local directory is designed to be a place where users can install their own software, separate from the software provided by the distribution, without worrying about overwriting any distribution files.

/opt - Optional packages

Contains subdirectories for optional software packages. This directory is typically used by proprietary or third-party application software that does not follow the standard file system hierarchy. Because most software is now installed through package managers, it is relatively rare to see heavy use of /opt.

/snap - Snap packages

Files and folders from installed snap packages.

Note: Computing clusters created with RONIN often have additional software and user data directories, such as /apps and /shared. These directories are mounted by each compute node in the cluster, so all nodes can access a single copy of programs and data. For more information, refer to this blog post.

Other System Directories

These directories contain information that is relevant for running and configuring the Linux system and user applications. With the exception of /tmp and possibly /var, you won't typically need to venture into these directories, but it is still helpful to know what they are for:

/boot - Boot files

Files required when booting the system.

/dev - Device files

Files used to represent hardware devices on the system, e.g. the root drive and attached storage drives.

/etc - Configuration files

System-wide application configuration files, e.g. the startup, shutdown, start, and stop scripts for programs that manage the system in the background (services).

/lost+found - Recovered files

Any files that are recovered by the fsck utility after damage to the file system.

/media - Removable media

Subdirectories created for removable media devices, e.g. cdrom.

/proc - Kernel and process files

Contains information on the system and running processes.

/run - Application run files

Transient file storage for application runtime files.

/srv - Service data

Data for services provided by the system, e.g. website files.

/sys - sysfs filesystem

A virtual file system that exposes information about the devices connected to the system and allows some of their settings to be modified.

/tmp - Temporary files

Stores temporary files for users or the system. Files are deleted either by system utilities or upon reboot of the machine. This directory is a useful place to write small temporary files as you develop your scripts. This is a good habit to get into: the files are easy to find and delete, and because /tmp is local to each machine, you can be sure they are never shared across compute nodes.
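As a minimal sketch of this habit, the following uses mktemp to create a uniquely named scratch directory under /tmp, writes a small intermediate file there, and removes the directory when finished. The names scratch and lines.txt are illustrative only.

```shell
# Create a uniquely named scratch directory under /tmp.
workdir=$(mktemp -d /tmp/scratch.XXXXXX)

# Write a small intermediate file there and use it.
printf 'alpha\nbeta\ngamma\n' > "$workdir/lines.txt"
line_count=$(wc -l < "$workdir/lines.txt")
echo "lines written: $line_count"

# Clean up once the intermediate file is no longer needed.
rm -rf "$workdir"
```

Using mktemp rather than a fixed file name avoids collisions when several scripts or users write to /tmp at the same time.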

/var - Variable data files

Writable counterpart to the /usr directory. It includes spool directories and files, administrative and logging data, and transient and temporary files (such as those in /var/tmp). If something is going wrong with your machine, this is a good place to look for system logs. For example, /var/log on the head node of a cluster contains information about any failures in cluster configuration.

Linux Environment Variables

Now that you understand the Linux file system a bit better, it's also important to know that there are a number of default environment variables that are used by the system.

To view all of the environment variables that are set on your system, run the command printenv.

Some common environment variables include:

USER = The currently logged-in user

HOME = The home directory of the current user

PATH = A list of directories to be searched when executing commands. When you run a command without specifying a path, the system searches these directories in the order listed and uses the first matching executable it finds.
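A short sketch of inspecting these variables from the command line is shown below. It assumes a typical login environment; in some minimal environments (containers, for example) USER may not be set, in which case it is simply missing from the output.

```shell
# Show only the three variables discussed above.
printenv | grep -E '^(USER|HOME|PATH)='

# A variable's value can be used in any command by prefixing its name with $.
echo "Home directory: ${HOME:-<not set>}"

# printenv also accepts a variable name and prints just its value.
path_copy=$(printenv PATH)
echo "PATH is: $path_copy"
```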

The $PATH Variable

The PATH variable is one of the most important environment variables to know about when managing your own software in the cloud, but what exactly does this variable do?

Well, whenever you need to specify a particular file or directory in a command, you often use an absolute path (i.e. the complete path starting from the root directory e.g. /home/ubuntu/inputs/myfile.txt) or a relative path (i.e. a path starting from the current directory). These paths specify the location of the file or directory so that the command can find it.
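To make the distinction concrete, here is a small sketch that reaches the same directory once by a relative path and once by an absolute path. It uses /usr/bin only because that directory exists on most Linux systems.

```shell
# Start from /usr so that the relative path below has a known starting point.
cd /usr

# Relative path: "bin" is resolved from the current directory (/usr).
rel_target=$(cd bin && pwd)

# Absolute path: "/usr/bin" is resolved from the root directory,
# so it works no matter what the current directory is.
abs_target=$(cd /usr/bin && pwd)

echo "relative: $rel_target"
echo "absolute: $abs_target"
```

Both lines report the same location; the relative form only works because the current directory happened to be /usr.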

We can also run software executables as commands. We could tell the system which executable to run in the same way, by providing its path, but this is a lot of typing! For example, imagine having to type /bin/ls every time you wanted to list the contents of a directory, instead of just ls. The PATH environment variable overcomes this issue by providing a list of directories that the system searches for program executables.

By default, the PATH variable is typically set to include some of the standard directories described in the section above, such as /bin, /sbin, /usr/bin, /usr/sbin and /usr/local/bin. You can see which directories are listed in your PATH variable by running echo $PATH (the directories are separated by colons). You can also view the exact path to any command or executable that is within your PATH using the which command. For example, to find where the grep executable is located, use which grep.
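The colon-separated PATH can be hard to read on one line. The sketch below prints one directory per line, in search order, and then looks up grep. It uses command -v, the shell's built-in lookup, alongside the which program mentioned above; the two normally report the same location for an ordinary executable.

```shell
# Print each PATH directory on its own line, in the order they are searched.
printf '%s\n' "$PATH" | tr ':' '\n'

# Locate the executable that would run for the command name "grep".
grep_path=$(command -v grep)
echo "grep resolves to: $grep_path"

# Equivalent lookup with the which program, if it is installed:
# which grep
```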

Note: If you have previously worked with shared computing clusters, you may be used to running the module load command before using a particular program. This command works by modifying your PATH and other environment variables (if required) so that the program is ready for you to use.

If you have installed a new program on your machine but you are getting a "command not found" error (or if the which command cannot find your executable), it is likely that your new software is located somewhere outside the directories listed in your current PATH. To ensure your new program can be found without always having to specify its full path, you can either move or copy the program executable to a directory that is already listed in your PATH, or add the directory that contains the program to your PATH.
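One way to confirm this diagnosis is to ask the shell directly whether it can find a command. The helper below is a hypothetical sketch (check_cmd is not a standard command, and no-such-program-xyz is a made-up name used to trigger the failure case).

```shell
# Hypothetical helper: report whether a command name can be found via PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH"
        return 1
    fi
}

# A command that exists on practically every Linux system.
check_cmd ls

# A made-up name, to show what the failure case looks like.
check_cmd no-such-program-xyz || true
```

If the helper reports that your program is not found, but you know where its executable lives, the directory containing it is the one missing from your PATH.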

There are two ways to add a new directory to your PATH. The first is a temporary solution where you simply re-export the PATH variable with the new directory included. For example, to add the directory /home/ubuntu/myprogram/bin to the END of your PATH, you would run: export PATH=$PATH:/home/ubuntu/myprogram/bin. Alternatively, to add the directory to the START of your PATH, you would run: export PATH=/home/ubuntu/myprogram/bin:$PATH (see the Note about directory order below). This method only changes your PATH in the current session, so the change is lost when you open another session. The second way makes the change permanent: add the same export command to your ~/.bashrc file (assuming your shell is Bash), so that it is run whenever you launch a new session.
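The temporary approach can be sketched as follows, using the example directory /home/ubuntu/myprogram/bin from the text. The duplicate check is an optional addition, not something the export command requires; it just keeps PATH tidy if the lines are run more than once. The permanent step is left commented out so that the sketch does not modify any files.

```shell
# Example directory from the text; substitute your own program's location.
new_dir=/home/ubuntu/myprogram/bin

# Append the directory to the END of PATH, but only if it is not already listed.
case ":$PATH:" in
    *":$new_dir:"*) ;;                      # already present, nothing to do
    *) export PATH="$PATH:$new_dir" ;;
esac
echo "$PATH"

# To make the change permanent for Bash, append the export line to ~/.bashrc:
# echo 'export PATH="$PATH:/home/ubuntu/myprogram/bin"' >> ~/.bashrc
```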

Note: The order of directories in your PATH is important because if the same program (or executable with the same name) is found in two different directories, the one that is found first in your path will be used. So keep this in mind when adding new directories to your path to determine where they should sit in the list of directories in your PATH. It is also important to know that each user will have their own PATH variable, so the root user may not search through the same directories as the standard user.
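The effect of directory order can be demonstrated with a small experiment. The sketch below creates two throwaway directories under /tmp, each containing a script with the same made-up name (hello-demo), and runs it with the two directories in either order. All names here are illustrative.

```shell
# Two scratch directories, each holding an executable named hello-demo.
demo=$(mktemp -d /tmp/pathdemo.XXXXXX)
mkdir "$demo/first" "$demo/second"
for d in first second; do
    printf '#!/bin/sh\necho "hello from %s"\n' "$d" > "$demo/$d/hello-demo"
    chmod +x "$demo/$d/hello-demo"
done

# "first" is listed before "second", so its copy is the one that runs.
# The subshell ( ... ) keeps the PATH change from leaking into this session.
result_a=$( PATH="$demo/first:$demo/second:$PATH"; hello-demo )
echo "$result_a"

# Reverse the order and the other copy wins.
result_b=$( PATH="$demo/second:$demo/first:$PATH"; hello-demo )
echo "$result_b"

# Remove the scratch directories.
rm -rf "$demo"
```

The same reasoning explains why prepending a directory to PATH lets your own build of a program take precedence over a system copy, while appending does not.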

With your newfound knowledge of the Linux file system, and the highly important PATH variable, you should now feel more confident in managing your compute environment in the cloud. Your Linux machine is no longer a black box of directories with magical commands that appear out of nowhere.

Now go and deal with those pesky "command not found" errors!