An Introduction to Linux Directories and the PATH Variable
To effectively manage your compute environment in the cloud, you need to understand the Linux directory structure, as well as key environment variables such as the PATH
variable. Knowing the default locations for certain files or software, and how your machine locates software using the PATH
environment variable, can help you troubleshoot many common errors. This blog post introduces the Linux directory structure and discusses the importance of the PATH
variable.
The Linux Directory Structure
Linux file systems follow a standard layout defined by the Filesystem Hierarchy Standard (FHS), with some distribution-specific additions. We will go through this layout and the main features of each directory below.
The Root Directory: /
The root directory /
is the top-level directory, meaning everything on your Linux system is located underneath this directory. This is why you often see paths and filenames that start with /
; this is an unambiguous way of specifying a file location. If you navigate to this directory (cd /
) and list the contents (ls
) you will see all of the standard directories as follows:
ubuntu@ip-10-0-X-XYZ:/$ ls
bin lib32 proc sys
boot lib64 root tmp
dev lost+found run usr
etc media sbin var
home mnt snap
lib opt srv
We will now have a closer look at each directory and what each is used for.
User Data Directories
Users often store "user data", such as analysis inputs or outputs, or program source packages and other files that have been downloaded from the internet, in their home directory. Alternatively, users may store data on a separate storage drive which they attach to a machine, particularly if they need more space or if the data need to be moved from one machine to another.
/home
- User home directories
Contains a home folder for each user, e.g. /home/ubuntu
. The user home directory is the default directory that the user will start in upon login/connection to the machine.
/root
- Root home directory
The home directory for the root user (instead of /home/root
). See this blog post for more information about the root user.
/mnt
- Mount directory for storage volumes
A standard location to mount file systems of attached storage volumes (though these can be mounted anywhere on the system if desired). For example, storage volumes that are added during machine creation in RONIN are automatically mounted in /mnt
(e.g., /mnt/volume1
).
Software Directories
Many standard Linux directories contain software that is either required by the operating system, or primarily used by users. Each Linux distribution contains a variety of standard, base software packages that are automatically included on every system. Additional software can be installed by the user in the default software directories via package managers such as apt for Debian-based distributions like Ubuntu, YUM for Red Hat Enterprise Linux-based distributions, or snap, an app store for Linux.
/bin
- Essential user executables
All the executable programs (executable files) required during booting and standard operation, including many important, basic commands, e.g. cat
, wc
. This directory is called /bin
, short for "binary", because historically most of the programs here were compiled into binary executables, although you can now also find human-readable scripts here.
/sbin
- System administration executables
Contains executable programs used by whoever administers the computer for maintenance. Note that many of these commands must be executed using sudo
, and may not even be in your PATH unless you are root
.
/lib
- Essential libraries
Contains libraries (pre-compiled pieces of code that can be reused by software) needed by the executables in the /bin
and /sbin
directories.
/usr
- User executables and data
Contains executables and files used by users, as opposed to executables and files used by the system. For example, non-essential applications (such as your favorite editor) are located inside the /usr/bin
directory instead of the /bin
directory, and non-essential system administration binaries are located in the /usr/sbin
directory instead of the /sbin
directory. Libraries for each are located inside the /usr/lib
directory. The /usr/local
directory is designed to be a place where users can install their own software outside the distribution's provided software without worrying about overwriting any distribution files.
/opt
- Optional packages
Contains subdirectories for optional software packages. This directory is used by proprietary or third-party application software that doesn't follow the standard file system hierarchy. Because most software is now installed with package managers, it is relatively rare to see heavy use of /opt
.
/snap
- Snap packages
Files and folders from installed snap packages.
Note: Computing clusters created with RONIN often have additional software and user data directories such as the /apps
and /shared
directories. These directories are mounted by each compute node in the cluster, so they can access a single copy of programs and data. For more information, refer to this blog post.
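To see which of the software directories above a given command actually lives in, you can ask the shell to resolve it. The following is a minimal sketch, assuming a POSIX-style shell such as bash; it uses the standard `command -v` lookup and the example commands `cat`, `wc` and `grep`. The exact paths printed will differ between distributions.

```shell
# Show where a few basic commands are installed on this machine.
for cmd in cat wc grep; do
    printf '%s -> %s\n' "$cmd" "$(command -v "$cmd")"
done

# On many recent distributions /bin is a symbolic link to /usr/bin,
# so both paths lead to the same files. Check which case applies here.
if [ -L /bin ]; then
    echo "/bin is a symlink to: $(readlink /bin)"
else
    echo "/bin is a regular directory on this system"
fi

# Keep one result in a variable for later use.
cat_path=$(command -v cat)
```

If `/bin` turns out to be a symlink, a command reported under `/usr/bin` is still reachable through `/bin` as well.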
Other System Directories
These directories contain information that is relevant for running and configuring the Linux system and user applications. With the exception of /tmp
and possibly /var
, you won't typically need to venture into these directories, but it is still helpful to know what they are for:
/boot
- Boot files
Files required when booting the system.
/dev
- Device files
Files used to represent hardware devices on the system, e.g. the root drive and attached storage drives.
/etc
- Configuration files
System-wide configuration files, including the startup and shutdown scripts used to start and stop the background programs (services) that manage the system.
/lost+found
- Recovered files
Any files that are recovered by the fsck
utility after damage to the file system.
/media
- Removable media
Subdirectories created for removable media devices, e.g. cdrom.
/proc
- Kernel and process files
Contains information on the system and running processes.
/run
- Application run files
Transient file storage for application runtime files.
/srv
- Service data
Data for services provided by the system, e.g. website files.
/sys
- sysfs filesystem
A virtual filesystem that exposes information about the devices connected to the system and allows some of their settings to be modified.
/tmp
- Temporary files
Stores temporary files for users or the system. Files are deleted either by system utilities or upon reboot of the machine. This directory is a useful place to write small temporary files as you create your scripts. This is a good habit to get into: the files are easy to find and delete, and you can be sure that this directory is never shared across compute nodes.
/var
- Variable data files
Writable counterpart to the /usr
directory. It includes spool directories and files, administrative and logging data, and transient and temporary files (such as those in /var/tmp
). This is a good place to look if you are trying to find system logging information if something is going wrong with your machine. For example, /var/log
on the head node of a cluster contains information about any failures in cluster configuration.
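As an illustration of the /tmp habit described above, here is a small sketch of writing and cleaning up a scratch file from a script. It assumes the standard `mktemp` utility is available; the file name prefix `analysis` and the file contents are hypothetical.

```shell
# Create a uniquely named scratch file under /tmp.
# mktemp replaces the XXXXXX with random characters to avoid name clashes.
scratch=$(mktemp /tmp/analysis.XXXXXX)

# Write some intermediate data to it and read it back.
echo "intermediate result" > "$scratch"
scratch_content=$(cat "$scratch")
echo "Wrote '$scratch_content' to $scratch"

# Remove the file once it is no longer needed.
rm -f "$scratch"
```

Cleaning up explicitly is still worthwhile, since you cannot rely on when the system will clear /tmp.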
Linux Environment Variables
Now that you understand the Linux file system a bit better, it's also important to know that there are a number of default environment variables that are used by the system.
To view all of the environment variables that are set on your system, run the command printenv.
Some common environment variables include:
USER
= The currently logged-in user
HOME
= The home directory of the current user
PATH
= A list of directories to be searched when executing commands. When you run a command without specifying a path, the system searches these directories in the order listed and uses the first matching executable it finds.
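The variables listed above can also be inspected one at a time. This is a minimal sketch, assuming a POSIX-style shell; the fallback to `id -un` is an addition for environments (such as some containers) where `USER` is not set.

```shell
# Read individual variables through shell expansion.
# USER is not set everywhere, so fall back to asking the system directly.
current_user=${USER:-$(id -un)}
echo "User: $current_user"
echo "Home: $HOME"

# printenv can also print a single variable by name.
printenv PATH

# Count the lines printenv produces as a rough measure of how many
# variables are set (a multi-line value is counted more than once).
env_count=$(printenv | wc -l)
echo "Environment lines: $env_count"
```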
The $PATH
Variable
The PATH
variable is one of the most important environment variables to know about when managing your own software in the cloud, but what exactly does this variable do?
Well, whenever you need to specify a particular file or directory in a command, you often use an absolute path (i.e. the complete path starting from the root directory, e.g. /home/ubuntu/inputs/myfile.txt
) or a relative path (i.e. a path starting from the current directory). These paths specify the location of the file or directory so that the command can find it.
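The difference between the two kinds of path can be sketched as follows. The directory and file names are hypothetical and are created in a throwaway location by `mktemp -d`, so nothing in your home directory is touched.

```shell
# Build a small throwaway directory tree to experiment with.
base=$(mktemp -d)
mkdir -p "$base/inputs"
echo "sample data" > "$base/inputs/myfile.txt"

# Absolute path: starts at / and works from any current directory.
abs_result=$(cat "$base/inputs/myfile.txt")

# Relative path: resolved against the current directory, so move there first.
cd "$base" || exit 1
rel_result=$(cat inputs/myfile.txt)

echo "absolute: $abs_result"
echo "relative: $rel_result"
```

Both commands read the same file; only the way its location is described differs.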
Most commands are themselves program executables stored somewhere on the file system. We can tell the system which one to execute in the same way, by providing a path, but this is a lot of typing! For example, imagine having to type /usr/bin/wc
every time you wanted to count the lines in a file, instead of just wc
. The PATH
environment variable overcomes this issue by providing a list of directories that contain program executables you may wish to execute.
By default, the PATH variable typically includes several of the standard directories described in the section above, such as /bin
, /sbin
, /usr/bin
, /usr/sbin
and /usr/local/bin
. You can see what directories are listed in your PATH
variable by running echo $PATH
(each directory is separated by a :
). You can also view the exact path to any command or executable that is within your PATH
using the which
command. For example, to find where the grep
executable is located, use which grep
.
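The two checks described above can be sketched as follows, assuming a POSIX-style shell. The sketch uses `command -v`, the shell's built-in lookup, instead of `which`, because `which` is not installed on every minimal system; for ordinary executables the two report the same location.

```shell
# Print each directory in PATH on its own line, in search order.
echo "$PATH" | tr ':' '\n'

# Ask the shell which executable would run for the name "grep".
grep_path=$(command -v grep)
echo "grep resolves to: $grep_path"
```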
Note: If you have previously worked with shared computing clusters, you may be used to the module load
command when wanting to use a particular program. This command works by modifying your PATH
and other environment variables (if required) so that your program is ready for you to use.
If you have installed a new program on your machine but you are getting a "command not found" error (or if the which
command cannot find your executable), it is likely that your new software is located somewhere outside those directories listed in your current PATH
. So, to ensure your new program can be found without always having to specify the path to the program, you can either move or copy the program executable to a directory that is already listed in your PATH variable, or add a new directory to your PATH
that contains the program.
There are two ways to add a new directory to your PATH
. The first is a temporary solution where you simply re-export the PATH
variable with the added directory included. For example, if you wanted to add the directory /home/ubuntu/myprogram/bin
to the END of your PATH
, you would run: export PATH=$PATH:/home/ubuntu/myprogram/bin
. Alternatively, to add the directory to the START of your PATH
, you would run: export PATH=/home/ubuntu/myprogram/bin:$PATH
(see Note about directory order below). This method will only change your PATH
in the current session, so the change will be lost when you open another session. To make the change permanent, add the same export
command to your ~/.bashrc
file. That way the export command will be run whenever you launch a new session.
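Here is a small end-to-end sketch of the temporary approach. It creates a throwaway directory and a tiny script named `hello-tool` (both hypothetical stand-ins for a program you have installed) and assumes executables are allowed to run from the temporary directory.

```shell
# Create a throwaway directory holding a tiny stand-in program.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/myprogram/bin"
printf '#!/bin/sh\necho "hello from myprogram"\n' > "$demo_dir/myprogram/bin/hello-tool"
chmod +x "$demo_dir/myprogram/bin/hello-tool"

# Before the change, the shell cannot find the command by name.
command -v hello-tool || echo "hello-tool: not found in PATH yet"

# Append the directory to the END of PATH for the current session only.
export PATH="$PATH:$demo_dir/myprogram/bin"

# Now the command resolves without typing its full path.
hello_output=$(hello-tool)
echo "$hello_output"

# To make this permanent for a real program, the same export line
# would go in ~/.bashrc (not done here, to leave that file untouched).
```

Quoting the value, as in `"$PATH:..."`, keeps the assignment intact if a directory name contains spaces.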
Note: The order of directories in your PATH
is important because if the same program (or executable with the same name) is found in two different directories, the one that is found first in your path will be used. So keep this in mind when adding new directories to your path to determine where they should sit in the list of directories in your PATH
. It is also important to know that each user will have their own PATH
variable, so the root user may not search through the same directories as the standard user.
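The effect of directory order can be demonstrated with two executables that share a name. In this sketch the directories `first` and `second` and the script `mytool` are hypothetical; each PATH change is made inside a `$(...)` subshell so your real session is not affected.

```shell
# Two directories, each with its own executable called "mytool".
work=$(mktemp -d)
mkdir -p "$work/first" "$work/second"
printf '#!/bin/sh\necho "first copy"\n'  > "$work/first/mytool"
printf '#!/bin/sh\necho "second copy"\n' > "$work/second/mytool"
chmod +x "$work/first/mytool" "$work/second/mytool"

# Same two directories, opposite order. The copy in whichever
# directory is listed earlier is the one that runs.
order_a=$(PATH="$work/first:$work/second:$PATH"; mytool)
order_b=$(PATH="$work/second:$work/first:$PATH"; mytool)

echo "first listed first:  $order_a"
echo "second listed first: $order_b"
```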
With your newfound knowledge of the Linux file system, and the highly important PATH
variable, you should now feel more confident in managing your compute environment in the cloud. Your Linux machine is no longer a black box of directories with magical commands that appear out of nowhere.
Now go and deal with those pesky "command not found" errors!