How To Make Your Object Storage Look Like a File System
In RONIN, we treat Object Storage (Amazon S3 buckets) as a separate thing from file systems, not just to throw more jargon into a complicated space, but for Really Good Reasons. If you are not nodding vigorously along with this first sentence, please read The Fine Print at the bottom of this post for some of the reasons. In short, the file systems that most researchers are familiar with are designed to support different operations than Object Storage, and because of that, it is difficult to make an object store look like a file system without annoying performance problems or strange failures. Nevertheless, there are times when the ability to manipulate your S3 bucket more like a file system might be convenient (if you understand the limitations). People ask us about this, and some example use cases are below.
- You are running a class where you want everyone to have a virtual machine with the same software and to be able to browse and copy data (on an object store) to this machine, but you don't want to have to train everyone on using keys and the AWS CLI.
- You would like to use pipelines that previously accessed incoming data on a large shared network attached store, but you need to move to S3 immediately, before you can rewrite the code.
One solution (for very limited use cases without stringent security requirements) is to use FUSE (Filesystem in USEr Space), which is a piece of software that creates a bridge between the operating system kernel and user space to allow people to create filesystems for other kinds of devices, affording a lot of flexibility. There are several FUSE clients that allow you to mount object storage as a filesystem, including s3fs-fuse and goofys. We focus here on s3fs-fuse, because it provides some of the file system functions that people tend to expect, at the cost of performance. Your needs may differ.
We do NOT recommend S3 FUSE clients for most use cases. In particular, if you are trying to do anything like the following, talk to us.
- You need to support writing to object storage.
- You are trying to make object storage look to researchers like a file system with which they can run their research codes.
Here we describe how to set up an EC2 instance which automatically mounts a RONIN object store.
Installing and Configuring
Installing s3fs is fairly easy because packages exist for most distributions (see complete installation instructions). For Ubuntu, type:
sudo apt install s3fs
The next thing is to obtain a key for the object store you wish to mount (Figure 1). This can be a read-only or read/write key that you find in RONIN. Best practices recommend that you use the least powerful privileges that you need to do the job, so if you don't NEED to write to the S3 bucket, just use a read-only key. You should also rotate (change) the keys periodically, because just like with physical keys to doors, the longer the key remains in use the more likely it is that copies will be out there in the hands of people who no longer need access. Note that changing and even deleting keys is very easy to do from the RONIN Object Storage page.
Thinking back to the first classroom use case, if you rotate the key after the class is over, students will not be able to use it anymore, even if they retain their virtual machines. This is a solution that can be implemented by a RONIN User or Project administrator. Another approach is to modify the role created in RONIN by default for the EC2 instances to have a policy that allows these instances to access the S3 bucket (without any user keys). This must be implemented by a RONIN administrator who can access the AWS account in which RONIN is installed.
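When you download a key, you obtain a CSV file with two important columns, AccessKeyID and SecretAccessKey. You need to create a password file on your EC2 instance that contains this information. By default s3fs looks for a system-wide password file at /etc/passwd-s3fs, and it will refuse to use a password file that other users can read, so restrict its permissions. In the commands below, replace the variables $AccessKeyID and $SecretAccessKey with the key you just downloaded.
echo "$AccessKeyID:$SecretAccessKey" | sudo tee /etc/passwd-s3fs > /dev/null
sudo chmod 600 /etc/passwd-s3fs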
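If you would rather keep the key in a location of your own choosing, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of RONIN or s3fs: the function name `write_s3fs_passwd` and the example path are hypothetical. It writes the `ID:SECRET` line and makes sure only the owner can read the file.

```shell
#!/bin/sh
# Hypothetical helper: write an s3fs password file that only its owner can read.
# Arguments: access key ID, secret access key, target file path.
write_s3fs_passwd() {
    key_id=$1
    secret=$2
    target=$3
    # Create the file under a restrictive umask so it is never world-readable.
    ( umask 077; printf '%s:%s\n' "$key_id" "$secret" > "$target" )
    # Tighten permissions in case the file already existed with looser ones.
    chmod 600 "$target"
}

# Example (placeholders, not real credentials):
#   write_s3fs_passwd "$AccessKeyID" "$SecretAccessKey" "$HOME/.passwd-s3fs"
```

A file written somewhere other than the default locations can be passed to s3fs with `-o passwd_file=/path/to/file`.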
Mounting Manually and Automatically
First create an empty directory in which to mount the S3 bucket. Here, our object store name is roninstorage so we create a directory called /roninstorage at the root level.
sudo mkdir /roninstorage
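s3fs normally expects the mount point to exist and to be empty, so it can be worth checking before you mount. The helper below is a small sketch; the name `check_mountpoint` is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory is usable as a mount point.
# Prints "missing", "not empty", or "ready".
check_mountpoint() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -n "$(ls -A "$dir")" ]; then
        echo "not empty"
    else
        echo "ready"
    fi
}

check_mountpoint /roninstorage
```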
To mount the object store from the command line, you will need the SERVER, REGION and PATH from the Object Storage connection information (Figure 2). Replace $SERVER, $REGION and $PATH with these values in the command below ($PATH here is a placeholder for the object store path, not your shell's PATH variable). Also replace /roninstorage with the directory you created above.
sudo s3fs $PATH /roninstorage -o dbglevel=info -f -o curldbg -o allow_other -o use_path_request_style -o url=https://$SERVER -o endpoint=$REGION
For example, with the information from Figure 2, our command would look like this.
sudo s3fs roninstorage.store.ronin.cloud /roninstorage -o dbglevel=info -f -o curldbg -o allow_other -o use_path_request_style -o url=https://s3.ap-southeast-2.amazonaws.com -o endpoint=ap-southeast-2
The other parameters in this command (-o dbglevel=info, -o curldbg and -f, which keeps s3fs in the foreground) turn on debugging so that you can see if there is an error, and importantly, -o allow_other allows anyone on the machine to be able to access the mount.
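Once the mount works, you will probably want the same command without the debugging flags so that it does not tie up your terminal. The sketch below only prints such a command from the four connection values, so you can inspect it before running it. The function name `s3fs_mount_cmd` and its argument order are illustrative, and it uses the underscore spelling `allow_other` that s3fs documents.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an s3fs mount command without debug flags.
# Arguments: object store path, mount point, server, region.
s3fs_mount_cmd() {
    bucket=$1
    mountpoint=$2
    server=$3
    region=$4
    printf 'sudo s3fs %s %s -o allow_other -o use_path_request_style -o url=https://%s -o endpoint=%s\n' \
        "$bucket" "$mountpoint" "$server" "$region"
}

# Values taken from the example in this post.
s3fs_mount_cmd roninstorage.store.ronin.cloud /roninstorage \
    s3.ap-southeast-2.amazonaws.com ap-southeast-2
```

When you are finished with the mount, `sudo umount /roninstorage` detaches it again.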
To avoid having to do this every single time you need the mount, you can edit the file /etc/fstab so that it is mounted automatically at boot. You will add a line of the format below to /etc/fstab using your favorite text editor. Replace $PATH, $SERVER and $REGION with the values from the Object Storage connection information panel, and replace /roninstorage with your own directory. The mount options must be a single comma-separated field with no spaces.
$PATH /roninstorage fuse.s3fs _netdev,allow_other,use_path_request_style,url=https://$SERVER,endpoint=$REGION 0 0
Our /etc/fstab entry would then look like the following.
roninstorage.store.ronin.cloud /roninstorage fuse.s3fs _netdev,allow_other,use_path_request_style,url=https://s3.ap-southeast-2.amazonaws.com,endpoint=ap-southeast-2 0 0
As always, before you change system files it is a good idea to make a snapshot of your machine (that you can delete when you know your changes work).
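Because a stray space in the options field is enough to break an /etc/fstab entry, it can help to generate the line rather than type it. This is a sketch with a hypothetical function name, `fstab_entry`. It prints one line with exactly six whitespace-separated fields, which you can then paste into /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for an s3fs mount.
# Arguments: object store path, mount point, server, region.
fstab_entry() {
    bucket=$1
    mountpoint=$2
    server=$3
    region=$4
    # The options are one comma-separated field; no spaces are allowed inside it.
    printf '%s %s fuse.s3fs _netdev,allow_other,use_path_request_style,url=https://%s,endpoint=%s 0 0\n' \
        "$bucket" "$mountpoint" "$server" "$region"
}

# Values taken from the example in this post.
fstab_entry roninstorage.store.ronin.cloud /roninstorage \
    s3.ap-southeast-2.amazonaws.com ap-southeast-2
```

After editing /etc/fstab, running `sudo mount -a` tries every entry immediately, so you can catch a mistake without rebooting.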
Modifying EC2 Roles
This section applies only to a RONIN administrator who has administrative access to the AWS account in which RONIN resides. Sometimes it may be desirable to give blanket access to a set of object stores, even S3 buckets managed outside of RONIN, to machines created within RONIN. This can be managed by modifying the Identity and Access Management (IAM) profile role that is attached to EC2 instances.
In RONIN, machines are all created with the IAM role ronin-clone, which has policies that allow AWS Systems Manager and Amazon CloudWatch to work transparently. If you have a need, you can replace this role or modify it to add permissions. However, if you do this you need to be sure that you do not accidentally expose data or remove RONIN functionality by deleting policies that have been put in place. This is black diamond territory!
AWS has a great description of how you can grant EC2 access to S3 buckets using a role. Once you have set things up with appropriate permissions, you can modify your mount commands to use the option iam_role to specify a role, or simply set iam_role=auto. You must do this because the old keys will not reflect your new role. For example, our mount command above would look like this:
sudo s3fs roninstorage.store.ronin.cloud /roninstorage -o dbglevel=info -f -o curldbg -o allow_other -o iam_role=auto -o use_path_request_style -o url=https://s3.ap-southeast-2.amazonaws.com -o endpoint=ap-southeast-2
Note that modifying the role will also allow users on the EC2 instances to access the S3 bucket using the CLI or SDK; you are simply granting programmatic access.
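Since the role grants ordinary programmatic access, one way to confirm it works before involving s3fs is to list the bucket with the AWS CLI from the instance. The sketch below assumes the AWS CLI is installed and that no user keys are configured, so the CLI falls back to the instance role. The function name `check_bucket_access` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current credentials can list a bucket.
# Argument: bucket (object store) name.
check_bucket_access() {
    bucket=$1
    if aws s3 ls "s3://$bucket" >/dev/null 2>&1; then
        echo "role can list $bucket"
    else
        echo "role cannot list $bucket"
        return 1
    fi
}

# Example:
#   check_bucket_access roninstorage.store.ronin.cloud
```

If this check fails, the mount with iam_role=auto is unlikely to work either, so the role's policy is the first thing to revisit.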
The Fine Print
One of the biggest differences between object storage and Linux-friendly file systems is that the latter support POSIX file operations, which maintain some state (such as your position in a file). Object storage is stateless to better accommodate the scale of the cloud. Thus, depending on the client that you use, certain operations may not be supported, leading to unpredictable errors if you attempt to treat a FUSE mount just like a regular mounted drive volume. Other operations may be incredibly slow, given what is involved in approximating them. For example, seeking and writing bits of objects will be tricky. In general, FUSE-mounted object storage will not have the performance of a drive volume, nor will it match the performance of using the AWS CLI to access S3. Ask your RONIN administrator if you are wondering if FUSE is right for you, because there are very few cases where it is. And to make the peripheral drift illusion in the header image stop, stare at the RONIN logo.
Enjoy With Caution
Now you should have a good sense of how to make an object store look like a file system, either within RONIN as a user or project administrator, or as a RONIN/AWS administrator using the AWS console. You should also have a good sense of why this might not be a good idea! It's hard to make something into something it's really not; use these approaches when you need to simplify access to object storage for your users and FUSE limitations are not an issue.