Pipeline Alerts with Slack Webhooks
Keeping an eye on any analyses that are running in the cloud and catching any errors that may result in idle computing resources is crucial in ensuring none of your valuable cloud credits are going to waste.
A previous blog post showed how you can easily monitor your computing resources in RONIN, but what if there was a way you could also be alerted whenever a pipeline errors or completes? Well, with a nifty tool like Slack's Incoming Webhooks, you can!
Most of us will already be familiar with the team communication platform, Slack, but if not, don't worry as you don't need to be an advanced user to take advantage of the Incoming Webhooks tool. Incoming Webhooks are a simple way to post messages to Slack from your virtual machine in the cloud. The messages are sent using a simple command which can be incorporated into scripts or used to create a custom alerting program. Examples of both of these will be covered below, but first you will need to create a Slack account (if you haven't already got one) and then follow a few simple steps described here (https://api.slack.com/messaging/webhooks) to set up Incoming Webhooks.
If you follow steps 1-3 of the above link, you should end up with a Webhook URL that looks something like: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
This URL is specific to you and the Slack channel you created. Using the Slack application, you can configure your notification settings to be alerted whenever a message is posted to your Webhook Slack channel.
To test your URL and alert settings, run the following command in your terminal, replacing the URL with your specific Webhook URL:
curl -X POST -H 'Content-type: application/json' --data '{"text":"This is a test message"}' https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
If you have configured your Slack notification settings correctly, you should have received a notification from your Slack channel with the message "This is a test message".
Alerts when a command completes successfully or errors
You can use the command above in many different scenarios to alert you about events on your virtual machine. A great example is being alerted when a particular analysis completes successfully, or when it fails with errors you may need to check. The following code shows you how this could be achieved using a simple bash if-else statement:
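The listing itself is not reproduced in this copy of the post, so here is a minimal sketch reconstructed from the walkthrough that follows. It performs the same steps in the same order, but wraps them in a function so that the command to run is passed in as an argument; line positions therefore differ slightly from the walkthrough, and the function returns the exit code rather than calling exit. The function names notify and run_with_alert, the two message texts and the exact shutdown invocation are assumptions, not taken from the original script.

```shell
#!/bin/bash
# Webhook URL (placeholder; replace with your own)
URL="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"

# Post a plain-text message to the Slack channel behind $URL.
# Assumes the text contains no double quotes or backslashes.
notify() {
    curl -s -X POST -H 'Content-type: application/json' \
        --data '{"text":"'"$1"'"}' "$URL"
}

# Run the command given as arguments, then alert and schedule a shutdown.
run_with_alert() {
    "$@"            # your analysis command or script
    STATUS=$?       # exit code of that command
    if [ "$STATUS" -eq 0 ]; then
        notify "Analysis complete"
        shutdown -h +15     # optional; requires root
    else
        notify "There was an error during the analysis"
        shutdown -h +15     # optional; requires root
    fi
    return "$STATUS"
}
```

To use it, you would add a final line such as run_with_alert ./my_analysis.sh (a hypothetical script name) and run the whole script as root if you keep the shutdown commands.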
For those of you who may not be familiar with bash scripting, let's break down what the above script is doing:
- The first line tells the terminal what language the script is written in. In this case we are using bash, the standard shell language used by the terminal.
- The next line assigns our Webhook URL to a variable called "URL" so we don't need to type the whole URL each time we want to use it.
- The third line is where you would run your desired command or script.
- The following line then saves the exit code of that command or script to a variable called "STATUS".
- The if statement then checks whether the exit code was 0 (success). If so, it sends an alert to Slack saying that the analysis is complete, tells the machine to shut down in 15 minutes and exits from the script. If the exit code was not 0 (failure), it sends an alert to Slack saying that there was an error during the analysis, then likewise tells the machine to shut down in 15 minutes and exits from the script.
Note: The shutdown commands are optional, but they ensure that you are never paying for a machine when it isn't running any analyses. Setting the shutdown to occur after 15 minutes also gives you time to log back in and cancel it (shutdown -c) if you'd prefer to keep working or to check any error messages right away. Shutdown commands need to be run by the root user, so run the above script as root if you wish to include them.
Alerts when a process is no longer running
You can see from the above example that Slack Webhooks are really useful for letting you know how your analysis or pipeline is progressing and for giving you an idea of when you may need to check the machine. You can also set up simple scripts that monitor certain processes on your machine and send you a custom message once a process is no longer running. For example, if you save the script below as slack.sh and make it executable (chmod +x slack.sh), you can then specify a particular process ID and message as command-line arguments (e.g. ./slack.sh 518474 "Analysis complete") and it will alert you with that message via Slack once that process is no longer running:
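#!/bin/bash
# This script posts a message to Slack once the process with the given PID is no longer running
# Usage: ./slack.sh <pid> "<message>"
pid=$1
message=$2
# kill -0 sends no signal; it only checks that the process still exists (and that we may signal it)
while kill -0 "$pid" 2>/dev/null; do sleep 1; done
curl -X POST -H 'Content-type: application/json' --data '{"text":"'"${message}"'"}' https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX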
Hint: Use the ps -ef command to get a list of all running processes on your machine and retrieve the process ID (PID) for your process of interest.
Note: Remember to run the above script in the background if you are monitoring a long-running process so that it doesn't terminate when you close your terminal - see this related blog post for more information.
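One thing to watch with the script above is that the message is pasted straight into the JSON payload, so a message containing a double quote or a backslash would produce invalid JSON and the post would likely be rejected. A minimal sketch of a helper that escapes those two characters first is shown here; the name slack_payload is hypothetical, and it assumes single-line messages.

```shell
#!/bin/bash
# Build a Slack JSON payload from a single-line message.
# Illustrative helper only; not part of the original script.
slack_payload() {
    local escaped
    # Escape backslashes first, then double quotes, so the text is valid inside a JSON string
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text":"%s"}\n' "$escaped"
}
```

In slack.sh, the curl call could then pass --data "$(slack_payload "$message")" in place of the hand-built JSON string.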
Alerts when a machine has been idle for a given amount of time
Finally, you may want to be alerted about any machines that have been idle for a significant amount of time and should be turned off. This can easily be accomplished by launching a cron job with a script that monitors idle CPU percentage on the machine as follows:
1. Install the sysstat package:
sudo apt install sysstat
2. Save the following script somewhere on your machine as idle-alert.sh, making sure to replace the template Slack URL with your own Slack Webhook URL:
#!/bin/bash
IDLECPU=$(sar 900 95 | grep "Average" | sed 's/^.* //')
IDLECPU=$( printf "%.0f" $IDLECPU )
SLACKURL="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
MACHINE=$(hostname)
MESSAGE="$MACHINE has been idle for the past 24 hours"
if [ "$IDLECPU" -gt 95 ]
then
curl -X POST -H 'Content-type: application/json' --data '{"text":"'"${MESSAGE}"'"}' ${SLACKURL}
fi
In the script above, the sar command reports CPU usage every 15 minutes (900 seconds), 95 times in a row (23.75 hours in total, so just under 24 hours), and then calculates the average idle CPU percentage across that period. If the average idle CPU percentage is greater than 95%, the script sends a message containing the machine's hostname to your Slack channel saying that the machine has been idle for the past 24 hours.
Note: You can change the timing as necessary, but make sure you adjust how often cron runs the script in step 4 below to match. Shutdown commands may also be added to the if statement to automatically turn off the machine after a certain period of inactivity. Also feel free to set the MACHINE variable to something more meaningful, such as the machine name in RONIN.
3. Make the script executable by running sudo chmod +x idle-alert.sh
4. Add the script as a cron job that runs once per day at 8am by running sudo crontab -e and then adding the following line (with the correct script path) to the bottom of the crontab file: 0 8 * * * /path/to/idle-alert.sh
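If you do change the timing, the link between the sar arguments and the cron schedule can be made explicit. The sketch below is one way to do that, assuming the default sar CPU report in which %idle is the last column of the "Average" line; the helper names sample_count and average_idle are hypothetical and do not appear in idle-alert.sh.

```shell
#!/bin/bash
# Illustrative helpers for a parameterised idle check.

# sample_count INTERVAL_SECONDS WINDOW_HOURS
# Number of sar samples that fits just inside the window. It stops one
# interval short so that a run finishes before cron starts the next one.
sample_count() {
    echo $(( $2 * 3600 / $1 - 1 ))
}

# Read sar output on stdin and print the rounded average %idle,
# taken from the last column of the "Average" line.
average_idle() {
    awk '/^Average/ { printf "%.0f\n", $NF }'
}
```

With a 900 second interval and a 24 hour window, sample_count 900 24 gives the 95 used above, and the IDLECPU line could become IDLECPU=$(sar 900 "$(sample_count 900 24)" | average_idle).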
There are obviously many more neat ways you could use this simple but effective alerting system, but hopefully you now have the basics you need to keep an eye on all of your analyses in the cloud, without literally having to keep an eye on them!
Oh, and just remember to silence your Slack notifications at night so you won't be disturbed when your analysis decides to error or complete at 1am.