AWS Tutorial 20 - Adding Machine and Process Monitoring To Your AWS Instances with Inspeqtor
Disclaimer: Inspeqtor is an excellent piece of open source software; any errors are mine and mine alone. This was fully tested by deploying onto a clean EC2 instance and verifying that it functioned correctly end to end.
One difference between cloud computing and traditional hosting is that in the cloud you tend to work with computing resources that, in general, are:
- less powerful
- less reliable
- more limited in storage
There also tend to be more of these resources. One way to put it: traditional data centers are molecular, whereas cloud computing is more atomic.
In my own experience, I ran a data center for 5 years without having to worry about process monitoring or tools like Monit or Inspeqtor. But the very first time I put my AWS data center under heavy load, I found things crashing left, right and center. Those problems were solved simply by adding Monit as a watchdog for the sidekiq process, restarting it when it grows too large. While this solved my sidekiq problem, two nights ago I ran out of disk space on a key resource: my MariaDB instance.
One approach would be to continue to use Monit and add rules for disk space monitoring, but I've been intrigued by the simple configuration that Mike Perham's Inspeqtor offers. Sidekiq has served me well of late, and the support Mike offers, even the free community support, is fantastic. So rather than double down on Monit, I'm going to branch out and use Inspeqtor.
Goals
We want Inspeqtor to:
- be configured on all boxes by Ansible so we never have to do anything manually
- function on Ubuntu 14.04 with upstart
- deliver alerts by email (sendmail) that contain the problem and the instance id
- monitor sidekiq
- monitor apache
- monitor disk space
- monitor ram
- monitor load
Inspeqtor vs Monit
Monit and Inspeqtor are two very different animals. Whereas Monit is a general-purpose monitoring tool, Inspeqtor is specialized, focusing on processes that are run through init.d / upstart as well as the overall state of the machine itself. So while you can technically do more with Monit, you'll have a much easier time doing what you generally need with Inspeqtor.
Configuring Sendmail
Inspeqtor can work with a number of different email delivery approaches, from Gmail to a local sendmail instance. The Inspeqtor configuration for the different email delivery options looks like this:
#
# This is the default alert statement. This tells Inspeqtor where to
# send alert emails.
#
# Here we'll configure the default to send email alerts via gmail to "dev@example.com"
#
# send alerts via gmail
# with username mike, password fuzzbucket, to_email dev@example.com
#
# Here's a generic email example, not requiring Google Mail.
# Your SMTP server must accept Authentication/TLS.
#
# send alerts via email with
# username bob,
# password "foo bar baz",
# smtp_server smtp.example.com,
# tls_port 587,
# to_email analytics@example.com,
# from_email inspeqtor@example.com
#
# Here is another generic email example, not requiring authentication.
# Your local SMTP server must be listening on port 25.
#
send alerts via email with
smtp_server localhost,
to_email fuzzygroup@gmail.com,
from_email inspeqtor@
I've commented out the other approaches, leaving only the local smtp_server (in my case sendmail) active.
Here's how to set up an Ansible role to configure sendmail:
mkdir -p ansible_root/roles/sendmail/tasks
touch ansible_root/roles/sendmail/tasks/main.yml
Edit the file main.yml and add these lines:
- name: install sendmail
  apt: name=sendmail state=present
In your playbook, call this role as follows:
- { role: sendmail, tags: sendmail}
Here's how to verify if your local sendmail instance is actually running:
echo "ficrawler1 My test email being sent from sendmail" | /usr/sbin/sendmail fuzzygroup@gmail.com
Check your inbox for the message. You may need to check a spam or junk folder, since this isn't a modern mail server using the SPF / DKIM standards. If the message didn't arrive, you'll need to troubleshoot why.
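The Inspeqtor configuration above assumes a local SMTP server listening on port 25. Before relying on it, a quick check like the following sketch can confirm that something is accepting connections there. The function name `smtp_port_open` is my own invention, and the sketch assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command. It only shows that the port is open, not that mail will actually be delivered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
smtp_port_open() {
  local host="${1:-localhost}" port="${2:-25}"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if smtp_port_open localhost 25; then
  echo "SMTP port 25 is open"
else
  echo "nothing is listening on port 25"
fi
```

If this reports that nothing is listening, Inspeqtor's alerts via `smtp_server localhost` are unlikely to go anywhere, so it is worth fixing sendmail first.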
Configuring Inspeqtor
Inspeqtor relies on several files that determine how it works:
- /etc/inspeqtor/inspeqtor.conf – how the overall inspeqtor instance runs and how it sends notifications
- /etc/inspeqtor/host.inq – what to monitor about the host itself
- /etc/inspeqtor/services.d/WHATEVER_YOU_WANT_TO_MONITOR.inq – what to monitor about a given service, one file per service
Examples of each of these are given below.
Here is /etc/inspeqtor/inspeqtor.conf
#
# Welcome to the global Inspeqtor config file!
#
#
# The cycle time is how often Inspeqtor will capture metrics and
# verify rules, in seconds.
#
set cycle_time 15
#
# The deploy length is the maximum length of your application deploys, in
# seconds. If you start a deploy and then never signal its finish, Inspeqtor will
# time out the deploy after this many seconds and start checking rules again.
#
# This is a failsafe. Normally you will signal Inspeqtor when your
# deploys finish.
#
set deploy_length 300
#
# Set logging level, legal values are:
# warn
# info (default)
# debug (-l debug)
# verbose (-l verbose)
# At info, inspeqtor will not log anything when everything is ok.
#
set log_level info
# Inspeqtor Pro can send collected metrics to Statsd
# set statsd_location localhost:8125
#
# This is the default alert statement. This tells Inspeqtor where to
# send alert emails.
#
# Here we'll configure the default to send email alerts via gmail to "dev@example.com"
#
# send alerts via gmail
# with username mike, password fuzzbucket, to_email dev@example.com
#
# Here's a generic email example, not requiring Google Mail.
# Your SMTP server must accept Authentication/TLS.
#
# send alerts via email with
# username bob,
# password "foo bar baz",
# smtp_server smtp.example.com,
# tls_port 587,
# to_email analytics@example.com,
# from_email inspeqtor@example.com
#
# Here is another generic email example, not requiring authentication.
# Your local SMTP server must be listening on port 25.
#
send alerts via email with
smtp_server localhost,
to_email fuzzygroup@gmail.com,
from_email inspeqtor@ip-172-31-38-2
Here is /etc/inspeqtor/host.inq
check host
if load:1 > 1 for 2 cycles then alert
if load:5 > 1 then alert
if cpu:user > 95% for 2 cycles then alert
if swap > 20% for 2 cycles then alert
if disk:/ > 90% then alert
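To get a feel for where a box stands relative to the `disk:/ > 90%` rule before Inspeqtor is even installed, something like the sketch below can help. The helper name `disk_used_percent` is hypothetical, and it reads the figure from `df`, which may not match exactly how Inspeqtor computes disk usage internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used percentage of the filesystem holding $1.
# `df -P` produces POSIX-format output: a header line, then one line per
# filesystem with the capacity (e.g. "42%") in column 5.
disk_used_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used="$(disk_used_percent /)"
if [ "$used" -gt 90 ]; then
  echo "disk:/ is at ${used}% (would trip the 90% rule)"
else
  echo "disk:/ is at ${used}%"
fi
```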
Here is /etc/inspeqtor/services.d/service.inq.template
This is a generic starting-point template for monitoring any service managed by your init system (see the notes inside the file for traditional /etc/init.d services):
cat /etc/inspeqtor/services.d/service.inq.template
# NOTE this file should be renamed to <name>.inq where name is explained below.
#
# Inspeqtor is designed to monitor a host and the services running
# on that host. Services must be controlled by your OS's init system:
# upstart, systemd, launchd or runit.
#
# Inspeqtor knows how to monitor services for each major init system,
# as long as you give the exact name of that service.
#
# In systemd:
# /usr/lib/systemd/system/<name>.service
# In upstart:
# /etc/init/<name>.conf
# In runit:
# /etc/service/<name>/run
# In launchd:
# ~/Library/LaunchAgents/<name>.plist
#
# Supporting traditional init.d is a little trickier, see the
# https://github.com/mperham/inspeqtor/wiki/Initd wiki page
# for more details. tl;dr You need to populate a PID file at
# /var/run/<name>.pid or /var/run/<name>/<name>.pid
#
#
# Here we define the service to monitor. The name of the service
# ('mysql') must match the name that your init system uses.
# You'll want to rename this file to mysql.inq to match.
#
check service mysql
#
# if you want to monitor daemon-specific metrics, you'll need
# to tell Inspeqtor how to connect to the daemon.
# See https://github.com/mperham/inspeqtor/wiki/Daemon-Specific-Metrics
#
#with username root, socket /var/run/mysqld/mysqld.sock
#
# Add any normal process metrics you want to verify.
#
if memory:rss > 2g then alert
#
# Since a cycle defaults to 15 seconds, this rule triggers if
# there's excessive CPU usage for more than 30 seconds.
#
if cpu:user > 90% for 2 cycles then alert
#
# Alert if we see too many queries or slow queries. These are
# examples of Daemon-Specific Metrics.
#
#if mysql:Queries > 100/sec for 2 cycles then alert
#if mysql:Slow_queries > 1/sec for 2 cycles then alert
For more on writing your own inq files, see the wiki.
Here is my sample sidekiq.inq file
check service sidekiq
if memory:rss > 6g then alert, restart
if cpu:user > 95% for 2 cycles then alert
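The goals also list Apache, but its rules file isn't shown here. By analogy with the template and the sidekiq file, a minimal sketch might look like the following. The service name `apache2` and both thresholds are my assumptions, not values from a tested setup. On Ubuntu 14.04 Apache is typically started from init.d, so the template's note about PID files applies. The snippet writes the file to the current directory for review rather than straight into /etc/inspeqtor/services.d.

```shell
#!/usr/bin/env bash
# Write a sketch of apache.inq into the current directory for review.
# Assumptions: the service is named "apache2"; the thresholds are placeholders.
cat > apache.inq <<'EOF'
check service apache2
  if memory:rss > 1g then alert
  if cpu:user > 90% for 2 cycles then alert
EOF

cat apache.inq
```

Once the thresholds suit your workload, copy the file into /etc/inspeqtor/services.d/ and restart Inspeqtor.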
Configuring Inspeqtor with Ansible
Rather than reproduce the playbook, roles and template files here, I've hosted them on GitHub. Clone the repository from there and adapt it for your needs. But, in case you're curious, here is the overall structure:
tree
.
├── ansible.cfg
├── group_vars
│   └── all
├── inventories
│   └── ficrawler11
├── playbook_inspeqtor.yml
├── playbooks
├── readme.md
└── roles
    ├── inspeqtor
    │   ├── files
    │   │   ├── apache.inq
    │   │   ├── host.inq
    │   │   ├── inspeqtor.conf
    │   │   └── sidekiq.inq
    │   └── tasks
    │       └── main.yml
    ├── sendmail
    │   └── tasks
    │       └── main.yml
    └── setup
        └── tasks
            └── main.yml
The setup task exists to register an Ansible variable holding the instance id so it can be used in alerting. This is handled by calling the instance id API, which I covered previously. While there is an instance_ids method in the Ansible EC2 module, this approach means you don't need your security keys, as it's a private API you only call from inside the instance itself.
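For reference, the instance id lookup can be sketched in plain shell as below. This is not the setup role itself, just an illustration of the call it relies on. The function name `fetch_instance_id` and the `METADATA_URL` override are hypothetical additions of mine; the default address is the standard link-local metadata endpoint. Instances that require session tokens for metadata requests would need an extra step that this sketch does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the EC2 instance id, or "unknown" if the metadata
# endpoint cannot be reached (for example when run outside EC2).
# METADATA_URL is an override added for testing; the default is the standard
# link-local metadata address.
fetch_instance_id() {
  local url="${METADATA_URL:-http://169.254.169.254/latest/meta-data/instance-id}"
  local id
  id="$(curl --silent --fail --max-time 2 "$url" 2>/dev/null)" || id=""
  echo "${id:-unknown}"
}

echo "instance id: $(fetch_instance_id)"
```

Because the endpoint is only reachable from inside the instance, no AWS credentials are involved, which is the point made above.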
Managing Inspeqtor on a Daily Basis
With almost any Unix tool you need to know how to do at least two things:
- start / stop
- view logs
Start / Stop on Ubuntu is handled with:
sudo service inspeqtor restart
Logs can be viewed with:
sudo tail -f /var/log/upstart/inspeqtor.log
More Info
More info on Inspeqtor can be found on the wiki.