IP Monitoring & Diagnostics With Command Line Tools: Part 11- Building & Deploying Your Own Tools

Design your monitoring system to be self-installing and maintain a single set of source files that is cloned to your target systems. Implement self-configuring logic to avoid manual reconfiguration when new systems are added.


A well-designed predictive monitoring system will reduce your maintenance overhead. Deployment to multiple machines should be automated and dependable. Portable, self-configuring logic from a single original source is easy to implement if you let the file system and shell variables work for you.

Designing The System

Enumerate the target systems. Note the details of their operating system, its version and the IP address. Check that the necessary user accounts are set up ready for monitoring. Decide which monitoring probes will be deployed in each system.

Nominate one machine as the central-cortex that orchestrates everything. The source code for the monitoring kit is maintained here and deployed to the satellite systems. If the machine capacity permits, this can also host the database and a web server for displaying the results and implementing a dashboard control surface.

Zero Configuration

Use a dot-run shell script, sourced into the run-time scripts, to manage these key zero-configuration goals:

  • Atomic installation. Copy a top-level directory to the target system with everything it needs within it.
  • Install path. At run-time, the software should self-discover where it has been deployed and configure itself accordingly.
  • Indirect path references. The file system is accessed with computed paths in variables instead of hard coded literal paths.
  • Machine specific configurations. Use the host name to select the machine specific configuration.

The File System Map

Design your file-system so that similar kinds of components and assets are collected together. Each directory creates a separate namespace. Always separate data from code and be mindful of live data and cache directories so they are not overwritten by the deployment mechanism.

[base_directory]
  !
  +- [toolkit]
  !    !
  !    +- env_build.sh
  !    !
  !    +- probe_runner.sh
  !    !
  !    +- {... other_scripts_and_tools ...}
  !
  +- [probes_container]
  !    !
  !    +- [every_minute]
  !    !    !
  !    !    +- { ... task_scripts ...}
  !    !
  !    +- [top_of_the_hour]
  !    !
  !    +- [daily_at_midnight]
  !    !
  !    +- [weekly_early_monday_am]
  !    !
  !    +- [monthly_first_morning]
  !    !
  !    +- [quarterly]
  !    !
  !    +- [annually]
  !    !
  !    +- [... other_tasks ...]
  !
  +- [configuration_data]
  !
  +- [live_data]
  !
  +- [cache_containers]
  !
  +- [log_files]

The toolkit directory contains the high-level scripts. The environment builder (env_build.sh) derives the paths to everything else and is sourced into every other script. The probe runner is called by cron and runs the individual monitoring probes.

Reference information and lookup tables live in the configuration data directory. Measurement probes can access this as needed.

The live data, cache and log file containers store dynamic working reference information. The deployment logic must be blocked from overwriting them.

A Small Caveat Regarding $0

The path to the current script can be determined from the $0 (dollar zero) positional argument:

MY_PATH=$(dirname "$0")

Inside a sourced (dot-run) script, the value of $0 is not updated and it still contains the path to the calling script. Use the BASH_SOURCE array variable instead. Bash keeps element zero of this array pointing at the file that is currently being read, so inside a sourced script it yields the path to that script. The braces are required; without them the shell expands only $BASH_SOURCE and appends a literal [0] to the result:

MY_PATH=$(dirname "${BASH_SOURCE[0]}")

Both variations are functionally identical in a normal script, but only the ${BASH_SOURCE[0]} method works inside a sourced script.
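The difference is easy to confirm with a throwaway experiment. The following is a minimal sketch, assuming bash and mktemp are available; the caller.sh and library.sh file names are hypothetical and exist only for this demonstration. Note that the array element is written with braces, which bash needs in order to expand element zero.

```shell
# Build a throwaway caller/library pair in a temporary directory.
DEMO_DIR=$(mktemp -d)
mkdir -p "${DEMO_DIR}/lib"

# The library reports both values when it is sourced.
cat > "${DEMO_DIR}/lib/library.sh" <<'EOF'
echo "dollar_zero=$(basename "$0")"
echo "bash_source=$(basename "${BASH_SOURCE[0]}")"
EOF

# The caller dot-runs the library from its own location.
cat > "${DEMO_DIR}/caller.sh" <<'EOF'
. "$(dirname "${BASH_SOURCE[0]}")/lib/library.sh"
EOF

# Run the caller and show what the sourced library reported.
DEMO_OUTPUT=$(bash "${DEMO_DIR}/caller.sh")
echo "${DEMO_OUTPUT}"

rm -rf "${DEMO_DIR}"
```

The first line of output names caller.sh and the second names library.sh, which shows that only BASH_SOURCE follows the sourced file.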

Finding & Running The Environment Builder

The env_build.sh script file is sourced (dot-run) into all of the other scripts when they are called to action. This is more convenient than visiting every host to manually edit environment variables into the profiles on every user account.

Find the environment builder from inside the command line scripts by using relative directory paths. The scripts that are called to action may have different levels of nesting but relative addressing within directory paths solves that. They only need to find the environment builder and source its content. Here are the basic relative pathing rules:

Relative path   Meaning
/               The top-level root directory. Do not put your scripts here.
./              The current working directory. Use the pwd command to see what that is.
../             Go up a directory level to the parent directory. Equivalent to a dirname command.
../../          Go up two levels.
~/              Your user account home directory.
{nothing}       Search the list of directories described by the $PATH variable and use the first matching item.

Scripts that live in the same toolkit directory can use this command to source the environment builder:

. "$(dirname "${BASH_SOURCE[0]}")/env_build.sh"

The monitoring probes are collected into separate directories inside the probes_container so they need a relative path to find the environment builder. Go up two levels and then down into the toolkit directory:

. "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"

Working Out The Base Path

Derive the base-path at the root of the measuring kit installation from the path of the sourced env_build.sh script, which is available in ${BASH_SOURCE[0]}. The environment builder always sits in the same place within the kit, so this path is predictable and eliminates the variability in the paths of the command line scripts that invoke it.

The relative pathing that was needed to find the environment builder can confuse the dirname command. Derive a fully qualified path first with the realpath command.

You will need to install the realpath tool separately if it is not already available.

Obtain the base-path with two nested dirname command substitutions after eliminating the relative paths in ${BASH_SOURCE[0]}. The first dirname yields the toolkit directory and the second yields its parent:

MY_BASE_PATH=$(dirname "$(dirname "$(realpath "${BASH_SOURCE[0]}")")")
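If realpath cannot be installed on a target system, a similar result may be obtainable with the cd and pwd built-ins. This is a sketch of an alternative technique, not the method used above; the resolve_dir function name is hypothetical. It resolves the directory that contains the file but does not follow a symbolic link on the file itself.

```shell
# Hypothetical helper: print the absolute, symlink-free directory that
# contains the file named in $1, without relying on realpath.
# The subshell keeps the cd from changing the caller's working directory.
resolve_dir() {
  ( cd "$(dirname "$1")" && pwd -P )
}

# Inside env_build.sh this could stand in for the realpath-based line.
# One dirname is enough because resolve_dir already returns the
# toolkit directory:
#   MY_BASE_PATH=$(dirname "$(resolve_dir "${BASH_SOURCE[0]}")")
```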

Build The Shared Environment

Assemble the indirect references to the component paths by appending them to the base-path. Use these instead of hard coded literal paths when reading or writing files:

PATH_TOOLKIT="${MY_BASE_PATH}/toolkit"
PATH_TASKS="${MY_BASE_PATH}/probes_container"
PATH_CONFIGS="${MY_BASE_PATH}/configuration_data"
PATH_LIVE_DATA="${MY_BASE_PATH}/live_data"
PATH_CACHES="${MY_BASE_PATH}/cache_containers"
PATH_LOGS="${MY_BASE_PATH}/log_files"

Define some additional useful static values to avoid repeating them inside the measurement probes.

TIMESTAMP=$(date +%Y-%m-%dT%H:%M:%S%z)

HOSTNAME=$(hostname -s)

Build the switch-case structure outlined previously to implement host specific configurations. Extend this as needed to cover all your machines.

case "${HOSTNAME}" in

  NASW)
    source "${PATH_CONFIGS}/config_NASW.sh"
    ;;

  *)
    source "${PATH_CONFIGS}/config_catch_all.sh"
    ;;
esac
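When the configuration files follow the config_{hostname}.sh naming pattern shown above, the file name could be computed from the host name rather than listed case by case. The following is a minimal sketch under that assumption; the select_host_config function is hypothetical.

```shell
# Hypothetical helper: print the config file to source for a given host.
# $1 = configuration directory, $2 = short host name.
# Falls back to the catch-all file when no host specific file is readable.
select_host_config() {
  if [ -r "$1/config_$2.sh" ]; then
    echo "$1/config_$2.sh"
  else
    echo "$1/config_catch_all.sh"
  fi
}

# Usage inside env_build.sh (assumes PATH_CONFIGS and HOSTNAME are set):
#   . "$(select_host_config "${PATH_CONFIGS}" "${HOSTNAME}")"
```

With this approach, adding a machine means dropping a new config file into the configuration directory without editing the environment builder. The explicit case statement remains the better choice when several hosts share one configuration.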

Everything is now operating relative to the install location (regardless of where that is).  None of the scripts ever need to know or care where they are installed. This is real-world, zero-conf in action!

The Cron Scheduled Probe Runner

A probe runner script that scans the probe container directories for items to run means the crontab does not need editing each time a new measuring probe is introduced. Pass the chosen interval as a parameter to avoid implementing multiple probe runners with identical code. Use the fully qualified path to the probe runner (represented by {---}) so that cron can find it:

* * * * * {---}/toolkit/probe_runner.sh "every_minute"
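The other interval containers need one crontab line each. As a sketch, the lines could be generated with the install path filled in and then pasted into the crontab. The print_crontab_entries function and the /opt/monitoring_kit path are hypothetical, and the times chosen for the weekly and monthly runs are assumptions to adjust to taste.

```shell
# Hypothetical helper: print one crontab line per interval container.
# $1 = fully qualified path to the installed kit (the {---} placeholder).
# The weekly (02:30 Monday) and monthly (06:00 on the 1st) times are
# assumptions, not values prescribed by the article.
print_crontab_entries() {
  cat <<EOF
* * * * * $1/toolkit/probe_runner.sh "every_minute"
0 * * * * $1/toolkit/probe_runner.sh "top_of_the_hour"
0 0 * * * $1/toolkit/probe_runner.sh "daily_at_midnight"
30 2 * * 1 $1/toolkit/probe_runner.sh "weekly_early_monday_am"
0 6 1 * * $1/toolkit/probe_runner.sh "monthly_first_morning"
EOF
}

# Example install location; substitute your own.
print_crontab_entries "/opt/monitoring_kit"
```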

The probe_runner.sh script scans the probe container directory selected by positional argument $1 for measurement probes matching a specific filename pattern. It passes the list of probes to a while loop to run them one at a time.

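# Load the shared environment (paths, timestamp, host name)
. "$(dirname "${BASH_SOURCE[0]}")/env_build.sh"

# The interval container to scan, e.g. every_minute
MY_PROBE_INTERVAL=$1

ls "${PATH_TASKS}/${MY_PROBE_INTERVAL}"/probe_*.sh |
while read -r MY_PROBE
do
   # The trailing backslash joins the redirection to the echo command
   echo "${TIMESTAMP} Running $(basename "${MY_PROBE}")" \
           >> "${PATH_LOGS}/probe_run.log"

   "${MY_PROBE}"
done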

The -r flag on the read command prevents any backslashes (\) in the input from being treated as escape characters, so each path is read exactly as listed. The basename command shortens the probe path name for logging to just the script name and discards the path component.

Don't forget to use the chmod command to set the execute flag on the probe runner script and the measurement probes. Also be aware of user accounts, ownership and read/write access controls.
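Setting the flags by hand on every file is tedious, so the step could be scripted. The following is a minimal sketch that assumes the directory names from the file system map above; the make_kit_executable function is hypothetical. It grants execute permission to the owning user only, which may need widening if cron runs the probes under a different account.

```shell
# Hypothetical helper: set the execute flag on the toolkit scripts and
# on every probe in the kit. $1 = base directory of the installed kit.
make_kit_executable() {
  # Scripts that live directly in the toolkit directory
  chmod u+x "$1"/toolkit/*.sh
  # Probes in every interval container, whatever the nesting
  find "$1/probes_container" -type f -name 'probe_*.sh' -exec chmod u+x {} +
}

# Usage: make_kit_executable "/path/to/base_directory"
```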

Symbolic Names For Measurements

Define a unique symbolic-name for each measurement. Use an easy to remember naming scheme and document them in your maintenance guide.

When recording a measurement, combine this symbol with the hostname to describe a specific monitoring probe identity. Timestamps support trend analysis when the measurements are aggregated.

Every script, cache file, component, database record and log file can then be consistently associated with that symbolic name. This helps the central-cortex manage and aggregate the results for analysis.

Example Monitoring Probes

Here are two example monitoring probes. Their results will go into their own symbolically named cache files. They could be stored in a database table just as easily.

By doing a little extra work to create a monitoring environment and probe runner, the individual probe scripts are very simple and economically coded.

Because the monitoring probes are lightweight, running them every minute should impose almost no load on the CPU. If you want to run them less often, simply move the probe script to another scheduled interval container directory. There is no need to change the crontab.

Process counter

Save this probe script as probe_PROCESS_COUNT.sh in the every_minute directory:

SYMBOLIC_NAME="PROCESS_COUNT"

source "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"

MY_COUNT=$(expr $(ps -eo pid= | wc -l) - 1)

echo "${TIMESTAMP} ${HOSTNAME} ${SYMBOLIC_NAME} ${MY_COUNT}" \
             >> "${PATH_CACHES}/${SYMBOLIC_NAME}.dat"

The count value uses nested command substitutions. Inside the deepest level, the ps command lists only the PID values and the wc command counts the resulting lines.

If 100% accuracy is important then we must eliminate the heading line and the ps command from the list of processes or the count will be 'off by two'. The wc command does not figure in the PID list because it has not yet been called to action. That may not be true in all operating systems though.

The trailing equals sign (=) on the pid flag will suppress the headings. Reduce the count by one more with an expr command substitution. Use expr for compatibility with older versions of bash.

Each measurement is tagged with a timestamp, hostname and symbolic name for aggregation in the central-cortex. Collisions and data loss are avoided because each hostname + symbol combination is unique.
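To illustrate why the tagging matters, here is a sketch of how the central-cortex might pick out the most recent reading for each host once the cache files from several machines have been gathered into one file. The latest_per_host function is hypothetical, and the sketch assumes the four-field record layout written by the probe above with records appended in chronological order.

```shell
# Hypothetical helper: print the most recent record for each host found
# in the cache file named by $1.
# Assumes fields: timestamp, hostname, symbolic name, value, with later
# records appearing further down the file.
latest_per_host() {
  # Each new record for a host overwrites the one remembered so far
  awk '{ latest[$2] = $0 } END { for (host in latest) print latest[host] }' "$1" |
    sort -k 2,2
}

# Usage: latest_per_host "${PATH_CACHES}/PROCESS_COUNT.dat"
```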

Disk space usage observer

Here is another example probe to check disk space usage. This one is called probe_DISK_SPACE.sh:

SYMBOLIC_NAME="DISK_SPACE"

source "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"

TARGET_VOLUME="volume1"

MY_PERCENTAGE=$(df |
grep "${TARGET_VOLUME}" |
tr -s ' ' |
cut -d ' ' -f 5)

MY_UNIQUETAG="${HOSTNAME} ${SYMBOLIC_NAME} ${TARGET_VOLUME}"

echo "${TIMESTAMP} ${MY_UNIQUETAG} ${MY_PERCENTAGE}" \
              >> "${PATH_CACHES}/${SYMBOLIC_NAME}.dat"

The MY_UNIQUETAG variable is only there to shorten the echo line to avoid a line break.
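The grep in this pipeline returns every line that mentions the target volume, so a nested mount could produce more than one percentage. As a sketch of an alternative, the POSIX output format selected by df -P keeps each file system on one line with the capacity in the fifth column, and awk can stop at the first match. The extract_use_percent function is hypothetical and is not part of the probe as written.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the
# capacity column for the first line that mentions the target volume.
# $1 = text to look for, e.g. volume1. The header line is skipped.
extract_use_percent() {
  awk -v target="$1" 'NR > 1 && index($0, target) { print $5; exit }'
}

# Usage: MY_PERCENTAGE=$(df -P | extract_use_percent "${TARGET_VOLUME}")
```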

Don't forget to use the chmod command to set the execute flag on the probe scripts.

Deployment Mechanisms

Use continuous integration tools to deploy the measurement kit to the remote systems on a nightly basis from a Git repository.

Only check your code into the Git repository when it is fully working and error free. The continuous integration tools run integrity checks every night. If the code is clean, it is deployed across the entire network of machines. Your Ops team will be able to set this up for you.

Do not overwrite the live data, cache and log file containers. Move them outside of the measurement kit and redefine their location in the environment builder to simplify the deployment rules. Provided your probe runner and the probe scripts have permission to access those locations, their physical path is now completely transparent.
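Redefining the locations might look like the following fragment of env_build.sh. This is a sketch only: the MONITOR_DATA_ROOT variable and the /var/local/monitoring_data default are assumptions introduced for illustration and would need to match wherever your Ops team places the data.

```shell
# Hypothetical override: MONITOR_DATA_ROOT and its default location are
# assumptions, not names used elsewhere in this article.
# Use the override when it is set, otherwise fall back to the default.
MY_DATA_ROOT="${MONITOR_DATA_ROOT:-/var/local/monitoring_data}"

# The dynamic containers now live outside the deployed kit, so a fresh
# deployment of the kit directory cannot overwrite them.
PATH_LIVE_DATA="${MY_DATA_ROOT}/live_data"
PATH_CACHES="${MY_DATA_ROOT}/cache_containers"
PATH_LOGS="${MY_DATA_ROOT}/log_files"
```

Because the probes only ever refer to the PATH_ variables, none of them need to change when the containers move.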

Conclusion

The probe runner finds the measurement probes to run in each container. Relocating, adding or removing probes is now very easy: move the script into or out of a container directory and leave the crontab unaltered.

The central-cortex will gather the measurements from the cached results and aggregate them for display.
