IP Monitoring & Diagnostics With Command Line Tools: Part 11 - Building & Deploying Your Own Tools

Design your monitoring system to be self-installing and maintain just one single set of source files that are cloned to your target systems. Implement self-configuring logic to avoid manual reconfiguration when new systems are added.

Designing The System

Enumerate the target systems. Note the details of their operating system, its version and the IP address. Check that the necessary user accounts are set up ready for monitoring. Decide which monitoring probes will be deployed in each system.

Nominate one machine as the central-cortex that orchestrates everything. The source code for the monitoring kit is maintained here and deployed to the satellite systems. If the machine capacity permits, this can also host the database and a web server for displaying the results and implementing a dashboard control surface.

Zero Configuration

Use a dot-run shell script, sourced into the run-time scripts to manage these key zero configuration goals:

Atomic installation. Copy a top-level directory to the target system with everything it needs within it.
Install path. At run-time, the software should self-discover where it has been deployed and configure itself accordingly.
Indirect path references. The file system is accessed with computed paths in variables instead of hard coded literal paths.
Machine specific configurations. Use the host name to select the machine specific configuration.

The File System Map

Design your file-system so that similar kinds of components and assets are collected together. Each directory creates a separate namespace. Always separate data from code and be mindful of live data and cache directories so they are not overwritten by the deployment mechanism.

[base_directory]
!
+- [toolkit]
!  +- env_build.sh
!  +- probe_runner.sh
!  +- {... other_scripts_and_tools ...}
!
+- [probes_container]
!  +- [every_minute]
!  !  +- { ... task_scripts ...}
!  +- [top_of_the_hour]
!  +- [daily_at_midnight]
!  +- [weekly_early_monday_am]
!  +- [monthly_first_morning]
!  +- [quarterly]
!  +- [annually]
!  +- [... other_tasks ...]
!
+- [configuration_data]
+- [live_data]
+- [cache_containers]
+- [log_files]

The toolkit directory contains the high-level scripts. The environment builder (env_build.sh) derives the paths to everything else and is sourced into every other script. The probe runner is called by cron and runs the individual monitoring probes.

Reference information and lookup tables live in the configuration data directory. Measurement probes can access this as needed.

The live data, cache and log file containers store dynamic working reference information. The deployment logic must be blocked from overwriting them.
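The layout above can be created in one step on a new host. This is a minimal sketch, assuming a hypothetical BASE_DIR variable for the install location; it defaults to a temporary directory so it is safe to try.

```shell
#!/usr/bin/env bash
# Sketch: create the directory skeleton described in the file system map.
# BASE_DIR is a hypothetical name; the default is a throwaway temporary directory.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

# Top-level containers.
for DIR in toolkit probes_container configuration_data live_data cache_containers log_files
do
    mkdir -p "${BASE_DIR}/${DIR}"
done

# One probe container per scheduled interval.
for INTERVAL in every_minute top_of_the_hour daily_at_midnight \
    weekly_early_monday_am monthly_first_morning quarterly annually
do
    mkdir -p "${BASE_DIR}/probes_container/${INTERVAL}"
done

# Show the result for a visual check.
find "${BASE_DIR}" -type d | sort
```

Because mkdir -p does nothing when a directory already exists, the sketch can be re-run without harming live data.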

A Small Caveat Regarding $0

The path to the current script can be determined from the $0 (dollar zero) positional argument:

MY_PATH=$(dirname "$0")

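Inside a sourced (dot-run) script, the value of $0 is not updated; it still contains the path to the calling script. Use the BASH_SOURCE array variable instead. Bash maintains it automatically and its first element holds the path to the script currently being sourced. The braces are required when indexing an array. Without them, $BASH_SOURCE[0] expands to the first element followed by the literal text [0]:

MY_PATH=$(dirname "${BASH_SOURCE[0]}")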

Both variations are functionally identical in a normal script but only the $BASH_SOURCE method works inside a sourced script.
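The difference is easy to see by experiment. This sketch writes two throwaway scripts (the file names caller.sh and lib/sourced.sh are hypothetical) into a temporary directory, has one source the other, and prints both values from inside the sourced file.

```shell
#!/usr/bin/env bash
# Sketch: compare $0 and ${BASH_SOURCE[0]} from inside a sourced script.
# All file names are hypothetical and live in a temporary directory.
DEMO_DIR=$(mktemp -d)
mkdir -p "${DEMO_DIR}/lib"

# The script that will be sourced (dot-run).
cat > "${DEMO_DIR}/lib/sourced.sh" <<'EOF'
echo "sourced, dollar zero: $0"
echo "sourced, BASH_SOURCE: ${BASH_SOURCE[0]}"
EOF

# The calling script, which sources the file above.
cat > "${DEMO_DIR}/caller.sh" <<'EOF'
. "$(dirname "${BASH_SOURCE[0]}")/lib/sourced.sh"
EOF

DEMO_OUTPUT=$(bash "${DEMO_DIR}/caller.sh")
echo "${DEMO_OUTPUT}"
rm -rf "${DEMO_DIR}"
```

The first line of output ends in caller.sh and the second ends in lib/sourced.sh, which is why only the BASH_SOURCE form is usable inside the environment builder.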

Finding & Running The Environment Builder

The env_build.sh script file is sourced (dot-run) into all of the other scripts when they are called to action. This is more convenient than visiting every host to manually edit environment variables into the profiles on every user account.

Find the environment builder from inside the command line scripts by using relative directory paths. The scripts that are called to action may have different levels of nesting but relative addressing within directory paths solves that. They only need to find the environment builder and source its content. Here are the basic relative pathing rules:

Relative path	Meaning
/	The top-level root directory. Do not put your scripts here.
./	The current working directory. Use the pwd command to see what that is.
../	Go up a directory level to the parent directory. Equivalent to a dirname command.
../../	Go up two levels.
~/	Your user account home directory.
{nothing}	Search the list of directories described by the $PATH variable and use the first matching item.

Scripts that live in the same toolkit directory can use this command to source the environment builder:

. "$(dirname "${BASH_SOURCE[0]}")/env_build.sh"

The monitoring probes are collected into separate directories inside the probes_container so they need a relative path to find the environment builder. Go up two levels and then down into the toolkit directory:

. "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"

Working Out The Base Path

Derive the base-path at the root of the measuring kit installation from the path of the sourced env_build.sh script, which is available from the ${BASH_SOURCE[0]} variable. This path is constant and predictable, which eliminates the variability in the paths of the command line scripts that invoke it.

The relative pathing that was needed to find the environment builder can confuse the dirname command. Derive a fully qualified path first with the realpath command.

You will need to install the realpath tool separately if it is not already available.

Obtain the base-path with nested dirname command substitutions after realpath has eliminated the relative path segments in ${BASH_SOURCE[0]}:

MY_BASE_PATH=$(dirname "$(dirname "$(realpath "${BASH_SOURCE[0]}")")")
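The derivation can be checked end to end with a small experiment. This sketch builds a miniature kit in a temporary directory and sources a two-branch env_build.sh from a probe that reaches it through ../../. The cd and pwd -P fallback for hosts without realpath is an assumption added here, not part of the original kit, and probe_DEMO.sh is a hypothetical probe name.

```shell
#!/usr/bin/env bash
# Sketch: derive the base path from a sourced env_build.sh, with a fallback
# for hosts that lack realpath. The tree is built in a temporary directory.
DEMO_BASE=$(mktemp -d)
mkdir -p "${DEMO_BASE}/toolkit" "${DEMO_BASE}/probes_container/every_minute"

cat > "${DEMO_BASE}/toolkit/env_build.sh" <<'EOF'
if command -v realpath >/dev/null 2>&1
then
    # Preferred: realpath removes the ../ segments and resolves symbolic links.
    MY_BASE_PATH=$(dirname "$(dirname "$(realpath "${BASH_SOURCE[0]}")")")
else
    # Fallback (an assumption): change into the parent and ask for its physical path.
    MY_BASE_PATH=$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd -P)
fi
EOF

# A hypothetical probe that finds the builder with a relative path.
cat > "${DEMO_BASE}/probes_container/every_minute/probe_DEMO.sh" <<'EOF'
. "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"
echo "${MY_BASE_PATH}"
EOF

DERIVED_BASE=$(bash "${DEMO_BASE}/probes_container/every_minute/probe_DEMO.sh")
echo "Derived base path: ${DERIVED_BASE}"
# Remove ${DEMO_BASE} by hand when finished.
```

The printed path contains no ../ segments even though the probe reached the builder through a relative path.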

Build The Shared Environment

Assemble the indirect references to the component paths by appending them to the base-path. Use these instead of hard coded literal paths when reading or writing files:

PATH_TOOLKIT="${MY_BASE_PATH}/toolkit"
PATH_TASKS="${MY_BASE_PATH}/probes_container"
PATH_CONFIGS="${MY_BASE_PATH}/configuration_data"
PATH_LIVE_DATA="${MY_BASE_PATH}/live_data"
PATH_CACHES="${MY_BASE_PATH}/cache_containers"
PATH_LOGS="${MY_BASE_PATH}/log_files"

Define some additional useful static values to avoid repeating them inside the measurement probes.

TIMESTAMP=$(date +%Y-%m-%dT%H:%M:%S%z)
HOSTNAME=$(hostname -s)

Build the switch-case structure outlined previously to implement host specific configurations. Extend this as needed to cover all your machines.

case "${HOSTNAME}" in
    NASW) source "${PATH_CONFIGS}/config_NASW.sh" ;;
    *)    source "${PATH_CONFIGS}/config_catch_all.sh" ;;
esac

Everything is now operating relative to the install location (regardless of where that is). None of the scripts ever need to know or care where they are installed. This is real-world, zero-conf in action!
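As the number of machines grows, the case statement gains a branch for every host. One possible alternative, offered as an assumption rather than the method described above, is to build the configuration file name from the host name and fall back to the catch-all when no host-specific file exists. CONFIG_LOADED is a hypothetical variable used only to show which file was read.

```shell
#!/usr/bin/env bash
# Sketch: pick a host-specific config file by name, else use the catch-all.
# PATH_CONFIGS is normally set by env_build.sh; a temporary directory with
# one made-up config file stands in for it here.
PATH_CONFIGS=$(mktemp -d)
echo 'CONFIG_LOADED="catch_all"' > "${PATH_CONFIGS}/config_catch_all.sh"

# Fall back to uname -n on hosts where the hostname command is missing.
HOSTNAME=$(hostname -s 2>/dev/null || uname -n)
MY_HOST_CONFIG="${PATH_CONFIGS}/config_${HOSTNAME}.sh"

if [ -r "${MY_HOST_CONFIG}" ]
then
    . "${MY_HOST_CONFIG}"
else
    . "${PATH_CONFIGS}/config_catch_all.sh"
fi
echo "Loaded configuration: ${CONFIG_LOADED}"
```

With this arrangement, adding a machine means dropping one new file into the configuration data directory and the environment builder itself stays unchanged.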

The Cron Scheduled Probe Runner

A probe runner script that scans the probe container directories for items to run means you never need to edit the crontab to introduce a new measuring probe. Pass the chosen interval as a parameter to avoid implementing multiple probe runners with identical code. Use the fully qualified path to the probe runner (represented by {---}) so that cron can find it:

* * * * * {---}/toolkit/probe_runner.sh "every_minute"
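The same pattern extends to the other interval containers. This sketch prints one crontab line per container for review before installing them by hand. RUNNER stands for the fully qualified path to probe_runner.sh on your system and its default is only a placeholder; the clock times chosen for the weekly, monthly, quarterly and annual runs are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print one crontab line per interval container for review.
# RUNNER is a placeholder for the fully qualified path on your system.
# The clock times for the less frequent runs are assumptions.
RUNNER="${RUNNER:-/path/to/toolkit/probe_runner.sh}"

CRON_LINES=$(cat <<EOF
* * * * * ${RUNNER} "every_minute"
0 * * * * ${RUNNER} "top_of_the_hour"
0 0 * * * ${RUNNER} "daily_at_midnight"
30 2 * * 1 ${RUNNER} "weekly_early_monday_am"
30 3 1 * * ${RUNNER} "monthly_first_morning"
0 5 1 1,4,7,10 * ${RUNNER} "quarterly"
0 6 1 1 * ${RUNNER} "annually"
EOF
)
echo "${CRON_LINES}"
```

The five schedule fields are minute, hour, day of month, month and day of week, so the weekly entry runs at 02:30 on Mondays.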

The probe_runner.sh script scans the selected probe container directory indicated by positional argument $1 for measurement probes matching a specific filename pattern. It pipes the list of probes into a while loop that runs them one at a time.

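#!/bin/bash
# BASH_SOURCE is a bash feature, so make sure cron runs this with bash
. "$(dirname "${BASH_SOURCE[0]}")/env_build.sh"
MY_PROBE_INTERVAL=$1
# List the probes for this interval and run them one at a time
ls "${PATH_TASKS}/${MY_PROBE_INTERVAL}"/probe_*.sh | while read -r MY_PROBE
do
    echo "${TIMESTAMP} Running $(basename "${MY_PROBE}")" >> "${PATH_LOGS}/probe_run.log"
    "${MY_PROBE}"
done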

The -r flag on the read command stops backslashes (\) in the input from being treated as escape characters. The basename command shortens the probe path name for logging to just the script name and discards the path component.
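A variant of the loop can use a shell glob instead of parsing the output of ls, and can skip files that lack the execute flag. This is a sketch under stated assumptions rather than the runner described above: run_probes is a hypothetical function name, and in probe_runner.sh its two arguments would be the probe container for the chosen interval and the log file path.

```shell
#!/usr/bin/env bash
# Sketch: a glob-based variant of the runner loop that skips non-executable files.
# run_probes is a hypothetical function name.
run_probes() {
    local MY_PROBE_DIR=$1
    local MY_LOG_FILE=$2
    local MY_PROBE

    # Refuse an interval name that has no container directory.
    if [ ! -d "${MY_PROBE_DIR}" ]
    then
        echo "No such probe container: ${MY_PROBE_DIR}" >&2
        return 1
    fi

    for MY_PROBE in "${MY_PROBE_DIR}"/probe_*.sh
    do
        # An unmatched glob stays literal, so test each item before running it.
        [ -x "${MY_PROBE}" ] || continue
        echo "$(date +%Y-%m-%dT%H:%M:%S%z) Running $(basename "${MY_PROBE}")" >> "${MY_LOG_FILE}"
        "${MY_PROBE}"
    done
}

# Demonstration in a temporary directory with one made-up probe.
DEMO_DIR=$(mktemp -d)
printf '#!/bin/sh\necho "probe ran"\n' > "${DEMO_DIR}/probe_DEMO.sh"
chmod +x "${DEMO_DIR}/probe_DEMO.sh"
RUN_OUTPUT=$(run_probes "${DEMO_DIR}" "${DEMO_DIR}/probe_run.log")
echo "${RUN_OUTPUT}"
cat "${DEMO_DIR}/probe_run.log"
```

Skipping files without the execute flag also gives a simple way to park a probe temporarily: clear the flag with chmod -x and the runner ignores it.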

Don't forget to use the chmod command to set the execute flag on the probe runner script and the measurement probes. Also be aware of user accounts, ownership and read/write access controls.

Symbolic Names For Measurements

Define a unique symbolic-name for each measurement. Use an easy to remember naming scheme and document them in your maintenance guide.

When recording a measurement, combine this symbol with the hostname to describe a specific monitoring probe identity. Timestamps support trend analysis when the measurements are aggregated.

Every script, cache file, component, database record and log file can then be consistently associated with that symbolic name. This helps the central-cortex manage and aggregate the results for analysis.

Example Monitoring Probes

Here are two example monitoring probes. Their results go into their own symbolically named cache files. They could be stored in a database table just as easily.

By doing a little extra work to create a monitoring environment and probe runner, the individual probe scripts are very simple and economically coded.

Because the monitoring probes are lightweight, running them every minute should impose almost no load on the CPU. If you want to run them less often, simply move the probe script to another scheduled interval container directory. There is no need to change the crontab.

Process counter

Save this probe script as probe_PROCESS_COUNT.sh in the every_minute directory:

#!/bin/bash
SYMBOLIC_NAME="PROCESS_COUNT"
source "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"
# Count the PIDs, then subtract one for the ps command itself
MY_COUNT=$(expr $(ps -eo pid= | wc -l) - 1)
echo "${TIMESTAMP} ${HOSTNAME} ${SYMBOLIC_NAME} ${MY_COUNT}" >> "${PATH_CACHES}/${SYMBOLIC_NAME}.dat"

The count value uses nested command substitutions. Inside the deepest level, the ps command lists only the PID values and the wc command counts the lines.

If 100% accuracy is important then we must eliminate the heading line and the ps command from the list of processes or the count will be 'off by two'. The wc command does not figure in the PID list because it has not yet been called to action. That may not be true in all operating systems though.

The trailing equals sign (=) on the pid flag will suppress the headings. Reduce the count by one more with an expr command substitution. Use expr for compatibility with older versions of bash.

Each measurement is tagged with a timestamp, hostname and symbolic name for aggregation in the central-cortex. Collisions and data loss are avoided because each hostname + symbol combination is unique.
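To illustrate how those tags help at the aggregation stage, this sketch reports the most recent PROCESS_COUNT value for each host. The three records are made-up sample data in the format the probe writes, HOSTB is a hypothetical second host, and the awk approach is one possibility rather than the central-cortex implementation.

```shell
#!/usr/bin/env bash
# Sketch: report the most recent PROCESS_COUNT value for each host.
# The records are made-up sample data in the probe's output format:
# timestamp, hostname, symbolic name, value.
SAMPLE_FILE=$(mktemp)
cat > "${SAMPLE_FILE}" <<'EOF'
2024-01-01T10:00:00+0000 NASW PROCESS_COUNT 212
2024-01-01T10:00:00+0000 HOSTB PROCESS_COUNT 97
2024-01-01T10:01:00+0000 NASW PROCESS_COUNT 215
EOF

# Records are appended in time order, so later lines overwrite earlier ones
# and each host ends up holding its newest value.
LATEST=$(awk '$3 == "PROCESS_COUNT" { latest[$2] = $4 }
              END { for (host in latest) print host, latest[host] }' "${SAMPLE_FILE}" | sort)
echo "${LATEST}"
rm -f "${SAMPLE_FILE}"
```

For the sample data this prints HOSTB 97 followed by NASW 215.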

Disk space usage observer

Here is another example probe to check disk space usage. This one is called probe_DISK_SPACE.sh:

#!/bin/bash
SYMBOLIC_NAME="DISK_SPACE"
source "$(dirname "${BASH_SOURCE[0]}")/../../toolkit/env_build.sh"
TARGET_VOLUME="volume1"
# Squeeze repeated spaces so that cut can pick out the fifth column
MY_PERCENTAGE=$(df | grep "${TARGET_VOLUME}" | tr -s ' ' | cut -d ' ' -f 5)
MY_UNIQUETAG="${HOSTNAME} ${SYMBOLIC_NAME} ${TARGET_VOLUME}"
echo "${TIMESTAMP} ${MY_UNIQUETAG} ${MY_PERCENTAGE}" >> "${PATH_CACHES}/${SYMBOLIC_NAME}.dat"

The MY_UNIQUETAG variable is only there to shorten the echo line to avoid a line break.
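The grep in the probe above can match more than one line if several mounts contain the volume name. A possible variation, sketched here as an assumption rather than a replacement, asks df about a single mount point and uses the POSIX output format. TARGET_MOUNT is a hypothetical variable and defaults to the root file system so the sketch runs on any host.

```shell
#!/usr/bin/env bash
# Sketch: read the use percentage for one mount point with df -P and awk.
# TARGET_MOUNT is a hypothetical setting; "/" is used so the sketch runs anywhere.
TARGET_MOUNT="${TARGET_MOUNT:-/}"

# -P selects the POSIX format with one line per file system.
# Line 2 holds the data and field 5 is the capacity percentage.
MY_PERCENTAGE=$(df -P "${TARGET_MOUNT}" | awk 'NR == 2 { print $5 }')
echo "${TARGET_MOUNT} ${MY_PERCENTAGE}"
```

Field counting with awk assumes the file system name contains no spaces, which holds for most local disks but is worth checking on network mounts.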

Don't forget to use the chmod command to set the execute flag on the probe scripts.

Deployment Mechanisms

Use continuous integration tools to deploy the measurement kit to the remote systems on a nightly basis from a Git repository.

Only check your code into the Git repository when it is fully working and error free. The continuous integration tools run integrity checks every night. If the code is clean, it is deployed across the entire network of machines. Your Ops team will be able to set this up for you.

Do not overwrite the live data, cache and log file containers. Move them outside of the measurement kit and redefine their location in the environment builder to simplify the deployment rules. Provided your probe runner and the probe scripts have permission to access those locations, their physical path is now completely transparent.
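In the environment builder, that relocation could look like the sketch below. MONITOR_DATA_ROOT is a hypothetical variable name; it defaults to a temporary directory here so the sketch is safe to run, whereas on a real host it would be a fixed location that the deployment never touches.

```shell
#!/usr/bin/env bash
# Sketch: point the data containers at a location outside the deployed kit.
# MONITOR_DATA_ROOT is a hypothetical name; the default is a throwaway directory.
MONITOR_DATA_ROOT="${MONITOR_DATA_ROOT:-$(mktemp -d)}"

PATH_LIVE_DATA="${MONITOR_DATA_ROOT}/live_data"
PATH_CACHES="${MONITOR_DATA_ROOT}/cache_containers"
PATH_LOGS="${MONITOR_DATA_ROOT}/log_files"

# Create the containers on first use so a fresh host needs no manual setup.
mkdir -p "${PATH_LIVE_DATA}" "${PATH_CACHES}" "${PATH_LOGS}"
echo "Data containers live under ${MONITOR_DATA_ROOT}"
```

The probes keep using PATH_CACHES and PATH_LOGS exactly as before, so none of them need editing when the data moves.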

Conclusion

The probe runner finds the measurement probes to run in each container. Relocating, adding or removing probes is now very easy: move the script file into or out of the appropriate container directory. No alterations to the crontab are needed.

The central-cortex will gather the measurements from the cached results and aggregate them for display.
