IP Monitoring & Diagnostics With Command Line Tools: Part 9 - Continuous Monitoring

Scheduling a continuous monitoring process will detect problems at the earliest opportunity. If the diagnostic tools run often enough, they can forecast a server outage before a mission critical failure happens. Pre-emptive diagnosis and automatic corrections are a very good thing.

Why continuous monitoring is a good idea

A manual monitoring approach is useful when diagnosing specific problems in a single machine. In a large and increasingly complex network, automation is necessary to avoid being overwhelmed.

In high availability scenarios that support live broadcasting, a problem may arise that will eventually crash the machine if it is not rectified. Detecting this as soon as the symptoms are evident can alert the support team well in advance. They can pre-emptively correct the issue before it becomes critical.

An operating system is composed of many individual processes. There is a strict limit on how many of these can run simultaneously. A server process might spawn child processes to deal with incoming requests. If a child process loses contact with its parent, the relationship is deadlocked. The parent process waits for a response that will never arrive and the child will not quit because it cannot pass back the exit status. If this is caused by a systemic problem, other processes will stall too. Eventually, all of the process slots will be allocated and new processes cannot be created. That will halt a server completely. A forced server reboot is the only solution.

Count the processes that are prone to this happening and compare the historical values. If the count increases above a nominal threshold, remedial action can remove the cause of the failures and dispose of the defunct processes in an orderly fashion so the system can resume normal operation.
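The counting step might be sketched as follows. This is a minimal illustration rather than a definitive implementation. It assumes that ps on your system accepts the stat output column (common on Linux and macOS, but not guaranteed everywhere). The function names and the threshold of 20 are hypothetical.

```shell
#!/bin/sh
# Sketch: count defunct (zombie) processes and compare with a threshold.

# Read a list of process state codes on stdin (one per line) and print
# how many begin with Z, the state code used for defunct processes.
count_defunct() {
    grep -c '^ *Z' || true    # grep exits non-zero when the count is 0
}

# Succeed when COUNT ($1) is greater than THRESHOLD ($2).
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Example use inside a monitoring task (assumed ps options):
# count=$(ps -A -o stat= | count_defunct)
# if over_threshold "$count" "${THRESHOLD:-20}"; then
#     echo "WARNING: $count defunct processes"
# fi
```

Logging the count on every run builds the historical record, so the threshold can be tuned against real values.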

The corrective action could be invoked automatically with self-healing code. This is an additional layer of pre-emptive support over and above the defensive coding that we have already discussed.

What is cron?

There is a versatile and powerful scheduler called cron built into UNIX. Add tasks to a cron-table file to run tools and scripts on a schedule. Each task is configured to run according to a set of rules (a Time-spec). For example, gather information daily, then collate it and email a report every Monday morning.

The cron daemon checks the task list every minute and will execute anything whose Time-spec matches the current date and time.

About the cron tables

The configuration for the cron scheduler is maintained via a table of tasks. Each one has a Time-spec that describes when it should run. This is the cron-table (called crontab). There are two variants of the crontab files:

System wide
Per-user

The system wide crontab is used for various housekeeping and background tasks that the OS needs to run. We should leave it alone.

The per-user cron-tables are owned by the individual accounts. The cron tasks will run under the user account to which they belong. You cannot view or alter the crontab for another user account unless you have super-user privileges.

Avoid running tasks with the root account. If the task requires elevated privileges, grant them to a special user account and use that instead.

Using the crontab command

Scheduled execution is a feature of all operating systems, but the implementation differs between them. There are several alternative cron-table files and their paths have changed from time to time. On macOS, Apple now favours its own launchd process over cron. The crontab command hides these complexities from you and is easier to use than manually finding and editing the config files.

Confusingly, crontab is the name of both the command and the file that it operates on.

Use the crontab -e command to edit the per-user crontab files. It knows where they live and can find the right one. Opening the crontab will create a new and empty file if it does not already exist:

crontab -e

The crontab will be opened with the default text editor. Use a different editor by adding this special variable export instruction to your login profile:

export EDITOR={path-to-your-preferred-editor}

List your own crontab to see the changes with the listing-flag (small letter L):

crontab -l

Beware: Do not use the crontab command without parameters. On some systems it reads a replacement crontab from standard input, and ending that input (for example with [CONTROL] + [D]) installs an empty crontab, which removes your tasks. If you run it accidentally, abort with a [CONTROL] + [C] keystroke to leave without overwriting the file.

When you exit and save the changes, the crontab -e command should signal the cron daemon to reload the configuration to activate the new tasks. If this does not happen automatically, reload it manually like this:

kill -HUP {cron-process-PID-value}

Use command substitution to look up the PID and build the signalling instruction in a single line:

kill -HUP $(ps -aux | grep -i "\/crond" | grep -v grep | tr -s ' ' | cut -d ' ' -f 2)

The grep commands filter the ps listing to extract the line we need. The second is needed to discard the first grep command from the list. The tr and cut commands return the PID number from the result. The substitution passes the PID number to the kill command.
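The same lookup can be sketched with a single awk filter. This is an illustrative alternative, not the method used above. It assumes a "ps aux" style listing, where the PID is in column 2 and the command starts in column 11. It also allows for the daemon being named either cron or crond, which varies between distributions. The function name cron_pids is hypothetical.

```shell
#!/bin/sh
# Sketch: print the PID of each cron daemon found in a "ps aux" style
# listing read from stdin.
cron_pids() {
    # Match the command field only, so the line for a grep or awk
    # process that merely mentions "crond" is never selected.
    awk '$11 ~ /(^|\/)crond?$/ { print $2 }'
}

# Example use:
# pid=$(ps aux | cron_pids)
# [ -n "$pid" ] && kill -HUP $pid
```

Testing the command field directly removes the need for the second grep that discards the first.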

Despite its name, the kill command does not necessarily terminate anything. It sends signals to processes, and the HUP signal is used here to ask the daemon to reload its configuration.

Configure the run-time environment

The run-time environment can be altered with optional special variables at the head of the crontab:

Definition	Description
SHELL=/bin/bash	Override the default shell for the user account.
MAILTO=anotheruser	All output from the task is sent by email unless it is redirected. Define the recipient here.
CRON_TZ=Europe/London	Localise the tasks to run with a different time-zone setting, where the cron implementation supports this variable.

Note: This environment will apply to all tasks described in the crontab.

Crontab task entries

The format of a crontab line is very simple. Five space-separated values make up the Time-spec that determines when the task will run. The rest of the line describes the command to be run:

{time-spec} {task-command-line}

Tasks are deactivated with a hash character (#) prefix. This prevents the task from being scheduled but keeps it intact for later use.

#{time-spec} {task-command-line}

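Unescaped percent signs (%) in the command represent newline characters. Everything after the first percent sign is passed to the command as its standard input, with each further percent sign starting a new line. Escape a percent sign with a backslash (\%) when the command needs a literal one, such as in a date format.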

{time-spec} {task-command-line}%{redirected-to-stdin}

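Redirecting the output of the command to /dev/null (or any other file) inhibits the mail message containing the task output. This form redirects only the standard output, so anything written to the standard error stream is still mailed. Append 2>&1 to discard that as well.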

{time-spec} {task-command-line} > /dev/null

Time-spec format

The space-separated Time-spec describes when a task is scheduled to run:

{minute} {hour} {day-of-month} {month} {day-of-week}

Field	Value range
{minute}	0 to 59
{hour}	0 to 23
{day-of-month}	1 to 31 depending on the month.
{month}	1 to 12 or a three-letter abbreviation.
{day-of-week}	0 to 6 (Sunday to Saturday) or a three-letter abbreviation.

Use a wildcard asterisk (*) to match all possible values. A range of values can be specified with a dash character (-) and a comma (,) can be used to separate a list of values or ranges.

The {month} must always match. If both the {day-of-month} and the {day-of-week} are restricted (neither is an asterisk), the task will run when either one of them matches the current day. Otherwise, both must match.
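The matching rules can be sketched as a pair of shell functions. This is a simplified illustration, not how any particular cron daemon is written. It handles only asterisks, numbers, ranges and comma-separated lists, and does not handle three-letter abbreviations. It assumes the documented behaviour of common cron implementations, where the either-or day rule applies only when both day fields are restricted. The function names are hypothetical.

```shell
#!/bin/sh
# field_matches SPEC VALUE
# Succeed when one numeric Time-spec field matches VALUE.
field_matches() {
    spec=$1
    value=$2
    if [ "$spec" = "*" ]; then
        return 0
    fi
    rest=$spec
    while [ -n "$rest" ]; do
        part=${rest%%,*}                 # next item in the comma list
        case $part in
            *-*)                         # a range such as 1-5
                lo=${part%-*}
                hi=${part#*-}
                if [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ]; then
                    return 0
                fi
                ;;
            *)                           # a single number
                if [ "$value" -eq "$part" ]; then
                    return 0
                fi
                ;;
        esac
        case $rest in
            *,*) rest=${rest#*,} ;;      # drop the item just checked
            *)   rest= ;;                # that was the last item
        esac
    done
    return 1
}

# day_matches DOM_SPEC DOW_SPEC DAY_OF_MONTH DAY_OF_WEEK
# Either field may match when both are restricted; otherwise both must.
day_matches() {
    if [ "$1" != "*" ] && [ "$2" != "*" ]; then
        field_matches "$1" "$3" || field_matches "$2" "$4"
    else
        field_matches "$1" "$3" && field_matches "$2" "$4"
    fi
}
```

For example, with a {day-of-month} of 13 and a {day-of-week} of 5, day_matches succeeds on the 13th of the month and also on every Friday.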

Here are some Time-spec examples:

Time-spec	Description and example purpose
0 8 * * 1	8:00 AM Monday - Deliver a weekly report.
0 4 * * *	4:00 AM every morning - Run a garbage collection task.
* * * * *	Run every minute - Measure disk space, count processes or check workflow queues for stalled jobs, intrusion checks.
0 * * * *	Run once an hour - Database backups.
0 0 * * *	At midnight - Rotate the log files.
0 0 * * 0	Every week at midnight on Sunday - Analyse data for reports.
0 0 1 * *	At midnight on the first day of every month - Housekeeping tasks.
0 0 1 3,6,9,12 *	Every 3 months - Compile reports.
0 0 1 1 *	New Year's Day - Big garbage collection.

The complete crontab line for delivering a weekly report looks like this. Email is inhibited here because that would be handled inside the script:

0 8 * * 1 /my_tools/run_weekly_report.sh > /dev/null
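One cautious way to put the pieces together is to assemble the crontab text in a file, review it, and only then install it. The sketch below prints a candidate crontab that combines the environment settings with the weekly report task. The function name, the mail address and the temporary file path are placeholders for illustration.

```shell
#!/bin/sh
# Sketch: print a candidate crontab (environment header, then one task).
build_crontab() {
    cat <<'EOF'
SHELL=/bin/bash
MAILTO=ops@example.com
# minute hour day-of-month month day-of-week command
0 8 * * 1 /my_tools/run_weekly_report.sh > /dev/null
EOF
}

# Example use:
# build_crontab > /tmp/candidate.cron   # review this file first
# crontab /tmp/candidate.cron           # replaces your current crontab
```

Installing from a file replaces the whole per-user crontab, so save the output of crontab -l first if there are existing tasks to keep.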

Deploying tasks

The crontab tool is easy to use but accessing it from a dashboard implemented in PHP is difficult.

Adding a layer of abstraction can simplify your architecture at the expense of a little extra coding. Using data-driven techniques to let the file-system work for you results in more flexible designs.

Implement a task manager written as a shell script. The task manager is called by cron but loads plug-in tasks from a folder. These are picked up with an ls command and passed to a while loop to execute them one by one. Tasks can be added or removed without needing to rebuild the crontab. We will explore this idea in more detail soon.
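A minimal sketch of such a task manager is shown here. It is one possible shape under stated assumptions, not a finished design. The function name run_tasks, the script name task_manager.sh and the tasks.d folder are all hypothetical. Plug-in file names are assumed to contain no newline characters, because the list comes from ls.

```shell
#!/bin/sh
# run_tasks DIR
# Run every executable file found in DIR, one by one, in name order.
run_tasks() {
    task_dir=$1
    ls "$task_dir" | while read -r task; do
        path="$task_dir/$task"
        # Skip anything that is not an executable regular file.
        [ -f "$path" ] && [ -x "$path" ] || continue
        # Give the task an empty stdin so that it cannot consume the
        # remainder of the file list feeding this loop.
        if ! "$path" < /dev/null; then
            echo "task failed: $task" >&2
        fi
    done
}

# Hypothetical crontab entry for a script that wraps run_tasks:
# * * * * * /my_tools/task_manager.sh /my_tools/tasks.d
```

Numeric prefixes on the plug-in names (10_first, 20_second) give simple control over execution order, and removing the execute permission from a file disables that task without deleting it.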

Conclusion

Build monitoring tasks with simple components and defensive coding techniques. Implement self-healing code to fix problems automatically. Almost no maintenance is required after deployment unless you alter something the tasks depend on. Strive for elegant simplicity.
