IP Monitoring & Diagnostics With Command Line Tools: Part 9 - Continuous Monitoring
Scheduling a continuous monitoring process will detect problems at the earliest opportunity. If the diagnostic tools run often enough, they can forecast a server outage before a mission critical failure happens. Pre-emptive diagnosis and automatic corrections are a very good thing.
Continuous monitoring is a powerful tool for predicting failures when the system exhibits symptoms that are difficult to spot with a manual inspection. Some observations need to be made more often than others to detect a pattern. A flexible solution that is easy to maintain and extend can be built using operating system services as a foundation.
Why continuous monitoring is a good idea
A manual monitoring approach is useful when diagnosing specific problems in a single machine. In a large and increasingly complex network, automation is necessary to avoid being overwhelmed.
In high availability scenarios that support live broadcasting, a problem may arise that will eventually crash the machine if it is not rectified. Detecting this as soon as the symptoms are evident can alert the support team well in advance. They can pre-emptively correct the issue before it becomes critical.
An operating system is composed of many individual processes. There is a strict limit on how many of these can run simultaneously. A server process might spawn child processes to deal with incoming requests. If a child process loses contact with its parent, the relationship is deadlocked. The parent process waits for a response that will never arrive and the child will not quit because it cannot pass back the exit status. If this is caused by a systemic problem, other processes will stall too. Eventually, all of the process slots will be allocated and new processes cannot be created. That will halt a server completely. A forced server reboot is the only solution.
Count the processes that are prone to this happening and compare the historical values. If the count increases above a nominal threshold, remedial action can remove the cause of the failures and dispose of the defunct processes in an orderly fashion so the system can resume normal operation.
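A minimal sketch of that check follows. It assumes a ps that supports the stat output column (Linux procps and the BSD-derived ps both do). The function names and the threshold of 20 are hypothetical and should be tuned against your own historical values.

```shell
#!/bin/sh
# Hypothetical threshold; choose it from the historical counts on your system.
THRESHOLD=20

# Read process states on stdin (one per line) and print how many are defunct.
# A state that begins with "Z" marks a defunct (zombie) process in ps output.
count_defunct() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c '^ *Z' || true
}

# Succeed when the count in $1 is above the threshold in $2.
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Take one measurement and report it on standard output.
check_defunct() {
    count=$(ps -e -o stat= | count_defunct)
    if over_threshold "$count" "$THRESHOLD"; then
        echo "WARNING: $count defunct processes (threshold $THRESHOLD)"
    else
        echo "OK: $count defunct processes"
    fi
}
```

Calling check_defunct from a scheduled task and appending its output to a log file gives the historical record to compare against.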
The corrective action could be invoked automatically with self-healing code. This is an additional layer of pre-emptive support over and above the defensive coding that we have already discussed.
What is cron?
There is a versatile and powerful scheduler called cron built into UNIX. Add tasks to its cron-table configuration file to run tools and scripts on a schedule. Each task runs according to a set of rules (a Time-spec). For example, gather information daily, then collate it and email a report every Monday morning.
The cron daemon checks the task list every minute and will execute anything whose Time-spec matches the current date and time.
About the cron tables
The configuration for the cron scheduler is maintained via a table of tasks. Each one has a Time-spec that describes when it should run. This is the cron-table (called crontab). There are two variants of the crontab files:
- System wide
- Per-user
The system wide crontab is used for various housekeeping and background tasks that the OS needs to run. We should leave it alone.
The per-user cron-tables are owned by the individual accounts. The cron tasks will run under the user account to which they belong. You cannot view or alter the crontab for another user account unless you have super-user privileges.
Avoid running tasks with the root account. If the task requires elevated privileges, grant them to a special user account and use that instead.
Using the crontab command
Scheduled execution is a feature of all operating systems but it may be implemented differently on some. There are several alternative cron-table files and their paths have changed from time-to-time. Apple has replaced cron with their own launchd process. The crontab command hides these complexities from you and is easier to use than manually finding and editing the config files.
Confusingly, crontab is the name of both the command and the file that it operates on.
Use the crontab -e command to edit the per-user crontab files. It knows where they live and can find the right one. Opening the crontab will create a new and empty file if it does not already exist:
crontab -e
The crontab will be opened with the default text editor. To use a different editor, export the EDITOR variable from your login profile:
export EDITOR={path-to-your-preferred-editor}
List your own crontab with the -l flag (a lower-case letter L) to see the changes:
crontab -l
Beware: Do not use the crontab command without parameters. On many systems it then reads a replacement crontab from standard input, so ending the input with [CONTROL] + [D] installs an empty file and removes your tasks. If you do this accidentally, abort with a [CONTROL] + [C] keystroke to leave without overwriting the file.
When you exit and save the changes, the crontab -e command should signal the cron daemon to reload the configuration to activate the new tasks. If this does not happen automatically, reload it manually like this:
kill -HUP {cron-process-PID-value}
Use command substitution to build a signalling instruction (line-breaks added for clarity):
kill -HUP $(ps aux |
    grep -i "/crond" |
    grep -v grep |
    tr -s ' ' |
    cut -d ' ' -f 2)
The grep commands filter the ps listing to extract the line we need. The second is needed to discard the first grep command from the list. The tr and cut commands return the PID number from the result. The substitution passes the PID number to the kill command.
Despite its name, the kill command does not necessarily terminate a process. It sends a signal, and the HUP signal conventionally asks a daemon to reload its configuration.
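The PID extraction can also be sketched as a single awk filter. This is an alternative, not the method shown above. It assumes the BSD-style ps aux layout, where the PID is in column 2 and the command in column 11. The function name cron_pids is hypothetical. Because it tests the command column only, the grep process never matches and no second filter is needed. It also accepts a daemon named cron, as found on some distributions.

```shell
#!/bin/sh
# Read a "ps aux" style listing on stdin and print the PID (column 2) of
# every line whose command (column 11) is cron or crond, with or without
# a leading path such as /usr/sbin/.
cron_pids() {
    awk '$11 ~ /(^|\/)crond?$/ { print $2 }'
}
```

The result can be substituted into the signalling instruction in the same way: kill -HUP $(ps aux | cron_pids)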
Configure the run-time environment
The run-time environment can be altered with optional special variables at the head of the crontab:
Definition | Description |
---|---|
SHELL=/bin/bash | Override the default shell for the user account. |
MAILTO=anotheruser | All output from the task is sent by email unless it is redirected. Define the recipient here. |
CRON_TZ=Europe/London | Localise the task to run with a different time-zone setting (not supported by every cron implementation). |
Note: This environment will apply to all tasks described in the crontab.
Crontab task entries
The format of a crontab line is very simple. Five space-separated values form a Time-spec that describes when the task will run. The rest of the line describes the command to be run:
{time-spec} {task-command-line}
Tasks are deactivated with a hash character (#) prefix. This prevents the task from being scheduled but keeps it intact for later use.
#{time-spec} {task-command-line}
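Embedded percent signs (%) represent newline characters. The second and subsequent virtual lines are redirected to the standard input of the command described prior to the first percent sign. To pass a literal percent sign to the command (in a date format, for example), escape it with a backslash (\%).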
{time-spec} {task-command-line}%{redirected-to-stdin}
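Redirecting the standard output of the command to /dev/null (or any other file) inhibits the mail message containing the task output. Anything written to standard error is still mailed unless it is redirected as well (by appending 2>&1).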
{time-spec} {task-command-line} > /dev/null
Time-spec format
The space-separated Time-spec describes when a task is scheduled to run:
{minute} {hour} {day-of-month} {month} {day-of-week}
Field | Value range |
---|---|
{minute} | 0 to 59 |
{hour} | 0 to 23 |
{day-of-month} | 1 to 31 depending on the month. |
{month} | 1 to 12 or a three-letter abbreviation. |
{day-of-week} | 0 to 6 (Sunday to Saturday) or a three-letter abbreviation. |
Use a wildcard asterisk (*) to match all possible values. A range of values can be specified with a dash character (-) and a comma (,) can be used to separate a list of values or ranges.
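To illustrate how one field is compared with the current value, here is a minimal sketch of the wildcard, list and range rules. It is not how cron itself is implemented. The function name field_matches is hypothetical, and name abbreviations such as mon or jan are not handled.

```shell
#!/bin/sh
# field_matches SPEC VALUE
# Succeed when the number VALUE is matched by one Time-spec field SPEC,
# for example "*", "5", "1-5" or "0,15,30-45".
field_matches() {
    spec=$1
    value=$2

    # A wildcard matches every possible value.
    [ "$spec" = "*" ] && return 0

    # Walk the comma-separated list one item at a time.
    rest=$spec
    while [ -n "$rest" ]; do
        item=${rest%%,*}
        case $rest in
            *,*) rest=${rest#*,} ;;
            *)   rest= ;;
        esac

        case $item in
            *-*)
                # A range such as 30-45 matches when low <= value <= high.
                low=${item%-*}
                high=${item#*-}
                if [ "$value" -ge "$low" ] && [ "$value" -le "$high" ]; then
                    return 0
                fi
                ;;
            *)
                # A single number must be equal to the value.
                [ "$value" -eq "$item" ] && return 0
                ;;
        esac
    done
    return 1
}
```

For example, field_matches "3,6,9,12" 9 succeeds, while field_matches "1-5" 6 fails.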
The {month} pattern must always match. If both the {day-of-week} and the {day-of-month} fields are restricted (neither is an asterisk), the task will run when either of them matches the current day.
Here are some Time-spec examples:
Time-spec | Description and example purpose |
---|---|
0 8 * * 1 | 8:00 AM Monday - Deliver a weekly report. |
0 4 * * * | 4:00 AM every morning - Run a garbage collection task. |
* * * * * | Run every minute - Measure disk space, count processes, check workflow queues for stalled jobs or run intrusion checks. |
0 * * * * | Run once an hour - Database backups. |
0 0 * * * | At midnight - Rotate the log files. |
0 0 * * 0 | Every week at midnight on Sunday - Analyse data for reports. |
0 0 1 * * | At midnight on the first day of every month - Housekeeping tasks. |
0 0 1 3,6,9,12 * | At midnight on the first day of every third month - Compile reports. |
0 0 1 1 * | New Year's Day - Big garbage collection. |
The complete crontab line for delivering a weekly report looks like this. Email is inhibited here because that would be handled inside the script:
0 8 * * 1 /my_tools/run_weekly_report.sh > /dev/null
Deploying tasks
The crontab tool is easy to use but accessing it from a dashboard implemented in PHP is difficult.
Adding a layer of abstraction can simplify your architecture at the expense of a little extra coding. Using data-driven techniques to let the file-system work for you results in more flexible designs.
Implement a task manager written as a shell-script. The task manager is called by cron but loads plug-in tasks from a folder. These are picked up with an ls command and passed to a while loop to execute them one-by-one. Tasks can be added or removed without needing to rebuild the crontab. We will explore this idea in more detail soon.
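A minimal sketch of such a task manager is shown here, with one deliberate difference: it iterates over the folder with a shell glob in a for loop instead of piping ls into a while loop, which copes better with unusual file names. The function name run_tasks and the folder layout are hypothetical.

```shell
#!/bin/sh
# run_tasks DIR
# Run every executable file found in DIR once, in name order.
# Succeed only when every task exits with a zero status.
run_tasks() {
    task_dir=$1
    failures=0

    for task in "$task_dir"/*; do
        # Skip anything that is not an executable regular file. This also
        # covers an empty folder, where the glob pattern stays unexpanded.
        [ -f "$task" ] && [ -x "$task" ] || continue

        if ! "$task"; then
            echo "task failed: $task" >&2
            failures=$((failures + 1))
        fi
    done

    [ "$failures" -eq 0 ]
}
```

A script that ends by calling run_tasks on a plug-in folder can then be scheduled with a single crontab line. Removing the execute permission from a plug-in deactivates it without deleting the file.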
Conclusion
Build monitoring tasks with simple components and defensive coding techniques. Implement self-healing code to fix problems automatically. Almost no maintenance is required after deployment unless you alter something the tasks depend on. Strive for elegant simplicity.