IP Monitoring & Diagnostics With Command Line Tools: Part 8 - Caching The Results

Storing monitoring outcomes in temporary cache containers separates the observation and diagnostic processes so they can run independently of the centralised marshalling and reporting process.


Maintaining security and integrity is important. Taking measurements in satellite-nodes and transferring them to a central-cortex are two separate activities. Measuring techniques might require higher levels of privilege than is necessary for simply aggregating results. Decouple them and run each one separately with just the right amount of permissions. This reduces the attack surface for possible intrusions.

Why caching is a good idea

Decouple measurement and aggregation to improve security. The simplified logic also reduces the risk of things going wrong. After the first run, a cached result is always available, so there are fewer errors (although it may be stale if the monitoring process has stalled).

There are four basic techniques:

  • Record single measurements in a file
  • Append observations to a log
  • Capture measurements in a rotating buffer and vote on the result
  • Store measurements in a database table

The observations take place in the satellite-nodes where they are cached and the central-cortex independently retrieves the results from the satellite caches when it needs them.

Caching single results in a file

Use this technique for storing the live result of a measurement. It might be a count of processes, open files or whether a server process is running or stopped.

The single greater-than (>) I/O redirection operator always overwrites older values with the latest result.

Create a cache container (my_cache) for your files in the /var/log folder.

Set appropriate file ownership and read access permissions as each file is created. Use the chown command to set the owner, the chgrp command to set the group and the chmod command to set the access permissions.

Beware that if you set the owner to be a different account and restrict the permissions, you may not be able to overwrite the file with a new measurement.

Include the monitoring and fetching accounts in the same group so they can share access. The chmod 640 value gives read-write access to the original file owner and allows other users in the same group to only read the file. Everyone else is denied access.
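This preparation can be scripted once, before any measurements are taken. The sketch below uses /tmp so it runs without elevated privileges; substitute /var/log/my_cache in production. The monitor and monitor_group account names are illustrative, and the chown/chgrp lines are commented out because they require root:

```shell
#!/bin/sh
# Create the cache container and restrict access to the monitoring group.
CACHE_DIR="/tmp/my_cache"               # use /var/log/my_cache in production

mkdir -p "${CACHE_DIR}"
# chown monitor "${CACHE_DIR}"          # owning (monitoring) account - needs root
# chgrp monitor_group "${CACHE_DIR}"    # shared group for the fetching account
chmod 750 "${CACHE_DIR}"                # owner full, group read/list, others denied
```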

This example stores the measurement and sets up the file permissions:

echo "My test results" > /var/log/my_cache/result.dat
chmod 640 /var/log/my_cache/result.dat

The central-cortex would pull the file across like this:

scp {user}@{hostname}:/var/log/my_cache/result.dat {local_file_name}

Logging the output as a list

Use the double greater-than (>>) redirection operator to build up a time-based log of activity. Then analyse for trends after a historical data set is compiled.

Design a rigid convention for the format so that you can analyse the logs consistently later on. Each line should be constructed like this:

  • Date: ISO notation (YYYY-MM-DD)
  • Time: 24-hour notation (HH:MM)
  • Symbolic name: identifies the measurement. Filter with this when different observations share the same log file.
  • Status: indicates a routine value or an exception of some kind (Info, Warning, Error or Fatal)
  • Description: a textual description of the log entry

Choose a unique character to separate each item.
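For example, an entry following this convention could be appended like this, using the vertical bar as the separator. The DISK_FREE symbolic name and the file path are illustrative:

```shell
#!/bin/sh
# Append one observation as: date|time|symbol|status|description
LOG_FILE="/tmp/my_cache/activity.log"   # use /var/log/my_cache in production
mkdir -p "$(dirname "${LOG_FILE}")"

echo "$(date +%Y-%m-%d)|$(date +%H:%M)|DISK_FREE|Info|Free disk space measured" >> "${LOG_FILE}"
```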

Unattended log files grow very large. Running a scheduled job to compress and archive them every day keeps your system neat and tidy.
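The daily archive job can be as simple as a date-stamped rename followed by compression, run from cron. This is a minimal sketch with illustrative paths:

```shell
#!/bin/sh
# Archive the current log under today's date and compress it.
LOG="/tmp/my_cache/activity.log"        # use /var/log/my_cache in production
mkdir -p "$(dirname "${LOG}")"
touch "${LOG}"                          # ensure the log exists for this sketch

ARCHIVE="${LOG}.$(date +%Y-%m-%d)"
mv "${LOG}" "${ARCHIVE}"
gzip -f "${ARCHIVE}"                    # produces activity.log.YYYY-MM-DD.gz
```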

There are other system logs that you might find useful. Many of these also live in the /var/log folder. Some services store their logs differently but they are not hard to find.

Use a rotating buffer

Intermittent failures trigger false warnings if they are observed as a single event. Record half a dozen readings at one-minute intervals and count how many failures are captured. Trigger the warning when the vote is unanimous.

Trim the input file with a tail command whenever a new result is recorded. Redirecting an input file back to itself will destroy it, because the empty output file is created before the input is read. Avoid this by writing to a temporary file and renaming it into place, which is an atomic operation on the same filesystem.

echo {yes_or_no} >> rotating_buffer.dat

tail -6 rotating_buffer.dat > rotating_buffer.dat_
mv rotating_buffer.dat_ rotating_buffer.dat

COUNT=$(grep -c "NO" rotating_buffer.dat)

if [ "${COUNT}" -eq 6 ]
then
   echo "Six consecutive failures - Call for help"
fi

This was used in a high-availability server where a pager call was triggered only after six consecutive NO results. It prevented unwarranted call-outs for the engineers.
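A unanimous vote is the strictest policy. Relaxing it to a majority vote only changes the final comparison; this sketch uses an illustrative threshold of four out of six and some sample buffer data:

```shell
#!/bin/sh
# Vote on the rotating buffer: warn when most of the recent checks failed.
BUF="/tmp/rotating_buffer.dat"
printf 'NO\nYES\nNO\nNO\nYES\nNO\n' > "${BUF}"   # illustrative sample data

COUNT=$(grep -c "NO" "${BUF}")
if [ "${COUNT}" -ge 4 ]
then
   echo "Majority of recent checks failed - Call for help"
fi
```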

Use a database instead of a log file

A database is useful for recording a history of measurements to analyse trends over a very long time. This is better than log files because it avoids log rotation.

The operations team can set up a database for you. It needs to have a minimally privileged user account that allows remote access to write new data. The table configuration is done with a more powerful account.

The results table should have these columns:

  • KEY: a primary key identifies individual measurements so they can be accessed or edited
  • SYMBOL: the symbolic name, used for filtering
  • TIME_STAMP: the timestamp for the measurement supports trend analysis. Use the ISO date format: YYYY-MM-DD HH:MM:SS
  • DATA_TYPE: describe the measurement using one of a limited set of symbolic data types
  • UNITS: describe the units of measure, because all the measurements will be collated in the same table
  • VALUE: the specific value of the measurement is stored separately to facilitate arithmetic operations
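A possible MySQL definition for this table is sketched below. The table name and column sizes are illustrative, and the primary key column is named KEY_ID because KEY is a reserved word in MySQL:

```sql
CREATE TABLE results (
    KEY_ID     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    SYMBOL     VARCHAR(32)  NOT NULL,
    TIME_STAMP DATETIME     NOT NULL,
    DATA_TYPE  VARCHAR(16)  NOT NULL,
    UNITS      VARCHAR(16),
    VALUE      DECIMAL(12,3)
);
```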


Use SQL from the command line

There are three useful techniques to understand when using a mysql command directly from inside a shell script to write data to the database:

• Direct execution of SQL queries from the command line
• Running SQL queries sourced from a separate file
• Running embedded SQL queries with input-redirection

Avoid storing account credentials in scripted commands, because they are visible in ps listings that can be viewed by other users.

Instead, create a file called .my.cnf in your home directory. Configure the database access credentials there without hard wiring them into the scripts. Note the leading dot on the custom config file name. Here is an example:

[client]
user = {db-user-name}
password = {password}

Note that this user name is an account within the database and not an operating system user account.

Where you would previously need to type a command like this (note there is no space between -p and the password; with a space, mysql treats the next word as a database name):

mysql -u {db-user-name} -p{password}

Now, you only need to type the mysql command on its own without the account name and password.

Protect the file against intruders by setting the file access permissions with this command:

chmod 400 .my.cnf

Now it can only be read by the owning account.

If the database is running on a different machine, include the host and port details provided by your operations team. Note that the port flag is a capital -P; the lowercase -p is reserved for the password. We will omit these in subsequent examples for simplicity:

mysql -h {hostname} -P {port number}

Omit the target database name from the configuration file to avoid associating everything with one single database for all tables.

Directly executing SQL from a shell script

Execute queries directly by adding the --execute flag followed by some SQL instructions. The -e flag is a useful abbreviation. This example displays the server version:

mysql -e "STATUS" | grep "Server version"

Sourcing SQL scripts

Encapsulate more complex queries into a separate SQL script file and redirect any messages into a log file:

mysql my_example_db < script.sql > output.log

Specify the target database name on the command line. Alternatively add a USE my_example_db instruction at the start of the SQL script. The query is now more robust because the target database is integral to the script.

USE my_example_db;
SHOW TABLES;

Redirecting embedded SQL to standard input

A here-document is an embedded stream of text that is redirected to the standard-input of a command. The redirection stops when the terminating tag is encountered.

#!/bin/sh

mysql my_database <<SQL_QUERY_SOURCE
SELECT COUNT(*)
FROM my_table_name
WHERE MEASUREMENT_SYMBOL="FILE_COUNT"
SQL_QUERY_SOURCE

Note that the terminating tag must appear alone at the very start of a line, with no leading whitespace, and must be spelled exactly as it was declared after the redirection operator (<<).

Passing arguments from shell scripts

Expand shell variables into the SQL source as the query is written in your script. This works with the --execute flag or a here-document.

#!/bin/sh

PARAMETER="my_table_name"

mysql my_database << SQL_QUERY_SOURCE
SELECT COUNT(*) FROM $PARAMETER;
SQL_QUERY_SOURCE

Aggregating the results

The central-cortex can gather the results from the caches in the satellite-nodes. Or it can query the database for results that are recorded there. Any third-party systems can be accessed via HTTP and if necessary, their measurements can be written to the database or aggregated with the rest of the locally cached data in the cortex.

Conclusion

Caching makes our systems more robust and secure. The design becomes architecturally very simple for nodes that we own and build. The third-party machines should provide API access via HTTP as we discussed earlier.

The measurement data has a lifecycle like this:

• Measurements captured by privileged commands in satellite-nodes
• Cache single live results in data files
• Capture ongoing events into log files
• Capture state values to rotating buffers
• Write measurement histories into a database table
• Results gathered back to the central-cortex for aggregation
• Optional acknowledgement sent back to satellite-nodes in some way so they can garbage collect
• Access third party systems via HTTP from the central-cortex and if necessary, store those observations in a database
• Satellite-nodes clean up any temporary cached files that do not need to be retained
