IP Monitoring & Diagnostics With Command Line Tools: Part 8 - Caching The Results

Storing monitoring outcomes in temporary cache containers separates the observation and diagnostic processes so they can run independently of the centralised marshalling and reporting process.


Maintaining security and integrity is important. Taking measurements in satellite-nodes and transferring them to a central-cortex are two separate activities. Measuring techniques might require higher levels of privilege than is necessary for simply aggregating results. Decouple them and run each one separately with just the right amount of permissions. This reduces the attack surface for possible intrusions.

Why caching is a good idea

Decouple measurement and aggregation to improve security. The simplified logic also reduces the risk of things going wrong. After the first run, a cached result is always available, so there are fewer errors (although it may be stale if the monitoring process has stalled).

There are four basic techniques:

  • Record single measurements in a file
  • Append observations to a log
  • Capture measurements in a rotating buffer and vote on the result
  • Store measurements in a database table

The observations take place in the satellite-nodes where they are cached and the central-cortex independently retrieves the results from the satellite caches when it needs them.

Caching single results in a file

Use this technique for storing the live result of a measurement. It might be a count of processes, open files or whether a server process is running or stopped.

The single greater-than (>) I/O redirection operator always overwrites older values with the latest result.

Create a cache container (my_cache) for your files in the /var/log folder.

Set appropriate file ownership and read access permissions as each file is created. Use the chown command to set the owner, the chgrp command to set the group and the chmod command to set the access permissions.

Beware that if you set the owner to be a different account and restrict the permissions, you may not be able to overwrite the file with a new measurement.

Include the monitoring and fetching accounts in the same group so they can share access. The chmod 640 value gives read-write access to the original file owner and allows other users in the same group to only read the file. Everyone else is denied access.
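This preparation can be scripted once, before any measurements are taken. The sketch below uses /tmp so it runs without elevated privileges; substitute /var/log/my_cache in production. The monitor and monitor_group account names are illustrative, and the chown/chgrp lines are commented out because they require root:

```shell
#!/bin/sh
# Create the cache container and restrict access to the monitoring group.
CACHE_DIR="/tmp/my_cache"               # use /var/log/my_cache in production

mkdir -p "${CACHE_DIR}"
# chown monitor "${CACHE_DIR}"          # owning (monitoring) account - needs root
# chgrp monitor_group "${CACHE_DIR}"    # shared group for the fetching account
chmod 750 "${CACHE_DIR}"                # owner full, group read/list, others denied
```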

This example stores the measurement and sets up the file permissions:

echo "My test results" > /var/log/my_cache/result.dat
chmod 640 /var/log/my_cache/result.dat

The central-cortex would pull the file across like this:

scp {user}@{hostname}:/var/log/my_cache/result.dat {local_file_name}

Logging the output as a list

Use the double greater-than (>>) redirection operator to build up a time-based log of activity. Then analyse for trends after a historical data set is compiled.

Design a rigid convention for the format so that you can analyse the logs consistently later on. Each line should be constructed like this:

  • Date: ISO notation (YYYY-MM-DD)
  • Time: 24-hour notation (HH:MM)
  • Symbolic name: identifies the measurement. Filter with this when different observations share the same log file.
  • Status: indicates a routine value or an exception of some kind (Info, Warning, Error or Fatal)
  • Description: a textual description of the log entry

Choose a unique character to separate each item.
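For example, an entry following this convention could be appended like this, using the vertical bar as the separator. The DISK_FREE symbolic name and the file path are illustrative:

```shell
#!/bin/sh
# Append one observation as: date|time|symbol|status|description
LOG_FILE="/tmp/my_cache/activity.log"   # use /var/log/my_cache in production
mkdir -p "$(dirname "${LOG_FILE}")"

echo "$(date +%Y-%m-%d)|$(date +%H:%M)|DISK_FREE|Info|Free disk space measured" >> "${LOG_FILE}"
```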

Unattended log files grow very large. Running a scheduled job to compress and archive them every day keeps your system neat and tidy.
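The daily archive job can be as simple as a date-stamped rename followed by compression, run from cron. This is a minimal sketch with illustrative paths:

```shell
#!/bin/sh
# Archive the current log under today's date and compress it.
LOG="/tmp/my_cache/activity.log"        # use /var/log/my_cache in production
mkdir -p "$(dirname "${LOG}")"
touch "${LOG}"                          # ensure the log exists for this sketch

ARCHIVE="${LOG}.$(date +%Y-%m-%d)"
mv "${LOG}" "${ARCHIVE}"
gzip -f "${ARCHIVE}"                    # produces activity.log.YYYY-MM-DD.gz
```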

There are other system logs that you might find useful. Many of these also live in the /var/log folder. Some services store their logs differently but they are not hard to find.

Use a rotating buffer

Intermittent failures trigger false warnings if they are observed as a single event. Record half a dozen readings at one-minute intervals and count how many failures are captured. Trigger the warning when the vote is unanimous.

Trim the input file with a tail command whenever a new result is recorded. Redirecting an input file back to itself will destroy it, because the empty output file is created before the input is read. Avoid this by writing to a temporary file and renaming it into place, which is an atomic operation on the same filesystem.

echo {yes_or_no} >> rotating_buffer.dat

tail -6 rotating_buffer.dat > rotating_buffer.dat_
mv rotating_buffer.dat_ rotating_buffer.dat

COUNT=$(grep -c "NO" rotating_buffer.dat)

if [ "${COUNT}" -eq 6 ]
then
   echo "Six consecutive failures - Call for help"
fi

This was used in a high-availability server where a pager call was triggered only after six consecutive NO results. It prevented unwarranted call-outs for the engineers.
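A unanimous vote is the strictest policy. Relaxing it to a majority vote only changes the final comparison; this sketch uses an illustrative threshold of four out of six and some sample buffer data:

```shell
#!/bin/sh
# Vote on the rotating buffer: warn when most of the recent checks failed.
BUF="/tmp/rotating_buffer.dat"
printf 'NO\nYES\nNO\nNO\nYES\nNO\n' > "${BUF}"   # illustrative sample data

COUNT=$(grep -c "NO" "${BUF}")
if [ "${COUNT}" -ge 4 ]
then
   echo "Majority of recent checks failed - Call for help"
fi
```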

Use a database instead of a log file

A database is useful for recording a history of measurements to analyse trends over a very long time. This is better than log files because it avoids log rotation.

The operations team can set up a database for you. It needs to have a minimally privileged user account that allows remote access to write new data. The table configuration is done with a more powerful account.

The results table should have these columns:

  • KEY: a primary key identifies individual measurements so they can be accessed or edited
  • SYMBOL: the symbolic name, used for filtering
  • TIME_STAMP: the timestamp for the measurement supports trend analysis. Use the ISO date format: YYYY-MM-DD HH:MM:SS
  • DATA_TYPE: describe the measurement using one of a limited set of symbolic data types
  • UNITS: describe the units of measure, because all the measurements will be collated in the same table
  • VALUE: the specific value of the measurement is stored separately to facilitate arithmetic operations
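A possible MySQL definition for this table is sketched below. The table name and column sizes are illustrative, and the primary key column is named KEY_ID because KEY is a reserved word in MySQL:

```sql
CREATE TABLE results (
    KEY_ID     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    SYMBOL     VARCHAR(32)  NOT NULL,
    TIME_STAMP DATETIME     NOT NULL,
    DATA_TYPE  VARCHAR(16)  NOT NULL,
    UNITS      VARCHAR(16),
    VALUE      DECIMAL(12,3)
);
```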


Use SQL from the command line

There are three useful techniques to understand when using a mysql command directly from inside a shell script to write data to the database:

• Direct execution of SQL queries from the command line
• Running SQL queries sourced from a separate file
• Running embedded SQL queries with input-redirection

Avoid storing account credentials in scripted commands, because they are visible in ps listings that can be viewed by other users.

Instead, create a file called .my.cnf in your home directory. Configure the database access credentials there without hard wiring them into the scripts. Note the leading dot on the custom config file name. Here is an example:

[client]
user = {db-user-name}
password = {password}

Note that this user name is an account within the database and not an operating system user account.

Where you would previously need to type a command like this (note there is no space between -p and the password; with a space, mysql treats the next word as a database name):

mysql -u {db-user-name} -p{password}

Now, you only need to type the mysql command on its own without the account name and password.

Protect the file against intruders by setting the file access permissions with this command:

chmod 400 .my.cnf

Now it can only be read by the owning account.

If the database is running on a different machine, include the host and port details provided by your operations team. Note that the port flag is a capital -P; the lowercase -p is reserved for the password. We will omit these in subsequent examples for simplicity:

mysql -h {hostname} -P {port number}

Omit the target database name from the configuration file to avoid associating everything with one single database for all tables.

Directly executing SQL from a shell script

Execute queries directly by adding the --execute flag followed by some SQL instructions. The -e flag is a useful abbreviation. This example displays the server version:

mysql -e "STATUS" | grep "Server version"

Sourcing SQL scripts

Encapsulate more complex queries into a separate SQL script file and redirect any messages into a log file:

mysql my_example_db < script.sql > output.log

Specify the target database name on the command line. Alternatively add a USE my_example_db instruction at the start of the SQL script. The query is now more robust because the target database is integral to the script.

USE my_example_db;
SHOW TABLES;

Redirecting embedded SQL to standard input

A here-document is an embedded stream of text that is redirected to the standard-input of a command. The redirection stops when the terminating tag is encountered.

#!/bin/sh

mysql my_database <<SQL_QUERY_SOURCE
SELECT COUNT(*)
FROM my_table_name
WHERE MEASUREMENT_SYMBOL="FILE_COUNT"
SQL_QUERY_SOURCE

Note that the terminating tag must appear alone at the very start of a line, with no leading whitespace, and must be spelled exactly as it was declared after the redirection operator (<<).

Passing arguments from shell scripts

Expand shell variables into the SQL source as the query is written in your script. This works with the --execute flag or a here-document.

#!/bin/sh

PARAMETER="my_table_name"

mysql my_database << SQL_QUERY_SOURCE
SELECT COUNT(*) FROM $PARAMETER;
SQL_QUERY_SOURCE

Aggregating the results

The central-cortex can gather the results from the caches in the satellite-nodes. Or it can query the database for results that are recorded there. Any third-party systems can be accessed via HTTP and if necessary, their measurements can be written to the database or aggregated with the rest of the locally cached data in the cortex.

Conclusion

Caching makes our systems more robust and secure. The design becomes architecturally very simple for nodes that we own and build. The third-party machines should provide API access via HTTP as we discussed earlier.

The measurement data has a lifecycle like this:

• Measurements captured by privileged commands in satellite-nodes
• Cache single live results in data files
• Capture ongoing events into log files
• Capture state values to rotating buffers
• Write measurement histories into a database table
• Results gathered back to the central-cortex for aggregation
• Optional acknowledgement sent back to satellite-nodes in some way so they can garbage collect
• Access third party systems via HTTP from the central-cortex and if necessary, store those observations in a database
• Satellite-nodes clean up any temporary cached files that do not need to be retained
