IP Monitoring & Diagnostics With Command Line Tools: Part 8 - Caching The Results

Storing monitoring outcomes in temporary cache containers separates the observation and diagnostic processes so they can run independently of the centralised marshalling and reporting process.


Maintaining security and integrity is important. Taking measurements in satellite-nodes and transferring them to a central-cortex are two separate activities. Measuring techniques might require higher levels of privilege than are necessary for simply aggregating results. Decouple them and run each one with only the permissions it needs. This reduces the attack surface for possible intrusions.

Why caching is a good idea

Decouple measurement and aggregation to improve security. The simpler logic in each part also reduces the risk of things going wrong. After the first run, a cached result is always available to the aggregator, which means fewer errors (although the value may be stale if the monitoring process has stalled).

There are four basic techniques:

  • Record single measurements in a file
  • Append observations to a log
  • Capture measurements in a rotating buffer and vote on the result
  • Store measurements in a database table

The observations take place in the satellite-nodes, where they are cached. The central-cortex independently retrieves the results from the satellite caches when it needs them.

Caching single results in a file

Use this technique for storing the live result of a measurement. It might be a count of processes, open files or whether a server process is running or stopped.

The single greater-than (>) I/O redirection operator always overwrites older values with the latest result.

Create a cache container (my_cache) for your files in the /var/log folder.

Set appropriate file ownership and read access permissions as each file is created. Use the chown command to set the owner, the chgrp command to set the group and the chmod command to set the access permissions.

Beware that if you set the owner to be a different account and restrict the permissions, you may not be able to overwrite the file with a new measurement.

Include the monitoring and fetching accounts in the same group so they can share access. The chmod 640 value gives read-write access to the file's owner, read-only access to other users in the same group, and no access to everyone else.
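
A minimal setup sketch, assuming a monitoring account called monitor and a shared group called mon_cache (both hypothetical names):

# Create the cache directory (run once, with sufficient privilege).
mkdir /var/log/my_cache

# Hand the directory to the hypothetical monitor account and the
# shared mon_cache group, so the fetching account can read inside it.
chown monitor /var/log/my_cache
chgrp mon_cache /var/log/my_cache
chmod 750 /var/log/my_cache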

This example stores the measurement and sets up the file permissions:

echo "My test results" > /var/log/my_cache/result.dat
chmod 640 /var/log/my_cache/result.dat

The central-cortex would pull the file across like this:

scp {user}@{hostname}:/var/log/my_cache/result.dat {local_file_name}

Logging the output as a list

Use the double greater-than (>>) redirection operator to build up a time-based log of activity. Then analyse for trends once a historical data set has been compiled.

Design a rigid convention for the format so that you can analyse the logs consistently later on. Each line should be constructed like this:

  • Date - ISO notation (YYYY-MM-DD)
  • Time - 24-hour notation (HH:MM)
  • Symbolic name - Identifies the measurement. Filter with this when different observations share the same log file.
  • Status - Indicates a routine value or an exception of some kind: Info, Warning, Error or Fatal.
  • Description - A textual description of the log entry.

Choose a unique separator character that cannot occur inside the items themselves.
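
For example, using a vertical bar as the separator, an observation could be appended like this (the log file name and values are hypothetical):

# Format: Date|Time|Symbolic name|Status|Description
echo "2024-04-01|09:30|FILE_COUNT|Info|42 files in the spool area" >> /var/log/my_cache/activity.log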

Unattended log files grow very large. Running a scheduled job to compress and archive them every day keeps your system neat and tidy.
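
One way to schedule that is a cron job. This sketch assumes the hypothetical activity.log file from the example above; note that % characters must be escaped inside a crontab entry:

# Archive and compress the log just after midnight, every day.
5 0 * * * mv /var/log/my_cache/activity.log /var/log/my_cache/activity.$(date +\%Y\%m\%d) && gzip /var/log/my_cache/activity.$(date +\%Y\%m\%d)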

There are other system logs that you might find useful. Many of these also live in the /var/log folder. Some services store their logs differently but they are not hard to find.

Use a rotating buffer

Intermittent failures trigger false warnings if they are observed as a single event. Record half a dozen readings at one-minute intervals and count how many failures are captured. Trigger the warning when the vote is unanimous.

Trim the input file with a tail command whenever a new result is recorded. Beware that redirecting an input file back to itself will destroy it, because the empty output file is created before the input is read. Avoid this by writing to a temporary file and renaming it into place (the atomic file name technique).

# Append the latest observation to the end of the buffer.
echo {yes_or_no} >> rotating_buffer.dat

# Keep only the six most recent readings. Write to a temporary file
# and rename it - redirecting the file back to itself would truncate
# it before tail could read it.
tail -n 6 rotating_buffer.dat > rotating_buffer.dat_
mv rotating_buffer.dat_ rotating_buffer.dat

# Count how many of the readings are failures.
COUNT=$(grep -c "NO" rotating_buffer.dat)

if [ "${COUNT}" -eq 6 ]
then
   echo "Six consecutive failures - Call for help"
fi

This was used in a high-availability server where a pager call was triggered only after six consecutive NO results. It prevented unwarranted call-outs for the engineers.

Use a database instead of a log file

A database is useful for recording a history of measurements so that trends can be analysed over a very long time. It improves on log files because the storage is managed by the database and there is no need for log rotation.

The operations team can set up a database for you. It needs to have a minimally privileged user account that allows remote access to write new data. The table configuration is done with a more powerful account.
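
As a sketch, using hypothetical account and database names, the operations team might create that account like this:

-- Run with an administrative account.
CREATE USER 'monitor'@'%' IDENTIFIED BY '{password}';

-- The monitoring scripts only need to write new rows.
GRANT INSERT ON my_database.* TO 'monitor'@'%';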

The results table should have these columns:

  • KEY - A primary key that identifies individual measurements so they can be accessed or edited.
  • SYMBOL - The symbolic name of the measurement, used for filtering.
  • TIME_STAMP - The timestamp for the measurement, which supports trend analysis. Use the ISO date format: YYYY-MM-DD HH:MM:SS.
  • DATA_TYPE - Describes the measurement using one of a limited set of symbolic data types.
  • UNITS - Describes the units of measure, because all the measurements are collated in the same table.
  • VALUE - The specific value of the measurement, stored separately to facilitate arithmetic operations.
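
A minimal sketch of that table as a MySQL script, using an illustrative table name and column types (KEY is a reserved word in MySQL, so it is quoted with backticks here):

CREATE TABLE measurements (
    `KEY`      INT AUTO_INCREMENT PRIMARY KEY,
    SYMBOL     VARCHAR(32) NOT NULL,
    TIME_STAMP DATETIME NOT NULL,
    DATA_TYPE  VARCHAR(16) NOT NULL,
    UNITS      VARCHAR(16),
    VALUE      DOUBLE
);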


Use SQL from the command line

There are three useful techniques to understand when running the mysql command directly from inside a shell script to write data to the database:

• Direct execution of SQL queries from the command line
• Source running SQL queries from a separate file
• Running embedded SQL queries with input-redirection

Avoid storing account credentials in scripted commands, because they are visible in ps listings that can be viewed by other users.

Instead, create a file called .my.cnf in your home directory. Configure the database access credentials there without hard wiring them into the scripts. Note the leading dot on the custom config file name. Here is an example:

[client]
user = {db-user-name}
password = {password}

Note that this user name is an account within the database and not an operating system user account.

Where you would previously need to type a command like this (note that there is no space between -p and the password):

mysql -u {db-user-name} -p{password}

Now, you only need to type the mysql command on its own without the account name and password.

Protect the file against intruders by restricting its access permissions with this command:

chmod 400 .my.cnf

Now it can only be read by the owning account.

If the database is running on a different machine, include the host and port details provided by your operations team. We will omit those in subsequent examples for simplicity:

mysql -h {hostname} -P {port number}

Omit the target database name from the configuration file so that every script is not tied to one single database.

Directly executing SQL from a shell script

Execute queries directly by adding the --execute flag followed by some SQL instructions. The -e flag is a useful abbreviation. This example displays the server version:

mysql -e STATUS | grep "^ Server version"
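
The same flag can write a measurement directly. This sketch assumes the hypothetical measurements table defined earlier:

mysql my_database -e "INSERT INTO measurements (SYMBOL, TIME_STAMP, DATA_TYPE, UNITS, VALUE) VALUES ('FILE_COUNT', NOW(), 'COUNT', 'files', 42);"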

Source running SQL scripts

Encapsulate more complex queries into a separate SQL script file and redirect any messages into a log file:

mysql my_example_db < script.sql > output.log

Specify the target database name on the command line. Alternatively add a USE my_example_db instruction at the start of the SQL script. The query is now more robust because the target database is integral to the script.

USE my_example_db;
SHOW TABLES;

Redirecting embedded SQL to standard input

A here-document is an embedded stream of text that is redirected to the standard-input of a command. The redirection stops when the terminating tag is encountered.

#!/bin/sh

# The query is embedded in the script as a here-document.
mysql my_database <<SQL_QUERY_SOURCE
SELECT COUNT(*)
FROM my_table_name
WHERE MEASUREMENT_SYMBOL="FILE_COUNT";
SQL_QUERY_SOURCE

Note that the redirection operator (<<) may be written with or without a space before the tag, but the terminating tag must appear at the start of its own line, with no leading whitespace, and must be spelled exactly the same as the opening tag.

Passing arguments from shell scripts

Substitute parameters from your shell script into the query source when you construct the SQL. This works with the --execute flag or with a here-document.

#!/bin/sh

PARAMETER="my_table_name"

# The here-document tag is unquoted, so the shell expands
# $PARAMETER before the query reaches mysql.
mysql my_database << SQL_QUERY_SOURCE
SELECT COUNT(*) FROM $PARAMETER;
SQL_QUERY_SOURCE

Aggregating the results

The central-cortex can gather the results from the caches in the satellite-nodes, or it can query the database for results that are recorded there. Any third-party systems can be accessed via HTTP and, if necessary, their measurements can be written to the database or aggregated with the rest of the locally cached data in the cortex.
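
A minimal gathering sketch, assuming a hypothetical hosts.txt file that lists the satellite-node names one per line, and the monitor account used earlier:

#!/bin/sh

# Pull the latest cached result from every satellite-node.
while read NODE
do
   scp monitor@${NODE}:/var/log/my_cache/result.dat ./results/${NODE}.dat
done < hosts.txt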

Conclusion

Caching makes our systems more robust and secure. The design becomes architecturally very simple for nodes that we own and build. The third-party machines should provide API access via HTTP as we discussed earlier.

The measurement data has a lifecycle like this:

• Measurements captured by privileged commands in satellite-nodes
• Cache long form results in data files
• Capture ongoing events into log files
• Capture state values to rotating buffers
• Write single measurements into a database
• Results gathered back to the central-cortex for aggregation
• Optional acknowledgement sent back to satellite-nodes in some way so they can garbage collect
• Access third party systems via HTTP from the central-cortex and if necessary, store those observations in a database
• Satellite-nodes clean up any temporary cached files that do not need to be retained
