IP Monitoring & Diagnostics With Command Line Tools: Part 12 - Pulling It All Together

When the distributed monitoring system is deployed and running, gather the results and present them on wall-mounted displays, desktop browsers, mobile phones or tablets.


Present the monitoring results in a useful and human friendly fashion. Putting a display on the wall showing the current system status is straightforward. Drive the display with web pages that can also be viewed on desktops and mobile devices.

System Overview Displays

Implement the system status displays as web pages that automatically update the details on a regular basis. An inexpensive Raspberry Pi single-board computer with an HDMI output will drive a wall-mounted display screen. Configure the Raspberry Pi to auto-start its web browser with a preset starting page so the display comes up on its own after a reboot.

Similar web pages viewed on the support team desktop screens will support clickable widgets to call up more detailed information from each item. When the engineers visit the server room, they can observe the effects of their work on a tablet or rack mounted console.

There are initially three basic kinds of display needed in an operation control centre:

  • Network diagram.
  • Status board.
  • Arrivals board.

There are many ways to visualise the measurement results data, especially if you want to drill down and analyse long-term trends. Add your own ideas for more diverse and useful displays.

Why Are The Symbolic Names So Important?

Every measurement is tagged with a hostname, measurement symbolic name and a timestamp.

When analysing measurements, the hostname is used as a filter to select one machine.

The timestamp is used to create time-windows, compare values against earlier measurements or ensure you have the latest recorded result.
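The hostname filter and the time-window can be combined in a few lines. This is only a minimal sketch: the record shape (host, name, timestamp, value) and the function name latestPerMeasurement are assumptions made for illustration, and the timestamps are assumed to be ISO 8601 strings.

```javascript
// Hypothetical record shape: { host, name, timestamp, value }
// where timestamp is an ISO 8601 string such as "2024-01-01T10:00:00Z".

// Return a Map of symbolic name -> most recent record for one host,
// considering only records inside the window [windowStart, windowEnd]
// (both given in milliseconds since the epoch).
function latestPerMeasurement(records, host, windowStart, windowEnd) {
  const latest = new Map();
  for (const rec of records) {
    if (rec.host !== host) continue;                  // hostname filter
    const t = Date.parse(rec.timestamp);
    if (t < windowStart || t > windowEnd) continue;   // time-window filter
    const seen = latest.get(rec.name);
    if (!seen || t > Date.parse(seen.timestamp)) {
      latest.set(rec.name, rec);                      // keep the newest result
    }
  }
  return latest;
}
```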

The symbolic names propagate from the initial detection to the display manager. They must always be spelled consistently throughout because they are used to construct SQL queries, fetch cached data and merge results from several hosts.

The HTML widget elements in the display also use the symbolic names to create consistent ID values and embed metadata. The JavaScript code in the page can exploit that metadata to construct XHR requests that fetch new results from the caches or database and update the display with the latest data.
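As an illustration of how the names can drive a request, here is a small sketch that builds a query URL from a hostname and a list of symbolic names. The endpoint path, the parameter names (host, names) and the comma-separated list format are all hypothetical; use whatever your own server-side script expects.

```javascript
// Build a request URL from the metadata carried by the widgets.
// "endpoint", "host" and "names" are illustrative assumptions,
// not part of the original design.
function buildResultsUrl(endpoint, host, symbolicNames) {
  const params = new URLSearchParams();
  params.set("host", host);
  params.set("names", symbolicNames.join(","));
  return endpoint + "?" + params.toString();
}

// Example with a hypothetical endpoint:
// buildResultsUrl("/results.php", "NODE_NAME", ["DISK_SPACE", "CPU_LOAD"]);
```

Because the same spelling is used in the page and in the database, the server can pass these names straight into its lookup logic after validating them.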

Building The Network Diagram Screen

Draw a picture of your network in an illustrator app with each host node as a rectangle. Inside the rectangle, add placeholder text blocks with recognisable dummy strings to describe the measurements. Add a separate text block for each value you intend to update with new results. Now save the diagram as a Scalable Vector Graphic (SVG) file.

Open the SVG in a code editor to see the raw SVG code. Remove the unnecessary header items (for example, the XML declaration and editor metadata, which are not needed when the SVG is embedded in a page). Look for your recognisable dummy strings. They may be inside <text> or <tspan> tags. Add an id attribute to each of those tags, with a value constructed from the {target-hostname} and {measurement-symbolic-name} separated by an underscore character (_). These ID values must each be unique within the page, so append a suffix if they appear more than once. Do not alter any other attributes on the tags.

<text id="NODE_NAME_DISK_SPACE">{percentage-value}</text>
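The naming rule is simple enough to capture in a helper so that the diagram and the update script cannot drift apart. The sketch below is one possible approach; the function names and the numeric suffix format (_2, _3 and so on) are assumptions, since any suffix that keeps the values unique will do.

```javascript
// Join hostname and symbolic name with an underscore.
function makeElementId(hostName, symbolicName) {
  return hostName + "_" + symbolicName;
}

// Given a list of { host, name } placeholders in page order, return one
// ID per placeholder, appending a numeric suffix to repeats so that
// every ID is unique within the page.
function assignUniqueIds(placeholders) {
  const counts = new Map();
  return placeholders.map(({ host, name }) => {
    const base = makeElementId(host, name);
    const n = (counts.get(base) || 0) + 1;
    counts.set(base, n);
    return n === 1 ? base : base + "_" + n;
  });
}
```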

Embed the SVG into your web page when you are done. The SVG is a first-class citizen in a web page and JavaScript interacts directly with the object model constructed from it.

Write the JavaScript to request the latest data from the database. Call the server with an XHR request to avoid reloading the page. On the server, implement the SQL query in PHP and return a JavaScript Object Notation (JSON) formatted payload as a response. Parse the JSON result with JavaScript to extract the hostname, symbolic measurement name and the new value for each measurement.
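The parsing step might look something like the following sketch. The payload shape is an assumption: a JSON array of objects with host, name and value properties. Adjust the property names to match whatever your PHP script actually returns.

```javascript
// Parse the response text and keep only well-formed entries.
// Assumed payload: [{ "host": "...", "name": "...", "value": ... }, ...]
function parseResults(jsonText) {
  let payload;
  try {
    payload = JSON.parse(jsonText);
  } catch (err) {
    return [];                       // malformed JSON: nothing to update
  }
  if (!Array.isArray(payload)) return [];
  return payload.filter(
    (r) =>
      r !== null &&
      typeof r === "object" &&
      typeof r.host === "string" &&
      typeof r.name === "string" &&
      "value" in r
  );
}
```

Returning an empty list on bad input is a defensive choice: the display keeps its previous values rather than failing part-way through an update.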

Iterate through the new results. Assemble the HTML Element ID in the script using the same rules as the diagram object and search for the object in the Document Object Model (DOM) using a getElementById() function call. The returned object has a textContent property. Store the new value there and the browser will update the display immediately. Here is a fragment of JavaScript to update a displayed item as an example:

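// Example values; in practice these come from the parsed JSON response.
const myNewValue = "75%";
const myTargetHostName = "NODE_NAME";
const mySymbolicName = "DISK_SPACE";

// Build the element ID with the same rule used in the diagram.
const myTargetId = myTargetHostName + "_" + mySymbolicName;
const myTargetObject = document.getElementById(myTargetId);

// getElementById() returns null when no element has that ID.
if (myTargetObject !== null) {
  myTargetObject.textContent = myNewValue;
}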
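To apply a whole batch of results, the same steps can be wrapped in a loop. In this sketch the function name applyResults and the result shape (host, name, value) are assumptions. The document object is passed in as a parameter so the function can be exercised outside a browser; in a page you would pass the global document.

```javascript
// Update every placeholder that has a matching result.
// "doc" is any object with a getElementById() method.
// Returns the number of elements that were updated.
function applyResults(doc, results) {
  let updated = 0;
  for (const { host, name, value } of results) {
    const target = doc.getElementById(host + "_" + name);
    if (!target) continue;           // no placeholder for this measurement
    target.textContent = String(value);
    updated += 1;
  }
  return updated;
}
```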

If you want to highlight the containing rectangle to indicate the host status, define the ID value to be just the host name:

<rect id="NODE_NAME" ... />

Use JavaScript to locate the host named rectangle object and change the fill property with a new colour to indicate the node status:

// Locate the rectangle for this host and recolour it.
const myHostRectangle = document.getElementById("NODE_NAME");

if (myHostRectangle !== null) {
  myHostRectangle.style.fill = "red";
}
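Choosing the colour is usually a matter of comparing a value against thresholds. The sketch below assumes the measurement is a percentage of disk space used and that 80% and 95% are the warning and critical levels. Those thresholds, the colour names and the function names are illustrative assumptions only.

```javascript
// Hypothetical mapping from status to fill colour.
const STATUS_COLOURS = { ok: "green", warning: "orange", critical: "red" };

// Classify a percentage-used figure against assumed thresholds.
function diskStatus(percentUsed, warnAt = 80, criticalAt = 95) {
  if (percentUsed >= criticalAt) return "critical";
  if (percentUsed >= warnAt) return "warning";
  return "ok";
}

// Recolour the rectangle whose ID is the host name.
// Returns false when the host has no rectangle in the diagram.
function colourHost(doc, hostName, status) {
  const rect = doc.getElementById(hostName);
  if (!rect) return false;
  rect.style.fill = STATUS_COLOURS[status] || "grey"; // grey = unknown status
  return true;
}
```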

Encapsulate the whole update process in a JavaScript function and call it with a setInterval() timer to schedule it to run on a regular basis. Every minute is fine since that is the granularity of cron when it runs the measurement probes.
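A scheduling wrapper could be sketched as follows. The name startPolling is an assumption, as is the guard that skips a tick while the previous update is still running. The timer function is a parameter (defaulting to setInterval) purely so the logic can be tested without waiting for real time to pass.

```javascript
// Run updateFn once immediately, then every intervalMs milliseconds.
// If an update is still in progress when the timer fires, that tick
// is skipped so slow responses cannot pile up.
function startPolling(updateFn, intervalMs, timer = setInterval) {
  let running = false;
  const tick = async () => {
    if (running) return;
    running = true;
    try {
      await updateFn();
    } catch (err) {
      console.error("Display update failed:", err);
    } finally {
      running = false;
    }
  };
  tick();                            // populate the display straight away
  return timer(tick, intervalMs);    // handle can be passed to clearInterval()
}
```

In a page, a call such as startPolling(refreshDisplay, 60 * 1000) would refresh once a minute, where refreshDisplay stands for your own update function.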

Building The Status Board Screen

The status board is a web page whose layout is dynamically controlled by a database table. This layout steering table has the following columns:

| Column | Description |
| --- | --- |
| Primary key ID | Used to create a unique HTML element ID on the page. |
| Host name | Required to filter results from the measurement cache in the database. |
| Symbolic process name | Identifies which result value to use as a value source. |
| Selected widget type | The type of widget display determines whether we only need the latest value for a numeric cell or a range of values to draw a small graph. Other formats are possible. |
| Left | Left position on screen. |
| Top | Top position on screen. |
| Width | Width of the widget container box. |
| Height | Height of the widget container box. |
| Background colour | The default background colour. |

The page building logic requests the display controlling records and iterates through them. Each one provides the information needed to dynamically create a <div> element and position it on the page:

Set the ID of the <div> block container to a unique value.

<div ID="widget_{primary-key-id}">

Construct a CSS style block from the database values and add this as a style="" attribute inside the opening <div> tag. The values shown here would be derived from the database query result:

style="position: absolute;
       top: {top-value}px;
       left: {left-value}px;
       height: {height-value}px;
       width: {width-value}px;
       background-color: {colour-value};"
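Assembling the opening tag from one layout record might look like this sketch. The property names on the row object (id, top, left, height, width, colour) are assumptions standing in for your own column names. The numeric and colour checks are a defensive addition so that a bad database value cannot inject arbitrary markup.

```javascript
// Build the style attribute value from one layout-table row.
function widgetStyle(row) {
  const px = (v) => {
    const n = Number(v);
    if (!Number.isFinite(n)) {
      throw new TypeError("Layout value is not numeric: " + v);
    }
    return Math.round(n) + "px";
  };
  // Accept only hex colours or plain colour names; fall back otherwise.
  const colourOk = /^(#[0-9a-fA-F]{3,8}|[a-zA-Z]+)$/.test(row.colour);
  return [
    "position: absolute",
    "top: " + px(row.top),
    "left: " + px(row.left),
    "height: " + px(row.height),
    "width: " + px(row.width),
    "background-color: " + (colourOk ? row.colour : "transparent"),
  ].join("; ") + ";";
}

// Build the opening <div> tag for the widget container.
function widgetOpenTag(row) {
  return '<div id="widget_' + Number(row.id) + '" style="' + widgetStyle(row) + '">';
}
```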

The inner content of the <div> block depends on the type of widget. A simple value can place a number inside a <span> block. A series of values could have a small table grid. You could insert an SVG to draw a graph or insert an image (<img>) tag to mimic a display indicator LED or other iconic symbol. Carefully factor the design of these widgets, and create a library of reusable code to draw them.
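One way to organise such a library is a lookup table of renderer functions keyed by widget type. The sketch below is only a starting point: the type names ("value" and "table"), the function names and the generated markup are all assumptions.

```javascript
// Escape text before placing it inside HTML markup.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical widget types mapped to functions returning inner HTML.
// Each renderer receives the element ID and an array of values,
// oldest first.
const WIDGET_RENDERERS = new Map([
  // Single numeric cell: show only the latest value.
  ["value", (id, values) =>
    '<span id="' + id + '">' + escapeHtml(values[values.length - 1]) + "</span>"],
  // Small table grid: one cell per value.
  ["table", (id, values) =>
    '<table id="' + id + '"><tr>' +
    values.map((v) => "<td>" + escapeHtml(v) + "</td>").join("") +
    "</tr></table>"],
]);

function renderWidgetContent(type, id, values) {
  const render = WIDGET_RENDERERS.get(type);
  if (!render) throw new Error("Unknown widget type: " + type);
  return render(id, values);
}
```

Adding a new kind of widget then means adding one entry to the table rather than editing the page building logic.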

Use the host and symbolic names to identify values within the widgets and update them periodically like the network status diagram. The refresh logic can update graphs, pie-charts and progress-bars. If you design your widget collection properly, the same drawing code can be reused multiple times.

Building The Arrivals Board Screen

An arrivals board is similar to the one you see at an airport. Use this to display the media processing queues. Track the workflow job status dispositions in a simple table grid. Each row represents one job running through the workflow. The columns indicate the various attributes of the jobs. The workflow manager can update a cache with progress information as the jobs run. That cached progress status can be acquired by the arrivals board update logic.

Here are some ideas for the columns you might want to implement:

| Column | Description |
| --- | --- |
| Job name | Identifies the job. |
| Submitter | Identifies who submitted the job. |
| Type | You may be processing multiple kinds of jobs. |
| Current disposition | What stage of processing the job is currently at. |
| Status | Indicates whether the job is waiting, running, completed or failed. |
| Submit timestamp | When the job was submitted. |
| Processing started | When the job started processing. |
| Completion timestamp | When the job completed. |
| Location | Node name where the job is running. |
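Before drawing the rows, it can help to sort the jobs so the ones needing attention appear first. This is a sketch under assumptions: the job objects have status and submitted properties (the latter an ISO 8601 string), and the priority order shown is just one reasonable choice.

```javascript
// Assumed display priority: failures at the top, finished jobs last.
const STATUS_ORDER = ["failed", "running", "waiting", "completed"];

// Return a new array sorted by status priority, then by submit time
// (oldest first). Unrecognised statuses sort to the bottom.
function sortJobsForBoard(jobs) {
  const rank = (status) => {
    const i = STATUS_ORDER.indexOf(status);
    return i === -1 ? STATUS_ORDER.length : i;
  };
  return [...jobs].sort((a, b) => {
    const byStatus = rank(a.status) - rank(b.status);
    if (byStatus !== 0) return byStatus;
    return Date.parse(a.submitted) - Date.parse(b.submitted);
  });
}
```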


 

The Report Manager

Each morning, the central cortex gathers measurements from the caches and runs the daily analysis. Filter and process the results and deliver the daily, weekly, monthly and other reports automatically by email.

Collating the results into reports or aggregating them for display from a SQL database cache is very easy to do. Use TCPDF with PHP to manufacture PDF reports and PHPMailer to dispatch them. Both of these libraries are open source and very easy to use.
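The aggregation itself is a simple group-and-summarise pass. In the setup described here it would more naturally live in the SQL query or the PHP report script; the sketch below uses JavaScript only for consistency with the other fragments in this article. The record shape (host, name, numeric value) and the function name summarise are assumptions.

```javascript
// Group records by host and symbolic name, then compute the count,
// minimum, maximum and mean of the values in each group.
function summarise(records) {
  const groups = new Map();
  for (const { host, name, value } of records) {
    const key = host + "_" + name;
    const g = groups.get(key) ||
      { host, name, count: 0, min: Infinity, max: -Infinity, sum: 0 };
    g.count += 1;
    g.min = Math.min(g.min, value);
    g.max = Math.max(g.max, value);
    g.sum += value;
    groups.set(key, g);
  }
  // Replace the running sum with the mean in the returned rows.
  return [...groups.values()].map(({ sum, ...rest }) => ({
    ...rest,
    mean: sum / rest.count,
  }));
}
```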

Conclusion

Very few code changes are needed to alter the behaviour when the measuring system is data-driven. This significantly reduces the maintenance overhead. Because things are controlled by data and configuration files, implementing a dashboard control surface is quite easy.

Create new measurement tools and drop them into one of the probe containers. The only new code to write is the kernel of each new measurement.

Modifying the layout and content of the status display just requires some minimal changes to the SQL database table that steers the display generator. The dashboard can manage changes to that.

The major key to flexibility in this design is the use of unique symbolic names for each measurement and how they are propagated through the entire monitoring complex.

In closing, here are some prime directives to bear in mind when designing your own monitoring solution:

  • Always look for opportunities to pass data values to modify script behaviour rather than duplicating code.
  • Go the extra mile in your code so your users do not have to perform complex actions.
  • Design for easy maintainability, even at the expense of brevity, and avoid obfuscated single lines of code.
  • Avoid namespace collisions with carefully designed file system structures.
  • Use defensive coding techniques to pre-empt problems.
  • Comment everything in the source code.
  • Design your data flow so that things only need to be defined and measured once.
  • Document everything thoroughly and keep it up to date with changes.
  • Provide contextual online help to the users where appropriate.
  • Read the UNIX man pages in full for a command before using it and look online for examples that illustrate how it is used.
