In the previous article in this series we looked at advanced server security and how the controller within a hard disk drive or SSD can be vulnerable to hacking even with the most advanced firewalls and anti-virus software. In this article, we delve deeper into the remedies and how Lights Out Control further ensures safe server operation.
As well as providing external control of a server, some out-of-band control systems also help significantly improve security. iLO (Integrated Lights Out) is one version of this and is a proprietary embedded design from Hewlett Packard that solves two challenges. Firstly, it allows servers to be controlled even when the operating system is not running, thus providing access to all the peripheral devices, and secondly, it provides unparalleled levels of hardware and server security.
In an old-school datacenter, IT engineers had to be near the equipment to carry out certain hardware tasks. For example, if a server needed a power cycle or an operating system installed, an engineer had to connect a CD-ROM drive or USB device holding the relevant media. Many maintenance operations also needed a basic operating system to be running on the server, otherwise it could not communicate over the Ethernet port.
As datacenters developed and their capacity increased, physically running up and down racks of equipment trying to find the correct server became both inefficient and dangerous. Power cycling the wrong server could have disastrous consequences, and loading operating systems or virtualization software onto multiple servers could prove to be a challenge, especially if they were physically dispersed within a datacenter or across multiple datacenters.
The iLO design is an embedded circuit board that sits inside the server but is its own independent system. It has its own ethernet port and IP address to facilitate an external connection, and more importantly, it operates even if the server is powered off.
Fig 1 – iLO checks all peripheral devices including the UEFI and BIOS before loading the operating system. As well as providing better monitoring and control for maintenance, security is significantly improved.
It is possible to control the whole server remotely as if you were sitting next to it. The IT engineer can log onto the iLO card to power the server on and off, and to connect virtual CD-ROM and USB memory devices for maintenance and for loading the operating system.
Using management software, an IT engineer can load the operating system into multiple servers all over the world without having to leave their desk. The monitoring and diagnosis functions help flag any issues quickly so they can be rectified, and the monitoring and logging system provides detailed information about the server. This includes power supply voltage levels, CPU temperatures, fan speeds, memory capacity, whether the CPU is working or not, the CPU types fitted and much more.
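As a rough illustration of how monitored values like these can be turned into flags, the sketch below checks a set of readings against limits. The reading names, the limits and the sample values are all invented for the example and are not taken from iLO or any HPE interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits, chosen only for illustration.
LIMITS = {
    "psu_voltage_v": Limit(11.4, 12.6),
    "cpu_temp_c": Limit(5.0, 85.0),
    "fan_speed_rpm": Limit(1500.0, 12000.0),
}


def out_of_range(readings):
    """Return one message per reading that falls outside its limit."""
    alerts = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no limit defined for this reading
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside {limit.low}..{limit.high}")
    return alerts


# Made-up sample readings from one server.
sample = {"psu_voltage_v": 12.1, "cpu_temp_c": 91.0, "fan_speed_rpm": 4200.0}
for alert in out_of_range(sample):
    print(alert)  # cpu_temp_c=91.0 outside 5.0..85.0
```

In practice the readings would come from the management interface rather than a literal dictionary, and the alerts would be forwarded to a central logging tool.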
But iLO has developed over the years to include deep security checking and validation of the server’s own hardware and firmware.
Boot Validation Sequence
When the power supply to the server is first applied, the CPU motherboard is inhibited and the iLO firmware boots and enables its own ethernet/IP port to allow a web browser to open the password protected monitoring and administration pages. Immediately after this, it will check the UEFI (Unified Extensible Firmware Interface) and BIOS (Basic Input/Output System), and then test the firmware in each of the connected devices. Only when this has successfully completed does it load the operating system.
Back in 1981, when the first open-architecture PCs hit the market, they did not have disk drives as standard. Instead, users would use compact cassette recorders to load software into the limited memory space.
Even today, servers and computers still harbor this legacy architecture in their design.
When a CPU is released from its reset state it jumps to a specific memory address where the first instruction of the program will run. As disk drives could not be assumed to be fitted, the CPU had to have some code to execute in place of the operating system and this was the primary role of BIOS. Included in the BIOS is POST (Power On Self Test) and this provided some basic checking of connected hardware.
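To make the "specific memory address" concrete, the sketch below works through the arithmetic for the original 8086, which comes out of reset with its code segment register at 0xFFFF and its instruction pointer at 0x0000. The function and constant names are illustrative, and later x86 processors start at a different address, so treat this as an example for the first PC generation only.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


RESET_CS = 0xFFFF  # code segment value after reset on the 8086
RESET_IP = 0x0000  # instruction pointer value after reset on the 8086

reset_vector = physical_address(RESET_CS, RESET_IP)
print(hex(reset_vector))        # 0xffff0
print(0x100000 - reset_vector)  # 16
```

Only 16 bytes remain between that address and the top of the 1 MB address space, so the first instruction found there is normally a jump into the main body of the BIOS code.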
The original designers of the PC actively encouraged third parties to build peripherals and write software. Consequently, the POST could not assume any specific peripherals were fitted. Instead, it tested for connected devices so that the system booted into a known state from which the operating system could be loaded.
The BIOS also included a method of providing common hardware access for some peripherals using the concept of software interrupts. In the x86 architecture, a program can raise an interrupt by issuing a software instruction from anywhere in its code. This was an advantage for early developers, as they did not need to know the addresses at which each device was mapped. The interrupt vectors are loaded during boot by the BIOS so that the relevant code runs to access devices such as the keyboard, mouse, screen and disk drive.
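The idea of an interrupt vector table can be sketched as a lookup from interrupt number to handler. The Python below is a simplified model, not real BIOS code: the vector numbers 0x10, 0x13 and 0x16 are the classic BIOS video, disk and keyboard services, but the handler bodies, the register dictionary and the function names are invented for illustration.

```python
from typing import Callable, Dict

# Model of the interrupt vector table: interrupt number -> handler.
vector_table: Dict[int, Callable[[dict], str]] = {}


def install(vector: int, handler: Callable[[dict], str]) -> None:
    vector_table[vector] = handler


def video_service(regs: dict) -> str:
    return f"video: write character {chr(regs['al'])!r}"


def disk_service(regs: dict) -> str:
    return f"disk: read {regs['al']} sector(s)"


def keyboard_service(regs: dict) -> str:
    return "keyboard: wait for key"


def bios_boot() -> None:
    """The BIOS fills in the vectors during boot."""
    install(0x10, video_service)
    install(0x13, disk_service)
    install(0x16, keyboard_service)


def software_interrupt(vector: int, regs: dict) -> str:
    """The caller names a vector; it never needs the device's address."""
    handler = vector_table.get(vector)
    if handler is None:
        raise LookupError(f"no handler installed for vector {vector:#x}")
    return handler(regs)


bios_boot()
print(software_interrupt(0x10, {"al": ord("A")}))  # video: write character 'A'
```

The calling program depends only on the vector number, which is why the same software could run on machines whose hardware was laid out differently.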
As PC architectures and operating systems have developed, software device drivers in the kernel of the operating systems have taken over the role of the BIOS. However, the BIOS is still needed to execute the first instruction of the CPU after reset, provide some basic peripheral testing and system testing, and then load the operating system from the boot device.
To keep up with advances in x86 architecture, and provide improved testing, control, and security, UEFI has been developed and is expected to take over from BIOS. Again, it contains the first instruction executed after the CPU reset, but it addresses some of the limitations of the original 16-bit x86 design and its 1 MB addressable space.
The original BIOS code was stored in a PROM (Programmable Read-Only Memory) and it was almost impossible to add to or change it. However, as computer hardware developed, BIOS and UEFI code came to be stored in EEPROM (Electrically Erasable Programmable Read-Only Memory). Although this provided some significant advantages for upgrades and bug fixes, it also potentially made these devices vulnerable to attack.
During its secure boot, iLO checks the certification of the code in the BIOS or UEFI and validates it. In other words, it can ascertain whether the code has been tampered with. This process continues for all the devices with iLO validating each version of firmware.
Each version of firmware has its own unique key and iLO checks this against its own database. Through the silicon root of trust, vendors collaborate to guarantee each version of firmware is validated by the manufacturer of the device and that it has not been tampered with.
Only after all the tests have successfully completed does iLO allow the processor to execute its first instruction from the BIOS/UEFI, and then load the operating system or virtualized monitoring software. Even after the operating system is running, iLO continues to periodically check each device's firmware for malware or tampering.
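The gating logic described in this section can be sketched as follows. This is a minimal model under stated assumptions: it compares SHA-256 digests against a small table as a stand-in for the certificate and key checks the article describes, and the device names, firmware bytes and function names are all hypothetical. It does not represent how iLO is actually implemented.

```python
import hashlib


def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


# Hypothetical known-good firmware images and the trusted digest table.
GOOD_IMAGES = {
    "uefi": b"uefi-image-v2",
    "nic": b"nic-firmware-v7",
    "ssd": b"ssd-firmware-v3",
}
TRUSTED_DIGESTS = {name: digest(image) for name, image in GOOD_IMAGES.items()}


def failed_devices(installed: dict, trusted: dict) -> list:
    """Return the devices whose firmware is missing or does not match."""
    failures = []
    for name, expected in trusted.items():
        image = installed.get(name)
        if image is None or digest(image) != expected:
            failures.append(name)
    return failures


def may_release_cpu(installed: dict, trusted: dict) -> bool:
    """The processor leaves reset only if every check passes."""
    return not failed_devices(installed, trusted)


tampered = dict(GOOD_IMAGES, ssd=b"ssd-firmware-v3-with-implant")
print(may_release_cpu(GOOD_IMAGES, TRUSTED_DIGESTS))  # True
print(may_release_cpu(tampered, TRUSTED_DIGESTS))     # False
print(failed_devices(tampered, TRUSTED_DIGESTS))      # ['ssd']
```

The same check could be rerun on a timer once the operating system is up, which corresponds to the periodic revalidation mentioned above.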
Silicon Secure Root of Trust
At first glance there might seem to be some duplication when UEFI is installed instead of the BIOS. That is, the UEFI provides firmware certificate validation as well as some external control and testing. However, the iLO system adds the concept of the silicon root of trust: every component and firmware version can be traced back through a reliable audit trail. This is incredibly powerful in the fight against cybercrime.
Furthermore, iLO provides software interfaces with support for languages such as .NET, Java, and PowerShell to facilitate remote control and automation. An array of centralized monitoring and logging tools use these interfaces to constantly monitor the state of each server in the infrastructure and provide speedy notifications if a hardware or security issue does occur. This monitoring information can be collated and tagged to form the basis of AI monitoring systems, a critical tool in the fight against cybercrime.
If it is suspected that a firmware version has been tampered with, or has simply become corrupted through some temporary hardware anomaly, the firmware for each device can be rolled back or reinstalled. A copy of trusted firmware versions is securely stored so that, if a firmware breach is detected, a known good version can be installed in the device. This mitigates third-party attacks while keeping the server up and running with the shortest possible downtime.
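A recovery step of this kind might look like the sketch below. It is a simplified model: the `Device` class, the in-memory trusted store and the use of a SHA-256 comparison are assumptions made for illustration, and a real system would hold the trusted copies in protected storage and verify vendor signatures.

```python
import hashlib


def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


class Device:
    """Hypothetical device holding a firmware image."""

    def __init__(self, name: str, firmware: bytes) -> None:
        self.name = name
        self.firmware = firmware


# Hypothetical securely stored copies of known-good firmware.
TRUSTED_STORE = {"ssd": b"ssd-firmware-v3"}


def verify_or_restore(device: Device, trusted_store: dict) -> str:
    """Return 'ok' if the firmware matches, otherwise reinstall the good copy."""
    good = trusted_store[device.name]
    if digest(device.firmware) == digest(good):
        return "ok"
    device.firmware = good  # roll back to the known-good image
    return "restored"


ssd = Device("ssd", b"ssd-firmware-v3-corrupted")
print(verify_or_restore(ssd, TRUSTED_STORE))  # restored
print(verify_or_restore(ssd, TRUSTED_STORE))  # ok
```

Running the check a second time confirms that the device now holds the trusted image, which is the state the server needs to be in before it returns to service.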
Cybercrime attacks have changed beyond all recognition over recent years, and hardware manufacturers have not only raised their game to fight against these attacks but are increasingly proactive in doing so. Firmware validation and a silicon root of trust are critical in this fight. In the same way that commercial airplane manufacturers can trace the provenance of every single component on their aircraft, the same is becoming true of advanced high-end OEM server manufacturers. The need to be secure not only includes your server infrastructure but extends to the finished x86 server product a vendor may bring into your broadcast facility. They need to be secure too!
In the next article we continue our in-depth journey into infrastructure security and look at virtualization security.