In this series of three articles, we investigate the underlying aspects of computer server design for high-value security and 24-hour operation. In the first article we look at advanced server security, in the second we examine how servers are controlled, and in the third we gain a deeper understanding of virtualization and its benefits for secure operation.
As computer software and operating systems become more secure, cyber-criminals are looking for other methods of attacking them. These methods may not be immediately obvious, but deep within the servers there are potentially hidden vulnerabilities, and special attention must be paid to their remedies.
In the abstract sense, the disk drive is a string of sectors numbered sequentially. The server’s operating system builds a reference table consisting of the files and their associated sectors. If the user wants to read a file, the operating system determines from its table which sector the data is stored in and requests that sector from the disk drive.
Although the sector size and file referencing may be a function of the operating system, the file mapping to the hard disk drive cylinder, head, and sector is not. The hardware configuration of the disk most probably differs between vendors. Consequently, a method of converting the logical sector number to a physical cylinder, head, and sector is needed, which implies some intelligence in the drive.
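The logical-to-physical conversion described above can be illustrated with the classic idealised geometry formula. This is only a sketch: the function names and the 16-head, 63-sector geometry are assumptions chosen for illustration, and real drives use vendor-specific mappings that the controller keeps private.

```python
from typing import NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # physical sectors are conventionally numbered from 1


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> CHS:
    """Convert a logical sector number to cylinder, head, and sector.

    Assumes an idealised drive where every track holds the same
    number of sectors (hypothetical geometry, not a real drive's).
    """
    if lba < 0:
        raise ValueError("logical sector number must be non-negative")
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return CHS(cylinder, head, sector_index + 1)


def chs_to_lba(chs: CHS, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Inverse mapping, useful for checking the conversion."""
    return (chs.cylinder * heads_per_cylinder + chs.head) * sectors_per_track + (chs.sector - 1)


# Illustrative geometry: 16 heads, 63 sectors per track.
print(lba_to_chs(1008, 16, 63))  # CHS(cylinder=1, head=0, sector=1)
```

The operating system only ever deals in the logical number; everything on the right-hand side of the conversion belongs to the drive.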
Disk drives also have methods of self-diagnosis and fault detection. SMART (Self-Monitoring, Analysis and Reporting Technology) constantly monitors the health of the disk drive and determines whether any parameters go out of specification, for example the spin-up time of the spindle or the error rate of the read system. If these breach certain thresholds, they may inhibit the operation of the drive.
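The threshold check can be sketched as follows. The attribute names and values are hypothetical, and the sketch assumes the common convention in which each attribute carries a normalised health value that falls as the drive degrades and is compared against a vendor-set threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartAttribute:
    name: str
    normalized: int  # health score; lower means worse (assumed convention)
    threshold: int   # vendor-defined failure threshold


def failing_attributes(attributes: list[SmartAttribute]) -> list[str]:
    """Return the names of attributes at or below their threshold."""
    return [a.name for a in attributes if a.normalized <= a.threshold]


# Hypothetical readings for illustration only.
readings = [
    SmartAttribute("spin_up_time", normalized=97, threshold=21),
    SmartAttribute("read_error_rate", normalized=40, threshold=51),
]
print(failing_attributes(readings))  # ['read_error_rate']
```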
Sectors occasionally fail and the drive can determine the location of bad physical sectors. It marks them unusable to stop the operating system from writing to them. Again, this implies some form of intelligence in the disk drive.
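A minimal sketch of this bookkeeping is shown below. It assumes the drive keeps a pool of spare sectors and silently redirects a failed sector to one of them, a common technique that the text above does not describe in detail; the class and method names are hypothetical.

```python
class SectorRemapper:
    """Tracks bad physical sectors and redirects them to spares."""

    def __init__(self, spare_sectors: list[int]) -> None:
        self._spares = list(spare_sectors)  # unused spare sectors
        self._remap: dict[int, int] = {}    # bad sector -> spare sector

    def mark_bad(self, physical_sector: int) -> int:
        """Mark a sector unusable and return the spare that replaces it."""
        if physical_sector in self._remap:
            return self._remap[physical_sector]
        if not self._spares:
            raise RuntimeError("no spare sectors left")
        spare = self._spares.pop(0)
        self._remap[physical_sector] = spare
        return spare

    def resolve(self, physical_sector: int) -> int:
        """Return the sector that should actually be read or written."""
        return self._remap.get(physical_sector, physical_sector)


remapper = SectorRemapper(spare_sectors=[9000, 9001])
remapper.mark_bad(123)
print(remapper.resolve(123), remapper.resolve(124))  # 9000 124
```

The operating system never sees this table, which is the point: the decision about where data physically lives is made inside the drive.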
To co-ordinate these tasks, the disk drive contains three distinct components: the magnetic storage platters; the spindle, head actuators, and motors; and the disk controller. The controller is where the intelligence resides and, without the correct safeguards, it is the potential target of an attack.
To understand why the hard disk controller (HDC) can be an unintended source of vulnerability, we need to understand more about the low-level operation of reading and writing data to the drive.
Simplistically, when the operating system reads data from a sector, the processor in the HDC will map the logical sector to the physical cylinder, head, and sector, read the data from the sector, store it in cache memory, and send it out on a SATA cable to the server’s operating system. Although this is an effective method, it is also slow. For example, if the HDC is a 16-bit processor running at 150MHz and moves at most one 16-bit word per clock cycle, the best data throughput we could hope for is 150 x 16 MBits/sec = 2,400MBits/sec, or 2.4GBits/sec.
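The throughput ceiling quoted above can be checked with a few lines of arithmetic. The function name is hypothetical, and the calculation assumes the processor moves exactly one word per clock cycle with no other overhead, which is the most optimistic case.

```python
def peak_throughput_mbits(bus_width_bits: int, clock_mhz: int) -> int:
    """Upper bound on throughput in Mbit/s, assuming one word per clock."""
    return bus_width_bits * clock_mhz


ceiling = peak_throughput_mbits(bus_width_bits=16, clock_mhz=150)
print(ceiling, "Mbit/s =", ceiling / 1000, "Gbit/s")  # 2400 Mbit/s = 2.4 Gbit/s
```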
Fig 1 – the underside of this hard disk drive shows the controller PCB with processor, cache memory, and non-volatile memory.
However, the SATA (revision 3) bus can transfer data at 6GBits/sec, so the HDC is underperforming and acts as a bottleneck. To rectify this, a form of hardware acceleration called DMA (Direct Memory Access) is used. DMA bypasses the HDC processor and copies the data directly from the sector to the cache memory on the HDC. When complete, the SATA DMA process transfers the data from the HDC cache back to the server’s operating system. The same is true for writing to a sector, but the processes are reversed: the SATA DMA copies its data to the HDC cache and the HDC DMA copies the data from the cache to the cylinder, head, and sector on the platter. This improves data throughput significantly as there is no processor bottleneck to get in the way.
Consequently, there is a period when data sits in the HDC cache, where it could be accessed by a hostile third party.
The HDC has its own firmware code stored in non-volatile memory on the HDC’s circuit board, and it is possible for this firmware to be attacked. Furthermore, many hard disk manufacturers provide maintenance facilities to update the firmware over the SATA connection back to the server, making the potential for compromise even higher.
If a hacker can infiltrate the firmware and run just a small snippet of their own code, they can change the data in the HDC cache, allowing them to effectively write to the hard disk drive. If this occurs, the hacker has control of your server. From there on, they can wreak havoc in your broadcast facility. More importantly, this attack could go unnoticed for many days, weeks, or even months. It could sit dormant, just waiting for the cybercriminals to enable it.
And this method of operation is not limited to hard disk drives. Solid State Drives have intelligence built into them, along with a cache and ultimately access to the file system. Furthermore, how do you know that the disk drive, graphics card, network interface controller, or even the power supply doesn’t already have some backdoor code hacked into it when you buy it?
One of the major advantages of moving to IP is that broadcasters can use COTS (Commercial Off the Shelf) equipment. Although this opens a whole new world of opportunities for broadcasters, the example detailed above demonstrates why we must be careful about our understanding of COTS. It certainly isn’t an excuse to procure cheap x86 computers from your local store, put them in a rack, and expect them to perform with the reliability and security of a Tier-4 datacenter with 99.995% uptime.
As far as reliability and security are concerned, broadcasting is in the same arena as banking (especially high-frequency trading), telecommunications, and commerce websites. To understand the type of COTS servers we need to buy, we must look at the procurement decisions made by these industries and learn from them.
As well as having dual power supplies and redundant hardware, high-end COTS servers used in banking, telecoms, and commercial websites use the concept of “Silicon Root of Trust”. This provides two distinct levels of reliability and security: the procured hardware and firmware are known to be secure, and the server establishes secure operation at a hardware level.
Fig 2 – the cache is used to hardware accelerate data transfer between the computer and hard disk drive, but it also has potential for security vulnerability if not correctly protected.
To confirm your device’s firmware hasn’t been hacked before you even open the sealed delivery box, a reliable supply chain must be established. Enterprise OEM vendors generally provide this as part of maintenance and support packages. Each device that is installed in the server has an audit trail of trusted suppliers who each validate their area of responsibility, whether it’s hardware or firmware. For example, the key manufacturing processes for each hard disk drive will be recorded and the vendor will be able to establish who loaded the firmware, where, and when.
For broadcasters, this method of thinking is a major step away from the procurement, maintenance, and service procedures of the past. Broadcast hardware vendors were trusted by implication, as the industry is relatively small and everybody seems to know everybody else or has a friend who works at company XYZ. Much of the equipment has also traditionally been hardware based, with little opportunity for remote attack. It’s only over the past few years that computer IT equipment has been making serious inroads into television stations.
As we move to COTS, we must take a more proactive approach to security. Root of trust contracts are a critical component of achieving this, and it’s unlikely they will be found in local computer stores or on auction websites.
Another level of device validation takes place in the server itself. As part of the hardware design, vendors providing high-end COTS servers and infrastructure equipment have an extra level of security built into their hardware.
Long before the peripheral devices are accessed and the operating system is loaded, thousands of lines of embedded code are executed to interrogate each device and validate its firmware. Vendors work with device suppliers, through trusted partnerships, to provide certification keys for every version of firmware that is executed on the server in order to authenticate it. This low-level software tests each device’s firmware to confirm no hacks have taken place and no rogue firmware is running on it.
Only when the server has interrogated all the installed devices and authenticated every version of software does it load the boot sector on the primary disk and run the operating system. This takes security to a whole new level and again is something that will not be found in your local computer store. The server not only checks itself but is able to validate all the peripheral devices within it, from disk drives and SSDs to network interface cards.
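The gate described above, where the server boots only if every device’s firmware is authenticated, can be sketched as follows. This is a simplified stand-in: it compares SHA-256 digests against an allowlist rather than verifying the vendor certification keys the text mentions, and the device names, firmware images, and function names are all hypothetical.

```python
import hashlib
import hmac


def firmware_digest(image: bytes) -> str:
    """Hex SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def untrusted_devices(installed: dict[str, bytes],
                      trusted: dict[str, set[str]]) -> list[str]:
    """Return the devices whose firmware digest is not on the allowlist."""
    rejected = []
    for device, image in installed.items():
        digest = firmware_digest(image)
        allowed = trusted.get(device, set())
        # compare_digest avoids leaking information through timing.
        if not any(hmac.compare_digest(digest, good) for good in allowed):
            rejected.append(device)
    return rejected


def may_boot(installed: dict[str, bytes], trusted: dict[str, set[str]]) -> bool:
    """Boot only when every installed device has been authenticated."""
    return not untrusted_devices(installed, trusted)


# Hypothetical firmware images for illustration only.
good_hdc = b"hdc firmware v1.2"
trusted = {"hdc": {firmware_digest(good_hdc)}}
print(may_boot({"hdc": good_hdc}, trusted))                    # True
print(may_boot({"hdc": good_hdc + b" + backdoor"}, trusted))   # False
```

A device with no allowlist entry at all is treated as untrusted, so the sketch fails closed rather than open.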
IP infrastructure security is much more than hiding everything behind a firewall and saying, “it’s secure, we’re safe”. Broadcasters need to look at the whole infrastructure picture and understand vulnerabilities not just at the firewall and user software level, but deep into the hardware and firmware.
One method of mitigating the firmware issues highlighted above is to work closely with an enterprise OEM vendor to establish a forensically auditable root of trust, and this should even be extended to your traditional broadcast equipment suppliers. If they provide a software and server package as a complete product, you should ask them some searching questions about the root of trust that has been established with the server vendor and their suppliers. What is its provenance? Can they authenticate and validate every version of firmware running on the server, no matter where the devices came from? Do they receive notifications of hacks and new firmware updates?
In the next article we continue our investigation into the hardware and look at how “Lights Out” management not only helps with maintenance but improves security too.