The Sponsors Perspective: SpycerNode Harnesses High Performance Computing

In 2018, Rohde & Schwarz announced a new multi-user shared access storage system called SpycerNode. It offers a radically different approach to coping with broadcast & media storage requirements. In this article, we take a closer look to see how its approach differs and where this might add value to broadcasters, content owners and facilities.

This article was first published as part of Essential Guide: Intelligent Storage for Broadcast Workflows

New turnkey storage systems are harnessing the latest advances in High Performance Computing (HPC). In a storage context, HPC refers to a system’s performance, scalability and redundancy.

HPC storage is a combination of hardware, file system and RAID approach. It differs from traditional storage mainly in how it handles redundancy and growth. HPC implements redundancy with a software “RAID” technology called erasure coding, which increases overall performance and reduces rebuild times. Scalability is almost infinite, and expansion is possible during operation.

The most compelling aspect of HPC is its building-block character: it allows a single global namespace to be built from storage nodes of different speeds.

This inherent flexibility is delivered in three chassis types. The 2u12 and 5u84 are available with NL-SAS HDDs and SAS SSDs in different capacities, and an additional 2u24 chassis is a pure flash system. There are main processor units and JBOD units. A main unit is always redundant, equipped with two appliance controllers (AP). Each AP features two 100 Gbit interfaces, resulting in four 100 Gbit interfaces per main unit.

The combination of different chassis makes SpycerNode suited to a wide range of applications. The 2u system is a compact, lightweight unit for applications where saving space is important. It is ideal for OB truck environments, or as a very dense, high speed storage device for on-premise applications. The larger 5u system offers heavyweight on-premise storage for broadcast production centres and post facilities.

Erasure Coding Boosts Data Security

Any high-performance video storage system must have proven data security. Data losses in the middle of projects have catastrophic consequences and must be avoided at all costs. Rohde & Schwarz has made significant progress in this area: data protection is based on erasure coding and declustering.

Erasure coding means that a data block is always written including parity, while declustering spreads a spare drive virtually over all the other drives in an array.

A new term is in use here: Data Awareness. When a drive fails, all drives contribute to the rebuild, and the system only needs to rebuild the data that was actually written to the affected disk. In traditional RAID, by contrast, all drives write to a single spare disk, which has to be rebuilt completely regardless of whether the system is 10% or 80% full. This can lock up the storage system and lead to rebuild times of up to several days.
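A rough back-of-the-envelope model makes the difference concrete. The sketch below is illustrative only: the function names, the drive size, the throughput figures and the count of 83 surviving drives (a 5u84 chassis with one failed drive) are assumptions, not SpycerNode measurements, and the model ignores client load and rebuild throttling.

```python
def traditional_rebuild_hours(drive_tb: float, spare_write_mb_s: float) -> float:
    """The whole drive is rewritten to one dedicated spare, whatever the fill level."""
    total_mb = drive_tb * 1_000_000
    return total_mb / spare_write_mb_s / 3600


def declustered_rebuild_hours(drive_tb: float, fill_level: float,
                              per_drive_mb_s: float, surviving_drives: int) -> float:
    """Only written data is rebuilt, and every surviving drive shares the work."""
    used_mb = drive_tb * 1_000_000 * fill_level
    aggregate_mb_s = per_drive_mb_s * surviving_drives
    return used_mb / aggregate_mb_s / 3600


# Hypothetical figures: a 10 TB drive, 10% full, in an 84-drive array.
trad = traditional_rebuild_hours(drive_tb=10, spare_write_mb_s=100)
decl = declustered_rebuild_hours(drive_tb=10, fill_level=0.10,
                                 per_drive_mb_s=20, surviving_drives=83)
```

With these assumed inputs the traditional rebuild takes roughly 27.8 hours, because the single spare is the bottleneck, while the declustered rebuild finishes in about ten minutes even though each drive contributes only a fraction of its bandwidth.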

SpycerNode implements a sophisticated data and spare space disk layout scheme reducing the overhead to clients when recovering from disk failures. The system uniformly spreads or declusters user data, redundancy information, and spare space across all the disks of a declustered array.

Erasure coding, in contrast to RAID, breaks, expands and encodes user data into fragments that include parity information. Typically, the Reed-Solomon erasure-coding algorithm is used to calculate the parity, which is essentially forward error correction. By default, SpycerNode is set to 8+2p, which means that a data block is segmented into 8 data strips and 2 parity strips.
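The 8+2p layout can be pictured with a small sketch. This is a deliberately simplified illustration, not the product's implementation: it splits a block into 8 strips and computes a single XOR parity strip as a stand-in for Reed-Solomon, so it can only recover one lost strip, whereas a real 8+2p code tolerates two. All function names are hypothetical.

```python
from functools import reduce

DATA_STRIPS = 8     # from the 8+2p default described above
PARITY_STRIPS = 2


def split_block(block: bytes, n: int = DATA_STRIPS) -> list[bytes]:
    """Split a block into n equal-sized strips, zero-padding the tail."""
    strip_len = -(-len(block) // n)  # ceiling division
    padded = block.ljust(strip_len * n, b"\0")
    return [padded[i * strip_len:(i + 1) * strip_len] for i in range(n)]


def xor_strips(strips: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length strips (simplified single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def usable_fraction(data: int = DATA_STRIPS, parity: int = PARITY_STRIPS) -> float:
    """Share of raw capacity that holds user data."""
    return data / (data + parity)


block = bytes(range(64))
strips = split_block(block)
parity = xor_strips(strips)

# Simulate losing strip 3 and rebuilding it from the survivors plus parity.
survivors = strips[:3] + strips[4:]
rebuilt = xor_strips(survivors + [parity])
```

With 8 data and 2 parity strips, 8 out of every 10 strips carry user data, so the usable fraction of raw capacity is 80%.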

This significantly decreases rebuild times and reduces the performance impact during a rebuild. Because the scheme is software based, it is also free of the limitations of a RAID controller, which results in much higher IOPS (input/output operations per second).

Declustering is part of the data protection approach of HPC setups, taking the place of traditional RAID. It is software based and, in contrast to traditional RAID, the spare is not a dedicated disk but is spread over all other disks. As with erasure coding, this shortens rebuilds, reduces their performance impact and avoids RAID controller limitations. Importantly, declustering does not degrade system performance over time.

A key advantage of the software-based data protection mechanism is that the system has no single point of failure, which brings multiple benefits:

  • Rebuilds that are up to four times faster, with less impact on performance during the rebuild.
  • Full redundancy is possible from the smallest unit.
  • No impact on performance due to fragmentation: the system scatters the data, ensuring an even flow of data in any application scenario.
  • All system components, such as power supplies and cooling units, are redundant to ensure maximum uptime.

Data Security Designed From The Ground Up

Several proprietary features are included that enhance data availability and security, such as an end-to-end checksum mechanism that prevents silent data corruption. Each time a data block is written, a checksum is created and written to disk. When the data block is read, another checksum is created and compared to the original one. If both checksums match, the data is delivered to the application. If the checksums don’t match, the data block is recreated from the parity and the newly created checksum is compared once again with the original one. If they match, the corrected data is delivered to the application and the faulty data block on disk is corrected with the parity information.
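The read path described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: CRC-32 stands in for whatever checksum the product actually uses, which is not specified here, and `write_block`, `read_block` and the `rebuild_from_parity` callback are hypothetical names.

```python
import zlib
from typing import Callable


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the payload plus the checksum that is stored alongside it."""
    return data, zlib.crc32(data)


def read_block(stored: bytes, stored_crc: int,
               rebuild_from_parity: Callable[[], bytes]) -> bytes:
    """Verify on read; fall back to parity reconstruction on a mismatch."""
    if zlib.crc32(stored) == stored_crc:
        return stored
    repaired = rebuild_from_parity()
    if zlib.crc32(repaired) != stored_crc:
        raise IOError("block unrecoverable: rebuilt data also failed the checksum")
    return repaired  # a real system would also rewrite the faulty on-disk copy


payload, crc = write_block(b"frame-0001")
corrupted = b"frame-0OO1"  # simulated silent corruption on disk
result = read_block(corrupted, crc, lambda: payload)
```

Passing the reconstruction step in as a callback keeps the verification logic independent of how the parity rebuild is actually performed.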

A decline in the performance and integrity of drives over time is a major issue in disk arrays. Drives do not normally drop out instantly: it can take a significant amount of time until a unit is considered to be faulty and removed from the SAS bus.

Data corruption and slow overall performance can be caused by a single “dying” drive. The Disk Hospital feature constantly measures the average technical performance of each drive. If a certain threshold is exceeded, the drive is removed from the array and replaced by the spare drive.
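The idea of tracking an average and acting on a threshold can be sketched as follows. The metric (I/O latency), the window size, the threshold value and the `DriveMonitor` class are all assumptions made for illustration; the actual measurements used by Disk Hospital are not described here.

```python
from collections import deque
from statistics import mean


class DriveMonitor:
    """Rolling-average health check for one drive (hypothetical sketch)."""

    def __init__(self, window: int = 5, threshold_ms: float = 50.0):
        self.samples = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def record(self, latency_ms: float) -> bool:
        """Add a sample; return True once the rolling average exceeds the threshold."""
        self.samples.append(latency_ms)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) > self.threshold_ms


monitor = DriveMonitor(window=3, threshold_ms=50.0)
verdicts = [monitor.record(ms) for ms in (10, 12, 11, 90, 120, 150)]
# verdicts -> [False, False, False, False, True, True]
```

Averaging over a window means a single slow request does not evict a healthy drive, while a sustained slowdown is caught before the drive fails outright.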

Fragmentation has a huge impact on the performance of traditional storage systems. Pre-allocation and defragmentation may help to maintain or restore acceptable values, but the storage never performs as it did on day one. Defragmenting takes time and blocks productive access to the system. It is also not advisable to fill the storage beyond 80%, because data written to the outer sectors of a disk slows performance down as well.

IBM Spectrum Scale HPC File System Improves Operational Efficiency

For some time, Rohde & Schwarz has collaborated with IBM on its storage system and has implemented the Spectrum Scale HPC file system, which brings numerous advantages.

It allows online scalability in bandwidth and capacity – a feature which significantly enhances system agility across a broad range of applications. To reduce storage overheads, it offers ILM (Information Lifecycle Management) to intelligently manage the location of all files on the storage system. Files are moved between fast online and archive storage to optimize infrastructure utilization automatically. Expensive online storage isn’t blocked with files which currently do not require maximum performance.
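The lifecycle decision can be pictured as a simple selection rule. The sketch below is only an illustration of the idea: the `FileInfo` record, the pool names, the 30-day idle limit and the sample paths are all hypothetical, and it does not reproduce the file system's own policy mechanism.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    pool: str               # hypothetical pool names: "online" or "archive"
    days_since_access: int


def select_for_archive(files, max_idle_days=30):
    """Return paths in the online pool that have been idle longer than the limit."""
    return [f.path for f in files
            if f.pool == "online" and f.days_since_access > max_idle_days]


catalogue = [
    FileInfo("/projects/promo/final.mov", "online", 2),
    FileInfo("/projects/2017/rushes.mxf", "online", 120),
    FileInfo("/projects/2016/master.mxf", "archive", 400),
]
candidates = select_for_archive(catalogue)
```

Here only the rushes file is selected: it sits on the online pool and has not been touched within the assumed idle limit, so moving it frees fast storage for active work.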

A single namespace enables users to combine all storage repositories under one URL. This vastly improves usability, since files are always at the same location in the same directory, no matter which physical drive and storage unit they are actually stored on.

SpycerNode aggregates different types of drives under a single volume. Policies can be created to direct certain file types to the storage pool with the required technical capabilities. DPX files can be written to the fast online pool, while other files may be directed to the high capacity pool. To offload data from a more expensive online pool, a project can be moved to the more cost-effective nearline pool. This approach allows the storage to be used more efficiently, avoids duplicates and simplifies service and maintenance.
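A placement policy of this kind boils down to a lookup from file type to pool. Spectrum Scale expresses such policies in its own rule language; the Python below only mirrors the decision logic, and the pool names, the rule table and the `pool_for` helper are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: file extension -> target pool.
PLACEMENT_RULES = {".dpx": "online"}
DEFAULT_POOL = "capacity"


def pool_for(path: str) -> str:
    """Pick the storage pool for a new file based on its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return PLACEMENT_RULES.get(suffix, DEFAULT_POOL)


dpx_pool = pool_for("/projects/shot01/frame_0001.DPX")
other_pool = pool_for("/projects/shot01/notes.txt")
```

Because the pools sit under one volume, the path a user sees stays the same whichever pool the rule selects.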

R&S®SpycerNode represents a significant step forward in storage technology: it harnesses the considerable potential of High Performance Computing by exploiting new software-based storage technologies.
