Mark Cree July 31, 2019
We are all aware of the explosion of data. What is not so obvious is the changing landscape of storage workflows. AI, Machine Learning, IoT, Autonomous Vehicles, Electronic Design Automation, and Genomics, for example, are very different from the workflows of yesterday. Data creation before these new workflows was driven by the retention and growth of digital media, which is typically made up of unstructured data such as photos, videos, users' home directories, and application data storage. These workflows usually created relatively large files that were not always performance sensitive.
New workflows such as AI are very different from digital media storage. These new workflows generate and consume billions of small files and are increasingly performance sensitive. Moreover, they dramatically shift the workflow dynamics: lots of small files means lots of file metadata to read and directories to scan. In fact, it is not uncommon for 90 percent or more of the actual input/output of these workflows to be requests for data about the data, in other words, metadata.
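To see why small-file workflows become metadata-bound, consider a toy sketch of a directory traversal, the kind of scan an AI training pipeline runs over its dataset. Each file costs at least one metadata request (a stat) on top of its single data read, and every directory adds a scan of its own, so metadata requests outnumber data requests before any caching or repeated passes are considered. This is an illustrative script, not a benchmark of any particular file system:

```python
import os
import tempfile

def traverse(root):
    """Walk a tree the way many pipelines do: scan each directory,
    stat every entry, then read each file. Returns operation counts."""
    meta_ops = 0   # directory scans + stat calls (metadata requests)
    data_ops = 0   # actual file reads (data requests)
    for dirpath, dirnames, filenames in os.walk(root):
        meta_ops += 1                  # the directory scan itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.stat(path)              # metadata: size, mtime, permissions
            meta_ops += 1
            with open(path, "rb") as f:
                f.read()               # the one data request per file
            data_ops += 1
    return meta_ops, data_ops

# Build a small tree of tiny files as a stand-in for a training set.
root = tempfile.mkdtemp()
for d in range(10):
    sub = os.path.join(root, f"dir{d}")
    os.mkdir(sub)
    for i in range(100):
        with open(os.path.join(sub, f"f{i}.dat"), "wb") as f:
            f.write(b"x" * 64)

meta, data = traverse(root)
print(f"metadata ops: {meta}, data ops: {data}")
```

Even in this minimal case the walk issues more metadata requests than reads; real workflows add path lookups, permission checks, and repeated scans on top, which is how the metadata share climbs toward the 90 percent figure.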
These new workflows require metadata performance that traditional file systems simply cannot deliver. Traditional file systems are limited by their input/output architectures, which are optimized for file reads and writes, not for responding to metadata requests. A file system has to execute several layers of operations whether it is reading a file, writing a file, or responding to a file metadata request, increasing response latency for all operations. Even in high-performance, scale-out storage architectures and parallel file systems that put metadata into flash memory, response latency grows as nodes and file counts increase, because all nodes need to maintain a consistent view of the file data and metadata. Storage vendors tend to advertise only their aggregate metadata performance and not mention their metadata latency, which is the key to turbo-charging these new workflows.
What these new workflows really need is a way to keep all the benefits a file system provides, while also moving and accelerating metadata processing closer to the application or user.
Putting a copy of metadata from NAS systems and cloud-migrated files in the network creates a short path to metadata processing. Metadata can be processed directly on the network, and NAS systems can focus on what they do best, which is responding to file read and write requests. Workflows get an additional performance boost because the caches in the NAS systems are no longer caching metadata, leaving more memory to cache files.
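The effect of a network-resident metadata copy can be sketched with a toy proxy that answers stat requests from its own cache and forwards only data requests to the filer. The class and method names here are illustrative, not any product's API; a real deployment would also have to keep the cached copy consistent with the NAS, which this sketch omits:

```python
class MetadataCache:
    """Toy network-resident metadata cache: metadata requests are
    answered from the cache; data requests pass through to the NAS."""
    def __init__(self, nas):
        self.nas = nas
        self.cache = {}          # path -> metadata dict
        self.hits = 0

    def stat(self, path):
        if path in self.cache:
            self.hits += 1       # answered in the network; NAS never sees it
            return self.cache[path]
        meta = self.nas.stat(path)   # first touch: populate from the NAS
        self.cache[path] = meta
        return meta

    def read(self, path):
        return self.nas.read(path)   # data requests still go to the filer


class FakeNAS:
    """Stand-in filer holding a few in-memory files."""
    def __init__(self, files):
        self.files = files
        self.stat_calls = 0      # how many metadata requests reach the NAS

    def stat(self, path):
        self.stat_calls += 1
        return {"size": len(self.files[path])}

    def read(self, path):
        return self.files[path]


nas = FakeNAS({"/a.dat": b"hello", "/b.dat": b"world"})
proxy = MetadataCache(nas)
for _ in range(100):             # repeated scans, as in a training loop
    proxy.stat("/a.dat")
    proxy.stat("/b.dat")
print(proxy.hits, nas.stat_calls)
```

After the first pass, every subsequent metadata request is satisfied in the network: 198 of the 200 stats are cache hits, and the NAS answers only 2, freeing its own cache memory for file data.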
Putting storage intelligence in the network significantly reduces the workflow latency generated by storage systems and shifts the storage control plane from individual storage systems to a much faster and universal network-based control plane. Applications like AI, Machine Learning, IoT, Autonomous Vehicles, EDA, and Genomics will benefit significantly from ultra-high-performance metadata processing that is universally available in the network.