Mark Cree March 27, 2019
It is almost a secret in the storage industry that in most file workflows 70-90 percent of all requests made to a network-attached storage (NAS) system are for metadata. Metadata is the data the file system keeps on the status and attributes of a file. In current NAS systems, whether all-flash, hybrid, or disk-based, these requests are handled by the file system, treated just like any other request, and follow a similar process to completion. This means a simple metadata request, such as asking for the size of a file, will get roughly the same response time as actually starting to read or write the file.
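To make the distinction concrete, here is a toy Python sketch (not any vendor's API) contrasting the two kinds of request: a metadata request that only asks for a file's attributes, and a data request that actually reads its contents.

```python
import os
import tempfile

def metadata_request(path):
    """Answer 'how big is this file?' without touching its contents."""
    st = os.stat(path)  # attribute lookup only -- no data is read
    return {"size": st.st_size, "mtime": st.st_mtime}

def data_request(path):
    """Actually read the file's contents -- far more work for the system."""
    with open(path, "rb") as f:
        return f.read()

# Demo: the same file, two very different kinds of request.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello NAS")
    path = f.name

meta = metadata_request(path)  # cheap: one attribute/inode lookup
data = data_request(path)      # expensive: open, read, close
os.unlink(path)
```

On a local file system both calls are fast, but the point stands at scale: a stat-style lookup is a fundamentally lighter operation than a read, yet most NAS stacks push both through the same path.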
Why Are Metadata Requests Treated Like Data Requests?
If metadata requests make up the majority of the requests going to NAS systems, shouldn't they get special, faster treatment so the entire workflow speeds up? The answer is yes, but in the current state of storage system technology there is no way to separate metadata request handling from the underlying file system, because the file system owns the metadata.
Metadata Gets Gravity for Cloud-Migrated Files
Once a file has been migrated to a public or private cloud, all these metadata requests become a real problem. Moving a file to a cloud does not change its workflow properties. The file was most likely tiered to the cloud because its access rate fell below a threshold marking it as cold or inactive. But as we know, what goes up must come down. In this case, hopefully not very often, since recall is expensive both in overall system performance and in possible ingress/egress charges.
Here is where the likelihood that up to 90 percent of all requests going to a storage system are for metadata gets interesting. Let's say our cloud migration system for inactive data is pretty good, and data gets recalled or accessed from the cloud only 2 percent of the time or less. That sounds like a small number of recalls until you get into the millions and billions of files that are very common today.
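The back-of-the-envelope arithmetic is worth spelling out; the file counts below are illustrative assumptions, not measurements:

```python
# Even a "good" 2% recall rate becomes an enormous absolute number
# once file counts reach the billions.
def recalls(total_files, recall_rate):
    """Expected number of cloud recalls for a given file population."""
    return round(total_files * recall_rate)

# 2% of a million files:  20,000 recalls
# 2% of a billion files:  20,000,000 recalls
small_estate = recalls(1_000_000, 0.02)
large_estate = recalls(1_000_000_000, 0.02)
```

Twenty million recalls, each paying a cloud round trip and possible egress charges, is no longer a rounding error.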
With billions of files you have a serious data gravity issue. A lot of files, both local and in the cloud, are going to be touched by metadata requests. If a file is on a public cloud, there are likely ingress/egress charges incurred just to touch the file in order to respond to a metadata request. Some of the smarter cloud tiering systems store the metadata object for a file separately from the file itself, so they don't have to bring the entire file back and reconstitute it. Either way, it becomes a serious bottleneck to overall performance in a hybrid cloud solution where the NAS system is orders of magnitude faster than the attached clouds. What ends up happening is that overall performance degrades. Simple things like directory reads, where part of the directory is spread across both domains via file stubbing or symbolic links, slow down to the rate at which these systems can retrieve the file or metadata object from the attached clouds.
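A hypothetical sketch of that directory-read bottleneck, with made-up latencies (roughly 100 microseconds for a local attribute lookup versus 50 milliseconds for a cloud round trip), shows how a couple of stubbed entries dominate the whole listing:

```python
# Assumed latencies for illustration only -- not benchmarks.
LOCAL_LATENCY_S = 0.0001  # ~100 us: local attribute lookup
CLOUD_LATENCY_S = 0.05    # ~50 ms: fetch metadata for a cloud-tiered stub

def list_directory(entries):
    """Simulate listing a directory split across local and cloud tiers.

    entries: list of (name, location, size) tuples, where location is
    "local" or "cloud". Returns (sizes, simulated_seconds).
    """
    elapsed = 0.0
    sizes = {}
    for name, location, size in entries:
        elapsed += CLOUD_LATENCY_S if location == "cloud" else LOCAL_LATENCY_S
        sizes[name] = size
    return sizes, elapsed

entries = [("a.txt", "local", 10), ("b.txt", "cloud", 20), ("c.txt", "cloud", 30)]
sizes, seconds = list_directory(entries)
# Two cloud stubs make the listing take ~0.1001 s instead of ~0.0003 s:
# the directory read runs at cloud speed, not NAS speed.
```

With these assumptions, two cloud entries slow a three-file listing down by more than 300x, which is exactly the degradation the paragraph above describes.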
A Better Way
Ideally, hybrid cloud storage should be able to abstract the metadata for all attached clouds and NAS systems on the network, creating a performance-equalizing plane. Applying technologies like Deep Packet Inspection (DPI), which in the past has been used primarily in network security systems, to storage workflows opens up endless possibilities. A network-based metadata "map" becomes possible, kept current by watching network traffic with DPI. Unlike current NAS systems, the metadata map does not have to peel through the layers of the file system to respond to a metadata request, or pull metadata back from a cloud; it can respond instantly, at DRAM memory rates. This essentially off-loads metadata processing from both cloud and on-premises NAS systems and somewhat equalizes the performance difference between the two.
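The metadata-map idea can be sketched in a few lines. This is a hedged illustration only: a real DPI-based system would parse NFS/SMB traffic on the wire, and the class and method names here are invented for the example.

```python
# Minimal sketch of a network-resident metadata map: a plain in-memory
# dictionary stands in for a DRAM cache kept current by observing file
# traffic. Lookups are answered without touching the file system or cloud.
class MetadataMap:
    def __init__(self):
        self._map = {}

    def observe(self, path, size, mtime):
        """Update the map from a file operation seen on the network."""
        self._map[path] = {"size": size, "mtime": mtime}

    def lookup(self, path):
        """Answer a metadata request at memory speed.

        Returns None on a miss, meaning the request must fall through
        to the backing NAS or cloud tier.
        """
        return self._map.get(path)

m = MetadataMap()
m.observe("/projects/render/frame0001.exr", size=4_194_304, mtime=1553644800)
hit = m.lookup("/projects/render/frame0001.exr")   # served from DRAM, no recall
miss = m.lookup("/projects/render/frame0002.exr")  # not yet observed -> backend
```

The design point is that hits never cross the network at all, so the 70-90 percent of traffic that is metadata is answered at memory speed regardless of where the file itself lives.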
The future of storage will not come from better mousetraps. It will come from innovations that break preconceived limitations and deliver something industry-advancing.