A large European genomics research institute maintains one of the most comprehensive repositories of freely available molecular data and provides training and services to the scientific community. Its mission is to drive genomics research globally and at massive scale. On average, the institute responds to more than 38 million requests a day through its website, which receives more than 3.3 million unique visitors a month. In 2016, researchers downloaded more than 8.7 petabytes of data.
To make resources at this scale available to both the scientific and business communities, the institute keeps more than 15 years of research data accessible at all times on its high-performance network-attached storage (NAS) solution. However, some “cold data” can go unrequested for months or years, while “hot data” may be requested every day. The institute needed a way to identify cold data and move it off expensive tier-one storage without disrupting or changing research workflows.
The global organization wanted the cost and scale benefits of a public cloud while retaining the predictable performance and accessibility of on-premises infrastructure. The sheer volume of data created and requested every day complicated this effort. The institute’s data is growing 70 percent year over year, and more than 3 petabytes of it resided on tier-one storage that was consistently running at 90 percent of capacity. Much of this data was believed to be inactive, but institute officials had no visibility into access patterns to confirm it.
Even if the inactive data could be identified, traditional archive and data management solutions were not an option, because all genomics research data must stay online and transparently accessible to research scientists. Traditional tiering and virtualization solutions altered research workflows and could not keep up with metadata performance requirements, and the institute’s existing ones were too complex and disruptive to sustain data services for the international academic and commercial research community.
The institute needed to identify inactive data and move it automatically from its NAS infrastructure to cost-efficient object storage without interrupting services to its users. The goal was to keep all tiered files available and make them appear local, so that scientific workflows continue unchanged, and to ensure that data storage scales with growth patterns and performance requirements. The institute also needed to deliver consistent storage performance to scientists regardless of where the data resides, while reducing infrastructure complexity.
After exploring several options, the IT team chose an InfiniteIO hybrid cloud tiering cluster along with an object storage system to deliver a complete private cloud solution that optimizes performance and cost.
The solution integrated with cloud storage to reclaim tier-one NAS capacity, minimize the storage footprint of and spending on cold data, and provide easy access to all cloud-migrated data.
The InfiniteIO solution improves the performance, efficiency and cost of the institute’s private cloud storage without disrupting any research scientist or staff member. On average, about 86 percent of the organization’s storage requests are metadata requests, which InfiniteIO offloads from the NAS to accelerate responses.
By analyzing and managing the institute’s metadata, InfiniteIO identified 2.5 petabytes, or 75 percent, of the file data as not having been accessed in more than six months and automatically archived it to object storage. The complete solution is saving the genomics institute $7.4 million over three years.
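The selection rule described above, flagging files that have not been accessed in more than six months, can be illustrated with a minimal Python sketch. This is not InfiniteIO’s implementation, which the case study does not describe; it assumes a POSIX-style filesystem that maintains file access times (`st_atime`), which many NAS mounts relax or disable through `relatime` or `noatime`. The names `scan`, `ColdFile`, `cold_share` and `SIX_MONTHS_SECONDS`, and the 183-day threshold, are hypothetical.

```python
import os
import stat
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical threshold: roughly six months expressed in seconds.
SIX_MONTHS_SECONDS = 183 * 24 * 60 * 60


@dataclass
class ColdFile:
    path: str
    size: int
    last_access: float  # POSIX timestamp taken from st_atime


def scan(root: str,
         max_idle_seconds: float = SIX_MONTHS_SECONDS,
         now: Optional[float] = None) -> Tuple[List[ColdFile], int]:
    """Walk `root` and return (cold files, total bytes of all regular files).

    Assumption: st_atime is maintained by the filesystem. On mounts using
    noatime or relatime it may be stale, so results would overstate coldness.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_seconds
    cold: List[ColdFile] = []
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files are tiering candidates
            total_bytes += st.st_size
            if st.st_atime < cutoff:
                cold.append(ColdFile(path, st.st_size, st.st_atime))
    return cold, total_bytes


def cold_share(cold: List[ColdFile], total_bytes: int) -> float:
    """Percentage of scanned bytes that are cold (0.0 for an empty tree)."""
    if total_bytes == 0:
        return 0.0
    return 100.0 * sum(f.size for f in cold) / total_bytes
```

Calling `cold, total = scan("/mnt/nas/project")` on a hypothetical mount point and then `cold_share(cold, total)` yields the kind of figure quoted above (the share of data that is cold). A production tiering system would typically track access from its own metadata rather than rescanning the tree, because a full walk of petabyte-scale storage is itself a heavy metadata workload.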
All data remains securely online and directly accessible to research scientists through their normal workflows, regardless of data location. InfiniteIO automatically discovers inactive data and seamlessly moves it to object storage for a durable, active archive that stays ahead of capacity requirements.
With the InfiniteIO solution, the institute is prepared for future growth as research generates ever more genomic big data. The institute continues to analyze and optimize its data so that each file is assigned to the most logical and cost-efficient tier of the private cloud. Scientists never need to know how or where the data is stored; they only know it is quickly available when they need it most.