This blog entry was made out of personal interest. I was curious about the current state of software defined storage in the industry and decided to get myself up to speed. I’ve done some research and reading on SDS off and on over the course of the last week and this is a summary of what I’ve learned from various sources around the internet.
What is SDS?
First things first. What is “Software-Defined Storage?” The term is used very broadly to describe many different products with varying features and capabilities. It strikes me as an overused and poorly defined term, but it has become the preferred way to describe the trend of data storage becoming independent of the underlying hardware. In general, SDS describes data storage software that provides policy-based provisioning and management of storage independent of the hardware beneath it. The term is still open to interpretation among industry experts and vendors, but it usually encompasses software abstraction from hardware, policy-based provisioning and data management, and a hardware-agnostic implementation.
How the industry defines SDS
Because of the ambiguity surrounding the definition, I looked at how multiple respected sources define the term in the industry, starting with IDC and Gartner. IDC defines software defined storage solutions as solutions that deploy controller software (the storage software platform) that is decoupled from underlying hardware, runs on industry standard hardware, and delivers a complete set of enterprise storage services. Gartner defines SDS in two separate parts, Infrastructure and Management:
- Infrastructure SDS (what most of us are familiar with) utilizes commodity hardware such as x86 servers, JBOD, or JBOF and delivers its features through software orchestration. It creates and provides data center services to replace or augment traditional storage arrays.
- Management SDS controls hardware but also controls legacy storage products to integrate them into an SDS environment. It interacts with existing storage systems to deliver greater agility of storage services.
The general characteristics of SDS
So, based on what I just discussed, what follows is my summary and explanation of the general defining characteristics of software defined storage. These key characteristics are common across most vendor offerings.
- Hardware and Software Abstraction. SDS always includes abstraction of logical storage services and capabilities from the underlying physical storage systems.
- Storage Virtualization. External controller-based arrays include storage virtualization to manage usage and access across the drives within their own pools; other products exist independently to manage across arrays and/or directly attached server storage.
- Automation and Orchestration. SDS includes automation with policy-driven storage provisioning, where service-level agreements (SLAs) generally replace the precise details of the actual hardware. This requires management interfaces that span traditional storage-array products (a small sketch of the idea follows this list).
- Centralized Management. SDS includes management capabilities with a centralized point of management.
- Enterprise storage features. SDS includes support for all the features desired in an enterprise storage offering, such as compression and deduplication, replication, snapshots, data tiering, and thin provisioning.
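To make the policy-driven provisioning idea more concrete, here's a minimal sketch of how an SDS control plane might match an SLA-style policy to a pool of capacity. This is my own illustration with made-up names, not any vendor's actual API:

```python
from dataclasses import dataclass

# Illustrative only: the requester specifies service levels (an SLA),
# never a specific array or drive type.

@dataclass
class StoragePolicy:
    name: str
    min_iops: int           # performance floor the backing pool must guarantee
    replicas: int           # number of copies kept for availability
    thin_provisioned: bool  # allocate capacity on demand

@dataclass
class Pool:
    name: str
    media: str              # e.g. "nvme", "nl-sas"
    max_iops: int
    free_gb: int

GOLD = StoragePolicy("gold", min_iops=20000, replicas=3, thin_provisioned=True)

def provision(size_gb: int, policy: StoragePolicy, pools: list) -> Pool:
    """Pick the first pool that satisfies the policy and reserve capacity."""
    for pool in pools:
        if pool.max_iops >= policy.min_iops and pool.free_gb >= size_gb * policy.replicas:
            pool.free_gb -= size_gb * policy.replicas
            return pool
    raise RuntimeError(f"no pool can satisfy policy {policy.name!r}")

pools = [Pool("archive", "nl-sas", 2000, 500000), Pool("fast", "nvme", 100000, 20000)]
print(provision(500, GOLD, pools).name)  # -> "fast"; the caller never named an array
```

The point is the inversion of control: the consumer asks for "gold" and the software decides where the volume lands.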
Choosing a strategy for SDS
There are a host of considerations when developing a software defined storage strategy. Below is a list of some of the important items to consider during the process.
- Cloud Integration. It’s important to ensure any SDS implementation will integrate with the cloud, even if you’re not currently using cloud services in your environment. The storage industry is moving heavily toward cloud workloads and you need to be ready to accommodate business demands in that area. In addition, Amazon’s S3 API has become the de facto standard for cloud storage communication, so choose an SDS solution that supports S3 for seamless integration (see the sketch after this list).
- Storage Management Analysis. You’ll need a deep understanding of how SDS will be managed alongside all of your legacy storage, along with a clear picture of the capacity and performance being used in your environment. Determine where you might need more performance and where you might need more capacity. It’s common in the industry now to lack a deep understanding of how your storage impacts the business, to lack a service catalog portfolio, and to have limited resources managing your critical storage. If your organization is on top of those common issues, you’re well ahead of the game.
- Research your options well. SDS really marks the end of large isolated storage environments. It allows organizations to move away from silos and customize solutions to their specific business needs. SDS allows organizations to build a hybrid of pretty much anything. Taking advantage of high density NL-SAS disks right next to the latest high performance all-flash array is easily done, and the environment can be tuned to specific needs and use cases.
- Pay attention to Vendor Support. There are also concerns about support. A software vendor will of course support its own software-defined storage product, but will they offer support when there is a conflict between a heterogeneous hardware environment and their software? Organizations should plan and architect the environment very carefully. All competent software vendors will offer a support matrix for hardware, but only so much can be done if there is a bug in the underlying hardware.
- Performance Impact analysis. Just like any traditional storage implementation, predictability of performance is an important item to consider when implementing an SDS architecture. A workload analysis and a working knowledge of your precise performance requirements will go a long way toward a successful implementation. Many organizations run SDS on general-purpose x86 servers rather than purpose-built systems designed solely for storage. Performance predictability can be especially concerning when SDS is implemented in a hyper-converged environment, as the hosts must run the SDS software while also running business applications.
- Implementation Timeframe. SDS technology can make the initial implementation more time consuming and difficult, especially if you choose a software-only solution. The flexibility SDS offers gives a storage architect many more design options, which of course translates into a much more extensive hardware selection process. Organizations must carefully evaluate the various SDS components and the total amount of time it will take to select the appropriate storage, networking, and server hardware for the project.
- Overall Cost and ROI. I’m sure you’ll hear this from your vendor – they will promise that SDS will decrease both acquisition and operational costs while simultaneously increasing storage infrastructure flexibility. Your results may vary, and be aware that software-based products more closely resemble the original intention of this technology and are best suited to provide those promised benefits. A software-based SDS architecture will likely involve a more complex initial implementation with higher costs. While bundled products may offer a better implementation experience, they may limit flexibility. Determining whether a software-only solution or a bundled hardware solution is a better fit largely depends on whether your IT team has the time and skills required to research and identify the required hardware components on their own. If so, a software-only product can provide significant savings and maximum flexibility.
- Avoid Forklift Upgrades. One of the original purposes of SDS was to be hardware agnostic so there should be no reason to remove and replace all of your existing hardware. Organizations should research solutions that enable you to protect your existing investment in hardware as opposed to requiring a forklift upgrade of all of your hardware. A new SDS implementation should complement your environment and protect your investment in existing servers, storage, networking, management tools and employee skill sets.
- Expansion and Upgrade capability. Before you buy new hardware to expand your environment, confirm that the additional hardware can seamlessly integrate with your existing cloud or datacenter environment. Organizations should look for products that allow easy and non-disruptive hardware and software expansions & upgrades, without the need for additional time consuming customization.
- Storage architecture. The fundamental design of the hardware can expose both efficiencies and deficiencies in the solution stack. Everything should be scrutinized, down to the tiniest details. Pay particular attention to features that affect storage overhead (deduplication, compression, etc.).
- Test your application workloads. Often overlooked is the fact that a storage infrastructure exists entirely to facilitate data access by applications. It’s a common mistake to downplay the importance of an application workload analysis. Consider a proof of concept or extensive testing with a value-added reseller using your own data if possible; it’s the only way to ensure the solution will meet your expectations when it’s placed into a production environment. If possible, test SDS software solutions with your existing storage infrastructure before a purchase is made, as it will help reveal just how hardware independent the SDS software actually is.
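Tying a couple of those items together, here's what the S3 compatibility point looks like in practice: because most SDS products expose an S3-compatible endpoint, the same client code that talks to AWS can talk to your on-prem storage just by overriding the endpoint URL. A minimal sketch using boto3; the endpoint and credentials are placeholders:

```python
import boto3

# Point a standard S3 client at an on-prem SDS endpoint instead of AWS.
# The endpoint URL and credentials below are placeholders for illustration.
s3 = boto3.client(
    "s3",
    endpoint_url="https://sds.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="archive")
s3.put_object(Bucket="archive", Key="reports/2017-q1.csv", Body=b"col1,col2\n1,2\n")
print(s3.get_object(Bucket="archive", Key="reports/2017-q1.csv")["Body"].read())
```

If a proof of concept can't pass a simple test like this unmodified against the vendor's S3 endpoint, that's a red flag for the cloud integration story.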
Potential use cases and justification for SDS
SDS solutions will continue to have a significant impact on the traditional storage market moving into the future. IDC research suggests that traditional stand-alone hybrid systems will start declining while all-flash, hyper-converged and software-defined storage adoption grows at a much faster rate. So, on to some potential use cases:
- Non-disruptive data migrations. This is where appliance and storage controller based virtualization solutions have already been used quite successfully for many years. I have experience installing and managing the VPLEX storage virtualization appliance in an existing storage infrastructure, where it was used extensively for non-disruptive data migrations. By inserting an appliance or storage controller based SDS solution into an existing storage network between the server and backend storage, it’s easy to virtualize the storage volumes on both existing and new storage arrays and then migrate data seamlessly and non-disruptively from old arrays to new ones. Weekend outages were turned into much shorter non-disruptive upgrades that the application was completely unaware of. Great stuff.
- Better managing deployments of archival/utility storage. Organizations in general seem to have a growing need for deployments of large amounts of archive storage (low cost, high density disk) in their environments. It’s not uncommon to have vast amounts of data with an undefined business value, but it’s sufficiently valuable that it cannot easily or justifiably be deleted. In cases like that, storage arrays that are reliable, stable, and economical, perform moderately well, and remain easy to manage and scale are a good fit for an SDS solution. The storage that this data resides on needs very few extra enterprise features like auto-tiering, VM integration, deduplication, etc. Cheap and deep storage will work here, and SDS solutions work well in these environments. Whether the SDS software resides on a storage controller or on an appliance, more storage capacity can be quickly and easily added to these environments and then easily managed and scaled. Many of the interoperability and performance issues that have hurt SDS deployments in the past don’t make much difference when it’s simply archive data.
- Managing heterogeneous storage environments. One of the big issues with appliance and storage controller-based SDS solutions early on was that they attempted to do it all by virtualizing storage arrays from every vendor under the sun, yet failed to create a single pane of glass to manage all of the storage capacity with a common, standardized set of storage management features. That single-pane-of-glass capability is now a game changer in complex environments and is offered by most vendors. Implementing SDS can dramatically reduce administrative time and allow your top staff to focus on more important business needs.
The benefits of SDS
What follows is a summary of some of the key benefits of implementing SDS. This list is what you’re most likely to hear from your friendly local salesperson and in the marketing materials from each vendor. :)
- Non-disruptive hardware expansion. SDS solutions can enable storage capacity expansion without disruption. New arrays can be added to the environment and data can be migrated completely non-disruptively.
- Cloud Automation. SDS provides an optimal storage platform for next-generation on-prem and private data centers, offering public cloud scale economics, universal access, and self-service automation to private clouds.
- Economics. SDS has the potential to significantly reduce operational and management expenses through policy-based automation, ease of deployment, programmable flexibility, and centralized management, while providing hardware independence and using off-the-shelf industry-standard components to lower storage system costs. Some vendor offerings will also allow you to leverage existing hardware.
- Increased ROI. SDS allows policy-driven data center automation that provides the ability to provision storage resources immediately based on VM workload demand. This capability will encourage organizations to deploy SDS offerings to reduce their opex and capex, providing a quick return on investment (ROI).
- Real-time scalability. SDS offers tiered capacity by service level and the ability to provision storage on demand, which enables optimal capacity based on current business requirements. It also provides detailed metrics for reporting on storage infrastructure usage.
- High Availability. SDS architectures can provide for improved business continuity. In the event of a hardware failure, an SDS environment can shift load and data automatically to another available node. Because the storage infrastructure sits above the physical hardware, any hardware can be used to replace a failed node. Older systems could even be recycled to improve disaster recovery provisions in SDS, further improving your ROI.
The trends for SDS in 2017 and beyond
The SAN guy is not a fortune teller, but these predictions are all creating a buzz in the industry and you’re likely to see them start to materialize in 2017.
- SDS catches up to traditional storage. Now that enterprise-class storage features like inline deduplication, compression and QoS have been introduced across the market leaders in SDS solutions, it’s finally becoming a mainstream option. The rapidly declining cost of enterprise flash drives (EFDs) along with the performance and reliability of SDS make it well suited for the virtual workloads of many organizations.
- Multiple Cloud implementations. Analysts are predicting that SDS will usher in a new multi-cloud era in 2017 by leveraging the power of a software-defined infrastructure that is not tied to a specific hardware platform and configuration. SDS users will finally have a defined cloud strategy that evolves from what they are doing today. As a result, IT has to be prepared to support new application models designed to bring the simplicity and agility of cloud to on-premises infrastructure. At the same time, new software-defined infrastructure enables a flexible multi-cloud architecture that extends a common and consistent operating environment from on-prem to off-prem, including public clouds.
- Management integration improves. The continued integration of management into hypervisor tools, computational platforms, hyper-converged systems, and next-generation service-based infrastructures will keep enhancing SDS capabilities.
- Storage leaves the island. Traditional storage implementations typically have many different islands of storage in independent silos. It’s been difficult to break that mold, given the business requirements and the hardware and software available to provide the necessary multi-tenancy while still meeting those requirements. SDS will begin to allow organizations to consolidate those islands of storage and break down the artificial barriers.
- Increased Hybrid SDS deployments. The use of SDS will continue to move toward hybrid implementations, and organizational requirements will drive the change. It’s no secret that more workloads are moving toward the cloud, and SDS will help break down that boundary. SDS will also start to blur the lines between data that is in the cloud and data that is stored locally, making data mobility more seamless and improving fluidity while taking into account regulatory requirements, cost, and performance.
- The Software-Defined Data Center starts to materialize. The ultimate goal for SDS is the software defined data center. Implementing a Hyper-converged infrastructure (HCI) is important to reach that goal, but in order to achieve it HCI must deliver consistent and predictable performance to all elements of data center management, not just storage. SDS and HCI are the stepping stones for that goal.
Software Defined Storage Vendors
Now that we have an idea of what SDS is and what it can be used for, let’s take a look at the vendors that offer SDS solutions. I put together a vendor list below along with a brief description of the product that is based mostly on the company’s marketing materials.
SwiftStack’s design goal is to make it easy to deploy, operate, and scale, as well as to provide the fastest experience when deploying and managing a private cloud storage system. Another key design element is to enable large-scale growth without any disruption to performance. It has no fixed hardware configurations and can be configured using any server hardware. It is also licensed for the amount of data capacity utilized, not the total amount of hardware capacity deployed, allowing organizations to pay-as-they-grow using annual licenses.
SwiftStack offers a reliable, massively scalable, software defined storage platform. It seamlessly integrates with existing IT infrastructures, runs on standard hardware, and replicates across globally distributed data centers.
HPE StoreVirtual VSA is storage software that runs on commodity hardware in a virtual machine in any virtualized server environment, including VMware, Hyper-V, and KVM. It turns any media presented to it via the hypervisor into shared storage. It presents the storage to all physical and virtual hosts in the environment as an iSCSI array. Additionally, StoreVirtual VSA is part of an integrated family of solutions that all share the same storage operating system, including StoreVirtual arrays and HPE’s hyper-converged systems. It has a full enterprise storage feature set that provides the capabilities and performance you would expect from a traditional storage area network. It provides low cost data protection that delivers fast, efficient, and scalable backup and does not require dedicated hardware.
StoreOnce VSA is a SDS solution that provides backup and recovery for virtualized environments. It enables organizations to reduce the cost of secondary storage by eliminating the need for a dedicated backup appliance. It shares the same deduplication algorithm and storage features as the StoreOnce Disk Backup family, including the ability to replicate bi-directionally from a physical backup appliance to SDS.
StoragePoint is a SharePoint storage optimization solution that offloads unstructured SharePoint content data, known as Binary Large Objects (BLOBs), from SharePoint’s underlying SQL database to alternate tiers of storage. BLOBs quickly overwhelm the SQL database that powers SharePoint, resulting in poor performance and an environment that is expensive to maintain and grow. Many rich media formats are too large to store in SQL Server due to technical limitations, resulting in a collaboration platform that cannot address all the content needs of an organization.
StoragePoint optimizes SharePoint Storage using Remote Blob Storage (RBS). It provides a method to address file content storage issues related to large file size, slow user query times and backup failures. It externalizes SharePoint content so it can be stored and managed anywhere. An automated rules engine places content in the most appropriate storage locations based on the type, criticality, age and frequency of use.
Previously known as VMware Virtual SAN, vSAN addresses hyper-converged infrastructure systems. It aggregates locally attached disks in a vSphere cluster to create storage that can be provisioned and managed from vCenter and vSphere Web Client tools. This enables organizations to evolve their existing virtualization environment with the only natively integrated vSphere solution and leverages multiple server hardware platforms. It reduces TCO due to the cost savings of utilizing server side storage, with more affordable flash storage, on demand scaling, and simplified storage management. It can also be expanded into a complete SDS solution that can provide the foundation for a cloud architecture.
Using the VMware SDS model, the data plane that’s responsible for storing data and implementing data services such as replication and snapshots is virtualized by abstracting physical hardware resources and aggregating them into logical pools of capacity (called virtual datastores) that can be used and managed with a high degree of flexibility. By making the virtual disk the basic unit of management for storage operations in the virtual datastores, precise combinations of hardware resources and storage services can be configured and controlled independently for each virtual machine.
Microsoft Storage Spaces Direct (or S2D) is a part of Windows Server 2016. It can be combined with Storage Replica (SR) and Resilient File System (ReFS) cache tiering to create scale-out, converged and hyper-converged infrastructure SDS for Windows Server and Hyper-V environments. It has the capability to use existing tools and has many flexible configuration and deployment options.
InfiniBox is based upon a fully abstracted set of software-driven storage functions layered on top of industry standard hardware, and delivers a fast, highly available, and easy-to-deploy storage system. Extreme reliability and performance are delivered through their innovative self-healing architecture, high performance double-parity RAID, and comprehensive end-to-end data verification capability. They also feature an efficient data distribution architecture that uses all of the installed drives all the time. It has a large flash cache that delivers ultra-high performance, which can match or exceed 12GB/s throughput (yes, it’s a marketing number).
Pivot3’s virtual storage and compute operating environment, known as vSTAC, is designed to maximize overall resource utilization, providing efficient fault tolerance and giving IT the flexibility to deploy on a wide range of commodity x86 hardware. A distributed scale-out architecture pools compute and storage from each HCI node into high-availability clusters, accessible by every VM and application. Its Scalar Erasure Coding is said to be more efficient than network RAID or replication protection schemes, and it maintains performance during degraded mode conditions. Pivot3 owns multiple SDS patents, one covering their technology that creates a cross-node virtual SAN that can be accessed as a unified storage target by any application running on the cluster. By converging compute, storage and VM management, they automate system management with self-optimizing, self-healing and self-monitoring features. Their vCenter plugin provides a single pane of glass to simplify management of single and multi-site deployments.
EMC ViPR Controller provides Software Defined Storage automation that centralizes and transforms multivendor storage into a simple and extensible platform. It also performs infrastructure provisioning on VCE Vblock Systems. It abstracts and pools resources to deliver automated, policy-driven, storage as-a-service on demand through a self-service catalog across a multi-vendor environment. It integrates with cloud stacks like VMWare, OpenStack, and Microsoft and offers RESTful APIs for integrating with other management systems and offers multi-vendor platform support.
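As an aside, the self-service catalog pattern ViPR uses is worth a quick illustration. The sketch below shows the general shape of provisioning storage through a REST API using Python's requests library; the base URL, endpoint path, header, and payload fields are hypothetical stand-ins for illustration, not ViPR's documented API:

```python
import requests

# Hypothetical sketch of a policy-driven REST provisioning call.
# The base URL, path, token header, and payload fields are illustrative
# placeholders, not ViPR's actual documented API.
BASE = "https://sds-controller.example.internal:4443"
TOKEN = "token-obtained-from-a-prior-login-call"

payload = {
    "name": "app01_data",
    "size_gb": 200,
    "service_level": "gold",  # the policy chosen from the service catalog
}
resp = requests.post(
    f"{BASE}/api/volumes",
    json=payload,
    headers={"X-Auth-Token": TOKEN},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically returns a task/order ID to poll for completion
```

The value is that an orchestration tool (or a developer) can request storage by service level without ever knowing which backend array fulfills it.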
EMC ECS (Elastic Cloud Storage)
ECS provides a complete software-defined cloud storage platform for commodity infrastructure. Deployed as a software-only solution or as a turnkey appliance, ECS offers all the cost savings of commodity infrastructure with enterprise reliability, availability, and serviceability. EMC launched it as its next-generation hyperscale object-based storage solution, originally designed to overcome the limitations of Centera. It is used to store, archive, and access unstructured content at scale. It’s designed to allow businesses to deploy massively scalable storage in a private or public cloud, and allows customizable metadata for data placement, protection, and lifecycle policies. Data protection is provided by a hybrid encoding approach that utilizes local and distributed erasure coding for site-level and geographic protection.
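Since erasure coding does the heavy lifting for ECS data protection, a toy example helps show the idea. Real systems use Reed-Solomon-style codes that tolerate multiple simultaneous losses; the deliberately simplified single-parity sketch below (my own illustration, not ECS's actual encoding scheme) can rebuild any one lost chunk:

```python
from functools import reduce

# Toy single-parity erasure code (RAID-4 style): any ONE missing chunk,
# data or parity, can be rebuilt by XOR-ing all the survivors.
# Deliberate simplification; not ECS's actual encoding scheme.

def xor_chunks(chunks: list) -> bytes:
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data = [b"AAAA", b"BBBB", b"CCCC"]   # equal-length data chunks
parity = xor_chunks(data)            # stored on a separate node/site

# Simulate losing chunk 1, then rebuild it from the survivors + parity.
recovered = xor_chunks([data[0], data[2], parity])
assert recovered == data[1]
print(recovered)  # b'BBBB'
```

Codes like the ones ECS uses extend this idea so that several chunks can fail at once, across drives or even across sites, and the data is still recoverable.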
ScaleIO is a software-only, server-based storage area network that combines storage and compute resources to form a single layer. It uses existing local disks and LANs so that the host can realize a virtual SAN with all the benefits of external storage. It provides virtual and bare metal environments with scale, elasticity, multi-tenant capabilities, and the quality of service that enables service providers to build high performance, low cost cloud offerings. It enables full data protection and persistence; the software ensures enterprise-grade resilience through meshed mirroring of randomly sliced and distributed data chunks across multiple servers.
IBM Spectrum software is a comprehensive family of software-defined storage solutions. It is specifically structured to meet changing storage needs, including hybrid cloud, and is designed for organizations just starting out with software-defined storage as well as those with established infrastructures that need to expand their capabilities.
NetApp’s SDS offerings include the clustered Data ONTAP OS, NetApp OnCommand, the NetApp FAS series, and NetApp FlexArray virtualization software. Features of NetApp’s SDS include virtualized storage services with effective provisioning of data storage and access based on service levels, multiple hardware options that support deployment on a variety of enterprise platforms, and application self-service that delivers APIs for workflow automation and custom applications.
DataCore’s storage virtualization software allows organizations to seamlessly manage and scale their data storage architectures, delivering massive performance gains at a much lower cost than solutions offered by legacy storage hardware vendors. DataCore has a large customer base around the globe. Their adaptive, self-learning, and self-healing technology eases management, and their solution is completely hardware agnostic.
Nexenta integrates software-only, open source collaboration with commodity hardware. Their software is installed in thousands of companies around the world, serving a wide variety of workloads and business-critical situations, and it powers some of the world’s largest cloud deployments. With their complete Software-Defined Storage portfolio, recent updates to NexentaConnect for VMware VSAN, and the launch of NexentaEdge, they offer a robust SDS solution.
Hitachi Virtual Storage Platform G1000 provides the always-available, agile, and automated foundation needed for an on-prem or hybrid cloud infrastructure. Their software enables IT agility and a low TCO, delivering a top notch combination of enterprise-ready software-defined storage, global storage virtualization, and efficient, scalable, high performance hardware. It also supports self-managing, policy-driven management. Their SDS implementation includes the Hitachi Virtual Storage Platform G1000 (VSP G1000) and the Hitachi Storage Virtualization Operating System (SVOS).
The StoneFly Storage Concentrator Virtual Machine (SCVM) Software-Defined Unified Storage (SDUS) is a Virtual IP Storage Software Appliance that creates a virtual network storage appliance using the existing resources of an organization’s virtual server infrastructure. It is a virtual SAN storage platform for VMware vSphere ESX/ESXi, VMware vCloud and Microsoft Hyper-V environments and provides an advanced, fully featured iSCSI, Fibre Channel SAN and NAS within a virtual machine to form a Virtual Storage Appliance.
Nutanix’s software-driven Xtreme Computing Platform natively converges compute, virtualization and storage into a single solution. It offers predictable performance, linear scalability and cloud-like infrastructure consumption.

PernixData FVP software is a 100% software solution that clusters server flash and RAM to create a low latency I/O acceleration tier for any shared storage environment.
StorPool is a storage software solution that runs on standard commodity servers and builds a scalable, high-performance SDS system. It offers great flexibility and can be deployed either converged or on separate storage nodes. It has an advanced, fully distributed architecture and is one of the fastest and most efficient cloud-ready block-storage software solutions available.
Hedvig collapses traditional tiers of storage into a single, software platform designed for primary and secondary data. Their patented “Universal Data Plane” architecture stores, protects, and replicates data across multiple private and public clouds. The Hedvig Distributed Storage Platform is a single software-defined storage solution that is designed to meet the needs of primary, secondary, and cloud data requirements. It is a distributed system that provides cloud-like elasticity, simplicity, and flexibility.
StorMax SDS is a highly available software-defined storage solution that delivers unified file and block storage services with enterprise-grade data management, data integrity, and performance that can scale from tens of terabytes to petabytes. It is seamlessly integrated with NexentaStor, and the plug and play appliances are designed to be a simple swap-in replacement for legacy block and file storage appliances, offering unlimited file system sizes, unlimited snapshots and clones, and inline data reduction for additional storage cost savings. It’s well suited for VMWare, OpenStack, or CloudStack backend storage, generic NAS file services, home directory storage, and near-line archive and large backup & archive repositories.
USX is Atlantis’ SDS software solution. It includes policy-based management of storage resources, storage pooling and automation of storage functions. It also provides a REST API to allow organizations to automate storage functions. It promises to deliver the performance of an all-flash storage array at a lower cost than that of traditional SAN or NAS. The marketing materials state that you can pool any SAN, NAS or DAS storage and accelerate its performance by up to 10x, while at the same time consolidating storage to increase storage capacity by up to 10x.
The LizardFS SDS solution is a distributed, scalable, fault-tolerant and highly available file system that runs on commodity hardware. It allows users to combine disk space located on many servers into a single namespace that is visible on Unix and Windows. LizardFS ensures file safety by keeping the data in multiple replicas spread over all available servers. Disk and server failures are handled transparently, without any downtime or loss of data. As your storage requirements grow it scales by adding new servers without any downtime, and the system automatically redistributes data to the newly added servers as it continuously balances disk usage across all connected nodes. Removing a server is as easy as adding one.
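The self-balancing behavior LizardFS describes is conceptually simple. Here's a minimal sketch of the idea (my own simplification, not LizardFS's actual chunkserver selection logic): write each new chunk's replicas to the least-full servers, so a freshly added server naturally absorbs data until usage evens out:

```python
import heapq

# Simplified replica placement: each new chunk's replicas go to the
# least-used servers. Conceptual sketch only, not LizardFS's algorithm.

def place_replicas(usage: dict, chunk_gb: int, copies: int = 3) -> list:
    """Choose the `copies` least-full servers and account for the new chunk."""
    targets = heapq.nsmallest(copies, usage, key=usage.get)
    for server in targets:
        usage[server] += chunk_gb
    return targets

servers = {"node1": 800, "node2": 750, "node3": 900, "node4": 0}  # node4 just added
for _ in range(3):
    print(place_replicas(servers, chunk_gb=10))
# node4 shows up in every placement until it catches up with its peers.
```

Real systems also rebalance existing chunks in the background, but the placement bias alone explains why adding a server requires no downtime or manual data moves.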
That’s a large portion of the SDS vendor playing field, but there are others. You can also check out the offerings from Maxta, Tarmin, Coraid, Cohesity, Scality, Starwind, and Red Hat Storage Server (Ceph).
There were long pauses in between as I worked on this blog post in an on and off manner, so I may make some editorial changes and additions in the coming weeks. Feedback is welcomed and appreciated.