The problem with auto tiering in general is that data is often "hot" only for a brief window of time. Workloads are rarely uniform across an entire data set, and if data isn't moved in real time it's very likely that hot data will end up being served from capacity storage. Ideally a storage system would react to these workload shifts in real time, but the overhead on the storage processors is generally too great. The problem is somewhat mitigated by having a large cache (like EMC's FAST Cache), but the ability to automatically tier data in real time would be ideal.
I recently compared Dot Hill's auto tiering strategy to EMC's, and found that Dot Hill takes a unique approach that looks promising. Here's a brief comparison between the two.
EMC
EMC greatly improved FAST VP on the VNX2. While the VNX uses 1GB data slices, the VNX2 uses more granular 256MB data slices, which greatly improves efficiency. EMC is also shipping MLC SSDs instead of SLC, making SSDs much more affordable in FAST VP pools. (Note that SLC is still required for FAST Cache.)
How does the smaller data slice size improve efficiency? As an example, assume that a 500MB contiguous range of hot data residing on the SAS tier needs to be moved to EFD. On the VNX1, an entire 1GB slice would be moved; of that 1GB landing on the EFD drive, only 500MB is actually hot, so roughly half of the relocated data is cold. This is obviously inefficient. With the VNX2's 256MB slice size, only two slices (512MB) would be moved, so nearly all of the relocated data is hot. The VNX2 makes much more efficient use of the extreme performance and performance tiers.
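The arithmetic above can be sketched with a few lines of Python. This is just an illustrative model of slice-granularity relocation (the function name and structure are my own, not anything from EMC):

```python
import math

MB = 1024 * 1024

def relocation_efficiency(hot_bytes, slice_bytes):
    """Bytes actually moved when promoting a contiguous hot range,
    given the array's relocation slice size (illustrative model only)."""
    slices = math.ceil(hot_bytes / slice_bytes)
    moved = slices * slice_bytes
    return moved, hot_bytes / moved   # (bytes moved, fraction that is hot)

hot = 500 * MB

# VNX1: 1GB slices -> a 500MB hot range drags a full 1GB slice along
moved, eff = relocation_efficiency(hot, 1024 * MB)
print(moved // MB, round(eff, 2))    # 1024 MB moved, ~49% of it hot

# VNX2: 256MB slices -> two slices (512MB), almost all of it hot
moved, eff = relocation_efficiency(hot, 256 * MB)
print(moved // MB, round(eff, 2))    # 512 MB moved, ~98% of it hot
```

The smaller the slice, the less cold data tags along with each hot range, which is why the finer 256MB granularity wastes so much less EFD capacity.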
EMC's FAST VP auto tiering on the VNX1 is configured either manually or with a schedule. The schedule can be set to run 24 hours a day to approximate real-time movement, but in practice that's not feasible: the overhead on the storage processors is simply too great, so we've configured it to run during off-peak hours. On our busiest VNX1 the storage processors jump from ~50% to ~75% utilization while the relocation job is running. This may improve with the VNX2, but it's been a problem on both the CX series and the VNX1.
DotHill
According to Dot Hill, their auto tiering doesn't examine every single I/O like the VNX or VNX2; it looks for trends in how data is accessed. Their rep told me to think of it as examining every 10th I/O rather than every one. The idea is to allow the array to move data in real time without overloading the storage processors. Dot Hill also moves data in 4MB pages (which is very efficient, as I explained earlier when discussing the VNX2), and will not move more than 80MB in any 5-second span (960MB/minute maximum) to keep the CPU load down.
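To make the 80MB-per-5-seconds cap concrete, here is a minimal rate-limiter sketch. This is hypothetical code built only from the numbers Dot Hill publishes, not their actual implementation:

```python
import time

class RelocationThrottle:
    """Minimal sketch of a relocation rate limit matching Dot Hill's
    stated cap: no more than 80MB moved in any 5-second window.
    (Hypothetical illustration, not Dot Hill's actual code.)"""
    CAP_BYTES = 80 * 1024 * 1024
    WINDOW_S = 5.0

    def __init__(self):
        self.window_start = time.monotonic()
        self.moved = 0

    def try_move(self, nbytes):
        now = time.monotonic()
        if now - self.window_start >= self.WINDOW_S:
            # New 5-second window: reset the budget
            self.window_start, self.moved = now, 0
        if self.moved + nbytes > self.CAP_BYTES:
            return False        # defer this page to a later window
        self.moved += nbytes
        return True
```

At 80MB per 5-second window, the throttle works out to the 960MB/minute maximum quoted above.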
So, how does Dot Hill's auto tiering actually work? They use scoring, scanning, and sorting, each a separate process that works in real time. Scoring maintains a current ranking on every page, updated on each I/O with less than one microsecond of overhead; the algorithm weighs how frequently and how recently the data was accessed, giving higher scores to data that is accessed more often and more recently. Scanning for high-scoring pages happens every 5 seconds and uses less than 1% of the CPU; the pages with the highest scores become candidates for promotion to SSD. Sorting is the process that actually migrates pages up or down based on their score. As I mentioned earlier, no more than 80MB of data is moved during any 5-second sort to minimize the overall performance impact.
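The score/scan/sort cycle described above can be sketched as a toy engine. Note the score update and the decay constant are my own assumptions to illustrate the frequency-plus-recency idea; Dot Hill hasn't published the actual weighting:

```python
import heapq

PAGE_BYTES = 4 * 1024 * 1024        # Dot Hill relocates 4MB pages
SORT_CAP_BYTES = 80 * 1024 * 1024   # at most 80MB per 5-second sort
DECAY = 0.9                         # assumed decay factor (not published)

class TierEngine:
    """Toy sketch of the score/scan/sort cycle described above.
    The scoring formula and decay are assumptions for illustration."""

    def __init__(self):
        self.scores = {}            # page id -> ranking score

    def score(self, page):
        # Scoring: bump the page on each I/O, so frequent access
        # raises its rank; decay (below) keeps recency in the mix.
        self.scores[page] = self.scores.get(page, 0.0) + 1.0

    def scan(self, top_n):
        # Scanning: every 5 seconds, find the highest-scoring pages.
        return heapq.nlargest(top_n, self.scores, key=self.scores.get)

    def sort(self):
        # Sorting: promote candidates, honoring the 80MB-per-cycle cap
        # (80MB / 4MB pages = at most 20 pages per cycle).
        budget_pages = SORT_CAP_BYTES // PAGE_BYTES
        promoted = self.scan(budget_pages)
        # Age all scores so stale pages drift back down over time,
        # eventually becoming demotion candidates.
        for p in self.scores:
            self.scores[p] *= DECAY
        return promoted
```

In this sketch a page that is hit often climbs the ranking, the every-5-seconds scan picks the top candidates, and the sort pass never promotes more than 20 pages (80MB) per cycle, mirroring the limits described above.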
Summary
I haven't used EMC's new VNX2 or Dot Hill's AssuredSAN, so I can't offer real-world experience with either. Dot Hill's implementation looks very promising on paper, and I look forward to reading customer experiences with it in the future. They've been around a long time, but they only recently started offering their products directly to customers, having primarily been an OEM storage manufacturer. As I mentioned earlier, my experience with EMC's FAST VP on the CX series and VNX1 is that it consumes too many CPU cycles to run continuously, so we've always run it as an off-business-hours process; that is exactly what Dot Hill's implementation is trying to address. We've made adjustments to the FAST VP relocation schedule based on monitoring our workload. We also use FAST Cache, which at least partially solves the problem of suddenly-hot data needing extra I/O, and FAST Cache and FAST VP work very well together. Overall I've been happy with EMC's implementation, but it's good to see another company taking a different approach that could be very competitive with EMC.
You can read more about Dot Hill’s auto tiering here:
http://www.dothill.com/wp-content/uploads/2012/08/RealStorWhitePaper8.14.12.pdf
You can read more about EMC’s VNX1 FAST VP Here:
https://www.emc.com/collateral/software/white-papers/h8058-fast-vp-unified-storage-wp.pdf
You can read more about EMC’s VNX2 FAST VP Here:
https://www.emc.com/collateral/white-papers/h12208-vnx-multicore-fast-cache-wp.pdf