New World ILM, Leveraging The Cloud and FAST

In a galaxy far far away (ok, only around 2004/2005) the storage industry was abuzz about “Information Lifecycle Management“.  The principle was quite simple: put information on the right cost of storage at the right time and manage it using policies from the day it’s created to the day we take the proverbial shotgun to it.  In the world of information and storage management, this always sounds a little simpler than it is in practice, as there is no “ILM” part number you can order; rather, it’s a management practice that takes numerous technology pieces working in unison to help us achieve it effectively.

If anyone ever wonders why we don’t hear much about ILM anymore, it’s that ILM was always tough to sell as a solution you could just pull out of the box, insert the CD into your drive and install.  There are several components in play including hardware, software, understanding the business rules and practices, et cetera.  In a later article, I promise to get into the suite of tools and the change management you’ll experience, but for now I wanted to talk about how the last few years have changed the solution landscape for the cost of the supporting storage tiers.

Back in the mid-2000s, before this talk of cloud began, our tiers of storage were pretty straightforward.  If you talked to EMC or HDS, it was Fibre Channel disk, SATA and Content Addressable Storage (CAS).  Talk to IBM or HP (and HP tried to go the EMC/HDS route but could never quite get the recipe right) and the solution was Fibre Channel, SATA and tape.  That said, the ILM story was really only being touted by EMC and HDS, since tape was an expensive and cumbersome medium for the long-term resting place of data, and any head-to-head fight would end with HDS or EMC stomping all over the tape conversation.  CAS platforms like the Centera, while not free, touted the ability to act like a black box, assure data integrity without needing backups (back up a deep archive with petabytes of content?  Ouch!) and recall content from the archive in seconds instead of the minutes, hours or days of tape or optical media.

Unfortunately though, the story of ILM got lost in the shuffle.  Between cloud, big data, virtualization and all of the other subjects du jour, it has never quite gone away but hasn’t gotten the love the mid-2000s gave it.  With all of these evolutions and new technologies, though, times sure have changed, and the components of ILM in 2011 look a lot different than they did in 2005.  I would go as far as to say that the story is even more compelling than it used to be, as it’s been simplified and the costs have gone down drastically.  For our conversation, I’ll walk through the most complex but finest-grained-control solution.

ILM in 2005

In 2005, we really had a complex mix of bits and pieces.  On a single file server, you would end up with a “fast disk” where the users would do most of their work and keep the newest and most-used data.  Also attached (either via Fibre Channel or through IP) we’d commonly see a “slow tier” of disk for content that was aging but still might be needed at any time, usually on slower but bigger-capacity SATA disks.  Finally, somewhere there was a CAS device to act as the long-term holding tank for all those pieces of information that no one was really using, but didn’t want to simply dispose of.  To glue all the pieces together, HSM tools like DiskXtender would take our data management policies (when the file was created, when it was last accessed, other metadata about the information, even the content of the data itself) and apply movement policies between the tiers for us as well as manage disposition of the information.  So in effect, we would have three physical devices and a piece of management software in the mix.
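To make the policy idea concrete, here’s a minimal sketch of the kind of age-based movement rule an HSM tool automates.  The mount points, thresholds and behavior here are hypothetical, not how DiskXtender actually works (a real HSM tool leaves a stub behind so recall is transparent to users); this just illustrates the “move data between tiers based on last access” logic.

```python
import os
import shutil
import time

# Hypothetical mount points for the three tiers described above
FAST_TIER = "/mnt/fast"      # Fibre Channel disk, active working set
SLOW_TIER = "/mnt/slow"      # SATA disk, aging content
ARCHIVE   = "/mnt/archive"   # CAS / deep archive

SLOW_AFTER_DAYS = 90         # demote after 90 days without access (example values)
ARCHIVE_AFTER_DAYS = 365     # archive after a year without access

def days_since_access(path):
    """Days since the file was last read (based on st_atime)."""
    return (time.time() - os.stat(path).st_atime) / 86400

def apply_policy():
    for root, _dirs, files in os.walk(FAST_TIER):
        for name in files:
            src = os.path.join(root, name)
            age = days_since_access(src)
            if age > ARCHIVE_AFTER_DAYS:
                dest_root = ARCHIVE
            elif age > SLOW_AFTER_DAYS:
                dest_root = SLOW_TIER
            else:
                continue  # still active, leave it on fast disk
            # Preserve the relative path when moving between tiers
            rel = os.path.relpath(src, FAST_TIER)
            dest = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)

if __name__ == "__main__":
    apply_policy()
```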

The complexity of this configuration really was just the sheer number of devices and the HSM software we had to manage for a single server.  For a small number of servers, this was not a big load and the additional cost on the TCO was negligible.  In large environments the story changed altogether: while the day-to-day operations were highly automated, the provisioning and maintenance became resource intensive.  Not only did we have to manage the “fast” disk’s space, we also had that “slow” disk to watch, plus provision the CAS and babysit the HSM software.

ILM in 2011

In the new world of 2011, the game has gotten fun.  In the recent past, several vendors have introduced “in-array data tiering”.  EMC’s spin was to fully automate the process and trend the performance of the data within the array itself.  We’ve all seen this as a “make better use of the storage array’s fast disks” play, but in the context of ILM, we have effectively enabled totally transparent HSM-like behavior based on performance.  Ok, ok, it’s not as smart as employing tools such as DiskXtender to enforce data placement policies based on the information or its metadata, but effectively we are now dealing with multiple tiers of storage within the same virtual disk.  This in short equates to one device being able to provide both the “fast” and “slow” tier in an ILM strategy while removing the server-based tools from the management picture.  Additionally, this applies to any type of data, structured, semi-structured or unstructured, through the same policy mechanism.  Really nifty!
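If it helps to picture the “trend the performance and relocate” idea, here’s a toy model.  This is emphatically not EMC’s actual FAST algorithm; the extent IDs, slot count and relocation window are made up purely to show the shape of the technique: measure IO per extent, then promote the busiest extents to fast disk on each pass.

```python
from collections import Counter

FAST_SLOTS = 4  # hypothetical number of extents that fit on the fast tier

class TieringEngine:
    """Toy in-array tiering: rank extents by recent IO and relocate."""

    def __init__(self):
        self.io_counts = Counter()

    def record_io(self, extent_id):
        self.io_counts[extent_id] += 1

    def relocate(self):
        """Return (fast, slow) placement based on recent IO activity."""
        ranked = [extent for extent, _ in self.io_counts.most_common()]
        fast = set(ranked[:FAST_SLOTS])
        slow = set(ranked[FAST_SLOTS:])
        self.io_counts.clear()  # start a fresh measurement window
        return fast, slow

engine = TieringEngine()
for extent in [1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5]:
    engine.record_io(extent)
fast, slow = engine.relocate()
print("promote to fast disk:", fast)
print("leave on slow disk:", slow)
```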

Now, what about that nearline or CAS platform?  Of course you can still put a Centera into play with DiskXtender or SourceOne to manage the movement of the data, but why not look to “cloud optimized storage”?  The cost per terabyte is a fraction of CAS, and resilience comes from replicas rather than scrubbing algorithms.  Inherently, these platforms scale into petabytes of capacity, and exactly where the information is stored within these clouds of storage is transparent.  So in the new world of 2011 and cloud, we’ve effectively replaced the CAS with cloud.  Depending on your comfort and trust level, if you want the fast on-ramp, Amazon’s S3 service can become a deep nearline archive in a matter of fifteen minutes with no upfront CAPEX to spend.  If you’re a little more sensitive, EMC’s Atmos product can bring the same cloud storage capabilities into your backyard.  The kicker here is that both of these platforms come at a fraction of the cost per terabyte of CAS and still offer the same capabilities.  You’ll still need to employ an HSM tool, but the time to provision the storage, the cost per terabyte and the ongoing maintenance costs offset the HSM tool’s TCO additions very quickly.
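Just to show how low the on-ramp really is, here’s a minimal sketch of pushing an aged file into S3 as the “deep nearline archive” and pulling it back on recall, using the boto3 SDK.  The bucket name and key layout are hypothetical, and credentials are assumed to come from the usual AWS configuration (environment variables or ~/.aws/credentials).

```python
import boto3  # AWS SDK for Python

# Hypothetical bucket acting as the deep nearline archive tier
ARCHIVE_BUCKET = "example-deep-archive"

s3 = boto3.client("s3")

def archive_file(local_path, key):
    """Push an aged file from the slow tier into the S3 archive bucket."""
    s3.upload_file(local_path, ARCHIVE_BUCKET, key)

def recall_file(key, local_path):
    """Pull a file back from the archive when someone needs it again."""
    s3.download_file(ARCHIVE_BUCKET, key, local_path)

# Example: archive a file that has aged off the slow SATA tier
archive_file("/mnt/slow/projects/2009/report.doc", "projects/2009/report.doc")
```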

The short of it is that ILM is not dead; in fact, it’s only getting better.  With the abilities cloud gives us in relation to storage and the introduction of technologies like FAST, we’re able not only to enact ILM that much faster and more easily, but to do so with more flexibility and less complexity.
