December 14, 2009

Tape – A Collapsing Star

When a star is born, the mass of the star determines how long it will live, ranging from millions of years to trillions of years.

Magnetic tape for data storage began life in 1951 and was used as a primary recording medium by some into the 1970s. The continuous use of tape caused a buildup of market share in the vacuum left by the lack of other tenable long-term storage alternatives. But as the storage market universe continued to expand, other storage stars were created. Tape is at the point, in the astronomical sense, where it has exhausted its supply of market fuel, expanded its range as a red giant, and is now rapidly collapsing as a white dwarf to someday become a black hole in our memory.

For today, even a white dwarf has value, and for tape, the light of value shines only for the large enterprise, with minor exceptions. Everyone enjoys gazing at a starry sky, and for the non-enterprise market, distance and portability are the only bastions tape has left.

The demise of tape won’t be a big bang that results in a black hole through some instantaneous event. Instead, tape will die because it can’t follow or match current market conditions and requirements. There are too many issues: mislabeling, labels that fall off, media failures, transport failures, library failures, media server failures, slow performance, and tape that is lost or stolen, leading to huge risk. Finally, media is expensive to replace, grow, and re-master to new generations. Because of that, users will simply stop writing data to tape, opting for disk instead.

Every stronghold of value that tape once held has been supplanted by disk.  Protection, performance, reliability, energy conservation, management and cost all favor recently emerging trends in today’s disk.

Market Changes – where Tape can’t go

When Tape Made Sense

40,000 years ago a mysterious and rapid cultural and technological growth for humanity began. Suddenly humanity made what anthropologists call a “great leap” with stereotyped dwellings, instrumental music, advanced language skills, and widespread cave art.  The cave art remains as a hieroglyphic written history surpassing anything we know about data retention even today.  You simply can’t beat data survivability when your data is etched in stone.

According to UC Berkeley, from 40,000 years ago until the year 2000, all recorded data represented a total of 2.7 exabytes. It is easy to see why tape made sense then. However, IDC now estimates that roughly 487 exabytes were added to the digital universe in 2008. IDC further estimates that 5 times that number, or about 2.5 zettabytes, will be captured by 2012. Not all data needs to be recorded (spam, for example), which is why the total amount of disk storage capacity shipped is estimated to reach only 110 exabytes by 2012. The sheer size exceeds tape’s ability to reasonably, reliably and economically store, manage and retrieve data.

Here are some sobering facts. Gartner estimates that 15% of all backups fail. Additionally, 10% to 50% of all subsequent restores from tape fail, depending on how long ago the backup was taken. Restores from tape that are older than 5 years have 40% to 50% failure rates. Much of this goes unnoticed, as both Gartner and Storage Magazine reported, with some 34% of companies never testing a restore from tape. On the other hand, of those that did test, 77% found that restores from their tape backups failed.
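A rough way to see what those percentages imply together is to chain them. The sketch below assumes, purely for illustration, that a backup failure and a later restore failure are independent events; the function name is hypothetical and the inputs are the figures quoted above.

```python
def usable_restore_probability(backup_failure, restore_failure):
    """Chance that a backup completed AND its later restore succeeds.

    Assumes the two failures are independent (an illustrative assumption).
    """
    return (1 - backup_failure) * (1 - restore_failure)

# 15% of backups fail; 10% to 50% of restores fail depending on age.
best_case = usable_restore_probability(0.15, 0.10)   # about 0.77
worst_case = usable_restore_probability(0.15, 0.50)  # about 0.43

print(f"best case:  {best_case:.1%}")
print(f"worst case: {worst_case:.1%}")
```

Under that assumption, somewhere between roughly four and eight out of ten attempts to protect and later recover data from tape would actually work.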

These issues along with other trends are becoming very clear to see.  Data is growing at rates that are hard to conceive.  Administration of the infrastructure to keep up with data growth continues to be seriously challenging.  The management of disk to house all of the data is not trivial, but managing disk is far easier than managing  tape.

Restoring from tape is unreliable in the best of conditions. When a restore from tape does work, the performance of the restore is poor. A failed or slow restore can spell disaster when a major business application has lost its data. Boston Computing Network’s Data Loss Statistics found that 7 out of 10 small firms that experience a major data loss are out of business within a year. Paradoxically, all of this risk assumes you have completed a backup and have the option of a restore. Ironically, backups to tape are frequently not completed within the defined backup window. If you have no backup, you have no option for restore.

From the early 1950s until the late 1990s, the volume of data made sense for tape technology.  As we move to store and retrieve the volumes witnessed today and into the near future, tape cannot reasonably sustain the role and place of value it once held.

Just as the Hollerith or IBM card punch/readers disappeared from the data center in the 1980s as data volumes grew faster and larger than what that device could reasonably sustain, tape is near the end of its useful life as active media. It may still exist to be tucked away inside a mountain, or for data portability, but even those uses have limited days. The environmental requirements for keeping data on tape healthy are far too stringent, and communications bandwidth growth will someday soon remove the need to truck a tape from point A to point B.

Using Disk as a Library

Protection

From a business point of view, backups are important but restores are everything. From an operational point of view, it has often been said that backup is easy, and recovery is hard. However, those who deal with the rigors of day-to-day backup issues may feel otherwise. Tape, and in particular backups, have always been an administrative challenge. While tape has been positioned for many years as the least expensive alternative in an expanding universe of data, other factors are at play beyond the pure price per byte stored.

Amid the hotly contested points of disk versus tape is reliability versus cost.  Unless you have the latest high-end tape transport and library targeted at the enterprise, reliability issues are expected and management is complex.  Additionally, you need many transports and libraries to stand a chance of getting nightly backups done.  Then there is the media.  Anybody that has tape understands that the cost of the media is the big expense.  Every couple of years, as new transports are announced, you discover the old media will no longer work, and all of it has to be replaced. Very expensive and very disruptive.

Unless you can afford a large library capable of storing the incremental amount of data required for on-going protection, the media is at risk from environmental issues and mishandling as tapes are ejected from a library.  Cartridge reliability is an issue, and so is the potential for lost or damaged media.

Factoring in the above issues makes the price per byte stored on tape far higher than the base measurement.

Add to that the operating problems, and it is clear why tape has always been seen as a necessary evil. Issues associated with the operations of tape are numerous and include reliability, performance, networks, resource conflicts, scheduling, media management, and more.

Using disk as a library alternative offers a flexible, ultra-reliable, high-performance, operationally efficient solution.

The Protection Architecture – Simplify and Save

Features from backup software vendors have made backup to disk a logical choice for simple and flexible backups and recovery.

When considering applications like VMware, Exchange, and SharePoint, protection, recoverability, and performance are key. While tape is still used, it is rare to see it used exclusively today. The benefits of Disk-to-Disk (D2D) are too great, which is why at least 70% of all backups are written to disk first.

Tape is Slow

To improve backup performance to tape, backup software will gather data from multiple job streams, typically 15 or more, and then interleave the data into a super-block, which is then sequentially written to tape.

If you have to recover a single application, you have to read all the data from the tape, stripping away 14 out of 15 records to get the one record you need, and repeat that for every super-block in the sequential file, which is often spread across 30 or more tapes. This has obvious performance issues, and is prone to failure from media and transport reliability issues, which have no level of redundancy whatsoever. Have an uncorrectable read error and lose the entire backup.
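The interleaving described above can be sketched in a few lines. This is a toy model, not any vendor’s on-tape format: the function names, the round-robin ordering, and the (stream, payload) record layout are all assumptions made for illustration. It shows why restoring one stream forces a sequential read of every record on the tape.

```python
from itertools import zip_longest

def interleave(streams):
    """Round-robin records from several job streams onto one sequential 'tape'.

    Each tape record is a (stream_id, payload) tuple (hypothetical layout).
    """
    tape = []
    for row in zip_longest(*streams):
        for stream_id, payload in enumerate(row):
            if payload is not None:
                tape.append((stream_id, payload))
    return tape

def restore(tape, wanted_stream):
    """Scan the whole tape in order, keeping only one stream's records."""
    recovered, records_read = [], 0
    for stream_id, payload in tape:
        records_read += 1  # every record must pass under the head
        if stream_id == wanted_stream:
            recovered.append(payload)
    return recovered, records_read

# 15 job streams of 4 records each, matching the 15-stream example above.
streams = [[f"job{j}-rec{r}" for r in range(4)] for j in range(15)]
tape = interleave(streams)
data, records_read = restore(tape, wanted_stream=7)
wasted = records_read - len(data)

print(f"read {records_read} records to recover {len(data)}; {wasted} discarded")
```

In this model 14 of every 15 records read are thrown away, which is the overhead the paragraph above describes. A disk-based index would let the restore seek straight to the wanted records instead.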

Protection objectives are measured as Recovery Point Objectives (the amount of data at risk) and Recovery Time Objectives (the amount of downtime you can tolerate). Both are a concern with tape. Recovering 10TB from LTO-3 tape can easily take nearly four days, as compared to 2.5 hours for a disk used as a protection library. What is the cost of downtime for you?
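To put the recovery times above in perspective, the following sketch works backwards to the effective throughput each figure implies, and then prices the difference. The function names and the $10,000-per-hour downtime cost are hypothetical placeholders; substitute your own numbers. Decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def implied_rate_mb_s(data_tb, hours):
    """Effective restore throughput implied by a data size and elapsed time."""
    return data_tb * 1_000_000 / (hours * 3600)

def downtime_cost(hours, cost_per_hour):
    """Cost of an outage of the given length at a flat hourly rate."""
    return hours * cost_per_hour

tape_hours = 4 * 24   # "nearly four days"
disk_hours = 2.5

tape_rate = implied_rate_mb_s(10, tape_hours)  # about 29 MB/s effective
disk_rate = implied_rate_mb_s(10, disk_hours)  # about 1,111 MB/s effective

# Hypothetical downtime cost of $10,000 per hour.
extra_cost = downtime_cost(tape_hours - disk_hours, 10_000)

print(f"tape: {tape_rate:.0f} MB/s, disk: {disk_rate:.0f} MB/s")
print(f"extra downtime cost with tape: ${extra_cost:,.0f}")
```

The effective tape rate that falls out of the four-day figure is well below a drive’s rated streaming speed, which is consistent with the read-and-discard overhead of interleaved backups.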

Disk Performance

When disk is used as a protection library, the problem is solved. Backup software such as Veritas’ NetBackup can index all the data as it stores it directly to disk. Whether you have to recover a single sub-object to VMware, an email message to Exchange, or a SharePoint document, you can recall them individually, and because it is a random access recovery, it is fast. The NDMP protocol will allow you to write directly to disk for easy configuration and management of network-based backups. With NDMP, network congestion is minimized because the data path and control path are separated.

With disk used as a protection library, backup can occur locally – from file servers direct to disk, while management can occur from a central location.

Because it is indexed by the backup application directly to disk, it is simple. Because of the decreased infrastructure complexity, it is easy, and far more operationally efficient.

Of course client encryption can protect the data all the way through the network into the backup server and onto disk, or you can use our Assureon media server to encrypt the data just before moving it to disk.

Faster restore, easier to manage, at or below the cost of tape.

Tape is Complex to Manage

There are many compelling reasons to use disk as a protection library over tape. First, the very nature of managing and using disk is far less complicated.

Managing tape cartridges is complex, and the fear of having a cartridge fall into the wrong hands, with the damage it can do, is real. Many users complain they can’t recycle backup tapes fast enough, forcing them to constantly buy more media. Backup typically uses a Grandfather-Father-Son (GFS) managed retention plan. The backup schedule will generally include daily incremental backups and weekly full backups. If you look at what happens over the course of a year, every TB of primary disk causes 25TB to be written to tape to protect it. The cost to implement, maintain, and manage this is extreme. As an example of just the capital expense, if you were backing up 42TB of disk over the course of a year, you would need 6,300 LTO-2 tapes. This assumes 80% usage efficiency for each cartridge. At $26 per cartridge, the cost is about $163,800, and you have 25 copies of the data to manage and maintain.
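The cartridge arithmetic above can be reproduced with a short calculation. The sketch assumes a 200 GB native LTO-2 cartridge and decimal units; the post does not state the capacity it used, so treat both as assumptions, along with the hypothetical function name. Under these assumptions the count lands in the same range as the 6,300 quoted above, though not exactly on it, so the original figure presumably used slightly different rounding.

```python
import math

def cartridges_needed(primary_tb, write_multiplier, native_gb, fill_efficiency):
    """Cartridges required to hold a year of backups.

    primary_tb       -- protected primary disk, in TB
    write_multiplier -- TB written to tape per TB of primary disk per year
    native_gb        -- native (uncompressed) cartridge capacity, in GB
    fill_efficiency  -- fraction of each cartridge actually used
    """
    written_gb = primary_tb * write_multiplier * 1000
    usable_gb = native_gb * fill_efficiency
    return math.ceil(written_gb / usable_gb)

# 42 TB protected, 25x written per year, assumed 200 GB LTO-2, 80% fill.
count = cartridges_needed(42, 25, 200, 0.80)
media_cost = count * 26          # $26 per cartridge, as quoted above
quoted_cost = 6_300 * 26         # cost at the post's own cartridge count

print(f"{count:,} cartridges, ${media_cost:,} in media")
print(f"at 6,300 cartridges: ${quoted_cost:,}")
```

Either way, the order of magnitude is the point: thousands of cartridges and well over $150,000 a year in media alone to protect 42TB.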

A restore of a single user or application can easily require loading and reading 10 to 30 cartridges or more. Finding the right cartridges and having each one of them work without failure is a concern. The number of people required to manage a tape library is typically above and beyond the people necessary to manage disk as a protection library.

A tape library is typically a serialized resource. Backup jobs are scheduled by priority; resources are switched and allocated to a job. When that job completes, resources are switched again, and the process goes on. One backup job is serialized behind another, all requiring vigilant monitoring and administration, with complexity and reliability issues at the heart of operational failures.

Disk Workflow Management Simplicity

Disk used for a protection library allows you to share resources among multiple servers simultaneously, whether it is on a SAN or through the network by way of iSCSI. No monitoring, no switching, no hassles. Backup jobs run simultaneously, avoiding the requirement tape imposes of waiting to start a backup job until the previous one is complete and resources are switched. With disk, multiple streams can run at the same time. Using iSCSI-enabled disk, you can also easily collect or move data offsite on a WAN for geographically protected data, one more reason tape is dying.

When disk is used as a protection library, backups are routed through a centralized backup infrastructure, and you can even use deduplication to greatly reduce the total amount of storage required. Overall, you can expect up to 20x savings in stored data with significant improvements in backup and restore performance. Using a post-processing approach, there is no need to continually add additional servers to keep up with deduplication load for backups.
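For readers unfamiliar with how deduplication saves space, here is a minimal sketch. It is not how any particular product works: it assumes fixed-size chunks, SHA-256 fingerprints, and hypothetical function names, and it runs over backups that have already landed, loosely in the spirit of the post-processing approach mentioned above. Real products typically use more sophisticated chunking and indexing.

```python
import hashlib

def dedup_store(backups, chunk_size=4):
    """Split each backup into fixed-size chunks and store each unique chunk once."""
    store = {}     # fingerprint -> chunk bytes (stored a single time)
    recipes = []   # per backup: ordered fingerprints needed to rebuild it
    for data in backups:
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def rebuild(store, recipe):
    """Reassemble a backup from its recipe of chunk fingerprints."""
    return b"".join(store[d] for d in recipe)

# Three toy "nightly backups" that mostly repeat the same data.
backups = [b"AAAABBBBCCCC", b"AAAABBBBDDDD", b"AAAABBBBCCCC"]
store, recipes = dedup_store(backups)

logical = sum(len(b) for b in backups)           # bytes the backups represent
physical = sum(len(c) for c in store.values())   # bytes actually stored
ratio = logical / physical

print(f"{logical} logical bytes stored in {physical} physical bytes ({ratio:.2f}x)")
```

The ratio in this toy case is small because there are only three backups. With daily and weekly backups of largely unchanged data, the repeated chunks pile up, which is where much larger savings come from.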

Another option is to use disk as a VTL if you wish. This gives you all the advantages of disk as a protection library, and it allows you to write to tape on the back end if you need data portability, the one remaining useful function of tape for the small to medium enterprise.

Fast ultra-reliable availability is key to recovery

Tape Availability

It is well understood that magnetic tape degrades by known chemical processes. The binder systems used in today’s tape are generally based on polyester polyurethanes. These polymers degrade by a process known as hydrolysis – where the polyester linkage is broken by a reaction with water. One of the by-products of this degradation is organic acid.  Organic acids accelerate the rate of hydrolytic decomposition and attack and degrade magnetic particles.

The lifetime of a tape is defined as the length of time a tape can be archived until it will fail to perform.  An end-of-life property exhibited by the storage medium results in significant data loss. For example, the degree of hydrolysis of a tape binder system is a critical property that will determine the lifetime of a magnetic tape.

Temperature and humidity have a dramatic impact on shelf life. Ten degrees of temperature change can change the life of a tape by ten years or more. Remember the 6,300 cartridges for annual backups of only 42TB? An operational example is that if an administrator loads a cart of tapes and takes them to a non-raised-floor room, there is a great danger that temperature and humidity changes will accelerate the effects of thermal decay, which in turn will destroy data in as little as five years.

The Library of Congress and the National Media Lab recommend, “for data having permanent value, storage areas should be kept at a constant 45 to 50° F or colder (do not store magnetic tapes below 46° F as it may cause lubrication separation from the tape binder) and 20 to 30% RH for magnetic tapes (open reel and cassette) and 45 to 50% RH for all others. Environmental conditions must not fluctuate more than ±5° F or ±5% RH over a 24 hour period. They recommend you store in dark areas except when being accessed, being sure to keep recordings away from UV sources (unshielded fluorescent tubes and sunlight).”

Widely fluctuating temperature or RH severely shortens the life span of all tape. This is one of the main reasons why tape is only viable for the large enterprise that can afford a library large enough to keep tape on a raised floor, handled only by a robot.

There are many other considerations. The design of the cartridge and the transport are critical to tape reliability. Only recently have tape transports become reasonably reliable. Enterprise-class transports today are in the 400,000-hour range, and a well-managed cartridge (meaning one kept in a temperature- and humidity-controlled environment) that is stagnant (meaning it is not being used, which would otherwise shorten its life) has a shelf life of around 20 years (15-30 years technically).

Operationally, here is the issue. As already mentioned, when data is stored on a cartridge, it has to be in a temperature- and humidity-controlled environment. The cartridge also should not be handled if you want to maintain the integrity of the data. As the cartridge is sitting in a slot, after 10 years, 3 generations of transports have been introduced into the market. Considering the whole shelf life of 20 years, at least 6 generations of change would have evolved in transports. Unless you kept the transport you wrote the cartridge with, system software, operating system, computer hardware, operations manuals, and ample spare parts along with the recorded media – sorry, you can’t get your data. Even with all of those things and in perfect environmental conditions, your chances of getting data back are about 27%. Does that realistically protect your business and mitigate legal risk?

By the way, if anything at all went wrong with that tape, or the other thirty cartridges that were used for a backup, there is no redundancy; you can’t get your data. IT organizations deal with this by remastering data onto new transports and new media with every generation they change. Changing out media and remastering is very expensive.
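The effect of having no redundancy across a multi-cartridge restore is easy to quantify. The sketch below assumes each cartridge fails independently with the same probability; the 1% per-cartridge figure is a hypothetical value chosen only to show the shape of the problem, and the function name is illustrative.

```python
def restore_success_probability(cartridges, per_cartridge_failure):
    """Chance that every cartridge in a restore set reads cleanly.

    With no redundancy, one bad cartridge fails the whole restore,
    so the per-cartridge success rates multiply.
    """
    return (1 - per_cartridge_failure) ** cartridges

# Hypothetical 1% chance that any single cartridge is unreadable.
one_tape = restore_success_probability(1, 0.01)       # 0.99
thirty_tapes = restore_success_probability(30, 0.01)  # roughly 0.74

print(f"1 cartridge:   {one_tape:.1%}")
print(f"30 cartridges: {thirty_tapes:.1%}")
```

Even a cartridge that is 99% reliable on its own leaves a thirty-cartridge restore failing about one time in four under these assumptions.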

The mechanism for reading and writing tape is FAR more complicated than the one for disk. With a disk you have a nice flat stable surface that spins without flexing in a sealed, contaminant-free enclosure. By contrast, take a spool of paper out in the wind and unwind it in the breeze. The problem with a tape transport is trying to keep that surface flat and tracking to be able to read anything that was written. It is difficult, and there are many factors that can make it all stop working. Again, disk is simple by comparison, which is why the reliability numbers for disk are in the 1.2-million-hour range versus half that for the very best tape transport. If you are using DLT, the life is more like 250,000 hours for the transport, and the shelf life of the media is more like 10 years in perfect conditions.
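Hour ratings like these are easier to compare when converted to an approximate annual failure rate. The sketch below uses the common constant-failure-rate (exponential) model and assumes the device is powered around the clock; both are simplifying assumptions, and the function name is hypothetical. The hour figures are the ones quoted above.

```python
import math

HOURS_PER_YEAR = 8760

def annualized_failure_rate(rated_hours, powered_hours=HOURS_PER_YEAR):
    """Probability of at least one failure in a year of operation,
    assuming a constant failure rate of 1 / rated_hours."""
    return 1 - math.exp(-powered_hours / rated_hours)

ratings = {
    "disk": 1_200_000,
    "best tape transport": 600_000,   # "half that"
    "DLT transport": 250_000,
}
afr = {name: annualized_failure_rate(hours) for name, hours in ratings.items()}

for name, rate in afr.items():
    print(f"{name}: {rate:.2%} per year")
# disk comes out near 0.7%, the best tape transport near 1.4%,
# and the DLT transport near 3.4%.
```

These numbers cover the drive mechanism only. They say nothing about media wear or handling, which the preceding sections argue is where tape loses even more ground.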

Disk Availability

Disk, unlike tape, has a multitude of reliability and protection elements that are built in and are commonly used.  Things like RAID!  No such thing with tape.  One tape out of a backup job group fails and the integrity of the whole restore collapses.

Disk has long been trusted as highly available. Disk used as a protection library is no exception. Whether you need a remote office, small office, entry-level system, or enterprise class with petabytes of capacity, everything in disk products is redundant, protected and highly available, serving not only the need to recover, but also the need to meet regulatory requirements.

RAID 6 is ultra-reliable, protecting you from double drive failures and thus providing that extra level of protection for your recovery data, a wise choice when using disk as a protection library.

Disk Libraries Offer Value

With over 70% of users using disk backup today, disk offers the right technical choice, and at the best business value scaled to match any need.

The price of a byte stored on tape is less than disk.  But, that is a very incomplete look at a much larger picture.  The cost of human skill is expensive, and there are never enough skilled people available to effectively manage a data center. It may be impossible to justify adding another person to manage a tape library that has dubious value. Any opportunity to make the operations staff more efficient is an important step in value.

Another critical consideration is the cost of downtime. The ability to rapidly back up and, what is more important, recover is at the top of the priority list for business continuity planning.

Backing up to disk is fast; recovery is faster. That is an important consideration when your business is down. MAID-enabled disk will also save energy, and lots of it. Reduce your comparable energy costs for power and cooling by as much as 60%. Finally, reduce the total amount of space required to maintain your backup archive when you use deduplication.

Using disk as a protection library will make the people you have more efficient, enabling them to do more, while you pay less.

Using disk as a protection library will help you to get your business back up fast, and when it comes to value, you can’t beat that.

MAID’s Demise – Greatly Exaggerated!

December 14, 2009
It’s difficult to read any of the IT press without running into stories underscoring the urgent need to be energy efficient. The boundaries of concern have stretched all the way to Washington, where Congress passed a law directing the Environmental Protection Agency to report on the use of energy in data centers today and into the future. In that report, the EPA paints a frightening picture, which includes the prediction that 50% of all data centers will be unable to buy any additional power by 2012, because there will not be new power available to them.

The availability and cost of energy is of course not a new problem; it is the issue that Copan Systems addressed when it pioneered the concept of first-generation MAID. While the idea is flawless, their implementation was fatal. Copan chose a design that allowed only 25% of the drives in an array to be powered on at any given time. The resulting issue was seen at the application level, where the inordinate wait times were not acceptable. This left the Copan system with very few places where the technology worked well, typically archive, which then led to the company’s failure. The lesson learned will be remembered: saving energy is vital, but applications depend on performance.

The need to conserve power, space, and cost is accelerating, which is why Nexsan improved on the idea of first-generation MAID with the release of the second generation, known as AutoMAID. AutoMAID is implemented to allow 100% of the drives in an array to operate at full power. This eliminates the delay in response times caused by cycling power up and down to different parts of a system, which is inherent to the Copan design.

With that issue out of the way, Nexsan’s performance is great, and there are no issues whatsoever for applications or anything else. By contrast, with Copan, about the only place you could use MAID was for an archive that had a very low reference rate or a backup repository. With Nexsan, you can use the disk for any application. The only place you probably would not turn on one of the three levels of AutoMAID energy savings is in a high-access database that serves a global market and is running full out 24 hours a day. That may represent 10% of all applications. For the remaining 90% of all applications, AutoMAID can save from 20% to 70% of necessary energy, and it comes standard on every Nexsan storage system.

Nexsan’s next-generation AutoMAID supports a highly efficient storage environment by delivering the speed applications demand while saving power (along with CO2 emissions), space, and costs.

Streamlining

July 16, 2009

If you have been here before, you will now notice that I have taken the huge blogs down, and am in the process of trying to put them back as links to pdf files.

More coming!

In the meantime, if you need any of the info, I can be reached at randy.chalfant@mac.com

Randy’s View!

March 4, 2009

Watch the blog for views on storage, storage management, storage efficiency, and general commentary.