It’s an often-repeated adage in the business world that an organization’s information is its most valuable resource. But do we know what kinds of data corporations are actually storing? This may seem like a simple question to answer, but with the explosion of corporate data, most enterprises are unsure what data they have, where it is stored, and even what value it holds for the organization.
According to Veritas’ inaugural Data Genomics Index – a study that analyzed billions of files within actual companies’ storage environments – 41 percent of files within the average enterprise have not been modified in the last three years. And, even worse, 12 percent of files haven’t been opened in the last seven years. To put that into perspective, if 41 percent of data is stale, it means that 9.5 billion files in a 10PB environment have not been touched in more than three years.
These findings are shocking enough that one would assume IT leaders were unaware of this potentially wasteful behavior. As it turns out, they had a hunch it was happening but were blind to the details. An additional study from Veritas, the Databerg Report, revealed that global IT leaders believe only 15 percent of their data has any business value. The remaining 85 percent is classified either as redundant, obsolete, or trivial (ROT) or as “dark data,” meaning its value is unknown – the data could be critical to the business or completely worthless.
The lack of visibility into the composition of enterprise environments restricts IT leaders to a singular information management approach: assigning resources based purely on the volume of data stored rather than based on the actual value of the information to the business.
With this information management model, it’s easy to see how storage budgets can quickly get out of hand. For example, with more than 40 percent of the storage environment unmodified in three years, the average enterprise could spend as much as $20.5 million storing potentially unused data. Beyond the storage costs, the sheer clutter makes it harder for IT departments to identify valuable information in their environment that may be at risk.
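As a back-of-the-envelope check on that $20.5 million figure, the math works out if storing a gigabyte carries a fully loaded cost of about $5 – an assumed rate for illustration, not one stated in the report:

```python
# Back-of-the-envelope estimate of the cost of storing stale data.
# ASSUMPTION: an illustrative fully loaded storage cost of $5 per GB
# (hardware, power, backup, administration) -- not a figure from the report.
TOTAL_ENVIRONMENT_GB = 10 * 1_000_000  # 10 PB expressed in GB
STALE_FRACTION = 0.41                  # share of files unmodified for 3+ years
COST_PER_GB = 5.0                      # assumed total cost of ownership, $/GB

stale_gb = TOTAL_ENVIRONMENT_GB * STALE_FRACTION
stale_cost = stale_gb * COST_PER_GB
print(f"Stale data: {stale_gb / 1_000_000:.1f} PB, "
      f"costing ${stale_cost / 1_000_000:.1f}M")
```

At those assumed rates, 4.1 PB of stale data yields the article’s $20.5 million.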
To understand the immensity of the decision-making challenge this presents, consider the perspective of an industry that works file by file: legal document review. Contract-review attorneys churn through about 50 documents an hour. At that pace, reviewing around the clock, a single reviewer would need a little under 22,000 years to clean up the average stale environment. You could instead employ 22,000 contract attorneys for the next 365 days and pay them roughly $5.4 billion to clean up all the data. This is slightly more expensive than simply moving the whole 10PB to Google Nearline for about $100,000 a month.
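The 22,000-year figure can be reconstructed from the numbers above, assuming review runs nonstop, 24 hours a day:

```python
# Reconstructing the clean-up arithmetic from the article's figures.
# ASSUMPTION: review runs around the clock (24 hours/day, 365 days/year);
# at normal office hours the person-years required would be several times higher.
STALE_FILES = 9_500_000_000  # 41% of the files in a 10 PB environment
DOCS_PER_HOUR = 50           # contract-review pace cited in the article

review_hours = STALE_FILES / DOCS_PER_HOUR         # 190 million review-hours
years_single_reviewer = review_hours / (24 * 365)  # one reviewer, nonstop
print(f"{years_single_reviewer:,.0f} years for a single reviewer")
# Equivalently: ~22,000 attorneys working nonstop for one year.
```

Spreading those 190 million hours across 22,000 attorneys for a year also implies an hourly rate of roughly $28 to reach the $5.4 billion total.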
This hypothetical scenario may be exaggerated, but it hits at the heart of the information-growth conundrum. When facing an overwhelming number of information management decisions and drastically discounted storage costs, how can IT leaders break away from inefficient information management practices and catalyze change within their organization?
The Databerg report surfaced the beginnings of a path forward. It’s imperative for IT leaders to manage data based on its business value, not on the associated volume. This approach will free up budget through basic deletion of the ROT data and allow the enterprise to change its culture by taking the following steps:
- Look to overrepresented file types. Traditional “office” formats like presentations, documents, text files, and spreadsheets account for 20 percent of the total stale population, so an archiving project focused just on these formats can cut costs by $2 million.
- Understand the risk of orphaned data. Five percent of the average environment is orphaned data – data without an active associated owner, typically left behind by departed employees. Compared with the normal distribution of file types, this orphaned data is significantly more content rich – heavier in size and typically in the form of presentations, images, videos, and spreadsheets. It is therefore more likely to contain sensitive intellectual property, payment card data, personally identifiable information, and customer information.
- Create, implement, and enforce classification policies for users’ data. This can be difficult, but it’s necessary to remain compliant and manage risk. Using classification to understand the basic characteristics of the environment makes it easier to see where critical information resides and who can access it. It’s also important to ensure that employees understand enterprise data policies through regular training.
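The first step behind all three recommendations is basic visibility: knowing which files are stale and what kinds they are. A minimal sketch of such a scan might look like the following, assuming “stale” means unmodified for three or more years and using file extension as a stand-in for real content classification – both simplifications of what a commercial file-analysis tool would do:

```python
# Minimal visibility scan: walk a directory tree and bucket files by
# staleness and basic type. ASSUMPTIONS: "stale" = unmodified for 3+ years;
# file extension stands in for true content classification.
import os
import time
from collections import Counter

STALE_SECONDS = 3 * 365 * 24 * 3600
OFFICE_EXTS = {".ppt", ".pptx", ".doc", ".docx", ".txt", ".xls", ".xlsx"}

def scan(root):
    """Return (stale_count, total_count, Counter of stale office formats)."""
    now = time.time()
    stale, total, stale_office = 0, 0, Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # unreadable entry; skip rather than fail the scan
            total += 1
            if now - mtime > STALE_SECONDS:
                stale += 1
                ext = os.path.splitext(name)[1].lower()
                if ext in OFFICE_EXTS:
                    stale_office[ext] += 1
    return stale, total, stale_office
```

A real deployment would also join ownership metadata from a directory service to flag orphaned files, and feed the results into archiving or deletion workflows rather than just counting.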
By understanding the basic composition of the storage environment, organizations can take steps to focus their energy on smart classification, archiving, deletion, and data migration efforts. Regardless of where they start, organizations need this basic level of visibility to prioritize information governance efforts and begin to rein in the crippling growth dynamic their storage environments are currently experiencing.
Chris Talbott is Sr. Product Marketing Manager at Veritas. He works to bring Veritas File Analysis and Protection products to market and leads the Data Genomics Project. Before managing product marketing for the File Analysis portfolio, Talbott focused on the eDiscovery product line at Symantec, marketing, writing and speaking at industry events on the subjects of predictive coding and eDiscovery. Talbott joined Symantec from Clearwell Systems where he helped grow Clearwell into one of Sequoia Capital’s most profitable portfolio companies. Talbott graduated from the University of California, Berkeley with a degree focused on Globalization and Consumer Behavior.