According to recent IDC figures, 2006 saw 161 exabytes of digital information created and copied, continuing an unprecedented period of information growth. This digital universe equals approximately three million times the information in all the books ever written – or the equivalent of 12 stacks of books, each extending more than 93 million miles from the earth to the sun. The research firm predicts that the amount of information created and copied in 2010 will surge more than sixfold to 988 exabytes, a compound annual growth rate of 57 percent.
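The growth figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `cagr` is an assumption, and the only inputs are the 2006 and 2010 volumes cited from IDC.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1

# Figures quoted in the article: 161 exabytes in 2006, 988 exabytes forecast for 2010.
growth_multiple = 988 / 161
annual_rate = cagr(161, 988, 2010 - 2006)

print(f"growth multiple: {growth_multiple:.2f}x")
print(f"compound annual growth rate: {annual_rate:.1%}")
```

Over four years, a little more than sixfold growth works out to roughly 57 percent a year, which agrees with the rate IDC reports.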
Jonathan Martin, Chief Marketing Officer, Information Management for HP Software, believes information addiction is pervasive both in business and in our personal lives and says the ubiquity of email is a prime example. “Businesses receive thousands of emails each day,” he says. “Nearly 85 percent of all business communications now occur through email, and end-users spend too much of their time managing their inboxes. The content contained in these messages is used for improving business communications, enhancing customer satisfaction and maintaining a competitive edge. With the explosive growth of other structured and unstructured content like files, Microsoft SharePoint, and databases, it’s no wonder enterprises are struggling to cope with information management.”
Martin cites e-discovery and regulatory compliance as more recent business drivers that are only adding to the information management challenge, and says that as a result, IT professionals must implement information management solutions that simultaneously satisfy cost demands, meet SLA requirements, and ensure information is retained for longer periods of time. “IT must also work more closely than ever before with corporate legal departments and records managers to avoid the fines and penalties that often result from failing to meet compliance regulations, minimize the business impact of protracted e-discovery and litigation procedures, and reduce the likelihood of lost lawsuits,” he says.
Willie Hardie, VP of Oracle Database Product Marketing, thinks the most significant challenges are cost and manageability. “Regulatory compliance, business intelligence, unstructured data integration and data consolidation are some of the factors causing the volume of data we manage to grow exponentially every year,” he explains, pointing out that the challenge then becomes how to manage more data (and ensure accessibility of that data) in a cost-effective manner. “As data volumes grow we still need to meet our users’ service level expectations in terms of performance and availability,” he says. “With larger data volumes to manage, systems performance can tail off and basic day-to-day administrative tasks such as data backups and reorganizations will take longer to execute. However, the traditional approach of throwing more hardware and IT resources at the service level challenge doesn’t help with our cost challenges.”
A cost-effective approach
Fortunately, the next generation of information lifecycle management solutions could provide the answer, as Hardie explains. “Information lifecycle management can be defined as the policies, processes, practices and tools used to align the business value of information with the most appropriate and cost-effective IT infrastructure from the time information is conceived through its final disposition,” he says. “In other words, it’s not just about managing information throughout its lifecycle, it’s also about managing it in a cost-effective manner.”
A well-designed ILM strategy can help align business information to different storage tiers, each with different cost, performance and reliability attributes based on activity of information. “The tools that are part of an Oracle ILM solution enable us to place business information on low-cost storage tiers,” he continues. “This means we can store more data on disk, keep it online and accessible for longer periods of time, and keep our overall storage costs down.”
Martin agrees that tackling storage optimization and business requirements separately is a bad idea, suggesting that doing so will likely open businesses up to greater costs in the long run – including non-compliance penalties, fines from inadequate response to e-discovery, and excessive legal and IT consulting costs to augment in-house resources, to name but a few. Instead, a proactive information management strategy can help businesses take a more holistic, inter-departmental approach that addresses the requirements of IT, legal and the lines of business. “A good example of an information management solution that can do just this is data archiving,” he says. “A scalable, multi-content archiving platform enables silos of information throughout the enterprise to be consolidated, indexed and quickly searched. Such a solution can meet seemingly opposing business, IT and legal objectives, all at the same time.”
According to Martin, this has three main advantages. First, it relieves the data bloat in production application environments, significantly accelerating both application performance and data backup and recovery. Second, it reduces IT infrastructure and management costs by enabling investments in new primary storage and application servers to be deferred. And third, it ensures that information retention requirements are strictly enforced – satisfying new e-discovery mandates, enabling IT to respond faster to litigation requests, and minimizing the impact of lawsuits on core business operations.
Meeting security and privacy needs
Indeed, while IDC predicts that nearly 70 percent of the digital universe will be generated by individuals by 2010, most of this content will be touched by an organization along the way – on a network, in a data center, at a hosting site, at a telephone or internet switch, or in a backup system. The challenge for organizations of all types and sizes is that they will be responsible for the security, privacy, reliability and compliance of at least 85 percent of all information generated.
“Data breaches, insider theft, data consolidation and attacks targeting databases mean security is of paramount importance to every organization today,” confirms Hardie. “In addition, government regulations and industry standards have raised the stakes for enterprises failing to implement necessary controls. As we store more data into fewer, larger, consolidated databases to ease the management of data throughout its lifecycle, it’s important that we also properly ensure the security of that data. Not only do we need to protect sensitive business information, we also need to enforce privileged user controls for separation of duties, authorized database operations, and proactively audit access to detect and alert on unauthorized or suspicious activities that violate security and governance policies.”
In response to this development, identity-enabled ILM promises to take data management to a new level by introducing data access based on organizational identities and support for bi-directional information flow. One approach is for information management solutions to leverage the identity, access and authorization controls that are already present within most enterprise IT infrastructures. “For example, an integrated content archive platform would be well-served to leverage Microsoft Active Directory and its Group Dynamic Membership capability to allow users access to information based on their already-defined business role or function,” suggests Martin. “By leveraging centralized directories such as Active Directory or LDAP, it becomes much easier to control access to one-way or bi-directional information flow based on existing user permissions.” Over time, this can evolve to a point where information retention policies can key off of user roles and properties to determine where, when and how long information is stored. “Information management vendors who have strong partnerships with Microsoft and other leading directory and operating system providers will be in the best position to deliver tightly integrated, highly secure solutions,” he says.
Hardie agrees, and adds that implementing a secure ILM strategy also requires recognition of potential security vulnerabilities and addressing the problem at source. “Data should be protected where it resides – in the database and not at the application layer,” he says. “Implementing security controls at the application layer involves changing applications and cannot guarantee they won’t be circumvented (by a business intelligence tool, for example). But by implementing security controls at the database layer, we don’t need to change our applications – and more importantly, these controls cannot be circumvented.”
Dealing with other data
But while much of the discussion around information management focuses on electronically stored information, there are also volumes of printed documents and other print data that already exist within most enterprises. What kind of data management challenges does this information present, and can ILM help?
“Printed data is subject to the same electronic discovery mandates and other regulatory compliance requirements as electronically stored information, and can be just as critical to making sound business decisions and creating competitive advantage as digital data,” admits Martin, at the same time acknowledging that printed data is much harder and more costly to store and search. “Information management needs to address both electronically stored information and printed data, which is why digital capture, output management and records management solutions are now being considered as part of the information management category.” For example, when looking at information retention and archiving platforms, Martin suggests enterprises should evaluate their ability to work with digital capture solutions to digitize paper documents, ingest and index the digitized print stream data along with email, files and databases, and disseminate the print data through output management solutions. “Vendors who have strong expertise in printing or who have partnerships with printing vendors will be in the best position to deliver complete information management solutions that address ESI requirements, while also providing a framework for paper-based automation,” he concludes.
Hardie, on the other hand, takes a slightly different approach. “Information is information regardless of the format it’s stored in, and needs to be secured and managed accordingly,” he argues. “However, if you examine information sources, you’ll undoubtedly find that the greatest percentage of any organization’s information is in the form of office documents, XML documents and other types of unstructured data. Traditionally, most of that unstructured data, while managed electronically, has at best been managed on file servers that present a number of accessibility, security, reliability and recoverability challenges.”
To counter these challenges, he explains how many Oracle customers are realizing the benefits of storing unstructured data inside their Oracle Databases. “Organizational information, be it structured numbers and characters, or unstructured documents and files, can be properly and efficiently managed with an Oracle Information Lifecycle Management solution using partitioning and low-cost storage tiers,” he suggests. “The SecureFiles feature of Oracle Database 11g enables de-duplication of unstructured files secured inside the database, and reading and writing of these files is on par with or better than a file system. In addition, advanced compression in Oracle Database 11g enables us to compress the amount of data we store on disk. Organizations utilizing advanced compression can anticipate a 2-4 times compression factor for all their data types – structured and unstructured. This will save a lot of disk space, and of course, the related storage, energy and emission costs, particularly when de-duplicating and compressing large volumes of information.”
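To put the quoted compression range in concrete terms, the short sketch below converts a compression factor into disk space saved. The 10 TB starting volume and the function name `compressed_size` are assumptions made purely for illustration; they do not come from Oracle or from the interview.

```python
def compressed_size(raw_tb: float, factor: float) -> float:
    """Disk space needed after compression, given a compression factor."""
    return raw_tb / factor

raw_tb = 10.0  # hypothetical data volume, not a figure from the article

for factor in (2, 4):
    stored = compressed_size(raw_tb, factor)
    saved = 1 - stored / raw_tb
    print(f"{factor}x compression: {stored:.1f} TB on disk, {saved:.0%} saved")
```

In other words, a factor of two to four corresponds to cutting disk consumption by between half and three quarters, before any additional gains from de-duplication.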
It is clear that the ever-growing mass of information is putting a considerable strain on the IT infrastructures we have in place today. This explosive growth will change the way organizations and IT professionals do their jobs, and the way we as consumers use information. Better management strategies will be key – and ILM will be at the heart of any successful organization’s approach going forward.
HP’s Jonathan Martin offers key considerations for building integrated storage environments.
The scalability to accommodate dramatic data growth. Scalability has become the most critical factor in choosing an information management solution. Look for solutions that give you the ability to stay comfortably ahead of your data growth by offering multiple, modular scaling options.
Fully integrated, factory-built and tested solutions. Deploying a fully integrated, highly scalable archiving platform that includes all the necessary software, hardware, support and service components and that is factory-configured to meet the widest range of information retention requirements is a smart choice.
Performance to deliver information, fast. Look for solutions that can ensure ultra-fast store, search and retrieval performance no matter how big your archives might get. Grid-based architectures are the cutting edge when it comes to scalable, high performance archiving, because they enable both information and the indexes that point to the information to be distributed.
Ability to support regulatory compliance and e-discovery requirements. With new regulations like the US Federal Rules of Civil Procedure (FRCP) and industry-specific legislation like HIPAA being adopted with increasing frequency, you need to look for information management solutions that offer built-in compliance and e-discovery capabilities.
Q&A: Optimizing storage infrastructures
What are the key considerations for building integrated storage environments that meet the converging demands of storage infrastructure optimization and business needs-driven information lifecycle management?
“For our business critical transaction processing applications, we certainly want to capture our orders, payments and other transactions on the fastest, most reliable storage systems available,” concedes Oracle’s Willie Hardie. “But these storage systems are also the most expensive, and if you analyze the activity of data associated with business applications, you’ll find that only a small percentage of that data is highly active in nature. This means that maybe only 5-10 percent of data – typically data from the current month or quarter – incurs a lot of inserts, updates and deletes. As data ages out of the current month or quarter it becomes less active and is used primarily for read-only purposes.
“Once we’ve identified the activity level of our business data, we can partition our active, less active and online archive data across different storage tiers. Partitioning enables us to break down large tables of data stored in our application databases into smaller, more manageable pieces of data. It’s important to point out that partitioning data with Oracle is implemented at the database level (i.e. it’s application-independent). Business applications do not need to be altered in any way and system performance, reliability and manageability will improve. By utilizing low-cost tiered storage instead of the traditional high-end single tier model, we can manage more information throughout its lifecycle, and keep our storage costs down without impacting our users’ service level expectations.”
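The tiering idea Hardie describes, in which data moves to cheaper storage as it ages out of the current month or quarter, can be sketched as a simple age-based classification. This is not Oracle partitioning syntax and does not represent how Oracle implements it; the tier names, the 90-day and 365-day thresholds and the function `storage_tier` are all hypothetical.

```python
from datetime import date

# Hypothetical tiers and age thresholds (in days), chosen only for illustration.
TIERS = [
    (90, "high-performance"),  # roughly the current quarter: heavy inserts and updates
    (365, "low-cost"),         # less active data, used mostly for reads
]
ARCHIVE_TIER = "online-archive"  # older data kept online on the cheapest tier

def storage_tier(record_date: date, today: date) -> str:
    """Pick a storage tier for a record based on how old it is."""
    age_days = (today - record_date).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    return ARCHIVE_TIER

today = date(2008, 3, 31)
for record_date in (date(2008, 3, 1), date(2007, 9, 1), date(2005, 1, 1)):
    print(record_date, "->", storage_tier(record_date, today))
```

In a database that partitions tables by date, the same decision would be made once per partition rather than per record, which is what keeps the approach invisible to the applications using the data.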
The digital universe
Images: Images, captured by more than one billion devices worldwide, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe.
Digital cameras: The number of images captured on consumer digital still cameras in 2006 exceeded 150 billion worldwide, while the number of images captured on cell phones hit almost 100 billion. IDC is forecasting the capture of more than 500 billion images by 2010.
Camcorders: Camcorder usage should double in total minutes of use between now and 2010.
E-mail: The number of e-mail mailboxes has grown from 253 million in 1998 to nearly 1.6 billion in 2006. During the same period, the number of e-mails sent grew three times faster than the number of people e-mailing; in 2006, e-mail traffic excluding spam accounted for six exabytes.
Instant messaging: There will be 250 million IM accounts by 2010, including consumer accounts from which business IMs are sent.
Broadband: Today over 60 percent of internet users have access to broadband circuits, either at home, at work or at school.
Internet: In 1996 there were only 48 million people routinely using the Internet. The World Wide Web was just two years old. By 2006, there were 1.1 billion users on the Internet. By 2010, IDC expects another 500 million users to come online.
Unstructured data: Over 95 percent of the digital universe is unstructured data. In organizations, unstructured data accounts for more than 80 percent of all information.
Compliance and security: Today, 20 percent of the digital universe is subject to compliance rules and standards and about 30 percent is potentially subject to security applications.
Classification: IDC estimates that today less than 10 percent of organizational information is ‘classified’, or ranked according to value. IDC expects the amount of classified data to grow by more than 50 percent a year.
Emerging economies: These now account for 10 percent of the digital universe but will grow 30-40 percent faster than mature economies.