Big Data is not only an IT challenge of harnessing the growing volume of data across multiple enterprise applications for improved reporting and analytics. It is more often discussed in terms of managing and retaining critical data for much longer periods, and of scaling enterprise systems to accommodate future growth in the most cost-efficient manner.
For a large enterprise, Big Data may be in the petabytes or more, while for a small or mid-size enterprise, data volumes that grow into tens of terabytes can become problematic. So far, much of the IT focus on data retention has been on unstructured data sets, which include e-mails, file-based systems, and audio and image files. Whilst these are important, the most critical enterprise data asset lies in your “run-the-business” applications and therefore consists of structured or semi-structured data, which typically lives in traditional RDBMS repositories or is integrated into a data warehouse for reporting and BI. The reality, driven by more stringent legislation, governance and extended on-demand accessibility to historical data, is that structured data retention is fast becoming the #1 imperative for businesses worldwide. The key signs that you need a dedicated solution for Big Data retention are outlined below:
10. You agonize over when to keep or purge data
Traditional RDBMS or analytics systems do not have any features to support the enforcement of retention or expiry policies dictated by industry regulations or business process data governance. Trying to balance the ongoing cost of storing large volumes of data against the penalties for non-compliance, and against the increased risk exposure of holding data beyond its legal expiry, is a constant reminder that you need a dedicated solution to better manage and store data long term.
An online data retention solution (OLDR) can complement your existing OLTP and OLAP systems, providing you with reduced physical structured data storage (40 to 1 or more) through data value de-duplication, and built-in configurable rules to enforce retention policies, allowing you to rest easy and avoid counting storage arrays.
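As a rough illustration of what a configurable retention rule might look like, the sketch below decides whether a record should be retained or purged from its age and category. The `RetentionRule` class, the `decide` function, the category names and the retention periods are all hypothetical and chosen only for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Mapping


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep records of one category for a fixed number of days."""
    category: str
    keep_days: int


def decide(record_date: date, category: str,
           rules: Mapping[str, RetentionRule], today: date) -> str:
    """Return 'retain' or 'purge' for a record, based on its category's rule."""
    rule = rules.get(category)
    if rule is None:
        # No rule configured: keep the record rather than risk purging it early.
        return "retain"
    expiry = record_date + timedelta(days=rule.keep_days)
    return "purge" if today >= expiry else "retain"


# Illustrative periods only (assumptions); real periods come from regulation and policy.
RULES = {
    "trade": RetentionRule("trade", keep_days=7 * 365),
    "web_click": RetentionRule("web_click", keep_days=90),
}

today = date(2024, 1, 1)
print(decide(date(2023, 12, 1), "web_click", RULES, today))  # retain
print(decide(date(2023, 1, 1), "web_click", RULES, today))   # purge
print(decide(date(2020, 1, 1), "trade", RULES, today))       # retain
```

Keeping the rules as data rather than as logic scattered through application code is what makes them configurable: changing a retention period means editing one entry, not rewriting queries.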
9. Your data volumes and growth rates exceed comprehension
Records of human-generated activity (every medical interaction, stock trade, call data record and webpage click) and direct “machine-generated” data (IT log files, barcode scans, RFID tag reads, GPS location entries, industrial automation control and environmental sensor outputs) are examples of data sets that, once “transacted,” don’t change. The update capabilities of traditional systems are overkill for this type of data, and it can be a major IT headache to have to continuously add expensive hardware and memory to keep up.
An OLDR solution can be your primary repository for data that is historical as soon as it is recorded, specializing in ingesting and storing billions of records per day on low-cost commodity servers, allowing you to avoid serious hardware headaches.
8. Your production database arteries are clogged
If your production application system is diagnosed as constantly having “performance pains,” it is likely that the relational database holding your data is suffering from excessive volumes, resulting in unhealthy enlarged indexes. Studies show that a significant proportion (up to 90 percent) of a production application’s data set doesn’t require constant updates and is therefore static in nature. A best-practice approach to solving this would be to put your production system on a data “diet.”
An OLDR solution can be a complementary repository that holds limitless volumes of historical data while providing continued on-demand accessibility, allowing you to benefit from a slimmed down and ongoing healthy production database and application.
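The data “diet” can be pictured as moving static rows out of the production table into a separate repository. The sketch below is a minimal illustration using Python's built-in `sqlite3` module, with two in-memory databases standing in for the production system and the retention repository. The `orders` table, its columns and the `archive_closed_orders` helper are assumptions made for the example, not part of any product.

```python
import sqlite3

# In-memory stand-ins: "main" plays the production database,
# "archive" plays the retention repository (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, closed_on TEXT, amount REAL)")
conn.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, closed_on TEXT, amount REAL)")

conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, "2019-03-01", 120.0), (2, "2021-06-15", 80.0), (3, None, 45.5)],
)
conn.commit()


def archive_closed_orders(conn, cutoff):
    """Move orders closed before `cutoff` (ISO date string) to the archive.

    Open orders (closed_on IS NULL) never match the comparison and stay put.
    Returns the number of rows removed from production.
    """
    with conn:  # one transaction: the copy and the delete commit together
        conn.execute(
            "INSERT INTO archive.orders SELECT * FROM main.orders WHERE closed_on < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM main.orders WHERE closed_on < ?", (cutoff,))
    return cur.rowcount


moved = archive_closed_orders(conn, "2022-01-01")
print(moved)  # 2
print(conn.execute("SELECT COUNT(*) FROM main.orders").fetchone()[0])     # 1
print(conn.execute("SELECT COUNT(*) FROM archive.orders").fetchone()[0])  # 2
```

Running the copy and the delete in a single transaction means a failure part-way through leaves the production table untouched, so no record is lost or duplicated.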
7. You’re addicted to hardware purchases
You wish your local hardware shop carried fibre-connected SANs in 12-petabyte packs, and you’ve probably added more memory than you can remember. If this sounds familiar, then you may be addicted to using hardware to compensate for the deteriorating performance of your critical production OLTP and OLAP systems caused by growing data volumes. If only you could get this growing enterprise carbon footprint under control.
A purpose-built, dedicated repository that holds large data volumes on low-cost servers allows you to defer costly hardware or memory upgrades for your production systems, thereby kicking your addiction to ongoing hardware purchases.
6. You have more DB admin specialists than end-users
While this is a highly unlikely IT doomsday scenario, it is indicative of a trend toward increasing numbers of specialized administrators needed to support, tune, back up, migrate and manage systems across a wide range of heterogeneous repositories that are likely doubling in size each year. Aside from the shrinking time window in which to back up large volumes, the migrations to new application versions and the arcane processes for moving data offline to tape, the cost of highly skilled DBAs constitutes a major portion of the total cost of ownership (TCO) per terabyte of data retained.
A dedicated data retention solution is a low- to zero-administration data repository, allowing you to allocate next to zero specialized resources to Big Data retention.
5. You’ve heard of Hadoop and want to vote NoSQL
With major web properties handling Google-scale data volumes, Hadoop and MapReduce have been popularized as open source saviors to the Big Data problem. Additionally, NoSQL is a grouping of various technologies that provide alternatives to traditional relational repositories. Many of these technologies are at an early stage and relatively immature in their enterprise business use. There are vendors offering commercial variants of these technologies today, but they focus on high-performance analytics, not on retaining Big Data and keeping it accessible online and immutable.
A specialized solution for enterprise and business-class structured data retention that makes use of many of the concepts found in techniques such as MapReduce allows you to solve your retention problem today while still being able to take advantage of new technologies in the future.
4. You believe old dogs don’t have to learn new tricks
The common, yet cyclical, view of IT’s evolution says that everything old is new again. Many technologies such as Hadoop and NoSQL have roots that can be traced back to the days before commercial RDBMS offerings. It is a costly and time-consuming endeavor to retool and train a new wave of experts in order to leverage these new technologies. The reality is that a large majority of the enterprise business world uses SQL for data access, and the learning of “new tricks” is viewed as an added IT expense.
Deploying a solution that “speaks” SQL and supports standard ODBC/JDBC allows you to rapidly integrate it into your architecture by returning the exact same results from queries and reports requested through your application and BI tools.
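One way to check the “exact same results” claim is to run an unchanged SQL statement against both repositories and compare the output. The sketch below does this with Python's built-in `sqlite3` module as a stand-in for an ODBC/JDBC connection. The `sales` table, the report query and the helper names are hypothetical; in practice the two connections would point at your production database and your retention repository.

```python
import sqlite3

# The report query is written once and reused unchanged for both stores.
REPORT_SQL = (
    "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
)


def make_store(rows):
    """Create an in-memory database holding the given (region, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_report(conn):
    """Run the shared report query and return all result rows."""
    return conn.execute(REPORT_SQL).fetchall()


rows = [("EMEA", 100), ("APAC", 250), ("EMEA", 50)]
production = make_store(rows)  # stand-in for the production RDBMS
retention = make_store(rows)   # stand-in for the retention repository

# The same SQL should yield identical rows from either store.
assert run_report(production) == run_report(retention)
print(run_report(retention))  # [('APAC', 1, 250), ('EMEA', 2, 150)]
```

A comparison like this can be run over a set of representative reports before applications and BI tools are pointed at the new repository.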
3. Your head is in the Clouds, your body on-premise
You’ve been asked to consider Cloud options. Should you leverage data retention through a hosted offering, public, private or hybrid cloud? You’d like to deploy an offering that solves data retention problems today without precluding a hosted or cloud strategy in the future.
An online data retention solution works equally well on-premise or in any form of hosted/cloud environment, allowing you to deploy what makes the most sense for your business.
2. You don’t want to go through this every year
Not having to say “sorry, back to the drawing board” means choosing an architecture that will scale indefinitely with your data volumes. Whatever your retention requirements today, you should plan for multiples of those volumes in the near future. Predicting the TCO of any solution you deploy today is critical to managing for the unexpected.
An OLDR solution scales proportionately with your data volume growth. It can parallelize ingestion and query, allowing you to scale incrementally as volumes dictate while continuing to benefit from attractive economics.
1. You want to retain more AND pay less
Ultimately, you want to view the Big Data retention problem as an opportunity, not a challenge. You want the highest possible de-duplication and compression, and the ability to avoid fines and liabilities through configurable retention policies. You want healthy production systems without the dependency on costly hardware upgrades or additional DB administrators. You would rather not risk leveraging immature open source technologies, and fast integration, compatibility and time to market are critical for your business. You’re testing the waters with Cloud architecture, and feel that it is somewhere in your future. Finally, you must benefit from the lowest possible TCO per terabyte retained.
A data retention solution can offer all of the above with concrete ROI and proof points, allowing you to justify savings and/or new business opportunities.
This article was originally published at: http://www.dashboardinsight.com/articles/new-concepts-in-business-intelligence/10-signs-you-need-a-big-data-retention-solution.aspx