According to the Wikipedia article on weight loss: “Between $33 billion and $55 billion is spent annually on weight loss products and services, including medical procedures and pharmaceuticals, with weight loss centers garnering between 6 percent and 12 percent of total annual expenditure. About 70 percent of Americans’ dieting attempts are of a self-help nature. Although often short-lived, these diet fads are a positive trend for this sector as Americans ultimately turn to professionals to help them meet their weight loss goals.”
In IT, high-performance production operational and analytic applications end up in a similar dilemma, experiencing “weight increases” through data volume growth over time. To make matters worse, research shows that as much as 90+% of that data is static and historical in nature. For example, some medical applications serving hospitals can contain as much as 96% static data, made up of records for prior patients who may never return to the hospital. Just like a person, these applications end up suffering from sluggishness in the form of poor performance. They face a constant battle to manage data within high-performance RDBMSs and data warehouses, spending billions of dollars a year on tuning, extra storage and expensive hardware to tackle this ongoing problem. While Big Data has generally been used to describe Google-scale analytic problems, the reality for most companies is that their own internal Big Data problem is the growing expense of managing, retaining and keeping data accessible, all while dealing with the hit to their production applications.
Fortunately, the way to speed up an application without throwing piles of cash at bigger, faster boxes is just like it is in real life: you need to put your production databases on a diet.
A BIG DATA diet!
However, just like people’s diet fads, many of today’s solutions are expensive and short-lived; they deal neither with the root cause of the problem nor with the fact that Big Data just keeps on growing. What is needed is a conclusive way to unclog the arteries of production databases: move out the 90+% of static, historical data that is only occasionally accessed. This leaves the RDBMS to do what it does best, handle actively modified data, and returns the application to its former glorious slimmed-down self.
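To make the “unclogging” idea concrete, here is a minimal sketch of the pattern: periodically move rows older than a cutoff date out of the hot production table and into an archive table, in a single transaction. The `orders`/`orders_archive` schema, the cutoff value, and the function name are all hypothetical, chosen for illustration; a real system would archive to cheaper storage (or a purpose-built repository) rather than another table in the same database.

```python
import sqlite3

# Hypothetical schema for illustration: a "hot" production table and
# an archive table with the same shape, using an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, total REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_on TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, '2009-03-14', 120.00),
        (2, '2010-07-02',  45.50),
        (3, '2012-01-20',  88.25);
""")

CUTOFF = "2011-01-01"  # rows older than this are treated as static/historical

def archive_static_rows(conn, cutoff):
    """Move rows older than `cutoff` from the hot table into the archive."""
    with conn:  # one transaction: copy, then delete, or roll back both
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE placed_on < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE placed_on < ?", (cutoff,))
    return cur.rowcount  # number of rows moved out of the hot table

moved = archive_static_rows(conn, CUTOFF)
hot_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(moved, hot_rows)  # 2 rows archived, 1 row stays hot
```

The single-transaction copy-then-delete is the key design point: the row is never in both places (visible to readers) or in neither, so queries against the hot table stay consistent while it slims down.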
Of course, as we all know, the way to avoid having to diet is to eat right in the first place. In the case of Big Data, that might mean never putting machine-generated data (e.g., logs, Call Data Records, text messages), which is immediately historical and requires no modification or updates, into an RDBMS at all, as it is simply not the right repository for the job.
On the subject of discipline, most of us need help keeping the weight off and generally respond better when there are good reasons and incentives to do so. The explosion of Big Data has been accompanied by government regulatory retention requirements in many industries, enforced through stiff fines and penalties, as discussed in The Other Side Of The Big Data Problem. For ISVs this can be an opportunity: long retention periods for large amounts of data, together with online accessibility requirements, are forcing companies to re-examine their architectures, processes and ultimately the databases they use to store the data generated by their applications. ISVs can provide solutions to meet these needs, resulting in new revenue streams.
As it happens, my current company, RainStor, has a new release, version 4, that supports ISVs looking to put their production systems on a diet. It is less filling but has the same great taste: SQL and BI tool access to the data, plus capabilities that manage and automate the discipline needed to stay compliant at a very granular level.
Now if you’ll excuse me, all this diet talk is making me hungry and I need to get me some artery clogging cookies.