This Thursday, July 19, I’m going to be participating in a DM Radio panel titled “How Big Is Big? Why Big Data Comes in Various Sizes.” This will be my third time participating in a DM Radio segment, and if it is anything like the last two, it should be an interesting discussion of the state of the Big Data market today and projections going forward.
Given that RainStor is uniquely designed to ingest and retain data from a variety of data sources, and our patented de-duplication results in the highest compression rates in the industry, it’s no surprise that we’ll have a lot to say on this topic on Thursday. As a teaser (hopefully you’ll tune in to hear more and participate in the Q&A), the obvious double entendre response is that “it’s not (just) size that matters, it’s what you do with the data.” However, before you can do something with the data, you first have to store and retain it.
So what are the ways of retaining and dealing with large datasets? The current Wikipedia definition states that “Big Data” is “a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage and process the data within a tolerable elapsed time.” By this definition, “Big” would imply data beyond the capabilities of traditional RDBMSs (Oracle, SQL Server), data warehouses and the like. But that leaves room for ambiguity given the growing popularity of a new breed of high-performance analytic columnar databases, such as HP Vertica and EMC Greenplum, to name a couple. Furthermore, there are NoSQL and NewSQL databases (such as Couchbase and VoltDB) designed for internet-scale interactive deployments. Finally, the technologies now most synonymous with Big Data, namely Hadoop, HDFS and various add-on options such as HBase and Hive, are rapidly being popularized as the best way to deal with Big Data. The implication is that open source software (being free) is clearly the most economical. (But is it really? That’s worthy of debate on the show.)
Within the multitude of ways to tackle the Big Data problem, we like to think that RainStor offers a competitive option for dealing with Big Data of all shapes and sizes, particularly where variety of access to the data, choice of deployment (Hadoop or non-Hadoop) and enterprise-grade security, compliance and access are important. Of course size will matter, but RainStor’s compression always solves that problem upfront. I’m looking forward to the show and hope you’ll listen in. If you can’t make it, leave me a comment or tweet me at @RamonChen with your thoughts.