The recently published article MapReduce and Parallel DBMSs: Friends or Foes? is an excellent read, and it reminded me of one of my earlier posts on Oracle and Hadoop. For those of you who don’t have time to read the entire article, here is its conclusion:
“Most of the architectural differences discussed here are the result of the different focuses of the two classes of system. Parallel DBMSs excel at efficient querying of large data sets; MR-style systems excel at complex analytics and ETL tasks. Neither is good at what the other does well. Hence, the two technologies are complementary, and we expect MR-style systems performing ETL to live directly upstream from DBMSs.
Many complex analytical problems require the capabilities provided by both systems. This requirement motivates the need for interfaces between MR systems and DBMSs that allow each system to do what it is good at. The result is a much more efficient overall system than if one tries to do the entire application in either system. That is, “smart software” is always a good idea.”
None of this is earth-shattering, given that no one is giving up their Oracle RDBMS anytime soon. But it does point to a growing market for companies like AsterData, Greenplum, Cloudera, and Vertica, all of which have commercially available products and continue to garner interest. To me it also vindicates the specialization of RainStor (disclosure: my current company) in high-efficiency retention of, and access to, historical structured data. RainStor’s value proposition is to take massive quantities of data that are no longer needed in production systems but must still be retained for long-term compliance, so you could say that the old adage “horses for courses” applies here too.