You see tweets about it daily; scratch that, there seems to be an article or post practically every hour. Big Data is an enormous topic of conversation. From Apache Hadoop to zettabyte forecasts, everyone is telling you to be prepared. Big Data is being analyzed, retained and managed for greater business insight and competitive agility.
With transactional data volumes reaching billions of records per day, there is a growing danger that the quantity of data might overshadow its quality. In fact, concerns about data quality were top of mind in enterprise IT long before Big Data came on the scene. Data quality (DQ) tools clean up reference data (such as customer names and addresses) to ensure it is factually correct. Master Data Management (MDM) includes DQ and also reconciles reference data from multiple siloed sources (a web ordering system, social network feeds) to make sure that Big Data transactions are correctly affiliated with the owners of that reference data (customers, suppliers, products, even sensors).
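To make the DQ/MDM distinction a little more concrete, here is a minimal Python sketch under simplifying assumptions. The source names, record fields and the `M1`, `M2` master ID format are all hypothetical. The DQ step only standardizes fields; the MDM step then treats records from different silos with the same cleaned name and postcode as one customer and hands out a master ID. Real MDM products use far richer fuzzy or probabilistic matching and survivorship rules, so treat the exact-key match here as a stand-in.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str     # hypothetical silo name, e.g. "web_orders" or "social_feed"
    source_id: str  # the record's key inside its own silo
    name: str
    postcode: str


def clean(rec: SourceRecord) -> tuple[str, str]:
    """DQ step: standardize the fields used for matching."""
    name = re.sub(r"\s+", " ", rec.name).strip().lower()  # collapse spaces, case-fold
    postcode = rec.postcode.replace(" ", "").upper()
    return name, postcode


def build_master_index(records: list[SourceRecord]) -> dict[tuple[str, str], str]:
    """MDM step (simplified): map each (source, source_id) to a master customer ID.

    Records whose cleaned (name, postcode) are identical are assumed to be the
    same customer; this exact-key match stands in for real fuzzy matching.
    """
    master_by_key: dict[tuple[str, str], str] = {}
    xref: dict[tuple[str, str], str] = {}
    for rec in records:
        key = clean(rec)
        if key not in master_by_key:
            master_by_key[key] = f"M{len(master_by_key) + 1}"  # hypothetical ID format
        xref[(rec.source, rec.source_id)] = master_by_key[key]
    return xref


records = [
    SourceRecord("web_orders", "W-17", "Jane  Smith", "sw1a 1aa"),
    SourceRecord("social_feed", "@jsmith", "jane smith", "SW1A1AA"),
    SourceRecord("web_orders", "W-18", "Raj Patel", "10001"),
]
xref = build_master_index(records)
print(xref)
```

Here the web order and the social feed entry for Jane Smith differ only in spacing and case, so after cleaning they resolve to the same master ID, while Raj Patel gets his own.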
Even before data was “Big”, accurate reference data dimensions had been a key sticking point for enterprises. Without MDM in place, many have questioned the accuracy and validity of Enterprise Data Warehouse analytics. So big data analytics applied to extreme data volumes could mean drawing wrong conclusions very fast! As yet another big data article in The Economist, titled “Building with big data”, put it:
“….But on the second question, they are silent. Big data has the same problems as small data, but bigger. Data-heads frequently allow the beauty of their mathematical models to obscure the unreliability of the numbers they feed into them. (Garbage in, garbage out.) They can also miss the big picture in their pursuit of ever more granular data. During the 2008 presidential campaign Mark Penn provided Hillary Clinton with reams of micro-data, thus helping her to craft micro-policies aimed at tiny slices of the electorate. But Mrs Clinton was trounced by a man who grasped that people wanted to feel part of something bigger. The winning slogans were vague and broad (“hope” and “change”).”
Social media data is increasingly factoring into business decisions about customer relationships. A recent article by Neil Gow in CIO, titled “The power of social media”, speaks to the complete customer view and says “MDM is the ‘secret sauce’ for CRM 2.0 (Customer Relationship Management), a centrepiece of which is social media data. This proven technology generates a trusted, authoritative customer view by consolidating and reconciling disparate customer information from enterprise sources …”
Interestingly, companies that have entered the big data analytics sweepstakes through acquisition are largely missing MDM from their arsenal. Only IBM, which acquired Netezza, owns everything soup to nuts, including not one but three MDM products from its acquisitions of DWL, Trigo and Initiate. Meanwhile EMC, which bought Greenplum (see my post last year, “How reliable are analytics without MDM”), partners with Informatica (which acquired Siperian) for MDM. And HP, which picked up Vertica, has no MDM offering at all.
Looking from the other direction, Oracle is a leading MDM vendor, again with not one but two products, including the customer MDM solution acquired via Siebel. Its big data strategy is currently pinned to Oracle Exadata, the appliance that combines Sun hardware with a specialized Oracle database. SAP has an MDM solution but has struggled to gain traction outside SAP accounts. It has also been strangely silent on big data analytics, even though it acquired Sybase, which holds a patent on columnar database technology and sued Vertica. Vertica won round one of that lawsuit, but Sybase counterpunched with a second claim. However, it all seems to have gone quiet after HP bought Vertica and Léo Apotheker (ex-SAP) became HP's CEO.
Finally, IBM and Informatica seem well positioned with their leading MDM plays. Informatica (disclosure: a partner of my current company, RainStor), whose CEO sees a big opportunity in big data, has begun making a lot of marketing noise around big data and MDM. Witness the most recent Informatica MDM blog post by Ravi Shankar, my former colleague at Siperian: “Why MDM and data quality is such a big deal for big data”.
In the end, big data can survive without data quality and MDM, but it all depends on how the transaction data is ultimately correlated with the reference data that points to the owning entity. For example, if you are looking for buying patterns or clickstream trends, you could get by without MDM. However, if your goal is to home in on specific customers, suppliers or products, you would be well advised to look at MDM to ensure that the quantity of data you process gets the appropriate dash of quality.
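A small Python sketch can illustrate that trade-off. All transactions, silo names and the cross-reference table below are hypothetical; the `xref` dict simply stands in for whatever source-to-master mapping an MDM hub would maintain. An aggregate count per source needs no reference data at all, but a per-customer total splits one real customer into several apparent ones unless transactions are first mapped to a master ID.

```python
from collections import Counter, defaultdict

# Hypothetical transactions, tagged only with the silo-local ID that produced them.
transactions = [
    ("web_orders", "W-17", 120.0),
    ("social_feed", "@jsmith", 0.0),  # e.g. a click or mention with no spend
    ("web_orders", "W-17", 30.0),
    ("web_orders", "W-18", 75.0),
    ("mobile_app", "A-5", 50.0),
]

# Hypothetical cross-reference, as an MDM hub might provide: silo ID -> master ID.
xref = {
    ("web_orders", "W-17"): "M1",
    ("social_feed", "@jsmith"): "M1",
    ("mobile_app", "A-5"): "M1",
    ("web_orders", "W-18"): "M2",
}


def events_per_source(txns):
    """Aggregate trend: counts events per source, no reference data needed."""
    return Counter(src for src, _, _ in txns)


def spend_per_customer(txns, xref=None):
    """Per-customer totals; without xref, every silo ID looks like a separate customer."""
    totals = defaultdict(float)
    for src, sid, amount in txns:
        local_key = f"{src}:{sid}"
        owner = xref.get((src, sid), local_key) if xref else local_key
        totals[owner] += amount
    return dict(totals)


print(events_per_source(transactions))
print(spend_per_customer(transactions))        # four apparent "customers"
print(spend_per_customer(transactions, xref))  # two mastered customers
```

Without the cross-reference, the 200.0 spent by customer `M1` is scattered across three silo IDs; with it, the same transactions roll up to the owning entity.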